Hi, I see that the ExtractText processor extracts text using regex.
What about a processor that extracts text and metadata from incoming files? That doesn't seem to exist, but perhaps I didn't look in the right places. If it doesn't exist, I'd like to implement and commit it, using Apache Tika. There may also be a couple of related processors worth adding alongside it.

Thoughts?

Thanks,
- Dmitry
