Hi,

I see that the ExtractText processor extracts text using regex.

What about a processor that extracts text and metadata from incoming
files?  That doesn't seem to exist, but perhaps I didn't look in the
right places.

If that doesn't exist, I'd like to implement and commit it, using Apache
Tika.  There may also be a couple of related processors worth adding
alongside it.

Thoughts?

Thanks,
- Dmitry
