Re: Text and metadata extraction processor

Joe Witt Thu, 24 Mar 2016 08:38:26 -0700

Dmitry,

Another community member (Joe Skora) has a PR outstanding for
extracting metadata from media files using Tika.  Perhaps it makes
sense to broaden that to extract, in general, whatever Tika can find.
Joe - perhaps you can discuss your ideas with Dmitry and see whether
broadening is a good idea or whether domain-specific processors make
more sense.


This concept of extracting metadata from documents, text files, etc.
using something like Tika is certainly useful, as it can then drive
nice automated routing decisions.

Thanks
Joe

On Thu, Mar 24, 2016 at 9:28 AM, Dmitry Goldenberg
<[email protected]> wrote:
> Hi,
>
> I see that the ExtractText processor extracts text using regex.
>
> What about a processor that extracts text and metadata from incoming
> files?  That doesn't seem to exist - but perhaps I didn't quite look in the
> right spots.
>
> If that doesn't exist, I'd like to implement and commit it using Apache
> Tika.  There may also be a couple of processors related to that.
>
> Thoughts?
>
> Thanks,
> - Dmitry
