Re: Tika Based Metadata Extraction

Lewis John Mcgibbney Fri, 03 Apr 2015 10:02:35 -0700

Hi Tom,

On Friday, April 3, 2015, Tom Barber <[email protected]> wrote:


> Hello Chaps and Chapesses,
>
> Somehow I've come this far without doing it, but I was playing around with
> the crawler for my ApacheCon demo and came across the
> TikaCmdLineMetExtractor that, I believe, Rishi wrote a while ago.
> So I've put some stuff in a folder and can crawl and ingest it using the
> GenericFile element map. In the past, to map metadata, I've written a
> class to pump the data around and add it to that file,


To what file?


> but I was wondering whether, since I know which fields are coming out of
> Tika, I could just put them into the XML mapping file somehow and bypass
> having to write Java code?


Well, Tika will make a best effort to pull out as much metadata as possible.
Chris explains a good bit about this here:

 https://cwiki.apache.org/confluence/display/OODT/OODT+Crawler+Help

I think that if custom extractions are required, you could most likely
extend the extractor interface and implement it, but that is Java code,
which I assume you are trying to work around?
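
For illustration only, here is a minimal sketch of the kind of renaming step
such a custom extractor might perform. It is plain Java and uses no OODT or
Tika classes: the class name FieldRenamingSketch, the method rename, and the
target field name MimeType are all hypothetical, and wiring this into the
actual extractor interface is not shown.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: rename extracted metadata keys to product field names.
// Nothing here is OODT or Tika API; it only shows the mapping idea.
public class FieldRenamingSketch {

    // Copies each extracted value whose key appears in fieldMap into a new
    // map under the mapped name. Keys absent from fieldMap are dropped.
    public static Map<String, String> rename(Map<String, String> extracted,
                                             Map<String, String> fieldMap) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : extracted.entrySet()) {
            String target = fieldMap.get(e.getKey());
            if (target != null) {
                out.put(target, e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed sample of what an extractor might emit.
        Map<String, String> extracted = new LinkedHashMap<>();
        extracted.put("Content-Type", "application/pdf");
        extracted.put("X-Unmapped", "ignored");

        // Assumed mapping from extracted key to product metadata field.
        Map<String, String> fieldMap = new LinkedHashMap<>();
        fieldMap.put("Content-Type", "MimeType");

        System.out.println(rename(extracted, fieldMap));
    }
}
```

Keeping the key-to-field table as data rather than code means it could, in
principle, be loaded from a config file, which is closer to what you are
after, though whether the existing XML mapping file supports that is a
separate question.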


> This may be very obvious in which case I apologise but I can't find owt on
> the wiki so I figured I'd ask the gurus.
>
>



-- 
*Lewis*
