Hi Tom,

On Friday, April 3, 2015, Tom Barber <[email protected]> wrote:
> Hello Chaps and Chapesses,
>
> Somehow I've come this far and not done it but I was playing around with
> the crawler for my ApacheCon demo and came across the
> TikaCmdLineMetExtractor that Rishi I believe wrote a while ago.
> So I've put some stuff in a folder and can crawl and ingest it using the
> GenericFile element map, now in the past to map metadata I've written some
> class to pump the data around and add to that file,

To what file?

> but I was wondering if, as I know what fields are coming out of Tika to
> just put them into the XML mapping file somehow so I can by pass having to
> write Java code?

Well, Tika will make a best effort to pull out as much metadata as possible.
Chris explains a good bit about this here:
https://cwiki.apache.org/confluence/display/OODT/OODT+Crawler+Help

I think that if custom extractions are required, you could most likely
extend the extractor interface and implement it. But that is Java code,
which I assume you are trying to work around?

> This may be very obvious in which case I apologise but I can't find owt on
> the wiki so I figured I'd ask the gurus.

--
*Lewis*
