Doesn’t look weird. Hmm. Can you generate a metadata file using the TikaCmdLine extractor and then use that metadata file to ingest into the File Manager by hand? Does that work?

—
Chris Mattmann
chris.mattm...@gmail.com

-----Original Message-----
From: Tom Barber <tom.bar...@meteorite.bi>
Reply-To: <dev@oodt.apache.org>
Date: Monday, November 23, 2015 at 7:43 AM
To: "dev@oodt.apache.org" <dev@oodt.apache.org>
Subject: Re: Crawling / Archiving binary data with Solr backend

>Author: Alun Davis - Loudmouth
>Content-Length: 3273160
>Content-Type: audio/mpeg
>X-Parsed-By: org.apache.tika.parser.DefaultParser
>X-TIKA:digest:MD5: 5f374012180e94778346619515152f74
>X-TIKA:digest:SHA256: 34d8bf9da8feb848922138eb7807c0d71ed92376422fb28c8cbbffe788574ab0
>channels: 2
>creator: Alun Davis - Loudmouth
>dc:creator: Alun Davis - Loudmouth
>dc:title: Teenage Baghead
>meta:author: Alun Davis - Loudmouth
>resourceName: Teenage Baghead.mp3
>samplerate: 44100
>title: Teenage Baghead
>version: MPEG 3 Layer III Version 1
>xmpDM:album:
>xmpDM:artist: Alun Davis - Loudmouth
>xmpDM:audioChannelType: Stereo
>xmpDM:audioCompressor: MP3
>xmpDM:audioSampleRate: 44100
>xmpDM:duration: 204577.046875
>xmpDM:genre: Pop
>xmpDM:logComment: www.maimthattune.com for more!
>xmpDM:releaseDate: 2001
>
>
>Nothing that should scare a parser in the mp3 at least.
>
>On Mon, Nov 23, 2015 at 3:33 PM, Chris Mattmann <chris.mattm...@gmail.com>
>wrote:
>
>> yeah check the metadata. Any weird UTF-8 encoding?
>>
>> (aka run tika on the file outside of OODT what do you see?)
>>
>> —
>> Chris Mattmann
>> chris.mattm...@gmail.com
>>
>> -----Original Message-----
>> From: Tom Barber <tom.bar...@meteorite.bi>
>> Reply-To: <dev@oodt.apache.org>
>> Date: Monday, November 23, 2015 at 7:23 AM
>> To: "dev@oodt.apache.org" <dev@oodt.apache.org>
>> Subject: Re: Crawling / Archiving binary data with Solr backend
>>
>> >./crawler/bin/crawler_launcher --filemgrUrl http://localhost:9000 \
>> >  --operation --launchMetCrawler \
>> >  --clientTransferer org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory \
>> >  --productPath $OODT_HOME/data/staging \
>> >  --metExtractor org.apache.oodt.cas.metadata.extractors.TikaCmdLineMetExtractor \
>> >  --metExtractorConfig /home/bugg/Projects/surrey100/oodt/data/met/tika.conf
>> >
>> >I'm running that. It runs fine with the default Lucene stuff and also runs
>> >fine with a txt file, but doesn't run fine over a random picture I took or
>> >over an mp3 I tested it on.
>> >
>> >
>> >On Mon, Nov 23, 2015 at 3:12 PM, Mattmann, Chris A (3980) <
>> >chris.a.mattm...@jpl.nasa.gov> wrote:
>> >
>> >> Encoding issues with the extracted metadata? What are you getting
>> >> just running Tika on the files?
>> >>
>> >> The actual data shouldn’t matter since it’s not being ingested
>> >> (are you doing it in place, or what data transferer are you using)?
>> >>
>> >> Cheers,
>> >> Chris
>> >>
>> >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> >> Chris Mattmann, Ph.D.
>> >> Chief Architect
>> >> Instrument Software and Science Data Systems Section (398)
>> >> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> >> Office: 168-519, Mailstop: 168-527
>> >> Email: chris.a.mattm...@nasa.gov
>> >> WWW: http://sunset.usc.edu/~mattmann/
>> >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> >> Adjunct Associate Professor, Computer Science Department
>> >> University of Southern California, Los Angeles, CA 90089 USA
>> >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> >>
>> >> -----Original Message-----
>> >> From: Tom Barber <tom.bar...@meteorite.bi>
>> >> Reply-To: "dev@oodt.apache.org" <dev@oodt.apache.org>
>> >> Date: Monday, November 23, 2015 at 6:36 AM
>> >> To: "dev@oodt.apache.org" <dev@oodt.apache.org>
>> >> Subject: Crawling / Archiving binary data with Solr backend
>> >>
>> >> >Hello,
>> >> >
>> >> >Looks like I've never tried it before with binary data. If I swap the
>> >> >filemgr defaults to use solr then try and crawl my staging directory using
>> >> >the Tika extractor I get a lot of
>> >> >
>> >> >org.apache.xmlrpc.XmlRpcException: java.lang.Exception: org.apache.oodt.cas.filemgr.structs.exceptions.CatalogException: Error ingesting product [org.apache.oodt.cas.filemgr.structs.Product@62b19476] : null
>> >> >at org.apache.xmlrpc.XmlRpcClientResponseProcessor.decodeException(XmlRpcClientResponseProcessor.java:104)
>> >> >at org.apache.xmlrpc.XmlRpcClientResponseProcessor.decodeResponse(XmlRpcClientResponseProcessor.java:71)
>> >> >at org.apache.xmlrpc.XmlRpcClientWorker.execute(XmlRpcClientWorker.java:73)
>> >> >
>> >> >
>> >> >Type things.
>> >> >
>> >> >Any ideas?
>> >> >
>> >> >Tom
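
One way to follow up on the "run Tika on the file outside of OODT" suggestion is to scan the `key: value` metadata that Tika prints (for example with `java -jar tika-app.jar -m <file>`) for values that could cause trouble on the way into the catalog. The Python sketch below is not part of OODT; `parse_tika_metadata` and `suspect_values` are hypothetical helper names. It assumes the plain `key: value` line format seen in the mp3 dump in this thread, and it flags two things: characters outside the XML 1.0 character set (XML-RPC requests are XML documents) and empty values such as `xmpDM:album` in that dump. Whether either of these actually produces the `CatalogException ... : null` with the Solr catalog is not established in this thread; the script only lists candidates to rule out.

```python
import re

# Characters permitted by the XML 1.0 "Char" production. XML-RPC requests are
# XML documents, so a metadata value containing anything outside this set
# cannot be sent as-is.
XML_ILLEGAL = re.compile(r"[^\t\n\r\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]")


def parse_tika_metadata(text):
    """Parse `key: value` lines (Tika metadata output) into {key: [values]}.

    Keys such as "X-TIKA:digest:MD5" contain colons themselves, so each line is
    split on the first ": " (colon + space). A line ending in ":" with nothing
    after it is recorded as an empty value.
    """
    met = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if ": " in line:
            key, value = line.split(": ", 1)
        elif line.endswith(":"):
            key, value = line[:-1], ""
        else:
            continue  # not a key/value line
        met.setdefault(key, []).append(value)
    return met


def suspect_values(met):
    """Return (key, reason) pairs for values worth checking before ingest."""
    problems = []
    for key, values in met.items():
        for value in values:
            if value.strip() == "":
                problems.append((key, "empty value"))
            elif XML_ILLEGAL.search(value):
                problems.append((key, "character not allowed in XML 1.0"))
    return problems


if __name__ == "__main__":
    # A few lines from the mp3 metadata dump earlier in the thread.
    sample = """Content-Type: audio/mpeg
dc:title: Teenage Baghead
xmpDM:album:
xmpDM:genre: Pop"""
    print(suspect_values(parse_tika_metadata(sample)))
    # prints [('xmpDM:album', 'empty value')]
```

Splitting on the first ": " rather than the first ":" is what keeps namespaced keys like `dc:title` and `X-TIKA:digest:SHA256` intact while still allowing colons inside values.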