Dumping a .met file and calling the filemgr client ingest routine works fine, so it appears that either something is broken in the crawler or I'm doing something wrong with it.

Tom

On Mon, Nov 23, 2015 at 3:45 PM, Tom Barber <tom.bar...@meteorite.bi> wrote:

> I'll give it a go. Thanks.
>
> On Mon, Nov 23, 2015 at 3:44 PM, Chris Mattmann <chris.mattm...@gmail.com>
> wrote:
>
>> Doesn’t look weird. Hmm. Can you generate a metadata file
>> using TikaCmdLine extractor and then use that metadata file
>> to ingest into File Manager by hand? Does that work?
>>
>> --
>> Chris Mattmann
>> chris.mattm...@gmail.com
>>
>> -----Original Message-----
>> From: Tom Barber <tom.bar...@meteorite.bi>
>> Reply-To: <dev@oodt.apache.org>
>> Date: Monday, November 23, 2015 at 7:43 AM
>> To: "dev@oodt.apache.org" <dev@oodt.apache.org>
>> Subject: Re: Crawling / Archiving binary data with Solr backend
>>
>> >Author: Alun Davis - Loudmouth
>> >Content-Length: 3273160
>> >Content-Type: audio/mpeg
>> >X-Parsed-By: org.apache.tika.parser.DefaultParser
>> >X-TIKA:digest:MD5: 5f374012180e94778346619515152f74
>> >X-TIKA:digest:SHA256: 34d8bf9da8feb848922138eb7807c0d71ed92376422fb28c8cbbffe788574ab0
>> >channels: 2
>> >creator: Alun Davis - Loudmouth
>> >dc:creator: Alun Davis - Loudmouth
>> >dc:title: Teenage Baghead
>> >meta:author: Alun Davis - Loudmouth
>> >resourceName: Teenage Baghead.mp3
>> >samplerate: 44100
>> >title: Teenage Baghead
>> >version: MPEG 3 Layer III Version 1
>> >xmpDM:album:
>> >xmpDM:artist: Alun Davis - Loudmouth
>> >xmpDM:audioChannelType: Stereo
>> >xmpDM:audioCompressor: MP3
>> >xmpDM:audioSampleRate: 44100
>> >xmpDM:duration: 204577.046875
>> >xmpDM:genre: Pop
>> >xmpDM:logComment: www.maimthattune.com for more!
>> >xmpDM:releaseDate: 2001
>> >
>> >Nothing that should scare a parser in the mp3 at least.
>> >
>> >On Mon, Nov 23, 2015 at 3:33 PM, Chris Mattmann <chris.mattm...@gmail.com>
>> >wrote:
>> >
>> >> yeah check the metadata. Any weird UTF-8 encoding?
>> >>
>> >> (aka run tika on the file outside of OODT what do you see?)
>> >>
>> >> --
>> >> Chris Mattmann
>> >> chris.mattm...@gmail.com
>> >>
>> >> -----Original Message-----
>> >> From: Tom Barber <tom.bar...@meteorite.bi>
>> >> Reply-To: <dev@oodt.apache.org>
>> >> Date: Monday, November 23, 2015 at 7:23 AM
>> >> To: "dev@oodt.apache.org" <dev@oodt.apache.org>
>> >> Subject: Re: Crawling / Archiving binary data with Solr backend
>> >>
>> >> >./crawler/bin/crawler_launcher --filemgrUrl http://localhost:9000
>> >> >--operation --launchMetCrawler --clientTransferer
>> >> >org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory
>> >> >--productPath $OODT_HOME/data/staging --metExtractor
>> >> >org.apache.oodt.cas.metadata.extractors.TikaCmdLineMetExtractor
>> >> >--metExtractorConfig
>> >> >/home/bugg/Projects/surrey100/oodt/data/met/tika.conf
>> >> >
>> >> >I'm running that. Which runs fine with the default lucene stuff, also
>> >> >runs fine with a txt file, but doesn't run fine over a random picture
>> >> >I took or over an mp3 I tested it on.
>> >> >
>> >> >On Mon, Nov 23, 2015 at 3:12 PM, Mattmann, Chris A (3980) <
>> >> >chris.a.mattm...@jpl.nasa.gov> wrote:
>> >> >
>> >> >> Encoding issues with the extracted metadata? What are you getting
>> >> >> just running Tika on the files?
>> >> >>
>> >> >> The actual data shouldn’t matter since it’s not being ingested
>> >> >> (are you doing it in place, or what data transferer are you using)?
>> >> >>
>> >> >> Cheers,
>> >> >> Chris
>> >> >>
>> >> >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> >> >> Chris Mattmann, Ph.D.
>> >> >> Chief Architect
>> >> >> Instrument Software and Science Data Systems Section (398)
>> >> >> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> >> >> Office: 168-519, Mailstop: 168-527
>> >> >> Email: chris.a.mattm...@nasa.gov
>> >> >> WWW: http://sunset.usc.edu/~mattmann/
>> >> >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> >> >> Adjunct Associate Professor, Computer Science Department
>> >> >> University of Southern California, Los Angeles, CA 90089 USA
>> >> >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> >> >>
>> >> >> -----Original Message-----
>> >> >> From: Tom Barber <tom.bar...@meteorite.bi>
>> >> >> Reply-To: "dev@oodt.apache.org" <dev@oodt.apache.org>
>> >> >> Date: Monday, November 23, 2015 at 6:36 AM
>> >> >> To: "dev@oodt.apache.org" <dev@oodt.apache.org>
>> >> >> Subject: Crawling / Archiving binary data with Solr backend
>> >> >>
>> >> >> >Hello,
>> >> >> >
>> >> >> >Looks like I've never tried it before with binary data. If I swap
>> >> >> >the filemgr defaults to use solr then try and crawl my staging
>> >> >> >directory using the Tika extractor I get a lot of
>> >> >> >
>> >> >> >org.apache.xmlrpc.XmlRpcException: java.lang.Exception:
>> >> >> >org.apache.oodt.cas.filemgr.structs.exceptions.CatalogException:
>> >> >> >Error ingesting product
>> >> >> >[org.apache.oodt.cas.filemgr.structs.Product@62b19476] : null
>> >> >> >at org.apache.xmlrpc.XmlRpcClientResponseProcessor.decodeException(XmlRpcClientResponseProcessor.java:104)
>> >> >> >at org.apache.xmlrpc.XmlRpcClientResponseProcessor.decodeResponse(XmlRpcClientResponseProcessor.java:71)
>> >> >> >at org.apache.xmlrpc.XmlRpcClientWorker.execute(XmlRpcClientWorker.java:73)
>> >> >> >
>> >> >> >Type things.
>> >> >> >
>> >> >> >Any ideas?
>> >> >> >
>> >> >> >Tom