Filed a JIRA. I'll finish off my UI and workflow for Wednesday, then circle back to it when I have 10 minutes to debug and see if it's a quick fix or a config issue. To me it looks like it's failing to decode binary data, though.
Tom

On Mon, Nov 23, 2015 at 7:18 PM, Tom Barber <tom.bar...@meteorite.bi> wrote:

> Booooo
>
> On Mon, Nov 23, 2015 at 5:09 PM, Chris Mattmann <chris.mattm...@gmail.com>
> wrote:
>
>> yep, agreed.
>>
>> --
>> Chris Mattmann
>> chris.mattm...@gmail.com
>>
>> -----Original Message-----
>> From: Tom Barber <tom.bar...@meteorite.bi>
>> Reply-To: <dev@oodt.apache.org>
>> Date: Monday, November 23, 2015 at 9:06 AM
>> To: "dev@oodt.apache.org" <dev@oodt.apache.org>
>> Subject: Re: Crawling / Archiving binary data with Solr backend
>>
>>> Dumping a .met file and calling the filemgr client ingest routine works
>>> fine, so its something either broken or i'm doing wrong in the crawler it
>>> appears.
>>>
>>> Tom
>>>
>>> On Mon, Nov 23, 2015 at 3:45 PM, Tom Barber <tom.bar...@meteorite.bi>
>>> wrote:
>>>
>>>> I'll give it a go. Thanks.
>>>>
>>>> On Mon, Nov 23, 2015 at 3:44 PM, Chris Mattmann
>>>> <chris.mattm...@gmail.com> wrote:
>>>>
>>>>> Doesn’t look weird. Hmm. Can you generate a metadata file
>>>>> using TikaCmdLine extractor and then use that metadata file
>>>>> to ingest into File Manager by hand? Does that work?
>>>>>
>>>>> --
>>>>> Chris Mattmann
>>>>> chris.mattm...@gmail.com
>>>>>
>>>>> -----Original Message-----
>>>>> From: Tom Barber <tom.bar...@meteorite.bi>
>>>>> Reply-To: <dev@oodt.apache.org>
>>>>> Date: Monday, November 23, 2015 at 7:43 AM
>>>>> To: "dev@oodt.apache.org" <dev@oodt.apache.org>
>>>>> Subject: Re: Crawling / Archiving binary data with Solr backend
>>>>>
>>>>>> Author: Alun Davis - Loudmouth
>>>>>> Content-Length: 3273160
>>>>>> Content-Type: audio/mpeg
>>>>>> X-Parsed-By: org.apache.tika.parser.DefaultParser
>>>>>> X-TIKA:digest:MD5: 5f374012180e94778346619515152f74
>>>>>> X-TIKA:digest:SHA256: 34d8bf9da8feb848922138eb7807c0d71ed92376422fb28c8cbbffe788574ab0
>>>>>> channels: 2
>>>>>> creator: Alun Davis - Loudmouth
>>>>>> dc:creator: Alun Davis - Loudmouth
>>>>>> dc:title: Teenage Baghead
>>>>>> meta:author: Alun Davis - Loudmouth
>>>>>> resourceName: Teenage Baghead.mp3
>>>>>> samplerate: 44100
>>>>>> title: Teenage Baghead
>>>>>> version: MPEG 3 Layer III Version 1
>>>>>> xmpDM:album:
>>>>>> xmpDM:artist: Alun Davis - Loudmouth
>>>>>> xmpDM:audioChannelType: Stereo
>>>>>> xmpDM:audioCompressor: MP3
>>>>>> xmpDM:audioSampleRate: 44100
>>>>>> xmpDM:duration: 204577.046875
>>>>>> xmpDM:genre: Pop
>>>>>> xmpDM:logComment: www.maimthattune.com for more!
>>>>>> xmpDM:releaseDate: 2001
>>>>>>
>>>>>> Nothing that should scare a parser in the mp3 at least.
>>>>>>
>>>>>> On Mon, Nov 23, 2015 at 3:33 PM, Chris Mattmann
>>>>>> <chris.mattm...@gmail.com> wrote:
>>>>>>
>>>>>>> yeah check the metadata. Any weird UTF-8 encoding?
>>>>>>>
>>>>>>> (aka run tika on the file outside of OODT what do you see?)
>>>>>>>
>>>>>>> --
>>>>>>> Chris Mattmann
>>>>>>> chris.mattm...@gmail.com
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Tom Barber <tom.bar...@meteorite.bi>
>>>>>>> Reply-To: <dev@oodt.apache.org>
>>>>>>> Date: Monday, November 23, 2015 at 7:23 AM
>>>>>>> To: "dev@oodt.apache.org" <dev@oodt.apache.org>
>>>>>>> Subject: Re: Crawling / Archiving binary data with Solr backend
>>>>>>>
>>>>>>>> ./crawler/bin/crawler_launcher --filemgrUrl http://localhost:9000
>>>>>>>> --operation --launchMetCrawler --clientTransferer
>>>>>>>> org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory
>>>>>>>> --productPath $OODT_HOME/data/staging --metExtractor
>>>>>>>> org.apache.oodt.cas.metadata.extractors.TikaCmdLineMetExtractor
>>>>>>>> --metExtractorConfig
>>>>>>>> /home/bugg/Projects/surrey100/oodt/data/met/tika.conf
>>>>>>>>
>>>>>>>> I'm running that. Which runs fine with the default lucene stuff, also
>>>>>>>> runs fine with a txt file, but doesn't run fine over a random picture
>>>>>>>> I took or over an mp3 I tested it on.
>>>>>>>>
>>>>>>>> On Mon, Nov 23, 2015 at 3:12 PM, Mattmann, Chris A (3980)
>>>>>>>> <chris.a.mattm...@jpl.nasa.gov> wrote:
>>>>>>>>
>>>>>>>>> Encoding issues with the extracted metadata? What are you getting
>>>>>>>>> just running Tika on the files?
>>>>>>>>>
>>>>>>>>> The actual data shouldn’t matter since it’s not being ingested
>>>>>>>>> (are you doing it in place, or what data transferer are you using)?
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Chris
>>>>>>>>>
>>>>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>>> Chris Mattmann, Ph.D.
>>>>>>>>> Chief Architect
>>>>>>>>> Instrument Software and Science Data Systems Section (398)
>>>>>>>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>>>>>>>> Office: 168-519, Mailstop: 168-527
>>>>>>>>> Email: chris.a.mattm...@nasa.gov
>>>>>>>>> WWW: http://sunset.usc.edu/~mattmann/
>>>>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>>> Adjunct Associate Professor, Computer Science Department
>>>>>>>>> University of Southern California, Los Angeles, CA 90089 USA
>>>>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Tom Barber <tom.bar...@meteorite.bi>
>>>>>>>>> Reply-To: "dev@oodt.apache.org" <dev@oodt.apache.org>
>>>>>>>>> Date: Monday, November 23, 2015 at 6:36 AM
>>>>>>>>> To: "dev@oodt.apache.org" <dev@oodt.apache.org>
>>>>>>>>> Subject: Crawling / Archiving binary data with Solr backend
>>>>>>>>>
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> Looks like I've never tried it before with binary data. If I swap
>>>>>>>>>> the filemgr defaults to use solr then try and crawl my staging
>>>>>>>>>> directory using the Tika extractor I get a lot of
>>>>>>>>>>
>>>>>>>>>> org.apache.xmlrpc.XmlRpcException: java.lang.Exception:
>>>>>>>>>> org.apache.oodt.cas.filemgr.structs.exceptions.CatalogException: Error
>>>>>>>>>> ingesting product [org.apache.oodt.cas.filemgr.structs.Product@62b19476] :
>>>>>>>>>> null
>>>>>>>>>> at org.apache.xmlrpc.XmlRpcClientResponseProcessor.decodeException(XmlRpcClientResponseProcessor.java:104)
>>>>>>>>>> at org.apache.xmlrpc.XmlRpcClientResponseProcessor.decodeResponse(XmlRpcClientResponseProcessor.java:71)
>>>>>>>>>> at org.apache.xmlrpc.XmlRpcClientWorker.execute(XmlRpcClientWorker.java:73)
>>>>>>>>>>
>>>>>>>>>> Type things.
>>>>>>>>>>
>>>>>>>>>> Any ideas?
>>>>>>>>>>
>>>>>>>>>> Tom