I'll give it a go.

Thanks.

On Mon, Nov 23, 2015 at 3:44 PM, Chris Mattmann <chris.mattm...@gmail.com> wrote:
> Doesn’t look weird. Hmm. Can you generate a metadata file
> using TikaCmdLine extractor and then use that metadata file
> to ingest into File Manager by hand? Does that work?
>
> --
> Chris Mattmann
> chris.mattm...@gmail.com
>
>
> -----Original Message-----
> From: Tom Barber <tom.bar...@meteorite.bi>
> Reply-To: <dev@oodt.apache.org>
> Date: Monday, November 23, 2015 at 7:43 AM
> To: "dev@oodt.apache.org" <dev@oodt.apache.org>
> Subject: Re: Crawling / Archiving binary data with Solr backend
>
> >Author: Alun Davis - Loudmouth
> >Content-Length: 3273160
> >Content-Type: audio/mpeg
> >X-Parsed-By: org.apache.tika.parser.DefaultParser
> >X-TIKA:digest:MD5: 5f374012180e94778346619515152f74
> >X-TIKA:digest:SHA256: 34d8bf9da8feb848922138eb7807c0d71ed92376422fb28c8cbbffe788574ab0
> >channels: 2
> >creator: Alun Davis - Loudmouth
> >dc:creator: Alun Davis - Loudmouth
> >dc:title: Teenage Baghead
> >meta:author: Alun Davis - Loudmouth
> >resourceName: Teenage Baghead.mp3
> >samplerate: 44100
> >title: Teenage Baghead
> >version: MPEG 3 Layer III Version 1
> >xmpDM:album:
> >xmpDM:artist: Alun Davis - Loudmouth
> >xmpDM:audioChannelType: Stereo
> >xmpDM:audioCompressor: MP3
> >xmpDM:audioSampleRate: 44100
> >xmpDM:duration: 204577.046875
> >xmpDM:genre: Pop
> >xmpDM:logComment: www.maimthattune.com for more!
> >xmpDM:releaseDate: 2001
> >
> >
> >Nothing that should scare a parser in the mp3 at least.
> >
> >On Mon, Nov 23, 2015 at 3:33 PM, Chris Mattmann <chris.mattm...@gmail.com> wrote:
> >
> >> yeah check the metadata. Any weird UTF-8 encoding?
> >>
> >> (aka run tika on the file outside of OODT what do you see?)
> >>
> >> --
> >> Chris Mattmann
> >> chris.mattm...@gmail.com
> >>
> >>
> >> -----Original Message-----
> >> From: Tom Barber <tom.bar...@meteorite.bi>
> >> Reply-To: <dev@oodt.apache.org>
> >> Date: Monday, November 23, 2015 at 7:23 AM
> >> To: "dev@oodt.apache.org" <dev@oodt.apache.org>
> >> Subject: Re: Crawling / Archiving binary data with Solr backend
> >>
> >> >./crawler/bin/crawler_launcher --filemgrUrl http://localhost:9000
> >> >--operation --launchMetCrawler --clientTransferer
> >> >org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory
> >> >--productPath $OODT_HOME/data/staging --metExtractor
> >> >org.apache.oodt.cas.metadata.extractors.TikaCmdLineMetExtractor
> >> >--metExtractorConfig
> >> >/home/bugg/Projects/surrey100/oodt/data/met/tika.conf
> >> >
> >> >I'm running that. Which runs fine with the default lucene stuff, also runs
> >> >fine with a txt file, but doesn't run fine over a random picture I took or
> >> >over an mp3 I tested it on.
> >> >
> >> >
> >> >On Mon, Nov 23, 2015 at 3:12 PM, Mattmann, Chris A (3980)
> >> ><chris.a.mattm...@jpl.nasa.gov> wrote:
> >> >
> >> >> Encoding issues with the extracted metadata? What are you getting
> >> >> just running Tika on the files?
> >> >>
> >> >> The actual data shouldn’t matter since it’s not being ingested
> >> >> (are you doing it in place, or what data transferer are you using)?
> >> >>
> >> >> Cheers,
> >> >> Chris
> >> >>
> >> >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >> >> Chris Mattmann, Ph.D.
> >> >> Chief Architect
> >> >> Instrument Software and Science Data Systems Section (398)
> >> >> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> >> >> Office: 168-519, Mailstop: 168-527
> >> >> Email: chris.a.mattm...@nasa.gov
> >> >> WWW: http://sunset.usc.edu/~mattmann/
> >> >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >> >> Adjunct Associate Professor, Computer Science Department
> >> >> University of Southern California, Los Angeles, CA 90089 USA
> >> >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >> >>
> >> >>
> >> >> -----Original Message-----
> >> >> From: Tom Barber <tom.bar...@meteorite.bi>
> >> >> Reply-To: "dev@oodt.apache.org" <dev@oodt.apache.org>
> >> >> Date: Monday, November 23, 2015 at 6:36 AM
> >> >> To: "dev@oodt.apache.org" <dev@oodt.apache.org>
> >> >> Subject: Crawling / Archiving binary data with Solr backend
> >> >>
> >> >> >Hello,
> >> >> >
> >> >> >Looks like I've never tried it before with binary data. If I swap the
> >> >> >filemgr defaults to use solr then try and crawl my staging directory using
> >> >> >the Tika extractor I get a lot of
> >> >> >
> >> >> >org.apache.xmlrpc.XmlRpcException: java.lang.Exception:
> >> >> >org.apache.oodt.cas.filemgr.structs.exceptions.CatalogException: Error
> >> >> >ingesting product [org.apache.oodt.cas.filemgr.structs.Product@62b19476] :
> >> >> >null
> >> >> >at org.apache.xmlrpc.XmlRpcClientResponseProcessor.decodeException(XmlRpcClientResponseProcessor.java:104)
> >> >> >at org.apache.xmlrpc.XmlRpcClientResponseProcessor.decodeResponse(XmlRpcClientResponseProcessor.java:71)
> >> >> >at org.apache.xmlrpc.XmlRpcClientWorker.execute(XmlRpcClientWorker.java:73)
> >> >> >
> >> >> >
> >> >> >Type things.
> >> >> >
> >> >> >Any ideas?
> >> >> >
> >> >> >Tom
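
For the "any weird UTF-8 encoding?" check suggested in the thread, a small script can scan the metadata Tika prints before anything is handed to File Manager. The sketch below is only an illustration under stated assumptions: it assumes the output is one "key: value" pair per line, as in the listing quoted above, and the helper names parse_metadata and find_suspicious are made up for this example (they are not part of Tika or OODT). It flags empty values, characters that XML 1.0 forbids, and non-ASCII characters. Whether any of these actually trips the Solr catalog is not established in this thread; the script just narrows down where to look.

```python
import re

# Control characters and non-characters that XML 1.0 does not allow in content.
_XML_ILLEGAL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ufffe\uffff]")


def parse_metadata(text):
    """Parse 'key: value' lines (the layout of the Tika output quoted above).

    Hypothetical helper: splits on the first ': ' so keys that contain
    colons themselves, such as 'X-TIKA:digest:MD5', stay intact.
    """
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue
        key, sep, value = line.partition(": ")
        if not sep:
            # A line such as 'xmpDM:album:' has a key but no value.
            key, value = line.rstrip().rstrip(":"), ""
        pairs.append((key.strip(), value.strip()))
    return pairs


def find_suspicious(pairs):
    """Return (key, reason) for every value worth a second look."""
    problems = []
    for key, value in pairs:
        if value == "":
            problems.append((key, "empty value"))
        elif _XML_ILLEGAL.search(value):
            problems.append((key, "character not allowed in XML 1.0"))
        elif not value.isascii():
            problems.append((key, "non-ASCII character"))
    return problems


if __name__ == "__main__":
    # A few lines copied from the metadata listing in this thread.
    sample = (
        "Author: Alun Davis - Loudmouth\n"
        "X-TIKA:digest:MD5: 5f374012180e94778346619515152f74\n"
        "xmpDM:album:\n"
        "xmpDM:genre: Pop\n"
    )
    for key, reason in find_suspicious(parse_metadata(sample)):
        print(f"{key}: {reason}")
```

Run against the sample lines, the only entry reported is xmpDM:album, which has a key but no value. The rest of the mp3 metadata is plain ASCII, which matches the "doesn't look weird" reading above.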