Author: Alun Davis - Loudmouth
Content-Length: 3273160
Content-Type: audio/mpeg
X-Parsed-By: org.apache.tika.parser.DefaultParser
X-TIKA:digest:MD5: 5f374012180e94778346619515152f74
X-TIKA:digest:SHA256: 34d8bf9da8feb848922138eb7807c0d71ed92376422fb28c8cbbffe788574ab0
channels: 2
creator: Alun Davis - Loudmouth
dc:creator: Alun Davis - Loudmouth
dc:title: Teenage Baghead
meta:author: Alun Davis - Loudmouth
resourceName: Teenage Baghead.mp3
samplerate: 44100
title: Teenage Baghead
version: MPEG 3 Layer III Version 1
xmpDM:album:
xmpDM:artist: Alun Davis - Loudmouth
xmpDM:audioChannelType: Stereo
xmpDM:audioCompressor: MP3
xmpDM:audioSampleRate: 44100
xmpDM:duration: 204577.046875
xmpDM:genre: Pop
xmpDM:logComment: www.maimthattune.com for more!
xmpDM:releaseDate: 2001

Nothing that should scare a parser in the mp3 at least.

On Mon, Nov 23, 2015 at 3:33 PM, Chris Mattmann <chris.mattm...@gmail.com> wrote:

> yeah check the metadata. Any weird UTF-8 encoding?
>
> (aka run tika on the file outside of OODT what do you see?)
>
> --
> Chris Mattmann
> chris.mattm...@gmail.com
>
> -----Original Message-----
> From: Tom Barber <tom.bar...@meteorite.bi>
> Reply-To: <dev@oodt.apache.org>
> Date: Monday, November 23, 2015 at 7:23 AM
> To: "dev@oodt.apache.org" <dev@oodt.apache.org>
> Subject: Re: Crawling / Archiving binary data with Solr backend
>
> >./crawler/bin/crawler_launcher --filemgrUrl http://localhost:9000
> >--operation --launchMetCrawler --clientTransferer
> >org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory
> >--productPath $OODT_HOME/data/staging --metExtractor
> >org.apache.oodt.cas.metadata.extractors.TikaCmdLineMetExtractor
> >--metExtractorConfig /home/bugg/Projects/surrey100/oodt/data/met/tika.conf
> >
> >I'm running that. Which runs fine with the default lucene stuff, also runs
> >fine with a txt file, but doesn't run fine over a random picture I took or
> >over an mp3 I tested it on.
> >
> >On Mon, Nov 23, 2015 at 3:12 PM, Mattmann, Chris A (3980) <
> >chris.a.mattm...@jpl.nasa.gov> wrote:
> >
> >> Encoding issues with the extracted metadata? What are you getting
> >> just running Tika on the files?
> >>
> >> The actual data shouldn’t matter since it’s not being ingested
> >> (are you doing it in place, or what data transferer are you using)?
> >>
> >> Cheers,
> >> Chris
> >>
> >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >> Chris Mattmann, Ph.D.
> >> Chief Architect
> >> Instrument Software and Science Data Systems Section (398)
> >> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> >> Office: 168-519, Mailstop: 168-527
> >> Email: chris.a.mattm...@nasa.gov
> >> WWW: http://sunset.usc.edu/~mattmann/
> >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >> Adjunct Associate Professor, Computer Science Department
> >> University of Southern California, Los Angeles, CA 90089 USA
> >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>
> >> -----Original Message-----
> >> From: Tom Barber <tom.bar...@meteorite.bi>
> >> Reply-To: "dev@oodt.apache.org" <dev@oodt.apache.org>
> >> Date: Monday, November 23, 2015 at 6:36 AM
> >> To: "dev@oodt.apache.org" <dev@oodt.apache.org>
> >> Subject: Crawling / Archiving binary data with Solr backend
> >>
> >> >Hello,
> >> >
> >> >Looks like I've never tried it before with binary data. If I swap the
> >> >filemgr defaults to use solr then try and crawl my staging directory using
> >> >the Tika extractor I get a lot of
> >> >
> >> >org.apache.xmlrpc.XmlRpcException: java.lang.Exception:
> >> >org.apache.oodt.cas.filemgr.structs.exceptions.CatalogException: Error
> >> >ingesting product [org.apache.oodt.cas.filemgr.structs.Product@62b19476] : null
> >> >at org.apache.xmlrpc.XmlRpcClientResponseProcessor.decodeException(XmlRpcClientResponseProcessor.java:104)
> >> >at org.apache.xmlrpc.XmlRpcClientResponseProcessor.decodeResponse(XmlRpcClientResponseProcessor.java:71)
> >> >at org.apache.xmlrpc.XmlRpcClientWorker.execute(XmlRpcClientWorker.java:73)
> >> >
> >> >Type things.
> >> >
> >> >Any ideas?
> >> >
> >> >Tom
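
As a rough way to act on the "check the metadata" suggestion in the thread, the sketch below scans Tika-style `key: value` output for fields that are empty, contain non-ASCII characters, or contain control characters. It assumes the metadata has been saved as text with one `key: value` pair per line. The function names `parse_metadata` and `suspicious_fields` are made up for illustration and are not part of Tika or OODT. Nothing in the thread establishes that any of these conditions causes the ingest error; the script only narrows down where to look.

```python
def parse_metadata(text):
    """Parse 'key: value' lines into a dict.

    Keys may themselves contain colons (e.g. 'X-TIKA:digest:MD5'),
    so split on the first ': ' (colon followed by a space).
    """
    meta = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, sep, value = line.partition(": ")
        if not sep:
            # A line such as 'xmpDM:album:' has a key but no value.
            key, value = line.rstrip(":"), ""
        meta[key] = value.strip()
    return meta


def suspicious_fields(meta):
    """Return (key, reason) pairs for values worth a closer look."""
    problems = []
    for key, value in meta.items():
        if value == "":
            problems.append((key, "empty value"))
        elif not value.isascii():
            problems.append((key, "non-ASCII characters"))
        elif any(ord(ch) < 32 for ch in value):
            problems.append((key, "control characters"))
    return problems


# A few fields copied from the metadata dump in this message.
sample = """title: Teenage Baghead
xmpDM:album:
xmpDM:genre: Pop
X-TIKA:digest:MD5: 5f374012180e94778346619515152f74
"""

if __name__ == "__main__":
    for key, reason in suspicious_fields(parse_metadata(sample)):
        print(f"{key}: {reason}")
```

Running the same check over the metadata of a file that ingests cleanly (such as the txt file mentioned above) and one that fails would show whether the two differ in any of these respects.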