I'll give it a go. Thanks.

On Mon, Nov 23, 2015 at 3:44 PM, Chris Mattmann <chris.mattm...@gmail.com>
wrote:

> Doesn’t look weird. Hmm. Can you generate a metadata file
> using TikaCmdLine extractor and then use that metadata file
> to ingest into File Manager by hand? Does that work?
>
> —
> Chris Mattmann
> chris.mattm...@gmail.com
>
>
>
>
>
>
> -----Original Message-----
> From: Tom Barber <tom.bar...@meteorite.bi>
> Reply-To: <dev@oodt.apache.org>
> Date: Monday, November 23, 2015 at 7:43 AM
> To: "dev@oodt.apache.org" <dev@oodt.apache.org>
> Subject: Re: Crawling / Archiving binary data with Solr backend
>
> >Author: Alun Davis - Loudmouth
> >Content-Length: 3273160
> >Content-Type: audio/mpeg
> >X-Parsed-By: org.apache.tika.parser.DefaultParser
> >X-TIKA:digest:MD5: 5f374012180e94778346619515152f74
> >X-TIKA:digest:SHA256: 34d8bf9da8feb848922138eb7807c0d71ed92376422fb28c8cbbffe788574ab0
> >channels: 2
> >creator: Alun Davis - Loudmouth
> >dc:creator: Alun Davis - Loudmouth
> >dc:title: Teenage Baghead
> >meta:author: Alun Davis - Loudmouth
> >resourceName: Teenage Baghead.mp3
> >samplerate: 44100
> >title: Teenage Baghead
> >version: MPEG 3 Layer III Version 1
> >xmpDM:album:
> >xmpDM:artist: Alun Davis - Loudmouth
> >xmpDM:audioChannelType: Stereo
> >xmpDM:audioCompressor: MP3
> >xmpDM:audioSampleRate: 44100
> >xmpDM:duration: 204577.046875
> >xmpDM:genre: Pop
> >xmpDM:logComment: www.maimthattune.com for more!
> >xmpDM:releaseDate: 2001
> >
> >
> >Nothing that should scare a parser in the mp3 at least.
> >
> >On Mon, Nov 23, 2015 at 3:33 PM, Chris Mattmann <chris.mattm...@gmail.com>
> >wrote:
> >
> >> Yeah, check the metadata. Any weird UTF-8 encoding?
> >>
> >> (That is, if you run Tika on the file outside of OODT, what do you see?)
> >>
> >> —
> >> Chris Mattmann
> >> chris.mattm...@gmail.com
> >>
> >>
> >>
> >>
> >>
> >>
> >> -----Original Message-----
> >> From: Tom Barber <tom.bar...@meteorite.bi>
> >> Reply-To: <dev@oodt.apache.org>
> >> Date: Monday, November 23, 2015 at 7:23 AM
> >> To: "dev@oodt.apache.org" <dev@oodt.apache.org>
> >> Subject: Re: Crawling / Archiving binary data with Solr backend
> >>
> >> >./crawler/bin/crawler_launcher     --filemgrUrl http://localhost:9000
> >> >--operation --launchMetCrawler     --clientTransferer
> >> >org.apache.oodt.cas.filemgr.datatransfer.InPlaceDataTransferFactory
> >> >--productPath $OODT_HOME/data/staging     --metExtractor
> >> >org.apache.oodt.cas.metadata.extractors.TikaCmdLineMetExtractor
> >> >--metExtractorConfig
> >> >/home/bugg/Projects/surrey100/oodt/data/met/tika.conf
> >> >
> >> >I'm running that. Which runs fine with the default lucene stuff, also runs
> >> >fine with a txt file, but doesn't run fine over a random picture I took or
> >> >over an mp3 I tested it on.
> >> >
> >> >
> >> >On Mon, Nov 23, 2015 at 3:12 PM, Mattmann, Chris A (3980) <
> >> >chris.a.mattm...@jpl.nasa.gov> wrote:
> >> >
> >> >> Encoding issues with the extracted metadata? What are you getting
> >> >> just running Tika on the files?
> >> >>
> >> >> The actual data shouldn’t matter since it’s not being ingested
> >> >> (are you doing it in place, or what data transferer are you using)?
> >> >>
> >> >> Cheers,
> >> >> Chris
> >> >>
> >> >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >> >> Chris Mattmann, Ph.D.
> >> >> Chief Architect
> >> >> Instrument Software and Science Data Systems Section (398)
> >> >> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> >> >> Office: 168-519, Mailstop: 168-527
> >> >> Email: chris.a.mattm...@nasa.gov
> >> >> WWW:  http://sunset.usc.edu/~mattmann/
> >> >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >> >> Adjunct Associate Professor, Computer Science Department
> >> >> University of Southern California, Los Angeles, CA 90089 USA
> >> >> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> -----Original Message-----
> >> >> From: Tom Barber <tom.bar...@meteorite.bi>
> >> >> Reply-To: "dev@oodt.apache.org" <dev@oodt.apache.org>
> >> >> Date: Monday, November 23, 2015 at 6:36 AM
> >> >> To: "dev@oodt.apache.org" <dev@oodt.apache.org>
> >> >> Subject: Crawling / Archiving binary data with Solr backend
> >> >>
> >> >> >Hello,
> >> >> >
> >> >> >Looks like I've never tried it before with binary data. If I swap the
> >> >> >filemgr defaults to use solr then try and crawl my staging directory using
> >> >> >the Tika extractor I get a lot of
> >> >> >
> >> >> >org.apache.xmlrpc.XmlRpcException: java.lang.Exception:
> >> >> >org.apache.oodt.cas.filemgr.structs.exceptions.CatalogException: Error
> >> >> >ingesting product [org.apache.oodt.cas.filemgr.structs.Product@62b19476] :
> >> >> >null
> >> >> >at org.apache.xmlrpc.XmlRpcClientResponseProcessor.decodeException(XmlRpcClientResponseProcessor.java:104)
> >> >> >at org.apache.xmlrpc.XmlRpcClientResponseProcessor.decodeResponse(XmlRpcClientResponseProcessor.java:71)
> >> >> >at org.apache.xmlrpc.XmlRpcClientWorker.execute(XmlRpcClientWorker.java:73)
> >> >> >
> >> >> >
> >> >> >Type things.
> >> >> >
> >> >> >Any ideas?
> >> >> >
> >> >> >Tom
> >> >>
> >> >>
> >>
> >>
> >>
>
>
>
