Encoding issues with the extracted metadata? What are you getting just running Tika on the files?
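If it helps narrow that down, here is a minimal sketch of the kind of check I have in mind, assuming the problem is metadata values containing characters that XML 1.0 does not allow (only a guess at this point, nothing in the stack trace confirms it). The `find_suspect_metadata` helper and the `sample` dictionary are hypothetical, standing in for whatever Tika actually extracts from your files; nothing here is OODT or Tika API.

```python
import re

# Characters XML 1.0 does not allow: anything outside tab, LF, CR,
# U+0020-U+D7FF, U+E000-U+FFFD and U+10000-U+10FFFF.
_XML_ILLEGAL = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_suspect_metadata(metadata):
    """Return {key: [illegal characters]} for values with XML-illegal characters.

    Hypothetical helper for illustration only.
    """
    suspects = {}
    for key, value in metadata.items():
        bad = _XML_ILLEGAL.findall(str(value))
        if bad:
            suspects[key] = bad
    return suspects


# Hypothetical metadata, standing in for Tika output on a binary file.
sample = {
    "Content-Type": "application/octet-stream",
    "title": "report\x00\x01",
}

print(find_suspect_metadata(sample))
```

Any key that shows up in the result would be a candidate to strip or escape before the product is handed to the catalog.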
The actual data shouldn’t matter since it’s not being ingested (are you doing it in place, or what data transferer are you using)?

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

-----Original Message-----
From: Tom Barber <tom.bar...@meteorite.bi>
Reply-To: "dev@oodt.apache.org" <dev@oodt.apache.org>
Date: Monday, November 23, 2015 at 6:36 AM
To: "dev@oodt.apache.org" <dev@oodt.apache.org>
Subject: Crawling / Archiving binary data with Solr backend

>Hello,
>
>Looks like I've never tried it before with binary data. If I swap the
>filemgr defaults to use solr then try and crawl my staging directory using
>the Tika extractor I get a lot of
>
>org.apache.xmlrpc.XmlRpcException: java.lang.Exception: org.apache.oodt.cas.filemgr.structs.exceptions.CatalogException: Error ingesting product [org.apache.oodt.cas.filemgr.structs.Product@62b19476] : null
>at org.apache.xmlrpc.XmlRpcClientResponseProcessor.decodeException(XmlRpcClientResponseProcessor.java:104)
>at org.apache.xmlrpc.XmlRpcClientResponseProcessor.decodeResponse(XmlRpcClientResponseProcessor.java:71)
>at org.apache.xmlrpc.XmlRpcClientWorker.execute(XmlRpcClientWorker.java:73)
>
>
>Type things.
>
>Any ideas?
>
>Tom