David, does it fail on a particular file, or always, regardless of the input? POI reports an illegal offset, and that is probably the cause of the error.
--
Oleg

On Wed, Dec 12, 2012 at 4:31 PM, David Morana (JIRA) <[email protected]> wrote:

>
>     [ https://issues.apache.org/jira/browse/TIKA-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529984#comment-13529984 ]
>
> David Morana commented on TIKA-1041:
> ------------------------------------
>
> after some research, I upgraded the POI jars to 3.9 ( I was at v3.8 beta)
> but no luck I'm still getting the error above
>
> > Tika 1.2 universalcharset errors
> > --------------------------------
> >
> >                 Key: TIKA-1041
> >                 URL: https://issues.apache.org/jira/browse/TIKA-1041
> >             Project: Tika
> >          Issue Type: Bug
> >    Affects Versions: 1.2
> >         Environment: I'm running solr 4.0 with tika 1.2 on tomcat 7.0.8 with manifoldcf v1.1dev
> >            Reporter: David Morana
> >             Fix For: 1.2, 1.3
> >
> >
> > This is somewhat confusing and frustrating. I successfully crawled Opentext using all of the above. then I recrawled and it aborted almost immediately.
> > It choked on images, so I excluded them for now.
> > but now it's choking on txt files!
> > sometimes I get this error
> > SEVERE: null:java.lang.RuntimeException: java.lang.NoClassDefFoundError: org/mozilla/universalchardet/CharsetListener
> > and sometimes I get this one
> > SEVERE: null:java.lang.RuntimeException: java.lang.NoClassDefFoundError: org/apache/tika/parser/txt/UniversalEncodingListener
>
> --
> This message is automatically generated by JIRA.
> If you think it was sent incorrectly, please contact your JIRA administrators
> For more information on JIRA, see: http://www.atlassian.com/software/jira
>
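Separately from the POI question, the two NoClassDefFoundError messages in the quoted report name classes the JVM could not load at runtime. A quick way to see whether those classes are visible, and from which jar, is a small standalone check. The sketch below is hypothetical: the class name ClasspathCheck is made up, and it only reports what the classloader running it can see, which may differ from the classloader Tomcat gives the Solr webapp.

```java
import java.security.CodeSource;

// Hypothetical diagnostic, not part of Tika or Solr.
public class ClasspathCheck {
    // Classes named in the NoClassDefFoundError messages from the report.
    static final String[] CLASSES = {
        "org.mozilla.universalchardet.CharsetListener",
        "org.apache.tika.parser.txt.UniversalEncodingListener"
    };

    // Tries to load the class without initializing it and reports where it came from.
    static String locate(String name, ClassLoader loader) {
        try {
            Class<?> c = Class.forName(name, false, loader);
            CodeSource src = c.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                return "found (no code source, e.g. a JDK class)";
            }
            return "found in " + src.getLocation();
        } catch (ClassNotFoundException e) {
            // The class itself is not on this classloader's path.
            return "NOT FOUND";
        } catch (LinkageError e) {
            // The class exists, but something it depends on (e.g. an interface) is missing.
            return "found but failed to link: " + e;
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        for (String name : CLASSES) {
            System.out.println(name + ": " + locate(name, loader));
        }
    }
}
```

Compile it and run it with the same jars the Solr webapp uses, for example java -cp "/path/to/solr/WEB-INF/lib/*:." ClasspathCheck (the path is a placeholder). "NOT FOUND" would suggest a missing jar (presumably juniversalchardet for the first class and tika-parsers for the second), while "failed to link" would suggest the class is present but one of its dependencies is not.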
