[ https://issues.apache.org/jira/browse/SOLR-3775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445886#comment-13445886 ]
Jack Krupansky commented on SOLR-3775:
--------------------------------------

Thanks for reporting the issue. Although it is true that the Solr project can't fix Tika/POI issues directly, it is very useful for us to be able to report to Solr/SolrCell users that MS Word 97 may encounter ingestion problems.

Can you confirm whether none of your Word 97 files are being parsed, or is it just some of them?

This may be this POI bug:
https://issues.apache.org/bugzilla/show_bug.cgi?id=53380

Please comment on that bug directly if you feel it does match and indicate its level of importance to you. It does not appear to have seen any activity since it was reported back in June.

> Unexpected RuntimeException
> ---------------------------
>
>                 Key: SOLR-3775
>                 URL: https://issues.apache.org/jira/browse/SOLR-3775
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 4.0-BETA
>            Reporter: Alex C
>            Assignee: Uwe Schindler
>
> Hi. I'm using Solr 4.0 Beta (no modifications to default installation) to
> index, and it's blowing up on Word *.DOC files:
> {code}curl "http://localhost:8983/solr/update/extract?literal.id=doc15&commit=true" -F "myfile=@15.doc"{code}
> Here's the exception. And the same files go through Solr 3.6.1 just fine.
> {noformat}
> <?xml version="1.0" encoding="UTF-8"?>
> <response>
> <lst name="responseHeader"><int name="status">500</int><int name="QTime">18</int></lst><lst name="error"><str name="msg">org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@328c62ce</str><str name="trace">org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@328c62ce
>         at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:230)
>         at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>         at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:240)
>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1656)
>         at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:454)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:275)
>         at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
>         at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
>         at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
>         at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
>         at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
>         at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
>         at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
>         at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
>         at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
>         at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
>         at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
>         at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
>         at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
>         at org.eclipse.jetty.server.Server.handle(Server.java:351)
>         at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
>         at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
>         at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:890)
>         at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:944)
>         at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:642)
>         at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:230)
>         at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66)
>         at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254)
>         at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599)
>         at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534)
>         at java.lang.Thread.run(Unknown Source)
> Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@328c62ce
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>         at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>         at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:224)
>         ... 31 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 7
>         at org.apache.poi.util.LittleEndian.getInt(LittleEndian.java:163)
>         at org.apache.poi.hwpf.model.Colorref.<init>(Colorref.java:81)
>         at org.apache.poi.hwpf.model.types.SHDAbstractType.fillFields(SHDAbstractType.java:56)
>         at org.apache.poi.hwpf.usermodel.ShadingDescriptor.<init>(ShadingDescriptor.java:38)
>         at org.apache.poi.hwpf.sprm.CharacterSprmUncompressor.unCompressCHPOperation(CharacterSprmUncompressor.java:582)
>         at org.apache.poi.hwpf.sprm.CharacterSprmUncompressor.uncompressCHP(CharacterSprmUncompressor.java:65)
>         at org.apache.poi.hwpf.model.StyleSheet.createChp(StyleSheet.java:288)
>         at org.apache.poi.hwpf.model.StyleSheet.<init>(StyleSheet.java:121)
>         at org.apache.poi.hwpf.HWPFDocument.<init>(HWPFDocument.java:346)
>         at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:77)
>         at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:185)
>         at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:160)
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
>         ... 34 more
> </str><int name="code">500</int></lst>
> </response>{noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
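
When checking whether all or only some Word 97 files fail, it can help to reduce each error response to its status and innermost cause, so that failures with the same root cause are easy to group. The following is a minimal sketch in Python, not part of the original report. It assumes the response has the XML shape shown in the trace above; the function name `summarize_solr_error` and the `SAMPLE` string are hypothetical, and `SAMPLE` is an abbreviated stand-in for a real response.

```python
import re
import xml.etree.ElementTree as ET


def summarize_solr_error(xml_text):
    """Return (status, message, root_cause) from a Solr XML error response.

    Assumes the layout seen in this report: a 'responseHeader' list holding
    an int 'status', and an 'error' list holding str 'msg' and str 'trace'.
    Missing pieces come back as None.
    """
    root = ET.fromstring(xml_text.strip())

    status_el = root.find("./lst[@name='responseHeader']/int[@name='status']")
    status = int(status_el.text) if status_el is not None else None

    msg_el = root.find("./lst[@name='error']/str[@name='msg']")
    message = msg_el.text.strip() if msg_el is not None and msg_el.text else None

    # The last "Caused by:" line in a Java stack trace is the innermost cause.
    root_cause = None
    trace_el = root.find("./lst[@name='error']/str[@name='trace']")
    if trace_el is not None and trace_el.text:
        causes = re.findall(r"^\s*Caused by:\s*(.+)$", trace_el.text,
                            flags=re.MULTILINE)
        if causes:
            root_cause = causes[-1].strip()

    return status, message, root_cause


# Abbreviated, hand-written stand-in for the response quoted in this issue.
SAMPLE = """<response>
<lst name="responseHeader"><int name="status">500</int><int name="QTime">18</int></lst>
<lst name="error"><str name="msg">org.apache.tika.exception.TikaException: Unexpected RuntimeException</str>
<str name="trace">org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException
Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException
Caused by: java.lang.ArrayIndexOutOfBoundsException: 7
    at org.apache.poi.util.LittleEndian.getInt(LittleEndian.java:163)
</str><int name="code">500</int></lst>
</response>"""

status, message, root_cause = summarize_solr_error(SAMPLE)
print(status, root_cause)
```

Saving the response for each file and running it through a helper like this would show whether every failing document ends in the same `ArrayIndexOutOfBoundsException` from POI or whether several distinct problems are involved.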