[
https://issues.apache.org/jira/browse/SOLR-11622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16291483#comment-16291483
]
Tim Allison commented on SOLR-11622:
------------------------------------
[[email protected]], if it is ok with you and if I have time, I'll
try to submit a PR on SOLR-11701. If I don't have time, it will be all yours
after you return. :) Sound good...or do you want the glory?
For the last integration test I did, I put [these
documents|https://github.com/apache/tika/tree/master/tika-parsers/src/test/resources/test-documents]
in a directory and ran tika-app.jar against them. I then ran tika-eval.jar
and counted the number of files without exceptions to get a ground truth count
of how many files I'd expect to be in Solr.
I then used DIH to import the same directory, with skip on error, and made sure
there were the same # of documents in Solr. This uncovered several problems,
which we'll fix in this issue or SOLR-11701.
> Bundled mime4j library not sufficient for Tika requirement
> ----------------------------------------------------------
>
> Key: SOLR-11622
> URL: https://issues.apache.org/jira/browse/SOLR-11622
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: Build
> Affects Versions: 7.1, 6.6.2
> Reporter: Karim Malhas
> Assignee: Karthik Ramachandran
> Priority: Minor
> Labels: build
> Attachments: SOLR-11622.patch, SOLR-11622.patch
>
>
> The version 7.2 of Apache James Mime4j bundled with the Solr binary releases
> does not match what is required by Apache Tika for parsing rfc2822 messages.
> The master branch for james-mime4j seems to contain the missing Builder class
> [https://github.com/apache/james-mime4j/blob/master/core/src/main/java/org/apache/james/mime4j/stream/MimeConfig.java
> ]
> This prevents import of rfc2822 formatted messages. For example like so:
> {{./bin/post -c dovecot -type 'message/rfc822' 'testdata/email_01.txt'
> }}
> And results in the following stacktrace:
> java.lang.NoClassDefFoundError:
> org/apache/james/mime4j/stream/MimeConfig$Builder
> at
> org.apache.tika.parser.mail.RFC822Parser.parse(RFC822Parser.java:63)
> at
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> at
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> at
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)
> at
> org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:228)
> at
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2477)
> at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)
> at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:529)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:361)
> at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:305)
> at
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
> at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
> at
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
> at
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
> at
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
> at
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
> at
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
> at
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
> at
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
> at
> org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
> at
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
> at org.eclipse.jetty.server.Server.handle(Server.java:534)
> at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
> at
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
> at
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
> at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
> at
> org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
> at
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
> at
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
> at
> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
> at
> org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
> at java.lang.Thread.run(Thread.java:748)
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]