Canceling due to problems with Solr connector.

Karl
On Fri, Sep 21, 2018 at 9:35 AM Julien Massiera <[email protected]> wrote:

> Hi Karl,
>
> I understand that the piece of code involved is exactly the same as the
> one in the SolrJ API, which is the "reference" way of coding.
>
> Let me explain the steps of my tests again:
>
> 1) I configured a job to crawl a winshare repository containing 3 files
> and ingest them into a Solr 7.4.0 instance.
>
> 2) The job ran and ended with a 'Done' status, and the number of
> processed documents was correct.
>
> 3) I checked the number of documents in my Solr instance and noticed
> that it was 0.
>
> 4) I checked the Simple history of MCF and found the following error for
> each of my 3 documents:
>
> 09-21-2018 11:49:09.362 document ingest (Solr)
> file://///localhost/OCR/subfolder/test_file.txt
> 400 61 118749 Error from server at
> http://localhost:8983/solr/FileShare: missing content stream
>
> 5) I then checked the Solr logs and found the following error for
> each document ingestion:
>
> ERROR 2018-09-21T11:51:04,100 (qtp952486988-21) -
> Solr|Solr|solr.handler.RequestHandlerBase|[c:FileShare s:shard1
> r:core_node2 x:FileShare_shard1_replica_n1] o.a.s.h.RequestHandlerBase
> org.apache.solr.common.SolrException: missing content stream
>     at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:63)
>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:2539)
>     at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:709)
>     at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:515)
>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:377)
>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:323)
>     at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1634)
>     at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
>     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
>     at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
>     at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>     at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
>     at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)
>     at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
>     at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1253)
>     at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
>     at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
>     at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)
>     at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
>     at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1155)
>     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
>     at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)
>     at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
>     at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>     at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
>     at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
>     at org.eclipse.jetty.server.Server.handle(Server.java:531)
>     at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352)
>     at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)
>     at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281)
>     at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)
>     at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)
>     at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
>     at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
>     at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
>     at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
>     at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
>     at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:760)
>     at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:678)
>     at java.lang.Thread.run(Thread.java:748)
>
> 6) I ran a new crawl to debug the code and found that after the
> following lines (in
> org.apache.manifoldcf.agents.output.solr.ModifiedHttpSolrClient:108):
>
>     SolrParams params = request.getParams();
>     RequestWriter.ContentWriter contentWriter =
>         requestWriter.getContentWriter(request);
>     Collection<ContentStream> streams = contentWriter == null ?
>         requestWriter.getContentStreams(request) : null;
>
> the 'streams' object is null.
>
> So I checked the value of the contentWriter object and found that
> it was not null. That explains why the conditional expression assigned
> null to the 'streams' object instead of the result of
> requestWriter.getContentStreams(request), which, when I checked it,
> correctly returns a ContentStream collection containing the input
> stream of the incoming file.
>
> In conclusion, I am as confused as you are and, knowing that you used
> the same piece of code as the SolrJ API, I am wondering whether we
> should ask them for an explanation.
>
> Julien
>
> On 21/09/2018 15:04, Karl Wright wrote:
> > Hi Julien,
> >
> > I verified that the integration test in question confirms the following:
> > (a) that the right number of documents were processed, and (b) that there
> > were no errors reported during the processing. So unless the failure is
> > indeed a silent one, and documents are simply not getting transmitted to
> > Solr at all, that test should be valid.
> >
> > Can you describe the actual failure that you are seeing, please?
> >
> > Karl
> >
> >
> > On Fri, Sep 21, 2018 at 8:52 AM Karl Wright <[email protected]> wrote:
> >
> >> Julien,
> >>
> >> Integration tests do cover indexing via SolrJ, and they do succeed.
> >> (That's how I found the deletion bug, FWIW.) I therefore need more
> >> information about the specific failure symptom you are seeing before
> >> I'll withdraw the candidate. If it's a silent failure, that's one
> >> thing, but if you are seeing a ManifoldCF exception then something is
> >> different between your setup and mine.
> >>
> >> Karl
> >>
> >>
> >> On Fri, Sep 21, 2018 at 8:09 AM Julien Massiera <[email protected]> wrote:
> >>
> >>> -1 ref : https://issues.apache.org/jira/browse/CONNECTORS-1533
> >>>
> >>> Julien
> >>>
> >>>
> >>> On 20/09/2018 10:38, Karl Wright wrote:
> >>>> All tests pass, artifacts look good.
> >>>>
> >>>> +1 from me.
> >>>>
> >>>> Karl
> >>>>
> >>>>
> >>>> On Wed, Sep 19, 2018 at 9:57 PM Karl Wright <[email protected]> wrote:
> >>>>
> >>>>> Please vote on whether to release ManifoldCF 2.11, RC1. This release
> >>>>> contains a number of fixes/improvements/additions, described in the
> >>>>> CHANGES.txt file. In addition, it includes Tika 1.19, which has a
> >>>>> number of fixes for classpath issues specifically requested by
> >>>>> ManifoldCF.
> >>>>>
> >>>>> This fixes a SolrJ related problem with the Solr Connector found in
> >>>>> RC1. All tests pass.
> >>>>>
> >>>>> The release artifact can be found at:
> >>>>>
> >>>>> https://dist.apache.org/repos/dist/dev/manifoldcf/apache-manifoldcf-2.11
> >>>>>
> >>>>> There is also a tag at:
> >>>>>
> >>>>> https://svn.apache.org/repos/asf/manifoldcf/tags/release-2.11-RC1
> >>>>>
> >>>>> Thanks again,
> >>>>> Karl Wright
> >>>>>
> >>>>>
> >>> --
> >>> Julien MASSIERA
> >>> Head of Product Development
> >>> France Labs – The Search Experts
> >>> Meet us at the Enterprise Search & Discovery Summit in Washington DC
> >>> www.francelabs.com
> >>>
> >>>
>
> --
> Julien MASSIERA
> Head of Product Development
> France Labs – The Search Experts
> Meet us at the Enterprise Search & Discovery Summit in Washington DC
> www.francelabs.com
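The selection logic Julien describes in step 6 can be sketched in isolation. This is only an illustration of the conditional expression quoted in his message, not the ManifoldCF or SolrJ code: the class ContentSelectionSketch, the BodyWriter interface, and the selectStreams and buildBody methods are hypothetical stand-ins, and byte arrays stand in for ContentStream objects. It shows that when a writer is present the streams reference is null by construction, so whatever sends the request afterwards has to take the body from the writer; code that looks only at the streams would find nothing to send.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collection;
import java.util.Collections;

public class ContentSelectionSketch {

    // Hypothetical stand-in for a writer that emits the request body itself.
    public interface BodyWriter {
        void write(ByteArrayOutputStream out);
    }

    // Same shape as the quoted conditional expression:
    // the streams are used only when no writer is available.
    public static Collection<byte[]> selectStreams(BodyWriter writer,
                                                   Collection<byte[]> available) {
        return writer == null ? available : null;
    }

    // Builds the body from whichever source is present.
    // Returns an empty array when both the writer and the streams are absent.
    public static byte[] buildBody(BodyWriter writer, Collection<byte[]> streams) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (writer != null) {
            writer.write(out);
        } else if (streams != null) {
            for (byte[] s : streams) {
                out.write(s, 0, s.length);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] doc = "hello".getBytes(StandardCharsets.UTF_8);
        Collection<byte[]> available = Collections.singletonList(doc);
        BodyWriter writer = out -> out.write(doc, 0, doc.length);

        // With a writer present, the streams reference is null.
        Collection<byte[]> streams = selectStreams(writer, available);
        System.out.println("streams is null: " + (streams == null));

        // The body is still complete if the writer is consulted.
        System.out.println("body via writer: " + buildBody(writer, streams).length + " bytes");

        // Ignoring the writer and using only the null streams yields an empty body.
        System.out.println("body via streams only: " + buildBody(null, streams).length + " bytes");
    }
}
```

Whether the connector actually ignores the writer after this point is not established in this thread; the sketch only shows why a null 'streams' value is expected whenever contentWriter is non-null.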
