Hi Karl,

I understand that the piece of code involved is exactly the same as the one in the SolrJ API, which is the "reference" way of coding.

Let me explain again the different steps of my tests:

1) I configured a job to crawl a winshare repository containing 3 files and ingest them into a Solr 7.4.0 instance

2) The job ran and ended with a 'Done' status and the number of processed documents was correct.

3) I checked the number of documents of my Solr instance and noticed that it was 0

4) I checked the Simple history of MCF and found the following error for each of my 3 documents:

09-21-2018 11:49:09.362 document ingest (Solr) file://///localhost/OCR/subfolder/test_file.txt 400 61 118749 Error from server at http://localhost:8983/solr/FileShare: missing content stream


5) I then checked the logs of Solr and found the following error for each of the document ingestions:

ERROR 2018-09-21T11:51:04,100 (qtp952486988-21) - Solr|Solr|solr.handler.RequestHandlerBase|[c:FileShare s:shard1 r:core_node2 x:FileShare_shard1_replica_n1] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: missing content stream
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:63)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:2539)
    at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:709)
    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:515)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:377)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:323)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1634)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1253)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1155)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    at org.eclipse.jetty.server.Server.handle(Server.java:531)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)
    at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
    at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:760)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:678)
    at java.lang.Thread.run(Thread.java:748)

6) I did a new crawl to debug the code and found that after the following lines (in org.apache.manifoldcf.agents.output.solr.ModifiedHttpSolrClient:108):
    SolrParams params = request.getParams();
    RequestWriter.ContentWriter contentWriter = requestWriter.getContentWriter(request);
    Collection<ContentStream> streams = contentWriter == null ? requestWriter.getContentStreams(request) : null;

    the 'streams' object is null

    So I checked the value of the contentWriter object and found that it was not null. That explains why the conditional expression assigned null to the 'streams' object instead of the result of requestWriter.getContentStreams(request), which, when I checked it, correctly returns a ContentStream collection containing the input stream of the incoming file.
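
    To make the behaviour in step 6 easier to reproduce without a debugger, here is a minimal sketch of the same conditional expression. The ContentWriter and Writer interfaces and the fakeWriter helper are hypothetical stand-ins that I wrote for illustration; they are not the real SolrJ or ManifoldCF classes, and strings stand in for ContentStream objects.

```java
import java.util.Collection;
import java.util.Collections;

public class StreamSelectionSketch {

    /** Hypothetical stand-in for RequestWriter.ContentWriter. */
    interface ContentWriter {}

    /** Hypothetical stand-in for the two RequestWriter calls involved. */
    interface Writer {
        ContentWriter getContentWriter(String request);
        Collection<String> getContentStreams(String request);
    }

    /** Same conditional expression as in the lines quoted in step 6. */
    static Collection<String> selectStreams(Writer writer, String request) {
        ContentWriter contentWriter = writer.getContentWriter(request);
        return contentWriter == null ? writer.getContentStreams(request) : null;
    }

    /** Fake writer that always has streams and, optionally, a content writer. */
    static Writer fakeWriter(final boolean hasContentWriter) {
        return new Writer() {
            @Override
            public ContentWriter getContentWriter(String request) {
                return hasContentWriter ? new ContentWriter() {} : null;
            }

            @Override
            public Collection<String> getContentStreams(String request) {
                return Collections.singletonList("stream for " + request);
            }
        };
    }

    public static void main(String[] args) {
        // Content writer present: the streams are discarded (the case I observed).
        System.out.println(selectStreams(fakeWriter(true), "test_file.txt"));
        // Content writer absent: the streams are used.
        System.out.println(selectStreams(fakeWriter(false), "test_file.txt"));
    }
}
```

    With a non-null content writer the first call prints null even though getContentStreams would have returned a usable collection, which matches what I saw in the debugger. Whether the real code then goes on to use the content writer correctly is exactly the open question.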


In conclusion, I am as confused as you are and, knowing that you used the same piece of code as the SolrJ API, I am wondering whether we should ask them for an explanation?

Julien

On 21/09/2018 15:04, Karl Wright wrote:
Hi Julien,

I verified that the integration test in question confirms the following:
(a) that the right number of documents were processed, and that (b) there
were no errors reported during the processing.  So unless the failure is
indeed a silent one, and documents are simply not getting transmitted to
Solr at all, that test should be valid.

Can you describe the actual failure that you are seeing please?

Karl


On Fri, Sep 21, 2018 at 8:52 AM Karl Wright <[email protected]> wrote:

Julien,

Integration tests do cover indexing via SolrJ, and they do succeed.
(That's how I found the deletion bug FWIW).  I therefore need more
information about the specific failure symptom you are seeing before I'll
withdraw the candidate.  If it's a silent failure that's one thing but if
you are seeing a ManifoldCF exception then something is different
between your setup and mine.

Karl


On Fri, Sep 21, 2018 at 8:09 AM Julien Massiera <
[email protected]> wrote:

-1 ref : https://issues.apache.org/jira/browse/CONNECTORS-1533

Julien


On 20/09/2018 10:38, Karl Wright wrote:
All tests pass, artifacts look good.

+1 from me.

Karl


On Wed, Sep 19, 2018 at 9:57 PM Karl Wright <[email protected]> wrote:

Please vote on whether to release ManifoldCF 2.11, RC1.  This release
contains a number of fixes/improvements/additions, described in the
CHANGES.txt file.  In addition, it includes Tika 1.19, which has a number
of fixes for classpath issues specifically requested by ManifoldCF.

This fixes a SolrJ related problem with the Solr Connector found in RC1.
All tests pass.

The release artifact can be found at:

https://dist.apache.org/repos/dist/dev/manifoldcf/apache-manifoldcf-2.11

There is also a tag at:

https://svn.apache.org/repos/asf/manifoldcf/tags/release-2.11-RC1

Thanks again,
Karl Wright


--
Julien MASSIERA
Director of Product Development
France Labs – The Search experts
Join us at the Enterprise Search & Discovery Summit in Washington DC
www.francelabs.com


