Hi Karl,
I understand that the piece of code involved is exactly the same as the
one in the SolrJ API, which is the "reference" way of coding.
Let me go through the steps of my test again:
1) I configured a job to crawl a winshare repository containing 3 files
and ingest them into a Solr 7.4.0 instance.
2) The job ran and ended with a 'Done' status, and the number of
processed documents was correct.
3) I checked the number of documents in my Solr instance and noticed
that it was 0.
4) I checked the Simple history of MCF and found the following error for
each of my 3 documents:
09-21-2018 11:49:09.362 document ingest (Solr) file://///localhost/OCR/subfolder/test_file.txt 400 61 118749 Error from server at http://localhost:8983/solr/FileShare: missing content stream
5) I then checked the Solr logs and found the following error for
each document ingestion:
ERROR 2018-09-21T11:51:04,100 (qtp952486988-21) - Solr|Solr|solr.handler.RequestHandlerBase|[c:FileShare s:shard1 r:core_node2 x:FileShare_shard1_replica_n1] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: missing content stream
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:63)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:2539)
    at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:709)
    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:515)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:377)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:323)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1634)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1253)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1155)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    at org.eclipse.jetty.server.Server.handle(Server.java:531)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)
    at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
    at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:760)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:678)
    at java.lang.Thread.run(Thread.java:748)
6) I ran a new crawl to debug the code and found that after the
following lines (in
org.apache.manifoldcf.agents.output.solr.ModifiedHttpSolrClient:108):
    SolrParams params = request.getParams();
    RequestWriter.ContentWriter contentWriter = requestWriter.getContentWriter(request);
    Collection<ContentStream> streams = contentWriter == null ? requestWriter.getContentStreams(request) : null;
the 'streams' object is null.
So I checked the value of the contentWriter object and found that
it was not null. That explains why the ternary expression assigned null
to the 'streams' object instead of the result of
requestWriter.getContentStreams(request), which, as I verified, correctly
returns a ContentStream collection containing the input stream of the
incoming file.
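To make that branch easier to discuss, here is a minimal standalone sketch of the selection logic. It does not use SolrJ: ContentWriter, ContentStream, BodySource, selectStreams and resolve are hypothetical stand-ins introduced only for illustration. It shows that when a content writer exists, 'streams' is null by design, so a client would have to take the request body from the writer. Whether ModifiedHttpSolrClient actually skips that step is an assumption I have not confirmed.

```java
import java.util.Collection;
import java.util.Collections;

public class BodySourceSketch {
    // Hypothetical stand-ins for SolrJ's RequestWriter.ContentWriter and ContentStream.
    public interface ContentWriter { String contentType(); }
    public interface ContentStream { String name(); }

    // Where a client would take the request body from.
    public enum BodySource { CONTENT_WRITER, CONTENT_STREAMS, NONE }

    // Same shape as the ternary in the quoted lines:
    // streams are only looked up when there is no content writer.
    public static Collection<ContentStream> selectStreams(
            ContentWriter writer, Collection<ContentStream> available) {
        return writer == null ? available : null;
    }

    // A client covering both branches: writer first, then streams.
    // If the writer branch were missing, a non-null writer would leave
    // the request without any body at all.
    public static BodySource resolve(
            ContentWriter writer, Collection<ContentStream> available) {
        Collection<ContentStream> streams = selectStreams(writer, available);
        if (writer != null) {
            return BodySource.CONTENT_WRITER;
        }
        if (streams != null && !streams.isEmpty()) {
            return BodySource.CONTENT_STREAMS;
        }
        return BodySource.NONE;
    }

    public static void main(String[] args) {
        ContentStream file = () -> "test_file.txt";
        ContentWriter writer = () -> "application/octet-stream";
        Collection<ContentStream> available = Collections.singletonList(file);

        // Writer present: streams is null, the body must come from the writer.
        System.out.println(selectStreams(writer, available));
        System.out.println(resolve(writer, available));
        // No writer: the streams are used instead.
        System.out.println(resolve(null, available));
    }
}
```

With a writer present, selectStreams returns null and resolve reports CONTENT_WRITER; with no writer, resolve reports CONTENT_STREAMS. A request that ends up in the NONE case would carry no body, which would be consistent with the 'missing content stream' error, although that link remains a guess on my side.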
In conclusion, I am as confused as you are and, knowing that you used the
same piece of code as the SolrJ API, I am wondering whether we should
ask the Solr developers for an explanation.
Julien
On 21/09/2018 15:04, Karl Wright wrote:
Hi Julien,
I verified that the integration test in question confirms the following:
(a) that the right number of documents were processed, and that (b) there
were no errors reported during the processing. So unless the failure is
indeed a silent one, and documents are simply not getting transmitted to
Solr at all, that test should be valid.
Can you describe the actual failure that you are seeing please?
Karl
On Fri, Sep 21, 2018 at 8:52 AM Karl Wright <[email protected]> wrote:
Julien,
Integration tests do cover indexing via SolrJ, and they do succeed.
(That's how I found the deletion bug FWIW). I therefore need more
information about the specific failure symptom you are seeing before I'll
withdraw the candidate. If it's a silent failure that's one thing but if
you are seeing a ManifoldCF exception then something is different
between your setup and mine.
Karl
On Fri, Sep 21, 2018 at 8:09 AM Julien Massiera <
[email protected]> wrote:
-1 ref : https://issues.apache.org/jira/browse/CONNECTORS-1533
Julien
On 20/09/2018 10:38, Karl Wright wrote:
All tests pass, artifacts look good.
+1 from me.
Karl
On Wed, Sep 19, 2018 at 9:57 PM Karl Wright <[email protected]> wrote:
Please vote on whether to release ManifoldCF 2.11, RC1. This release
contains a number of fixes/improvements/additions, described in the
CHANGES.txt file. In addition, it includes Tika 1.19, which has a number
of fixes for classpath issues specifically requested by ManifoldCF.
This fixes a SolrJ related problem with the Solr Connector found in
RC1.
All tests pass.
The release artifact can be found at:
https://dist.apache.org/repos/dist/dev/manifoldcf/apache-manifoldcf-2.11
There is also a tag at:
https://svn.apache.org/repos/asf/manifoldcf/tags/release-2.11-RC1
Thanks again,
Karl Wright
--
Julien MASSIERA
Product Development Director
France Labs – The Search Experts
Meet us at the Enterprise Search & Discovery Summit in Washington DC
www.francelabs.com