[
https://issues.apache.org/jira/browse/SOLR-12659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16577630#comment-16577630
]
Shawn Heisey commented on SOLR-12659:
-------------------------------------
I don't think this problem can be fixed without a low-level change to how Solr
handles requests, and a corresponding change to how SolrJ puts the request
together.
Normally when a "URI Too Long" error is encountered, the fix is to change the
request from GET to POST, so that the parameters are moved into the request
body instead of being part of the URL. But here, the request is *already* a
POST. SolrJ uses POST requests for updates, and in this particular case, the
body is being used to transfer binary data.
The only way I can imagine this getting fixed is to change the
ExtractingRequestHandler (ERH) so that it can handle a multi-part POST -- where
one part is the binary data and another part is the parameters. SolrJ would
then need an adjustment to create these parts separately. I do not know whether
this capability already exists, but I suspect it does not.
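To make the idea concrete, here is a minimal sketch of what such a multi-part
POST body could look like, built with only the JDK (Java 11+): one part per
parameter, plus one binary part. The boundary, part names, and layout are
illustrative assumptions -- this is not a wire format Solr's ERH accepts today.
{code:java}
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;

public class MultipartSketch {
    // Builds a multipart/form-data body: one form-data part per parameter,
    // followed by a single binary part. Part names are hypothetical.
    static byte[] buildBody(String boundary, Map<String, String> params, byte[] binary) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Map.Entry<String, String> e : params.entrySet()) {
            out.writeBytes(("--" + boundary + "\r\n"
                    + "Content-Disposition: form-data; name=\"" + e.getKey() + "\"\r\n\r\n"
                    + e.getValue() + "\r\n").getBytes(StandardCharsets.UTF_8));
        }
        out.writeBytes(("--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"stream\"; filename=\"doc.bin\"\r\n"
                + "Content-Type: application/octet-stream\r\n\r\n").getBytes(StandardCharsets.UTF_8));
        out.writeBytes(binary);
        out.writeBytes(("\r\n--" + boundary + "--\r\n").getBytes(StandardCharsets.UTF_8));
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Parameters travel inside the body, so none of them lengthen the URL.
        byte[] body = buildBody("XyZ", Map.of("literal.id", "doc-1"), new byte[] {1, 2, 3});
        System.out.println(new String(body, StandardCharsets.UTF_8));
    }
}
{code}
With a layout like this, arbitrarily large parameter values never touch the
request line, which is exactly what the 414 error is about.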
Side note: You are aware that we strongly recommend NOT using the Extracting
Request Handler in production, I hope? It can cause Solr to crash even if you
do everything right. The Tika software which extracts data from rich documents
should be run in a completely separate program, so that if it crashes, Solr
doesn't go down. This would indirectly fix the problem described here too --
because the input document(s) would be fully formed and wouldn't need
parameters like literal.XXXX to populate the data.
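To illustrate that side note: once extraction runs in a separate process, the
client can send a fully formed document through the regular update path, where
SolrJ already puts field values in the POST body. The JDK-only sketch below
builds a minimal JSON update document; the field names and the idea of a
pre-extracted content string are assumptions for illustration, and the Tika
call itself is left as a comment.
{code:java}
import java.util.Map;

public class ExternalExtractSketch {
    // In the recommended setup, this text would come from a separate Tika
    // process, e.g. new org.apache.tika.Tika().parseToString(file),
    // so a Tika crash cannot take Solr down with it.
    // Builds a minimal JSON body for Solr's /update handler.
    // Field names ("id", "content") are illustrative; escaping is simplistic.
    static String buildUpdateDoc(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("[{");
        boolean first = true;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first) sb.append(',');
            first = false;
            sb.append('"').append(e.getKey()).append("\":\"")
              .append(e.getValue().replace("\\", "\\\\").replace("\"", "\\\"")).append('"');
        }
        return sb.append("}]").toString();
    }

    public static void main(String[] args) {
        // Every value is a document field in the body -- no literal.* URL
        // parameters are needed, so the 414 cannot occur.
        String doc = buildUpdateDoc(Map.of("id", "doc-1", "content", "extracted text"));
        System.out.println(doc);
    }
}
{code}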
> HttpSolrClient.createMethod does not handle both stream and large data
> ----------------------------------------------------------------------
>
> Key: SOLR-12659
> URL: https://issues.apache.org/jira/browse/SOLR-12659
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: SolrJ
> Affects Versions: 6.6.2, 7.4
> Reporter: Eirik Lygre
> Priority: Major
>
> When using a ContentStreamUpdateRequest with stream data (through
> addContentStream()), all other parameters are passed on the URL, leading to
> the server failing with "URI is too large".
> The code below provokes the error using SolrJ 7.4, but the problem was first
> seen on Solr 6.6.2. It lies in HttpSolrClient.createMethod(), where the
> presence of stream data causes all other fields to be put on the URL.
> h2. Example code
> {code:java}
> import org.apache.commons.lang3.StringUtils;
> import org.apache.solr.client.solrj.SolrClient;
> import org.apache.solr.client.solrj.impl.HttpSolrClient;
> import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;
> import org.apache.solr.common.util.ContentStreamBase;
>
> String stringValue = StringUtils.repeat('X', 16 * 1024);
> SolrClient solr = new HttpSolrClient.Builder(BASE_URL).build();
> ContentStreamUpdateRequest updateRequest = new ContentStreamUpdateRequest("/update/extract");
> updateRequest.setParam("literal.id", "UriTooLargeTest-simpleTest");
> updateRequest.setParam("literal.field", stringValue);
> updateRequest.addContentStream(new ContentStreamBase.StringStream(stringValue));
> updateRequest.process(solr);
> {code}
> h2. The client sees the following error:
> {code:java}
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
> from server at http://server/solr/core: Expected mime type
> application/octet-stream but got text/html. <h1>Bad Message
> 414</h1><pre>reason: URI Too Long</pre>
> {code}
> h2. Error fragment from HttpSolrClient.createMethod
> {code:java}
> if (contentWriter != null) {
>   String fullQueryUrl = url + wparams.toQueryString();
>   HttpEntityEnclosingRequestBase postOrPut = SolrRequest.METHOD.POST == request.getMethod()
>       ? new HttpPost(fullQueryUrl)
>       : new HttpPut(fullQueryUrl);
> {code}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)