[ https://issues.apache.org/jira/browse/SOLR-12659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16577630#comment-16577630 ]

Shawn Heisey commented on SOLR-12659:
-------------------------------------

I don't think this problem can be fixed without a low-level change to how Solr 
handles requests, and a corresponding change to how SolrJ puts the request 
together.

Normally when a "URI Too Long" error is encountered, the fix is to change the 
request from GET to POST, so that the parameters are moved into the request 
body instead of being part of the URL.  But here, the request is *already* a 
POST.  SolrJ uses POST requests for updates, and in this particular case, the 
body is being used to transfer binary data.
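
To make the failure mode concrete, here is a stdlib-only sketch (nothing Solr-specific; the 8192-byte figure is Jetty's default request header budget, and the parameter values mirror the reporter's example):

{code:java}
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class UriLengthDemo {
    public static void main(String[] args) {
        // A 16 KiB field value, as in the reporter's example code.
        String stringValue = "X".repeat(16 * 1024);

        // When stream data occupies the POST body, SolrJ appends every
        // other parameter to the URL as a query string.
        String queryString = "?literal.id="
                + URLEncoder.encode("UriTooLargeTest-simpleTest", StandardCharsets.UTF_8)
                + "&literal.field="
                + URLEncoder.encode(stringValue, StandardCharsets.UTF_8);

        // Jetty's default request line/header budget is 8 KiB, so a URL
        // carrying this query string is rejected with "414 URI Too Long".
        System.out.println("query string bytes: " + queryString.length());
        System.out.println("over 8 KiB limit:   " + (queryString.length() > 8192));
    }
}
{code}

Moving these parameters into the body is what a normal GET-to-POST switch buys you; here the body is already spoken for.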

The only way I can imagine this getting fixed is to change the Extracting 
Request Handler so that it can handle a multipart POST -- where one part is the 
binary data and one part is the parameters.  SolrJ would then need an 
adjustment to create these parts separately.  I do not know whether this 
capability is already present, but I suspect it is not.
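
Roughly the request shape I have in mind, as a stdlib-only sketch -- the part names and boundary are made up for illustration, not an existing Solr or SolrJ API:

{code:java}
import java.nio.charset.StandardCharsets;

public class MultipartSketch {
    public static void main(String[] args) {
        String boundary = "----solrj-part-boundary";          // arbitrary boundary
        byte[] binaryPayload = {0x01, 0x02, 0x03};            // stands in for the content stream

        StringBuilder body = new StringBuilder();
        // Part 1: the request parameters, kept out of the URL entirely.
        body.append("--").append(boundary).append("\r\n")
            .append("Content-Disposition: form-data; name=\"params\"\r\n\r\n")
            .append("literal.id=UriTooLargeTest-simpleTest&literal.field=XXXX\r\n");
        // Part 2: the binary document data.
        body.append("--").append(boundary).append("\r\n")
            .append("Content-Disposition: form-data; name=\"stream\"\r\n")
            .append("Content-Type: application/octet-stream\r\n\r\n")
            .append(new String(binaryPayload, StandardCharsets.ISO_8859_1)).append("\r\n");
        // Closing boundary.
        body.append("--").append(boundary).append("--\r\n");

        System.out.println(body);
    }
}
{code}

With parameters in their own part, the URL stays short no matter how large the literal.* values get.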

Side note:  You are aware that we strongly recommend NOT using the Extracting 
Request Handler in production, I hope?  It can cause Solr to crash even if you 
do everything right.  The Tika software that extracts data from rich documents 
should be run as a completely separate program, so that if it crashes, Solr 
doesn't go down.  This would indirectly fix the problem described here too -- 
because the input documents would be fully formed and wouldn't need parameters 
like literal.XXXX to populate the data.
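
What I mean by "fully formed", as a stdlib-only sketch -- the extracted text here is a stand-in for real Tika output running in its own process, and the field names are just the reporter's:

{code:java}
public class PreExtractedUpdate {
    public static void main(String[] args) {
        // Pretend this came from Tika running as a separate program.
        String extractedText = "text pulled out of the rich document";

        // With extraction done elsewhere, the document is fully formed and
        // can be sent as an ordinary /update body -- no literal.* parameters.
        String json = "[{"
                + "\"id\":\"UriTooLargeTest-simpleTest\","
                + "\"field\":\"" + extractedText + "\""
                + "}]";

        System.out.println(json);
        // POST this to the core's /update handler with
        // Content-Type: application/json; nothing oversized ends up in the URL.
    }
}
{code}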


> HttpSolrClient.createMethod does not handle both stream and large data
> ----------------------------------------------------------------------
>
>                 Key: SOLR-12659
>                 URL: https://issues.apache.org/jira/browse/SOLR-12659
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrJ
>    Affects Versions: 6.6.2, 7.4
>            Reporter: Eirik Lygre
>            Priority: Major
>
> When using a ContentStreamUpdateRequest with stream data (through 
> addContentStream()), all other parameters are passed on the URL, leading to 
> the server failing with "URI is too large".
> The code below provokes the error using SolrJ 7.4, but was first seen on Solr 
> 6.6.2. The problem is in HttpSolrClient.createMethod(), where the presence of 
> stream data leads to all other fields being put on the URL.
> h2. Example code
> {code:java}
> String stringValue = StringUtils.repeat('X', 16 * 1024);
> SolrClient solr = new HttpSolrClient.Builder(BASE_URL).build();
> ContentStreamUpdateRequest updateRequest = new ContentStreamUpdateRequest("/update/extract");
> updateRequest.setParam("literal.id", "UriTooLargeTest-simpleTest");
> updateRequest.setParam("literal.field", stringValue);
> updateRequest.addContentStream(new ContentStreamBase.StringStream(stringValue));
> updateRequest.process(solr);
> {code}
> h2. The client sees the following error:
> {code:java}
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://server/solr/core: Expected mime type application/octet-stream but got text/html. <h1>Bad Message 414</h1><pre>reason: URI Too Long</pre>
> {code}
> h2. Error fragment from HttpSolrClient.createMethod
> {code:java}
> if (contentWriter != null) {
>   String fullQueryUrl = url + wparams.toQueryString();
>   HttpEntityEnclosingRequestBase postOrPut =
>       SolrRequest.METHOD.POST == request.getMethod()
>           ? new HttpPost(fullQueryUrl) : new HttpPut(fullQueryUrl);
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
