[jira] [Commented] (SOLR-9493) uniqueKey generation fails if content POSTed as "application/javabin" and uniqueKey field comes as NULL (as opposed to not coming at all).

Yury Kartsev (JIRA) Wed, 21 Sep 2016 09:06:50 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-9493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15510395#comment-15510395
 ]


Yury Kartsev commented on SOLR-9493:
------------------------------------

{quote}Solr can generate a UUID value, but it's essentially just a random 
number, and each value has no connection to the other data in the indexed 
document at all. {quote}
That's exactly what I assumed was not the case. The whole reason I wanted SOLR 
to generate the key was to avoid the rare case where a new UUID matches an 
existing one. I thought SOLR used some kind of algorithm that eliminates that 
case. I did not want to generate it on the client side solely for that reason: 
I was afraid that one day it would produce an existing value. But if you're 
saying that SOLR may do the same, then it makes no difference from this point 
of view... Are you sure about "has no connection to the other data in the 
indexed document at all"? I.e., doesn't the UUID generation algorithm have a 
"counter" or "sequence-like" part?
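
For reference, a minimal sketch of what a purely random UUID looks like on 
the JVM. It assumes that the server-side generator behaves like 
java.util.UUID.randomUUID(); that is an assumption here, not something 
verified against the SOLR 5.1 source. The class and method names are 
hypothetical and exist only for illustration.

```java
import java.util.UUID;

final class UuidSketch {
    /**
     * Birthday-bound estimate of the chance that at least two of n
     * random version-4 UUIDs collide: 1 - exp(-n(n-1) / (2 * 2^122)).
     */
    static double collisionProbability(double n) {
        double space = Math.pow(2, 122); // a version-4 UUID carries 122 random bits
        return -Math.expm1(-(n * (n - 1)) / (2 * space));
    }

    public static void main(String[] args) {
        UUID id = UUID.randomUUID();
        // Version 4 means "randomly generated": no counter, timestamp or sequence part.
        System.out.println(id + " version=" + id.version());
        // Estimated collision chance for one billion such ids.
        System.out.println(collisionProbability(1e9));
    }
}
```

Under that assumption there is no counter or sequence component, so a 
client-generated random UUID and a server-generated one would carry the same 
very small collision risk.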

> uniqueKey generation fails if content POSTed as "application/javabin" and 
> uniqueKey field comes as NULL (as opposed to not coming at all).
> ------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-9493
>                 URL: https://issues.apache.org/jira/browse/SOLR-9493
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Yury Kartsev
>         Attachments: 200.png, 400.png, Screen Shot 2016-09-11 at 16.29.50 
> .png, SolrInputDoc_contents.png, SolrInputDoc_headers.png
>
>
> I have faced a weird issue where the same application code (using SolrJ) 
> fails to index a document without a unique key (which should be 
> auto-generated by SOLR) in SolrCloud, but succeeds in a standalone SOLR 
> instance (or even in cloud mode, but from the web interface of one of the 
> replicas). The only differences are the clients (CloudSolrClient vs 
> HttpSolrClient) and the SOLR URLs (ZooKeeper hostname+port vs standalone SOLR 
> instance hostname and port). 
> Failure is seen as "org.apache.solr.client.solrj.SolrServerException: 
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: 
> Document is missing mandatory uniqueKey field: id".
> I am using SOLR 5.1. In cloud mode I have 1 shard and 3 replicas.
> After a lot of debugging and investigation (see below as well as my 
> [StackOverflow 
> post|http://stackoverflow.com/questions/39401792/uniquekey-generation-does-not-work-in-solrcloud-but-works-if-standalone])
>  I came to the conclusion that the only difference between the failing and 
> succeeding calls is the content type of the POST requests. A local proxy 
> clearly shows that the request fails if the content is sent as 
> "application/javabin" (see attached screenshot with sensitive data removed) 
> and succeeds if the content is sent as "application/xml; charset=UTF-8" (see 
> attached screenshot with sensitive data removed).
> Would you be able to please assist?
> Thank you very much in advance!
> ------------------------
> Copying whole description and investigation here as well:
> ------------------------
> [Documentation|https://cwiki.apache.org/confluence/display/solr/Other+Schema+Elements]
>  states:{quote}Schema defaults and copyFields cannot be used to populate the 
> uniqueKey field. You can use UUIDUpdateProcessorFactory to have uniqueKey 
> values generated automatically.{quote}
> Therefore I have added my uniqueKey field to the schema:{code}<fieldType 
> name="uuid" class="solr.UUIDField" indexed="true" />
> ...
> <field name="id" type="uuid" indexed="true" stored="true" required="true" />
> ...
> <uniqueKey>id</uniqueKey>{code}Then I have added updateRequestProcessorChain 
> to my solrconfig:{code}<updateRequestProcessorChain name="uuid">
>     <processor class="solr.UUIDUpdateProcessorFactory">
>         <str name="fieldName">id</str>
>     </processor>
>     <processor class="solr.RunUpdateProcessorFactory" />
> </updateRequestProcessorChain>{code}And made it the default for the 
> UpdateRequestHandler:{code}<initParams path="/update/**">
>  <lst name="defaults">
>   <str name="update.chain">uuid</str>
>  </lst>
> </initParams>{code}
> Adding new documents with a null/absent id works fine both from the web 
> interface of one of the replicas and when using SOLR in standalone mode 
> (non-cloud) from my application. However, when I use SolrCloud and add a 
> document from my application (using CloudSolrClient from SolrJ), it fails 
> with "org.apache.solr.client.solrj.SolrServerException: 
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: 
> Document is missing mandatory uniqueKey field: id"
> All other operations like ping or search for documents work fine in either 
> mode (standalone or cloud).
> INVESTIGATION (i.e. more details):
> In standalone mode obviously update request is:{code}POST 
> standalone_host:port/solr/collection_name/update?wt=json{code}
> In SOLR cloud mode, when adding document from one replica's web interface, 
> update request is (found through inspecting the call made by web interface): 
> {code}POST 
> replica_host:port/solr/collection_name_shard1_replica_1/update?wt=json{code}
> In both these cases payload is something like:{code}{
>     "add": {
>         "doc": {
>                  .....
>         },
>         "boost": 1.0,
>         "overwrite": true,
>         "commitWithin": 1000
>     }
> }{code}
> When CloudSolrClient is used, the following happens (found through 
> debugging):
> Using ZK and some logic, a list of replica URLs is constructed that looks like 
> this:{code}[http://replica_1_host:port/solr/collection_name/,
>  http://replica_2_host:port/solr/collection_name/,
>  http://replica_3_host:port/solr/collection_name/]{code}
> This code is called:{code}LBHttpSolrClient.Req req = new 
> LBHttpSolrClient.Req(request, theUrlList);
> LBHttpSolrClient.Rsp rsp = lbClient.request(req);
> return rsp.getResponse();{code}
> The second line fails with the exception.
> Debugging the second line further, it ends up calling HttpClient.execute 
> (from HttpSolrClient.executeMethod) for:{code}POST 
> http://replica_1_host:port/solr/collection_name/update?wt=javabin&version=2 
> HTTP/1.1
> POST 
> http://replica_2_host:port/solr/collection_name/update?wt=javabin&version=2 
> HTTP/1.1
> POST 
> http://replica_3_host:port/solr/collection_name/update?wt=javabin&version=2 
> HTTP/1.1{code}
> And the very first request returns 400 Bad Request with replica 1 logging 
> "Document is missing mandatory uniqueKey field: id" in the logs.
> The funny thing is that when I execute the same request using POSTMAN (but 
> with a JSON payload instead of a binary one), it works! Am I doing something 
> wrong here? I assume it has something to do with how the request is made...
> UPDATE:
> I have used a local proxy to compare the 2 requests sent by my application 
> and understand what differs between them. It looks like the only difference 
> is the content type. In cloud mode the payload for POSTing the document is 
> sent as "application/javabin", while in standalone mode it's sent as 
> "application/xml; charset=UTF-8". Everything else is the same. The first 
> request results in 400 while the second results in 200.
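
The issue title distinguishes a uniqueKey field that arrives as NULL from one 
that does not arrive at all. A minimal model of why that distinction could 
matter is sketched below. It uses a plain java.util.Map instead of 
SolrInputDocument, and it assumes (not confirmed here) that the default-value 
step only checks whether the field key is present. Both helper methods are 
hypothetical and are not SOLR or SolrJ APIs.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

final class NullVersusAbsentSketch {
    /** Hypothetical default-value step: fills the field only when the key is absent. */
    static void fillIfAbsentKey(Map<String, Object> doc, String field) {
        if (!doc.containsKey(field)) {
            doc.put(field, UUID.randomUUID().toString());
        }
    }

    /** Hypothetical client-side workaround: drop null-valued fields before sending. */
    static void dropNullFields(Map<String, Object> doc) {
        doc.values().removeIf(value -> value == null);
    }

    public static void main(String[] args) {
        // Case 1: the id key is not in the document at all, so an id is generated.
        Map<String, Object> absent = new LinkedHashMap<>();
        absent.put("title", "no id at all");
        fillIfAbsentKey(absent, "id");
        System.out.println("absent key, id generated: " + (absent.get("id") != null));

        // Case 2: the id key is present with a null value, so the step skips it.
        Map<String, Object> explicitNull = new LinkedHashMap<>();
        explicitNull.put("id", null);
        explicitNull.put("title", "id present but null");
        fillIfAbsentKey(explicitNull, "id");
        System.out.println("null value, id generated: " + (explicitNull.get("id") != null));

        // Removing null-valued fields first turns case 2 into case 1.
        dropNullFields(explicitNull);
        fillIfAbsentKey(explicitNull, "id");
        System.out.println("after cleanup, id generated: " + (explicitNull.get("id") != null));
    }
}
```

If the binary format carries the null-valued field through to the server 
while the XML format omits it, a model like this would explain the 400 versus 
200 difference; that part is a guess based on the observations above, not a 
verified description of the serializers.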



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

