[ https://issues.apache.org/jira/browse/SOLR-6617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Noble Paul resolved SOLR-6617.
------------------------------
Resolution: Fixed
Fix Version/s: Trunk, 5.0
thanks [~thelabdude]
> /update/json/docs handler needs to do a better job with tweet like JSON
> structures
> ----------------------------------------------------------------------------------
>
> Key: SOLR-6617
> URL: https://issues.apache.org/jira/browse/SOLR-6617
> Project: Solr
> Issue Type: Improvement
> Reporter: Timothy Potter
> Assignee: Noble Paul
> Fix For: 5.0, Trunk
>
> Attachments: SOLR-6617.patch
>
>
> SOLR-6304 allows me to send in an arbitrary JSON document and have Solr do
> something reasonable with it. I tried this with a simple tweet and got a
> weird error:
> {code}
> curl "http://localhost:8983/solr/tutorial/update/json/docs" \
>   -H 'Content-type:application/json' -d @sample_tweet.json
> {"responseHeader":{"status":400,"QTime":11},"error":{"msg":"Document contains multiple values for uniqueKey field: id=[14065694, 136447843652214784]","code":400}}
> {code}
> Here's the tweet I'm trying to index:
> {code}
> {
>   "user": {
>     "name": "John Doe",
>     "screen_name": "example",
>     "lang": "en",
>     "time_zone": "London",
>     "listed_count": 221,
>     "id": 14065694,
>     "geo_enabled": true
>   },
>   "id": "136447843652214784",
>   "text": "Morning San Francisco - 36 hours and counting.. #datasift",
>   "created_at": "Tue, 15 Nov 2011 14:17:55 +0000"
> }
> {code}
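The two values listed in the error message can be reproduced with a small Python sketch. It assumes, purely for illustration, that nested objects are flattened by leaf name alone; `flatten_by_leaf_name` is a hypothetical helper, not Solr's actual code, and the tweet is trimmed to a few fields.

```python
from collections import defaultdict

# The tweet from the report, trimmed to the fields that matter here.
tweet = {
    "user": {"name": "John Doe", "id": 14065694},
    "id": "136447843652214784",
    "text": "Morning San Francisco - 36 hours and counting.. #datasift",
}


def flatten_by_leaf_name(node, out=None):
    """Collect every scalar under its own key name, ignoring the parent path.

    Hypothetical helper for illustration only; not Solr's implementation.
    """
    if out is None:
        out = defaultdict(list)
    for key, value in node.items():
        if isinstance(value, dict):
            # Descend into nested objects but drop the parent key.
            flatten_by_leaf_name(value, out)
        else:
            out[key].append(value)
    return out


fields = flatten_by_leaf_name(tweet)
# Both the user's id and the tweet's id end up under the single key "id".
print(fields["id"])
```

Under that assumption the uniqueKey field receives two values, which is the situation the 400 response complains about.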
> The error is because the nested user object within the tweet also has an "id"
> field. So then I tried to map /user/id to user_id_s via:
> {code}
> curl "http://localhost:8983/solr/tutorial/update/json/docs?f=user_id_s:/user/id" \
>   -H 'Content-type:application/json' -d @sample_tweet.json
> {"responseHeader":{"status":400,"QTime":0},"error":{"msg":"Document is missing mandatory uniqueKey field: id","code":400}}
> {code}
> So then I added the mapping for id explicitly and it worked:
> curl "http://localhost:8983/solr/tutorial/update/json/docs?f=id:/id&f=user_id_s:/user/id" \
>   -H 'Content-type:application/json' -d @sample_tweet.json
> {"responseHeader":{"status":0,"QTime":25}}
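The behaviour seen in the last two requests is consistent with the `f` mappings acting as an allow-list: once any mapping is given, only the mapped fields reach the document. The Python sketch below models that reading of the responses. `resolve` and `apply_mappings` are hypothetical names, and the logic is an assumption drawn from the observed output, not taken from the Solr source.

```python
# The tweet from the report, trimmed to the fields that matter here.
tweet = {
    "user": {"name": "John Doe", "id": 14065694},
    "id": "136447843652214784",
    "text": "Morning San Francisco - 36 hours and counting.. #datasift",
}


def resolve(doc, path):
    """Follow a slash-separated path such as "/user/id" into nested dicts."""
    node = doc
    for part in path.strip("/").split("/"):
        node = node[part]
    return node


def apply_mappings(doc, mappings):
    """Build a flat document from "target:/source/path" strings.

    Hypothetical model of the f parameter: only mapped fields are kept.
    """
    flat = {}
    for mapping in mappings:
        target, _, path = mapping.partition(":")
        flat[target] = resolve(doc, path)
    return flat


# Mapping only /user/id leaves the document without a uniqueKey "id".
only_user = apply_mappings(tweet, ["user_id_s:/user/id"])
print("id" in only_user)

# Mapping both paths yields one value for each field.
both = apply_mappings(tweet, ["id:/id", "user_id_s:/user/id"])
print(sorted(both))
```

In this model the first call produces a document with no `id`, matching the "missing mandatory uniqueKey field" response, and the second produces exactly the two mapped fields.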
> Working through this wasn't terrible, but our goal with features like this is
> to have Solr make good decisions when possible to ease the new user's burden
> of getting to know Solr.
> I'm just wondering whether the reasonable thing to do would be to map the user
> fields with a user_ prefix, i.e. /user/id becomes user_id automatically.
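A minimal Python sketch of the prefixing idea proposed above, assuming an underscore separator and plain nested objects (no arrays). `flatten_with_prefix` is a hypothetical helper written for this note; it does not describe what the attached patch implements.

```python
# The tweet from the report, trimmed to the fields that matter here.
tweet = {
    "user": {"name": "John Doe", "screen_name": "example", "id": 14065694},
    "id": "136447843652214784",
    "text": "Morning San Francisco - 36 hours and counting.. #datasift",
}


def flatten_with_prefix(node, prefix=""):
    """Flatten nested dicts, prefixing each key with its parent keys.

    Hypothetical helper: /user/id becomes user_id, /id stays id.
    """
    flat = {}
    for key, value in node.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Carry the parent name down as the prefix for child keys.
            flat.update(flatten_with_prefix(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat


flat_tweet = flatten_with_prefix(tweet)
for name in sorted(flat_tweet):
    print(name, "=", flat_tweet[name])
```

With path-derived names the nested `id` and the top-level `id` no longer collide, so the uniqueKey field receives a single value without any explicit `f` mapping.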
> Lastly, I wanted to use field guessing with this so that my JSON document gets
> indexed in a reasonable way, but the only data that got indexed is:
> {code}
> {
>   "user_id_s": "14065694",
>   "id": "136447843652214784",
>   "_version_": 1481614081193410600
> }
> {code}
> So I explicitly defined the /update/json/docs request handler in my
> solrconfig.xml as:
> {code}
> <requestHandler name="/update/json/docs" class="solr.UpdateRequestHandler">
>   <lst name="defaults">
>     <str name="update.chain">add-unknown-fields-to-the-schema</str>
>     <str name="stream.contentType">application/json</str>
>   </lst>
> </requestHandler>
> {code}
> Same result: no field guessing! (This is using the schemaless example config.)