[ https://issues.apache.org/jira/browse/SOLR-6617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169749#comment-14169749 ]
Timothy Potter commented on SOLR-6617:
--------------------------------------
Patch looks good [~noble.paul]. I applied this to my test scenario:
{code}
curl "http://localhost:8983/solr/tutorial/update/json/docs" \
  -H 'Content-type:application/json' -d @sample_tweet.json
{code}
Resulted in:
{code}
{
"user.name": [
"Stewart Townsend"
],
"user.url": [
"http://www.stewarttownsend.com"
],
"user.description": [
"Developer Relations at Datasift (www.datasift.com) - Car racing petrol head, all things social lover, co-founder of www.flowerytweetup.com"
],
"user.location": [
"iPhone: 53.852402,-2.220047"
],
"user.statuses_count": [
28247
],
"user.followers_count": [
3094
],
"user.friends_count": [
510
],
"user.screen_name": [
"stewarttownsend"
],
"user.lang": [
"en"
],
"user.time_zone": [
"London"
],
"user.listed_count": [
221
],
"user.id": [
14065694
],
"user.id_str": [
14065694
],
"user.geo_enabled": [
true
],
"id": "136447843652214784",
"text": [
"Morning San Francisco - 36 hours and counting.. #datasift"
],
"source": [
"<a href=\"http://www.tweetdeck.com\" rel=\"nofollow\">TweetDeck</a>"
],
"created_at": [
"Tue, 15 Nov 2011 14:17:55 +0000"
],
"_version_": 1481875073806631000
}
{code}
Which I'd say is very reasonable behavior on Solr's part. +1 for commit.
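
For anyone skimming the output above, the naming scheme can be illustrated with a minimal Python sketch. This is not Solr's code and makes no claim about how the patch is implemented; `flatten` and `unique_key` are hypothetical names, and the sketch assumes only what the output shows: nested keys become dotted field names, values become multi-valued, and the top-level uniqueKey stays a single value.

```python
from typing import Any, Dict


def flatten(doc: Dict[str, Any], unique_key: str = "id", prefix: str = "") -> Dict[str, Any]:
    """Flatten nested JSON objects into dotted field names (illustration only)."""
    flat: Dict[str, Any] = {}
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse so that /user/id becomes "user.id" rather than "id".
            flat.update(flatten(value, unique_key, prefix=f"{path}."))
        elif path == unique_key:
            # Only the top-level uniqueKey matches, so it stays single-valued.
            flat[path] = value
        else:
            # Every other field is collected as a multi-valued list.
            values = value if isinstance(value, list) else [value]
            flat.setdefault(path, []).extend(values)
    return flat


# Trimmed-down version of the tweet from the issue description.
tweet = {
    "user": {"screen_name": "example", "id": 14065694},
    "id": "136447843652214784",
    "text": "Morning San Francisco - 36 hours and counting.. #datasift",
}

flat = flatten(tweet)
print(flat)
```

Because the nested id is emitted as `user.id`, it no longer collides with the document's own `id`, which is the collision behind the "multiple values for uniqueKey" error quoted below.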
> /update/json/docs handler needs to do a better job with tweet like JSON
> structures
> ----------------------------------------------------------------------------------
>
> Key: SOLR-6617
> URL: https://issues.apache.org/jira/browse/SOLR-6617
> Project: Solr
> Issue Type: Improvement
> Reporter: Timothy Potter
> Assignee: Noble Paul
> Attachments: SOLR-6617.patch
>
>
> SOLR-6304 allows me to send in an arbitrary JSON document and have Solr do
> something reasonable with it. I tried this with a simple tweet and got a
> weird error:
> {code}
> curl "http://localhost:8983/solr/tutorial/update/json/docs" \
>   -H 'Content-type:application/json' -d @sample_tweet.json
> {"responseHeader":{"status":400,"QTime":11},"error":{"msg":"Document contains multiple values for uniqueKey field: id=[14065694, 136447843652214784]","code":400}}
> {code}
> Here's the tweet I'm trying to index:
> {code}
> {
> "user": {
> "name": "John Doe",
> "screen_name": "example",
> "lang": "en",
> "time_zone": "London",
> "listed_count": 221,
> "id": 14065694,
> "geo_enabled": true
> },
> "id": "136447843652214784",
> "text": "Morning San Francisco - 36 hours and counting.. #datasift",
> "created_at": "Tue, 15 Nov 2011 14:17:55 +0000"
> }
> {code}
> The error is because the nested user object within the tweet also has an "id"
> field. So then I tried to map /user/id to user_id_s via:
> {code}
> curl "http://localhost:8983/solr/tutorial/update/json/docs?f=user_id_s:/user/id" \
>   -H 'Content-type:application/json' -d @sample_tweet.json
> {"responseHeader":{"status":400,"QTime":0},"error":{"msg":"Document is missing mandatory uniqueKey field: id","code":400}}
> {code}
> So then I added the mapping for id explicitly and it worked:
> {code}
> curl "http://localhost:8983/solr/tutorial/update/json/docs?f=id:/id&f=user_id_s:/user/id" \
>   -H 'Content-type:application/json' -d @sample_tweet.json
> {"responseHeader":{"status":0,"QTime":25}}
> {code}
> Working through this wasn't terrible, but our goal with features like this is
> to have Solr make good decisions when possible to ease the new user's burden
> of getting to know Solr.
> I'm just wondering if the reasonable thing to do wouldn't be to map the user
> fields with a user_ prefix, i.e. /user/id becomes user_id automatically?
> Lastly, I wanted to use field guessing with this so my JSON document gets
> indexed in a reasonable way, but the only data that got indexed is:
> {code}
> {
> "user_id_s": "14065694",
> "id": "136447843652214784",
> "_version_": 1481614081193410600
> }
> {code}
> So I explicitly defined the /update/json/docs request handler in my
> solrconfig.xml as:
> {code}
> <requestHandler name="/update/json/docs" class="solr.UpdateRequestHandler">
>   <lst name="defaults">
>     <str name="update.chain">add-unknown-fields-to-the-schema</str>
>     <str name="stream.contentType">application/json</str>
>   </lst>
> </requestHandler>
> {code}
> Same result - no field guessing! (This is using the schemaless example config.)