[jira] [Commented] (SOLR-6304) Transforming and Indexing custom JSON data

Kelly Kagen (JIRA) Tue, 03 Nov 2015 14:20:52 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-6304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14988279#comment-14988279
 ]


Kelly Kagen commented on SOLR-6304:
-----------------------------------

I'm having difficulty indexing custom JSON data with v5.3.1. I used the same 
example as the documentation, but it doesn't work as expected. Can someone 
confirm whether this is a bug or a problem with the procedure I followed? The 
scenarios are below.

Source: [Indexing custom JSON 
data|http://lucidworks.com/blog/2014/08/12/indexing-custom-json-data], 
[Transforming and Indexing Custom 
JSON|https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-TransformingandIndexingCustomJSON]

*Note:* The echo parameter has been added.

*Input:*
{code}
curl 'http://localhost:8983/solr/collection1/update/json/docs'\
'?split=/exams'\
'&f=first:/first'\
'&f=last:/last'\
'&f=grade:/grade'\
'&f=subject:/exams/subject'\
'&f=test:/exams/test'\
'&f=marks:/exams/marks'\
'&echo=true'\
 -H 'Content-type:application/json' -d '
{
  "first": "John",
  "last": "Doe",
  "grade": 8,
  "exams": [
    {
      "subject": "Maths",
      "test": "term1",
      "marks": 90
    },
    {
      "subject": "Biology",
      "test": "term1",
      "marks": 86
    }
  ]
}'
{code}

*Output:*
{code}
{
  "error":{
    "msg":"Raw data can be stored only if split=/",
    "code":400
  }
}
{code}

If I pass only '/' as the split parameter, as the error message requires, but 
with a different field mapping, the data is not indexed into the mapped fields. 
Note the suffix 'Name' added to some keys in the input JSON and to the 
corresponding field mappings.

*Input:*
{code}
curl 'http://localhost:8983/solr/collection1/update/json/docs'\
'?split=/'\
'&f=first:/firstName'\
'&f=last:/lastName'\
'&f=grade:/grade'\
'&f=subject:/exams/subjectName'\
'&f=test:/exams/test'\
'&f=marks:/exams/marks'\
'&echo=true'\
 -H 'Content-type:application/json' -d '
{
  "firstName": "John",
  "lastName": "Doe",
  "grade": 8,
  "exams": [
    {
      "subjectName": "Maths",
      "test": "term1",
      "marks": 90
    },
    {
      "subject": "Biology",
      "test": "term1",
      "marks": 86
    }
  ]
}'
{code}

*Output:*
{code}
{"responseHeader":{"status":0,"QTime":0},"docs":[{"id":"3c5fa5a0-ff71-4fef-b3e9-8e279cc0d724","_src_":"{
  \"firstName\": \"John\",  \"lastName\": \"Doe\",  \"grade\": 8,  \"exams\": [ 
     {        \"subjectName\": \"Maths\",        \"test\"   : \"term1\",        
\"marks\":90},        {         \"subject\": \"Biology\",         \"test\"   : 
\"term1\",         \"marks\":86}      
]}","text":["John","Doe",8,"Maths",["term1","term1"],[90,86]]}]}
{code}
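
One detail of the echoed document can be checked without Solr: "Maths" appears in the "text" values but "Biology" does not. The sketch below is a simplified model of path lookup, not Solr's implementation, and `values_at` is a hypothetical helper written only for this illustration. It resolves each mapped path against the payload above and suggests that `/exams/subjectName` matches only the first exam entry, because the second entry still uses the key `subject`.

```python
import json

# Payload copied from the request above.
payload = json.loads("""
{
  "firstName": "John",
  "lastName": "Doe",
  "grade": 8,
  "exams": [
    {"subjectName": "Maths", "test": "term1", "marks": 90},
    {"subject": "Biology", "test": "term1", "marks": 86}
  ]
}
""")


def values_at(node, path):
    """Return every value reachable by following a /a/b/c style path.

    Hypothetical helper: arrays are stepped into element by element,
    and a missing key simply yields no value for that branch.
    """
    parts = [p for p in path.split("/") if p]
    found = [node]
    for part in parts:
        next_found = []
        for item in found:
            candidates = item if isinstance(item, list) else [item]
            for candidate in candidates:
                if isinstance(candidate, dict) and part in candidate:
                    next_found.append(candidate[part])
        found = next_found
    return found


# The source paths used in the f= parameters of the request.
mapped_paths = [
    "/firstName",
    "/lastName",
    "/grade",
    "/exams/subjectName",
    "/exams/test",
    "/exams/marks",
]

for path in mapped_paths:
    print(path, "->", values_at(payload, path))
```

This only accounts for the missing "Biology" value. It does not explain why the values end up in "text" rather than in the mapped field names.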

If one of the fields is mapped to "id", that value is reflected in the 
response, but all other field mappings are still ignored for some reason.

*Input:*
{code}
curl 'http://localhost:8983/solr/collection1/update/json/docs'\
'?split=/'\
'&f=first:/firstName'\
'&f=id:/lastName'\
'&f=grade:/grade'\
'&f=subject:/exams/subjectName'\
'&f=test:/exams/test'\
'&f=marks:/exams/marks'\
'&echo=true'\
 -H 'Content-type:application/json' -d '
{
  "firstName": "John",
  "lastName": "Doe",
  "grade": 8,
  "exams": [
    {
      "subjectName": "Maths",
      "test": "term1",
      "marks": 90
    },
    {
      "subject": "Biology",
      "test": "term1",
      "marks": 86
    }
  ]
}'
{code}

*Output:*
{code}
{"responseHeader":{"status":0,"QTime":1},"docs":[{"id":"Doe","_src_":"{  
\"firstName\": \"John\",  \"lastName\": \"Doe\",  \"grade\": 8,  \"exams\": [   
   {        \"subjectName\": \"Maths\",        \"test\"   : \"term1\",        
\"marks\":90},        {         \"subject\": \"Biology\",         \"test\"   : 
\"term1\",         \"marks\":86}      
]}","text":["John","Doe",8,"Maths",["term1","term1"],[90,86]]}]}
{code}

> Transforming and Indexing custom JSON data
> ------------------------------------------
>
>                 Key: SOLR-6304
>                 URL: https://issues.apache.org/jira/browse/SOLR-6304
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Noble Paul
>            Assignee: Noble Paul
>             Fix For: 4.10, Trunk
>
>         Attachments: SOLR-6304.patch, SOLR-6304.patch
>
>
> example
> {noformat}
> curl 'localhost:8983/update/json/docs?split=/batters/batter&f=recipeId:/id&f=recipeType:/type&f=id:/batters/batter/id&f=type:/batters/batter/type' -d '
> {
>   "id": "0001",
>   "type": "donut",
>   "name": "Cake",
>   "ppu": 0.55,
>   "batters": {
>     "batter": [
>       { "id": "1001", "type": "Regular" },
>       { "id": "1002", "type": "Chocolate" },
>       { "id": "1003", "type": "Blueberry" },
>       { "id": "1004", "type": "Devil'\''s Food" }
>     ]
>   }
> }'
> {noformat}
> should produce the following output docs
> {noformat}
> { "recipeId":"0001", "recipeType":"donut", "id":"1001", "type":"Regular" }
> { "recipeId":"0001", "recipeType":"donut", "id":"1002", "type":"Chocolate" }
> { "recipeId":"0001", "recipeType":"donut", "id":"1003", "type":"Blueberry" }
> { "recipeId":"0001", "recipeType":"donut", "id":"1004", "type":"Devil's Food" }
> {noformat}
> The split param is the path of the element in the tree at which the input 
> is split into multiple docs. The 'f' params are field name mappings.
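
The split and 'f' semantics described in the issue can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Solr's actual implementation: `resolve` and `split_and_map` are hypothetical names, paths are plain `/a/b/c` strings with no wildcard support, and a mapping whose path starts with the split path is assumed to be read relative to each split element, while any other mapping is read from the whole record.

```python
import json


def resolve(node, parts):
    """Collect every value reachable from node via parts, fanning out over arrays."""
    if isinstance(node, list):
        out = []
        for item in node:
            out.extend(resolve(item, parts))
        return out
    if not parts:
        return [node]
    if isinstance(node, dict) and parts[0] in node:
        return resolve(node[parts[0]], parts[1:])
    return []


def split_and_map(record, split, mappings):
    """Emit one flat doc per element found at the split path (hypothetical model)."""
    split_parts = [p for p in split.split("/") if p]
    docs = []
    for child in resolve(record, split_parts):
        doc = {}
        for field, path in mappings.items():
            parts = [p for p in path.split("/") if p]
            if parts[:len(split_parts)] == split_parts:
                # Path lies under the split point: read it from this child only.
                values = resolve(child, parts[len(split_parts):])
            else:
                # Path lies outside the split point: shared by every child doc.
                values = resolve(record, parts)
            if values:
                doc[field] = values[0] if len(values) == 1 else values
        docs.append(doc)
    return docs


# The donut record from the issue description.
recipe = json.loads("""
{
  "id": "0001",
  "type": "donut",
  "name": "Cake",
  "ppu": 0.55,
  "batters": {
    "batter": [
      {"id": "1001", "type": "Regular"},
      {"id": "1002", "type": "Chocolate"},
      {"id": "1003", "type": "Blueberry"},
      {"id": "1004", "type": "Devil's Food"}
    ]
  }
}
""")

docs = split_and_map(
    recipe,
    "/batters/batter",
    {
        "recipeId": "/id",
        "recipeType": "/type",
        "id": "/batters/batter/id",
        "type": "/batters/batter/type",
    },
)

for doc in docs:
    print(json.dumps(doc))
```

In this model, `split=/` yields a single doc built from the whole record, and a mapping that reaches into an array collects all matching values into a list.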



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
