[jira] [Resolved] (SOLR-6097) Posting JSON with < > results in lost information

Hoss Man (JIRA) Tue, 20 May 2014 10:32:59 -0700

     [ https://issues.apache.org/jira/browse/SOLR-6097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Hoss Man resolved SOLR-6097.
----------------------------

    Resolution: Cannot Reproduce

Cannot Reproduce

Please post more details of your situation (including specifics on how exactly 
you are adding your data to Solr) to a new thread on the solr-user mailing list.

If more details about your usage help uncover reliable, reproducible steps to 
recreate the problem, we can re-open the issue with an updated summary.

Using the 4.7.2 example configs...
{noformat}
hossman@frisbee:~$ curl -X POST -H "Content-Type: application/json" --data-binary '
{ 
    "add" : 
       { 
           "commitWithin" : 5000,
           "doc" : 
               {  
                   "id" : "12345",
                   "body_s" : "a < b > c"
               }
        }
}
' http://localhost:8983/solr/collection1/update
{"responseHeader":{"status":0,"QTime":24}}
hossman@frisbee:~$ curl 'http://localhost:8983/solr/collection1/select?q=id:12345&wt=json&indent=true'
{
  "responseHeader":{
    "status":0,
    "QTime":1,
    "params":{
      "indent":"true",
      "q":"id:12345",
      "wt":"json"}},
  "response":{"numFound":1,"start":0,"docs":[
      {
        "id":"12345",
        "body_s":"a < b > c",
        "_version_":1468642762402299904}]
  }}
{noformat}
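
As a client-side sanity check before posting, the sketch below (Python, standard library only) shows that JSON serialization itself leaves < and > untouched, so the payload needs no HTML-style escaping. The helper name build_add_payload is hypothetical and not part of Solr; the field names and values mirror the curl example above.

```python
import json


def build_add_payload(doc_id, body, commit_within=5000):
    """Build the JSON body for a Solr "add" update command.

    Hypothetical helper for illustration only; the structure mirrors
    the curl example above (commitWithin, doc.id, doc.body_s).
    """
    command = {
        "add": {
            "commitWithin": commit_within,
            "doc": {"id": doc_id, "body_s": body},
        }
    }
    return json.dumps(command)


payload = build_add_payload("12345", "a < b > c")

# JSON gives < and > no special meaning, so they are serialized as-is
# and come back unchanged when the payload is parsed again.
round_tripped = json.loads(payload)["add"]["doc"]["body_s"]
print(round_tripped)
```

If the value survives this round trip but the stored field still differs, the specifics of how the data is being sent to Solr are the next thing to examine.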

> Posting JSON with < > results in lost information
> -------------------------------------------------
>
>                 Key: SOLR-6097
>                 URL: https://issues.apache.org/jira/browse/SOLR-6097
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 4.7.2
>            Reporter: Kingston Duffie
>
> Post the following JSON to add a document:
> { 
>     "add" : 
>        { 
>            "commitWithin" : 5000,
>            "doc" : 
>                {  
>                    "id" : "12345",
>                    "body" : "a < b > c"
>                }
>         }
> }
> The body field is configured in the schema as:
>    <field name="body" type="text_hive" indexed="true" stored="true"
>           required="false" multiValued="false"/>
> and
>    <fieldType name="text_hive" class="solr.TextField"
>               positionIncrementGap="100">
>      <analyzer type="index">
>        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>        <filter class="solr.WordDelimiterFilterFactory"
>                generateWordParts="1" generateNumberParts="1" catenateWords="1"
>                catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"
>                preserveOriginal="1"/>
>        <filter class="solr.LowerCaseFilterFactory"/>
>        <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
>                maxGramSize="15" side="front"/>
>      </analyzer>
>      <analyzer type="query">
>        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>        <filter class="solr.WordDelimiterFilterFactory"
>                generateWordParts="1" generateNumberParts="1" catenateWords="1"
>                catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"
>                preserveOriginal="1"/>
>        <filter class="solr.LowerCaseFilterFactory"/>
>      </analyzer>
>    </fieldType>
> The problem is this:  After submitting this post, if you go to the SOLR 
> console and find this document, the stored body will be missing the contents 
> between the less-than and greater-than symbols -- i.e., "a c".  
> If you encode the body (i.e., "a &lt; b &gt; c"), it will show up with the
> < and > symbols.  That is, it appears that SOLR is stripping out HTML tags
> even though we are not asking it to.
> Note that it is not only the storage but also the indexing that is affected
> (we originally found the issue because searching for "b" would not match
> this document).
> I'm willing to believe that I'm doing something wrong, but I can't see 
> anywhere in any spec that suggests that strings inside JSON need to be 


