[
https://issues.apache.org/jira/browse/SOLR-6097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hoss Man resolved SOLR-6097.
----------------------------
Resolution: Cannot Reproduce
Cannot Reproduce
Please post more details of your situation (including specifics on how exactly
you are adding your data to Solr) to a new thread on the solr-user mailing list.
In the event that more details about your usage help uncover
reliable/reproducible steps to recreate the problem, we can re-open the issue
with an updated summary.
Using the 4.7.2 example configs...
{noformat}
hossman@frisbee:~$ curl -X POST -H "Content-Type: application/json" \
--data-binary '
{
"add" :
{
"commitWithin" : 5000,
"doc" :
{
"id" : "12345",
"body_s" : "a < b > c"
}
}
}
' http://localhost:8983/solr/collection1/update
{"responseHeader":{"status":0,"QTime":24}}
hossman@frisbee:~$ curl 'http://localhost:8983/solr/collection1/select?q=id:12345&wt=json&indent=true'
{
"responseHeader":{
"status":0,
"QTime":1,
"params":{
"indent":"true",
"q":"id:12345",
"wt":"json"}},
"response":{"numFound":1,"start":0,"docs":[
{
"id":"12345",
"body_s":"a < b > c",
"_version_":1468642762402299904}]
}}
{noformat}
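For anyone who wants to script the same check rather than use curl, here is a rough Python sketch of the round trip above. It assumes the stock 4.7.2 example server at localhost:8983 with the collection1 core, as in the transcript. The names SOLR_BASE, build_add_payload, post_document and fetch_stored_body are made up for illustration and are not part of any Solr client.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the 4.7.2 example server and core used in the transcript above.
SOLR_BASE = "http://localhost:8983/solr/collection1"


def build_add_payload(doc_id, body, commit_within_ms=5000):
    """Serialize an "add" command shaped like the curl example (hypothetical helper)."""
    command = {
        "add": {
            "commitWithin": commit_within_ms,
            "doc": {"id": doc_id, "body_s": body},
        }
    }
    # json.dumps leaves '<' and '>' as literal characters; JSON does not
    # require them to be escaped.
    return json.dumps(command).encode("utf-8")


def post_document(payload, base=SOLR_BASE):
    """POST the payload to the update handler and return the parsed JSON response."""
    request = urllib.request.Request(
        base + "/update",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def fetch_stored_body(doc_id, base=SOLR_BASE):
    """Query by id and return the stored body_s value, or None if nothing matches."""
    query = urllib.parse.urlencode({"q": "id:" + doc_id, "wt": "json"})
    with urllib.request.urlopen(base + "/select?" + query) as response:
        result = json.loads(response.read().decode("utf-8"))
    docs = result["response"]["docs"]
    return docs[0].get("body_s") if docs else None
```

Against a running example server, you would call post_document(build_add_payload("12345", "a < b > c")), wait for the commitWithin window (5000 ms here) to pass, and then compare fetch_stored_body("12345") with the string that was sent. The payload builder can be checked without Solr at all, which at least rules out the client-side JSON encoding as the place where the angle brackets go missing.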
> Posting JSON with < > results in lost information
> -------------------------------------------------
>
> Key: SOLR-6097
> URL: https://issues.apache.org/jira/browse/SOLR-6097
> Project: Solr
> Issue Type: Bug
> Affects Versions: 4.7.2
> Reporter: Kingston Duffie
>
> Post the following JSON to add a document:
> {
> "add" :
> {
> "commitWithin" : 5000,
> "doc" :
> {
> "id" : "12345",
> "body" : "a < b > c"
> }
> }
> }
> The body field is configured in the schema as:
> <field name="body" type="text_hive" indexed="true" stored="true"
> required="false" multiValued="false"/>
> and
> <fieldType name="text_hive" class="solr.TextField"
> positionIncrementGap="100">
> <analyzer type="index">
> <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"
> preserveOriginal="1"/>
> <filter class="solr.LowerCaseFilterFactory"/>
> <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
> maxGramSize="15" side="front"/>
> </analyzer>
> <analyzer type="query">
> <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"
> preserveOriginal="1"/>
> <filter class="solr.LowerCaseFilterFactory"/>
> </analyzer>
> </fieldType>
> The problem is this: After submitting this post, if you go to the SOLR
> console and find this document, the stored body will be missing the contents
> between the less-than and greater-than symbols -- i.e., "a c".
> If you encode the body (i.e., "a &lt; b &gt; c"), it will show up with < and
> > symbols. That is, it appears that SOLR is stripping out HTML tags even
> though we are not asking it to.
> Note that it is not only the storage but also indexing that is affected (as
> we originally found the issue because searching for "b" would not match this
> document).
> I'm willing to believe that I'm doing something wrong, but I can't see
> anywhere in any spec that suggests that strings inside JSON need to be