[
https://issues.apache.org/jira/browse/SOLR-6097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003845#comment-14003845
]
Stefan Matheis (steffkes) commented on SOLR-6097:
-------------------------------------------------
You didn't link those issues, but since I saw SOLR-6098 right after this one
and already commented on it, I guess they are related?
> Posting JSON with < > results in lost information
> -------------------------------------------------
>
> Key: SOLR-6097
> URL: https://issues.apache.org/jira/browse/SOLR-6097
> Project: Solr
> Issue Type: Bug
> Affects Versions: 4.7.2
> Reporter: Kingston Duffie
>
> Post the following JSON to add a document:
> {
>   "add": {
>     "commitWithin": 5000,
>     "doc": {
>       "id": "12345",
>       "body": "a < b > c"
>     }
>   }
> }
> The body field is configured in the schema as:
> <field name="body" type="text_hive" indexed="true" stored="true"
>        required="false" multiValued="false"/>
> and
> <fieldType name="text_hive" class="solr.TextField"
>            positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" generateNumberParts="1" catenateWords="1"
>             catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"
>             preserveOriginal="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
>             maxGramSize="15" side="front"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" generateNumberParts="1" catenateWords="1"
>             catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"
>             preserveOriginal="1"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
> The problem is this: After submitting this post, if you go to the SOLR
> console and find this document, the stored body will be missing the contents
> between the less-than and greater-than symbols -- i.e., "a c".
> If you encode the body (i.e., "a &lt; b &gt; c"), it will show up with the
> < and > symbols. That is, it appears that SOLR is stripping out HTML tags
> even though we are not asking it to.
> Note that it is not only storage but also indexing that is affected (we
> originally found the issue because searching for "b" would not match this
> document).
> I'm willing to believe that I'm doing something wrong, but I can't see
> anywhere in any spec that suggests that strings inside JSON need to be
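
A minimal sketch of the two behaviours described above, in Python rather than anything Solr-specific. The first half checks that a standard JSON encoder leaves < and > untouched (JSON only requires escaping the double quote, the backslash, and control characters). The second half uses a hypothetical strip_tags helper, which is an assumption for illustration and not Solr's actual code, to show how a naive tag-stripping step would lose the text between the two symbols in the way the report describes.

```python
import json
import re

# The document from the report, as a Python structure.
payload = {
    "add": {
        "commitWithin": 5000,
        "doc": {"id": "12345", "body": "a < b > c"},
    }
}

# Serialize and parse again: '<' and '>' need no escaping in JSON strings,
# so the body survives the round trip unchanged.
encoded = json.dumps(payload)
decoded = json.loads(encoded)
round_trip_body = decoded["add"]["doc"]["body"]


def strip_tags(text: str) -> str:
    """Hypothetical stand-in for an HTML-stripping step (not Solr's code)."""
    # Remove anything that looks like a tag: '<', any non-'>' chars, then '>'.
    return re.sub(r"<[^>]*>", "", text)


stripped_body = strip_tags(round_trip_body)

print(repr(round_trip_body))
print(repr(stripped_body))
```

If the stored value matches the second result rather than the first, that would point at a stripping step somewhere after JSON parsing rather than at the JSON itself.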
--
This message was sent by Atlassian JIRA
(v6.2#6252)