You have to correctly escape your XML-like HTML inside the XML you send to Solr (either by wrapping it in <![CDATA[ … ]]> or by escaping the markup characters as &lt; &gt; &quot;). Otherwise Solr would be open to HTML injection.
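A minimal sketch of the escaping route in Python, assuming the documents are posted in Solr's XML update format (<add><doc><field name="…">). The helper name build_add_xml and the sample field values are made up for illustration and are not taken from the original setup. Escaping the ampersand turns an HTML entity such as &nbsp; into &amp;nbsp;, so the XML parser no longer sees an undeclared entity and the stored value is still the raw HTML.

```python
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET


def build_add_xml(fields):
    """Build an <add><doc>...</doc></add> message with XML-escaped values.

    `fields` maps field names to raw string values (hypothetical helper).
    """
    parts = ["<add><doc>"]
    for name, value in fields.items():
        # quoteattr() returns the attribute value already wrapped in quotes;
        # escape() replaces &, < and > in the element text.
        parts.append("<field name=%s>%s</field>" % (quoteattr(name), escape(value)))
    parts.append("</doc></add>")
    return "".join(parts)


# Sample HTML containing an entity that XML itself does not declare.
raw_html = '<p>Testing&nbsp;<a href="/tags/bananas" class="tag">#bananas</a> tag</p>'
payload = build_add_xml({"rawcontent": raw_html, "type": "blog"})

# Round-trip check: an XML parser accepts the payload and gives back the raw HTML.
parsed = ET.fromstring(payload)
stored = parsed.find("./doc/field[@name='rawcontent']").text
```

After the round trip, stored holds the same string as raw_html, which is what a 'string' field would receive. Wrapping the value in a CDATA section is the other option, but it needs extra care if the HTML itself can contain the sequence ]]>.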
-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [email protected]

From: Kevin Cunningham [mailto:[email protected]]
Sent: Tuesday, October 01, 2013 12:07 AM
To: [email protected]
Subject: No longer allowed to store html in a 'string' type

We have been using Solr for a while now, went from 1.4 -> 3.6. While running some tests in 4.4 we are no longer allowed to store raw HTML in a document's field with a type of ‘string’, which we used to be able to do. Has something changed here? Now we get the following error:

Undeclared general entity "nbsp"
 at [row,col {unknown-source}]: [11,53]

I understand what it's saying and can change the way we store and extract it if it’s a must, but would like to understand what changed. Sounds like something just became stricter about adhering to the rules.

<doc>
  <str name="rawcontent">
    <p>Testing <a href="/sample_group/b/sample_weblog/archive/tags/bananas/default.aspx" class="tag hash-tag" data-tags="bananas">#bananas</a> tag</p>
    <p></p>
    <p>document document document document document document</p><div style="clear:both;"></div>
  </str>
  <str name="type">blog</str>
</doc>
