On 6 Apr 2017, at 14.58, [email protected] wrote:
> 
> Hi,
> 
> I'm trying to resolve a few problems with indexing 'From' headers using 
> FTS/Solr. I ran tcpdump on the communication between Dovecot and Jetty/Solr 
> and noticed that 'From' headers that also include the sender's name are 
> double-escaped. This is what Dovecot was sending to Solr:
> 
> </field><field name="from">Name Surname 
> &amp;lt;[email protected]&amp;gt;</field></doc></add>
> 
> As you can see, the characters < and > were escaped to &lt; and &gt;, which 
> were then escaped again to &amp;lt; and &amp;gt;. This causes problems when 
> trying to index the whole e-mail address, as Solr sees it as 
> '&lt;[email protected]&gt;'.
> 
> I spent hours trying to figure out why I was able to search on all parts of 
> e-mail addresses, while searching for the full, exact e-mail address was 
> successful ONLY for messages that don't include the sender's name in the 
> 'From' header. Finally, after I found this bug, this fixed all search problems:
> 
> <filter class="solr.PatternReplaceFilterFactory" pattern="&amp;lt;" 
> replacement=""/>
> <filter class="solr.PatternReplaceFilterFactory" pattern="&amp;gt;" 
> replacement=""/>
> 
> I hope that at least this bug, which I am reporting here, will be fixed. Thank you.

The attached patch should also help.
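
For anyone who wants to see the effect in isolation, here is a minimal sketch of the double escaping described above. It uses Python's standard xml.sax.saxutils purely for illustration and is not Dovecot's actual code. The address user@example.com and the variable names are placeholders, not values from the report.

```python
from xml.sax.saxutils import escape, unescape

# Placeholder 'From' header value (hypothetical address, not from the report).
header = "Name Surname <user@example.com>"

# Correct behaviour: escape once before placing the value inside <field>.
escaped_once = escape(header)

# The reported behaviour: the already-escaped value is escaped a second time.
escaped_twice = escape(escaped_once)

# An XML parser undoes exactly one level of escaping, so this is roughly
# what the indexer ends up with in each case.
seen_when_escaped_once = unescape(escaped_once)
seen_when_escaped_twice = unescape(escaped_twice)

print(escaped_once)
print(escaped_twice)
print(seen_when_escaped_twice)
```

With a single escape the parser recovers the original header. With a double escape it recovers the literal text &lt; and &gt; around the address, which would explain why an exact-address search fails until those sequences are stripped by the PatternReplaceFilterFactory workaround.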

Attachment: solr.diff
Description: Binary data
