[ 
https://issues.apache.org/jira/browse/SOLR-13242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766812#comment-16766812
 ] 

Edwin Yeo Zheng Lin edited comment on SOLR-13242 at 2/13/19 8:54 AM:
---------------------------------------------------------------------

We ran the same replacement in a standalone Java program using the code 
below, and the output contains the correct number of <br>.

 

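    public static void main(String[] args) {
        // Same whitespace as "Original content" in Example 2 of the issue description
        String str = "exalted  \n \n\n   Psalm 89:17   \n\n   \n\n  3 Choa Chu Kang";
        // Collapse every run of two or more newlines (with trailing whitespace) into two <br>
        str = str.replaceAll("(\\n\\s*){2,}", "<br><br>");
        System.out.println("str = " + str);
    }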

 

 

*Output in Java (each run of \n is replaced by exactly 2 <br>, which is correct):*

str = exalted  <br><br>Psalm 89:17   <br><br>3 Choa Chu Kang

 

*Output from the Solr index (the second run of \n produces 4 <br>):*

str = exalted  <br><br>Psalm 89:17   <br><br>  <br><br>3 Choa Chu Kang 
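
For anyone who wants to reproduce the comparison outside Solr, here is a minimal sketch that runs both patterns from the issue description against the Example 2 content. The class name BreakCollapseCheck and the helper collapse are made up for illustration. It only exercises java.util.regex and does not go through RegexReplaceProcessorFactory, so it shows what plain Java produces, not what Solr does internally.

```java
import java.util.regex.Pattern;

public class BreakCollapseCheck {
    // Hypothetical helper: replace every match of `regex` in `input` with two <br>.
    static String collapse(String regex, String input) {
        return Pattern.compile(regex).matcher(input).replaceAll("<br><br>");
    }

    public static void main(String[] args) {
        // "Original content" from Example 2 in the issue description
        String example2 =
            "exalted  \n \n\n   Psalm 89:17   \n\n   \n\n  3 Choa Chu Kang Avenue 4, Singapore";

        // Newline first, then trailing whitespace: spaces before each run are kept
        System.out.println(collapse("(\\n\\s*){2,}", example2));
        // prints: exalted  <br><br>Psalm 89:17   <br><br>3 Choa Chu Kang Avenue 4, Singapore

        // Leading whitespace first, then newline: spaces after each run are kept
        System.out.println(collapse("(\\s*\\n){2,}", example2));
        // prints: exalted<br><br>   Psalm 89:17<br><br>  3 Choa Chu Kang Avenue 4, Singapore
    }
}
```

With either pattern, plain Java yields exactly two <br> per run of \n; the two patterns differ only in which side of the run keeps its spaces.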


> RegexReplaceProcessorFactory not making accurate replacement
> ------------------------------------------------------------
>
>                 Key: SOLR-13242
>                 URL: https://issues.apache.org/jira/browse/SOLR-13242
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>    Affects Versions: 7.6
>            Reporter: Edwin Yeo Zheng Lin
>            Priority: Major
>              Labels: regex, solr
>
> We are using the RegexReplaceProcessorFactory with the following configuration:
>  
>  <processor class="solr.RegexReplaceProcessorFactory">
>    <str name="fieldName">content</str>
>    <str name="pattern">(\s*\n){2,}</str>
>    <str name="replacement">&lt;br&gt;&lt;br&gt;</str>
>  </processor>
>  
> The regex patterns (\s*\n){2,} and (\n\s*){2,} both work as expected on 
> [regex101.com|http://regex101.com/]: every run of \n is replaced by exactly 
> two <br>.
> However, in Solr there are cases (Examples 2 and 3 below) that produce four 
> <br> in a row. This should not happen, because the pattern replaces each run 
> with two <br> regardless of how many \n it contains.
>  
>  
> Example 1: A sentence where the above regex pattern works correctly 
> *Original content in EML file:*  
> Dear Sir, 
>  
> I am terminating 
> *Original content:*    Dear Sir,  \n\n \n \n\n I am terminating
> *Index content:*     Dear Sir,  <br><br>I am terminating 
>  
> Example 2: A sentence where the above regex pattern only partially works 
> (instead of 2 <br>, there are 4 <br>)
> *Original content in EML file:*    
> _exalted_
> _Psalm 89:17_
>  
> 3 Choa Chu Kang Avenue 4    
> *Original content:* exalted  \n \n\n   Psalm 89:17   \n\n   \n\n  3 Choa Chu 
> Kang Avenue 4, Singapore
> *Index content:* exalted  <br><br>Psalm 89:17   <br><br>  <br><br>3 Choa Chu 
> Kang Avenue 4, Singapore
>  
> Example 3: A sentence where the above regex pattern only partially works 
> (instead of 2 <br>, there are 4 <br>)
> *Original content in EML file:*    
> [http://www.concordpri.moe.edu.sg/]
>  
>  
>  
>  
> On Tue, Dec 18, 2018 at 10:07 AM    
> *Original content:* [http://www.concordpri.moe.edu.sg/]   \n\n   \n\n \n \n\n 
> \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n \n\n\n \n\n\n  On Tue, Dec 18, 2018 
> at 10:07 AM 
> *Index content:* [http://www.concordpri.moe.edu.sg/]   <br><br>  <br><br>On 
> Tue, Dec 18, 2018 at 10:07 AM
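
To check a larger set of indexed values for the symptom shown in Examples 2 and 3, something like the following sketch could be used. It flags any value in which two <br><br> replacements are separated only by whitespace. The class name RepeatedBreakDetector and the method hasAdjacentReplacements are hypothetical, and the sample strings are the "Index content" values quoted above; how the values are fetched from Solr is left out.

```java
import java.util.regex.Pattern;

public class RepeatedBreakDetector {
    // Two "<br><br>" replacements with nothing but optional whitespace between them
    private static final Pattern ADJACENT = Pattern.compile("<br><br>\\s*<br><br>");

    // Hypothetical helper: true if the indexed value shows the four-<br> symptom.
    static boolean hasAdjacentReplacements(String indexed) {
        return ADJACENT.matcher(indexed).find();
    }

    public static void main(String[] args) {
        // Index content from Example 1 (correct)
        String ok = "Dear Sir,  <br><br>I am terminating";
        // Index content from Example 2 (four <br> for the second run of \n)
        String bad = "exalted  <br><br>Psalm 89:17   <br><br>  <br><br>3 Choa Chu Kang Avenue 4, Singapore";

        System.out.println(hasAdjacentReplacements(ok));  // prints: false
        System.out.println(hasAdjacentReplacements(bad)); // prints: true
    }
}
```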


