[ https://issues.apache.org/jira/browse/ACCUMULO-209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13165315#comment-13165315 ]
Jim Klucar commented on ACCUMULO-209:
-------------------------------------
You're welcome. Is it OK to just patch the existing RegExFilterTest.java, or
should I create a new test class?
> RegExFilter does not properly regex when using multi-byte characters
> --------------------------------------------------------------------
>
> Key: ACCUMULO-209
> URL: https://issues.apache.org/jira/browse/ACCUMULO-209
> Project: Accumulo
> Issue Type: Bug
> Components: client
> Affects Versions: 1.3.5
> Reporter: Jim Klucar
> Assignee: Jim Klucar
> Fix For: 1.4.0, 1.5.0
>
> Attachments: accumulo-209.patch
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> The current RegExFilter class wraps the data to match against in a
> ByteArrayBackedCharSequence. That class contains a line of code that
> prevents the matcher from correctly matching multi-byte characters.
> Line 49 of ByteArrayBackedCharSequence.java is:
> return (char) (0xff & data[offset + index]);
>
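> This casts each individual byte of the array to a char, which in Java is a
> 16-bit UTF-16 code unit. A character encoded as several bytes (for example
> in UTF-8) therefore reaches the matcher as several unrelated chars, so the
> RegExFilter cannot correctly apply regular expressions to values that
> contain multi-byte characters.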
> A patch for the RegExFilter.java file has been created and will be submitted.
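A minimal, self-contained sketch of the failure mode described above. PerByteCharSequence is a hypothetical stand-in that mimics the quoted line 49; it is not Accumulo's actual class. Decoding the value to a String as UTF-8 is shown as one way to get correct matching, assuming the stored values are UTF-8 encoded; it is not necessarily what accumulo-209.patch does.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

public class MultiByteRegexDemo {

    // Hypothetical stand-in for ByteArrayBackedCharSequence: exposes each byte
    // as one char, mirroring "return (char) (0xff & data[offset + index]);"
    static final class PerByteCharSequence implements CharSequence {
        private final byte[] data;
        private final int offset;
        private final int len;

        PerByteCharSequence(byte[] data, int offset, int len) {
            this.data = data;
            this.offset = offset;
            this.len = len;
        }

        @Override
        public char charAt(int index) {
            return (char) (0xff & data[offset + index]);
        }

        @Override
        public int length() {
            return len;
        }

        @Override
        public CharSequence subSequence(int start, int end) {
            return new PerByteCharSequence(data, offset + start, end - start);
        }

        @Override
        public String toString() {
            StringBuilder sb = new StringBuilder(len);
            for (int i = 0; i < len; i++) {
                sb.append(charAt(i));
            }
            return sb.toString();
        }
    }

    // Matches the regex against the raw bytes, one char per byte (the buggy view).
    static boolean perByteMatches(byte[] value, String regex) {
        return Pattern.compile(regex)
                .matcher(new PerByteCharSequence(value, 0, value.length))
                .matches();
    }

    // Matches the regex against the value decoded as UTF-8 (assumed encoding).
    static boolean decodedMatches(byte[] value, String regex) {
        return Pattern.compile(regex)
                .matcher(new String(value, StandardCharsets.UTF_8))
                .matches();
    }

    public static void main(String[] args) {
        // "caf\u00e9" is 4 characters but 5 bytes in UTF-8 (the e-acute takes 2 bytes).
        byte[] value = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        // Per-byte view has 5 chars, so "caf." cannot match the whole value.
        System.out.println("per-byte: " + perByteMatches(value, "caf."));
        // Decoded view has 4 chars, so '.' matches the e-acute.
        System.out.println("decoded:  " + decodedMatches(value, "caf."));
    }
}
```

Running main prints "per-byte: false" followed by "decoded:  true". Decoding costs an extra copy per value, which is the trade-off of this approach compared with a CharSequence view over the bytes.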