[jira] Updated: (HBASE-2323) filter.RegexStringComparator does not work with certain bytes

Benoit Sigoure (JIRA) Sun, 14 Mar 2010 03:11:54 -0700

     [ https://issues.apache.org/jira/browse/HBASE-2323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Benoit Sigoure updated HBASE-2323:
----------------------------------

    Status: Patch Available  (was: Open)

The attached patches preserve backwards compatibility.

Only the last patch could break backwards compatibility, but the extra 
{{if}} statement in {{readFields}} keeps it working even with older clients 
that don't serialize the new charset attribute.

If the patch is included in 0.20.4, then we can remove the extra {{if}} 
statement as that release won't be backwards compatible with older clients 
anyway.

> filter.RegexStringComparator does not work with certain bytes
> -------------------------------------------------------------
>
>                 Key: HBASE-2323
>                 URL: https://issues.apache.org/jira/browse/HBASE-2323
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: filters
>    Affects Versions: 0.20.3
>            Reporter: Benoit Sigoure
>            Assignee: Benoit Sigoure
>         Attachments: 0001-HBASE-2323-Kill-some-trailing-whitespaces.patch, 
> 0002-HBASE-2323-Compile-the-pattern-with-DOTALL.patch, 
> 0003-HBASE-2323-Allow-the-client-to-specify-a-custom-char.patch
>
>
> I'm trying to use {{RegexStringComparator}} in conjunction with 
> {{RowFilter}}.  One of my row keys contained the byte 0x0A, which turns out 
> to be the ASCII code for the newline character (\n).  When the row key is 
> converted to a string in order to use the regexp facility of the Java 
> standard library, it becomes a string containing two lines, and my regexp 
> does not match because {{.}} does not match line terminators by default.
> I believe the solution is to compile the regexp with the {{DOTALL}} flag.  
> Luckily, this flag can be "passed" by the client by prefixing the regexp with 
> {{(?s)}} so people working with an older version of HBase can work around 
> this issue without having to upgrade.
> Second problem: One of my row keys contained the sequence {{0x00 0x00 0x9D}} 
> ({{0x9D}} = -99 when stored in a Java {{byte}}), but in {{compareTo}} the row 
> key is converted to a {{String}} using {{Bytes.toString}}, which just 
> assumes that the byte array is a UTF-8 encoded string.  Java "cleverly" 
> substituted the 0x9D byte with 0x3F (character '?').  In my case, I want to 
> use the ISO-8859-1 encoding, as it preserves every byte when the byte array 
> is converted to a {{String}} and back to a byte array, unlike UTF-8 or ASCII.  
> Should we add a new method to {{RegexStringComparator}} to allow the user to 
> specify their own {{Charset}} instance?
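
Both problems described above can be reproduced outside HBase with plain JDK 
classes.  The following is a minimal sketch, not the code from the attached 
patches: the class name RegexBytesDemo and its two helper methods are 
hypothetical, and it assumes the comparator decodes the row key to a String 
and then calls Matcher.find() on the result.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class; it does not use or mirror any HBase API.
class RegexBytesDemo {

    // Decode the row key with the given charset, then search it with the regex.
    // (Assumption: this is roughly what a regex comparator does internally.)
    static boolean find(String regex, int flags, byte[] rowKey, Charset charset) {
        String decoded = new String(rowKey, charset);
        return Pattern.compile(regex, flags).matcher(decoded).find();
    }

    // True if decoding and re-encoding with the charset gives back the same bytes.
    static boolean roundTrips(byte[] rowKey, Charset charset) {
        byte[] again = new String(rowKey, charset).getBytes(charset);
        return Arrays.equals(rowKey, again);
    }

    public static void main(String[] args) {
        Charset latin1 = StandardCharsets.ISO_8859_1;
        Charset utf8 = StandardCharsets.UTF_8;

        // First problem: a row key containing the byte 0x0A (newline).
        byte[] withNewline = {'a', 0x0A, 'b'};
        System.out.println(find("a.b", 0, withNewline, latin1));              // false
        System.out.println(find("a.b", Pattern.DOTALL, withNewline, latin1)); // true
        System.out.println(find("(?s)a.b", 0, withNewline, latin1));          // true

        // Second problem: a row key containing a byte that is not valid UTF-8.
        byte[] key = {0x00, 0x00, (byte) 0x9D};
        System.out.println(roundTrips(key, utf8));    // false
        System.out.println(roundTrips(key, latin1));  // true

        // With ISO-8859-1 each byte maps to one char, so \x9D can be matched.
        System.out.println(find("\\x00\\x00\\x9D", 0, key, utf8));    // false
        System.out.println(find("\\x00\\x00\\x9D", 0, key, latin1));  // true
    }
}
```

The third call shows the (?s) workaround, which needs no change on the server 
side.  The last two calls show why a caller-supplied charset matters: under 
UTF-8 the 0x9D byte is lost during decoding, so no regexp can match it.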

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
