[ 
https://issues.apache.org/jira/browse/PHOENIX-1223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Taylor updated PHOENIX-1223:
----------------------------------
    Description: 
Since a VARBINARY can be any length and contain any byte values, we cannot know 
where it ends. Thus we only allow it at the end of the row key. With a BINARY, 
you're telling Phoenix how big it is, so it can occur anywhere in the PK 
constraint.

One way to solve this would be to encode it the same way Orderly encodes a 
variable-length blob [1]: 
{quote}Each encoded byte thereafter consists of a header bit followed by 7 bits 
of payload. A header bit of '1' indicates continuation of the encoding. A 
header bit of '0' indicates this byte contains the last of the payload.
{quote}
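To make the quoted scheme concrete, here is a minimal sketch of the header-bit mechanism (not Orderly's or Phoenix's actual code; the class and method names are made up), applied to a non-negative long for simplicity: each output byte carries 7 payload bits, and the high bit is 1 on every byte except the last.

```java
import java.io.ByteArrayOutputStream;

public class VarIntSketch {
    // Encode a non-negative long as 7-bit groups, most significant first.
    // Header bit 1 = "more bytes follow", header bit 0 = "last byte", so the
    // encoding is self-delimiting without any separator byte.
    static byte[] encode(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int bits = 64 - Long.numberOfLeadingZeros(value);
        int groups = Math.max(1, (bits + 6) / 7);
        for (int i = groups - 1; i >= 0; i--) {
            int payload = (int) ((value >>> (7 * i)) & 0x7F);
            int header = (i > 0) ? 0x80 : 0x00; // continuation flag
            out.write(header | payload);
        }
        return out.toByteArray();
    }

    static long decode(byte[] encoded) {
        long value = 0;
        for (byte b : encoded) {
            value = (value << 7) | (b & 0x7F);
            if ((b & 0x80) == 0) break; // header bit 0: this was the last byte
        }
        return value;
    }
}
```

Because the length is carried by the header bits rather than a terminator, a decoder always knows where the value ends, which is exactly the property a mid-row-key VARBINARY needs.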

When encoding arrays of byte[]s, Phoenix doesn't correctly handle the null byte 
(0x00). Phoenix treats it as the terminating character for an element, but 
something like org.apache.hadoop.hbase.util.Bytes.toBytes(int) creates a 
byte[4] and writes the value right-aligned, big-endian (so 1 is converted to 
[0,0,0,1]); Phoenix then sees the leading 0-byte as the terminator of the 
element and just returns a null element.
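The clash can be shown without HBase on the classpath: a 4-byte big-endian int (which is what Bytes.toBytes(int) produces) starts with zero bytes, so a reader that scans for 0x00 as the element separator stops immediately. This is a sketch of the failure mode; readElement is a hypothetical stand-in for the separator-based decoding described above, not Phoenix's actual code.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class NullByteClash {
    // Stand-in for org.apache.hadoop.hbase.util.Bytes.toBytes(int):
    // 4-byte big-endian serialization.
    static byte[] intToBytes(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    // Hypothetical separator-based element reader: the element ends at the
    // first 0x00 byte, as in the behavior described above.
    static byte[] readElement(byte[] encoded) {
        int end = 0;
        while (end < encoded.length && encoded[end] != 0) {
            end++;
        }
        return Arrays.copyOfRange(encoded, 0, end);
    }
}
```

For the int 1, intToBytes yields [0,0,0,1], and readElement returns a zero-length element: the payload is silently lost.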

Instead, arrays of byte[]s need to include a length (probably as a prefix) so 
the decoder knows how many bytes to read. It's a bigger overhead than any other 
encoding type, but that may be the price of supporting anything-goes byte 
arrays. 
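A sketch of that length-prefixed alternative (illustrative only, not Phoenix's actual array serialization): each element is written as a 4-byte big-endian length followed by its raw bytes, so embedded 0x00 payload bytes are harmless.

```java
import java.nio.ByteBuffer;

public class LengthPrefixed {
    // Serialize each element as: 4-byte length prefix, then raw payload.
    static byte[] writeElements(byte[][] elements) {
        int size = 0;
        for (byte[] e : elements) size += 4 + e.length;
        ByteBuffer buf = ByteBuffer.allocate(size);
        for (byte[] e : elements) {
            buf.putInt(e.length); // length prefix
            buf.put(e);           // raw payload, zero bytes included
        }
        return buf.array();
    }

    // Read back `count` elements; each length prefix says exactly how many
    // payload bytes to consume, so no terminator byte is needed.
    static byte[][] readElements(byte[] encoded, int count) {
        ByteBuffer buf = ByteBuffer.wrap(encoded);
        byte[][] out = new byte[count][];
        for (int i = 0; i < count; i++) {
            byte[] e = new byte[buf.getInt()];
            buf.get(e);
            out[i] = e;
        }
        return out;
    }
}
```

The cost is 4 bytes per element (a varint prefix would shrink that), but a round trip preserves elements like [0,0,0,1] exactly.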

[1] 
https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/util/OrderedBytes.html#encodeBlobVar(org.apache.hadoop.hbase.util.PositionedByteRange,%20byte[],%20int,%20int,%20org.apache.hadoop.hbase.util.Order)



> Support VARBINARY declaration anywhere in the PK constraint
> -----------------------------------------------------------
>
>                 Key: PHOENIX-1223
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1223
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.1.0
>            Reporter: Jesse Yates
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)