[
https://issues.apache.org/jira/browse/PHOENIX-1223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
James Taylor updated PHOENIX-1223:
----------------------------------
Description:
Since a VARBINARY can be any length and contain any bytes, we cannot know
where it ends. Thus we only allow it at the end of the row key. With a BINARY,
you're telling Phoenix how big it is, so it can occur anywhere in the PK
constraint.
One way to solve this would be to encode it the same way Orderly encodes a
variable-length blob [1]:
{quote}Each encoded byte thereafter consists of a header bit followed by 7 bits
of payload. A header bit of '1' indicates continuation of the encoding. A
header bit of '0' indicates this byte contains the last of the payload.
{quote}
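The continuation-bit scheme quoted above can be sketched as follows. This is a minimal illustration of the idea, not Orderly's actual implementation; the class and method names are made up:

```java
import java.io.ByteArrayOutputStream;

public class BlobEncoding {

    // Pack the input bits seven at a time into output bytes. Each output
    // byte's high bit is the header: 1 means more bytes follow, 0 marks the
    // byte carrying the last of the payload, so the decoder can find the end.
    static byte[] encode(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int bitLen = raw.length * 8;
        for (int bit = 0; bit < bitLen; bit += 7) {
            int payload = 0;
            for (int i = 0; i < 7 && bit + i < bitLen; i++) {
                int b = raw[(bit + i) / 8] & 0xFF;
                payload = (payload << 1) | ((b >> (7 - ((bit + i) % 8))) & 1);
            }
            boolean last = bit + 7 >= bitLen;
            out.write((last ? 0x00 : 0x80) | payload);
        }
        return out.toByteArray();
    }

    // The encoded form is self-delimiting: scan until the first byte whose
    // header bit is 0 to find where the blob ends.
    static int encodedLength(byte[] buf, int offset) {
        int i = offset;
        while ((buf[i] & 0x80) != 0) i++;
        return i - offset + 1;
    }
}
```

Since every non-final byte has its high bit set, an embedded 0x00 in the raw input never produces a premature terminator, which is exactly the property the VARBINARY row-key encoding needs.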
When encoding arrays of byte[]s, Phoenix doesn't correctly handle the null
byte (0x00). Phoenix treats it as the terminating character for the element,
but a call like org.apache.hadoop.hbase.util.Bytes.toBytes(int) creates a
byte[4] and fills it big-endian, from the right to the left (so 1 is encoded
as [0,0,0,1]); Phoenix then sees the leading 0-byte as the element terminator
and just returns a null element.
Instead, arrays of byte[]s need to include a length (probably as a prefix) so
the decoder knows how many bytes to read. That's a bigger overhead than any
other encoding type, but it may be the price of supporting anything-goes byte
arrays.
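A self-contained sketch of the failure mode and the length-prefix alternative, using java.nio.ByteBuffer in place of HBase's Bytes utility; the helper names here are hypothetical, not Phoenix APIs:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class ArrayElementEncoding {

    // Big-endian int serialization, the same layout as HBase's
    // Bytes.toBytes(int): the value 1 becomes [0, 0, 0, 1].
    static byte[] intToBytes(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    // The null-terminated element scan described above: it stops at the
    // first 0x00, so [0, 0, 0, 1] reads back as an empty element.
    static byte[] readNullTerminated(byte[] buf) {
        int end = 0;
        while (end < buf.length && buf[end] != 0) end++;
        return Arrays.copyOf(buf, end);
    }

    // Length-prefixed alternative: a 4-byte length header followed by the
    // payload, so embedded 0x00 bytes survive intact.
    static byte[] writeLengthPrefixed(byte[] element) {
        return ByteBuffer.allocate(4 + element.length)
                .putInt(element.length).put(element).array();
    }

    static byte[] readLengthPrefixed(byte[] buf) {
        ByteBuffer bb = ByteBuffer.wrap(buf);
        byte[] element = new byte[bb.getInt()];
        bb.get(element);
        return element;
    }
}
```

With the null-terminated scan, the serialized int 1 is lost entirely, while the length-prefixed round trip preserves it at the cost of four extra bytes per element.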
was:
When encoding arrays of byte[]s, Phoenix doesn't correctly handle the null
byte (0x00). Phoenix treats it as the terminating character for the element,
but a call like org.apache.hadoop.hbase.util.Bytes.toBytes(int) creates a
byte[4] and fills it big-endian, from the right to the left (so 1 is encoded
as [0,0,0,1]); Phoenix then sees the leading 0-byte as the element terminator
and just returns a null element.
Instead, arrays of byte[]s need to include a length (probably as a prefix) so
the decoder knows how many bytes to read. That's a bigger overhead than any
other encoding type, but it may be the price of supporting anything-goes byte
arrays.
> Support VARBINARY declaration anywhere in the PK constraint
> -----------------------------------------------------------
>
> Key: PHOENIX-1223
> URL: https://issues.apache.org/jira/browse/PHOENIX-1223
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.1.0
> Reporter: Jesse Yates
>
> Since a VARBINARY can be any length and contain any bytes, we cannot know
> where it ends. Thus we only allow it at the end of the row key. With a
> BINARY, you're telling Phoenix how big it is, so it can occur anywhere in
> the PK constraint.
> One way to solve this would be to encode it the same way Orderly encodes a
> variable-length blob [1]:
> {quote}Each encoded byte thereafter consists of a header bit followed by 7
> bits of payload. A header bit of '1' indicates continuation of the encoding.
> A header bit of '0' indicates this byte contains the last of the payload.
> {quote}
> When encoding arrays of byte[]s, Phoenix doesn't correctly handle the null
> byte (0x00). Phoenix treats it as the terminating character for the
> element, but a call like org.apache.hadoop.hbase.util.Bytes.toBytes(int)
> creates a byte[4] and fills it big-endian, from the right to the left (so 1
> is encoded as [0,0,0,1]); Phoenix then sees the leading 0-byte as the
> element terminator and just returns a null element.
> Instead, arrays of byte[]s need to include a length (probably as a prefix)
> so the decoder knows how many bytes to read. That's a bigger overhead than
> any other encoding type, but it may be the price of supporting anything-goes
> byte arrays.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)