[ https://issues.apache.org/jira/browse/PHOENIX-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15799459#comment-15799459 ]

Enis Soztutar commented on PHOENIX-2565:
----------------------------------------

From experience trying to use this for billions of rows and hundreds of columns (where the schema is a regular RDBMS one), the array encoding has a few problems when it comes to packing data efficiently:
 - Array encoding uses separators, offsets/lengths, and nullability encoding all at the same time. This means there is a lot of unnecessary overhead for representing redundant information.
 - Run-length-encoding-like null representation gets really expensive if you have data like {{a, <null>, b, <null>, c, <null>}}. A simple bitset is easier and more efficient. Or, if you are already encoding the offsets, you do not have to encode nullability again: if offset_i and offset_i+1 are equal, the field is null.
 - The offsets are fixed-length (2 or 4 bytes) rather than varint-encoded. This makes a difference for the majority of data, where the expected number of columns is < 128.
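To make the size difference concrete, here is a small Java sketch of unsigned varint encoding (the LEB128-style scheme Protocol Buffers uses) applied to a list of offsets. The class name `VarintOffsets` and the sample offsets are made up for illustration. Each byte carries 7 data bits plus a continuation flag, so any value below 128 takes one byte instead of a fixed 2 or 4.

```java
import java.io.ByteArrayOutputStream;

// Sketch of unsigned LEB128-style varints (as in Protocol Buffers):
// 7 data bits per byte, high bit set on every byte except the last.
class VarintOffsets {
    static void writeVarint(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {        // more than 7 bits remain
            out.write((value & 0x7F) | 0x80); // low 7 bits + continuation flag
            value >>>= 7;                     // unsigned shift
        }
        out.write(value);                     // final byte, high bit clear
    }

    // Decodes the varint starting at buf[pos[0]] and advances pos[0] past it.
    static int readVarint(byte[] buf, int[] pos) {
        int result = 0;
        int shift = 0;
        byte b;
        do {
            b = buf[pos[0]++];
            result |= (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return result;
    }

    static byte[] encodeAll(int[] offsets) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int off : offsets) {
            writeVarint(out, off);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        int[] offsets = {1, 9, 17, 40, 100, 127, 300}; // made-up byte offsets
        byte[] packed = encodeAll(offsets);
        System.out.println("varint: " + packed.length + " bytes, fixed 2-byte: "
                + offsets.length * 2 + " bytes");
        int[] pos = {0};
        for (int expected : offsets) {                  // round-trip check
            if (readVarint(packed, pos) != expected) {
                throw new AssertionError("mismatch at " + expected);
            }
        }
    }
}
```

With these sample offsets, `main` prints `varint: 8 bytes, fixed 2-byte: 14 bytes`. The trade-off is that varints cannot be indexed directly: finding the i-th offset means decoding every offset before it, which is presumably why the proposal below pairs varint offsets with caching.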

I think the array encoding is designed this way because arrays can be part of the row key. However, for packing column values we do not need the lexicographically sortable guarantee, which means we can do a much better job than the array encoding. I think the way forward is to leave the array encoding as it is and instead add a PStructDataType that implements the new scheme.

Avro, Protocol Buffers, and Thrift encodings already solve exactly this problem. However, the requirements are a little different for Phoenix:
 - First, we have to figure out how we are going to deal with schema evolution.
 - We need an efficient way to access individual fields within the byte array without deserializing the whole byte[] (although note that it has already been read from disk and is in memory).
 - Nullability support.
Looking at this, I think something like FlatBuffers / Cap'n Proto points in the right direction (especially given the requirement that we do not want to deserialize the whole thing).

If we want to do a custom format with the given encodings, I think we can do 
something like this: 
{code}
<format_id><column_1><column_2>...<column_n> 
<offset_1><offset_2>...<offset_n><offset_start>
{code}
where 
 - {{format_id}}: a single byte identifying the format of the data.
 - {{column_n}}: column data, with NO separators.
 - {{offset_n}}: byte offset of the nth column. It can be a varint if we can cache the decoded offsets. Otherwise, we can make the offsets 1, 2, or 4 bytes wide and encode that width at the tail.
 - {{offset_start}}: the offset of <offset_1>. By reading all of the offsets, the reader can find and cache how many columns are in the encoded data.

Notice that we can only add columns to an existing table, and the schema is still in the catalog table. Columns that are no longer used are always null. To read a column, you find its offset, and its length is {{offset_n+1}} - {{offset_n}} (for the last column, the end is {{offset_start}}). A null column is always encoded as 0 bytes, so {{offset_n+1}} equals {{offset_n}}.
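A minimal Java sketch of the layout above, showing that a reader can locate any column from the tail without decoding the other columns. The class name `PackedRow`, the format id value, absolute big-endian offsets, a fixed 4-byte {{offset_start}}, and a single trailing byte holding the offset width (1, 2, or 4) are all hypothetical choices made for illustration; they are not part of the proposal or of Phoenix.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of the proposed layout:
//   <format_id><column_1>...<column_n><offset_1>...<offset_n><offset_start><width>
// Assumptions: offsets are absolute big-endian positions, each `width` bytes
// (1, 2 or 4); offset_start is a fixed 4-byte int; width is the last byte.
class PackedRow {
    static final int FORMAT_ID = 1; // hypothetical format identifier
    static final int TAIL = 5;      // 4-byte offset_start + 1-byte width

    static byte[] encode(byte[][] columns) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(FORMAT_ID);
        int[] offsets = new int[columns.length];
        for (int i = 0; i < columns.length; i++) {
            offsets[i] = out.size();                          // column i starts here
            if (columns[i] != null) {                         // null => 0 bytes
                out.write(columns[i], 0, columns[i].length);  // no separators
            }
        }
        int offsetStart = out.size();                         // position of <offset_1>
        // Every offset is <= offsetStart, so this width fits all of them.
        int width = offsetStart < 0x100 ? 1 : offsetStart < 0x10000 ? 2 : 4;
        for (int off : offsets) {
            for (int b = width - 1; b >= 0; b--) {
                out.write(off >>> (8 * b));                   // write() keeps the low 8 bits
            }
        }
        out.write(ByteBuffer.allocate(4).putInt(offsetStart).array(), 0, 4);
        out.write(width);
        return out.toByteArray();
    }

    static int width(byte[] row) { return row[row.length - 1]; }

    static int offsetStart(byte[] row) {
        return ByteBuffer.wrap(row, row.length - TAIL, 4).getInt();
    }

    // Column count comes from the tail alone; no column data is touched.
    static int columnCount(byte[] row) {
        return (row.length - TAIL - offsetStart(row)) / width(row);
    }

    static int offsetAt(byte[] row, int i) {
        int w = width(row);
        int pos = offsetStart(row) + i * w;
        int value = 0;
        for (int b = 0; b < w; b++) {
            value = (value << 8) | (row[pos + b] & 0xFF);
        }
        return value;
    }

    // Returns column i, or null when it is encoded as 0 bytes.
    static byte[] column(byte[] row, int i) {
        int n = columnCount(row);
        if (i >= n) {
            return null; // assumption: a column added after this row was written reads as null
        }
        int start = offsetAt(row, i);
        int end = (i + 1 < n) ? offsetAt(row, i + 1) : offsetStart(row);
        return start == end ? null : Arrays.copyOfRange(row, start, end);
    }

    public static void main(String[] args) {
        byte[][] cols = {
            "a".getBytes(StandardCharsets.UTF_8), null,
            "bc".getBytes(StandardCharsets.UTF_8), null
        };
        byte[] row = encode(cols);
        System.out.println(row.length + " bytes, " + columnCount(row) + " columns");
        for (int i = 0; i < 5; i++) {
            byte[] v = column(row, i);
            System.out.println(i + " -> " + (v == null ? "null" : new String(v, StandardCharsets.UTF_8)));
        }
    }
}
```

Running `main` prints `13 bytes, 4 columns` followed by `0 -> a`, `1 -> null`, `2 -> bc`, `3 -> null`, `4 -> null`. Because nullability is derived from equal offsets, no separate null bitset is stored; a consequence of the "null is 0 bytes" rule is that a zero-length non-null value cannot be told apart from null in this layout.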

> Store data for immutable tables in single KeyValue
> --------------------------------------------------
>
>                 Key: PHOENIX-2565
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2565
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: James Taylor
>            Assignee: Thomas D'Silva
>         Attachments: PHOENIX-2565-v2.patch, PHOENIX-2565-wip.patch, 
> PHOENIX-2565.patch
>
>
> Since an immutable table (i.e. declared with IMMUTABLE_ROWS=true) will never 
> update a column value, it'd be more efficient to store all column values for 
> a row in a single KeyValue. We could use the existing format we have for 
> variable length arrays.
> For backward compatibility, we'd need to support the current mechanism. Also, 
> you'd no longer be allowed to transition an existing table to/from being 
> immutable. I think the best approach would be to introduce a new IMMUTABLE 
> keyword and use it like this:
> {code}
> CREATE IMMUTABLE TABLE ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)