[ 
https://issues.apache.org/jira/browse/PHOENIX-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15378757#comment-15378757
 ] 

Thomas D'Silva commented on PHOENIX-2565:
-----------------------------------------

[~enis]

I didn't add a design doc because we were planning on enabling this feature 
only for immutable tables and using the existing array serialization format, so 
the implementation seemed straightforward. 

All column values for a given column family are stored in a single KeyValue. A 
new StorageScheme value (to be added as part of PHOENIX-1598), 
COLUMNS_STORED_IN_SINGLE_CELL, denotes a table whose columns are stored in 
this format. Existing tables will have a StorageScheme of 
NON_ENCODED_COLUMN_NAMES and will work as before. Once a table is created with 
the COLUMNS_STORED_IN_SINGLE_CELL storage scheme, you cannot transition it 
to/from being immutable.
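
As a rough illustration of the rule above, here is a minimal sketch of how 
the storage scheme could gate a change of immutability. It is not code from 
the patch: the class StorageSchemeSketch and the method checkAlter() are 
hypothetical, and the enum constant NON_ENCODED_COLUMN_NAMES is my assumed 
spelling of the scheme used by existing tables.

```java
// Hypothetical sketch; not the actual Phoenix StorageScheme from PHOENIX-1598.
final class StorageSchemeSketch {

    enum StorageScheme {
        NON_ENCODED_COLUMN_NAMES,      // existing tables: one KeyValue per column (assumed name)
        COLUMNS_STORED_IN_SINGLE_CELL  // all columns of a column family packed into one KeyValue
    }

    // Reject a change of the IMMUTABLE_ROWS setting for single-cell tables.
    // Altering other properties (immutability unchanged) is always allowed here.
    static void checkAlter(StorageScheme scheme, boolean wasImmutable, boolean willBeImmutable) {
        boolean immutabilityChanges = wasImmutable != willBeImmutable;
        if (immutabilityChanges && scheme == StorageScheme.COLUMNS_STORED_IN_SINGLE_CELL) {
            throw new IllegalStateException(
                "Cannot transition a " + scheme + " table to/from being immutable");
        }
    }
}
```

With this sketch, checkAlter(COLUMNS_STORED_IN_SINGLE_CELL, true, false) 
throws, while the same call with NON_ENCODED_COLUMN_NAMES is accepted.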

The existing serialization format used to store arrays (see PArrayDataType) 
will be used to serialize multiple columns into a single byte[]. An 
ArrayConstructor Expression will be constructed with the column values as 
LiteralExpressions and evaluated to generate the byte array.
A new column expression, ArrayColumnExpression, that stores the index at which 
the column is stored in the array will be used instead of 
KeyValueColumnExpression. The getEncodedColumnQualifier() method of PColumn 
(to be added as part of PHOENIX-1598) will be used for the index. 
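
To make the idea concrete, here is a small self-contained sketch of packing 
several column values into one byte[] and reading one back by index. It 
deliberately does not reproduce the PArrayDataType format; the layout (value 
bytes, then one end offset per value, then the value count) and the names 
SingleCellSketch, pack() and read() are assumptions made for illustration. 
The index passed to read() plays the role that the encoded column qualifier 
would play for ArrayColumnExpression.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; the layout is simplified and is NOT Phoenix's array format.
// Layout: [value bytes ...][end offset of each value, 4 bytes each][value count, 4 bytes]
final class SingleCellSketch {

    // Serialize all column values of a row into a single byte array.
    // An absent (null) column is represented here by an empty byte[].
    static byte[] pack(List<byte[]> values) {
        int dataLength = 0;
        for (byte[] value : values) {
            dataLength += value.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(dataLength + 4 * values.size() + 4);
        for (byte[] value : values) {
            buf.put(value);
        }
        int end = 0;
        for (byte[] value : values) {
            end += value.length;
            buf.putInt(end);          // offset table entry: where this value ends
        }
        buf.putInt(values.size());    // trailer: number of values
        return buf.array();
    }

    // Read the value stored at the given index without decoding the others.
    static byte[] read(byte[] cell, int index) {
        ByteBuffer buf = ByteBuffer.wrap(cell);
        int count = buf.getInt(cell.length - 4);
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException("index " + index + ", count " + count);
        }
        int tableStart = cell.length - 4 - 4 * count;
        int start = (index == 0) ? 0 : buf.getInt(tableStart + 4 * (index - 1));
        int end = buf.getInt(tableStart + 4 * index);
        return Arrays.copyOfRange(cell, start, end);
    }
}
```

Because the offsets sit at the end of the cell, a reader can locate a single 
column with two integer reads rather than scanning every preceding value.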

The remaining changes involved handling the new ArrayColumnExpression where 
previously we only used a KeyValueColumnExpression (for example in 
WhereCompiler.setScanFilter()). Currently, when a column is deleted, we don't 
remove its entry from the array, as this would involve rewriting all KeyValues. 
We are considering whether the deleted column values could instead be removed 
from the array during compaction.

[~jamestaylor] what do you think about allowing users to specify a subset of 
columns that are stored together in a single KeyValue?

 


> Store data for immutable tables in single KeyValue
> --------------------------------------------------
>
>                 Key: PHOENIX-2565
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2565
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: James Taylor
>            Assignee: Thomas D'Silva
>             Fix For: 4.9.0
>
>         Attachments: PHOENIX-2565-wip.patch
>
>
> Since an immutable table (i.e. declared with IMMUTABLE_ROWS=true) will never 
> update a column value, it'd be more efficient to store all column values for 
> a row in a single KeyValue. We could use the existing format we have for 
> variable length arrays.
> For backward compatibility, we'd need to support the current mechanism. Also, 
> you'd no longer be allowed to transition an existing table to/from being 
> immutable. I think the best approach would be to introduce a new IMMUTABLE 
> keyword and use it like this:
> {code}
> CREATE IMMUTABLE TABLE ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
