[
https://issues.apache.org/jira/browse/PHOENIX-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15378757#comment-15378757
]
Thomas D'Silva commented on PHOENIX-2565:
-----------------------------------------
[~enis]
I didn't add a design doc because we were planning on enabling this feature
only for immutable tables and using the existing array serialization format, so
the implementation seemed straightforward.
All column values for a given column family are stored in a single KeyValue. A
new StorageScheme (to be added as part of PHOENIX-1598),
COLUMNS_STORED_IN_SINGLE_CELL, is used to denote a table with columns stored in
this format. Existing tables will have a StorageScheme of
NON_ENCODED_COLUMN_NAMES and will work as before. Once a table is stored with
the COLUMNS_STORED_IN_SINGLE_CELL storage scheme, it can no longer be
transitioned to/from being immutable.
The existing serialization format used to store arrays (see PArrayDataType)
will be used to serialize multiple columns into a single byte[]. An
ArrayConstructorExpression will be constructed with the column values as
LiteralExpressions and evaluated to generate the byte array.
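
As a rough illustration of packing several column values into one byte[], here
is a minimal sketch. It does not reproduce the actual PArrayDataType wire
format. The class name SingleCellPacker and the layout (value bytes, then one
int offset per value, then an int count) are assumptions made only for this
example.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.List;

public class SingleCellPacker {
    // Hypothetical layout (NOT Phoenix's real format):
    // [value bytes ...][int start offset per value][int value count]
    public static byte[] pack(List<byte[]> values) {
        int dataLen = 0;
        for (byte[] v : values) {
            dataLen += v.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(dataLen + 4 * values.size() + 4);
        int[] offsets = new int[values.size()];
        int pos = 0;
        for (int i = 0; i < values.size(); i++) {
            offsets[i] = pos;              // remember where value i starts
            buf.put(values.get(i));
            pos += values.get(i).length;
        }
        for (int off : offsets) {
            buf.putInt(off);               // offset table follows the data
        }
        buf.putInt(values.size());         // trailing count
        return buf.array();
    }

    // Reads the value at the given position without decoding the others.
    public static byte[] get(byte[] cell, int index) {
        ByteBuffer buf = ByteBuffer.wrap(cell);
        int count = buf.getInt(cell.length - 4);
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException("index " + index + ", count " + count);
        }
        int offsetsStart = cell.length - 4 - 4 * count;
        int start = buf.getInt(offsetsStart + 4 * index);
        // The last value ends where the offset table begins.
        int end = (index + 1 < count)
                ? buf.getInt(offsetsStart + 4 * (index + 1))
                : offsetsStart;
        return Arrays.copyOfRange(cell, start, end);
    }
}
```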
A new column expression, ArrayColumnExpression, which stores the index at which
the column is stored in the array, will be used instead of
KeyValueColumnExpression. The getEncodedColumnQualifier() method of PColumn (to
be added as part of PHOENIX-1598) will be used for the index.
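
The index-based lookup can be pictured with the sketch below. It is not the
ArrayColumnExpression implementation. ColumnExpression and ArrayColumnSketch
are hypothetical stand-ins, the cell is modeled as an already decoded list of
values, and returning null for an out-of-range index is an assumption of this
example only.

```java
import java.util.List;

public class ArrayColumnLookupSketch {
    /** Hypothetical stand-in for an expression evaluated against one decoded cell. */
    public interface ColumnExpression {
        byte[] evaluate(List<byte[]> cellValues);
    }

    /** Holds only the array position of its column, not a column qualifier. */
    public static final class ArrayColumnSketch implements ColumnExpression {
        private final int index;

        // In the design above, the index would come from the column's
        // encoded column qualifier; here it is just passed in directly.
        public ArrayColumnSketch(int encodedColumnQualifier) {
            this.index = encodedColumnQualifier;
        }

        @Override
        public byte[] evaluate(List<byte[]> cellValues) {
            // Assumption: a position outside the array is treated as "no value".
            if (index < 0 || index >= cellValues.size()) {
                return null;
            }
            return cellValues.get(index);
        }
    }
}
```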
The remaining changes involved handling the new ArrayColumnExpression where
previously we only used a KeyValueColumnExpression (for example in
WhereCompiler.setScanFilter()). Currently, when a column is deleted, we don't
remove the entry from the array, as this would involve rewriting all KeyValues.
We were thinking of investigating whether we could remove the deleted column
values from the array during compaction.
[~jamestaylor] What do you think about allowing users to specify a subset of
columns that are stored together in a single KeyValue?
> Store data for immutable tables in single KeyValue
> --------------------------------------------------
>
> Key: PHOENIX-2565
> URL: https://issues.apache.org/jira/browse/PHOENIX-2565
> Project: Phoenix
> Issue Type: Improvement
> Reporter: James Taylor
> Assignee: Thomas D'Silva
> Fix For: 4.9.0
>
> Attachments: PHOENIX-2565-wip.patch
>
>
> Since an immutable table (i.e. declared with IMMUTABLE_ROWS=true) will never
> update a column value, it'd be more efficient to store all column values for
> a row in a single KeyValue. We could use the existing format we have for
> variable length arrays.
> For backward compatibility, we'd need to support the current mechanism. Also,
> you'd no longer be allowed to transition an existing table to/from being
> immutable. I think the best approach would be to introduce a new IMMUTABLE
> keyword and use it like this:
> {code}
> CREATE IMMUTABLE TABLE ...
> {code}