[
https://issues.apache.org/jira/browse/PHOENIX-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16336280#comment-16336280
]
Thomas D'Silva commented on PHOENIX-4550:
-----------------------------------------
[~jamestaylor]
Another way to deal with the sparsness of view data is to use a mapping from
column qualifier to array index. We could build this map when we create the
PTable of the view. This would allow us to have a compact array. For example if
the col qualifiers of the view cols are:
base table cols : 1-2
view cols : 100-102, 500
We could have a map from col qualifier to array index
1->1, 2->2, 100->3,101->4,102->5, 500->6
We would have to have to handle columns that are deleted by flagging them (we
already do this by using the LinkType.EXCLUDED_COLUMN for splittable system
catalog in PHOENIX-3534), so that we build the map in the PTable correctly. We
can actually drop the view column metadata when we rewrite the single cell
without the deleted column value during compaction.
> Allow declaration of max columns on base physical table
> -------------------------------------------------------
>
> Key: PHOENIX-4550
> URL: https://issues.apache.org/jira/browse/PHOENIX-4550
> Project: Phoenix
> Issue Type: Improvement
> Reporter: James Taylor
> Priority: Major
>
> By declaring the max number of columns on a base table, we can optimize the
> storage for SINGLE_CELL_ARRAY_WITH_OFFSETS by not storing null values for the
> columns preceding the initial column of a view. This will make a huge
> difference in storage when you have a base table with many views. For example:
> {code}
> -- Declare that the base table will have no more than 10 columns
> CREATE IMMUTABLE TABLE base (k1 VARCHAR, prefix CHAR(3) v1 DATE,
> CONSTRAINT pk PRIMARY KEY (k1, prefix))
> MULTI_TENANT = true,
> MAX_COLUMNS = 10;
> CREATE VIEW v1(k2 VARCHAR PRIMARY KEY, v2 VARCHAR, v3 VARCHAR)
> AS SELECT * FROM base WHERE prefix = 'A00';
> CREATE VIEW v2(k2 VARCHAR PRIMARY KEY, v2 VARCHAR, v3 VARCHAR);
> AS SELECT * FROM base WHERE prefix = 'A10';
> ...
> {code}
> As the number of views grow, the difference between the base table column
> encoding (column #1) and the starting column number of the view (since the
> starting offset is determined by an incrementing value on the base table)
> will increase. This bloats the storage as we need to store null values for
> column encodings between the base table column and the starting column of the
> view.
> Instead, we'll pass through the MAX_COLUMNS value for queries and anything
> column encoding less than this we know it'll be at the start. Anything
> greater and we'll start the search from <column encoding> - <minimum view
> column encoding>.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)