[ https://issues.apache.org/jira/browse/PHOENIX-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16336280#comment-16336280 ]
Thomas D'Silva commented on PHOENIX-4550: ----------------------------------------- [~jamestaylor] Another way to deal with the sparsness of view data is to use a mapping from column qualifier to array index. We could build this map when we create the PTable of the view. This would allow us to have a compact array. For example if the col qualifiers of the view cols are: base table cols : 1-2 view cols : 100-102, 500 We could have a map from col qualifier to array index 1->1, 2->2, 100->3,101->4,102->5, 500->6 We would have to have to handle columns that are deleted by flagging them (we already do this by using the LinkType.EXCLUDED_COLUMN for splittable system catalog in PHOENIX-3534), so that we build the map in the PTable correctly. We can actually drop the view column metadata when we rewrite the single cell without the deleted column value during compaction. > Allow declaration of max columns on base physical table > ------------------------------------------------------- > > Key: PHOENIX-4550 > URL: https://issues.apache.org/jira/browse/PHOENIX-4550 > Project: Phoenix > Issue Type: Improvement > Reporter: James Taylor > Priority: Major > > By declaring the max number of columns on a base table, we can optimize the > storage for SINGLE_CELL_ARRAY_WITH_OFFSETS by not storing null values for the > columns preceding the initial column of a view. This will make a huge > difference in storage when you have a base table with many views. For example: > {code} > -- Declare that the base table will have no more than 10 columns > CREATE IMMUTABLE TABLE base (k1 VARCHAR, prefix CHAR(3) v1 DATE, > CONSTRAINT pk PRIMARY KEY (k1, prefix)) > MULTI_TENANT = true, > MAX_COLUMNS = 10; > CREATE VIEW v1(k2 VARCHAR PRIMARY KEY, v2 VARCHAR, v3 VARCHAR) > AS SELECT * FROM base WHERE prefix = 'A00'; > CREATE VIEW v2(k2 VARCHAR PRIMARY KEY, v2 VARCHAR, v3 VARCHAR); > AS SELECT * FROM base WHERE prefix = 'A10'; > ... > {code} > As the number of views grow, the difference between the base table column > encoding (column #1) and the starting column number of the view (since the > starting offset is determined by an incrementing value on the base table) > will increase. This bloats the storage as we need to store null values for > column encodings between the base table column and the starting column of the > view. > Instead, we'll pass through the MAX_COLUMNS value for queries and anything > column encoding less than this we know it'll be at the start. Anything > greater and we'll start the search from <column encoding> - <minimum view > column encoding>. -- This message was sent by Atlassian JIRA (v7.6.3#76005)