[jira] [Updated] (PHOENIX-4550) Declare maximum columns to ensure storage is dense when table has many views

James Taylor (JIRA) Sun, 11 Feb 2018 09:35:32 -0800

     [ 
https://issues.apache.org/jira/browse/PHOENIX-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


James Taylor updated PHOENIX-4550:
----------------------------------
    Description: 
By declaring the max number of columns on a base table, we can optimize the 
storage for SINGLE_CELL_ARRAY_WITH_OFFSETS by not storing null values for the 
columns preceding the initial column of a view. This will make a huge 
difference in storage when you have a base table with many views. For example:

{code}
-- Declare that the base table will have no more than 10 columns
CREATE IMMUTABLE TABLE base (k1 VARCHAR, prefix CHAR(3) v1 DATE,
    CONSTRAINT pk PRIMARY KEY (k1, prefix))
    MULTI_TENANT = true,
    MAX_COLUMNS = 10;
CREATE VIEW v1(k2 VARCHAR PRIMARY KEY, v2 VARCHAR, v3 VARCHAR)
    AS SELECT * FROM base WHERE prefix = 'A00';
CREATE VIEW v2(k2 VARCHAR PRIMARY KEY, v2 VARCHAR, v3 VARCHAR);
    AS SELECT * FROM base WHERE prefix = 'A10';
...
{code}

As the number of views grow, the difference between the base table column 
encoding (column #1) and the starting column number of the view (since the 
starting offset is determined by an incrementing value on the base table) will 
increase. This bloats the storage as we need to store null values for column 
encodings between the base table column and the starting column of the view.

Instead, we'll pass through the MAX_COLUMNS value for queries and anything 
column encoding less than this we know it'll be at the start. Anything greater 
and we'll start the search from <column encoding> - <minimum view column 
encoding>.

The downside of this approach is if you run out of columns in the base table, 
you're stuck. A more flexible, but more difficult approach is outlined in 
PHOENIX-4596.

  was:
By declaring the max number of columns on a base table, we can optimize the 
storage for SINGLE_CELL_ARRAY_WITH_OFFSETS by not storing null values for the 
columns preceding the initial column of a view. This will make a huge 
difference in storage when you have a base table with many views. For example:

{code}
-- Declare that the base table will have no more than 10 columns
CREATE IMMUTABLE TABLE base (k1 VARCHAR, prefix CHAR(3) v1 DATE,
    CONSTRAINT pk PRIMARY KEY (k1, prefix))
    MULTI_TENANT = true,
    MAX_COLUMNS = 10;
CREATE VIEW v1(k2 VARCHAR PRIMARY KEY, v2 VARCHAR, v3 VARCHAR)
    AS SELECT * FROM base WHERE prefix = 'A00';
CREATE VIEW v2(k2 VARCHAR PRIMARY KEY, v2 VARCHAR, v3 VARCHAR);
    AS SELECT * FROM base WHERE prefix = 'A10';
...
{code}

As the number of views grow, the difference between the base table column 
encoding (column #1) and the starting column number of the view (since the 
starting offset is determined by an incrementing value on the base table) will 
increase. This bloats the storage as we need to store null values for column 
encodings between the base table column and the starting column of the view.

Instead, we'll pass through the MAX_COLUMNS value for queries and anything 
column encoding less than this we know it'll be at the start. Anything greater 
and we'll start the search from <column encoding> - <minimum view column 
encoding>.


> Declare maximum columns to ensure storage is dense when table has many views
> ----------------------------------------------------------------------------
>
>                 Key: PHOENIX-4550
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4550
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: James Taylor
>            Priority: Major
>
> By declaring the max number of columns on a base table, we can optimize the 
> storage for SINGLE_CELL_ARRAY_WITH_OFFSETS by not storing null values for the 
> columns preceding the initial column of a view. This will make a huge 
> difference in storage when you have a base table with many views. For example:
> {code}
> -- Declare that the base table will have no more than 10 columns
> CREATE IMMUTABLE TABLE base (k1 VARCHAR, prefix CHAR(3) v1 DATE,
>     CONSTRAINT pk PRIMARY KEY (k1, prefix))
>     MULTI_TENANT = true,
>     MAX_COLUMNS = 10;
> CREATE VIEW v1(k2 VARCHAR PRIMARY KEY, v2 VARCHAR, v3 VARCHAR)
>     AS SELECT * FROM base WHERE prefix = 'A00';
> CREATE VIEW v2(k2 VARCHAR PRIMARY KEY, v2 VARCHAR, v3 VARCHAR);
>     AS SELECT * FROM base WHERE prefix = 'A10';
> ...
> {code}
> As the number of views grow, the difference between the base table column 
> encoding (column #1) and the starting column number of the view (since the 
> starting offset is determined by an incrementing value on the base table) 
> will increase. This bloats the storage as we need to store null values for 
> column encodings between the base table column and the starting column of the 
> view.
> Instead, we'll pass through the MAX_COLUMNS value for queries and anything 
> column encoding less than this we know it'll be at the start. Anything 
> greater and we'll start the search from <column encoding> - <minimum view 
> column encoding>.
> The downside of this approach is if you run out of columns in the base table, 
> you're stuck. A more flexible, but more difficult approach is outlined in 
> PHOENIX-4596.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (PHOENIX-4550) Declare maximum columns to ensure storage is dense when table has many views

Reply via email to