[ 
https://issues.apache.org/jira/browse/PHOENIX-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated PHOENIX-1333:
--------------------------------------------
    Attachment: Phoenix-1333.patch

Patch that updates the schema for the SYSTEM.STATS table.
Changes the serialization of the guideposts.
The GUIDE_POSTS_COUNT is now added and the same count is also serialized as the 
first Vint bytes in the guideposts VARBINARY so that it can be used from the 
same cell while retrieving.

> Store statistics guideposts as VARBINARY
> ----------------------------------------
>
>                 Key: PHOENIX-1333
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1333
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: James Taylor
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>         Attachments: Phoenix-1333.patch
>
>
> There's a potential problem with storing the guideposts as a VARBINARY ARRAY, 
> as pointed out by PHOENIX-1329. We'd run into this issue if we're collecting 
> stats for a table with a trailing VARBINARY row key column if the value 
> contained embedded null bytes. Because of this, we're better off storing 
> guideposts as VARBINARY and serializing/deserializing in the following manner:
> <byte length as vint><bytes><byte length as vint><bytes>...
> We should also store as a separate KeyValue column the total number of 
> guideposts. So the schema of SYSTEM.STATS would look like this now instead:
> {code}
>     public static final String CREATE_STATS_TABLE_METADATA = 
>             "CREATE TABLE " + SYSTEM_CATALOG_SCHEMA + ".\"" + 
> SYSTEM_STATS_TABLE + "\"(\n" +
>             // PK columns
>             PHYSICAL_NAME  + " VARCHAR NOT NULL," +
>             COLUMN_FAMILY + " VARCHAR," +
>             REGION_NAME + " VARCHAR," +
>             GUIDE_POSTS  + " VARBINARY," +
>             GUIDE_POSTS_COUNT + " SMALLINT," +
>             MIN_KEY + " VARBINARY," + 
>             MAX_KEY + " VARBINARY," +
>             LAST_STATS_UPDATE_TIME+ " DATE, "+
>             "CONSTRAINT " + SYSTEM_TABLE_PK_NAME + " PRIMARY KEY ("
>             + PHYSICAL_NAME + ","
>             + COLUMN_FAMILY + ","+ REGION_NAME+"))\n" +
>             // TODO: should we support versioned stats?
>             // Install split policy to prevent a physical table's stats from 
> being split across regions.
>             HTableDescriptor.SPLIT_POLICY + "='" + 
> MetaDataSplitPolicy.class.getName() + "'\n";
> {code}
> Then the serialization code in StatisticsTable.addStats() would need to 
> change to populate the GUIDE_POSTS_COUNT and serialize the GUIDE_POSTS in the 
> new format.
> The deserialization code is isolated to StatisticsUtil.readStatisitics(). It 
> would need to read the GUIDE_POSTS_COUNT first for estimated sizing, and then 
> deserialize the GUIDE_POSTS in the new format.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to