[
https://issues.apache.org/jira/browse/PHOENIX-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163167#comment-14163167
]
James Taylor commented on PHOENIX-1333:
---------------------------------------
One more minor, but important addition to the SYSTEM.STATS schema: include a
column that captures the value of guidepost width (i.e.
phoenix.stats.guidepost.width) as a BIGINT. Probably easiest to capture this
per region like we're doing with the other values. Make sure this gets
serialized into the PStats and makes its way into the PTable and PColumnFamily
as well. The reason is that the config value may change, but it's important
that we capture what it was when we ran the stats (so we can use it for
costing).
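
As a rough illustration of why the stored width matters for costing, here is a minimal Java sketch. The class RegionStatsSketch and its fields are hypothetical and are not part of Phoenix; the estimate (guidepost count times guidepost width) is an assumption about how a cost model might use the captured value, based on guideposts being placed roughly every phoenix.stats.guidepost.width bytes.

```java
// Hypothetical holder for per-region stats; not an actual Phoenix class.
final class RegionStatsSketch {
    final long guidePostCount;
    // Width in bytes that was in effect when the stats were collected,
    // i.e. the value that would be read back from SYSTEM.STATS.
    final long guidePostWidth;

    RegionStatsSketch(long guidePostCount, long guidePostWidth) {
        this.guidePostCount = guidePostCount;
        this.guidePostWidth = guidePostWidth;
    }

    // Assumed costing formula: each guidepost stands for about one
    // "width" worth of bytes in the region.
    long estimatedBytes() {
        return guidePostCount * guidePostWidth;
    }
}
```

If the estimate used the current config value instead of the stored one, changing the config after a stats run would silently skew the result.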
> Store statistics guideposts as VARBINARY
> ----------------------------------------
>
> Key: PHOENIX-1333
> URL: https://issues.apache.org/jira/browse/PHOENIX-1333
> Project: Phoenix
> Issue Type: Bug
> Reporter: James Taylor
> Assignee: ramkrishna.s.vasudevan
> Priority: Critical
>
> There's a potential problem with storing the guideposts as a VARBINARY ARRAY,
> as pointed out by PHOENIX-1329. We'd run into this issue if we're collecting
> stats for a table with a trailing VARBINARY row key column if the value
> contained embedded null bytes. Because of this, we're better off storing
> guideposts as VARBINARY and serializing/deserializing in the following manner:
> <byte length as vint><bytes><byte length as vint><bytes>...
> We should also store as a separate KeyValue column the total number of
> guideposts. So the schema of SYSTEM.STATS would look like this now instead:
> {code}
> public static final String CREATE_STATS_TABLE_METADATA =
> "CREATE TABLE " + SYSTEM_CATALOG_SCHEMA + ".\"" +
> SYSTEM_STATS_TABLE + "\"(\n" +
> // PK columns
> PHYSICAL_NAME + " VARCHAR NOT NULL," +
> COLUMN_FAMILY + " VARCHAR," +
> REGION_NAME + " VARCHAR," +
> // KeyValue (non-PK) columns
> GUIDE_POSTS + " VARBINARY," +
> GUIDE_POSTS_COUNT + " SMALLINT," +
> MIN_KEY + " VARBINARY," +
> MAX_KEY + " VARBINARY," +
> LAST_STATS_UPDATE_TIME + " DATE, " +
> "CONSTRAINT " + SYSTEM_TABLE_PK_NAME + " PRIMARY KEY ("
> + PHYSICAL_NAME + ","
> + COLUMN_FAMILY + "," + REGION_NAME + "))\n" +
> // TODO: should we support versioned stats?
> // Install split policy to prevent a physical table's stats from
> // being split across regions.
> HTableDescriptor.SPLIT_POLICY + "='" +
> MetaDataSplitPolicy.class.getName() + "'\n";
> {code}
> Then the serialization code in StatisticsTable.addStats() would need to
> change to populate the GUIDE_POSTS_COUNT and serialize the GUIDE_POSTS in the
> new format.
> The deserialization code is isolated to StatisticsUtil.readStatisitics(). It
> would need to read the GUIDE_POSTS_COUNT first for estimated sizing, and then
> deserialize the GUIDE_POSTS in the new format.
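
The length-prefixed layout described in the issue (<byte length as vint><bytes>...) can be sketched in Java as follows. This is only an illustration under stated assumptions: the class GuidePostCodec and its methods are hypothetical, and the length prefix uses a simple 7-bits-per-byte varint, which may differ from the vint encoding Phoenix actually uses. The count parameter stands in for the separately stored GUIDE_POSTS_COUNT and is used only to size the result list.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical codec; not the actual StatisticsTable/StatisticsUtil code.
final class GuidePostCodec {

    // Write a non-negative int as a 7-bits-per-byte varint (assumed encoding).
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Serialize as <length vint><bytes><length vint><bytes>...
    static byte[] encode(List<byte[]> guidePosts) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] gp : guidePosts) {
            writeVInt(out, gp.length);
            out.write(gp, 0, gp.length);
        }
        return out.toByteArray();
    }

    // Deserialize; count is the stored guidepost count, used for sizing only.
    static List<byte[]> decode(byte[] buf, int count) {
        List<byte[]> result = new ArrayList<>(count);
        int pos = 0;
        while (pos < buf.length) {
            int len = 0;
            int shift = 0;
            int b;
            do {
                b = buf[pos++] & 0xFF;
                len |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            byte[] gp = new byte[len];
            System.arraycopy(buf, pos, gp, 0, len);
            pos += len;
            result.add(gp);
        }
        return result;
    }
}
```

Because each guidepost carries an explicit length, embedded null bytes in a trailing VARBINARY row key column cannot be mistaken for a separator.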