karanmehta93 commented on a change in pull request #419: PHOENIX-4009 Run
UPDATE STATISTICS command by using MR integration on…
URL: https://github.com/apache/phoenix/pull/419#discussion_r247224819
##########
File path:
phoenix-core/src/main/java/org/apache/phoenix/schema/stats/DefaultStatisticsCollector.java
##########
@@ -121,92 +120,121 @@
cachedGuidePosts = null;
}
}
-
- private void initGuidepostDepth() throws IOException,
ClassNotFoundException, SQLException {
- // First check is if guidepost info set on statement itself
+
+ @Override
+ public void init() throws IOException {
+ try {
+ initGuidepostDepth();
+ initStatsWriter();
+ } catch (SQLException e) {
+ throw new IOException(e);
+ }
+ LOG.info("Initialization complete for " +
+ this.getClass() + " statistics collector for table " +
tableName);
+ }
+
+ /**
+ * Determine the GPW for statistics collection for the table.
+ * The order of priority from highest to lowest is as follows
+ * 1. Value provided in UPDATE STATISTICS SQL statement (N/A for MR jobs)
+ * 2. GPW column in SYSTEM.CATALOG for the table is not null
+ * 3. Value from global configuration parameters from hbase-site.xml
+ */
+ private void initGuidepostDepth() throws IOException, SQLException {
if (guidePostPerRegionBytes != null || guidePostWidthBytes != null) {
- int guidepostPerRegion = 0;
- long guidepostWidth =
QueryServicesOptions.DEFAULT_STATS_GUIDEPOST_WIDTH_BYTES;
- if (guidePostPerRegionBytes != null) {
- guidepostPerRegion =
PInteger.INSTANCE.getCodec().decodeInt(guidePostPerRegionBytes, 0,
SortOrder.getDefault());
- }
- if (guidePostWidthBytes != null) {
- guidepostWidth =
PLong.INSTANCE.getCodec().decodeInt(guidePostWidthBytes, 0,
SortOrder.getDefault());
- }
- this.guidePostDepth =
StatisticsUtil.getGuidePostDepth(guidepostPerRegion, guidepostWidth,
- env.getRegion().getTableDesc());
+ getGuidePostDepthFromStatement();
+ LOG.info("Guide post depth determined from SQL statement: " +
guidePostDepth);
} else {
+ long guidepostWidth =
getGuidePostDepthFromSystemCatalog(getHTableForSystemCatalog());
+ if (guidepostWidth >= 0) {
Review comment:
Setting it to 0 disables stats collection here. So if stats were already
present, this task will attempt to delete all of them from the
SYSTEM.STATS table. Not treating 0 as a legal value would mean falling back to
the default value (which can potentially cause issues).
When a table is created, the `GUIDE_POSTS_WIDTH` column in `SYSTEM.CATALOG`
is null, so it always falls back to the default global value (and sometimes
generates 1 GPW in the table). Setting it to 0 explicitly disables stats
collection, which is why the check here is `>= 0`.
I will add a comment here at the top.
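To make the fallback order concrete, here is a minimal, self-contained sketch
of the resolution logic described above. It is not the code in this PR: the
class `GuidePostWidthResolver`, its methods, and the use of a nullable `Long`
to model a null `GUIDE_POSTS_WIDTH` column are all hypothetical, and the
global default used in `main` is only a placeholder value.

```java
/**
 * Hypothetical sketch of the guidepost width priority order.
 * None of these names exist in Phoenix; they are for illustration only.
 */
public class GuidePostWidthResolver {

    /** A resolved width of 0 is read as "stats collection disabled". */
    public static final long DISABLED = 0L;

    /**
     * Resolves the effective guidepost width.
     *
     * @param statementWidth width given in UPDATE STATISTICS, or null if absent
     * @param catalogWidth   GUIDE_POSTS_WIDTH from SYSTEM.CATALOG, or null if
     *                       the column is null (assumed modelling)
     * @param globalDefault  width from the global configuration
     */
    public static long resolve(Long statementWidth, Long catalogWidth,
                               long globalDefault) {
        // 1. A value on the SQL statement wins.
        if (statementWidth != null) {
            return statementWidth;
        }
        // 2. A non-null catalog value is used next. Note ">= 0", not "> 0":
        //    0 is a legal value and must not fall through to the default.
        if (catalogWidth != null && catalogWidth >= 0) {
            return catalogWidth;
        }
        // 3. Otherwise fall back to the global default.
        return globalDefault;
    }

    /** True when the resolved width means collection is switched off. */
    public static boolean isStatsDisabled(long width) {
        return width == DISABLED;
    }

    public static void main(String[] args) {
        long globalDefault = 100L * 1024 * 1024; // placeholder default

        // Newly created table: catalog column is null, so the default applies.
        System.out.println(resolve(null, null, globalDefault));

        // Catalog column explicitly set to 0: collection is disabled.
        long width = resolve(null, 0L, globalDefault);
        System.out.println(width + " disabled=" + isStatsDisabled(width));
    }
}
```

With a `> 0` check in step 2, the second call would return the global default
instead of 0, and stats would be collected for a table that was explicitly
configured to have none.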