karanmehta93 commented on a change in pull request #419: PHOENIX-4009 Run
UPDATE STATISTICS command by using MR integration on…
URL: https://github.com/apache/phoenix/pull/419#discussion_r246261373
##########
File path:
phoenix-core/src/main/java/org/apache/phoenix/mapreduce/PhoenixInputFormat.java
##########
@@ -180,23 +179,39 @@ private QueryPlan getQueryPlan(final JobContext context, final Configuration con
try (final Connection connection = ConnectionUtil.getInputConnection(configuration, overridingProps);
final Statement statement = connection.createStatement()) {
- final String selectStatement = PhoenixConfigurationUtil.getSelectStatement(configuration);
+ SchemaType schemaType = PhoenixConfigurationUtil.getSchemaType(configuration);
+
+ String selectStatement;
+ switch (schemaType) {
+ case UPDATE_STATS:
+ // This select statement indicates MR job for full table scan for stats collection
+ selectStatement = "SELECT * FROM " + PhoenixConfigurationUtil.getInputTableName(configuration);
+ break;
+ default:
+ selectStatement = PhoenixConfigurationUtil.getSelectStatement(configuration);
+ }
Preconditions.checkNotNull(selectStatement);
final PhoenixStatement pstmt = statement.unwrap(PhoenixStatement.class);
// Optimize the query plan so that we potentially use secondary indexes
final QueryPlan queryPlan = pstmt.optimizeQuery(selectStatement);
final Scan scan = queryPlan.getContext().getScan();
+
+ if (schemaType == SchemaType.UPDATE_STATS) {
+ StatisticsUtil.setScanAttributes(scan, null);
Review comment:
The `scan.setAttribute(RUN_UPDATE_STATS_ASYNC_ATTRIB, FALSE_BYTES);` attribute
applies only to the case where stats are collected using the `UPDATE STATISTICS`
SQL command. If `false`, the client has to wait until the collection has run on
all the machines. If `true`, the client gets an ACK back immediately and the RS
continues in the background. Since the MR job does not go through that path, the
attribute is not applicable here.
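
To illustrate the distinction, here is a rough, standalone sketch (not the actual Phoenix code). It models scan attributes as a plain map. The `Caller` enum, the `statsAttributes` helper, the `STATS_SCAN_MARKER` key and the string values of all keys are hypothetical stand-ins; only the name `RUN_UPDATE_STATS_ASYNC_ATTRIB` comes from the code under review. The point is simply that the async flag is attached only on the SQL path:

```java
import java.util.HashMap;
import java.util.Map;

public class StatsScanAttributes {
    // Placeholder keys and values for illustration only; the real constants
    // are defined inside Phoenix and may differ.
    static final String STATS_SCAN_MARKER = "STATS_SCAN_MARKER";
    static final String RUN_UPDATE_STATS_ASYNC_ATTRIB = "RUN_UPDATE_STATS_ASYNC_ATTRIB";
    static final byte[] TRUE_BYTES = new byte[] {1};
    static final byte[] FALSE_BYTES = new byte[] {0};

    // Hypothetical: who triggered the stats collection.
    enum Caller { UPDATE_STATISTICS_SQL, MAPREDUCE_JOB }

    // Builds the attributes that would be attached to the scan.
    static Map<String, byte[]> statsAttributes(Caller caller, boolean async) {
        Map<String, byte[]> attrs = new HashMap<>();
        // Every stats scan carries the marker in this sketch.
        attrs.put(STATS_SCAN_MARKER, TRUE_BYTES);
        // The sync/async choice only matters when a SQL client is waiting for
        // an answer; an MR job has no such client, so the flag is left out.
        if (caller == Caller.UPDATE_STATISTICS_SQL) {
            attrs.put(RUN_UPDATE_STATS_ASYNC_ATTRIB, async ? TRUE_BYTES : FALSE_BYTES);
        }
        return attrs;
    }
}
```

In this sketch the MR path never sets the async flag at all, which is the behaviour I would expect from the scan built in `getQueryPlan`.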