virajjasani commented on code in PR #2070:
URL: https://github.com/apache/phoenix/pull/2070#discussion_r1942311616
##########
phoenix-core-client/src/main/java/org/apache/phoenix/util/CDCUtil.java:
##########
@@ -155,6 +162,73 @@ public static boolean isBinaryType(PDataType dataType) {
                 || dataType.getSqlType() == PDataType.VARBINARY_ENCODED_TYPE);
     }
 
+    /**
+     * Return true if the parseNode or any of its children contains PARTITION_ID() function.
+     *
+     * @param parseNode The parseNode from Where clause.
+     * @return True if the parseNode or any of its children contains PARTITION_ID()
+     * function. False otherwise.
+     */
+    public static boolean isPartitionIdIncludedInTree(ParseNode parseNode) {
+        if (parseNode instanceof PartitionIdParseNode) {
+            return true;
+        }
+        if (parseNode == null || CollectionUtils.isEmpty(parseNode.getChildren())) {
+            return false;
+        }
+        return parseNode.getChildren().stream()
+                .anyMatch(CDCUtil::isPartitionIdIncludedInTree);
+    }
+
+    /**
+     * Add IN Operator for PARTITION_ID() so that the full table scan CDC query can be
+     * optimized to be range scan.
+     *
+     * @param conn The Connection.
+     * @param cdcName CDC Object name.
+     * @param query SQL Query Statement.
+     * @return Updated query including PartitionId with IN operator.
+     * @throws SQLException If the distinct partition ids retrieval fails.
+     */
+    public static String addPartitionInList(final Connection conn, final String cdcName,
+                                            final String query) throws SQLException {
+        ResultSet rs = conn.createStatement().executeQuery("SELECT DISTINCT PARTITION_ID() FROM "
+                + cdcName);
+        List<String> partitionIds = new ArrayList<>();
+        while (rs.next()) {
+            partitionIds.add(rs.getString(1));
+        }
+        if (partitionIds.isEmpty()) {
+            return query;
+        }
+        StringBuilder builder;
+        boolean queryHasWhere = query.contains(" WHERE ");

Review Comment:
   Yes, I came across it, but `WHERE PARTITION_ID() IN (SELECT DISTINCT PARTITION_ID() FROM ....)` does not work. It needs a more involved change, and that is true whether we add it to the parse tree or recompile from the statement.
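   For illustration only, the execute-then-splice idea in the hunk above (fetch the distinct partition ids first, then add them to the statement as a literal IN list) can be sketched as plain string handling. This is not the PR's code and not a Phoenix API: the class `PartitionInListSketch` and the method `appendPartitionInList` are hypothetical names, the ids are assumed to be strings (the hunk reads them with `getString`), and the query is assumed to have no trailing ORDER BY, GROUP BY or LIMIT when it has no WHERE clause.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Phoenix: splices already-fetched partition ids
// into a query string as a literal IN list.
class PartitionInListSketch {

    // First WHERE keyword surrounded by whitespace, case-insensitive.
    // Caveat: this also matches " where " inside a string literal or a sub-query.
    private static final Pattern WHERE =
            Pattern.compile("\\sWHERE\\s", Pattern.CASE_INSENSITIVE);

    static String appendPartitionInList(String query, List<String> partitionIds) {
        if (partitionIds == null || partitionIds.isEmpty()) {
            return query; // nothing to restrict on, leave the query untouched
        }
        // Quote each id as a SQL string literal, doubling embedded single quotes.
        String inList = partitionIds.stream()
                .map(id -> "'" + id.replace("'", "''") + "'")
                .collect(Collectors.joining(", ", "PARTITION_ID() IN (", ")"));
        Matcher m = WHERE.matcher(query);
        if (m.find()) {
            // Put the IN list first and AND it with the existing predicate.
            // Caveat: a top-level OR in the existing predicate would need
            // parentheses, which string splicing cannot add reliably.
            return query.substring(0, m.end()) + inList + " AND "
                    + query.substring(m.end());
        }
        // Assumption: no ORDER BY / GROUP BY / LIMIT follows the FROM clause.
        return query + " WHERE " + inList;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("p1", "p2");
        System.out.println(appendPartitionInList("SELECT * FROM MY_CDC", ids));
        System.out.println(appendPartitionInList("SELECT * FROM MY_CDC WHERE K = 1", ids));
    }
}
```

   The caveats in the comments (keyword matching inside literals, OR precedence, trailing clauses) are the kind of cases where string splicing gets fragile and a parse-tree rewrite would be more robust.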
   The reason I like recompiling the statement within executeQuery() is that it allows us to execute the sub-query, get the result, and then recompile the original statement. It would not be prudent to execute a query as part of ParseNode creation. On the other hand, if things could work without having to execute a query and use its result in the original query (e.g. by having a sub-query, using ORDER BY, or adding WHERE clause expressions that use static ParseNode values), it would make more sense to make the changes while generating nodes in QueryCompiler.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@phoenix.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org