haridsv commented on code in PR #2070:
URL: https://github.com/apache/phoenix/pull/2070#discussion_r1941254568


##########
phoenix-core-client/src/main/java/org/apache/phoenix/util/CDCUtil.java:
##########
@@ -155,6 +162,73 @@ public static boolean isBinaryType(PDataType dataType) {
                 || dataType.getSqlType() == PDataType.VARBINARY_ENCODED_TYPE);
     }
 
+    /**
+     * Return true if the parseNode or any of its children contains 
PARTITION_ID() function.
+     *
+     * @param parseNode The parseNode from Where clause.
+     * @return True if the parseNode or any of its children contains 
PARTITION_ID()
+     * function. False otherwise.
+     */
+    public static boolean isPartitionIdIncludedInTree(ParseNode parseNode) {
+        if (parseNode instanceof PartitionIdParseNode) {
+            return true;
+        }
+        if (parseNode == null || 
CollectionUtils.isEmpty(parseNode.getChildren())) {
+            return false;
+        }
+        return parseNode.getChildren().stream()
+                .anyMatch(CDCUtil::isPartitionIdIncludedInTree);
+    }
+
+    /**
+     * Add IN Operator for PARTITION_ID() so that the full table scan CDC 
query can be
+     * optimized to be range scan.
+     *
+     * @param conn The Connection.
+     * @param cdcName CDC Object name.
+     * @param query SQL Query Statement.
+     * @return Updated query including PartitionId with IN operator.
+     * @throws SQLException If the distinct partition ids retrieval fails.
+     */
+    public static String addPartitionInList(final Connection conn, final 
String cdcName,
+                                            final String query) throws 
SQLException {
+        ResultSet rs = conn.createStatement().executeQuery("SELECT DISTINCT 
PARTITION_ID() FROM "
+                + cdcName);
+        List<String> partitionIds = new ArrayList<>();
+        while (rs.next()) {
+            partitionIds.add(rs.getString(1));
+        }
+        if (partitionIds.isEmpty()) {
+            return query;
+        }
+        StringBuilder builder;
+        boolean queryHasWhere = query.contains(" WHERE ");
+        if (queryHasWhere) {
+            builder = new StringBuilder(query);
+            builder.append(" AND PARTITION_ID() IN (");
+        } else {
+            builder = new StringBuilder(query.split(cdcName)[0]);
+            builder.append(cdcName);
+            builder.append(" WHERE PARTITION_ID() IN (");
+        }
+        boolean initialized = false;
+        for (String partitionId : partitionIds) {
+            if (!initialized) {
+                builder.append("'");
+                initialized = true;
+            } else {
+                builder.append(",'");
+            }
+            builder.append(partitionId);
+            builder.append("'");
+        }
+        builder.append(")");
+        if (!queryHasWhere) {
+            builder.append(query.split(cdcName)[1]);

Review Comment:
   This is very risky, the CDC name can in theory appear anywhere in the query, 
say a column name or a where clause.



##########
phoenix-core-client/src/main/java/org/apache/phoenix/util/CDCUtil.java:
##########
@@ -155,6 +162,73 @@ public static boolean isBinaryType(PDataType dataType) {
                 || dataType.getSqlType() == PDataType.VARBINARY_ENCODED_TYPE);
     }
 
+    /**
+     * Return true if the parseNode or any of its children contains 
PARTITION_ID() function.
+     *
+     * @param parseNode The parseNode from Where clause.
+     * @return True if the parseNode or any of its children contains 
PARTITION_ID()
+     * function. False otherwise.
+     */
+    public static boolean isPartitionIdIncludedInTree(ParseNode parseNode) {
+        if (parseNode instanceof PartitionIdParseNode) {
+            return true;
+        }
+        if (parseNode == null || 
CollectionUtils.isEmpty(parseNode.getChildren())) {
+            return false;
+        }
+        return parseNode.getChildren().stream()
+                .anyMatch(CDCUtil::isPartitionIdIncludedInTree);
+    }
+
+    /**
+     * Add IN Operator for PARTITION_ID() so that the full table scan CDC 
query can be
+     * optimized to be range scan.
+     *
+     * @param conn The Connection.
+     * @param cdcName CDC Object name.
+     * @param query SQL Query Statement.
+     * @return Updated query including PartitionId with IN operator.
+     * @throws SQLException If the distinct partition ids retrieval fails.
+     */
+    public static String addPartitionInList(final Connection conn, final 
String cdcName,
+                                            final String query) throws 
SQLException {
+        ResultSet rs = conn.createStatement().executeQuery("SELECT DISTINCT 
PARTITION_ID() FROM "
+                + cdcName);
+        List<String> partitionIds = new ArrayList<>();
+        while (rs.next()) {
+            partitionIds.add(rs.getString(1));
+        }
+        if (partitionIds.isEmpty()) {
+            return query;
+        }
+        StringBuilder builder;
+        boolean queryHasWhere = query.contains(" WHERE ");

Review Comment:
   Technically, a tab or a newline can also be there around `WHERE` so you need 
to check for whitespace, not just space.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@phoenix.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to