zabetak commented on code in PR #6067:
URL: https://github.com/apache/hive/pull/6067#discussion_r2377941683


##########
ql/src/java/org/apache/hadoop/hive/ql/metadata/NotNullConstraint.java:
##########
@@ -56,7 +56,7 @@ public NotNullConstraint(List<SQLNotNullConstraint> nns, String tableName, Strin
         String enable = pk.isEnable_cstr()? "ENABLE": "DISABLE";
         String validate = pk.isValidate_cstr()? "VALIDATE": "NOVALIDATE";
         String rely = pk.isRely_cstr()? "RELY": "NORELY";
-        enableValidateRely.put(pk.getNn_name(), ImmutableList.of(enable, validate, rely));

Review Comment:
   Why is this change necessary?



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveMultiJoin.java:
##########
@@ -113,6 +115,18 @@ public HiveMultiJoin(
    this(cluster, Lists.newArrayList(inputs), condition, rowType, joinInputs, joinTypes, filters, null);
   }
 
+  public HiveMultiJoin(RelInput input) {
+    this(
+        input.getCluster(),
+        input.getInputs(),
+        input.getExpression("condition"),
+        input.getRowType("rowType"),
+        (List<Pair<Integer, Integer>>) input.get("getJoinInputsForHiveMultiJoin"),
+        (List<JoinRelType>) input.get("getJoinTypesForHiveMultiJoin"),
+        input.getExpressionList("filters")
+    );
+  }
+

Review Comment:
   Why do we need to modify this class? Normally we shouldn't need to serialize/deserialize `MultiJoin` expressions because they never appear in the final plan.



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/stats/FilterSelectivityEstimator.java:
##########
@@ -414,20 +415,14 @@ private Double computeConjunctionSelectivity(RexCall call) {
    * @return
    */
   private long getMaxNulls(RexCall call, HiveTableScan t) {

Review Comment:
   Why are we changing the selectivity estimator?



##########
ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java:
##########
@@ -516,34 +507,7 @@ public RelNode genLogicalPlan(ASTNode ast) throws SemanticException {
   }
 
   public static RelOptPlanner createPlanner(HiveConf conf) {

Review Comment:
   It was already possible to create a planner through this method. Why did we move the entire logic to `HiveRelOptUtil`?



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveSortExchange.java:
##########
@@ -87,6 +105,28 @@ public static HiveSortExchange create(RelNode input,
     return new HiveSortExchange(cluster, traitSet, input, distribution, collation, builder.build());
   }
 
+  private static ImmutableList<RexNode> getKeys(RelInput input) {
+    RelCollation collation = RelCollationTraitDef.INSTANCE.canonize(input.getCollation());
+    RelTraitSet traitSet = getTraitSet(input.getCluster(), collation, input.getDistribution());
+    RelCollation canonizedCollation = traitSet.canonize(RelCollationImpl.of(collation.getFieldCollations()));
+    ImmutableList.Builder<RexNode> builder = ImmutableList.builder();
+    for (RelFieldCollation relFieldCollation : canonizedCollation.getFieldCollations()) {
+      int index = relFieldCollation.getFieldIndex();
+      builder.add(input.getCluster().getRexBuilder().makeInputRef(input.getInput(), index));
+    }
+
+    return builder.build();
+  }
+
+  private static RelDistribution getDistribution(RelInput input) {
+    RelDistribution result = input.getDistribution();
+    if (RelDistribution.Type.ANY == result.getType()) {

Review Comment:
   The `RelDistribution.Type.ANY` does not imply that distribution keys are empty. I am not sure what the method is trying to do here.



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveTableScan.java:
##########
@@ -326,10 +355,23 @@ public boolean isInsideView() {
     return insideView;
   }
 
+  private boolean isMaterializedTable() {
+    return ((RelOptHiveTable) table).getHiveTableMD().isMaterializedTable();
+  }
+
   public HiveTableScanTrait getTableScanTrait() {
     return tableScanTrait;
   }
 
+  private static HiveTableScanTrait createTableScanTrait(RelInput input) {
+    String enumName = input.getString("tableScanTrait");
+    if (enumName == null) {
+      return null;
+    }
+
+    return HiveRelEnumTypes.toEnum(enumName);

Review Comment:
   The use of `HiveRelEnumTypes` seems a bit of overkill. Can't we simply create the instance directly and drop the entire `RelEnumTypes` copy?
   ```suggestion
       return HiveTableScanTrait.valueOf(enumName);
   ```
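   As a side note, `Enum.valueOf` throws `IllegalArgumentException` for unknown names, which arguably surfaces a corrupted plan more clearly than a registry lookup would.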
   



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelOptUtil.java:
##########
@@ -1079,15 +1106,110 @@ public static Pair<RelOptTable, List<Integer>> getColumnOriginSet(RelNode rel, I
    * to parse the string back.
    */
   public static String toJsonString(final RelNode rel) {
+    return serializeWithPlanWriter(rel, new HiveRelJsonImplWithStats());
+  }
+
+  private static String serializeWithPlanWriter(RelNode rel, HiveRelJsonImpl planWriter) {
     if (rel == null) {
       return null;
     }
-
-    final HiveRelJsonImpl planWriter = new HiveRelJsonImpl();
     rel.explain(planWriter);
     return planWriter.asString();
   }
 
+  public static Optional<String> serializeToJSON(RelNode plan) {
+    if (!isSerializable(plan)) {
+      return Optional.empty();
+    }
+
+    JSONObject outJSONObject = new JSONObject(new LinkedHashMap<>());
+    outJSONObject.put("CBOPlan", serializeWithPlanWriter(plan, new HiveRelJsonImpl()));

Review Comment:
   I don't think we need the extra wrapping attribute for "CBOPlan".
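   A minimal sketch of the unwrapped variant, reusing the helpers already introduced in this diff:
   ```java
   public static Optional<String> serializeToJSON(RelNode plan) {
     if (!isSerializable(plan)) {
       return Optional.empty();
     }
     // Return the writer output directly; no single-key JSONObject wrapper.
     return Optional.of(serializeWithPlanWriter(plan, new HiveRelJsonImpl()));
   }
   ```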



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/translator/RexNodeConverter.java:
##########
@@ -699,15 +699,23 @@ protected RexNode convert(ExprNodeConstantDesc literal) throws CalciteSemanticEx
       calciteLiteral = rexBuilder.makeExactLiteral((BigDecimal) value, calciteDataType);
       break;
     case FLOAT:
-      calciteLiteral = rexBuilder.makeApproxLiteral(
+      // We mostly convert FLOAT to exact literal everywhere else, except when
+      // there is a suffix 'F'. See TypeCheckProcFactory.NumExprProcessor#process
+      // Converting to an exact literal helps with deserialization by avoiding
+      // scientific notation.
+      calciteLiteral = rexBuilder.makeExactLiteral(

Review Comment:
   I am reluctant to change the type of literals just for the sake of serialization/deserialization. Is there another way to solve the problem? Even if it makes sense to change literal types, it is a risky change that should be tested and land on its own.
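   For context, a possible reader-side alternative (nothing below is existing Hive API, just standard `java.math`): `BigDecimal` parses scientific notation directly, so the deserializer could normalize approximate literals while reading instead of changing the literal kind at creation time.
   ```java
   import java.math.BigDecimal;

   // BigDecimal accepts the exponent notation produced by approx literals...
   BigDecimal v = new BigDecimal("9.1E-5");
   // ...and can render it without an exponent when a plain form is needed.
   String plain = v.toPlainString(); // "0.000091"
   ```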



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveRelNode.java:
##########
@@ -20,8 +20,19 @@
 import org.apache.calcite.plan.Convention;
 import org.apache.calcite.rel.RelNode;
 
+import java.util.stream.Stream;
+
 public interface HiveRelNode extends RelNode {
 
   /** Calling convention for relational operations that occur in Hive. */
   Convention CONVENTION = new Convention.Impl("HIVE", HiveRelNode.class);
+
+  static Stream<RelNode> stream(RelNode node) {
+    return Stream.concat(
+        Stream.of(node),
+        node.getInputs()
+            .stream()
+            .flatMap(HiveRelNode::stream)
+    );
+  }

Review Comment:
   If we keep this, we should add appropriate Javadoc. In addition, putting static methods in interfaces is not a good pattern; it is better to move it to a utility class.
   
   Other than that, the most common way to traverse a RelNode tree is via visitors and shuttles, so I am not sure this kind of Stream-based traversal will be well adopted.
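   A rough sketch of the utility-class variant; the class name `HiveRelNodeUtil` is illustrative, not an existing class:
   ```java
   import java.util.stream.Stream;

   import org.apache.calcite.rel.RelNode;

   public final class HiveRelNodeUtil {
     private HiveRelNodeUtil() {
     }

     /** Returns a pre-order stream of {@code node} and, recursively, all of its inputs. */
     public static Stream<RelNode> stream(RelNode node) {
       return Stream.concat(
           Stream.of(node),
           node.getInputs().stream().flatMap(HiveRelNodeUtil::stream));
     }
   }
   ```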



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveAggregate.java:
##########
@@ -51,6 +52,11 @@ public HiveAggregate(RelOptCluster cluster, RelTraitSet traitSet, RelNode child,
             groupSet, groupSets, aggCalls);
   }
 
+  public HiveAggregate(RelInput input) {
+    this(input.getCluster(), input.getTraitSet(), input.getInput(),
+        input.getBitSet("group"), input.getBitSetList("groups"), input.getAggregateCalls("aggs"));

Review Comment:
   Can the following work?
   ```suggestion
       super(input);
   ```
   If yes, then can I do the same on the other RelNodes?
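   For reference, Calcite's `Aggregate` declares a `protected Aggregate(RelInput input)` constructor that reads the same `group`/`groups`/`aggs` fields, so `super(input)` should be equivalent here unless `HiveAggregate` needs extra attributes.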



##########
ql/src/java/org/apache/hadoop/hive/ql/parse/type/RexNodeExprFactory.java:
##########
@@ -617,6 +617,26 @@ protected RexNode createConstantExpr(TypeInfo typeInfo, Object constantValue)
           operands);
     }
     RelDataType finalType = TypeConverter.convert(typeInfo, rexBuilder.getTypeFactory());
+
+    // Make exact literals for float/double when the value is constant
+    // to avoid printing in scientific notation. This helps while
+    // deserializing the json plan.
+    switch (finalType.getSqlTypeName()) {

Review Comment:
   As mentioned elsewhere, this kind of change is risky so it should be part of another PR if necessary.



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelOptUtil.java:
##########
@@ -1079,15 +1106,110 @@ public static Pair<RelOptTable, List<Integer>> getColumnOriginSet(RelNode rel, I
    * to parse the string back.
    */
   public static String toJsonString(final RelNode rel) {
+    return serializeWithPlanWriter(rel, new HiveRelJsonImplWithStats());
+  }
+
+  private static String serializeWithPlanWriter(RelNode rel, HiveRelJsonImpl planWriter) {
     if (rel == null) {
       return null;
     }
-
-    final HiveRelJsonImpl planWriter = new HiveRelJsonImpl();
     rel.explain(planWriter);
     return planWriter.asString();
   }
 
+  public static Optional<String> serializeToJSON(RelNode plan) {
+    if (!isSerializable(plan)) {
+      return Optional.empty();
+    }
+
+    JSONObject outJSONObject = new JSONObject(new LinkedHashMap<>());
+    outJSONObject.put("CBOPlan", serializeWithPlanWriter(plan, new HiveRelJsonImpl()));
+    String jsonPlan = outJSONObject.toString();
+
+    if (LOG.isDebugEnabled()) {
+      LOG.debug("Plan to serialize: \n{}", RelOptUtil.toString(plan));
+      LOG.debug("JSON plan: \n{}", jsonPlan);
+    }
+
+    return Optional.of(jsonPlan);
+  }
+
+  private static boolean isSerializable(RelNode plan) {
+    return HiveRelNode.stream(plan)
+        .noneMatch(node ->
+            Objects.requireNonNull(node.getConvention()).getName().toLowerCase().contains("jdbc")
+        );
+  }
+
+  public static RelNode deserializePlan(HiveConf conf, String jsonPlan) throws IOException {
+    RelOptPlanner planner = HiveRelOptUtil.createPlanner(conf, StatsSources.getStatsSource(conf), false);
+    RexBuilder rexBuilder = new RexBuilder(new HiveTypeFactory());
+    RelOptCluster cluster = RelOptCluster.create(planner, rexBuilder);
+
+    RelPlanParser parser = new RelPlanParser(cluster, conf);
+    RelNode deserializedPlan = parser.parse(jsonPlan);
+    // Apply partition pruning to compute partition list in HiveTableScan
+    deserializedPlan = applyPartitionPruning(conf, deserializedPlan, cluster, planner);

Review Comment:
   Why do we need the partition list? Can't we deserialize the plan without it?



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelOptUtil.java:
##########
@@ -1079,15 +1106,110 @@ public static Pair<RelOptTable, List<Integer>> getColumnOriginSet(RelNode rel, I
    * to parse the string back.
    */
   public static String toJsonString(final RelNode rel) {
+    return serializeWithPlanWriter(rel, new HiveRelJsonImplWithStats());
+  }
+
+  private static String serializeWithPlanWriter(RelNode rel, HiveRelJsonImpl planWriter) {
     if (rel == null) {
       return null;
     }
-
-    final HiveRelJsonImpl planWriter = new HiveRelJsonImpl();
     rel.explain(planWriter);
     return planWriter.asString();
   }
 
+  public static Optional<String> serializeToJSON(RelNode plan) {
+    if (!isSerializable(plan)) {
+      return Optional.empty();
+    }
+
+    JSONObject outJSONObject = new JSONObject(new LinkedHashMap<>());
+    outJSONObject.put("CBOPlan", serializeWithPlanWriter(plan, new HiveRelJsonImpl()));
+    String jsonPlan = outJSONObject.toString();
+
+    if (LOG.isDebugEnabled()) {
+      LOG.debug("Plan to serialize: \n{}", RelOptUtil.toString(plan));
+      LOG.debug("JSON plan: \n{}", jsonPlan);
+    }

Review Comment:
   I don't think we need logging at this stage since we are giving back the result and the receiver can decide what to do with it. Also, the only caller at this stage is test code, so there is no benefit in having the logs since the tests explicitly validate this output.



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/reloperators/HiveTableScan.java:
##########
@@ -326,10 +355,23 @@ public boolean isInsideView() {
     return insideView;
   }
 
+  private boolean isMaterializedTable() {
+    return ((RelOptHiveTable) table).getHiveTableMD().isMaterializedTable();
+  }
+
   public HiveTableScanTrait getTableScanTrait() {
     return tableScanTrait;
   }
 
+  private static HiveTableScanTrait createTableScanTrait(RelInput input) {
+    String enumName = input.getString("tableScanTrait");
+    if (enumName == null) {
+      return null;
+    }

Review Comment:
   Are there cases where we don't serialize the trait? Can we ever have null here?



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelJsonImplWithStats.java:
##########
@@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite;
+
+import org.apache.calcite.rel.RelNode;
+import org.apache.calcite.rel.metadata.RelMetadataQuery;
+import org.apache.calcite.util.ImmutableBitSet;
+import org.apache.calcite.util.Pair;
+import org.apache.hadoop.hive.ql.plan.ColStatistics;
+
+import java.util.List;
+import java.util.Map;
+
+public class HiveRelJsonImplWithStats extends HiveRelJsonImpl {

Review Comment:
   Since including the stats was the default behavior so far, I would follow a slightly different approach. I would keep the original class intact and have the "nostats" serialization logic in this **new** class (changing the name accordingly). This makes it easier to follow what changed and what did not.
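   A rough sketch of the proposed shape; the class name `HiveRelJsonImplNoStats` and the hook to override are assumptions, not existing API:
   ```java
   // HiveRelJsonImpl keeps today's stats-including behavior untouched;
   // only the new class diverges from it.
   public class HiveRelJsonImplNoStats extends HiveRelJsonImpl {
     // Override whichever hook HiveRelJsonImpl uses to collect per-node
     // attributes, and skip the row-count/column-statistics entries there.
   }
   ```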



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelOptUtil.java:
##########
@@ -1079,15 +1106,110 @@ public static Pair<RelOptTable, List<Integer>> getColumnOriginSet(RelNode rel, I
    * to parse the string back.
    */
   public static String toJsonString(final RelNode rel) {
+    return serializeWithPlanWriter(rel, new HiveRelJsonImplWithStats());
+  }
+
+  private static String serializeWithPlanWriter(RelNode rel, HiveRelJsonImpl planWriter) {
     if (rel == null) {
       return null;
     }
-
-    final HiveRelJsonImpl planWriter = new HiveRelJsonImpl();
     rel.explain(planWriter);
     return planWriter.asString();
   }
 
+  public static Optional<String> serializeToJSON(RelNode plan) {
+    if (!isSerializable(plan)) {
+      return Optional.empty();
+    }
+
+    JSONObject outJSONObject = new JSONObject(new LinkedHashMap<>());
+    outJSONObject.put("CBOPlan", serializeWithPlanWriter(plan, new HiveRelJsonImpl()));
+    String jsonPlan = outJSONObject.toString();
+
+    if (LOG.isDebugEnabled()) {
+      LOG.debug("Plan to serialize: \n{}", RelOptUtil.toString(plan));
+      LOG.debug("JSON plan: \n{}", jsonPlan);
+    }
+
+    return Optional.of(jsonPlan);
+  }
+
+  private static boolean isSerializable(RelNode plan) {
+    return HiveRelNode.stream(plan)
+        .noneMatch(node ->
+            Objects.requireNonNull(node.getConvention()).getName().toLowerCase().contains("jdbc")
+        );
+  }
+
+  public static RelNode deserializePlan(HiveConf conf, String jsonPlan) throws IOException {
+    RelOptPlanner planner = HiveRelOptUtil.createPlanner(conf, StatsSources.getStatsSource(conf), false);
+    RexBuilder rexBuilder = new RexBuilder(new HiveTypeFactory());
+    RelOptCluster cluster = RelOptCluster.create(planner, rexBuilder);
+
+    RelPlanParser parser = new RelPlanParser(cluster, conf);
+    RelNode deserializedPlan = parser.parse(jsonPlan);
+    // Apply partition pruning to compute partition list in HiveTableScan
+    deserializedPlan = applyPartitionPruning(conf, deserializedPlan, cluster, planner);
+    if (LOG.isDebugEnabled()) {
+      LOG.debug("Deserialized plan: \n{}", RelOptUtil.toString(deserializedPlan));

Review Comment:
   Consider removing logging from this API. Same reasons as the ones mentioned before.



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelOptUtil.java:
##########
@@ -1079,15 +1106,110 @@ public static Pair<RelOptTable, List<Integer>> getColumnOriginSet(RelNode rel, I
    * to parse the string back.
    */
   public static String toJsonString(final RelNode rel) {
+    return serializeWithPlanWriter(rel, new HiveRelJsonImplWithStats());
+  }
+
+  private static String serializeWithPlanWriter(RelNode rel, HiveRelJsonImpl planWriter) {
     if (rel == null) {
       return null;
     }
-
-    final HiveRelJsonImpl planWriter = new HiveRelJsonImpl();
     rel.explain(planWriter);
     return planWriter.asString();
   }
 
+  public static Optional<String> serializeToJSON(RelNode plan) {

Review Comment:
   Since this is a new public API in a commonly used class, it should have Javadoc. Moreover, it closely resembles the existing method `toJsonString` but it's not obvious why we picked a different name. It's not possible to decide which one to call without reading the code. When we add APIs to existing classes we try to keep things uniform so that other devs can more easily find what they need.
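   A hedged sketch of what that Javadoc could say, based only on how the two methods differ in this PR (stats vs. no stats, plus the JDBC check):
   ```java
   /**
    * Serializes the given plan to a JSON form intended to be parsed back into
    * a RelNode tree, or returns an empty Optional when the plan contains
    * operators (e.g., in a JDBC convention) that cannot be round-tripped.
    * Unlike {@link #toJsonString(RelNode)}, the output omits statistics.
    */
   public static Optional<String> serializeToJSON(RelNode plan) {
   ```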



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelOptUtil.java:
##########
@@ -1079,15 +1106,110 @@ public static Pair<RelOptTable, List<Integer>> getColumnOriginSet(RelNode rel, I
    * to parse the string back.
    */
   public static String toJsonString(final RelNode rel) {
+    return serializeWithPlanWriter(rel, new HiveRelJsonImplWithStats());
+  }
+
+  private static String serializeWithPlanWriter(RelNode rel, HiveRelJsonImpl planWriter) {
     if (rel == null) {
       return null;
     }
-
-    final HiveRelJsonImpl planWriter = new HiveRelJsonImpl();
     rel.explain(planWriter);
     return planWriter.asString();
   }
 
+  public static Optional<String> serializeToJSON(RelNode plan) {
+    if (!isSerializable(plan)) {
+      return Optional.empty();
+    }
+
+    JSONObject outJSONObject = new JSONObject(new LinkedHashMap<>());
+    outJSONObject.put("CBOPlan", serializeWithPlanWriter(plan, new HiveRelJsonImpl()));
+    String jsonPlan = outJSONObject.toString();
+
+    if (LOG.isDebugEnabled()) {
+      LOG.debug("Plan to serialize: \n{}", RelOptUtil.toString(plan));
+      LOG.debug("JSON plan: \n{}", jsonPlan);
+    }
+
+    return Optional.of(jsonPlan);
+  }
+
+  private static boolean isSerializable(RelNode plan) {

Review Comment:
   Alternative implementation without new APIs.
   ```java
   private static boolean isSerializable(RelNode plan) {
     boolean[] hasJdbcRel = new boolean[] { false };
     plan.accept(new RelHomogeneousShuttle() {
       @Override
       public RelNode visit(RelNode other) {
         hasJdbcRel[0] |= other instanceof JdbcRel;
         return hasJdbcRel[0] ? other : super.visit(other);
       }
     });
     return !hasJdbcRel[0];
   }
   ```
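   One note on the sketch above: once a `JdbcRel` is found, `visit` returns `other` without calling `super.visit`, so the shuttle stops descending and the traversal effectively short-circuits.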



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

