nsivabalan commented on code in PR #5427:
URL: https://github.com/apache/hudi/pull/5427#discussion_r858306275


##########
hudi-common/src/main/java/org/apache/hudi/internal/schema/utils/InternalSchemaUtils.java:
##########
@@ -54,29 +58,75 @@ private InternalSchemaUtils() {
    */
   public static InternalSchema pruneInternalSchema(InternalSchema schema, List<String> names) {
     // do check
-    List<Integer> prunedIds = names.stream().map(name -> {
+    List<Integer> prunedIds = names.stream()
+        .filter(name -> {
+          int id = schema.findIdByName(name);
+          if (id < 0) {
+            LOG.warn(String.format("cannot prune col: %s does not exist in hudi table", name));
+            return false;
+          }
+          return true;
+        })
+        .map(schema::findIdByName).collect(Collectors.toList());
+    // find top parent field ID. eg: a.b.c, f.g.h, only collect id of a and f ignore all child field.
+    List<Integer> topParentFieldIds = new ArrayList<>();
+    names.stream().forEach(f -> {
+      int id = schema.findIdByName(f.split("\\.")[0]);
+      if (!topParentFieldIds.contains(id)) {
+        topParentFieldIds.add(id);
+      }
+    });
+    return pruneInternalSchemaByID(schema, prunedIds, topParentFieldIds);
+  }
+
+  /**
+   * Create project internalSchema, based on the project names which produced by query engine and Hudi fields.
+   * support nested project.
+   *
+   * @param schema      a internal schema.
+   * @param queryFields project names produced by query engine.
+   * @param hudiFields  project names required by Hudi merging.
+   * @return a project internalSchema.
+   */
+  public static InternalSchema pruneInternalSchema(InternalSchema schema, List<String> queryFields, List<String> hudiFields) {

Review Comment:
   With the addition of this new method, is the two-argument method at L59 still called anywhere? I would expect all callers to use this new overload instead.
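
   As an aside on the top-parent lookup in the diff above (for `a.b.c` and `f.g.h`, keep only `a` and `f`), here is a minimal standalone sketch of that step. `TopLevelFields` and `topLevelNames` are hypothetical names, not part of Hudi, and the sketch works on plain strings rather than on `InternalSchema` ids. It uses a `LinkedHashSet` to de-duplicate in first-seen order instead of calling `List.contains` per element, and `indexOf` instead of `split` so that a name consisting only of `.` does not produce an empty array.

   ```java
   import java.util.ArrayList;
   import java.util.LinkedHashSet;
   import java.util.List;
   import java.util.Set;

   // Hypothetical helper for illustration only; it is not part of Hudi.
   final class TopLevelFields {
     private TopLevelFields() {
     }

     // Returns the distinct top-level segment of each dotted name,
     // preserving the order in which the segments first appear.
     static List<String> topLevelNames(List<String> names) {
       Set<String> seen = new LinkedHashSet<>();
       for (String name : names) {
         int dot = name.indexOf('.');
         // No dot means the name is already a top-level field.
         seen.add(dot < 0 ? name : name.substring(0, dot));
       }
       return new ArrayList<>(seen);
     }
   }
   ```

   In the PR itself, each resulting name would presumably still be resolved with `schema.findIdByName`, as the diff already does.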



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
