[GitHub] [iceberg] openinx commented on a change in pull request #2465: Core: add row identifier to schema

GitBox Tue, 13 Apr 2021 00:50:42 -0700


openinx commented on a change in pull request #2465:
URL: https://github.com/apache/iceberg/pull/2465#discussion_r612209228




##########
File path: core/src/main/java/org/apache/iceberg/util/JsonUtil.java
##########
@@ -116,19 +122,42 @@ public static String getStringOrNull(String property, 
JsonNode node) {
   }
 
   public static List<String> getStringList(String property, JsonNode node) {
-    Preconditions.checkArgument(node.has(property), "Cannot parse missing list 
%s", property);
+    ImmutableList.Builder<String> builder = ImmutableList.builder();
+    return getCollection(property, node, builder::build, builder::add, 
JsonNode::asText,
+        e -> Preconditions.checkArgument(e.isTextual(), "Cannot parse string 
from non-text value: %s", e),
+        false);
+  }
+
+  public static Set<Integer> getIntegerSetOrNull(String property, JsonNode 
node) {
+    ImmutableSet.Builder<Integer> builder = ImmutableSet.builder();
+    return getCollection(property, node, builder::build, builder::add, 
JsonNode::asInt,
+        e -> Preconditions.checkArgument(e.isInt(), "Cannot parse string from 
non-integer value: %s", e),
+        true);
+  }
+
+  private static <C extends Collection<T>, T> C getCollection(
+      String property, JsonNode node,
+      Supplier<C> builder, Consumer<T> appender,
+      Function<JsonNode, T> getter, Consumer<JsonNode> validator,

Review comment:
       Those `builder`, `appender`, `getter` and `validator` makes the method 
to be really hard to follow, could we just defines a simple `visitor` interface 
to visit the `JsonNode` in this collection and collect all the final result to 
the upper layer method ? 

##########
File path: core/src/main/java/org/apache/iceberg/SchemaUpdate.java
##########
@@ -408,11 +450,15 @@ private TableMetadata applyChangesToMapping(TableMetadata 
metadata) {
   private static Schema applyChanges(Schema schema, List<Integer> deletes,
                                      Map<Integer, Types.NestedField> updates,
                                      Multimap<Integer, Types.NestedField> adds,
-                                     Multimap<Integer, Move> moves) {
+                                     Multimap<Integer, Move> moves,
+                                     Set<String> rowIdentifierNames) {
     Types.StructType struct = TypeUtil
         .visit(schema, new ApplyChanges(deletes, updates, adds, moves))
         .asNestedType().asStructType();
-    return new Schema(struct.fields());
+    Set<Integer> refreshedRowIdentifiers = rowIdentifierNames.stream()
+        .map(name -> struct.field(name).fieldId())

Review comment:
       If we delete column `b` by using the `SchemaUpdate#deleteColumn` API,  
then the `struct` variable which has been applied by those changes  won't 
include the field that has the colum name `b`, right ?   If sure, then how 
could we locate the field id for that given column name `b` in this sentence ? 

##########
File path: core/src/main/java/org/apache/iceberg/SchemaParser.java
##########
@@ -158,7 +169,7 @@ public static String toJson(Schema schema, boolean pretty) {
       if (pretty) {
         generator.useDefaultPrettyPrinter();
       }
-      toJson(schema.asStruct(), generator);
+      toJson(schema.asStruct(), schema.schemaId(), schema.rowIdentifiers(), 
generator);

Review comment:
       I think this 
[PR](https://github.com/apache/iceberg/pull/2096/files#diff-0fd54fb65a9c5d2fce5bd3386550b2e829e0f70c2992fdad8fdb9bd2d866a74fR161)
 forget to refresh this `toJson`  method.

##########
File path: api/src/main/java/org/apache/iceberg/Schema.java
##########
@@ -355,9 +387,17 @@ private Schema internalSelect(Collection<String> names, 
boolean caseSensitive) {
 
   @Override
   public String toString() {
-    return String.format("table {\n%s\n}",
-        NEWLINE.join(struct.fields().stream()
-            .map(f -> "  " + f)
-            .collect(Collectors.toList())));
+    StringBuilder sb = new StringBuilder();
+    sb.append("table {\n");
+    sb.append("  fields {\n");
+    sb.append(NEWLINE.join(struct.fields().stream()
+        .map(f -> "    " + f)
+        .collect(Collectors.toList())));
+    sb.append("\n  }\n  row identifiers { ");
+    sb.append(COMMA.join(Arrays.stream(rowIdentifiers)
+        .mapToObj(id -> String.format("%d", id))
+        .collect(Collectors.toList())));

Review comment:
       How about just simplify this as 
`sb.append(COMMA.join(lazyRowIdentifiersSet()))` ? 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [iceberg] openinx commented on a change in pull request #2465: Core: add row identifier to schema

Reply via email to