Re: [PR] [core] Trim key field in reading, map it to value field [paimon]

via GitHub Mon, 09 Dec 2024 00:01:48 -0800


tsreaper commented on code in PR #4651:
URL: https://github.com/apache/paimon/pull/4651#discussion_r1875481670



##########
paimon-core/src/main/java/org/apache/paimon/utils/BulkFormatMapping.java:
##########
@@ -35,18 +36,23 @@
 import javax.annotation.Nullable;
 
 import java.util.ArrayList;
+import java.util.HashMap;
 import java.util.HashSet;
 import java.util.List;
+import java.util.Map;
+import java.util.concurrent.atomic.AtomicInteger;
 import java.util.function.Function;
 
 import static 
org.apache.paimon.predicate.PredicateBuilder.excludePredicateWithFields;
+import static org.apache.paimon.table.SpecialFields.KEY_FIELD_ID_START;
 
 /** Class with index mapping and bulk format. */
 public class BulkFormatMapping {
 
     @Nullable private final int[] indexMapping;
     @Nullable private final CastFieldGetter[] castMapping;
     @Nullable private final Pair<int[], RowType> partitionPair;
+    @Nullable private final int[] trimmedKeyMapping;

Review Comment:
   Now that we have so many mappings, it would be better to add some comments 
to explain what each mapping means and what columns they store.



##########
paimon-core/src/main/java/org/apache/paimon/utils/BulkFormatMapping.java:
##########
@@ -139,17 +171,51 @@ public BulkFormatMapping build(
                     indexCastMapping.getIndexMapping(),
                     indexCastMapping.getCastMapping(),
                     partitionMapping,
+                    trimmedKeyPair.getLeft(),
                     formatDiscover
                             .discover(formatIdentifier)
-                            .createReaderFactory(readDataRowType, readFilters),
+                            .createReaderFactory(trimmedKeyPair.getRight(), 
readFilters),
                     dataSchema,
                     readFilters);
         }
 
-        private List<DataField> readDataFields(TableSchema dataSchema) {
-            List<DataField> dataFields = fieldsExtractor.apply(dataSchema);
+        static Pair<int[], RowType> trimKeyFields(
+                List<DataField> fieldsWithoutPartition, List<DataField> 
fields) {
+            int[] map = new int[fieldsWithoutPartition.size()];
+            List<DataField> trimmedFields = new ArrayList<>();
+            Map<Integer, DataField> fieldMap = new HashMap<>();
+            Map<Integer, Integer> positionMap = new HashMap<>();
+
+            for (DataField field : fields) {
+                fieldMap.put(field.id(), field);
+            }
+
+            AtomicInteger index = new AtomicInteger();
+            for (int i = 0; i < fieldsWithoutPartition.size(); i++) {
+                DataField field = fieldsWithoutPartition.get(i);
+                boolean keyField = SpecialFields.isKeyField(field.id());

Review Comment:
   Why checking this with id? In [Paimon 
spec](https://paimon.apache.org/docs/master/concepts/spec/datafile/) we state 
that key column names must start with `_KEY_`. Check their name instead.



##########
paimon-common/src/main/java/org/apache/paimon/table/SpecialFields.java:
##########
@@ -72,6 +72,8 @@ public class SpecialFields {
 
     public static final String KEY_FIELD_PREFIX = "_KEY_";
     public static final int KEY_FIELD_ID_START = SYSTEM_FIELD_ID_START;
+    // reserve 1000 for other system fields
+    public static final int KEY_FIELD_ID_END = Integer.MAX_VALUE - 1_000;

Review Comment:
   So there can be at most $10^9$ keys and only $10^3$ system fields? What if I 
need to add a new group of system fields?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] [core] Trim key field in reading, map it to value field [paimon]

Reply via email to