tsreaper commented on code in PR #4651:
URL: https://github.com/apache/paimon/pull/4651#discussion_r1875481670
##########
paimon-core/src/main/java/org/apache/paimon/utils/BulkFormatMapping.java:
##########
@@ -35,18 +36,23 @@
import javax.annotation.Nullable;
import java.util.ArrayList;
+import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
+import java.util.Map;
+import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import static
org.apache.paimon.predicate.PredicateBuilder.excludePredicateWithFields;
+import static org.apache.paimon.table.SpecialFields.KEY_FIELD_ID_START;
/** Class with index mapping and bulk format. */
public class BulkFormatMapping {
@Nullable private final int[] indexMapping;
@Nullable private final CastFieldGetter[] castMapping;
@Nullable private final Pair<int[], RowType> partitionPair;
+ @Nullable private final int[] trimmedKeyMapping;
Review Comment:
Now that we have so many mappings, it would be better to add some comments
to explain what each mapping means and what columns they store.
##########
paimon-core/src/main/java/org/apache/paimon/utils/BulkFormatMapping.java:
##########
@@ -139,17 +171,51 @@ public BulkFormatMapping build(
indexCastMapping.getIndexMapping(),
indexCastMapping.getCastMapping(),
partitionMapping,
+ trimmedKeyPair.getLeft(),
formatDiscover
.discover(formatIdentifier)
- .createReaderFactory(readDataRowType, readFilters),
+ .createReaderFactory(trimmedKeyPair.getRight(),
readFilters),
dataSchema,
readFilters);
}
- private List<DataField> readDataFields(TableSchema dataSchema) {
- List<DataField> dataFields = fieldsExtractor.apply(dataSchema);
+ static Pair<int[], RowType> trimKeyFields(
+ List<DataField> fieldsWithoutPartition, List<DataField>
fields) {
+ int[] map = new int[fieldsWithoutPartition.size()];
+ List<DataField> trimmedFields = new ArrayList<>();
+ Map<Integer, DataField> fieldMap = new HashMap<>();
+ Map<Integer, Integer> positionMap = new HashMap<>();
+
+ for (DataField field : fields) {
+ fieldMap.put(field.id(), field);
+ }
+
+ AtomicInteger index = new AtomicInteger();
+ for (int i = 0; i < fieldsWithoutPartition.size(); i++) {
+ DataField field = fieldsWithoutPartition.get(i);
+ boolean keyField = SpecialFields.isKeyField(field.id());
Review Comment:
Why checking this with id? In [Paimon
spec](https://paimon.apache.org/docs/master/concepts/spec/datafile/) we state
that key column names must start with `_KEY_`. Check their name instead.
##########
paimon-common/src/main/java/org/apache/paimon/table/SpecialFields.java:
##########
@@ -72,6 +72,8 @@ public class SpecialFields {
public static final String KEY_FIELD_PREFIX = "_KEY_";
public static final int KEY_FIELD_ID_START = SYSTEM_FIELD_ID_START;
+ // reserve 1000 for other system fields
+ public static final int KEY_FIELD_ID_END = Integer.MAX_VALUE - 1_000;
Review Comment:
So there can be at most $10^9$ keys and only $10^3$ system fields? What if I
need to add a new group of system fields?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]