leaves12138 commented on PR #7934:
URL: https://github.com/apache/paimon/pull/7934#issuecomment-4517157505
Thanks for the update. The direct `ARRAY<ROW<...>>` and `MAP<..., ROW<...>>`
cases now look covered, and the existing targeted tests pass for me. I found
two remaining edge cases:
1. Nested projection still does not propagate through nested collection
levels. `NestedProjectedRow.create` only registers an array projection when the
immediate array element type is `ROW`, while
`ProjectedInternalArray#getArray` and `#getMap` delegate to the underlying array
as-is. So a type like `ARRAY<ARRAY<ROW<a INT, b INT>>>` projected as
`ARRAY<ARRAY<ROW<b INT>>>` still returns `a` when reading the inner row:
```java
RowType elementType = new RowType(Arrays.asList(
        new DataField(10, "a", new IntType()),
        new DataField(11, "b", new IntType())));
RowType dataSchema = new RowType(Collections.singletonList(
        new DataField(0, "arr", new ArrayType(new ArrayType(elementType)))));
RowType projectedElementType = new RowType(Collections.singletonList(
        new DataField(11, "b", new IntType())));
RowType projectedSchema = new RowType(Collections.singletonList(
        new DataField(0, "arr", new ArrayType(new ArrayType(projectedElementType)))));
// arr = [[ROW<a=1, b=100>]]
assertThat(row.getArray(0).getArray(0).getRow(0, 1).getInt(0)).isEqualTo(100);
```
This currently returns `1`. The same problem should affect `MAP` keys/values
that contain nested collections of pruned rows, because
`FormatReaderMapping.pruneDataType()` recurses through `ARRAY` and `MAP` at
arbitrary depth, while the projection wrapper only looks one level down.
2. Projecting a `MULTISET` field can fail before reading. The branch for
`DataTypeRoot.MAP || DataTypeRoot.MULTISET` casts the type to `MapType`, but
`MultisetType` is not a `MapType`. For example, reading only the `ms` field
from `ROW<id INT, ms MULTISET<STRING>>` throws:
```text
java.lang.ClassCastException: org.apache.paimon.types.MultisetType cannot be cast to org.apache.paimon.types.MapType
    at org.apache.paimon.utils.NestedProjectedRow.create(NestedProjectedRow.java:158)
    at org.apache.paimon.format.row.RowFileFormat.createReaderFactory(RowFileFormat.java:55)
```
Could you make the collection projection logic recursive for nested
`ARRAY`/`MAP` cases, and handle `MULTISET` separately (or just leave it as an
unprojected map-like value when its element type is not being pruned)?
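
To illustrate the shape I have in mind, here is a rough, self-contained sketch. It does not use the real Paimon classes: `Type`, `IntT`, `ArrayT`, `MapT`, `MultisetT`, `RowT`, and `project` are all hypothetical stand-ins, and values are modeled as plain `Object[]`/`List`/`Map` rather than `InternalRow`/`InternalArray`/`InternalMap`. The point is only that the projection recurses through every collection level and that the multiset type is never cast to the map type.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RecursiveProjectionSketch {
    // Hypothetical stand-ins for the real type classes, kept minimal on purpose.
    interface Type {}
    static final class IntT implements Type {}
    static final class ArrayT implements Type {
        final Type element;
        ArrayT(Type element) { this.element = element; }
    }
    static final class MapT implements Type {
        final Type key;
        final Type value;
        MapT(Type key, Type value) { this.key = key; this.value = value; }
    }
    // Deliberately NOT a subclass of MapT, mirroring MultisetType vs. MapType.
    static final class MultisetT implements Type {
        final Type element;
        MultisetT(Type element) { this.element = element; }
    }
    static final class RowT implements Type {
        final List<Integer> ids = new ArrayList<>();
        final List<Type> types = new ArrayList<>();
        RowT field(int id, Type type) { ids.add(id); types.add(type); return this; }
    }

    // Value model for the sketch: rows are Object[], arrays are List, maps are Map.
    static Object project(Object value, Type data, Type projected) {
        if (value == null) {
            return null;
        }
        if (data instanceof RowT && projected instanceof RowT) {
            RowT d = (RowT) data;
            RowT p = (RowT) projected;
            Object[] in = (Object[]) value;
            Object[] out = new Object[p.ids.size()];
            for (int i = 0; i < out.length; i++) {
                // Match by field id; assumes every projected id exists in the data row.
                int src = d.ids.indexOf(p.ids.get(i));
                out[i] = project(in[src], d.types.get(src), p.types.get(i));
            }
            return out;
        }
        if (data instanceof ArrayT && projected instanceof ArrayT) {
            List<Object> out = new ArrayList<>();
            for (Object e : (List<?>) value) {
                out.add(project(e, ((ArrayT) data).element, ((ArrayT) projected).element));
            }
            return out;
        }
        if (data instanceof MapT && projected instanceof MapT) {
            MapT d = (MapT) data;
            MapT p = (MapT) projected;
            Map<Object, Object> out = new LinkedHashMap<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                out.put(project(e.getKey(), d.key, p.key), project(e.getValue(), d.value, p.value));
            }
            return out;
        }
        // MULTISET and primitives pass through untouched: no cast to MapT is attempted.
        return value;
    }

    // Mirrors the failing case: arr = [[ROW<a=1, b=100>]] projected to [[ROW<b>]].
    static int demo() {
        RowT element = new RowT().field(10, new IntT()).field(11, new IntT());
        RowT projectedElement = new RowT().field(11, new IntT());
        List<Object> inner = new ArrayList<>();
        inner.add(new Object[] {1, 100});
        List<Object> outer = new ArrayList<>();
        outer.add(inner);
        Object result = project(outer, new ArrayT(new ArrayT(element)),
                new ArrayT(new ArrayT(projectedElement)));
        Object[] row = (Object[]) ((List<?>) ((List<?>) result).get(0)).get(0);
        return (Integer) row[0];
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this sketch `demo()` returns `100`, which is the behavior the assertion in case 1 expects. An eager copy is used here only for brevity; the actual fix would presumably keep lazy wrappers and recurse when creating them.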