leaves12138 commented on PR #7934:
URL: https://github.com/apache/paimon/pull/7934#issuecomment-4517157505

   Thanks for the update. The direct `ARRAY<ROW<...>>` and `MAP<..., ROW<...>>` 
cases now look covered, and the existing targeted tests pass for me. I found 
two remaining edge cases:
   
   1. Nested projection still does not propagate through nested collection 
levels. `NestedProjectedRow.create` only registers an array projection when the 
immediate array element type is `ROW`, and 
`ProjectedInternalArray#getArray/#getMap` delegates to the underlying array 
as-is. So a type like `ARRAY<ARRAY<ROW<a INT, b INT>>>` projected as 
`ARRAY<ARRAY<ROW<b INT>>>` still returns `a` when reading the inner row:
   
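   ```java
   import static org.assertj.core.api.Assertions.assertThat;

   import java.util.Arrays;
   import java.util.Collections;

   import org.apache.paimon.types.ArrayType;
   import org.apache.paimon.types.DataField;
   import org.apache.paimon.types.IntType;
   import org.apache.paimon.types.RowType;

   RowType elementType = new RowType(Arrays.asList(
       new DataField(10, "a", new IntType()),
       new DataField(11, "b", new IntType())));
   RowType dataSchema = new RowType(Collections.singletonList(
       new DataField(0, "arr", new ArrayType(new ArrayType(elementType)))));

   RowType projectedElementType = new RowType(Collections.singletonList(
       new DataField(11, "b", new IntType())));
   RowType projectedSchema = new RowType(Collections.singletonList(
       new DataField(0, "arr", new ArrayType(new ArrayType(projectedElementType)))));

   // arr = [[ROW<a=1, b=100>]]
   // `row` is the row read with projectedSchema from data written with dataSchema
   // (reader setup omitted). Position 0 of the projected inner row is field `b`.
   assertThat(row.getArray(0).getArray(0).getRow(0, 1).getInt(0)).isEqualTo(100);
   ```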
   
   This currently returns `1`. The same pattern should apply to `MAP` 
values/keys that contain nested collections of pruned rows, because 
`FormatReaderMapping.pruneDataType()` recurses through `ARRAY` and `MAP` at 
arbitrary depth.
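
   For reference, here is a minimal sketch of what recursive collection projection could look like. It uses a hypothetical toy type model (`IntT`, `RowT`, `ArrayT`, `MapT`) and plain Java collections in place of Paimon's `InternalRow`/`InternalArray`/`InternalMap`, and `project` is an illustrative helper, not an existing Paimon API. For brevity it copies eagerly instead of wrapping the data in a lazy view. The point it shows is that the recursion descends into element, key, and value types at every `ARRAY`/`MAP` level and only drops fields once it reaches a `ROW`, so `ARRAY<ARRAY<ROW<a, b>>>` projected to `ARRAY<ARRAY<ROW<b>>>` yields `b`.

   ```java
   import java.util.ArrayList;
   import java.util.LinkedHashMap;
   import java.util.List;
   import java.util.Map;

   public class NestedProjectionSketch {

       // Hypothetical toy type model, not Paimon's org.apache.paimon.types classes.
       interface Type {}
       record IntT() implements Type {}
       record Field(int id, String name, Type type) {}
       record RowT(List<Field> fields) implements Type {}
       record ArrayT(Type element) implements Type {}
       record MapT(Type key, Type value) implements Type {}

       // Reshapes `value` (laid out as dataType) into projectedType.
       // Rows and arrays are List<Object>, maps are Map<Object, Object>.
       static Object project(Object value, Type dataType, Type projectedType) {
           if (value == null) {
               return null;
           }
           if (projectedType instanceof RowT pr && dataType instanceof RowT dr) {
               List<?> row = (List<?>) value;
               List<Object> out = new ArrayList<>();
               for (Field pf : pr.fields()) {
                   // Look up the source field by id, then recurse into its type.
                   int idx = indexOf(dr, pf.id());
                   out.add(project(row.get(idx), dr.fields().get(idx).type(), pf.type()));
               }
               return out;
           }
           if (projectedType instanceof ArrayT pa && dataType instanceof ArrayT da) {
               // Recurse at every ARRAY level, not only when the element is a ROW.
               List<Object> out = new ArrayList<>();
               for (Object element : (List<?>) value) {
                   out.add(project(element, da.element(), pa.element()));
               }
               return out;
           }
           if (projectedType instanceof MapT pm && dataType instanceof MapT dm) {
               // Keys and values are projected independently, at any depth.
               Map<Object, Object> out = new LinkedHashMap<>();
               for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                   out.put(project(e.getKey(), dm.key(), pm.key()),
                           project(e.getValue(), dm.value(), pm.value()));
               }
               return out;
           }
           return value; // leaf values pass through unchanged
       }

       static int indexOf(RowT row, int fieldId) {
           for (int i = 0; i < row.fields().size(); i++) {
               if (row.fields().get(i).id() == fieldId) {
                   return i;
               }
           }
           throw new IllegalArgumentException("Field id " + fieldId + " not in data type");
       }

       public static void main(String[] args) {
           RowT element = new RowT(List.of(
                   new Field(10, "a", new IntT()), new Field(11, "b", new IntT())));
           RowT projectedElement = new RowT(List.of(new Field(11, "b", new IntT())));

           // arr = [[ROW<a=1, b=100>]], read as ARRAY<ARRAY<ROW<b>>>
           Object arr = List.of(List.of(List.of(1, 100)));
           Object projected = project(arr,
                   new ArrayT(new ArrayT(element)), new ArrayT(new ArrayT(projectedElement)));
           System.out.println(projected); // [[[100]]]
       }
   }
   ```

   A `MAP<INT, ARRAY<ROW<...>>>` goes through the same path, with the map value recursing through the array before reaching the row.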
   
   2. Projecting a `MULTISET` field can fail before reading. The branch for 
`DataTypeRoot.MAP || DataTypeRoot.MULTISET` casts the type to `MapType`, but 
`MultisetType` is not a `MapType`. For example, reading only the `ms` field 
from `ROW<id INT, ms MULTISET<STRING>>` throws:
   
   ```text
   java.lang.ClassCastException: org.apache.paimon.types.MultisetType cannot be cast to org.apache.paimon.types.MapType
       at org.apache.paimon.utils.NestedProjectedRow.create(NestedProjectedRow.java:158)
       at org.apache.paimon.format.row.RowFileFormat.createReaderFactory(RowFileFormat.java:55)
   ```
   
   Could you make the collection projection logic recursive for nested 
`ARRAY`/`MAP` cases, and handle `MULTISET` separately (or just leave it as an 
unprojected map-like value when its element type is not being pruned)?
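
   For the `MULTISET` case, a minimal sketch of a dispatch that gives each collection kind its own branch, so a multiset is only inspected through its element type and is never cast to a map type. When nothing below it is pruned (data and projected types are equal), the value is left as a pass-through, which matches the fallback suggested above. The toy types and `needsProjection` are hypothetical stand-ins, not Paimon's `DataTypeRoot`/`MultisetType` handling in `NestedProjectedRow`, and the sketch assumes, following the "map-like value" wording, that a multiset is stored as element-to-count pairs whose counts are never pruned.

   ```java
   import java.util.List;

   public class MultisetDispatchSketch {

       // Hypothetical toy type model, not Paimon's org.apache.paimon.types classes.
       interface Type {}
       record IntT() implements Type {}
       record StringT() implements Type {}
       record Field(int id, String name, Type type) {}
       record RowT(List<Field> fields) implements Type {}
       record ArrayT(Type element) implements Type {}
       record MapT(Type key, Type value) implements Type {}
       record MultisetT(Type element) implements Type {}

       // Returns true if values of dataType must be wrapped to expose projectedType.
       static boolean needsProjection(Type dataType, Type projectedType) {
           if (dataType.equals(projectedType)) {
               return false; // nothing below this point was pruned: reuse the value as-is
           }
           if (dataType instanceof RowT && projectedType instanceof RowT) {
               return true; // fields were pruned somewhere inside this row
           }
           if (dataType instanceof ArrayT da && projectedType instanceof ArrayT pa) {
               return needsProjection(da.element(), pa.element());
           }
           if (dataType instanceof MapT dm && projectedType instanceof MapT pm) {
               return needsProjection(dm.key(), pm.key())
                       || needsProjection(dm.value(), pm.value());
           }
           if (dataType instanceof MultisetT dms && projectedType instanceof MultisetT pms) {
               // Separate branch: a multiset only has an element type, so it is never
               // cast to MapT. Assumed layout: element -> count, counts are not pruned.
               return needsProjection(dms.element(), pms.element());
           }
           throw new IllegalArgumentException(
                   "Incompatible types: " + dataType + " vs " + projectedType);
       }

       public static void main(String[] args) {
           RowT ab = new RowT(List.of(
                   new Field(10, "a", new IntT()), new Field(11, "b", new IntT())));
           RowT b = new RowT(List.of(new Field(11, "b", new IntT())));

           // MULTISET<STRING> read unchanged: no cast, no wrapper
           System.out.println(
                   needsProjection(new MultisetT(new StringT()), new MultisetT(new StringT()))); // false
           // MULTISET<ROW<a, b>> read as MULTISET<ROW<b>>: recurse into the element type
           System.out.println(needsProjection(new MultisetT(ab), new MultisetT(b))); // true
       }
   }
   ```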
   

