leaves12138 commented on PR #7919:
URL: https://github.com/apache/paimon/pull/7919#issuecomment-4509237840
Thanks for the update. I found two blocking correctness/contract issues in
the current head (`a4143bf63109`).
1. `RTreeFileIndexWriter` does not accept the object type passed by the real
file-index write path for `ARRAY<DOUBLE>` columns.
`DataFileIndexWriter.FileIndexMaintainer.write()` calls
`fileIndexWriter.writeRecord(getter.getFieldOrNull(row))`, so an array field is
passed as Paimon's `InternalArray` implementation, e.g. `GenericArray` / binary
array, not as `double[]` or `java.util.List`. However,
`RTreeFileIndexWriter.extractPoint()` only handles `List` and `double[]`. A
normal write through the file-index path can therefore fail with:
```text
Cannot extract point from: org.apache.paimon.data.GenericArray
```
2. Production deserialization still loses the leaf flag for non-root nodes.
`RTreeFileIndexReader.deserializeNode()` only calls `node.setLeaf(isLeaf)` for
the root. Child nodes are always created with `new RTreeNode(..., false)`, and
the recursive call does not update their serialized `isLeaf` value. As a
result, leaf children under an internal node remain `isLeaf=false`; after
serializing a multi-level tree with `RTreeFileIndexWriter` and reading it back
with `RTreeFileIndex.createReader()`, an equality query for an existing point
can return `SKIP`.
I reproduced both with a small contract test:
- `writer.writeRecord(new GenericArray(new double[] {1.0, 2.0}))` throws
from `RTreeFileIndexWriter.extractPoint()`.
- Writing 200 points, serializing with `RTreeFileIndexWriter`, deserializing
with the production `RTreeFileIndexReader`, and querying `[50.0, 50.0]` returns
no match.
The existing tests pass because they mostly call `writer.write(new double[]
{...})` directly, bypassing `FileIndexWriter.writeRecord()`, and
`RTreeSerializationTest` uses its own test deserializer that constructs nodes
with the serialized `isLeaf` flag, bypassing the production reader bug.
Please fix these before merge:
1. Make the writer handle Paimon's `InternalArray` for `ARRAY<DOUBLE>`
values, and check whether reader literals need the same treatment.
2. Preserve `isLeaf` for every node during production deserialization, not
only the root.
3. Add regression coverage through the real `FileIndexWriter.writeRecord()`
/ `RTreeFileIndex.createReader()` path.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]