xuzifu666 commented on PR #7919:
URL: https://github.com/apache/paimon/pull/7919#issuecomment-4514611557
> Thanks for the update. I found two blocking correctness/contract issues in
the current head (`a4143bf63109`).
>
> 1. `RTreeFileIndexWriter` does not accept the object type passed by the
real file-index write path for `ARRAY<DOUBLE>` columns.
`DataFileIndexWriter.FileIndexMaintainer.write()` calls
`fileIndexWriter.writeRecord(getter.getFieldOrNull(row))`, so an array field is
passed as Paimon's `InternalArray` implementation, e.g. `GenericArray` / binary
array, not as `double[]` or `java.util.List`. However,
`RTreeFileIndexWriter.extractPoint()` only handles `List` and `double[]`. A
normal write through the file-index path can therefore fail with:
> ```
> Cannot extract point from: org.apache.paimon.data.GenericArray
> ```
> 2. Production deserialization still loses the leaf flag for non-root
nodes. `RTreeFileIndexReader.deserializeNode()` only calls
`node.setLeaf(isLeaf)` for the root. Child nodes are always created with `new
RTreeNode(..., false)`, and the recursive call does not update their serialized
`isLeaf` value. As a result, leaf children under an internal node remain
`isLeaf=false`; after serializing a multi-level tree with
`RTreeFileIndexWriter` and reading it back with
`RTreeFileIndex.createReader()`, an equality query for an existing point can
return `SKIP`.
>
> I reproduced both with a small contract test:
>
> * `writer.writeRecord(new GenericArray(new double[] {1.0, 2.0}))` throws
from `RTreeFileIndexWriter.extractPoint()`.
> * Writing 200 points, serializing with `RTreeFileIndexWriter`,
deserializing with the production `RTreeFileIndexReader`, and querying `[50.0,
50.0]` returns no match.
>
> The existing tests pass because they mostly call `writer.write(new
double[] {...})` directly, bypassing `FileIndexWriter.writeRecord()`, and
`RTreeSerializationTest` uses its own test deserializer that constructs nodes
with the serialized `isLeaf` flag, bypassing the production reader bug.
>
> Please fix these before merge:
>
> 1. Make the writer handle Paimon's `InternalArray` for `ARRAY<DOUBLE>`
values, and check whether reader literals need the same treatment.
> 2. Preserve `isLeaf` for every node during production deserialization, not
only the root.
> 3. Add regression coverage through the real
`FileIndexWriter.writeRecord()` / `RTreeFileIndex.createReader()` path.
Thank you for the suggestion!I made relevant changes, PTAL @leaves12138
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]