contrueCT opened a new issue, #3053:
URL: https://github.com/apache/hugegraph/issues/3053

   ### Bug Type
   
   logic (logic design issue)
   
   ### Before submit
   
   - [x] I have searched the existing 
[Issues](https://github.com/apache/hugegraph/issues) and 
[FAQ](https://hugegraph.apache.org/docs/guides/faq/) and confirmed that there 
are no identical or duplicate problems
   
   ### Environment
   
   - Server Version: master / PR #2994 based branch
   - Backend: HStore
   - OS: GitHub Actions hstore CI
   
   
   ### Expected & Actual behavior
   
   While investigating the hstore CI failure in #2994, we found an existing 
latent issue in the HStore range-index scan path.
   
   For range-index queries with `limit`, `offset`, or paging, HugeGraph's upper 
layer assumes that backend range scan results are returned in global 
range-index-key order. It also assumes that the returned `PageState.position()` 
can be reused as a HugeGraph range cursor.
   
   However, HStore's multi-node/tablet scan path can return entries in backend 
iterator order instead of globally sorted key order. The page state is also an 
internal storage cursor, not necessarily a HugeGraph range-index key. This can 
make range-index queries return unstable ordering or skip valid entries when 
paging is involved.
   
   One concrete failure exposed by #2994 was:
   
   ```java
   graph.traversal().V().hasLabel("person")
        .has("birth", P.between(date2013, date2016))
        .limit(2)
        .toList();
   ```
   
   The expected range-index order is `2013 -> 2014 -> 2015`, but the HStore 
scan returned entries in the order `2014 -> 2013 -> 2015`, so `limit(2)` 
selected the wrong first two entries.
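   
   The effect can be reproduced outside HugeGraph with a small, self-contained sketch. It assumes (this is not confirmed by the HStore code) that the three index entries live on two tablets and that the scan simply concatenates the per-tablet iterators. The class and method names are hypothetical and exist only for illustration.
   
   ```java
   import java.util.ArrayList;
   import java.util.Comparator;
   import java.util.List;
   import java.util.stream.Collectors;
   
   // Illustration only: none of these names are HugeGraph or HStore APIs.
   class TabletOrderDemo {
   
       // Assumed behaviour: per-tablet results are appended in iterator order,
       // so the combined stream is sorted within a tablet but not globally.
       static List<Integer> concatThenLimit(List<List<Integer>> tablets, int limit) {
           List<Integer> all = new ArrayList<>();
           for (List<Integer> tablet : tablets) {
               all.addAll(tablet);
           }
           return all.stream().limit(limit).collect(Collectors.toList());
       }
   
       // What the upper layer expects: global key order first, then the limit.
       static List<Integer> sortThenLimit(List<List<Integer>> tablets, int limit) {
           return tablets.stream()
                         .flatMap(List::stream)
                         .sorted(Comparator.naturalOrder())
                         .limit(limit)
                         .collect(Collectors.toList());
       }
   
       public static void main(String[] args) {
           // Hypothetical placement of the birth-year keys on two tablets.
           List<List<Integer>> tablets = List.of(List.of(2014), List.of(2013, 2015));
           System.out.println(concatThenLimit(tablets, 2)); // [2014, 2013]
           System.out.println(sortThenLimit(tablets, 2));   // [2013, 2014]
       }
   }
   ```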
   
   Another paging-related failure showed that after the first page, the page 
position was an HStore internal cursor. Reusing it as the range scan start 
could skip valid range-index entries.
   
   In #2994 we added a narrow workaround in GraphIndexTransaction: for HStore 
range-index queries whose visible result depends on limit, offset, or paging, 
the index layer reads the matched range-index entries, sorts them by 
range-index value, and slices them at the HugeGraph layer. Unbounded 
range-index scans still use the original streaming path to avoid disturbing 
count, joint-index, and cleanup paths.
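   
   The sort-then-slice idea behind the workaround can be sketched roughly as follows. This is not the code from #2994: the entry type (`Map.Entry<Long, String>` standing in for index value plus element id), the tie-break on element id, and the `sortAndSlice` name are all assumptions made for illustration.
   
   ```java
   import java.util.ArrayList;
   import java.util.Comparator;
   import java.util.List;
   import java.util.Map;
   
   // Illustration only: a stand-in for sorting matched range-index entries
   // and applying offset/limit above the backend.
   class SortedSliceSketch {
   
       // Key = range-index value, value = element id (both hypothetical).
       static List<Map.Entry<Long, String>> sortAndSlice(
               List<Map.Entry<Long, String>> matched, int offset, int limit) {
           List<Map.Entry<Long, String>> sorted = new ArrayList<>(matched);
           // Order by index value; break ties by element id so that the
           // result is deterministic across runs.
           Comparator<Map.Entry<Long, String>> order =
                   Map.Entry.<Long, String>comparingByKey()
                            .thenComparing(Map.Entry.<Long, String>comparingByValue());
           sorted.sort(order);
           // Clamp the slice bounds; long arithmetic avoids int overflow.
           int from = Math.min(offset, sorted.size());
           int to = (int) Math.min((long) offset + limit, sorted.size());
           return new ArrayList<>(sorted.subList(from, to));
       }
   
       public static void main(String[] args) {
           List<Map.Entry<Long, String>> matched = List.of(
                   Map.entry(2014L, "v2"), Map.entry(2013L, "v1"), Map.entry(2015L, "v3"));
           System.out.println(sortAndSlice(matched, 0, 2)); // [2013=v1, 2014=v2]
       }
   }
   ```
   
   The cost of this approach is that all matched entries are materialized before slicing, which is why it is only reasonable for queries that are already bounded by limit, offset, or paging.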
   
   This workaround fixes the immediate user-visible correctness issue, but the 
lower-level contract is still unclear.
   
   ### Expected behavior
   HStore range scans should have a clear and reliable contract:
   
   - If HugeGraph range-index scan semantics require ordered results, HStore 
should return globally sorted entries across node/tablet iterators.
   - `PageState.position()` should have a well-defined meaning. It should be 
clear whether it is a backend-internal cursor or a HugeGraph key cursor.
   - Range-index paging should not skip valid entries or depend on accidental 
backend iterator order.
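   
   To make the "HugeGraph key cursor" interpretation concrete, here is a minimal sketch of paging where the page position is the last returned key and the next page resumes strictly after it. It assumes unique, totally ordered keys held in a `NavigableSet`; the `page` method and its parameters are hypothetical and do not correspond to any HugeGraph or HStore API.
   
   ```java
   import java.util.ArrayList;
   import java.util.List;
   import java.util.NavigableSet;
   import java.util.TreeSet;
   
   // Illustration only: paging with a range-key cursor over sorted keys.
   class KeyCursorPaging {
   
       // Returns up to pageSize keys strictly greater than the cursor.
       // A null cursor means "start from the first key".
       static List<Long> page(NavigableSet<Long> keys, Long cursor, int pageSize) {
           NavigableSet<Long> rest =
                   (cursor == null) ? keys : keys.tailSet(cursor, false);
           List<Long> out = new ArrayList<>();
           for (Long key : rest) {
               if (out.size() == pageSize) {
                   break;
               }
               out.add(key);
           }
           return out;
       }
   
       public static void main(String[] args) {
           NavigableSet<Long> keys = new TreeSet<>(List.of(2014L, 2013L, 2015L));
           List<Long> first = page(keys, null, 2);
           System.out.println(first);                      // [2013, 2014]
           Long cursor = first.get(first.size() - 1);      // last key of page 1
           System.out.println(page(keys, cursor, 2));      // [2015]
       }
   }
   ```
   
   A cursor of this kind only works if the scan itself is globally ordered by key. A backend-internal cursor can be equally valid, but then it has to be treated as opaque and never fed back in as a range start.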
   
   ### Possible fix direction
   A more complete fix should probably be handled in the HStore store-client / 
scan iterator layer:
   
   - define whether IdRangeQuery results must be globally ordered by key;
   - merge multiple node/tablet iterators by key order when serving ordered 
range scans;
   - separate backend-internal page cursor semantics from HugeGraph range-key 
cursor semantics;
   - add HStore-specific regression tests for:
     - range index + limit;
     - range index + offset;
     - range index + paging across multiple pages;
     - cross-node/tablet range scans;
     - count / joint-index / left-index cleanup paths to avoid regressions.
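   
   The "merge multiple node/tablet iterators by key order" item could take the shape of a standard k-way merge over a priority queue. The sketch below is generic Java, not HStore code: `MergingScanIterator` is a hypothetical name, and it assumes every source iterator is already sorted by the same comparator.
   
   ```java
   import java.util.ArrayList;
   import java.util.Comparator;
   import java.util.Iterator;
   import java.util.List;
   import java.util.NoSuchElementException;
   import java.util.PriorityQueue;
   
   // Illustration only: merges already-sorted iterators into one globally
   // sorted iterator, holding one buffered element per source.
   class MergingScanIterator<T> implements Iterator<T> {
   
       // A source's current smallest element plus the remainder of that source.
       private static final class Head<T> {
           final T value;
           final Iterator<T> rest;
   
           Head(T value, Iterator<T> rest) {
               this.value = value;
               this.rest = rest;
           }
       }
   
       private final PriorityQueue<Head<T>> heap;
   
       MergingScanIterator(List<Iterator<T>> sources, Comparator<? super T> order) {
           Comparator<Head<T>> byValue = (a, b) -> order.compare(a.value, b.value);
           this.heap = new PriorityQueue<>(byValue);
           for (Iterator<T> source : sources) {
               if (source.hasNext()) {
                   this.heap.add(new Head<>(source.next(), source));
               }
           }
       }
   
       @Override
       public boolean hasNext() {
           return !this.heap.isEmpty();
       }
   
       @Override
       public T next() {
           Head<T> head = this.heap.poll();
           if (head == null) {
               throw new NoSuchElementException();
           }
           // Refill the heap from the source that just yielded an element.
           if (head.rest.hasNext()) {
               this.heap.add(new Head<>(head.rest.next(), head.rest));
           }
           return head.value;
       }
   
       public static void main(String[] args) {
           List<Iterator<Integer>> sources = List.of(
                   List.of(2014).iterator(), List.of(2013, 2015).iterator());
           Iterator<Integer> merged =
                   new MergingScanIterator<>(sources, Comparator.<Integer>naturalOrder());
           List<Integer> out = new ArrayList<>();
           merged.forEachRemaining(out::add);
           System.out.println(out); // [2013, 2014, 2015]
       }
   }
   ```
   
   Because the merge stays streaming, a `limit` can stop it early without reading every tablet to the end, which is the main advantage over sorting in the upper layer.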
   
   ### Related context
   This was exposed during #2994, but it does not seem to be caused by the 
query-condition refactoring itself. The PR only made the latent HStore issue 
visible in CI.
   
   ### Vertex/Edge example
   
   ```javascript
   
   ```
   
   ### Schema [VertexLabel, EdgeLabel, IndexLabel]
   
   ```javascript
   
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.