Phixsura commented on PR #38699: URL: https://github.com/apache/shardingsphere/pull/38699#issuecomment-4527982905
> Decision > > Merge Verdict: Not Mergeable > > Reviewed Scope: PR latest head [bd2238a](https://github.com/apache/shardingsphere/commit/bd2238a2e61210b0ef694804d6f57dcecdab308d), changed files in RELEASE-NOTES.md and features/sharding/core/.../orderby, plus adjacent sharding result-merge paths: OrderByStreamMergedResult, ShardingDQLResultMerger, GroupByStreamMergedResult, GroupByMemoryMergedResult, GroupByValue, GroupByRowComparator, FetchStreamMergedResult, CompareUtils. > > Not Reviewed Scope: full Maven/E2E execution locally, full dialect-by-dialect binary type behavior, parser modules because this PR does not touch parser code. > > Need Expert Review: sharding result-merge maintainer review recommended; no security or supply-chain expert review needed. > > Positive Feedback > > The PR is aimed at the right root point. The issue reports MySQL VARBINARY ordering failing in both Proxy and JDBC because byte[] is not Comparable in OrderByValue, and the PR wraps byte[] before comparison in the plain order-by merge path. MySQL docs support byte-value based sorting for BINARY/VARBINARY and state all bytes matter for ORDER BY. > > Major Issues > > [P1] Adjacent group-by/distinct merge paths are still unsafe for byte[]. > > Symptom: ShardingDQLResultMerger processes group-by/distinct before plain order-by at ShardingDQLResultMerger.java (line 88). Those paths still use raw byte[]: GroupByValue stores raw values at GroupByValue.java (line 43), GroupByStreamMergedResult compares lists via equals at GroupByStreamMergedResult.java (line 82), and GroupByRowComparator still rejects non-Comparable byte arrays at GroupByRowComparator.java (line 53). > > Risk: an adjacent valid query such as SELECT order_id, COUNT(*) FROM t_order GROUP BY order_id ORDER BY order_id over two shards can now pass the new OrderByValue wrapper but fail to merge equal byte contents because Java arrays compare by identity. SELECT DISTINCT order_id ... ORDER BY order_id has the same risk through the distinct-as-group-by branch. > > Recommended action: reuse the same binary normalization semantics in group keys and memory-row ordering, then add tests for plain ORDER BY, GROUP BY ... ORDER BY, and DISTINCT ... ORDER BY on equal byte contents from different query results. > > [P1] Validation is incomplete on the latest PR head. > > Symptom: GitHub check-runs show 58 successful checks but 1 cancelled check: E2E - SQL (Stage 2) (proxy, Cluster, MySQL, dbtbl_with_readwrite_splitting), and the overall Actions run is cancelled: [run 25996769398](https://github.com/apache/shardingsphere/actions/runs/25996769398). > > Risk: the cancelled job is in a MySQL Proxy cluster sharding-related matrix, close to the affected runtime topology. > > Recommended action: rerun CI to a fully green state, or provide equivalent scoped evidence with dependent modules, for example ./mvnw -pl features/sharding/core -am -DskipITs -Dspotless.skip=true -Dtest=ComparableByteArrayTest,OrderByValueTest -Dsurefire.failIfNoSpecifiedTests=false test. > > Newly Introduced Issues > > [P2] ComparableByteArray is public though it appears package-internal only. CODE_OF_CONDUCT.md asks for minimal access control at CODE_OF_CONDUCT.md (line 59). Please make it package-private unless there is a documented API reason. > > Next Steps > > Normalize byte arrays consistently across order-by, group-by, distinct, and memory comparator paths. Add counterexample tests with equal byte contents from different QueryResult instances. Rerun the cancelled CI job and keep the full check suite green. Re-run formatting/check gates required by AGENTS.md (line 45) and CODE_OF_CONDUCT.md (line 17). Thanks for the thorough review. On P1.1: You're right, the group-by and distinct paths route through `GroupByValue` and `GroupByRowComparator` before plain order-by even runs, so the `OrderByValue` wrapper doesn't help there. I'll extend the same binary normalization to `GroupByValue.getGroupByValues` (so two byte[] with equal contents are `.equals`-equal) and `GroupByRowComparator.compare` (so the `Comparable` precondition holds for byte[]), and add tests for `GROUP BY ... ORDER BY` and `DISTINCT ... ORDER BY` over equal byte contents from different shard results. On P1.2: I'll rerun CI on the next push and target a fully green state before asking for re-review. On P2: Will demote `ComparableByteArray` to package-private and relocate it to whichever package keeps all three callers (order-by, group-by, distinct) inside the same access scope. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
