Phixsura commented on PR #38699:
URL: https://github.com/apache/shardingsphere/pull/38699#issuecomment-4527982905

   > Decision
   > 
   > Merge Verdict: Not Mergeable
   > 
   > Reviewed Scope: PR latest head 
[bd2238a](https://github.com/apache/shardingsphere/commit/bd2238a2e61210b0ef694804d6f57dcecdab308d),
 changed files in RELEASE-NOTES.md and features/sharding/core/.../orderby, plus 
adjacent sharding result-merge paths: OrderByStreamMergedResult, 
ShardingDQLResultMerger, GroupByStreamMergedResult, GroupByMemoryMergedResult, 
GroupByValue, GroupByRowComparator, FetchStreamMergedResult, CompareUtils.
   > 
   > Not Reviewed Scope: full Maven/E2E execution locally, full 
dialect-by-dialect binary type behavior, parser modules because this PR does 
not touch parser code.
   > 
   > Need Expert Review: sharding result-merge maintainer review recommended; 
no security or supply-chain expert review needed.
   > 
   > Positive Feedback
   > 
   > The PR is aimed at the right root point. The issue reports MySQL VARBINARY 
ordering failing in both Proxy and JDBC because byte[] is not Comparable in 
OrderByValue, and the PR wraps byte[] before comparison in the plain order-by 
merge path. MySQL docs support byte-value based sorting for BINARY/VARBINARY 
and state all bytes matter for ORDER BY.
   > 
   > Major Issues
   > 
   > [P1] Adjacent group-by/distinct merge paths are still unsafe for byte[].
   > 
   > Symptom: ShardingDQLResultMerger processes group-by/distinct before plain 
order-by at ShardingDQLResultMerger.java (line 88). Those paths still use raw 
byte[]: GroupByValue stores raw values at GroupByValue.java (line 43), 
GroupByStreamMergedResult compares lists via equals at 
GroupByStreamMergedResult.java (line 82), and GroupByRowComparator still 
rejects non-Comparable byte arrays at GroupByRowComparator.java (line 53).
   > 
   > Risk: an adjacent valid query such as SELECT order_id, COUNT(*) FROM 
t_order GROUP BY order_id ORDER BY order_id over two shards can now pass the 
new OrderByValue wrapper but fail to merge equal byte contents because Java 
arrays compare by identity. SELECT DISTINCT order_id ... ORDER BY order_id has 
the same risk through the distinct-as-group-by branch.
   > 
   > Recommended action: reuse the same binary normalization semantics in group 
keys and memory-row ordering, then add tests for plain ORDER BY, GROUP BY ... 
ORDER BY, and DISTINCT ... ORDER BY on equal byte contents from different query 
results.
   > 
   > [P1] Validation is incomplete on the latest PR head.
   > 
   > Symptom: GitHub check-runs show 58 successful checks but 1 cancelled 
check: E2E - SQL (Stage 2) (proxy, Cluster, MySQL, 
dbtbl_with_readwrite_splitting), and the overall Actions run is cancelled: [run 
25996769398](https://github.com/apache/shardingsphere/actions/runs/25996769398).
   > 
   > Risk: the cancelled job is in a MySQL Proxy cluster sharding-related 
matrix, close to the affected runtime topology.
   > 
   > Recommended action: rerun CI to a fully green state, or provide equivalent 
scoped evidence with dependent modules, for example ./mvnw -pl 
features/sharding/core -am -DskipITs -Dspotless.skip=true 
-Dtest=ComparableByteArrayTest,OrderByValueTest 
-Dsurefire.failIfNoSpecifiedTests=false test.
   > 
   > Newly Introduced Issues
   > 
   > [P2] ComparableByteArray is public though it appears package-internal 
only. CODE_OF_CONDUCT.md asks for minimal access control at CODE_OF_CONDUCT.md 
(line 59). Please make it package-private unless there is a documented API 
reason.
   > 
   > Next Steps
   > 
   > Normalize byte arrays consistently across order-by, group-by, distinct, 
and memory comparator paths. Add counterexample tests with equal byte contents 
from different QueryResult instances. Rerun the cancelled CI job and keep the 
full check suite green. Re-run formatting/check gates required by AGENTS.md 
(line 45) and CODE_OF_CONDUCT.md (line 17).
   
   Thanks for the thorough review.
   
   On P1.1: You're right, the group-by and distinct paths route through 
`GroupByValue` and `GroupByRowComparator` before plain order-by even runs, so 
the `OrderByValue` wrapper doesn't help there. I'll extend the same binary 
normalization to `GroupByValue.getGroupByValues` (so two byte[] with equal 
contents are `.equals`-equal) and `GroupByRowComparator.compare` (so the 
`Comparable` precondition holds for byte[]), and add tests for `GROUP BY ... 
ORDER BY` and `DISTINCT ... ORDER BY` over equal byte contents from different 
shard results.
   
   On P1.2: I'll rerun CI on the next push and target a fully green state 
before asking for re-review.
   
   On P2: Will demote `ComparableByteArray` to package-private and relocate it 
to whichever package keeps all three callers (order-by, group-by, distinct) 
inside the same access scope.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to