Alexey Kudinkin created HUDI-2814:
-------------------------------------

             Summary: Address issues w/ Z-order Layout Optimization
                 Key: HUDI-2814
                 URL: https://issues.apache.org/jira/browse/HUDI-2814
             Project: Apache Hudi
          Issue Type: Bug
          Components: Index
            Reporter: Alexey Kudinkin


During extensive testing, the following issues were discovered, which we're 
planning to address in the upcoming PR:
 * Data-skipping seq incorrectly handles cases when the query references 
columns that are not Z-sorted (it simply ignores this fact, whereas it should 
abandon pruning altogether[1])
 * An exception w/in the file-pruning seq should not affect the overall query 
(in the worst case it should fall back to a full scan)
 * Merging seq prefers records from the old Z-index table, whereas it should 
prefer those from the new one.
 * After the clustering columns change, Z-index should simply overwrite the 
index (currently it does the opposite: it skips updating the index when the 
old and new tables diverge in schemas)
 * Incorrect type conversions (e.g., Decimal is converted to Double)
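
The intended behaviour of the first two items can be sketched roughly as 
follows. This is only an illustration in Python, not the Hudi implementation: 
`candidate_files`, the `stats` layout (file name -> column -> (min, max)) and 
the representation of a query as per-column inclusive ranges are all 
hypothetical names and simplifications made up for this sketch.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical stand-in for the per-file column statistics kept by the index:
# file name -> {column name -> (min value, max value)}
ColumnStats = Dict[str, Dict[str, Tuple[int, int]]]


def candidate_files(
    all_files: List[str],
    stats: ColumnStats,
    indexed_columns: Set[str],
    query_ranges: Dict[str, Tuple[int, int]],
) -> List[str]:
    """Return the files that may contain rows matching the query ranges."""
    try:
        # The query touches a column the index does not cover: do not try to
        # prune on the remaining columns, keep every file instead.
        if not set(query_ranges) <= indexed_columns:
            return list(all_files)

        kept = []
        for name in all_files:
            file_stats = stats[name]  # raises KeyError if the index is incomplete
            # Keep the file only if every queried range overlaps its [min, max].
            if all(
                file_stats[col][0] <= hi and lo <= file_stats[col][1]
                for col, (lo, hi) in query_ranges.items()
            ):
                kept.append(name)
        return kept
    except Exception:
        # Any failure inside pruning degrades to a full scan rather than
        # failing the query.
        return list(all_files)
```

With `stats = {"f1": {"a": (0, 10)}, "f2": {"a": (20, 30)}}`, a query on `a` 
alone prunes `f2`, while a query that also filters on a non-indexed column 
`b`, or a lookup that hits a file missing from `stats`, returns both files.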

Additionally, we're planning to beef up the test suite of the current Z-index 
implementation, making sure that all critical flows of Z-indexing have 
appropriate coverage.
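
As one example of what such coverage might check, the Decimal-to-Double 
conversion listed above is lossy in general. The snippet below only 
demonstrates that general fact using Python's `decimal` module. The values 
are made up, and how exactly the loss manifests inside the Z-index is not 
shown here.

```python
from decimal import Decimal

# Two distinct decimal values with more significant digits than a 64-bit
# double can represent exactly.
low = Decimal("12345678901234567.89")
high = Decimal("12345678901234568.10")

# Converting through a double does not round-trip the original value...
round_tripped = Decimal(float(low))
lossy = round_tripped != low

# ...and distinct decimals can collapse onto the same double.
collapsed = float(low) == float(high)
```

Both `lossy` and `collapsed` come out `True` here, so any min/max bounds 
stored as doubles could no longer tell these two values apart.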

[1] Actually, with more advanced analysis we could still prune the search 
space, but this would require a substantially more sophisticated analysis, 
which is beyond our current focus.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
