Alexey Kudinkin created HUDI-2814:
-------------------------------------
Summary: Address issues w/ Z-order Layout Optimization
Key: HUDI-2814
URL: https://issues.apache.org/jira/browse/HUDI-2814
Project: Apache Hudi
Issue Type: Bug
Components: Index
Reporter: Alexey Kudinkin
During extensive testing, the following issues were discovered, which we're
planning to address in an upcoming PR:
* The data-skipping sequence incorrectly handles queries that reference columns
which are not Z-sorted (it simply ignores such columns, while it should
abandon pruning altogether[1])
* An exception within the file-pruning sequence should not fail the overall
query (in the worst case it should fall back to a full scan)
* The merging sequence prefers records from the old Z-index table, while it
should prefer those from the new one.
* After the clustering columns change, the Z-index should simply be overwritten
(currently it does the opposite: it skips updating the index when the old and
new tables diverge in schema)
* Incorrect type conversions (for example, Decimal is converted to Double)
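
To illustrate why the Decimal-to-Double conversion in the last item matters
for pruning, here is a minimal sketch in plain Python (not Hudi code). The
column value and the `may_contain` helper are hypothetical; the point is only
that a min/max bound stored as a double can drift past the real value, so a
file that does contain the value gets skipped.

```python
from decimal import Decimal

def may_contain(min_v: Decimal, max_v: Decimal, value: Decimal) -> bool:
    """Hypothetical range check: can a file with these bounds hold `value`?"""
    return min_v <= value <= max_v

# Hypothetical file holding a single high-precision decimal value,
# so its true min and max statistics are both equal to that value.
value = Decimal("123456789012345678.91")
exact_min = exact_max = value

# Lossy path: the statistics are converted to a double before being stored.
# A double cannot represent this many significant digits, so the stored
# bounds are rounded to the nearest representable number.
lossy_min = Decimal(float(exact_min))
lossy_max = Decimal(float(exact_max))

# With exact bounds the file is kept; with lossy bounds it may be pruned.
kept_with_exact = may_contain(exact_min, exact_max, value)
kept_with_lossy = may_contain(lossy_min, lossy_max, value)
```

In this sketch the rounded minimum ends up above the real value, so the
lossy check rejects a file that actually matches the query.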
Additionally, we're planning to beef up the test suite of the current Z-index
implementation, making sure that all critical flows of Z-indexing have
appropriate coverage.
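
As one example of a flow such tests could pin down, the sketch below models
the intended merge and overwrite behavior from the list above in plain Python.
It is an assumption-laden illustration, not the Hudi implementation: index
rows are modeled as dicts keyed by a hypothetical `file` field, and
`update_index` is a made-up name.

```python
def update_index(old_rows, new_rows, old_columns, new_columns):
    """Return the index rows that should be persisted (hypothetical model).

    - If the indexed (clustering) columns changed, the new index simply
      overwrites the old one.
    - Otherwise rows are merged by file name, and on conflict the row from
      the NEW index wins.
    """
    if set(old_columns) != set(new_columns):
        # Schemas diverge: overwrite instead of skipping the update.
        return list(new_rows)

    merged = {row["file"]: row for row in old_rows}
    # Later update() calls win, so new rows replace stale old ones.
    merged.update({row["file"]: row for row in new_rows})
    return list(merged.values())


old = [{"file": "f1", "x_max": 10}, {"file": "f2", "x_max": 20}]
new = [{"file": "f2", "x_max": 25}, {"file": "f3", "x_max": 30}]

same_schema = update_index(old, new, ["x"], ["x"])
changed_schema = update_index(old, new, ["x"], ["x", "y"])
```

A test along these lines would assert that `f2` carries the new statistics
after a merge, and that only the new rows survive a schema change.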
[1] Actually, with more advanced analysis we could still prune the search
space, but that would require a substantially more sophisticated analysis,
which is beyond our current focus.
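
The conservative behavior described in the first two items (give up pruning
when a non-indexed column appears, and fall back to a full scan on any error)
can be sketched as follows. This is plain Python under stated assumptions,
not Hudi's API: `prune_files`, the `(lo, hi)` predicate encoding, and the
shape of `index_stats` are all hypothetical.

```python
def prune_files(all_files, index_stats, predicates, indexed_columns):
    """Return the files that must be scanned (hypothetical model).

    predicates:  {column: (lo, hi)} inclusive range filters, ANDed together
    index_stats: {file: {column: (min, max)}} per-file column statistics
    """
    try:
        # A filter on a column that is not in the index cannot be evaluated
        # here, so abandon pruning rather than silently ignore the column.
        if any(col not in indexed_columns for col in predicates):
            return list(all_files)

        kept = []
        for f in all_files:
            stats = index_stats[f]  # may raise if the index is incomplete
            overlaps = all(
                stats[col][0] <= hi and lo <= stats[col][1]
                for col, (lo, hi) in predicates.items()
            )
            if overlaps:
                kept.append(f)
        return kept
    except Exception:
        # Pruning is only an optimization: on any failure, scan everything.
        return list(all_files)


files = ["a", "b"]
stats = {"a": {"x": (0, 10)}, "b": {"x": (20, 30)}}

pruned = prune_files(files, stats, {"x": (5, 8)}, {"x"})
not_indexed = prune_files(files, stats, {"x": (5, 8), "y": (1, 2)}, {"x"})
broken_index = prune_files(files, {"a": {"x": (0, 10)}}, {"x": (5, 8)}, {"x"})
```

Here only the first call prunes anything; the other two return every file,
which is slower but never drops data the query needs.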
--
This message was sent by Atlassian Jira
(v8.20.1#820001)