marton-bod opened a new pull request #2779:
URL: https://github.com/apache/hive/pull/2779


   ### What changes were proposed in this pull request?
   Revamp the algorithm which detects reordered columns. The current 
implementation is faulty. The new idea is to look for the column with the 
largest absolute difference between its index in the HMS schema and its index 
in the Iceberg schema.
   E.g. Current schema: A, B, C, D
   New schema (A moved to the end): B, C, D, A
   Index difference for each column: A: 3, B: 1, C: 1, D: 1
   So we know that A was the one that got moved.
   
   In general, there are 3 scenarios: 
   1) highest index diff = 0 -> there were no reorders
   2) highest index diff = 1 -> two adjacent columns got swapped with each 
other:
   E.g. A, B, C, D -> A, C, B, D
   In this case we cannot identify for sure which one got moved by the user, 
but that's okay because we can do either moveAfter(C, A) or moveAfter(B, C); 
both produce the same column order, so they should be equivalent operations
   3) highest index diff > 1 -> the reordered column can be identified 
definitively
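
The three scenarios can be sketched roughly as follows. This is an illustrative Python sketch, not the code in this PR: the function names `index_diffs`, `detect_reorder` and `move_after` are hypothetical, and it assumes both schemas contain the same column names and that at most one column was moved.

```python
from typing import Dict, List, Optional, Tuple


def index_diffs(current: List[str], new: List[str]) -> Dict[str, int]:
    """Absolute position change of each column between the two schemas."""
    new_pos = {name: i for i, name in enumerate(new)}
    return {name: abs(new_pos[name] - i) for i, name in enumerate(current)}


def move_after(columns: List[str], col: str, after: Optional[str]) -> List[str]:
    """Hypothetical helper: return a copy with `col` placed right after `after`.

    `after=None` means "move to the first position".
    """
    rest = [c for c in columns if c != col]
    idx = 0 if after is None else rest.index(after) + 1
    return rest[:idx] + [col] + rest[idx:]


def detect_reorder(
    current: List[str], new: List[str]
) -> Optional[Tuple[str, Optional[str]]]:
    """Return (moved column, its predecessor in the new schema), or None.

    Assumption: same column names in both lists, at most one moved column.
    """
    diffs = index_diffs(current, new)
    # Pick the column with the highest index difference. On a tie
    # (scenario 2, adjacent swap) the first such column is chosen.
    moved = max(current, key=lambda name: diffs[name])
    if diffs[moved] == 0:
        return None  # scenario 1: no reorder
    pos = new.index(moved)
    return moved, (new[pos - 1] if pos > 0 else None)


current = ["A", "B", "C", "D"]

# Scenario 3: A moved to the end, so A has the highest diff.
print(index_diffs(current, ["B", "C", "D", "A"]))
print(detect_reorder(current, ["B", "C", "D", "A"]))

# Scenario 2: adjacent swap, either move yields the same order.
print(move_after(current, "C", "A") == move_after(current, "B", "C"))
```

Applying the detected move to the current schema with `move_after` reproduces the new schema in each of these cases, which is the property the detection relies on.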
   
   ### Why are the changes needed?
   Fixes a correctness problem in the detection of reordered columns.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   New unit tests.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


