rdblue commented on pull request #1947:
URL: https://github.com/apache/iceberg/pull/1947#issuecomment-748246932


   @aokolnychyi, I agree with the idea of adding a flag to disable the global sort.
It is probably best to make this specific to copy-on-write, because delta writes
will need to be sorted by `_file` and `_pos` for deletes, and we expect the inserts
to be much, much smaller than the copy-on-write data. If we aren't rewriting
retained rows, I think the global sort (with a repartition, as you said) would
be much cheaper.
   
   For sorting by `_file` and `_pos`, what if we only did that for existing
rows? We can discard those columns for updated rows. That way, we rewrite the data
files as though the rows were deleted and append the inserts and updates
together. We may even want to do this in all cases: always prepend `_file` and
`_pos` to whatever sort order we inject.
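
   To make the ordering idea concrete, here is a minimal sketch in plain Python (not Spark or Iceberg code). It assumes retained rows carry `_file` and `_pos`, that those columns have been discarded (set to `None`) for updated and inserted rows, and that rows without them should sort after the retained rows. The helper name `merge_sort_key` and the sample rows are hypothetical and exist only for illustration.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def merge_sort_key(row: Row, sort_cols: List[str]) -> Tuple:
    """Build a sort key that prepends (_file, _pos) to the table sort order.

    Assumption: updated and inserted rows have _file/_pos set to None, and
    they are placed after all retained rows, ordered by sort_cols only.
    """
    is_new = row.get("_file") is None
    # Placeholder values keep the tuple elements comparable across rows.
    file_key = "" if is_new else row["_file"]
    pos_key = -1 if is_new else row["_pos"]
    return (is_new, file_key, pos_key) + tuple(row[c] for c in sort_cols)


rows = [
    {"id": 3, "_file": "b.parquet", "_pos": 0},  # retained row
    {"id": 9, "_file": None, "_pos": None},      # updated row, columns discarded
    {"id": 1, "_file": "a.parquet", "_pos": 2},  # retained row
    {"id": 2, "_file": None, "_pos": None},      # inserted row
    {"id": 7, "_file": "a.parquet", "_pos": 0},  # retained row
]

ordered = sorted(rows, key=lambda r: merge_sort_key(r, ["id"]))
ordered_ids = [r["id"] for r in ordered]
print(ordered_ids)
```

   Retained rows come out in their original file and position order, so each data file is rewritten as though the changed rows had simply been deleted, while updates and inserts are grouped together at the end in the injected sort order. In an actual Spark plan the null ordering would need to be chosen explicitly to get the same placement.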




