RussellSpitzer commented on pull request #2591: URL: https://github.com/apache/iceberg/pull/2591#issuecomment-845939886
Thanks for discussing it with me. I definitely want to make this efficient, and I know some of our future plans haven't been documented yet. One thing I really want to finish is distributed planning, where no machine ever gets the full scan plan.

> On May 21, 2021, at 12:11 AM, Jack Ye ***@***.***> wrote:
>
> > We talked about this previously as a possible post-merge, post-delete, post-rewrite sort of thing.
>
> Cool, that `cleanUnreferencedDeleteFiles()` was just a divergent thought; great that we already thought about it.
>
> > If file C, for example, is the correct size and we never need to rewrite it, we never clean up those deletes, so we still have to make another sort of action to clean up those files.
>
> Yes, that goes back to what I was thinking before: if we can have an option to force-check the delete file and avoid filtering it out of the rewrite, then it should work.
>
> But I think I am starting to see where you are coming from. If this is done as a different action, then we can save the write time when the file being read does not contain any rows to delete in the delete file. To enable such a check in Spark, it cannot use the same code path that fully reads all the rows and writes them back. So it probably does not make sense to add delete functionality from that perspective. Thanks for the clarification!
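The "file C" scenario above can be illustrated with a small toy model. This is not Iceberg's API; the file names, the `rewrite_undersized` helper, and the target size are all illustrative. It sketches why a size-based compaction alone never cleans up delete files attached to files that are already the right size:

```python
# Toy model of a size-based rewrite (hypothetical, not Iceberg's API).
# Each data file has a size and a set of associated delete files.
# Compaction merges only undersized files; applied deletes are dropped
# for the merged output, but correctly sized files are skipped entirely.

TARGET_SIZE = 128  # MB, illustrative target file size

data_files = {
    "A": {"size": 40, "deletes": {"d1"}},
    "B": {"size": 50, "deletes": {"d1", "d2"}},
    "C": {"size": 128, "deletes": {"d3"}},  # already at target size
}

def rewrite_undersized(files, target):
    """Compact all undersized files into one new file, applying (and
    thereby dropping) their delete files. Files at or above the target
    size are left untouched, deletes and all."""
    small = {k: v for k, v in files.items() if v["size"] < target}
    kept = {k: dict(v) for k, v in files.items() if v["size"] >= target}
    if small:
        merged_size = sum(v["size"] for v in small.values())
        # Deletes for the merged files were applied during the rewrite.
        kept["+".join(sorted(small))] = {"size": merged_size, "deletes": set()}
    return kept

after = rewrite_undersized(data_files, TARGET_SIZE)
print(after["C"]["deletes"])  # file C still carries d3 after the rewrite
```

Since C is never selected for rewriting, `d3` stays referenced indefinitely, which is exactly why the thread lands on needing either a separate cleanup action or an option that forces delete-carrying files into the rewrite.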
