[GitHub] [iceberg] ddcprg opened a new issue, #7192: OPTIMIZE operation prunes data not visible to the table

via GitHub Fri, 24 Mar 2023 02:12:33 -0700


ddcprg opened a new issue, #7192:
URL: https://github.com/apache/iceberg/issues/7192


   ### Feature Request / Improvement
   
   The **OPTIMIZE** job drops the data in data files for any column that is not 
mapped in the table schema.
   
   For example, suppose there is a table with column A and two data 
files: one with columns A and B, and another with columns A and C. When the 
optimize operation runs, it merges both files into a 
single file with only column A, i.e. data for columns B and C is lost.
   
   We would expect the job to merge the file schemas into a single schema with 
columns A, B and C to avoid data loss.
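
The difference between the observed and the expected behavior can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Athena or Iceberg implement compaction: data files are modeled as in-memory lists of row dicts, and the helper names (`compact_projected`, `compact_union`, `union_columns`) are hypothetical.

```python
# Hypothetical in-memory model: each "data file" is a list of row dicts.
file_1 = [{"A": 1, "B": "x"}]   # file with columns A and B
file_2 = [{"A": 2, "C": 3.5}]   # file with columns A and C
table_columns = ["A"]           # the table schema only maps column A


def compact_projected(files, columns):
    """Observed behavior: rewrite rows keeping only the table's columns."""
    return [{c: row.get(c) for c in columns} for rows in files for row in rows]


def union_columns(files):
    """Collect every column seen in any file, in first-seen order."""
    seen = []
    for rows in files:
        for row in rows:
            for c in row:
                if c not in seen:
                    seen.append(c)
    return seen


def compact_union(files):
    """Expected behavior: rewrite rows using the union of the file schemas."""
    cols = union_columns(files)
    # Columns missing from a given file are filled with None.
    return [{c: row.get(c) for c in cols} for rows in files for row in rows]


projected = compact_projected([file_1, file_2], table_columns)
merged = compact_union([file_1, file_2])

print(projected)  # B and C are gone
print(merged)     # A, B and C are all preserved
```

In this sketch the projected rewrite keeps only column A, while the union rewrite keeps B and C and fills the column a file never had with `None`.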
   
   
   
   ### Query engine
   
   Athena


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


