Stephen-Robin edited a comment on issue #2195:
URL: https://github.com/apache/iceberg/issues/2195#issuecomment-771781055


   > That does seem pretty dangerous, although I haven't seen this happen
   > before. Do you have the snapshot summary for the compaction operation? It
   > should hopefully also provide some hints as to what is going on. It would also
   > be great if you had a programmatic reproduction of the issue.
   
   @RussellSpitzer Thanks for your reply.
   I just ran my previous experiment again and will describe my reproduction process in detail below.
   Creating the table and inserting the data are described above. Then:
   
![image](https://user-images.githubusercontent.com/77377842/106631262-8cc3e100-65b7-11eb-87a8-dc99ac4fb786.png)
   
![image](https://user-images.githubusercontent.com/77377842/106631285-92212b80-65b7-11eb-9b42-0cf61d83a827.png)
   
   The data files are as follows:
   
![image](https://user-images.githubusercontent.com/77377842/106633145-7880e380-65b9-11eb-8b87-57b154be15ce.png)
   
   Then execute compaction:
   
![image](https://user-images.githubusercontent.com/77377842/106631412-b7ae3500-65b7-11eb-9eb0-514ff8a065d2.png)
   
   During the execution, I debugged the job:
   
![image](https://user-images.githubusercontent.com/77377842/106632069-533fa580-65b8-11eb-8539-feac7d78f7df.png)
   
   The variable `combinedScanTasks` includes the small split of the divided file, part B (1407488 bytes), but it does not include part A (10 MB).
   
![image](https://user-images.githubusercontent.com/77377842/106632142-6488b200-65b8-11eb-838f-1ff9d8b81a6a.png)
   The variable `currentDataFiles` includes the original 11 MB file.
   That 11 MB file will be deleted during the commit, but its 10 MB part (part A) has not been rewritten.
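   The failure mode described above can be modelled with a small Python sketch. It is a toy model and not Iceberg code: `plan_splits`, `simulate_rewrite`, the file name, the 10 MiB target size, and above all the filter that drops full-size splits are assumptions made for illustration. It only shows how deleting a whole file while rewriting just some of its splits loses the bytes of the dropped split.

```python
from dataclasses import dataclass

MB = 1024 * 1024


@dataclass(frozen=True)
class SplitTask:
    """A byte range of one data file, as produced by split planning."""
    path: str
    start: int
    length: int


def plan_splits(file_sizes, split_size):
    """Cut every file into consecutive ranges of at most split_size bytes."""
    tasks = []
    for path, size in file_sizes.items():
        offset = 0
        while offset < size:
            length = min(split_size, size - offset)
            tasks.append(SplitTask(path, offset, length))
            offset += length
    return tasks


def simulate_rewrite(file_sizes, target_size):
    """Model a compaction that rewrites only some splits but deletes whole files."""
    tasks = plan_splits(file_sizes, target_size)
    # ASSUMPTION (hypothetical filter): splits that already fill the target
    # size are treated as "nothing to compact" and are dropped from the work list.
    kept = [t for t in tasks if t.length < target_size]
    rewritten_bytes = sum(t.length for t in kept)
    # The commit removes every original file, including ranges never rewritten.
    deleted_bytes = sum(file_sizes.values())
    return {
        "kept_tasks": kept,
        "rewritten_bytes": rewritten_bytes,
        "deleted_bytes": deleted_bytes,
        "lost_bytes": deleted_bytes - rewritten_bytes,
    }


if __name__ == "__main__":
    # One ~11 MB file: part A (10 MiB, assumed) plus part B (1407488 bytes).
    report = simulate_rewrite({"data-11mb.parquet": 10 * MB + 1407488}, 10 * MB)
    print("rewritten:", report["rewritten_bytes"], "lost:", report["lost_bytes"])
```

   In this model only part B survives the rewrite, and the bytes of part A are gone once the original file is removed, which matches what the debugger shows for `combinedScanTasks` and `currentDataFiles`.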
   The relevant snapshot files are as follows:
   
![image](https://user-images.githubusercontent.com/77377842/106635461-d1517b80-65bb-11eb-8356-99dfcb2c47ba.png)
   
   
![image](https://user-images.githubusercontent.com/77377842/106635316-a8c98180-65bb-11eb-9d17-696729a6200b.png)
   
   
![image](https://user-images.githubusercontent.com/77377842/106632492-c5b08580-65b8-11eb-9b2b-e0d414f83a38.png)
   
   Showing the rows of the table:
   
![image](https://user-images.githubusercontent.com/77377842/106632545-d660fb80-65b8-11eb-81da-864e66f508a5.png)
   
![image](https://user-images.githubusercontent.com/77377842/106632610-e973cb80-65b8-11eb-9441-a566041c5c67.png)
   We can see that the 11 MB data file has been lost.
   
   The data files are now as follows:
   
![image](https://user-images.githubusercontent.com/77377842/106632989-4cfdf900-65b9-11eb-8213-21f534f997d8.png)
   The red box marks the newly rewritten data.
   
   
   I think this shows that the problem described in my issue may really exist.
   Looking forward to your reply.
   
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


