lordk911 commented on issue #1476:
URL: https://github.com/apache/iceberg/issues/1476#issuecomment-694660247


   One thing I forgot to mention: before expiring the snapshots, I ran the 
rewriteDataFiles operation to rewrite all the small data files into one big 
file. That generated the latest snapshot. I then tried to expire all the other 
snapshots, so I think all the small data files should have been deleted.
   
   ```
   scala> spark.sql("select manifest_list from hadoop_prod.ice.recmd_feedback_tb.snapshots").show(false)
   +-------------------------------------------------------------------------------------------------------------------------------------+
   |manifest_list                                                                                                                        |
   +-------------------------------------------------------------------------------------------------------------------------------------+
   |hdfs://nameservice1/tmp/warehouse/ice/recmd_feedback_tb/metadata/snap-6248485188751590692-1-7d3b47bc-a247-4701-9e05-780b1f53ba58.avro|
   +-------------------------------------------------------------------------------------------------------------------------------------+
   
   hdfs dfs -text /tmp/warehouse/ice/recmd_feedback_tb/metadata/snap-6248485188751590692-1-7d3b47bc-a247-4701-9e05-780b1f53ba58.avro
   20/09/18 13:18:00 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
   
{"manifest_path":"hdfs://nameservice1/tmp/warehouse/ice/recmd_feedback_tb/metadata/7d3b47bc-a247-4701-9e05-780b1f53ba58-m9.avro","manifest_length":5976,"partition_spec_id":0,"added_snapshot_id":{"long":6248485188751590692},"added_data_files_count":{"int":1},"existing_data_files_count":{"int":0},"deleted_data_files_count":{"int":0},"partitions":{"array":[]},"added_rows_count":{"long":19859},"existing_rows_count":{"long":0},"deleted_rows_count":{"long":0}}
   
{"manifest_path":"hdfs://nameservice1/tmp/warehouse/ice/recmd_feedback_tb/metadata/7d3b47bc-a247-4701-9e05-780b1f53ba58-m4.avro","manifest_length":5858,"partition_spec_id":0,"added_snapshot_id":{"long":6248485188751590692},"added_data_files_count":{"int":0},"existing_data_files_count":{"int":0},"deleted_data_files_count":{"int":1},"partitions":{"array":[]},"added_rows_count":{"long":0},"existing_rows_count":{"long":0},"deleted_rows_count":{"long":1}}
   
{"manifest_path":"hdfs://nameservice1/tmp/warehouse/ice/recmd_feedback_tb/metadata/7d3b47bc-a247-4701-9e05-780b1f53ba58-m6.avro","manifest_length":5923,"partition_spec_id":0,"added_snapshot_id":{"long":6248485188751590692},"added_data_files_count":{"int":0},"existing_data_files_count":{"int":0},"deleted_data_files_count":{"int":1},"partitions":{"array":[]},"added_rows_count":{"long":0},"existing_rows_count":{"long":0},"deleted_rows_count":{"long":2}}
   
{"manifest_path":"hdfs://nameservice1/tmp/warehouse/ice/recmd_feedback_tb/metadata/7d3b47bc-a247-4701-9e05-780b1f53ba58-m7.avro","manifest_length":5888,"partition_spec_id":0,"added_snapshot_id":{"long":6248485188751590692},"added_data_files_count":{"int":0},"existing_data_files_count":{"int":0},"deleted_data_files_count":{"int":1},"partitions":{"array":[]},"added_rows_count":{"long":0},"existing_rows_count":{"long":0},"deleted_rows_count":{"long":2}}
   
{"manifest_path":"hdfs://nameservice1/tmp/warehouse/ice/recmd_feedback_tb/metadata/7d3b47bc-a247-4701-9e05-780b1f53ba58-m3.avro","manifest_length":6151,"partition_spec_id":0,"added_snapshot_id":{"long":6248485188751590692},"added_data_files_count":{"int":0},"existing_data_files_count":{"int":0},"deleted_data_files_count":{"int":3},"partitions":{"array":[]},"added_rows_count":{"long":0},"existing_rows_count":{"long":0},"deleted_rows_count":{"long":4}}
   
{"manifest_path":"hdfs://nameservice1/tmp/warehouse/ice/recmd_feedback_tb/metadata/7d3b47bc-a247-4701-9e05-780b1f53ba58-m1.avro","manifest_length":6094,"partition_spec_id":0,"added_snapshot_id":{"long":6248485188751590692},"added_data_files_count":{"int":0},"existing_data_files_count":{"int":0},"deleted_data_files_count":{"int":2},"partitions":{"array":[]},"added_rows_count":{"long":0},"existing_rows_count":{"long":0},"deleted_rows_count":{"long":3}}
   
{"manifest_path":"hdfs://nameservice1/tmp/warehouse/ice/recmd_feedback_tb/metadata/7d3b47bc-a247-4701-9e05-780b1f53ba58-m2.avro","manifest_length":5833,"partition_spec_id":0,"added_snapshot_id":{"long":6248485188751590692},"added_data_files_count":{"int":0},"existing_data_files_count":{"int":0},"deleted_data_files_count":{"int":1},"partitions":{"array":[]},"added_rows_count":{"long":0},"existing_rows_count":{"long":0},"deleted_rows_count":{"long":1}}
   
{"manifest_path":"hdfs://nameservice1/tmp/warehouse/ice/recmd_feedback_tb/metadata/7d3b47bc-a247-4701-9e05-780b1f53ba58-m5.avro","manifest_length":6454,"partition_spec_id":0,"added_snapshot_id":{"long":6248485188751590692},"added_data_files_count":{"int":0},"existing_data_files_count":{"int":0},"deleted_data_files_count":{"int":3},"partitions":{"array":[]},"added_rows_count":{"long":0},"existing_rows_count":{"long":0},"deleted_rows_count":{"long":3373}}
   
{"manifest_path":"hdfs://nameservice1/tmp/warehouse/ice/recmd_feedback_tb/metadata/7d3b47bc-a247-4701-9e05-780b1f53ba58-m0.avro","manifest_length":6439,"partition_spec_id":0,"added_snapshot_id":{"long":6248485188751590692},"added_data_files_count":{"int":0},"existing_data_files_count":{"int":0},"deleted_data_files_count":{"int":3},"partitions":{"array":[]},"added_rows_count":{"long":0},"existing_rows_count":{"long":0},"deleted_rows_count":{"long":8042}}
   
{"manifest_path":"hdfs://nameservice1/tmp/warehouse/ice/recmd_feedback_tb/metadata/7d3b47bc-a247-4701-9e05-780b1f53ba58-m8.avro","manifest_length":6473,"partition_spec_id":0,"added_snapshot_id":{"long":6248485188751590692},"added_data_files_count":{"int":0},"existing_data_files_count":{"int":0},"deleted_data_files_count":{"int":3},"partitions":{"array":[]},"added_rows_count":{"long":0},"existing_rows_count":{"long":0},"deleted_rows_count":{"long":8431}}
   ```
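
As a quick sanity check on the manifest list above, the per-manifest counters can be summed. The sketch below only re-types the numbers shown in the `hdfs dfs -text` output (manifests `m0` to `m9`); the `ManifestTotals` object and `Entry` case class are hypothetical names made up for illustration and are not part of Iceberg.

```scala
// Hypothetical helper: sums the counters copied by hand from the manifest list above.
object ManifestTotals {
  // One row per manifest_path entry in the snapshot's manifest list.
  final case class Entry(manifest: String,
                         addedFiles: Int,
                         deletedFiles: Int,
                         addedRows: Long,
                         deletedRows: Long)

  // Values taken from added_data_files_count, deleted_data_files_count,
  // added_rows_count and deleted_rows_count, in the order they were printed.
  val entries: Seq[Entry] = Seq(
    Entry("m9", 1, 0, 19859L, 0L),
    Entry("m4", 0, 1, 0L, 1L),
    Entry("m6", 0, 1, 0L, 2L),
    Entry("m7", 0, 1, 0L, 2L),
    Entry("m3", 0, 3, 0L, 4L),
    Entry("m1", 0, 2, 0L, 3L),
    Entry("m2", 0, 1, 0L, 1L),
    Entry("m5", 0, 3, 0L, 3373L),
    Entry("m0", 0, 3, 0L, 8042L),
    Entry("m8", 0, 3, 0L, 8431L)
  )

  val addedFiles: Int = entries.map(_.addedFiles).sum
  val deletedFiles: Int = entries.map(_.deletedFiles).sum
  val addedRows: Long = entries.map(_.addedRows).sum
  val deletedRows: Long = entries.map(_.deletedRows).sum

  def main(args: Array[String]): Unit = {
    println(s"files: +$addedFiles / -$deletedFiles")          // files: +1 / -18
    println(s"rows:  +$addedRows / -$deletedRows")            // rows:  +19859 / -19859
    println(s"row counts match: ${addedRows == deletedRows}") // row counts match: true
  }
}
```

By these counters, the latest snapshot adds 1 data file with 19859 rows and marks 18 data files (also 19859 rows in total) as deleted, which is consistent with the rewrite having replaced the small files with a single file.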


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


