lordk911 opened a new issue #1476:
URL: https://github.com/apache/iceberg/issues/1476
I'm testing with Spark 3.0.1, CDH 5.14, and Iceberg 0.9.1, using spark-shell.
The catalog config is:
spark.sql.catalog.hadoop_prod org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.hadoop_prod.type hadoop
spark.sql.catalog.hadoop_prod.warehouse hdfs://hdfsnamespace/user/hive/warehouse
```
scala> val tsToExpire = System.currentTimeMillis() - (1000 * 60 * 60 * 5)
tsToExpire: Long = 1600378123947
scala> table.expireSnapshots().expireOlderThan(tsToExpire).commit()
20/09/18 10:29:20 WARN RemoveSnapshots: Manifests to delete:
hdfs://nameservice1/tmp/warehouse/ice/recmd_feedback_tb/metadata/ecaf23d3-083d-4ce5-bcdb-6768cbf952cc-m0.avro,
hdfs://nameservice1/tmp/warehouse/ice/recmd_feedback_tb/metadata/18b7ea50-7378-420e-b1a4-b73f099ad5c7-m0.avro,
hdfs://nameservice1/tmp/warehouse/ice/recmd_feedback_tb/metadata/85196053-4320-4a83-b5c8-ce45f34246d2-m0.avro,
hdfs://nameservice1/tmp/warehouse/ice/recmd_feedback_tb/metadata/0c988e30-f620-4981-9baa-29e546440a65-m0.avro,
hdfs://nameservice1/tmp/warehouse/ice/recmd_feedback_tb/metadata/716cce5b-5831-4d47-b69a-1847fc867356-m0.avro,
hdfs://nameservice1/tmp/warehouse/ice/recmd_feedback_tb/metadata/867e6411-41e4-485a-9819-29e85a93369b-m0.avro,
hdfs://nameservice1/tmp/warehouse/ice/recmd_feedback_tb/metadata/d1b7e3c8-1ded-4245-b015-7f8904a690b5-m0.avro,
hdfs://nameservice1/tmp/warehouse/ice/recmd_feedback_tb/metadata/56334be8-74b0-4966-994a-a2a660470825-m0.avro,
hdfs://nameservice1/tmp/warehouse/ice/recmd_feedback_tb/metadata/437af48e-b989-48fc-bffc-0687208f44da-m0.avro
20/09/18 10:29:20 WARN RemoveSnapshots: Manifests Lists to delete:
hdfs://nameservice1/tmp/warehouse/ice/recmd_feedback_tb/metadata/snap-2996380732886072886-1-ecaf23d3-083d-4ce5-bcdb-6768cbf952cc.avro,
hdfs://nameservice1/tmp/warehouse/ice/recmd_feedback_tb/metadata/snap-7976082847581312140-1-716cce5b-5831-4d47-b69a-1847fc867356.avro,
hdfs://nameservice1/tmp/warehouse/ice/recmd_feedback_tb/metadata/snap-7182735222234677122-1-18b7ea50-7378-420e-b1a4-b73f099ad5c7.avro,
hdfs://nameservice1/tmp/warehouse/ice/recmd_feedback_tb/metadata/snap-8860502416692177744-1-867e6411-41e4-485a-9819-29e85a93369b.avro,
hdfs://nameservice1/tmp/warehouse/ice/recmd_feedback_tb/metadata/snap-7216258923298830023-1-d1b7e3c8-1ded-4245-b015-7f8904a690b5.avro,
hdfs://nameservice1/tmp/warehouse/ice/recmd_feedback_tb/metadata/snap-6732163628332851015-1-85196053-4320-4a83-b5c8-ce45f34246d2.avro,
hdfs://nameservice1/tmp/warehouse/ice/recmd_feedback_tb/metadata/snap-4590522390924737287-1-437af48e-b989-48fc-bffc-0687208f44da.avro,
hdfs://nameservice1/tmp/warehouse/ice/recmd_feedback_tb/metadata/snap-7425044393989428200-1-67f056ad-64f9-4612-9645-e3a8f8e64728.avro,
hdfs://nameservice1/tmp/warehouse/ice/recmd_feedback_tb/metadata/snap-4183102213851551112-1-56334be8-74b0-4966-994a-a2a660470825.avro,
hdfs://nameservice1/tmp/warehouse/ice/recmd_feedback_tb/metadata/snap-841739982298587714-1-0c988e30-f620-4981-9baa-29e546440a65.avro
scala> table.expireSnapshots().retainLast(2).commit()
scala> table.expireSnapshots().expireOlderThan(tsToExpire).commit()
scala> spark.sql("select * from hadoop_prod.ice.recmd_feedback_tb.history").show(false)
+-----------------------+-------------------+-------------------+-------------------+
|made_current_at        |snapshot_id        |parent_id          |is_current_ancestor|
+-----------------------+-------------------+-------------------+-------------------+
|2020-09-18 09:28:55.788|6248485188751590692|7976082847581312140|true               |
+-----------------------+-------------------+-------------------+-------------------+
scala> spark.sql("select * from hadoop_prod.ice.recmd_feedback_tb.snapshots").show
+--------------------+-------------------+-------------------+---------+--------------------+--------------------+
|        committed_at|        snapshot_id|          parent_id|operation|       manifest_list|             summary|
+--------------------+-------------------+-------------------+---------+--------------------+--------------------+
|2020-09-17 17:22:...|7182735222234677122|               null|   append|hdfs://nameservic...|[spark.app.id -> ...|
|2020-09-17 17:22:...|6732163628332851015|7182735222234677122|   append|hdfs://nameservic...|[spark.app.id -> ...|
|2020-09-17 17:23:...|7216258923298830023|6732163628332851015|   append|hdfs://nameservic...|[spark.app.id -> ...|
|2020-09-17 17:23:...| 841739982298587714|7216258923298830023|   append|hdfs://nameservic...|[spark.app.id -> ...|
|2020-09-17 17:24:...|4590522390924737287| 841739982298587714|   append|hdfs://nameservic...|[spark.app.id -> ...|
|2020-09-17 17:25:...|7425044393989428200|4590522390924737287|   append|hdfs://nameservic...|[spark.app.id -> ...|
|2020-09-17 17:25:...|8860502416692177744|7425044393989428200|   append|hdfs://nameservic...|[spark.app.id -> ...|
|2020-09-17 17:26:...|2996380732886072886|8860502416692177744|   append|hdfs://nameservic...|[spark.app.id -> ...|
|2020-09-17 17:26:...|4183102213851551112|2996380732886072886|   append|hdfs://nameservic...|[spark.app.id -> ...|
|2020-09-17 17:27:...|7976082847581312140|4183102213851551112|   append|hdfs://nameservic...|[spark.app.id -> ...|
|2020-09-18 09:28:...|6248485188751590692|7976082847581312140|  replace|hdfs://nameservic...|[added-data-files...|
+--------------------+-------------------+-------------------+---------+--------------------+--------------------+
```
After expireSnapshots, the history table shows only one entry, but the snapshots
table still returns all of the old snapshots, and the data files are still on HDFS. Is this expected?
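To make it easier to compare the cutoff with the `committed_at` column, here is a minimal sketch that renders the `tsToExpire` value from the session above as a timestamp. It uses only `java.time` and can be pasted into spark-shell. It assumes that Spark displays `committed_at` in the session time zone, which normally defaults to the JVM's zone, so the second value is only a rough guide for the comparison.

```scala
import java.time.{Instant, ZoneId, ZonedDateTime}

// Cutoff value copied from the spark-shell output above.
val tsToExpire: Long = 1600378123947L

// The cutoff as a UTC instant.
val cutoffUtc: Instant = Instant.ofEpochMilli(tsToExpire)
println(cutoffUtc) // 2020-09-17T21:28:43.947Z

// The same cutoff in the JVM's default zone (assumption: this matches
// the zone Spark uses when it displays committed_at).
val cutoffLocal: ZonedDateTime = cutoffUtc.atZone(ZoneId.systemDefault())
println(cutoffLocal)
```

Any snapshot whose `committed_at` is earlier than this cutoff is one that `expireOlderThan(tsToExpire)` would be expected to consider for expiration.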