Hi Everyone,

Currently, the Spark procedures *expire_snapshots* and *remove_orphan_files*
operate on a single table per call.

Today, anyone who needs to run GC across an entire catalog has to run these
procedures manually for every table.
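
To make the pain point concrete, here is a minimal sketch (plain Python, not
part of Iceberg) of the kind of loop a user ends up writing today. The helper
name build_expire_calls and the table 'db.other' are made up for illustration;
retain_last is an existing expire_snapshots option. In a Spark job, each
generated statement would then be passed to spark.sql(...).

```python
from typing import Dict, Iterable, List


def build_expire_calls(catalog: str, tables: Iterable[str],
                       options: Dict[str, str]) -> List[str]:
    """Build one expire_snapshots CALL statement per table (hypothetical helper)."""
    calls = []
    for table in tables:
        # The table identifier is the only argument that differs between calls.
        args = ["table => '" + table + "'"]
        # Option values are inserted verbatim, so they must already be valid SQL literals.
        args += [name + " => " + value for name, value in options.items()]
        calls.append("CALL " + catalog + ".system.expire_snapshots(" + ", ".join(args) + ")")
    return calls


if __name__ == "__main__":
    # 'db.other' is an illustrative table name, not one from a real catalog.
    for stmt in build_expire_calls("hive_prod", ["db.sample", "db.other"],
                                   {"retain_last": "5"}):
        print(stmt)  # in a Spark session: spark.sql(stmt)
```

The list of tables still has to be collected by hand (for example with
SHOW TABLES per database), which is the part a bulk procedure would remove.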

Would it be a good idea to support running them in bulk, either per catalog,
per namespace, or for a list of tables?

Current syntax:

CALL hive_prod.system.expire_snapshots(table => 'db.sample', <Options>)

Proposed syntax (something along these lines):

Per Namespace/Database

CALL hive_prod.system.expire_snapshots(database => 'db', <Options>)

Per Catalog

CALL hive_prod.system.expire_snapshots(<Options>)

Multiple Tables

CALL hive_prod.system.expire_snapshots(tables => Array('db1.table1',
'db2.table2'), <Options>)

PS: Individual catalogs may need exceptions. For example, Nessie doesn't
support GC other than through the Nessie CLI, and the Hadoop catalog can't
list all the namespaces.


Regards,
Naveen Kumar
