Hi,

We found that after calling the deleteCube() REST API, the cube itself was deleted, but the data generated while building the cube was left behind:

1. Folders beginning with "kylin_job_meta" and "meta_tmp" in the local machine's tomcat/tmp;
2. Tables beginning with "KYLIN_" in HBase;
3. Folders beginning with "kylin-" in HDFS;
4. Tables beginning with "kylin_intermediate" in Hive.

We then found that running com.kylinolap.job.hadoop.cube.StorageCleanupJob cleans up all the HBase and HDFS data, but it does not yet handle the Hive tables. So we added some code to StorageCleanupJob.java and improved the storage cleanup job as follows (minimal sketches of the ideas are pasted after the list):

1. Completed cleanUnusedIntermediateHiveTable(Configuration conf), which removes all unused Hive tables beginning with "kylin_intermediate";
2. Added cleanUnusedHBaseTables(Configuration conf, String[] uuids), which removes the HBase tables belonging to the given job uuids;
3. Added cleanUnusedHdfsFiles(Configuration conf, String[] uuids), which removes the HDFS data belonging to the given job uuids;
4. Added cleanUnusedIntermediateHiveTable(Configuration conf, String[] uuids), which removes the Hive tables belonging to the given job uuids;
5. Added the command line options "--file" and "--uuids", so the uuids can be supplied in a file and/or as a string.
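To make the Hive part concrete, here is a minimal sketch of the idea behind cleanUnusedIntermediateHiveTable, not the actual patch: it assumes HiveServer2 is reachable over JDBC (the real job may well go through the Hive CLI or metastore client instead), and the jdbcUrl parameter and the uuid-suffix matching rule are illustrative assumptions (the patch's version takes a Hadoop Configuration, as listed above).

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class IntermediateHiveTableCleanerSketch {
    private static final String TABLE_PREFIX = "kylin_intermediate_";

    // Drop the intermediate tables left behind by the jobs whose uuids are
    // passed in. jdbcUrl is e.g. "jdbc:hive2://host:10000/default" (assumed).
    public static void cleanUnusedIntermediateHiveTable(String jdbcUrl, Set<String> uuids) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(jdbcUrl);
             Statement stmt = conn.createStatement()) {
            List<String> toDrop = new ArrayList<String>();
            ResultSet rs = stmt.executeQuery("SHOW TABLES");
            while (rs.next()) {
                String table = rs.getString(1);
                if (!table.startsWith(TABLE_PREFIX))
                    continue;
                // Assumed naming convention: the table name ends with the job
                // uuid, with dashes turned into underscores.
                for (String uuid : uuids) {
                    if (table.endsWith(uuid.replace("-", "_"))) {
                        toDrop.add(table);
                        break;
                    }
                }
            }
            rs.close();
            for (String table : toDrop) {
                stmt.execute("DROP TABLE IF EXISTS " + table);
            }
        }
    }
}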
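A similar sketch of the uuid-driven HBase and HDFS cleanup, using the plain HBaseAdmin and FileSystem APIs of that era. The "KYLIN_.*" filter, the uuid matching rule, and the base working directory are illustrative assumptions; the real job would resolve these from the job metadata and the Kylin configuration.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class UuidCleanupSketch {
    // Assumed base path; the real job would read it from the Kylin config.
    private static final String HDFS_WORKING_DIR = "/kylin";

    // Drop the HBase tables that belong to the given job uuids. Cube tables
    // all start with "KYLIN_" (see the list above); matching the uuid against
    // the table name is an assumption here.
    public static void cleanUnusedHBaseTables(Configuration conf, String[] uuids) throws Exception {
        HBaseAdmin admin = new HBaseAdmin(conf);
        try {
            for (HTableDescriptor desc : admin.listTables("KYLIN_.*")) {
                String tableName = desc.getNameAsString();
                for (String uuid : uuids) {
                    if (tableName.contains(uuid)) { // assumed matching rule
                        if (admin.isTableEnabled(tableName))
                            admin.disableTable(tableName);
                        admin.deleteTable(tableName);
                        break;
                    }
                }
            }
        } finally {
            admin.close();
        }
    }

    // Recursively delete the per-job working folders, which start with
    // "kylin-" as noted above.
    public static void cleanUnusedHdfsFiles(Configuration conf, String[] uuids) throws Exception {
        FileSystem fs = FileSystem.get(conf);
        for (String uuid : uuids) {
            Path jobDir = new Path(HDFS_WORKING_DIR, "kylin-" + uuid);
            if (fs.exists(jobDir)) {
                fs.delete(jobDir, true);
            }
        }
    }
}

The --uuids and --file options then simply feed the same uuid list to all of these functions, whether the list comes from the command line, from a file, or from both.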
The code change is based on Kylin 0.6 and has been tested to work well. I noticed that the only change in the latest version of StorageCleanupJob.java is that it filters the HBase tables related to the inverted index out of the drop list, so I think our code will still work on the latest Kylin. May I open a pull request for these features?

Best Regards,
George (倪春恩)
Software Engineer
MiningLamp.COM (北京明略软件系统有限公司)
Mobile: +86-13501723787 | Fax: +8610-56842040
F4, 1#, Zhongmei Construction Group Plaza, 398 Zhongdong Road, Changping District, Beijing, 102218
