Hi,

We found that after calling the deleteCube() REST API, the cube itself was deleted, 
but the data generated during the cube build process was left behind:
1. Folders beginning with "kylin_job_meta" and "meta_tmp" in the local machine's 
tomcat/tmp;
2. Tables beginning with "KYLIN_" in HBase;
3. Folders beginning with "kylin-" in HDFS;
4. Tables beginning with "kylin_intermediate" in Hive.
We then found that running com.kylinolap.job.hadoop.cube.StorageCleanupJob cleans 
up all of the HBase and HDFS data, but it does not yet handle the Hive tables.
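For reference, here is a minimal sketch of invoking that job programmatically, 
assuming StorageCleanupJob implements Hadoop's Tool interface the way Kylin's 
other Hadoop jobs do (the wrapper class name is only for illustration):

import org.apache.hadoop.util.ToolRunner;

import com.kylinolap.job.hadoop.cube.StorageCleanupJob;

public class RunStorageCleanup {
    public static void main(String[] args) throws Exception {
        // ToolRunner handles the generic Hadoop options and Configuration setup.
        int exitCode = ToolRunner.run(new StorageCleanupJob(), args);
        System.exit(exitCode);
    }
}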
We added some code to StorageCleanupJob.java to improve the storage cleanup job. 
The new features are listed below, with rough sketches after the list:
1. Completed cleanUnusedIntermediateHiveTable(Configuration conf), which removes 
all unused Hive tables beginning with "kylin_intermediate";
2. Added cleanUnusedHBaseTables(Configuration conf, String[] uuids), which 
removes the HBase tables belonging to the given job uuids;
3. Added cleanUnusedHdfsFiles(Configuration conf, String[] uuids), which removes 
the HDFS data belonging to the given job uuids;
4. Added cleanUnusedIntermediateHiveTable(Configuration conf, String[] uuids), 
which removes the Hive tables belonging to the given job uuids;
5. Added the command line options "--file" and "--uuids", to supply the uuids 
from a file and/or as a string.
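To make the list concrete, here are rough sketches of features 1, 2, 3 and 5. 
They illustrate the approach under stated assumptions; they are not the patch 
itself. First, feature 1: the sketch drops the intermediate tables over Hive 
JDBC, and both the jdbc:hive2 URL and the assumption that every matching table 
is unused are illustrative (the real code must keep tables that running jobs 
still reference):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

public class HiveCleanupSketch {

    public static void cleanUnusedIntermediateHiveTables(String jdbcUrl) throws Exception {
        Connection conn = DriverManager.getConnection(jdbcUrl);
        Statement stmt = conn.createStatement();
        try {
            // Collect the tables matching the intermediate-table prefix.
            List<String> toDrop = new ArrayList<String>();
            ResultSet rs = stmt.executeQuery("SHOW TABLES 'kylin_intermediate*'");
            while (rs.next()) {
                toDrop.add(rs.getString(1));
            }
            rs.close();
            // A real implementation must first consult the job metadata and keep
            // any table that a running or pending job still references.
            for (String table : toDrop) {
                stmt.execute("DROP TABLE IF EXISTS " + table);
            }
        } finally {
            stmt.close();
            conn.close();
        }
    }
}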
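Feature 2 in sketch form; HBaseAdmin and the "KYLIN_" prefix come from the 
description above, while the uuid-to-table mapping (here, the uuid appearing in 
the table name) is an assumption of mine:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class HBaseCleanupSketch {

    public static void cleanUnusedHBaseTables(Configuration conf, String[] uuids) throws Exception {
        HBaseAdmin admin = new HBaseAdmin(conf);
        try {
            // Only look at the tables Kylin created.
            for (HTableDescriptor desc : admin.listTables("KYLIN_.*")) {
                String tableName = desc.getNameAsString();
                for (String uuid : uuids) {
                    // Assumption: the owning job's uuid can be recovered from the
                    // table; matching on the name stands in for that lookup.
                    if (tableName.contains(uuid)) {
                        if (admin.isTableEnabled(tableName)) {
                            admin.disableTable(tableName);
                        }
                        admin.deleteTable(tableName);
                        break;
                    }
                }
            }
        } finally {
            admin.close();
        }
    }
}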
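Feature 3 in sketch form; the "kylin-<uuid>" folder layout matches item 3 of 
the first list, and the base working directory is a parameter here because its 
location is deployment-specific:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsCleanupSketch {

    public static void cleanUnusedHdfsFiles(Configuration conf, String[] uuids, String workingDir) throws Exception {
        FileSystem fs = FileSystem.get(conf);
        for (String uuid : uuids) {
            // Each build leaves its output under a "kylin-<job uuid>" folder.
            Path jobDir = new Path(workingDir, "kylin-" + uuid);
            if (fs.exists(jobDir)) {
                fs.delete(jobDir, true); // recursive delete of the job folder
            }
        }
    }
}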
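And feature 5; the option wiring below uses Apache Commons CLI, which Kylin's 
Hadoop jobs already depend on, though the exact builders in our patch may differ:

import java.io.File;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.cli.CommandLine;
import org.apache.commons.cli.GnuParser;
import org.apache.commons.cli.Options;

public class UuidOptionsSketch {

    public static List<String> parseUuids(String[] args) throws Exception {
        Options options = new Options();
        options.addOption(null, "uuids", true, "comma-separated job uuids");
        options.addOption(null, "file", true, "file with one job uuid per line");

        CommandLine cmd = new GnuParser().parse(options, args);
        List<String> uuids = new ArrayList<String>();
        if (cmd.hasOption("uuids")) {
            // "--uuids" supplies the uuids inline as a comma-separated string.
            for (String uuid : cmd.getOptionValue("uuids").split(",")) {
                uuids.add(uuid.trim());
            }
        }
        if (cmd.hasOption("file")) {
            // "--file" supplies them from a file, one uuid per line.
            File f = new File(cmd.getOptionValue("file"));
            uuids.addAll(Files.readAllLines(f.toPath(), Charset.forName("UTF-8")));
        }
        return uuids;
    }
}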

The change is based on Kylin 0.6 and has been tested to work well. I noticed 
that the only difference in the latest version of StorageCleanupJob.java is 
that it filters the HBase tables related to the inverted index out of the drop 
list, so I believe our code will still work well on the latest Kylin.

May I open a pull request for these features?

 


Best Regards,
 
George/倪春恩
Software Engineer/软件工程师
Mobile:+86-13501723787| Fax:+8610-56842040
北京明略软件系统有限公司(MiningLamp.COM)
北京市昌平区东小口镇中东路398号中煤建设集团大厦1号楼4层
F4, 1#, Zhongmei Construction Group Plaza, 398# Zhongdong Road, Changping District, Beijing, 102218