Hi yu feng,

I’ve also noticed these files and opened a JIRA, https://issues.apache.org/jira/browse/KYLIN-978, and I’ll post a patch tonight.

Here are my opinions on your three questions; feel free to correct me:

First, the data path of the intermediate Hive table should be deleted after building. I agree with that.

Second, the cuboid files are used for merging and are deleted when the merge job completes, so we need to keep them on HDFS. The fact_distinct_columns files should be deleted. In addition, the rowkey_stats and hfile paths should also be deleted.

Third, there are no garbage-collection steps when a job is discarded. Maybe we need a patch for this.

Short answer: KYLIN-978 will clean up all HDFS paths except the cuboid files after a buildJob or mergeJob completes. The HDFS paths will not be cleaned up if a job is discarded; this needs improvement.

Best Regards,
Yerui Sun
[email protected]

> On Sep 10, 2015, at 18:20, yu feng <[email protected]> wrote:
>
> I see this core improvement in release 1.0, JIRA URL:
> https://issues.apache.org/jira/browse/KYLIN-926
>
> However, after testing and checking the source code, I found some rubbish
> (I am not sure) files in HDFS.
>
> First, Kylin only drops the intermediate table in Hive, but the table is an
> EXTERNAL table, so the files still exist in the Kylin tmp directory in HDFS
> (I checked that).
>
> Second, the cuboid files take a lot of space in HDFS, and Kylin does not
> delete them after the cube build (the fact_distinct_columns files remain too).
> I am not sure whether they have other uses; please remind me if they do.
>
> Third, after I discard a job, I think Kylin should delete the intermediate
> files and drop the intermediate Hive table, even if it deletes them
> asynchronously. I think that data is not used for anything; please remind me
> if it is.
>
> This rubbish data still exists in the current version (kylin-1.0). Please
> check, thanks.
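For illustration, the cleanup policy discussed above can be sketched in Python against a local directory instead of HDFS. This is a minimal sketch under loud assumptions: the function `cleanup_job_dir` and the layout (each step's output as a direct child of one job working directory, named cuboid, fact_distinct_columns, rowkey_stats, hfile) are hypothetical, not Kylin's actual code or directory structure. Removing the cuboid files of a discarded job is the proposed improvement, not current behaviour, and dropping the intermediate Hive table and its external data path is not covered.

```python
import shutil
from pathlib import Path
from typing import List

# Hypothetical: outputs a later merge job still needs, so a successful
# build keeps them. Names follow the discussion, not a verified Kylin layout.
KEEP_FOR_MERGE = {"cuboid"}


def cleanup_job_dir(job_dir: Path, discarded: bool = False) -> List[str]:
    """Delete intermediate outputs under job_dir; return the removed names.

    After a completed job, cuboid output is kept for a later merge.
    After a discarded job nothing will reuse it, so everything is removed.
    """
    removed = []
    for child in sorted(job_dir.iterdir()):
        if child.name in KEEP_FOR_MERGE and not discarded:
            continue  # keep cuboid files for merging
        if child.is_dir():
            shutil.rmtree(child)  # recursive delete of a step's output dir
        else:
            child.unlink()
        removed.append(child.name)
    return removed
```

On a real cluster the same walk would go through the Hadoop `FileSystem` API (for example `FileSystem.delete(path, true)` for a recursive delete) rather than `shutil.rmtree`.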
