Hi Yu Feng,
  I’ve also noticed these files and opened a JIRA issue: 
https://issues.apache.org/jira/browse/KYLIN-978. I’ll post a patch tonight.

  Here are my opinions on your three questions; feel free to correct me:

  First, the data path of the intermediate Hive table should be deleted after 
building. I agree with that.

  Second, the cuboid files are used for merging and are deleted once the 
merge job completes, so we must leave them on HDFS. The 
fact_distinct_columns files should be deleted. In addition, the 
rowkey_stats and hfile paths should also be deleted.

  Third, there are no garbage collection steps when a job is discarded; maybe 
we need a patch for this.


Short answer: 
  KYLIN-978 will clean up all HDFS paths except the cuboid files after a build 
job or merge job completes. 
  HDFS paths are not cleaned up when a job is discarded; we need an 
improvement for this.
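
  To make the short answer concrete, here is a minimal sketch of the cleanup 
rule in Python. It is only an illustration, not the KYLIN-978 patch: the 
function name, the status strings and the example paths are all hypothetical, 
and the artifact names just mirror the items discussed above.

```python
# Hypothetical sketch of the cleanup rule described above; not Kylin code.
# Artifacts that must survive a successful build or merge job,
# because a later merge still reads them.
KEEP_AFTER_SUCCESS = {"cuboid"}


def cleanup_targets(job_status, artifacts):
    """Return the HDFS paths that should be deleted for a job.

    job_status -- assumed status string, "FINISHED" or "DISCARDED"
    artifacts  -- dict mapping an artifact name to its (made-up) HDFS path
    """
    if job_status != "FINISHED":
        # Behaviour described above: a discarded job triggers no cleanup yet.
        return []
    # After a successful job, delete everything except the cuboid files.
    return sorted(
        path for name, path in artifacts.items()
        if name not in KEEP_AFTER_SUCCESS
    )


if __name__ == "__main__":
    example = {
        "cuboid": "/tmp/example-job/cuboid",
        "fact_distinct_columns": "/tmp/example-job/fact_distinct_columns",
        "rowkey_stats": "/tmp/example-job/rowkey_stats",
        "hfile": "/tmp/example-job/hfile",
    }
    print(cleanup_targets("FINISHED", example))
    print(cleanup_targets("DISCARDED", example))
```

  A fix for discarded jobs would change only the first branch, so that it 
returns the intermediate paths as well.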
 

Best Regards,
Yerui Sun
[email protected]



> On September 10, 2015, at 18:20, yu feng <[email protected]> wrote:
> 
> I see this core improvement in release 1.0. JIRA URL:
> https://issues.apache.org/jira/browse/KYLIN-926
> 
> However, after testing and checking the source code, I found some files in
> HDFS that look like rubbish (I am not sure).
> 
> First, Kylin only drops the intermediate table in Hive, but the table is an
> EXTERNAL table, so its files still exist in the Kylin tmp directory in HDFS
> (I checked that).
> 
> Second, the cuboid files take up a lot of space in HDFS, and Kylin does not
> delete them after the cube build (the fact_distinct_columns files remain
> too). I am not sure whether those files serve another purpose; please let me
> know if they do.
> 
> Third, after I discard a job, I think Kylin should delete the intermediate
> files and drop the intermediate Hive table, even if it deletes
> them asynchronously. I don't think that data serves any purpose; please let
> me know if it does.
> 
> This rubbish data still exists in the current version (kylin-1.0); please
> check, thanks.
