Good summary and answer, thank you Yerui!

On 9/10/15, 6:36 PM, "Yerui Sun" <[email protected]> wrote:
>Hi, yu feng,
>  I’ve also noticed these files and opened a JIRA:
>https://issues.apache.org/jira/browse/KYLIN-978, and I’ll post a patch
>tonight.
>
>  Here are my opinions on your three questions; feel free to correct me:
>
>  First, the data path of the intermediate Hive table should be deleted
>after building; I agree with that.
>
>  Second, the cuboid files are used for merging and are deleted when the
>merge job completes, so we must leave them on HDFS. The
>fact_distinct_columns files should be deleted. In addition, the paths of
>rowkey_stats and hfile should also be deleted.
>
>  Third, there are no garbage collection steps when a job is discarded;
>maybe we need a patch for this.
>
>
>Short answer:
>  KYLIN-978 will clean all HDFS paths except the cuboid files after
>buildJob and mergeJob complete.
>  The HDFS paths will not be cleaned up if a job is discarded; we need an
>improvement for this.
>
>
>Best Regards,
>Yerui Sun
>[email protected]
>
>
>
>> On Sep 10, 2015, at 18:20, yu feng <[email protected]> wrote:
>>
>> I see this core Improvement in release 1.0, JIRA url:
>> https://issues.apache.org/jira/browse/KYLIN-926
>>
>> However, after testing and checking the source code, I found some
>> rubbish (I am not sure) files in HDFS.
>>
>> First, Kylin only drops the intermediate table in Hive, but the table
>> is an EXTERNAL table, so the files still exist in the Kylin tmp
>> directory in HDFS (I checked that).
>>
>> Second, the cuboid files take a lot of space in HDFS, and Kylin does
>> not delete them after the cube build (fact_distinct_columns files
>> exist too). I am not sure whether they are used for anything else;
>> please remind me if they are.
>>
>> Third, after I discard a job, I think Kylin should delete the
>> intermediate files and drop the intermediate Hive table, even if it
>> deletes them asynchronously. I think that data is no longer used;
>> please remind me if it is.
>>
>> This rubbish data still exists in the current version (kylin-1.0);
>> please check, thanks.
