Good summary and answer, thank you Yerui!

On 9/10/15, 6:36 PM, "Yerui Sun" <[email protected]> wrote:

>Hi, yu feng,
>  I’ve also noticed these files and opened a JIRA issue:
>https://issues.apache.org/jira/browse/KYLIN-978, and I’ll post a patch
>tonight.
>
>  Here are my opinions on your three questions; feel free to correct me:
>
>  First, the data path of the intermediate Hive table should be deleted
>after building; I agree with that.
>
>  Second, the cuboid files are used for merging and are deleted once the
>merge job completes, so we must leave them on HDFS. The
>fact_distinct_columns files should be deleted. In addition, the
>rowkey_stats and hfile paths should also be deleted.
>
>  Third, there is no garbage collection step when a job is discarded;
>maybe we need a patch for this.
>
>
>Short answer:
>  KYLIN-978 will clean up all HDFS paths except the cuboid files after
>buildJob and mergeJob complete.
>  The HDFS paths will not be cleaned up if a job is discarded; we need an
>improvement for this.
> 
>
>Best Regards,
>Yerui Sun
>[email protected]
>
>
>
>> On Sep 10, 2015, at 18:20, yu feng <[email protected]> wrote:
>> 
>> I see this core improvement in release 1.0. JIRA URL:
>> https://issues.apache.org/jira/browse/KYLIN-926
>> 
>> However, after testing and checking the source code, I found some files
>> in HDFS that look like garbage (I am not sure).
>> 
>> First, Kylin only drops the intermediate table in Hive, but the table is
>> an EXTERNAL table, so its files still exist in the Kylin tmp directory
>> in HDFS (I checked that).
>> 
>> Second, the cuboid files take up a lot of space in HDFS, and Kylin does
>> not delete them after the cube build (the fact_distinct_columns files
>> remain too). I am not sure whether they are needed for anything else;
>> please let me know if they are.
>> 
>> Third, after I discard a job, I think Kylin should delete the
>> intermediate files and drop the intermediate Hive table, even if it
>> deletes them asynchronously. I think this data is not used by anything
>> else; please let me know if it is.
>> 
>> This garbage data still exists in the current version (kylin-1.0);
>> please check. Thanks.
>
