nichunen created KYLIN-998:
------------------------------

             Summary: Finish the hive intermediate table clean up job in 
org.apache.kylin.job.hadoop.cube.StorageCleanupJob
                 Key: KYLIN-998
                 URL: https://issues.apache.org/jira/browse/KYLIN-998
             Project: Kylin
          Issue Type: Improvement
          Components: Job Engine
            Reporter: nichunen
            Assignee: ZhouQianhao


Currently, the last step of a Kylin cube build job, “Garbage Collection”, removes 
the intermediate data in HDFS, HBase and Hive. But if the job stops before 
reaching that step, for example because of a problem in the Hadoop cluster, a bad 
cube design, or because a user discarded it, that data is left undeleted.

In such cases, we can run the following command to remove the data:

    hbase org.apache.hadoop.util.RunJar $KYLIN_HOME/lib/kylin-job-0.8.1-incubating-SNAPSHOT.jar org.apache.kylin.job.hadoop.cube.StorageCleanupJob --delete true

However, the method "cleanUnusedIntermediateHiveTable" in that job is unfinished.

My first patch finishes that method: it removes unused Hive tables whose names 
begin with "kylin_intermediate_".
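A minimal sketch of the selection logic such a method needs, assuming the caller 
has already listed the tables in the Hive database and knows which intermediate 
tables still belong to unfinished jobs. The names "IntermediateHiveTableCleanup" 
and "buildDropStatements" and the sample table names are hypothetical, not Kylin's 
API, and the sketch only builds the HiveQL statements rather than executing them:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class IntermediateHiveTableCleanup {
    // Table-name prefix given in the issue for Kylin's intermediate Hive tables.
    static final String PREFIX = "kylin_intermediate_";

    /**
     * Hypothetical helper: returns a DROP statement for every intermediate table
     * that is not referenced by an unfinished job. Hive table names are
     * case-insensitive, so both inputs are compared in lower case.
     */
    static List<String> buildDropStatements(List<String> allTables, Set<String> tablesInUse) {
        Set<String> inUse = new HashSet<>();
        for (String t : tablesInUse) {
            inUse.add(t.toLowerCase(Locale.ROOT));
        }
        List<String> statements = new ArrayList<>();
        for (String table : allTables) {
            String name = table.toLowerCase(Locale.ROOT);
            // Drop only prefixed tables that no running job still needs.
            if (name.startsWith(PREFIX) && !inUse.contains(name)) {
                statements.add("DROP TABLE IF EXISTS " + name + ";");
            }
        }
        return statements;
    }

    public static void main(String[] args) {
        List<String> all = List.of(
            "kylin_intermediate_cube_a_1234",
            "kylin_intermediate_cube_b_5678",
            "fact_sales");
        Set<String> inUse = Set.of("kylin_intermediate_cube_b_5678");
        // Prints only the statement for cube_a: cube_b is in use and fact_sales lacks the prefix.
        buildDropStatements(all, inUse).forEach(System.out::println);
    }
}
```

Keeping the filtering separate from the Hive call makes it easy to print the 
statements first, for example, and run them only when "--delete true" is passed.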

My second patch adds methods for deleting unused data by UUID, with the UUIDs 
given on the command line or stored in a file.
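A minimal sketch of the input handling such methods might use, assuming UUIDs 
arrive one per command-line token or one per line in a UTF-8 text file, in the 
canonical 8-4-4-4-12 hex form. "UuidInput", "addUuids" and "collect" are 
hypothetical names, and skipping blank lines and "#" comments is an assumption, 
not part of the patch:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

public class UuidInput {
    // Canonical 8-4-4-4-12 lower-case hex UUID form (input is lower-cased first).
    private static final Pattern UUID_PATTERN = Pattern.compile(
        "[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}");

    /** Adds each well-formed UUID among the tokens; blank lines and '#' comments are skipped. */
    static void addUuids(Iterable<String> tokens, Set<String> out) {
        for (String raw : tokens) {
            String token = raw.trim().toLowerCase(Locale.ROOT);
            if (token.isEmpty() || token.startsWith("#")) {
                continue;
            }
            if (!UUID_PATTERN.matcher(token).matches()) {
                throw new IllegalArgumentException("Not a UUID: " + raw);
            }
            out.add(token);
        }
    }

    /** Collects UUIDs from command-line tokens plus an optional file (one UUID per line). */
    static Set<String> collect(List<String> cliTokens, Path uuidFile) throws IOException {
        Set<String> uuids = new LinkedHashSet<>();
        addUuids(cliTokens, uuids);
        if (uuidFile != null) {
            addUuids(Files.readAllLines(uuidFile, StandardCharsets.UTF_8), uuids);
        }
        return uuids;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: every command-line argument is treated as a UUID.
        collect(List.of(args), null).forEach(System.out::println);
    }
}
```

Collecting into a LinkedHashSet keeps the order in which the UUIDs were given 
while making sure each one is processed only once.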

I'm not sure whether the second patch is useful to you; we use it on our Kylin 
server to remove a cube's data after the cube is deleted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
