[ 
https://issues.apache.org/jira/browse/KYLIN-3555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16612285#comment-16612285
 ] 

Iñigo Martinez commented on KYLIN-3555:
---------------------------------------

Hi Shaofeng.

This is our config.

kylin.env.hdfs-working-dir=s3://XXXXXXX-emr-kylin/kylin/
kylin.storage.hbase.cluster-fs=s3://XXXXXXX-emr-kylin/hbase/

In 2.4.0 the config is exactly the same and this problem is not present.

We have compared 2.4.1 and 2.4.0, and it seems that some changes have been 
made to the garbage collection step.

https://github.com/apache/kylin/commit/3177d79ca5cd8533164319acda8676684a6d307e#diff-784d6aaca261296ea182222c7dd2de78
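
The stack trace in the issue points at the listStatus call made from 
dropHdfsPathOnCluster (HDFSPathGarbageCollectionStep.java:95) on the parent 
directory, which does not exist on the HDFS side. A guard of the following 
shape might avoid the exception. This is only a sketch and not the Kylin code: 
it uses java.nio.file as a stand-in for the Hadoop FileSystem API so that it 
runs on its own, and GuardedCleanup, dropPath and isEmpty are hypothetical 
names. Recursive deletion of non-empty directories is left out.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Kylin: illustrates checking that the
// parent directory exists before listing it during cleanup.
class GuardedCleanup {

    // Deletes `target` (a file or an empty directory) if it is present, then
    // removes the parent directory only when that parent exists and is empty.
    // Returns true if the target itself was deleted.
    static boolean dropPath(Path target) throws IOException {
        boolean dropped = Files.deleteIfExists(target);
        Path parent = target.getParent();
        // Guard: listing a directory that does not exist throws, so test
        // for its existence before looking at its contents.
        if (parent != null && Files.isDirectory(parent) && isEmpty(parent)) {
            Files.delete(parent);
        }
        return dropped;
    }

    // Returns true when the directory has no entries.
    static boolean isEmpty(Path dir) throws IOException {
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            return !entries.iterator().hasNext();
        }
    }
}
```

With this shape, calling dropPath on a path whose parent is missing simply 
returns false instead of failing the whole step.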

 

> Garbage collection on HBase step fails with S3 selected as storage
> ------------------------------------------------------------------
>
>                 Key: KYLIN-3555
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3555
>             Project: Kylin
>          Issue Type: Bug
>          Components: Job Engine
>    Affects Versions: v2.4.1
>            Reporter: Iñigo Martinez
>            Priority: Major
>              Labels: build
>         Attachments: Screenshot from 2018-09-11 12-31-25.png
>
>
> When building a cube with S3 selected as storage, the build process fails at 
> the last step.
> Although S3 has been defined as storage, the cleanup task tries to delete from 
> HDFS and, of course, there is no such file on HDFS.
>  
> {code:java}
> 2018-09-11 12:27:56,311 DEBUG [Scheduler 1407846257 Job 
> f8416975-eea6-4500-9cb7-4374f28451dc-237] 
> steps.HDFSPathGarbageCollectionStep:78 : Drop HDFS path on FileSystem: 
> s3://XXXXXXX-emr-kylin
> 2018-09-11 12:27:57,364 DEBUG [Scheduler 1407846257 Job 
> f8416975-eea6-4500-9cb7-4374f28451dc-237] 
> steps.HDFSPathGarbageCollectionStep:87 : HDFS path 
> /kylin/kylin_metadata/kylin-f8416975-eea6-4500-9cb7-4374f28451dc/plataforma_transacciones_cubo_v1/fact_distinct_columns
>  is dropped.
> 2018-09-11 12:27:58,104 DEBUG [Scheduler 1407846257 Job 
> f8416975-eea6-4500-9cb7-4374f28451dc-237] 
> steps.HDFSPathGarbageCollectionStep:87 : HDFS path 
> /kylin/kylin_metadata/kylin-f8416975-eea6-4500-9cb7-4374f28451dc/plataforma_transacciones_cubo_v1/hfile
>  is dropped.
> 2018-09-11 12:27:58,140 DEBUG [Scheduler 1407846257 Job 
> f8416975-eea6-4500-9cb7-4374f28451dc-237] 
> steps.HDFSPathGarbageCollectionStep:78 : Drop HDFS path on FileSystem: 
> hdfs://ip-10-0-1-63.eu-west-1.compute.internal:8020
> 2018-09-11 12:27:58,142 DEBUG [Scheduler 1407846257 Job 
> f8416975-eea6-4500-9cb7-4374f28451dc-237] 
> steps.HDFSPathGarbageCollectionStep:90 : HDFS path 
> /kylin/kylin_metadata/kylin-f8416975-eea6-4500-9cb7-4374f28451dc/plataforma_transacciones_cubo_v1/fact_distinct_columns
>  not exists.
> 2018-09-11 12:27:58,147 ERROR [Scheduler 1407846257 Job 
> f8416975-eea6-4500-9cb7-4374f28451dc-237] 
> steps.HDFSPathGarbageCollectionStep:68 : 
> job:f8416975-eea6-4500-9cb7-4374f28451dc-15 execute finished with exception
> java.io.FileNotFoundException: File 
> /kylin/kylin_metadata/kylin-f8416975-eea6-4500-9cb7-4374f28451dc/plataforma_transacciones_cubo_v1
>  does not exist.
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:904)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:114)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:964)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:961)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:971)
> at 
> org.apache.kylin.storage.hbase.steps.HDFSPathGarbageCollectionStep.dropHdfsPathOnCluster(HDFSPathGarbageCollectionStep.java:95)
> at 
> org.apache.kylin.storage.hbase.steps.HDFSPathGarbageCollectionStep.doWork(HDFSPathGarbageCollectionStep.java:65)
> at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:162)
> at 
> org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:69)
> at 
> org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:162)
> at 
> org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:113)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748){code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
