[ https://issues.apache.org/jira/browse/KYLIN-3555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16614536#comment-16614536 ]
Iñigo Martinez commented on KYLIN-3555:
---------------------------------------

Hi Shaofeng.

According to the installation documentation for S3 / EMR, we should use an absolute URI:
[http://kylin.apache.org/docs23/install/kylin_aws_emr.html]

With a relative path, it falls back to HDFS. This causes us trouble because we use a different cluster for Hive jobs, and the only way to share files between Hive and our Kylin build clusters is through S3 storage.

> Garbage collection on HBase step fails with S3 selected as storage
> ------------------------------------------------------------------
>
>                 Key: KYLIN-3555
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3555
>             Project: Kylin
>          Issue Type: Bug
>          Components: Job Engine
>    Affects Versions: v2.4.1
>            Reporter: Iñigo Martinez
>            Priority: Major
>              Labels: build
>         Attachments: Screenshot from 2018-09-11 12-31-25.png
>
>
> When building a cube with S3 selected as storage, the build process fails at the last step.
> Although S3 has been defined as storage, the cleanup task tries to delete from HDFS and, of course, there is no such file on HDFS.
>
> {code:java}
> 2018-09-11 12:27:56,311 DEBUG [Scheduler 1407846257 Job f8416975-eea6-4500-9cb7-4374f28451dc-237] steps.HDFSPathGarbageCollectionStep:78 : Drop HDFS path on FileSystem: s3://XXXXXXX-emr-kylin
> 2018-09-11 12:27:57,364 DEBUG [Scheduler 1407846257 Job f8416975-eea6-4500-9cb7-4374f28451dc-237] steps.HDFSPathGarbageCollectionStep:87 : HDFS path /kylin/kylin_metadata/kylin-f8416975-eea6-4500-9cb7-4374f28451dc/plataforma_transacciones_cubo_v1/fact_distinct_columns is dropped.
> 2018-09-11 12:27:58,104 DEBUG [Scheduler 1407846257 Job f8416975-eea6-4500-9cb7-4374f28451dc-237] steps.HDFSPathGarbageCollectionStep:87 : HDFS path /kylin/kylin_metadata/kylin-f8416975-eea6-4500-9cb7-4374f28451dc/plataforma_transacciones_cubo_v1/hfile is dropped.
> 2018-09-11 12:27:58,140 DEBUG [Scheduler 1407846257 Job f8416975-eea6-4500-9cb7-4374f28451dc-237] steps.HDFSPathGarbageCollectionStep:78 : Drop HDFS path on FileSystem: hdfs://ip-10-0-1-63.eu-west-1.compute.internal:8020
> 2018-09-11 12:27:58,142 DEBUG [Scheduler 1407846257 Job f8416975-eea6-4500-9cb7-4374f28451dc-237] steps.HDFSPathGarbageCollectionStep:90 : HDFS path /kylin/kylin_metadata/kylin-f8416975-eea6-4500-9cb7-4374f28451dc/plataforma_transacciones_cubo_v1/fact_distinct_columns not exists.
> 2018-09-11 12:27:58,147 ERROR [Scheduler 1407846257 Job f8416975-eea6-4500-9cb7-4374f28451dc-237] steps.HDFSPathGarbageCollectionStep:68 : job:f8416975-eea6-4500-9cb7-4374f28451dc-15 execute finished with exception
> java.io.FileNotFoundException: File /kylin/kylin_metadata/kylin-f8416975-eea6-4500-9cb7-4374f28451dc/plataforma_transacciones_cubo_v1 does not exist.
>     at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:904)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:114)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:964)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:961)
>     at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:971)
>     at org.apache.kylin.storage.hbase.steps.HDFSPathGarbageCollectionStep.dropHdfsPathOnCluster(HDFSPathGarbageCollectionStep.java:95)
>     at org.apache.kylin.storage.hbase.steps.HDFSPathGarbageCollectionStep.doWork(HDFSPathGarbageCollectionStep.java:65)
>     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:162)
>     at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:69)
>     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:162)
>     at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:113)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> {code}
>

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
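
The comment above notes that a relative working-directory path ends up on HDFS, while an absolute `s3://` URI stays on S3. As a rough illustration of that difference, here is a minimal sketch using only `java.net.URI`. It is not Kylin or Hadoop code; the class `WorkingDirResolution`, the method `qualify`, the host `namenode.example` and the bucket `example-bucket` are all hypothetical names chosen for the example, and real Hadoop path qualification may differ in its details.

```java
import java.net.URI;

public class WorkingDirResolution {

    /**
     * Hypothetical helper: qualifies a working-directory setting against a
     * cluster's default filesystem URI. A value that already carries a scheme
     * (for example s3://) is returned unchanged; a scheme-less path is
     * resolved against the default filesystem.
     */
    public static URI qualify(String workingDir, URI defaultFs) {
        URI dir = URI.create(workingDir);
        if (dir.isAbsolute()) {
            // The URI has its own scheme, so the default filesystem is ignored.
            return dir;
        }
        // No scheme: the path inherits scheme and authority from the default filesystem.
        return defaultFs.resolve(dir);
    }

    public static void main(String[] args) {
        // Assumed default filesystem of the build cluster (illustrative value).
        URI defaultFs = URI.create("hdfs://namenode.example:8020/");

        // Scheme-less path: lands on the cluster's HDFS.
        System.out.println(qualify("/kylin", defaultFs));
        // prints hdfs://namenode.example:8020/kylin

        // Absolute URI: stays on S3 regardless of the default filesystem.
        System.out.println(qualify("s3://example-bucket/kylin", defaultFs));
        // prints s3://example-bucket/kylin
    }
}
```

In this sketch only the second form keeps the working directory on storage that a separate Hive cluster could also reach, which matches the reason given above for needing the absolute URI.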