[jira] [Resolved] (KYLIN-4299) Issue with building real-time segment cache into HBase when using S3 as working dir

nichunen (Jira) Thu, 06 Feb 2020 21:12:18 -0800


     [ 
https://issues.apache.org/jira/browse/KYLIN-4299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


nichunen resolved KYLIN-4299.
-----------------------------
    Resolution: Fixed

> Issue with building real-time segment cache into HBase when using S3 as 
> working dir
> -----------------------------------------------------------------------------------
>
>                 Key: KYLIN-4299
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4299
>             Project: Kylin
>          Issue Type: Bug
>          Components: Real-time Streaming
>    Affects Versions: v3.0.0-alpha2
>            Reporter: Andras Istvan Nagy
>            Assignee: Xiaoxiang Yu
>            Priority: Major
>             Fix For: v3.1.0
>
>
> We hit a problem when using S3 as the working dir for Kylin together with 
> real-time streaming. We want to keep no state in HDFS, so that the runtime 
> environment running Kylin is stateless. 
> Our HBase data is already on S3, but {{kylin.env.hdfs-working-dir}} also 
> holds persistent data (cube dictionaries), so it has to be on S3 as well. 
> That gives us a setup where we can fail over to a new cluster without 
> having to rebuild all cubes.
> The real-time streaming feature in Kylin persists segment caches hourly, and 
> an MR job merges those hourly segments into HBase. These MR jobs fail with 
> the following exception:
> {code:java}
> Error: java.lang.IllegalArgumentException: Wrong FS: s3://kylin-XXXXX/kylin-dev/hdfs-rootdir/kylin_metadata/stream/tops_jaywalks/20191206010000_20191206020000/1/1, expected: hdfs://ip-24-0-3-243.us-west-2.compute.internal:8020
>     at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:669)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:214)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:897)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:114)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:964)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:961)
>     at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:971)
>     at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1551)
>     at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1577)
>     at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1625)
>     at org.apache.hadoop.fs.FileSystem$4.<init>(FileSystem.java:1808)
>     at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:1807)
>     at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:1785)
>     at org.apache.hadoop.fs.FileSystem$6.<init>(FileSystem.java:1887)
>     at org.apache.hadoop.fs.FileSystem.listFiles(FileSystem.java:1885)
>     at org.apache.kylin.engine.mr.streaming.ColumnarFilesReader.checkPath(ColumnarFilesReader.java:46)
>     at org.apache.kylin.engine.mr.streaming.ColumnarFilesReader.<init>(ColumnarFilesReader.java:41)
>     at org.apache.kylin.engine.mr.streaming.DictsReader.<init>(DictsReader.java:43)
>     at org.apache.kylin.engine.mr.streaming.ColumnarSplitDictReader.init(ColumnarSplitDictReader.java:65)
>     at org.apache.kylin.engine.mr.streaming.ColumnarSplitDictReader.<init>(ColumnarSplitDictReader.java:52)
>     at org.apache.kylin.engine.mr.streaming.ColumnarSplitDictInputFormat.createRecordReader(ColumnarSplitDictInputFormat.java:32)
>     at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:524)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:767)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
>     at org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:173)
>     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:179)
>     at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71)
>     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:179)
>     at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> {code}
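
For readers unfamiliar with the "Wrong FS" message: the trace shows an s3:// path being handed to a DistributedFileSystem instance bound to hdfs://, and FileSystem.checkPath rejecting it. The sketch below is a simplified, stdlib-only model of that kind of scheme/authority check; it is not Hadoop's code, and the names WrongFsSketch, BoundFs and forPath, as well as the example bucket and namenode, are hypothetical. In Hadoop itself, a common way to avoid this class of error is to obtain the file system from the path (Path.getFileSystem(Configuration)) rather than from the default (FileSystem.get(Configuration)); whether that is the change made for this issue is not stated in this message.

```java
import java.net.URI;
import java.util.Objects;

public class WrongFsSketch {

    /** Simplified stand-in for a file system bound to one scheme and authority. */
    static final class BoundFs {
        private final URI fsUri;

        BoundFs(String fsUri) {
            this.fsUri = URI.create(fsUri);
        }

        /** Rejects fully qualified paths that belong to a different file system. */
        void checkPath(String path) {
            URI p = URI.create(path);
            if (p.getScheme() == null) {
                return; // unqualified path: treated as belonging to this file system
            }
            boolean sameScheme = p.getScheme().equalsIgnoreCase(fsUri.getScheme());
            boolean sameAuthority = Objects.equals(p.getAuthority(), fsUri.getAuthority());
            if (!(sameScheme && sameAuthority)) {
                throw new IllegalArgumentException(
                        "Wrong FS: " + path + ", expected: " + fsUri);
            }
        }
    }

    /** Derives the file system from the path itself instead of using a default. */
    static BoundFs forPath(String path) {
        URI p = URI.create(path);
        return new BoundFs(p.getScheme() + "://" + p.getAuthority());
    }

    public static void main(String[] args) {
        String segment = "s3://example-bucket/kylin/stream/cube/segment"; // hypothetical path
        BoundFs defaultFs = new BoundFs("hdfs://namenode.example:8020");  // hypothetical default FS

        try {
            defaultFs.checkPath(segment); // default FS is HDFS, path is on S3
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }

        forPath(segment).checkPath(segment); // FS derived from the path: no exception
        System.out.println("path-derived file system accepted the path");
    }
}
```

The point of forPath is only that the file system is chosen from the path's own scheme and authority, so a working dir on S3 and a default file system on HDFS can coexist.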



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
