[ https://issues.apache.org/jira/browse/KYLIN-4299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
nichunen resolved KYLIN-4299.
-----------------------------
Resolution: Fixed
> Issue with building real-time segment cache into HBase when using S3 as working dir
> -----------------------------------------------------------------------------------
>
> Key: KYLIN-4299
> URL: https://issues.apache.org/jira/browse/KYLIN-4299
> Project: Kylin
> Issue Type: Bug
> Components: Real-time Streaming
> Affects Versions: v3.0.0-alpha2
> Reporter: Andras Istvan Nagy
> Assignee: Xiaoxiang Yu
> Priority: Major
> Fix For: v3.1.0
>
>
> We have an issue using S3 as the working dir for Kylin together with real-time
> streaming. We want this so that no state is kept in HDFS and the runtime
> environment running Kylin becomes stateless.
> Our HBase data is already on S3, but {{kylin.env.hdfs-working-dir}} also holds
> persistent data (cube dictionaries), so it has to be on S3 as well. Only then
> can we fail over to a new cluster without rebuilding all cubes.
> We use Kylin's real-time streaming feature, which persists segment caches
> hourly; an MR job then merges those hourly segments into HBase. These MR jobs
> fail with the following exception:
> {code:java}
> Error: java.lang.IllegalArgumentException: Wrong FS: s3://kylin-XXXXX/kylin-dev/hdfs-rootdir/kylin_metadata/stream/tops_jaywalks/20191206010000_20191206020000/1/1, expected: hdfs://ip-24-0-3-243.us-west-2.compute.internal:8020
>     at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:669)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:214)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:897)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:114)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:964)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:961)
>     at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:971)
>     at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1551)
>     at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1577)
>     at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1625)
>     at org.apache.hadoop.fs.FileSystem$4.<init>(FileSystem.java:1808)
>     at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:1807)
>     at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:1785)
>     at org.apache.hadoop.fs.FileSystem$6.<init>(FileSystem.java:1887)
>     at org.apache.hadoop.fs.FileSystem.listFiles(FileSystem.java:1885)
>     at org.apache.kylin.engine.mr.streaming.ColumnarFilesReader.checkPath(ColumnarFilesReader.java:46)
>     at org.apache.kylin.engine.mr.streaming.ColumnarFilesReader.<init>(ColumnarFilesReader.java:41)
>     at org.apache.kylin.engine.mr.streaming.DictsReader.<init>(DictsReader.java:43)
>     at org.apache.kylin.engine.mr.streaming.ColumnarSplitDictReader.init(ColumnarSplitDictReader.java:65)
>     at org.apache.kylin.engine.mr.streaming.ColumnarSplitDictReader.<init>(ColumnarSplitDictReader.java:52)
>     at org.apache.kylin.engine.mr.streaming.ColumnarSplitDictInputFormat.createRecordReader(ColumnarSplitDictInputFormat.java:32)
>     at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:524)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:767)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
>     at org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:173)
>     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:179)
>     at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71)
>     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:179)
>     at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> {code}
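The trace suggests where the mismatch comes from. ColumnarFilesReader.checkPath calls FileSystem.listFiles on an HDFS DistributedFileSystem (presumably the cluster's default filesystem, hdfs://...:8020), and that filesystem rejects the s3:// segment path. This ticket does not say how the fix was implemented. A common way to avoid this class of error in Hadoop code is to obtain the filesystem from the path itself (Path.getFileSystem(conf)) rather than from the default filesystem (FileSystem.get(conf)). The sketch below models that difference with the JDK only. SketchFileSystem, defaultFileSystem, fileSystemFor and accepts are hypothetical names, and the check is a simplified version of Hadoop's FileSystem.checkPath, not its exact logic.

{code:java}
import java.net.URI;
import java.util.Objects;

/**
 * JDK-only model of Hadoop's "Wrong FS" check. SketchFileSystem is a
 * hypothetical stand-in for org.apache.hadoop.fs.FileSystem; it is NOT
 * the Hadoop or Kylin API.
 */
class WrongFsSketch {

    /** Hypothetical filesystem bound to a single scheme://authority. */
    static final class SketchFileSystem {
        final URI uri;

        SketchFileSystem(URI uri) {
            this.uri = uri;
        }

        /**
         * Simplified version of Hadoop's FileSystem.checkPath: a qualified
         * path must match this filesystem's scheme and authority.
         */
        void checkPath(URI path) {
            String scheme = path.getScheme();
            if (scheme == null) {
                return; // unqualified paths resolve against this filesystem
            }
            boolean sameScheme = scheme.equalsIgnoreCase(uri.getScheme());
            boolean sameAuthority = path.getAuthority() == null
                    || Objects.equals(path.getAuthority(), uri.getAuthority());
            if (!sameScheme || !sameAuthority) {
                throw new IllegalArgumentException(
                        "Wrong FS: " + path + ", expected: " + uri);
            }
        }
    }

    /** Pattern behind the error: always use the cluster's default
     *  filesystem (roughly FileSystem.get(conf) in Hadoop). */
    static SketchFileSystem defaultFileSystem(URI defaultFsUri) {
        return new SketchFileSystem(defaultFsUri);
    }

    /** Pattern that avoids it: derive the filesystem from the path's own
     *  scheme and authority (roughly path.getFileSystem(conf) in Hadoop).
     *  Assumes a fully qualified path such as s3://bucket/... */
    static SketchFileSystem fileSystemFor(URI path) {
        return new SketchFileSystem(
                URI.create(path.getScheme() + "://" + path.getAuthority()));
    }

    /** True if the filesystem accepts the path, false on "Wrong FS". */
    static boolean accepts(SketchFileSystem fs, URI path) {
        try {
            fs.checkPath(path);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
{code}

With a default filesystem of hdfs://namenode:8020 (a placeholder host), accepts(defaultFileSystem(...), s3Path) returns false for an s3:// segment path, while accepts(fileSystemFor(s3Path), s3Path) returns true; HDFS paths on the same namenode are accepted either way.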