Luke Han created KYLIN-889:
------------------------------
Summary: Support more than one HDFS files of lookup table
Key: KYLIN-889
URL: https://issues.apache.org/jira/browse/KYLIN-889
Project: Kylin
Issue Type: Bug
Components: Job Engine
Affects Versions: v0.7.1
Reporter: Luke Han
Assignee: liyang
Fix For: v0.8.1, v0.7.4
Previously, lookup tables were assumed to be small enough to fit into memory,
and a validation rule checks that there is only one HDFS file for the
lookup table.
However, many users are running into this limitation, and there is also a
requirement to support big lookup tables.
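As a rough illustration of the requested behavior, the check could collect every non-empty file under the table directory instead of insisting on exactly one. The sketch below is not Kylin's code: the class `LookupFileLister` and the method `findNonEmptyFiles` are hypothetical names, and it uses `java.nio.file` on a local directory as a stand-in for the Hadoop file system so that it runs without a cluster.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper for illustration only; not part of Kylin.
// Assumption: a local directory stands in for the table's HDFS location.
class LookupFileLister {

    // Returns every non-empty regular file directly under dir, sorted by path.
    // Unlike the single-file rule, it fails only when no such file exists.
    static List<Path> findNonEmptyFiles(Path dir) throws IOException {
        List<Path> result = new ArrayList<>();
        try (Stream<Path> children = Files.list(dir)) {
            for (Path p : (Iterable<Path>) children::iterator) {
                // Skip subdirectories and zero-length marker files.
                if (Files.isRegularFile(p) && Files.size(p) > 0) {
                    result.add(p);
                }
            }
        }
        if (result.isEmpty()) {
            throw new IllegalStateException(
                "Expect at least 1 non-zero file under " + dir + ", but find 0");
        }
        // Sort so that repeated runs see the files in a stable order.
        Collections.sort(result);
        return result;
    }
}
```

A caller would then read the returned files one after another rather than a single file. Whether the combined table still fits into memory is a separate question that this sketch does not address.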
Exception:
========================================
java.lang.IllegalStateException: Expect 1 and only 1 non-zero file under hdfs://masters/apps/hive/warehouse/d_nw_ne_ecell2, but find 4
    at org.apache.kylin.dict.lookup.HiveTable.findOnlyFile(HiveTable.java:123)
    at org.apache.kylin.dict.lookup.HiveTable.computeHDFSLocation(HiveTable.java:107)
    at org.apache.kylin.dict.lookup.HiveTable.getHDFSLocation(HiveTable.java:83)
    at org.apache.kylin.dict.lookup.HiveTable.getFileTable(HiveTable.java:76)
    at org.apache.kylin.dict.lookup.HiveTable.getSignature(HiveTable.java:71)
    at org.apache.kylin.dict.DictionaryManager.buildDictionary(DictionaryManager.java:164)
    at org.apache.kylin.cube.CubeManager.buildDictionary(CubeManager.java:154)
    at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:53)
    at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
    at org.apache.kylin.job.hadoop.dict.CreateDictionaryJob.run(CreateDictionaryJob.java:53)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.apache.kylin.job.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
    at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
    at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
    at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:107)
    at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:132)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
result code:2