Vsevolod Ostapenko created KYLIN-3686:
-----------------------------------------
Summary: Top_N metric code requires cube storage type to be
ID_SHARDED_HBASE, but the Web UI defaults to ID_HBASE and provides no
safeguards against storage type mismatch
Key: KYLIN-3686
URL: https://issues.apache.org/jira/browse/KYLIN-3686
Project: Kylin
Issue Type: Improvement
Components: Measure - TopN, Metadata, Web
Affects Versions: v2.5.0
Environment: HDP 2.5.6, Kylin 2.5
Reporter: Vsevolod Ostapenko
When a new cube is defined via the Kylin 2.5 UI, the default cube storage type
is set to 0 (ID_HBASE).
Top_N metric support is currently hard-coded to expect cube storage type 2
(ID_SHARDED_HBASE), and it *_does not_* check whether the cube storage type is
actually the "sharded HBASE" one.
The UI provides no safeguards either, so nothing prevents a user from defining
a cube with a Top_N metric that then blows up at the cube building stage with a
perplexing stack trace like the following:
{quote}2018-11-08 08:35:45,413 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.IllegalArgumentException: Can't read partitions file
    at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:116)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:701)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1865)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
Caused by: java.io.IOException: wrong key class: org.apache.kylin.storage.hbase.steps.RowKeyWritable is not class org.apache.hadoop.hbase.io.ImmutableBytesWritable
    at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2332)
    at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2384)
    at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.readPartitions(TotalOrderPartitioner.java:306)
    at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:88)
    ... 10 more
{quote}
Please either
** modify the Top_N code to support all cube storage types (not only
ID_SHARDED_HBASE),
or
** modify the Top_N code to explicitly check the cube storage type and raise a
descriptive exception when the storage type is not the expected one. In
addition, update the UI to prevent users from creating cube definitions whose
storage type is incompatible with the Top_N measure.
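
To illustrate the second option, here is a minimal sketch of the kind of explicit check being requested. It is not existing Kylin code: the class name {{TopNStorageCheck}}, the method {{validate}} and its parameters are hypothetical, and the constants simply mirror the storage type values quoted above (0 for ID_HBASE, 2 for ID_SHARDED_HBASE). Where such a check would best be wired in (cube description validation, job submission, or both) is left open.

```java
// Hypothetical helper for illustration only; not part of the Kylin code base.
final class TopNStorageCheck {
    // Storage type ids as quoted in this report.
    static final int ID_HBASE = 0;
    static final int ID_SHARDED_HBASE = 2;

    private TopNStorageCheck() {
    }

    /**
     * Fails fast with a descriptive message when a cube that defines a Top_N
     * measure does not use the sharded HBase storage type.
     */
    static void validate(String cubeName, int storageType, boolean hasTopNMeasure) {
        if (hasTopNMeasure && storageType != ID_SHARDED_HBASE) {
            throw new IllegalStateException(String.format(
                    "Cube '%s' defines a Top_N measure but uses storage type %d; "
                            + "Top_N currently requires storage type %d (ID_SHARDED_HBASE).",
                    cubeName, storageType, ID_SHARDED_HBASE));
        }
    }
}
```

With a check along these lines, a cube created with the UI default (storage type 0) and a Top_N measure would be rejected with a message naming the cube and the expected storage type, instead of failing later inside TotalOrderPartitioner.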
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)