Richard Calaba created KYLIN-1835:
-------------------------------------
Summary: Error: java.lang.NumberFormatException: For input
count_distinct on Big Int ??? (#7 Step Name: Build Base Cuboid Data)
Key: KYLIN-1835
URL: https://issues.apache.org/jira/browse/KYLIN-1835
Project: Kylin
Issue Type: Bug
Affects Versions: v1.5.2, v1.5.2.1
Reporter: Richard Calaba
Priority: Critical
I believe I have discovered an error in Kylin related to count_distinct with
exact precision.
I am not 100% sure, but everything points to a design limit for
count_distinct. Please assess / confirm / reject my observation.
Background info:
=============
- large fact table, ~100 million rows
- large customer dimension, ~10 million rows
Defined 2 KPIs of type COUNT_DISTINCT with exact precision (return type
bitmap) on 2 high-cardinality fields of type bigint.
The cube build runs fine until #7 Step Name: Build Base Cuboid Data, where it
errors out without further details in the Kylin log; it shows only "no counters
for job job_1463699962519_16085".
The MR logs of job job_1463699962519_16085 show this exception:
2016-06-28 02:22:24,019 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.NumberFormatException: For input string: "-6628245177096591402"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:495)
    at java.lang.Integer.parseInt(Integer.java:527)
    at org.apache.kylin.measure.bitmap.BitmapCounter.add(BitmapCounter.java:63)
    at org.apache.kylin.measure.bitmap.BitmapMeasureType$1.valueOf(BitmapMeasureType.java:106)
    at org.apache.kylin.measure.bitmap.BitmapMeasureType$1.valueOf(BitmapMeasureType.java:98)
    at org.apache.kylin.engine.mr.steps.BaseCuboidMapperBase.buildValueOf(BaseCuboidMapperBase.java:189)
    at org.apache.kylin.engine.mr.steps.BaseCuboidMapperBase.buildValue(BaseCuboidMapperBase.java:159)
    at org.apache.kylin.engine.mr.steps.BaseCuboidMapperBase.outputKV(BaseCuboidMapperBase.java:206)
    at org.apache.kylin.engine.mr.steps.HiveToBaseCuboidMapper.map(HiveToBaseCuboidMapper.java:53)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:773)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:345)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1566)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Reading the exception together with the measure's return type "bitmap":
BitmapCounter.add ends up in Integer.parseInt, and the value
"-6628245177096591402" is far outside the 32-bit int range. So it looks like
choosing exact precision (which the UI says is supported for int types) causes
this exception because I am passing a bigint field.
If so, is that a bug or a design limitation? Can count_distinct not be
implemented for bigint (with exact precision), or do I have to use
count_distinct with an error rate instead?
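To illustrate the failure outside Kylin, here is a minimal standalone sketch. It assumes only what the stack trace shows, namely that the column value reaches Integer.parseInt as a string. The class name BigintParseDemo and its helper methods are hypothetical and are not part of Kylin.

```java
public class BigintParseDemo {

    /** Returns true if the decimal string parses as a 32-bit int. */
    public static boolean fitsInInt(String value) {
        try {
            Integer.parseInt(value);
            return true;
        } catch (NumberFormatException e) {
            // Same exception type as in the MR log above.
            return false;
        }
    }

    /** Returns true if the decimal string parses as a 64-bit long. */
    public static boolean fitsInLong(String value) {
        try {
            Long.parseLong(value);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The value reported in the MR log.
        String value = "-6628245177096591402";
        System.out.println("fits in int:  " + fitsInInt(value));
        System.out.println("fits in long: " + fitsInLong(value));
    }
}
```

The value is a valid 64-bit long but not a valid 32-bit int, which matches the observation that the exception only appears when a bigint column is fed to the exact-precision measure.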
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)