[
https://issues.apache.org/jira/browse/KYLIN-4083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
PENG Zhengshuai updated KYLIN-4083:
-----------------------------------
Summary: Fact Distinct Column Step maybe failed or value lost when hashcode
of the UHC column value is Integer.MIN_VALUE (was: Fact Distinct Column Step
may fail or value lost when hashcode of the UHC column value is
Integer.MIN_VALUE)
> Fact Distinct Column Step maybe failed or value lost when hashcode of the UHC
> column value is Integer.MIN_VALUE
> ---------------------------------------------------------------------------------------------------------------
>
> Key: KYLIN-4083
> URL: https://issues.apache.org/jira/browse/KYLIN-4083
> Project: Kylin
> Issue Type: Bug
> Reporter: PENG Zhengshuai
> Assignee: PENG Zhengshuai
> Priority: Major
>
> In the Fact Distinct Column Step, Kylin uses MapReduce (MR) to de-duplicate
> the values of columns.
> If a column is a UHC (ultra high cardinality) column and the property
> *kylin.engine.mr.uhc-reducer-count* is set to a value greater than *1*,
> the Mapper task distributes the UHC column values across several reducers;
> *FactDistinctColumnPartitioner* routes each value according to its reducer id.
> The reducer id is calculated from a hash of the value in
> *FactDistinctColumnsReducerMapping#getReducerIdForCol*, where *the
> reducer id = reducerBeginIndex + Math.abs(value.hashCode()) % uhcReducerCount*
> When value.hashCode() is Integer.MIN_VALUE, Math.abs(value.hashCode()) also
> returns Integer.MIN_VALUE, because the corresponding positive value cannot be
> represented as an int. The remainder can then be negative, so the reducer id
> may be negative or fall below reducerBeginIndex. This may cause the Fact
> Distinct Column step to fail, or the UHC column value may be routed to a
> reducer that does not belong to the UHC column.
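
The arithmetic described above can be reproduced in isolation. The following is a minimal sketch, not Kylin's code: the class name ReducerIdSketch and both method names are hypothetical, and only the formula in reducerIdWithAbs is taken from the description. The Math.floorMod variant is one possible way to keep the result in range, shown as an assumption rather than as the fix applied in Kylin.

```java
// Sketch only: class and method names are hypothetical, not Kylin's API.
final class ReducerIdSketch {

    // Formula quoted in the issue description.
    static int reducerIdWithAbs(int hash, int reducerBeginIndex, int uhcReducerCount) {
        // Math.abs(Integer.MIN_VALUE) is still Integer.MIN_VALUE, so the
        // remainder below can be negative.
        return reducerBeginIndex + Math.abs(hash) % uhcReducerCount;
    }

    // Assumed alternative: for a positive divisor, Math.floorMod always
    // returns a value in [0, uhcReducerCount).
    static int reducerIdWithFloorMod(int hash, int reducerBeginIndex, int uhcReducerCount) {
        return reducerBeginIndex + Math.floorMod(hash, uhcReducerCount);
    }

    public static void main(String[] args) {
        int hash = Integer.MIN_VALUE;

        // The absolute value overflows and stays negative.
        System.out.println(Math.abs(hash) == Integer.MIN_VALUE); // true

        // With 3 UHC reducers starting at index 0, the id is negative.
        System.out.println(reducerIdWithAbs(hash, 0, 3)); // -2

        // With 3 UHC reducers starting at index 5, the id falls below the
        // UHC range and points at a reducer for a different column.
        System.out.println(reducerIdWithAbs(hash, 5, 3)); // 3

        // The floorMod variant stays inside [5, 8).
        System.out.println(reducerIdWithFloorMod(hash, 5, 3)); // 6
    }
}
```

Note that when uhcReducerCount is even the remainder of Integer.MIN_VALUE can be 0, so the problem does not show up for every reducer count. Also, Math.floorMod assigns other negative hash codes to different reducers than Math.abs does, which changes the distribution of values but not the range of the result.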