[ https://issues.apache.org/jira/browse/HIVE-17124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16491927#comment-16491927 ]
Wang Haihua commented on HIVE-17124:
------------------------------------

Any review update? And which case does this patch fix? We see data-count inconsistencies when repeatedly running queries that use dynamic partitioning with DISTRIBUTE BY rand(). Thanks [~gopalv]

> PlanUtils: Rand() is not a failure-tolerant distribution column
> ---------------------------------------------------------------
>
>                 Key: HIVE-17124
>                 URL: https://issues.apache.org/jira/browse/HIVE-17124
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning
>    Affects Versions: 2.3.0, 3.0.0
>            Reporter: Gopal V
>            Assignee: Gopal V
>            Priority: Major
>         Attachments: HIVE-17124.1.patch
>
> {code}
> else {
>   // numPartitionFields = -1 means random partitioning
>   partitionCols.add(TypeCheckProcFactory.DefaultExprProcessor.getFuncExprNodeDesc("rand"));
> }
> {code}
> This causes known data corruption during failure-tolerance operations.
> There is a failure-tolerant distribution function inside ReduceSinkOperator,
> which kicks in automatically when no partition columns are used:
> {code}
> if (partitionEval.length == 0) {
>   // If no partition cols, just distribute the data uniformly
>   // to provide better load balance. If the requirement is to have a
>   // single reducer, we should set the number of reducers to 1.
>   // Use a constant seed to make the code deterministic.
>   if (random == null) {
>     random = new Random(12345);
>   }
>   keyHashCode = random.nextInt();
> }
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
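To illustrate why the two strategies quoted above differ under task retry, here is a minimal standalone Java sketch (not Hive code; the class and method names are hypothetical). A time-seeded Random, like a rand() partition column, assigns rows to different reducers on each attempt of the same task, so a retried attempt ships rows to different reducers than the failed one did. A constant-seeded Random, as in ReduceSinkOperator, reproduces the exact same assignment on every retry.

```java
import java.util.Arrays;
import java.util.Random;

public class RetryDistributionDemo {
    // Assign each of n rows to one of numReducers buckets using the given Random.
    static int[] assign(int n, int numReducers, Random r) {
        int[] buckets = new int[n];
        for (int i = 0; i < n; i++) {
            // floorMod avoids a negative bucket when nextInt() is negative
            buckets[i] = Math.floorMod(r.nextInt(), numReducers);
        }
        return buckets;
    }

    public static void main(String[] args) {
        int rows = 8, reducers = 4;

        // rand()-style: fresh time-seeded Random per attempt -> retry almost
        // certainly produces a different row-to-reducer assignment
        int[] attempt1 = assign(rows, reducers, new Random());
        int[] attempt2 = assign(rows, reducers, new Random());
        System.out.println("attempt1: " + Arrays.toString(attempt1));
        System.out.println("attempt2: " + Arrays.toString(attempt2));

        // ReduceSinkOperator-style: constant seed -> identical assignment on retry
        int[] retry1 = assign(rows, reducers, new Random(12345));
        int[] retry2 = assign(rows, reducers, new Random(12345));
        System.out.println(Arrays.equals(retry1, retry2)); // true
    }
}
```

The determinism follows from java.util.Random's contract: two instances created with the same seed return identical sequences, which is exactly what makes the constant-seed path safe to replay after a task failure.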