[
https://issues.apache.org/jira/browse/HIVE-7372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14058279#comment-14058279
]
Chengxiang Li commented on HIVE-7372:
-------------------------------------
{quote}
Thanks for working on this, Chengxiang Li. Patch looks good to me. One minor
nit, for cloning, it might be better to reuse some existing utility methods, or
put our implementation in a utility class for later reuse.
{quote}
I treated this as a POC workaround and did not pay much attention to the clone
implementation, since we won't need to copy key/value in later SparkCollector
implementations. But you are right, we need reasonable coding style at all times. :D
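For readers following the cloning discussion, here is a minimal sketch of why a buffering collector may need to copy what it receives. It assumes the usual pattern where the caller reuses one mutable key/value object across rows; the thread does not state that this is the exact root cause. MutableText, BufferingCollector, and CloneOnCollectDemo are hypothetical names for illustration, not classes from the patch or from Hive.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a mutable record object that the caller reuses.
final class MutableText {
    private String value = "";

    void set(String v) { value = v; }

    String get() { return value; }

    // Returns an independent copy holding the current value.
    MutableText copy() {
        MutableText c = new MutableText();
        c.set(value);
        return c;
    }
}

// Hypothetical collector that buffers values; NOT the SparkCollector from the patch.
final class BufferingCollector {
    private final List<MutableText> buffered = new ArrayList<>();
    private final boolean cloneOnCollect;

    BufferingCollector(boolean cloneOnCollect) {
        this.cloneOnCollect = cloneOnCollect;
    }

    // Stores either a copy or the caller's (possibly reused) instance.
    void collect(MutableText value) {
        buffered.add(cloneOnCollect ? value.copy() : value);
    }

    List<String> values() {
        List<String> out = new ArrayList<>();
        for (MutableText t : buffered) {
            out.add(t.get());
        }
        return out;
    }
}

final class CloneOnCollectDemo {
    // Feeds three rows through one reused object, as a record reader might.
    static List<String> run(boolean cloneOnCollect) {
        BufferingCollector collector = new BufferingCollector(cloneOnCollect);
        MutableText reused = new MutableText();
        for (String row : new String[] {"a", "b", "c"}) {
            reused.set(row);
            collector.collect(reused);
        }
        return collector.values();
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // [c, c, c]: every entry aliases the same object
        System.out.println(run(true));  // [a, b, c]: each entry is an independent copy
    }
}
```

Putting copy() on the value type (or in a shared utility class, as suggested in the review) keeps the copying logic in one place for later reuse.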
{quote}
Could you please also check if the same problem exists in HiveReduceFunction,
where rows are clustered? If so, that can be addressed in a separate JIRA.
{quote}
HiveReduceFunction uses SparkCollector as well, so it's OK.
> Select query gives unpredictable incorrect result when parallelism is greater
> than 1 [Spark Branch]
> ---------------------------------------------------------------------------------------------------
>
> Key: HIVE-7372
> URL: https://issues.apache.org/jira/browse/HIVE-7372
> Project: Hive
> Issue Type: Bug
> Components: Spark
> Reporter: Xuefu Zhang
> Assignee: Chengxiang Li
> Attachments: HIVE-7372.patch
>
>
> In SparkClient.java, if the following property is set, unpredictable,
> incorrect results may be observed.
> {code}
> sparkConf.set("spark.default.parallelism", "1");
> {code}
> It's suspected that there are some concurrency issues, as Spark may process
> multiple datasets in a single JVM when parallelism is greater than 1 in order
> to use multiple cores.
> NO PRECOMMIT TESTS. This is for spark branch only.