[ https://issues.apache.org/jira/browse/SPARK-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14260766#comment-14260766 ]
Cheng Lian commented on SPARK-4963:
-----------------------------------

[~yanboliang] Making {{HiveTableScan}} return copies of the mutable rows does fix this issue, but I'm afraid it may cause a noticeable performance regression. Would you mind running a simple benchmark using the code in [#758|https://github.com/apache/spark/pull/758]?

I'm thinking maybe we can introduce a new {{Copy}} physical operator which inserts {{_.copy}} whenever an operator that may cache mutable row(s) as intermediate results (like {{Sample}} and {{Sort}}) is found. I'd expect this operator to simplify and unify all ad-hoc mutable row copying code.

[~marmbrus] What do you think?

> SchemaRDD.sample may return wrong results
> -----------------------------------------
>
>                 Key: SPARK-4963
>                 URL: https://issues.apache.org/jira/browse/SPARK-4963
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.2.0
>            Reporter: Cheng Lian
>            Assignee: Yanbo Liang
>
> This {{sbt/sbt hive/console}} session can easily reproduce this issue:
> {code}
> sql("SELECT * FROM src WHERE key % 2 = 0").
>   sample(withReplacement = false, fraction = 0.05).
>   registerTempTable("sampled")
> println(table("sampled").queryExecution)
> val query = sql("SELECT * FROM sampled WHERE key % 2 = 1")
> println(query.queryExecution)
> // Should print `true'
> println((1 to 10).map(_ => query.collect().isEmpty).reduce(_ && _))
> {code}
> Notice that when fraction is less than 0.4, {{GapSamplingIterator}} is used
> to do the sampling. My guess is that this has something to do with the
> underlying mutable row objects used in {{HiveTableScan}}, but I haven't
> figured out the root cause.
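
The mutable-row hazard mentioned in the comment above can be shown with a minimal, Spark-free sketch. Everything in it is hypothetical: {{MutableRow}}, {{scan}}, and the sample data are stand-ins invented for this example, not Spark classes, and it is not meant to reproduce the exact failure in {{HiveTableScan}} or {{GapSamplingIterator}}. It only shows why an operator that holds on to rows from a scan that reuses a single mutable object may need {{_.copy}} first.

```scala
// Hypothetical stand-in for a mutable row; NOT Spark's actual row class.
final class MutableRow(var key: Int, var value: String) {
  // Returns an independent snapshot of the current field values.
  def copy(): MutableRow = new MutableRow(key, value)
  override def toString: String = s"[$key,$value]"
}

object MutableRowDemo {
  // Made-up sample data, for illustration only.
  private val data = Seq(0 -> "val_0", 2 -> "val_2", 4 -> "val_4")

  // Mimics a scan that reuses ONE mutable row object for every record.
  def scan(): Iterator[MutableRow] = {
    val reused = new MutableRow(-1, "")
    data.iterator.map { case (k, v) =>
      reused.key = k
      reused.value = v
      reused // the same reference is handed out each time
    }
  }

  // Buffers rows without copying: every buffered element aliases the
  // same object, so all of them show the last record's values.
  def bufferWithoutCopy(): Seq[Int] = scan().toVector.map(_.key)

  // Copies each row before buffering, so each element keeps its own values.
  def bufferWithCopy(): Seq[Int] = scan().map(_.copy()).toVector.map(_.key)

  def main(args: Array[String]): Unit = {
    println(bufferWithoutCopy()) // Vector(4, 4, 4)
    println(bufferWithCopy())    // Vector(0, 2, 4)
  }
}
```

In this sketch the copy costs one allocation per row, which is the kind of overhead the benchmark request above is concerned with. Copying only in front of operators that retain rows, rather than in the scan itself, would limit that cost to the plans that need it.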