[GitHub] spark pull request #23275: [SPARK-26323][SQL] Scala UDF should still check i...

2018-12-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23275#discussion_r240234583 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala --- @@ -88,68 +88,49 @@ sealed trait UserDefinedFunction

[GitHub] spark issue #23228: [MINOR][DOC] Update the condition description of seriali...

2018-12-10 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23228 thanks, merging to master! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #23275: [SPARK-26323][SQL] Scala UDF should still check i...

2018-12-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23275#discussion_r240231883 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -4255,11 +4255,11 @@ object functions { * * @group udf_funcs

[GitHub] spark issue #23275: [SPARK-26323][SQL] Scala UDF should still check input ty...

2018-12-10 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23275 cc @maryannxue @gatorsmile @srowen

[GitHub] spark pull request #23275: [SPARK-26323][SQL] Scala UDF should still check i...

2018-12-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23275#discussion_r240230970 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala --- @@ -47,25 +47,13 @@ case class ScalaUDF

[GitHub] spark pull request #23275: [SPARK-26323][SQL] Scala UDF should still check i...

2018-12-10 Thread cloud-fan
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/23275 [SPARK-26323][SQL] Scala UDF should still check input types even if some inputs are of type Any ## What changes were proposed in this pull request? For Scala UDF, when checking input

[GitHub] spark pull request #23272: [SPARK-26265][Core] Fix deadlock in BytesToBytesM...

2018-12-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23272#discussion_r240189245 --- Diff: core/src/test/java/org/apache/spark/unsafe/map/AbstractBytesToBytesMapSuite.java --- @@ -667,4 +669,54 @@ public void testPeakMemoryUsed

[GitHub] spark issue #23251: [SPARK-26300][SS] Remove a redundant `checkForStreaming`...

2018-12-10 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23251 cc @zsxwing

[GitHub] spark issue #23262: [SPARK-26312][SQL]Converting converters in RDDConversion...

2018-12-10 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23262 LGTM, can you update the PR title and description?

[GitHub] spark pull request #23262: [SPARK-26312][SQL]Converting converters in RDDCon...

2018-12-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23262#discussion_r240180713 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala --- @@ -416,7 +416,12 @@ case class

[GitHub] spark issue #23272: [SPARK-26265][Core] Fix deadlock in BytesToBytesMap.MapI...

2018-12-10 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23272 have you seen any bug report caused by this deadlock?

[GitHub] spark pull request #23272: [SPARK-26265][Core] Fix deadlock in BytesToBytesM...

2018-12-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23272#discussion_r240178993 --- Diff: core/src/test/java/org/apache/spark/memory/TestMemoryConsumer.java --- @@ -38,12 +38,14 @@ public long spill(long size, MemoryConsumer trigger

[GitHub] spark pull request #23204: Revert "[SPARK-21052][SQL] Add hash map metrics t...

2018-12-09 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23204#discussion_r240104812 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashJoin.scala --- @@ -213,10 +213,6 @@ trait HashJoin { s

[GitHub] spark issue #23255: [SPARK-26307] [SQL] Fix CTAS when INSERT a partitioned t...

2018-12-09 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23255 thanks, merging to master/2.4/2.3!

[GitHub] spark issue #23211: [SPARK-19712][SQL] Move PullupCorrelatedPredicates and R...

2018-12-09 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23211 to make the PR smaller, can we add an individual rule `PushdownLeftSemiOrAntiJoin` first?

[GitHub] spark pull request #23211: [SPARK-19712][SQL] Move PullupCorrelatedPredicate...

2018-12-09 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23211#discussion_r240097479 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -984,6 +1002,28 @@ object PushDownPredicate extends

[GitHub] spark pull request #23211: [SPARK-19712][SQL] Move PullupCorrelatedPredicate...

2018-12-09 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23211#discussion_r240097255 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -649,13 +664,16 @@ object CollapseProject extends

[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"

2018-12-09 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23204 can we follow https://github.com/apache/spark/pull/23204#issuecomment-445510026 and create a new ticket

[GitHub] spark pull request #23211: [SPARK-19712][SQL] Move PullupCorrelatedPredicate...

2018-12-09 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23211#discussion_r240092936 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala --- @@ -267,6 +267,17 @@ object ScalarSubquery

[GitHub] spark issue #23248: [SPARK-26293][SQL] Cast exception when having python udf...

2018-12-09 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23248 If it's fine for 2.4, I think it's also fine for master as a temporary fix? We can create another ticket to clean up the subquery optimization hack. IIUC https://github.com/apache/spark/pull

[GitHub] spark pull request #23258: [SPARK-23375][SQL][FOLLOWUP][TEST] Test Sort metr...

2018-12-09 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23258#discussion_r240090371 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala --- @@ -182,10 +182,13 @@ class SQLMetricsSuite extends

[GitHub] spark pull request #23201: [SPARK-26246][SQL] Infer date and timestamp types...

2018-12-09 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23201#discussion_r240090192 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala --- @@ -121,7 +122,26 @@ private[sql] class

[GitHub] spark pull request #23265: [2.4][SPARK-26021][SQL][FOLLOWUP] only deal with ...

2018-12-09 Thread cloud-fan
Github user cloud-fan closed the pull request at: https://github.com/apache/spark/pull/23265

[GitHub] spark issue #23228: [MINOR][DOC] Update the condition description of seriali...

2018-12-09 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23228 retest this please

[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"

2018-12-09 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23204 If we can quickly finish #23214 (within several days), let's go for it. But if we can't, I'd suggest we do the partial revert first to fix the perf regression, and add back the metrics later

[GitHub] spark issue #23228: [MINOR][DOC]The condition description of serialized shuf...

2018-12-09 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23228 LGTM, cc @jiangxb1987

[GitHub] spark pull request #23228: [MINOR][DOC]The condition description of serializ...

2018-12-09 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23228#discussion_r240036698 --- Diff: core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleManager.scala --- @@ -33,10 +33,10 @@ import org.apache.spark.shuffle

[GitHub] spark issue #23253: [SPARK-26303][SQL] Return partial results for bad JSON r...

2018-12-09 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23253 LGTM except a code style comment

[GitHub] spark pull request #23253: [SPARK-26303][SQL] Return partial results for bad...

2018-12-09 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23253#discussion_r240036498 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala --- @@ -347,17 +347,28 @@ class JacksonParser

[GitHub] spark pull request #23253: [SPARK-26303][SQL] Return partial results for bad...

2018-12-09 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23253#discussion_r240036489 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala --- @@ -347,17 +347,28 @@ class JacksonParser

[GitHub] spark pull request #23201: [SPARK-26246][SQL] Infer date and timestamp types...

2018-12-09 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23201#discussion_r240036225 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala --- @@ -121,7 +122,26 @@ private[sql] class

[GitHub] spark issue #23208: [SPARK-25530][SQL] data source v2 API refactor (batch wr...

2018-12-09 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23208 Let's move the high level discussion to the doc: https://docs.google.com/document/d/1vI26UEuDpVuOjWw4WPoH2T6y8WAekwtI7qoowhOFnI4/edit?usp=sharing

[GitHub] spark pull request #23266: [SPARK-26313][SQL] move read related methods from...

2018-12-09 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23266#discussion_r240029373 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/SupportsBatchRead.java --- @@ -20,14 +20,27 @@ import

[GitHub] spark pull request #23208: [SPARK-25530][SQL] data source v2 API refactor (b...

2018-12-09 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23208#discussion_r240028574 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala --- @@ -17,52 +17,49 @@ package

[GitHub] spark pull request #23208: [SPARK-25530][SQL] data source v2 API refactor (b...

2018-12-09 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23208#discussion_r240028515 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/TableProvider.java --- @@ -25,7 +25,10 @@ * The base interface for v2 data sources

[GitHub] spark pull request #23266: [SPARK-26313][SQL] move read related methods from...

2018-12-09 Thread cloud-fan
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/23266 [SPARK-26313][SQL] move read related methods from Table to read related mix-in traits ## What changes were proposed in this pull request? As discussed in https://github.com/apache/spark

[GitHub] spark issue #23266: [SPARK-26313][SQL] move read related methods from Table ...

2018-12-09 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23266 cc @rdblue @HyukjinKwon @gatorsmile

[GitHub] spark issue #23265: [2.4][SPARK-26021][SQL][FOLLOWUP] only deal with NaN and...

2018-12-09 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23265 retest this please

[GitHub] spark issue #23259: [SPARK-26215][SQL][WIP] Define reserved/non-reserved key...

2018-12-09 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23259 thanks @maropu for starting it! > Which SQL standard does Spark SQL follow (e.g., 2011 or 2016)? I think SQL 2011 is good, but if we can't find a public version, maybe it's also

[GitHub] spark pull request #23258: [SPARK-23375][SQL][FOLLOWUP][TEST] Test Sort metr...

2018-12-08 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23258#discussion_r240026727 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala --- @@ -182,10 +182,13 @@ class SQLMetricsSuite extends

[GitHub] spark pull request #23249: [SPARK-26297][SQL] improve the doc of Distributio...

2018-12-08 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23249#discussion_r240026485 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala --- @@ -118,10 +115,12 @@ case class

[GitHub] spark issue #23255: [SPARK-26307] [SQL] Fix CTAS when INSERT a partitioned t...

2018-12-08 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23255 LGTM

[GitHub] spark pull request #23255: [SPARK-26307] [SQL] Fix CTAS when INSERT a partit...

2018-12-08 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23255#discussion_r240026441 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertSuite.scala --- @@ -752,6 +752,17 @@ class InsertSuite extends QueryTest

[GitHub] spark pull request #23262: [SPARK-26312][SQL]Converting converters in RDDCon...

2018-12-08 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23262#discussion_r240026394 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala --- @@ -53,7 +53,7 @@ object RDDConversions

[GitHub] spark pull request #23262: [SPARK-26312][SQL]Converting converters in RDDCon...

2018-12-08 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23262#discussion_r240026388 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala --- @@ -33,7 +33,7 @@ object RDDConversions

[GitHub] spark pull request #23248: [SPARK-26293][SQL] Cast exception when having pyt...

2018-12-08 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23248#discussion_r240026330 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala --- @@ -131,8 +131,20 @@ object ExtractPythonUDFs

[GitHub] spark pull request #23253: [SPARK-26303][SQL] Return partial results for bad...

2018-12-08 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23253#discussion_r240026245 --- Diff: docs/sql-migration-guide-upgrade.md --- @@ -35,7 +35,9 @@ displayTitle: Spark SQL Upgrading Guide - Since Spark 3.0, CSV datasource

[GitHub] spark pull request #23253: [SPARK-26303][SQL] Return partial results for bad...

2018-12-08 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23253#discussion_r240026237 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/TestJsonData.scala --- @@ -229,6 +229,11 @@ private[json] trait

[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"

2018-12-08 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23204 +1

[GitHub] spark pull request #23265: [2.4][SPARK-26021][SQL][FOLLOWUP] only deal with ...

2018-12-08 Thread cloud-fan
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/23265 [2.4][SPARK-26021][SQL][FOLLOWUP] only deal with NaN and -0.0 in UnsafeWriter backport https://github.com/apache/spark/pull/23239 to 2.4 - ## What changes were proposed

[GitHub] spark issue #23265: [2.4][SPARK-26021][SQL][FOLLOWUP] only deal with NaN and...

2018-12-08 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23265 cc @dongjoon-hyun

[GitHub] spark pull request #23201: [SPARK-26246][SQL] Infer date and timestamp types...

2018-12-08 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23201#discussion_r240022552 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala --- @@ -121,7 +122,26 @@ private[sql] class

[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...

2018-12-08 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23207 thanks, merging to master!

[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"

2018-12-08 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23204 according to https://github.com/apache/spark/pull/23214#issuecomment-443999282, the hash join metrics are wrongly implemented. I think it's fine to revert it and re-implement it later

[GitHub] spark issue #23249: [SPARK-26297][SQL] improve the doc of Distribution/Parti...

2018-12-07 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23249 retest this please

[GitHub] spark pull request #22104: [SPARK-24721][SQL] Extract Python UDFs at the end...

2018-12-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22104#discussion_r239738437 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkOptimizer.scala --- @@ -31,7 +31,8 @@ class SparkOptimizer( override

[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...

2018-12-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r239736660 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala --- @@ -78,6 +80,7 @@ object SQLMetrics { private val

[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...

2018-12-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r239735814 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala --- @@ -78,6 +80,7 @@ object SQLMetrics { private val

[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...

2018-12-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r239735425 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala --- @@ -78,6 +80,7 @@ object SQLMetrics { private val

[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...

2018-12-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r239735015 --- Diff: core/src/main/scala/org/apache/spark/shuffle/ShuffleWriteProcessor.scala --- @@ -0,0 +1,75 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...

2018-12-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r239734920 --- Diff: core/src/main/scala/org/apache/spark/shuffle/ShuffleWriteProcessor.scala --- @@ -0,0 +1,75 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #22514: [SPARK-25271][SQL] Hive ctas commands should use ...

2018-12-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22514#discussion_r239733875 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala --- @@ -95,9 +77,116 @@ case class

[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

2018-12-07 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23239 I checked the original PR that handles NaN: https://github.com/apache/spark/commit/c032b0bf92130dc4facb003f0deaeb1228aefded It didn't add end-to-end tests, so I added 2 new tests

[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

2018-12-06 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23239 Yes it is. `UnsafeProjection` always normalizes NaN and -0.0, and Spark uses `UnsafeProjection` to produce output. So users can't distinguish them

[GitHub] spark pull request #23249: [SPARK-26297][SQL] improve the doc of Distributio...

2018-12-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23249#discussion_r239690226 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala --- @@ -22,13 +22,12 @@ import

[GitHub] spark pull request #23201: [SPARK-26246][SQL] Infer date and timestamp types...

2018-12-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23201#discussion_r239687264 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala --- @@ -121,7 +122,26 @@ private[sql] class

[GitHub] spark pull request #23201: [SPARK-26246][SQL] Infer date and timestamp types...

2018-12-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23201#discussion_r239687213 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala --- @@ -121,7 +122,26 @@ private[sql] class

[GitHub] spark pull request #23248: [SPARK-26293][SQL] Cast exception when having pyt...

2018-12-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23248#discussion_r239686156 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala --- @@ -131,8 +131,20 @@ object ExtractPythonUDFs

[GitHub] spark issue #23215: [SPARK-26263][SQL] Validate partition values with user p...

2018-12-06 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23215 thanks, merging to master!

[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

2018-12-06 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23239 retest this please

[GitHub] spark pull request #23249: [SPARK-26297][SQL] improve the doc of Distributio...

2018-12-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23249#discussion_r239684697 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala --- @@ -22,13 +22,12 @@ import

[GitHub] spark pull request #23208: [SPARK-25530][SQL] data source v2 API refactor (b...

2018-12-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23208#discussion_r239684490 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/SupportsBatchWrite.java --- @@ -25,14 +25,14 @@ import

[GitHub] spark issue #23208: [SPARK-25530][SQL] data source v2 API refactor (batch wr...

2018-12-06 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23208 @rdblue I tried to add `WriteBuilder`, but there is a difference between read and write: 1. for read, the `ScanBuilder` can collect a lot of information, like column pruning, filter pushdown, etc

[GitHub] spark pull request #23208: [SPARK-25530][SQL] data source v2 API refactor (b...

2018-12-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23208#discussion_r239683592 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala --- @@ -17,52 +17,49 @@ package

[GitHub] spark pull request #23208: [SPARK-25530][SQL] data source v2 API refactor (b...

2018-12-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23208#discussion_r239682984 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala --- @@ -241,32 +241,28 @@ final class DataFrameWriter[T] private[sql](ds

[GitHub] spark pull request #23208: [SPARK-25530][SQL] data source v2 API refactor (b...

2018-12-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23208#discussion_r239682239 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/TableProvider.java --- @@ -25,7 +25,10 @@ * The base interface for v2 data sources

[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...

2018-12-06 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23207 the code looks much cleaner now!

[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...

2018-12-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r239677846 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala --- @@ -78,6 +80,7 @@ object SQLMetrics { private val

[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...

2018-12-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r239677653 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala --- @@ -333,8 +343,19 @@ object ShuffleExchangeExec

[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...

2018-12-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r239677477 --- Diff: core/src/main/scala/org/apache/spark/shuffle/ShuffleWriterProcessor.scala --- @@ -0,0 +1,82 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...

2018-12-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23207#discussion_r239677325 --- Diff: core/src/main/scala/org/apache/spark/shuffle/ShuffleWriterProcessor.scala --- @@ -0,0 +1,82 @@ +/* + * Licensed to the Apache Software

[GitHub] spark issue #23244: [SPARK-26289][CORE]cleanup enablePerfMetrics parameter f...

2018-12-06 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23244 thanks, merging to master!

[GitHub] spark pull request #23244: [SPARK-26289][CORE]cleanup enablePerfMetrics para...

2018-12-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23244#discussion_r239675382 --- Diff: core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java --- @@ -209,23 +205,14 @@ public BytesToBytesMap

[GitHub] spark pull request #23201: [SPARK-26246][SQL] Infer date and timestamp types...

2018-12-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23201#discussion_r239539848 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala --- @@ -121,7 +122,26 @@ private[sql] class

[GitHub] spark pull request #23201: [SPARK-26246][SQL] Infer date and timestamp types...

2018-12-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23201#discussion_r239534668 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala --- @@ -121,7 +122,26 @@ private[sql] class

[GitHub] spark pull request #23249: [SPARK-26297][SQL] improve the doc of Distributio...

2018-12-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23249#discussion_r239508488 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala --- @@ -118,10 +116,13 @@ case class

[GitHub] spark pull request #23249: [SPARK-26297][SQL] improve the doc of Distributio...

2018-12-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23249#discussion_r239508437 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala --- @@ -118,10 +116,13 @@ case class

[GitHub] spark pull request #23239: [SPARK-26021][SQL][followup] only deal with NaN a...

2018-12-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23239#discussion_r239507673 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeWriter.java --- @@ -198,11 +198,45 @@ protected final void

[GitHub] spark issue #23249: [SPARK-26297][SQL] improve the doc of Distribution/Parti...

2018-12-06 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23249 cc @maryannxue @hvanhovell @gatorsmile @viirya

[GitHub] spark pull request #23249: [SPARK-26297][SQL] improve the doc of Distributio...

2018-12-06 Thread cloud-fan
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/23249 [SPARK-26297][SQL] improve the doc of Distribution/Partitioning ## What changes were proposed in this pull request? Some documents of `Distribution/Partitioning` are stale and misleading

[GitHub] spark issue #23248: [SPARK-26293][SQL] Cast exception when having python udf...

2018-12-06 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23248 retest this please

[GitHub] spark pull request #23208: [SPARK-25530][SQL] data source v2 API refactor (b...

2018-12-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23208#discussion_r239469368 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/SupportsBatchWrite.java --- @@ -25,14 +25,14 @@ import

[GitHub] spark pull request #23215: [SPARK-26263][SQL] Validate partition values with...

2018-12-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23215#discussion_r239453312 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -1396,6 +1396,16 @@ object SQLConf { .booleanConf

[GitHub] spark pull request #23215: [SPARK-26263][SQL] Validate partition values with...

2018-12-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23215#discussion_r239453026 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileIndexSuite.scala --- @@ -95,6 +95,31 @@ class FileIndexSuite extends

[GitHub] spark issue #23213: [SPARK-26262][SQL] Runs SQLQueryTestSuite on mixed confi...

2018-12-06 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23213 these 3 combinations LGTM.

[GitHub] spark pull request #23244: [SPARK-26289][CORE]cleanup enablePerfMetrics para...

2018-12-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23244#discussion_r239451274 --- Diff: core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java --- @@ -209,23 +205,14 @@ public BytesToBytesMap

[GitHub] spark issue #23248: [SPARK-26293][SQL] Cast exception when having python udf...

2018-12-06 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/23248 cc @icexelloss @HyukjinKwon @ueshin @viirya @gatorsmile

[GitHub] spark pull request #23248: [SPARK-26293][SQL] Cast exception when having pyt...

2018-12-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23248#discussion_r239430315 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ArrowEvalPythonExec.scala --- @@ -60,8 +60,12 @@ private class BatchIterator[T

[GitHub] spark pull request #23248: [SPARK-26293][SQL] Cast exception when having pyt...

2018-12-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/23248#discussion_r239430084 --- Diff: python/pyspark/sql/tests/test_udf.py --- @@ -23,7 +23,7 @@ from pyspark import SparkContext from pyspark.sql import SparkSession

[GitHub] spark pull request #23248: [SPARK-26293][SQL] Cast exception when having pyt...

2018-12-06 Thread cloud-fan
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/23248 [SPARK-26293][SQL] Cast exception when having python udf in subquery ## What changes were proposed in this pull request? This is a regression introduced by https://github.com/apache
