[GitHub] spark pull request #22509: [SPARK-25384][SQL] Clarify fromJsonForceNullableS...

2018-09-20 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/22509 [SPARK-25384][SQL] Clarify fromJsonForceNullableSchema will be removed in Spark 3.0 See above. This should go into the 2.4 release. You can merge this pull request into a Git repository by running

[GitHub] spark issue #22508: [SPARK-23549][SQL] Rename config spark.sql.legacy.compar...

2018-09-20 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22508 cc @gatorsmile who merged the original pr. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark pull request #22508: [SPARK-23549][SQL] Rename config spark.sql.legacy...

2018-09-20 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/22508 [SPARK-23549][SQL] Rename config spark.sql.legacy.compareDateTimestampInTimestamp ## What changes were proposed in this pull request? See title. ## How was this patch tested? Make

[GitHub] spark issue #22505: Revert "[SPARK-23715][SQL] the input of to/from_utc_time...

2018-09-20 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22505 lgtm - let's make sure tests pass

[GitHub] spark pull request #22442: [SPARK-25447][SQL] Support JSON options by schema...

2018-09-20 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/22442#discussion_r219297029 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -3611,6 +3611,20 @@ object functions { */ def schema_of_json(e

[GitHub] spark pull request #22471: [SPARK-25470][SQL][Performance] Concat.eval shoul...

2018-09-19 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/22471#discussion_r219023998 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -2274,33 +2274,41 @@ case class Concat

[GitHub] spark issue #22476: [SPARK-24157][SS][FOLLOWUP] Rename to spark.sql.streamin...

2018-09-19 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22476 Merged in master/2.4.

[GitHub] spark issue #22475: [SPARK-4502][SQL] Rename to spark.sql.optimizer.nestedSc...

2018-09-19 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22475 jenkins, retest this again

[GitHub] spark issue #22475: [SPARK-4502][SQL] Rename to spark.sql.optimizer.nestedSc...

2018-09-19 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22475 done

[GitHub] spark issue #22476: [SPARK-24157][SS][FOLLOWUP] Rename to spark.sql.streamin...

2018-09-19 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22476 done

[GitHub] spark issue #21169: [SPARK-23715][SQL] the input of to/from_utc_timestamp ca...

2018-09-19 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21169 i'm actually not sure if we should do this, given impala treats timestamp as timestamp without timezone, whereas spark treats it as a utc timestamp (with timezone). these functions are super confusing

[GitHub] spark issue #22476: [SPARK-24157] spark.sql.streaming.noDataMicroBatches.ena...

2018-09-19 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22476 cc @tdas @marmbrus @jose-torres

[GitHub] spark pull request #22476: [SPARK-24157] spark.sql.streaming.noDataMicroBatc...

2018-09-19 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/22476 [SPARK-24157] spark.sql.streaming.noDataMicroBatches.enabled ## What changes were proposed in this pull request? This patch changes the config option

[GitHub] spark pull request #22475: [SPARK-4502][SQL] spark.sql.optimizer.nestedSchem...

2018-09-19 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/22475 [SPARK-4502][SQL] spark.sql.optimizer.nestedSchemaPruning.enabled ## What changes were proposed in this pull request? This patch adds an "optimizer" prefix to nested sche

[GitHub] spark issue #22475: [SPARK-4502][SQL] spark.sql.optimizer.nestedSchemaPrunin...

2018-09-19 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22475 cc @cloud-fan

[GitHub] spark issue #22472: [SPARK-23173][SQL] Reverting of spark.sql.fromJsonForceN...

2018-09-19 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22472 im ok either way

[GitHub] spark issue #22471: [SPARK-25470][SQL][Performance] Concat.eval should use p...

2018-09-19 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22471 @ueshin can you review?

[GitHub] spark pull request #20858: [SPARK-23736][SQL] Extending the concat function ...

2018-09-19 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/20858#discussion_r218677837 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala --- @@ -665,3 +667,219 @@ case class ElementAt

[GitHub] spark issue #19868: [SPARK-22676] Avoid iterating all partition paths when s...

2018-09-18 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19868 can somebody explain to me what the pr description has to do with missingFiles? I'm probably missing something but i feel the implementation is very different from the pr description

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...

2018-09-18 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16677 ok after thinking about it more, i think we should just revert all of these changes and go back to the drawing board. here's why: 1. the prs change some of the most common/core parts of spark

[GitHub] spark pull request #22456: [SPARK-19355][SQL] Fix variable names numberOfOut...

2018-09-18 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/22456#discussion_r218666270 --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala --- @@ -31,7 +31,7 @@ import org.apache.spark.util.Utils /** * Result

[GitHub] spark pull request #16677: [SPARK-19355][SQL] Use map output statistics to i...

2018-09-18 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16677#discussion_r218665902 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala --- @@ -93,25 +96,93 @@ trait BaseLimitExec extends UnaryExecNode

[GitHub] spark pull request #21527: [SPARK-24519] Make the threshold for highly compr...

2018-09-18 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21527#discussion_r218640616 --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala --- @@ -50,7 +50,9 @@ private[spark] sealed trait MapStatus { private[spark

[GitHub] spark pull request #16677: [SPARK-19355][SQL] Use map output statistics to i...

2018-09-18 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16677#discussion_r218640368 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala --- @@ -93,25 +96,93 @@ trait BaseLimitExec extends UnaryExecNode

[GitHub] spark pull request #21527: [SPARK-24519] Make the threshold for highly compr...

2018-09-18 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21527#discussion_r218639496 --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala --- @@ -50,7 +50,9 @@ private[spark] sealed trait MapStatus { private[spark

[GitHub] spark pull request #22459: [SPARK-23173] rename spark.sql.fromJsonForceNulla...

2018-09-18 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/22459 [SPARK-23173] rename spark.sql.fromJsonForceNullableSchema ## What changes were proposed in this pull request? `spark.sql.fromJsonForceNullableSchema

[GitHub] spark issue #22459: [SPARK-23173][SQL] rename spark.sql.fromJsonForceNullabl...

2018-09-18 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22459 cc @mswit-databricks @gatorsmile

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...

2018-09-18 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16677 actually looking at the design - this could cause perf regressions in some cases too right? it introduces a barrier that was previously non-existent. if the number of records to take isn't

[GitHub] spark pull request #22344: [SPARK-25352][SQL] Perform ordered global limit w...

2018-09-18 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/22344#discussion_r218633220 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala --- @@ -68,22 +68,42 @@ abstract class SparkStrategies extends

[GitHub] spark pull request #22344: [SPARK-25352][SQL] Perform ordered global limit w...

2018-09-18 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/22344#discussion_r218632551 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala --- @@ -98,7 +98,8 @@ case class LocalLimitExec(limit: Int, child: SparkPlan

[GitHub] spark pull request #16677: [SPARK-19355][SQL] Use map output statistics to i...

2018-09-18 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16677#discussion_r218631745 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala --- @@ -93,25 +96,93 @@ trait BaseLimitExec extends UnaryExecNode

[GitHub] spark pull request #16677: [SPARK-19355][SQL] Use map output statistics to i...

2018-09-18 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16677#discussion_r218631682 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala --- @@ -93,25 +96,93 @@ trait BaseLimitExec extends UnaryExecNode

[GitHub] spark pull request #22344: [SPARK-25352][SQL] Perform ordered global limit w...

2018-09-18 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/22344#discussion_r218631461 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala --- @@ -98,7 +98,8 @@ case class LocalLimitExec(limit: Int, child: SparkPlan

[GitHub] spark pull request #22344: [SPARK-25352][SQL] Perform ordered global limit w...

2018-09-18 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/22344#discussion_r218630599 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala --- @@ -98,7 +98,8 @@ case class LocalLimitExec(limit: Int, child: SparkPlan

[GitHub] spark pull request #16677: [SPARK-19355][SQL] Use map output statistics to i...

2018-09-18 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16677#discussion_r218630513 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/PruningSuite.scala --- @@ -22,21 +22,29 @@ import scala.collection.JavaConverters

[GitHub] spark pull request #16677: [SPARK-19355][SQL] Use map output statistics to i...

2018-09-18 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16677#discussion_r218630488 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala --- @@ -557,11 +557,13 @@ class DataFrameAggregateSuite extends QueryTest

[GitHub] spark pull request #16677: [SPARK-19355][SQL] Use map output statistics to i...

2018-09-18 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16677#discussion_r218630324 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -204,6 +204,13 @@ object SQLConf { .intConf

[GitHub] spark pull request #22344: [SPARK-25352][SQL] Perform ordered global limit w...

2018-09-18 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/22344#discussion_r218629650 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala --- @@ -98,7 +98,8 @@ case class LocalLimitExec(limit: Int, child: SparkPlan

[GitHub] spark issue #22344: [SPARK-25352][SQL] Perform ordered global limit when lim...

2018-09-18 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22344 guys - the whole sequence of prs for this feature are contributing a lot of cryptic code with arcane documentation everywhere. i worry a lot about the maintainability of the code that's coming in. can

[GitHub] spark pull request #22344: [SPARK-25352][SQL] Perform ordered global limit w...

2018-09-18 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/22344#discussion_r218623478 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala --- @@ -98,7 +98,8 @@ case class LocalLimitExec(limit: Int, child: SparkPlan

[GitHub] spark pull request #22457: [SPARK-24626] Add statistics prefix to parallelFi...

2018-09-18 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/22457 [SPARK-24626] Add statistics prefix to parallelFileListingInStatsComputation ## What changes were proposed in this pull request? To be more consistent with other statistics based configs

[GitHub] spark issue #22456: [SPARK-19355][SQL] Fix variable names numberOfOutput -> ...

2018-09-18 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22456 cc @hvanhovell @cloud-fan also @viirya please don't use such cryptic variable names ... we also need to fix the documentation for the config flag - it's arcane

[GitHub] spark pull request #22456: [SPARK-19355][SQL] Fix variable names numberOfOut...

2018-09-18 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/22456 [SPARK-19355][SQL] Fix variable names numberOfOutput ## What changes were proposed in this pull request? SPARK-19355 introduced a variable / method called numberOfOutput, which is a really bad

[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...

2018-09-18 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16677 two questions about this (i just saw this from a different place): 1. is numOutput about number of records? 2. how much memory usage will be increased by, for the driver, at scale

[GitHub] spark pull request #16677: [SPARK-19355][SQL] Use map output statistics to i...

2018-09-18 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16677#discussion_r218614872 --- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala --- @@ -44,18 +45,23 @@ private[spark] sealed trait MapStatus { * necessary

[GitHub] spark issue #22395: [SPARK-16323][SQL] Add IntegralDivide expression

2018-09-17 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22395 Looks like a use case for a legacy config. On Mon, Sep 17, 2018 at 6:41 PM Wenchen Fan wrote: > To clarify, it's not following hive, but following the behavior of > pr

[GitHub] spark issue #22395: [SPARK-16323][SQL] Add IntegralDivide expression

2018-09-17 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22395 why are we always returning long type here? shouldn't they be the same as the left expr's type? see mysql ```mysql> create temporary table rxin_temp select 4 div 2, 123456789124 div 2, 4

[GitHub] spark pull request #22442: [SPARK-25447][SQL] Support JSON options by schema...

2018-09-17 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/22442#discussion_r218250393 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -3611,6 +3611,20 @@ object functions { */ def schema_of_json(e

[GitHub] spark issue #21433: [SPARK-23820][CORE] Enable use of long form of callsite ...

2018-09-11 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21433 Yea we can add this back easily. On Tue, Sep 11, 2018 at 12:50 PM Sean Owen wrote: > Given lack of certainty, and that's this is small and easy to add back in > a differen

[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-09-08 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22010 Actually @holdenk is this change even correct? RDD.distinct is not key based. It is based on the value of the elements in RDD. Even if `numPartitions == partitions.length`, it doesn't mean the RDD

[GitHub] spark pull request #22010: [SPARK-21436][CORE] Take advantage of known parti...

2018-09-08 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/22010#discussion_r216145892 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -396,7 +396,26 @@ abstract class RDD[T: ClassTag]( * Return a new RDD containing

[GitHub] spark issue #22332: [SPARK-25333][SQL] Ability add new columns in Dataset in...

2018-09-06 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22332 Thanks guys. On Thu, Sep 6, 2018 at 2:12 AM Hyukjin Kwon wrote: > Thanks, @wmellouli <https://github.com/wmellouli>. > > — > You are receiving thi

[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...

2018-09-04 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21721 BTW I think this is probably SPIP-worthy. At the very least we should write a design doc on this, similar to the other docs for dsv2 sub-components. We should really think about whether it'd

[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...

2018-09-04 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21721 Given the uncertainty about how this works across batch, streaming, and CP, and given we are still fleshing out the main APIs, I think we should revert this, and revisit when the main APIs are done

[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...

2018-08-31 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21721 I will take a look at this tomorrow, since I’m already looking at data source apis myself. Can provide opinion after another look on whether we should keep it unstable or revert

[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...

2018-08-30 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21721 I'm confused by this api. Is this for streaming only? If yes, why are they not in the stream package? If not, I only found streaming implementation. Maybe I missed

[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...

2018-08-30 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21721 Stuff like this merits api discussions. Not just implementation changes ...

[GitHub] spark issue #22227: [SPARK-25202] [SQL] Implements split with limit sql func...

2018-08-30 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22227 Please remove the 0 semantics. IMO the zero vs negative number difference is too subtle. I only find Java String supporting that. Python doesn't
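
For readers unfamiliar with the "zero vs negative" distinction mentioned in the comment above, here is a minimal sketch of how `java.lang.String.split(regex, limit)` treats the limit. It illustrates the Java behavior only and is not the Spark `split` implementation under review; the class name `SplitLimitDemo` and the helper `splitWithLimit` are hypothetical names introduced for this example.

```java
import java.util.Arrays;
import java.util.List;

public class SplitLimitDemo {
    // Thin wrapper over String.split(regex, limit) so results print and compare as lists.
    static List<String> splitWithLimit(String input, String regex, int limit) {
        return Arrays.asList(input.split(regex, limit));
    }

    public static void main(String[] args) {
        String input = "a,b,c,,";
        // limit == 0: split as many times as possible, then drop trailing empty strings.
        System.out.println(splitWithLimit(input, ",", 0));   // prints [a, b, c]
        // limit < 0: split as many times as possible and keep trailing empty strings.
        System.out.println(splitWithLimit(input, ",", -1));  // prints [a, b, c, , ]
        // limit > 0: at most `limit` elements; the last one holds the unsplit remainder.
        System.out.println(splitWithLimit(input, ",", 2));   // prints [a, b,c,,]
    }
}
```

The only difference between a limit of 0 and a negative limit is whether trailing empty strings survive, which is the subtlety the comment objects to.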

[GitHub] spark pull request #22227: [SPARK-25202] [SQL] Implements split with limit s...

2018-08-30 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/22227#discussion_r214135400 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala --- @@ -229,33 +229,58 @@ case class RLike(left

[GitHub] spark pull request #22227: [SPARK-25202] [SQL] Implements split with limit s...

2018-08-30 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/22227#discussion_r214131195 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala --- @@ -229,33 +229,58 @@ case class RLike(left

[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-08-30 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22010 Thanks for pinging. Please don't merge this until you've addressed the OOM issue. The aggregators were created to handle incoming data larger than size of memory. We should never use a Scala or Java

[GitHub] spark pull request #22010: [SPARK-21436][CORE] Take advantage of known parti...

2018-08-30 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/22010#discussion_r214103667 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -396,7 +396,16 @@ abstract class RDD[T: ClassTag]( * Return a new RDD containing

[GitHub] spark pull request #22010: [SPARK-21436][CORE] Take advantage of known parti...

2018-08-30 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/22010#discussion_r214103223 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -396,7 +396,16 @@ abstract class RDD[T: ClassTag]( * Return a new RDD containing

[GitHub] spark issue #22258: [SPARK-25266] Fix memory leak vulnerability in Barrier E...

2018-08-28 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22258 Can you remove vulnerability from the title? Otherwise it sounds like a security vulnerability here.

[GitHub] spark pull request #22205: [SPARK-25212][SQL] Support Filter in ConvertToLoc...

2018-08-27 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/22205#discussion_r213100428 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -130,6 +130,10 @@ abstract class Optimizer

[GitHub] spark issue #18447: [SPARK-21232][SQL][SparkR][PYSPARK] New built-in SQL fun...

2018-08-27 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18447 Yea I'd probably reject this for now, until we see bigger needs for it.

[GitHub] spark pull request #22162: [spark-24442][SQL] Added parameters to control th...

2018-08-27 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/22162#discussion_r213026874 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -815,6 +815,24 @@ class Dataset[T] private[sql]( println(showString

[GitHub] spark pull request #22227: [SPARK-25202] [Core] Implements split with limit ...

2018-08-26 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/22227#discussion_r212815703 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala --- @@ -232,30 +232,41 @@ case class RLike(left

[GitHub] spark pull request #22227: [SPARK-25202] [Core] Implements split with limit ...

2018-08-26 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/22227#discussion_r212815685 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala --- @@ -232,30 +232,41 @@ case class RLike(left

[GitHub] spark issue #22185: [SPARK-25127] DataSourceV2: Remove SupportsPushDownCatal...

2018-08-22 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22185 cc @rdblue @cloud-fan @gatorsmile

[GitHub] spark pull request #22185: [SPARK-25127] DataSourceV2: Remove SupportsPushDo...

2018-08-22 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/22185 [SPARK-25127] DataSourceV2: Remove SupportsPushDownCatalystFilters ## What changes were proposed in this pull request? They depend on internal Expression APIs. Let's see how far we can get

[GitHub] spark issue #16600: [SPARK-19242][SQL] SHOW CREATE TABLE should generate new...

2018-08-21 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16600 Can you close this pr?

[GitHub] spark issue #22065: [SPARK-23992][CORE] ShuffleDependency does not need to b...

2018-08-20 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22065 Are we talking about a 0.7% margin improvement? It doesn't seem like it's worth the complexity.

[GitHub] spark issue #22157: [SPARK-25126] Avoid creating Reader for all orc files

2018-08-20 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22157 Do we have a similar issue for Parquet?

[GitHub] spark issue #22160: Revert "[SPARK-24418][BUILD] Upgrade Scala to 2.11.12 an...

2018-08-20 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22160 Can you add to the pr description why we are reverting? Just copy paste what you had above. Thanks.

[GitHub] spark issue #22134: [SPARK-25143][SQL] Support data source name mapping conf...

2018-08-17 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/22134 I think it's premature to introduce this. The extra layer of abstraction actually makes it more difficult to reason about what's going on. We don't have that many data sources that require flexibility

[GitHub] spark issue #21944: [SPARK-24988][SQL]Add a castBySchema method which casts ...

2018-08-06 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21944 Thanks, Mahmoud!

[GitHub] spark issue #21951: [SPARK-24957][SQL][FOLLOW-UP] Clean the code for AVERAGE

2018-08-01 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21951 LGTM. On Thu, Aug 2, 2018 at 1:14 AM Xiao Li wrote: > This will simplify the code and improve the readability. We can do the > same in the other expression. >

[GitHub] spark issue #21951: [SPARK-24957][SQL][FOLLOW-UP] Clean the code for AVERAGE

2018-08-01 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21951 Why would we want to use the DSL here? Do we use it in other expressions?

[GitHub] spark pull request #21938: [SPARK-24982][SQL] UDAF resolution should not thr...

2018-07-31 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/21938 [SPARK-24982][SQL] UDAF resolution should not throw AssertionError ## What changes were proposed in this pull request? When user calls a UDAF with the wrong number of arguments, Spark previously

[GitHub] spark pull request #21934: [SPARK-24951][SQL] Table valued functions should ...

2018-07-31 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21934#discussion_r206681479 --- Diff: sql/core/src/test/resources/sql-tests/results/table-valued-functions.sql.out --- @@ -83,8 +83,13 @@ select * from range(1, null) -- !query 6

[GitHub] spark issue #21934: [SPARK-24951][SQL] Table valued functions should throw A...

2018-07-31 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21934 Jenkins, retest this please.

[GitHub] spark issue #21934: [SPARK-24951][SQL] Table valued functions should throw A...

2018-07-31 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21934 cc @gatorsmile @ericl who originally wrote this.

[GitHub] spark pull request #21934: [SPARK-24951][SQL] Table valued functions should ...

2018-07-31 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/21934 [SPARK-24951][SQL] Table valued functions should throw AnalysisException ## What changes were proposed in this pull request? Previously TVF resolution could throw IllegalArgumentException

[GitHub] spark issue #21932: [SPARK-24979][SQL] add AnalysisHelper#resolveOperatorsUp

2018-07-31 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21932 Do we really need this? It's almost always the case for resolution that you'd want to do bottom up, so I thought Michael's original design to just call it resolveOperators makes a lot of sense

[GitHub] spark issue #21923: [SPARK-24918][Core] Executor Plugin api

2018-07-30 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21923 Are there more specific use cases? I always feel it'd be impossible to design APIs without seeing a couple of different use cases

[GitHub] spark issue #21922: [WIP] Add an ANSI SQL parser mode

2018-07-30 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21922 what are the actual changes?

[GitHub] spark issue #21318: [minor] Update docs for functions.scala to make it clear...

2018-07-28 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21318 LGTM. On Fri, Jul 27, 2018 at 10:58 PM Hyukjin Kwon wrote: > @rxin <https://github.com/rxin> re: #21318 (comment) > <https://github.com/apache/s

[GitHub] spark pull request #21318: [minor] Update docs for functions.scala to make i...

2018-07-27 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21318#discussion_r20582 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -39,7 +39,21 @@ import org.apache.spark.util.Utils

[GitHub] spark issue #21897: [minor] Improve documentation for HiveStringType's

2018-07-27 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21897 cc @gatorsmile cc @hvanhovell why did we expose these types as public Scala APIs? I feel they should not have been public. If they are public, we should have more generic VarcharType

[GitHub] spark pull request #21897: [minor] Improve documentation for HiveStringType'...

2018-07-27 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/21897 [minor] Improve documentation for HiveStringType's The diff should be self-explanatory. You can merge this pull request into a Git repository by running: $ git pull https://github.com/rxin

[GitHub] spark pull request #21706: [SPARK-24702] Fix Unable to cast to calendar inte...

2018-07-27 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21706#discussion_r205851385 --- Diff: sql/core/src/test/resources/sql-tests/inputs/cast.sql --- @@ -42,4 +42,38 @@ SELECT CAST('9223372036854775808' AS long); DESC FUNCTION

[GitHub] spark issue #21896: [SPARK-24865][SQL] Remove AnalysisBarrier addendum

2018-07-27 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21896 cc @gatorsmile @cloud-fan

[GitHub] spark pull request #21896: [SPARK-24865][SQL] Remove AnalysisBarrier addendu...

2018-07-27 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/21896 [SPARK-24865][SQL] Remove AnalysisBarrier addendum ## What changes were proposed in this pull request? I didn't want to pollute the diff in the previous PR and left some TODOs. This is a follow

[GitHub] spark issue #21318: [minor] Update docs for functions.scala to make it clear...

2018-07-27 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21318 Yup will do. On Fri, Jul 27, 2018 at 10:23 AM Sean Owen wrote: > Just browsing old PRs .. want to finish this one up @rxin > <https://github.com/rxin>?

[GitHub] spark issue #21699: [SPARK-24722][SQL] pivot() with Column type argument

2018-07-26 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21699 I'm OK with it.

[GitHub] spark pull request #21873: [SPARK-24919][BUILD] New linter rule for sparkCon...

2018-07-25 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21873#discussion_r205252848 --- Diff: scalastyle-config.xml --- @@ -150,6 +150,19 @@ This file is divided into 3 sections: // scalastyle:on println

[GitHub] spark issue #21758: [SPARK-24795][CORE] Implement barrier execution mode

2018-07-25 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21758 What's the failure mode if there are not enough slots for the barrier mode? We should throw an exception right

[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...

2018-07-25 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21758#discussion_r205250930 --- Diff: core/src/main/scala/org/apache/spark/scheduler/ActiveJob.scala --- @@ -60,4 +60,10 @@ private[spark] class ActiveJob( val finished

[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...

2018-07-25 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21758#discussion_r205250352 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -1839,6 +1847,20 @@ abstract class RDD[T: ClassTag]( def toJavaRDD() : JavaRDD[T
