[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...

2018-07-25 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21758#discussion_r205249547 --- Diff: core/src/main/scala/org/apache/spark/rdd/MapPartitionsRDD.scala --- @@ -27,7 +27,8 @@ import org.apache.spark.{Partition, TaskContext} private

[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...

2018-07-25 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21758#discussion_r205249449 --- Diff: core/src/main/scala/org/apache/spark/BarrierTaskContext.scala --- @@ -0,0 +1,42 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...

2018-07-25 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21758#discussion_r205249225 --- Diff: core/src/main/scala/org/apache/spark/BarrierTaskInfo.scala --- @@ -0,0 +1,31 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...

2018-07-25 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21758#discussion_r205249297 --- Diff: core/src/main/scala/org/apache/spark/BarrierTaskInfo.scala --- @@ -0,0 +1,31 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark issue #21875: [SPARK-24288][SQL] Enable preventing predicate pushdown

2018-07-25 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21875 Can you add JDBC to the title? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #21866: [SPARK-24768][FollowUp][SQL]Avro migration follow...

2018-07-24 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21866#discussion_r204961291 --- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroFileFormat.scala --- @@ -56,7 +56,7 @@ private[avro] class AvroFileFormat extends

[GitHub] spark pull request #21867: [SPARK-24307][CORE] Add conf to revert to old cod...

2018-07-24 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21867#discussion_r204959300 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -731,7 +731,14 @@ private[spark] class BlockManager

[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier

2018-07-24 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21822#discussion_r204957474 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala --- @@ -751,7 +751,8 @@ object TypeCoercion

[GitHub] spark issue #21822: [SPARK-24865] Remove AnalysisBarrier

2018-07-24 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21822 I changed the way we do the checks in test to use a thread local rather than checking the stacktrace, so they should run faster now. Also added test cases for the various new methods. Also moved
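The idea in this comment, replacing a stack-trace inspection with a thread-local marker, can be sketched in plain Python. This is only an illustration of the general technique, not the code from the pull request; the names `inside_analyzer` and `in_analyzer` are hypothetical.

```python
import threading
from contextlib import contextmanager

# Per-thread state: each thread sees its own "depth" attribute.
_state = threading.local()

@contextmanager
def inside_analyzer():
    """Mark the current thread as running inside the (hypothetical) analyzer."""
    depth = getattr(_state, "depth", 0)
    _state.depth = depth + 1
    try:
        yield
    finally:
        # Restore the previous depth so nested scopes unwind correctly.
        _state.depth = depth

def in_analyzer():
    """Cheap check: one attribute read instead of walking the call stack."""
    return getattr(_state, "depth", 0) > 0
```

Reading a thread-local attribute costs the same regardless of call depth, whereas capturing and scanning a stack trace grows with the depth of the stack, which is presumably why the tests ran faster after the change.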

[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier - WIP

2018-07-24 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21822#discussion_r204955869 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -787,6 +782,7 @@ class Analyzer( right

[GitHub] spark issue #21845: [SPARK-24886][INFRA] Fix the testing script to increase ...

2018-07-24 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21845 If that's the only one I think that PR itself needs to be fixed (significantly increases test runtime), and I wouldn't increase the time here. On Mon, Jul 23, 2018 at 11:44 PM

[GitHub] spark issue #21822: [SPARK-24865] Remove AnalysisBarrier - WIP

2018-07-24 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21822 Yea the extra check in test cases might've contributed to the longer test time. Let me think about how to reduce it. On Mon, Jul 23, 2018 at 11:28 PM Hyukjin Kwon wrote

[GitHub] spark issue #21845: [SPARK-24886][INFRA] Fix the testing script to increase ...

2018-07-23 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21845 Are more pull requests failing due to time out right now? On Mon, Jul 23, 2018 at 6:30 PM Hyukjin Kwon wrote: > @rxin <https://github.com/rxin>, btw you want me close

[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...

2018-07-23 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21758#discussion_r204504127 --- Diff: core/src/main/scala/org/apache/spark/BarrierTaskInfo.scala --- @@ -0,0 +1,31 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark issue #21826: [SPARK-24872] Remove the symbol “||” of the “OR”...

2018-07-23 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21826 No we can't because you can still use string concat in filters, e.g. colA || colB == "ab" What is

[GitHub] spark issue #21845: [SPARK-24886][INFRA] Fix the testing script to increase ...

2018-07-23 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21845 This helps, but it is not sustainable to keep increasing the threshold. What we need to do is to look at test time distribution and figure out what test suites are unnecessarily long and actually cut

[GitHub] spark issue #21822: [SPARK-24865] Remove AnalysisBarrier - WIP

2018-07-22 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21822 Jenkins, retest this please.

[GitHub] spark issue #21822: [SPARK-24865] Remove AnalysisBarrier - WIP

2018-07-20 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21822 retest this please

[GitHub] spark issue #21802: [SPARK-23928][SQL] Add shuffle collection function.

2018-07-20 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21802 Do we really need full codegen for all of these collection functions? They seem pretty slow and specialization with full codegen won't help perf that much (and might even hurt by blowing up the code

[GitHub] spark issue #21826: [SPARK-24872] Remove the symbol “||” of the “OR”...

2018-07-20 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21826 cc @gatorsmile @cloud-fan @HyukjinKwon this is a good thing to do?

[GitHub] spark issue #21826: [SPARK-24872] Remove the symbol “||” of the “OR”...

2018-07-20 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21826 Jenkins, test this please.

[GitHub] spark issue #21822: [SPARK-24865] Remove AnalysisBarrier - WIP

2018-07-20 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21822 retest this please

[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier - WIP

2018-07-20 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21822#discussion_r204163484 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala --- @@ -33,6 +49,116 @@ abstract class LogicalPlan

[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier - WIP

2018-07-20 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21822#discussion_r204163424 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala --- @@ -23,8 +23,24 @@ import

[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier - WIP

2018-07-20 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21822#discussion_r204163328 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -2390,16 +2375,21 @@ class Analyzer( * scoping

[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier - WIP

2018-07-20 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21822#discussion_r204160853 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala --- @@ -533,7 +537,8 @@ trait CheckAnalysis extends

[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier - WIP

2018-07-20 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21822#discussion_r204160150 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -787,6 +782,7 @@ class Analyzer( right

[GitHub] spark issue #21803: [SPARK-24849][SQL] Converting a value of StructType to a...

2018-07-20 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21803 Should we do schema.toDDL, or StructType.toDDL(schema)?

[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier - WIP

2018-07-19 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21822#discussion_r203918981 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala --- @@ -533,7 +537,8 @@ trait CheckAnalysis extends

[GitHub] spark issue #18784: [SPARK-21559][Mesos] remove mesos fine-grained mode

2018-07-18 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18784 Let's remove it in 3.0 then. We can do it after 2.4 release.

[GitHub] spark pull request #21742: [SPARK-24768][SQL] Have a built-in AVRO data sour...

2018-07-18 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21742#discussion_r203496489 --- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/package.scala --- @@ -0,0 +1,39 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark issue #21766: [SPARK-24803][SQL] add support for numeric

2018-07-16 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21766 Why did you need this change? Given it's very difficult to revert the change (or introduce a proper numeric type if ever needed in the future), I would not merge this pull request unless

[GitHub] spark issue #21568: [SPARK-24562][TESTS] Support different configs for same ...

2018-07-10 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21568 To me it is actually confusing to have the decimal one in there at all, by defining a list of queries that are reused for different functional testing. It is very easy to just ignore the subtle

[GitHub] spark issue #21568: [SPARK-24562][TESTS] Support different configs for same ...

2018-07-09 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21568 What are the use cases other than decimal? I am not sure if we need to build a lot of infrastructure just for one or two use cases

[GitHub] spark issue #21568: [SPARK-24562][TESTS] Support different configs for same ...

2018-07-09 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21568 If they produce different results why do you need any infrastructure for them? They are just part of the normal test flow. If they produce the same result, and you don't want to define

[GitHub] spark issue #21568: [SPARK-24562][TESTS] Support different configs for same ...

2018-07-09 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21568 Can you just define a config matrix in the beginning of the file, and each file is run with the config matrix
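A rough sketch of what such a config matrix could look like, written in plain Python rather than Spark's test harness. The `--CONFIG_DIM` header syntax and the config keys are made up for illustration; the thread does not specify a format.

```python
from itertools import product

def parse_config_matrix(lines, prefix="--CONFIG_DIM "):
    """Collect 'key=v1,v2' declarations from the leading lines of a test file.

    The prefix is a hypothetical header marker, not an existing convention.
    """
    dims = {}
    for line in lines:
        if not line.startswith(prefix):
            break  # the matrix lives only in the leading header lines
        key, _, values = line[len(prefix):].partition("=")
        dims[key.strip()] = [v.strip() for v in values.split(",")]
    return dims

def expand(dims):
    """Return one config dict per combination of the declared values."""
    keys = list(dims)
    return [dict(zip(keys, combo)) for combo in product(*(dims[k] for k in keys))]

header = [
    "--CONFIG_DIM example.flagA=true,false",
    "--CONFIG_DIM example.flagB=1,2",
    "SELECT 1;",
]
configs = expand(parse_config_matrix(header))  # four combinations
```

With no declarations, `expand` yields a single empty dict, so a file without a matrix would still run once under the default configuration.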

[GitHub] spark issue #21568: [SPARK-24562][TESTS] Support different configs for same ...

2018-07-09 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21568 I think it's super confusing to have the config names encoded in file names. Makes the names super long and difficult to read, and also hard to verify what was set, and difficult to get multiple

[GitHub] spark pull request #21705: [SPARK-24727][SQL] Add a static config to control...

2018-07-03 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21705#discussion_r199940775 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/StaticSQLConf.scala --- @@ -66,6 +66,12 @@ object StaticSQLConf { .checkValue

[GitHub] spark issue #21686: [SPARK-24709][SQL] schema_of_json() - schema inference f...

2018-07-02 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21686 Thanks. Awesome. This matches what I had in mind then.

[GitHub] spark issue #21459: [SPARK-24420][Build] Upgrade ASM to 6.1 to support JDK9+

2018-07-02 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21459 SGTM. On Mon, Jul 2, 2018 at 4:38 PM DB Tsai wrote: > There are three approvals from the committers, and the changes are pretty > trivial to revert if we see any perfo

[GitHub] spark issue #21686: [SPARK-24709][SQL] schema_of_json() - schema inference f...

2018-07-02 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21686 Does this actually work in SQL? How does it work when we don't have a data type that's a schema?

[GitHub] spark issue #21626: [SPARK-24642][SQL] New function infers schema for JSON c...

2018-06-28 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21626 It is on the public list: https://issues.apache.org/jira/browse/SPARK-24642

[GitHub] spark pull request #21598: [SPARK-24605][SQL] size(null) returns null instea...

2018-06-26 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21598#discussion_r198364343 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -1324,6 +1324,12 @@ object SQLConf { "Other column v

[GitHub] spark issue #21482: [SPARK-24393][SQL] SQL builtin: isinf

2018-06-26 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21482 OK I double checked. I don't think we should be adding this functionality, since different databases implemented it differently, and it is somewhat difficult to create Infinity in Spark SQL given we

[GitHub] spark issue #21482: [SPARK-24393][SQL] SQL builtin: isinf

2018-06-26 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21482 Hey I have an additional thought on this. Will leave it in the next ten mins.

[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-26 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21598 Here: https://en.wikipedia.org/wiki/Bug_compatibility On Tue, Jun 26, 2018 at 9:28 AM Reynold Xin wrote: > It’s actually common software engineering practice to keep “bug

[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-26 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21598 It’s actually common software engineering practice to keep “buggy” semantics if a bug has been out there long enough and a lot of applications depend on the semantics. On Tue, Jun

[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-21 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21598 Do we have other "legacy" configs that we haven't released and can change to match this prefix? It's pretty nice to have a single prefix for

[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-21 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21598 This is not a "bug" and there is no "right" behavior in APIs. It's been defined as -1 since the very beginning (when was it added?), so we can't just change the default value
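The position taken across this thread (keep the long-standing default, and put any new behaviour behind a flag) can be illustrated with a small plain-Python sketch. It is not Spark code, and the parameter name `legacy_size_of_null` is a placeholder rather than the actual config key.

```python
def size_of(collection, legacy_size_of_null=True):
    """Return the element count; the flag selects the null-input behaviour."""
    if collection is None:
        # Legacy behaviour returns -1; the opt-in behaviour returns None.
        return -1 if legacy_size_of_null else None
    return len(collection)
```

Defaulting the flag to the legacy value means existing callers that rely on -1 keep working, while anyone who wants null semantics can opt in explicitly.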

[GitHub] spark issue #21544: add one supported type missing from the javadoc

2018-06-15 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21544 Thanks. Merging in master.

[GitHub] spark issue #21568: [SPARK-24562][TESTS] Support different configs for same ...

2018-06-15 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21568 I'm confused by the description. What does this PR actually do?

[GitHub] spark issue #21574: [SPARK-24478][SQL][followup] Move projection and filter ...

2018-06-15 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21574 Does this move actually make sense? It'd destroy stats estimation for partition pruning.

[GitHub] spark issue #21502: [SPARK-22575][SQL] Add destroy to Dataset

2018-06-08 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21502 How does this solve the problem you described? If the container is gone, the process is gone and users can't destroy things anymore

[GitHub] spark issue #19498: [SPARK-17756][PYTHON][STREAMING] Workaround to avoid ret...

2018-06-08 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19498 LGTM

[GitHub] spark issue #21482: [SPARK-24393][SQL] SQL builtin: isinf

2018-06-07 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21482 Thanks, Henry. In general I'm not a huge fan of adding something because hypothetically somebody might want it. Also if you want this to be compatible with Impala, wouldn't you want to name

[GitHub] spark issue #21482: [SPARK-24393][SQL] SQL builtin: isinf

2018-06-06 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21482 @henryr 1.0/0.0 also returns null in Spark SQL ... ``` scala> sql("select cast(1.0 as double)/cast(0 as double)").show() +-+

[GitHub] spark issue #21482: [SPARK-24393][SQL] SQL builtin: isinf

2018-06-05 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21482 How is this done in other databases? I don't think we want to invent new ways on these basic primitives.

[GitHub] spark pull request #21482: [SPARK-24393][SQL] SQL builtin: isinf

2018-06-05 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21482#discussion_r193230476 --- Diff: R/pkg/NAMESPACE --- @@ -281,6 +281,8 @@ exportMethods("%<=>%", "initcap",

[GitHub] spark issue #21448: [SPARK-24408][SQL][DOC] Move abs, bitwiseNOT, isnan, nan...

2018-05-30 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21448 I'd only move abs and nothing else.

[GitHub] spark issue #21459: [SPARK-24420][Build] Upgrade ASM to 6.1 to support JDK9+

2018-05-30 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21459 What's driving this (is it java 9)? I'm in general scared by core library updates like this. Maybe Spark 3.0 is a good time (and we should just do it this year

[GitHub] spark issue #21453: Test branch to see how Scala 2.11.12 performs

2018-05-29 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21453 Jenkins, add to whitelist.

[GitHub] spark issue #21453: Test branch to see how Scala 2.11.12 performs

2018-05-29 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21453 Jenkins, test this please.

[GitHub] spark issue #21416: [SPARK-24371] [SQL] Added isInCollection in DataFrame AP...

2018-05-29 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21416 LGTM (I didn't look that carefully though)

[GitHub] spark pull request #21416: [SPARK-24371] [SQL] Added isInCollection in DataF...

2018-05-28 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21416#discussion_r191306678 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala --- @@ -392,9 +396,97 @@ class ColumnExpressionSuite extends QueryTest

[GitHub] spark pull request #21416: [SPARK-24371] [SQL] Added isInCollection in DataF...

2018-05-28 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21416#discussion_r191306654 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala --- @@ -392,9 +396,97 @@ class ColumnExpressionSuite extends QueryTest

[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...

2018-05-25 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21427 If we can fix it without breaking existing behavior that would be awesome. On Fri, May 25, 2018 at 9:59 AM Bryan Cutler <notificati...@github.com> wrote: > I've been think

[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...

2018-05-25 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21427 On the config part, I haven’t looked at the code but can’t we just reorder the columns on the JVM side? Why do we need to reorder them on the Python side? On Fri, May 25, 2018

[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...

2018-05-25 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21427 I agree it should have started experimental. It is pretty weird to after the fact mark something experimental though. On Fri, May 25, 2018 at 12:23 AM Hyukjin Kwon <notificati...@github.

[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...

2018-05-25 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21427 Why is it difficult? On Fri, May 25, 2018 at 12:03 AM Hyukjin Kwon <notificati...@github.com> wrote: > but as I said it's difficult to have a configuration there. Shal

[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-25 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r190803873 --- Diff: python/pyspark/sql/dataframe.py --- @@ -347,13 +347,30 @@ def show(self, n=20, truncate=True, vertical=False): name | Bob

[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-25 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r190803855 --- Diff: python/pyspark/sql/dataframe.py --- @@ -347,13 +347,30 @@ def show(self, n=20, truncate=True, vertical=False): name | Bob

[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-25 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r190803772 --- Diff: docs/configuration.md --- @@ -456,6 +456,29 @@ Apart from these, the following properties are also available, and may be useful from JVM

[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-25 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r190803641 --- Diff: docs/configuration.md --- @@ -456,6 +456,29 @@ Apart from these, the following properties are also available, and may be useful from JVM

[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...

2018-05-25 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21427 If this has been released you can't just change it like this; it will break users' programs immediately. At the very least introduce a flag so it can be set by the user to avoid breaking their code

[GitHub] spark issue #21242: [SPARK-23657][SQL] Document and expose the internal data...

2018-05-21 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21242 Thanks Ryan. I'm not a fan of just exposing internal classes like this. The APIs haven't really been designed or audited for the purpose of external consumption. If we want to expose the internal APIs

[GitHub] spark pull request #21370: [SPARK-24215][PySpark] Implement _repr_html_ for ...

2018-05-21 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21370#discussion_r189669772 --- Diff: docs/configuration.md --- @@ -456,6 +456,29 @@ Apart from these, the following properties are also available, and may be useful from JVM

[GitHub] spark issue #21370: [SPARK-24215][PySpark] Implement _repr_html_ for datafra...

2018-05-21 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21370 Can we also do something a bit more generic that works for non-Jupyter notebooks as well? For example, in IPython or just plain Python REPL

[GitHub] spark issue #21329: [SPARK-24277][SQL] Code clean up in SQL module: HadoopMa...

2018-05-18 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21329 Why are we cleaning up stuff like this?

[GitHub] spark issue #21192: [SPARK-24118][SQL] Flexible format for the lineSep optio...

2018-05-17 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21192 my point is that i don't consider a sequence of chars an array to begin with. it is not natural to me. I'd want an array if it is a different set of separators

[GitHub] spark issue #21192: [SPARK-24118][SQL] Flexible format for the lineSep optio...

2018-05-16 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21192 eh I actually think separated makes it much simpler to look at, compared with an array. Why complicate the API and require users to understand how to specify an array (in all languages)? One

[GitHub] spark issue #21318: [minor] Update docs for functions.scala to make it clear...

2018-05-15 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21318 It's still going to fail because I haven't updated it yet. Will do tomorrow.

[GitHub] spark pull request #21316: [SPARK-20538][SQL] Wrap Dataset.reduce with withN...

2018-05-14 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21316#discussion_r188104204 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -1607,7 +1607,9 @@ class Dataset[T] private[sql]( */ @Experimental

[GitHub] spark issue #21318: [minor] Update docs for functions.scala to make it clear...

2018-05-13 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21318 Hm the failure doesn't look like it's caused by this PR. Do you guys know what's going on?

[GitHub] spark issue #21318: [minor] Update docs for functions.scala to make it clear...

2018-05-13 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21318 cc @gatorsmile @HyukjinKwon

[GitHub] spark pull request #21318: [minor] Update docs for functions.scala to make i...

2018-05-13 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/21318 [minor] Update docs for functions.scala to make it clear not all the built-in functions are defined there The title summarizes the change. You can merge this pull request into a Git repository

[GitHub] spark pull request #21316: [SPARK-20538][SQL] Wrap Dataset.reduce with withN...

2018-05-13 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21316#discussion_r187838099 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -1607,7 +1607,9 @@ class Dataset[T] private[sql]( */ @Experimental

[GitHub] spark issue #21309: [SPARK-23907] Removes regr_* functions in functions.scal...

2018-05-11 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21309 Better compile time error. Plus a lot of people are already using these. On Fri, May 11, 2018 at 7:35 PM Hyukjin Kwon <notificati...@github.com> wrote: > Yup, then why

[GitHub] spark issue #21309: [SPARK-23907] Removes regr_* functions in functions.scal...

2018-05-11 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21309 Adding it to sql would allow it to be available everywhere (through expr) right? On Fri, May 11, 2018 at 7:30 PM Hyukjin Kwon <notificati...@github.com> wrote: > Thing

[GitHub] spark issue #21309: [SPARK-23907] Removes regr_* functions in functions.scal...

2018-05-11 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21309 Btw it’s been always the case that the less commonly used functions are not part of this file. There is just a lot of overhead to maintaining all of them. I’m not even sure

[GitHub] spark issue #21054: [SPARK-23907][SQL] Add regr_* functions

2018-05-11 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21054 There is not a single function that can’t be called by expr. It mainly adds some type safety. On Fri, May 11, 2018 at 7:18 PM Hyukjin Kwon <notificati...@github.com>

[GitHub] spark issue #21309: [SPARK-23907] Removes regr_* functions in functions.scal...

2018-05-11 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21309 cc @gatorsmile @mgaido91

[GitHub] spark pull request #21309: [SPARK-23907] Removes regr_* functions in functio...

2018-05-11 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/21309 [SPARK-23907] Removes regr_* functions in functions.scala ## What changes were proposed in this pull request? This patch removes the various regr_* functions in functions.scala. They are so

[GitHub] spark pull request #21054: [SPARK-23907][SQL] Add regr_* functions

2018-05-11 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21054#discussion_r187751801 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -775,6 +775,178 @@ object functions { */ def var_pop(columnName

[GitHub] spark issue #21121: [SPARK-24042][SQL] Collection function: zip_with_index

2018-05-01 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21121 @lokm01 wouldn't @ueshin's suggestion on adding a second parameter to transform work for you? You can just do something similar to `transform(x, (entry, index) -> struct(entry, index))`. Perh
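To make the suggestion concrete, here is a plain-Python analogue of a `transform` whose lambda also receives the element index. It only mirrors the shape of the SQL expression quoted above; it is not Spark's implementation, and the tuple stands in for `struct(entry, index)`.

```python
def transform(xs, fn):
    """Apply fn(entry, index) to every element, like a two-argument lambda."""
    return [fn(entry, index) for index, entry in enumerate(xs)]

# Pairing each entry with its position gives the zip_with_index result
# without needing a dedicated function.
pairs = transform(["a", "b", "c"], lambda entry, index: (entry, index))
```

Because the index is just a second lambda argument, callers who do not need it can ignore it, which is the appeal of extending `transform` over adding a separate `zip_with_index`.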

[GitHub] spark pull request #21187: [SPARK-24035][SQL] SQL syntax for Pivot

2018-04-30 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21187#discussion_r185084802 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/PivotSuite.scala --- @@ -0,0 +1,197 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] spark pull request #21169: [SPARK-23715][SQL] the input of to/from_utc_times...

2018-04-26 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21169#discussion_r184596334 --- Diff: docs/sql-programming-guide.md --- @@ -1805,12 +1805,13 @@ working with timestamps in `pandas_udf`s to get the best performance, see

[GitHub] spark issue #20560: [SPARK-23375][SQL] Eliminate unneeded Sort in Optimizer

2018-04-25 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/20560 Just saw this - this seems like a somewhat awkward way to do it by just matching on filter / project. Is the main thing lacking a way to do back propagation for properties? (We can only do forward

[GitHub] spark issue #21071: [SPARK-21962][CORE] Distributed Tracing in Spark

2018-04-22 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21071 @devaraj-kavali can you close this PR first? Looks like there isn't any reason to really use htrace anymore

[GitHub] spark issue #19222: [SPARK-10399][SPARK-23879][CORE][SQL] Introduce multiple...

2018-04-20 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19222 @kiszk do you have more data now?

[GitHub] spark issue #19222: [SPARK-10399][SPARK-23879][CORE][SQL] Introduce multiple...

2018-04-18 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/19222 OK thanks please do that. Does TPC-DS even trigger 2 call sites? E.g. ByteArrayMemoryBlock and OnHeapMemoryBlock. Even there it might introduce a conditional branch after JIT that could lead to perf
