GitHub user rxin opened a pull request:
https://github.com/apache/spark/pull/18273
[SPARK-21059][SQL] LikeSimplification can NPE on null pattern
## What changes were proposed in this pull request?
This patch fixes a bug that can cause a NullPointerException in
LikeSimplification when the LIKE pattern is null.
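The failure mode is easy to show in miniature. Below is a hedged, self-contained sketch (plain Scala stand-ins for Catalyst's `Like`/`Literal` nodes, not the actual patch) of why a simplification rule must guard against a null pattern literal before calling string methods on it:

```scala
// Simplified stand-ins for Catalyst expression nodes (hypothetical, for illustration only).
sealed trait Expr
case class Literal(value: String) extends Expr            // value may be null
case class Like(input: Expr, pattern: Expr) extends Expr
case class StartsWith(input: Expr, prefix: String) extends Expr

// A LikeSimplification-style rewrite: "abc%" can become a cheap StartsWith.
// Without the null check, `pattern.endsWith` throws a NullPointerException.
def simplify(e: Expr): Expr = e match {
  case l @ Like(input, Literal(pattern)) =>
    if (pattern == null) l                                 // null pattern: leave the node alone
    else if (pattern.endsWith("%") && !pattern.dropRight(1).contains("%"))
      StartsWith(input, pattern.dropRight(1))              // "abc%" -> StartsWith(input, "abc")
    else l
  case other => other
}
```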
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/18257
Sorry, it doesn't make sense to do this. Range is used primarily for
testing, and it doesn't make sense to have an optimizer rule that removes it.
If there is a correctness issue in it, we should fix it.
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/18258
That's a good idea. In that case, create a subtask on jira for this and
another one for join?
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well.
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/18258
If there is no regression, I'd remove the flag.
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/18258
Thanks!
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/18258
Can you run it a few more times to tell? Right now it's a difference of
almost 7%.
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/18258
16.8 vs 15.8?
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/18258
Can you test the perf degradation?
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/18258
Why would the tracking have perf impact? It's just a simple counter
increment, isn't it?
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/18209
The next one to add is probably Kubernetes. Even Spark on Kubernetes is
going through this cycle of being maintained as a separate project first.
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/18228
Are these ANSI SQL functions? If it is just some esoteric MySQL function I
don't think we should add them.
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/18236
Why do we want this check? If the user passes in null value, it is ok if it
is not found, isn't it?
---
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/18252#discussion_r121246583
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
---
@@ -32,7 +32,7 @@ import
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/18256
Merging in master/branch-2.2. Thanks.
---
GitHub user rxin opened a pull request:
https://github.com/apache/spark/pull/18256
[SPARK-21042][SQL] Document Dataset.union is resolution by position
## What changes were proposed in this pull request?
Document Dataset.union is resolution by position, not by name, since
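The behavior being documented can be shown with a short snippet (a hedged sketch, not from the PR itself; it assumes a live SparkSession `spark` with its implicits imported):

```scala
import spark.implicits._  // assumes a live SparkSession named `spark`

val df1 = Seq((1, 10)).toDF("a", "b")
val df2 = Seq((20, 2)).toDF("b", "a")  // same columns, different order

// union pairs columns by POSITION: df1's "a" is paired with df2's "b",
// yielding rows (1, 10) and (20, 2), despite what the names suggest.
// Reorder explicitly first if you want name-based alignment:
val aligned = df1.union(df2.select("a", "b"))
```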
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/18142
Guys - please in the future separate bug fixes from refactoring. Don't mix
a bunch of cosmetic changes together with actual bug fixes.
---
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/18113#discussion_r121025561
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/typedaggregators.scala
---
@@ -26,43 +26,64 @@ import
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/18217
Merging in master/branch-2.2.
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/18221
Question: why are these files written in Java?
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/18207
OK great then we have officially deprecated it, haven't we?
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/18207
I believe we still support Python 2.6, given Jenkins runs 2.6... There
seems to be no point in removing that support this late in the release cycle.
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/18202
Wouldn't this break compatibility?
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/18189
But isn't it in a hint? If you are worried about users, I'd just change it
to "broadcast".
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/18159
hmm, any way to shorten the change? this change is a bit too big for metrics
...
---
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/18159#discussion_r119995109
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/command/commands.scala
---
@@ -17,38 +17,97 @@
package
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/18189
tbh the difference is so small that i don't think it is worth spending time
here ... as pointed out it is not forceBroadcast either.
---
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/18086#discussion_r119271262
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
---
@@ -533,13 +533,16 @@ class AstBuilder(conf: SQLConf) extends
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/16598
@gatorsmile this didn't run any tests!!!
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/18132
Thanks - merging in master/branch-2.2.
---
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/18086#discussion_r118473083
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
---
@@ -533,13 +533,16 @@ class AstBuilder(conf: SQLConf) extends
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/18042
Does this really matter? I'd rather not complicate the actual code for it
to display properly on some niche hardware that very few people use.
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/18086
cc @gatorsmile @cloud-fan @hvanhovell
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/18016
hm guys please don't use the end-to-end tests to test expression
behavior. use unit tests, which automatically test code gen, interpreted
evaluation, and different data types.
---
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/18087#discussion_r118353924
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
---
@@ -195,9 +195,9 @@ case class Intersect
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/18087
cc @hvanhovell, @bogdanrdc
---
GitHub user rxin opened a pull request:
https://github.com/apache/spark/pull/18087
[SPARK-20867][SQL] Move hints from Statistics into HintInfo class
## What changes were proposed in this pull request?
This is a follow-up to SPARK-20857 to move the broadcast hint from
Statistics
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/18082
Hm, I'm not sure if it is a good idea to run so many "unit test" style tests
for expressions in the end-to-end suites. It takes a lot more time than just
running unit tests.
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/18072
Merging in master / branch-2.2 ...
---
GitHub user rxin opened a pull request:
https://github.com/apache/spark/pull/18072
[SPARK-20857][SQL] Generic resolved hint node
## What changes were proposed in this pull request?
This patch renames BroadcastHint to ResolvedHint so it is more generic and
would allow us
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/18064
That works too, if we can attach metrics to these commands.
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/18070
cc @ericl
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/17999
hmm
seems like we should be following how we test tan, cos, etc. in
MathExpressionsSuite?
---
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/18023#discussion_r117540055
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
---
@@ -2624,4 +2624,92 @@ class SQLQuerySuite extends QueryTest
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/18023#discussion_r117539904
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -795,6 +795,12 @@ object SQLConf {
.intConf
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/16478
I don't know how important it is. It seems like it's primarily used by
MLlib and very few other things ...
---
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/17997#discussion_r116878495
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
---
@@ -601,22 +601,32 @@ object DateTimeUtils
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/15821
@BryanCutler even though the json is long, it is still so much clearer than
reading a pile of code that generates json ...
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/17941
@felixcheung what's your concern with this one? seems like just for API
parity's sake we should add this?
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/17711
I feel both are pretty complicated. Can we just do something similar to
CombineUnions:
```
/**
 * Combines all adjacent [[Union]] operators into a single [[Union]].
 */
object CombineUnions extends Rule[LogicalPlan]
```
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/17942#discussion_r116143097
--- Diff: core/src/main/scala/org/apache/spark/util/taskListeners.scala ---
@@ -55,14 +55,16 @@ class TaskCompletionListenerException(
extends
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/17923
sry too long ago
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/17931
What's the issue with SQL metrics?
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/16781
Did we conduct any performance tests on this patch?
---
GitHub user rxin opened a pull request:
https://github.com/apache/spark/pull/17915
[SPARK-20674][SQL] Support registering UserDefinedFunction as named UDF
## What changes were proposed in this pull request?
For some reason we don't have an API to register a UserDefinedFunction
as a named UDF.
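A hedged sketch of what the proposed API enables (assumes a SparkSession `spark`; the function and name are illustrative):

```scala
import org.apache.spark.sql.functions.udf

val plusOne = udf((x: Int) => x + 1)   // a UserDefinedFunction value

// With this PR, the same function object can also be registered under a
// name and then invoked from SQL text:
spark.udf.register("plus_one", plusOne)
spark.sql("SELECT plus_one(id) FROM range(3)")
```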
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/17875
Merging in master/branch-2.2.
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/17875
Jenkins, test this please.
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/17851
@felixcheung was this merged only in master but not branch-2.2?
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/17770
@srinathshankar also thinks it's weird to add a barrier node. I suggest
@hvanhovell and @srinathshankar duke it out.
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/17723
I'm saying avoid exposing Hadoop APIs. Wrap them around something if
possible.
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/17723
I didn't read through the super long debate here, but I have a strong
preference to not expose Hadoop APIs directly. I'm seeing more and more
deployments out there that do not use Hadoop (e.g. connect
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/17850
Merging in master/2.2.
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/17850
LGTM pending Jenkins.
---
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/17850#discussion_r114677412
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -380,6 +380,35 @@ def withWatermark(self, eventTime, delayThreshold):
jdf = self
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/17842
Merging in master/branch-2.2.
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/17839
BTW I filed follow-up tickets for Python/R at
https://issues.apache.org/jira/browse/SPARK-20576
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/17839
Merging in master/branch-2.2.
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/17839
@felixcheung do you worry about conflicts?
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/17678
cc @gatorsmile can you review this?
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/17770
Let's see what other people say before going too far...
cc @cloud-fan / @hvanhovell / @marmbrus / @gatorsmile see my proposal:
https://github.com/apache/spark/pull/17770#issuecomment-298833348
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/17770
What self join case are you talking about? The one that we manually rewrite
half of the plan? That one would be a special case anyway, wouldn't it?
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/17770
I'm actually wondering if we should just introduce a variant of transform
that takes a stop condition, e.g.
```
def transform(stopCondition: BaseType => Boolean)(rule: PartialFunction[BaseType, BaseType]): BaseType
```
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/17839
Actually somebody should add the Python / R wrapper.
cc @felixcheung
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/17770
why don't we always add this to the dataset's logicalPlan? we can change
that in one place.
---
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/17770#discussion_r114478015
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -1134,7 +1138,7 @@ class Dataset[T] private[sql
GitHub user rxin opened a pull request:
https://github.com/apache/spark/pull/17839
[SPARK-20576][SQL] Support generic hint function in Dataset/DataFrame
## What changes were proposed in this pull request?
We allow users to specify hints (currently only "broadcast" is
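In use, the API this PR adds looks roughly like the following (hedged sketch; assumes two DataFrames `df1`, `df2` sharing an `id` column):

```scala
// df.hint(name, parameters*) attaches a generic named hint to the logical plan.
// "broadcast" was the hint recognized at the time of this PR:
val joined = df1.join(df2.hint("broadcast"), "id")
// roughly equivalent to the older functions.broadcast(df2) helper
```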
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/17806
@gatorsmile i will let you merge ...
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/17806
Maybe get rid of the Some? If it is not defined, we probably just shouldn't
show anything.
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/17780
Can we at least include the serde?
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/17773
Merging in master/branch-2.2.
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/17772
Merging in master / branch-2.2.
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/17770
Can we fix the description? It is really confusing since it uses the word
exchange. Also can we just skip a plan if it is resolved in transform?
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/17727
Hm I don't think the comment makes sense ...
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/17753
Merging in master.
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/14731
Steve, I think the main point is you should also respect the time of
reviewers. The way most of your pull requests manifest has been suboptimal:
they often start with a very early WIP (which
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/17648
sgtm
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/17736
cc @hvanhovell for review ...
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/17712
Why use a map? That's super unstructured and easy to break ...
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/17712
cc @gatorsmile
This is related to the deterministic thing you want to do?
---
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/17717
LGTM pending Jenkins.
---
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/17717#discussion_r112803232
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
---
@@ -1732,4 +1732,10 @@ class DataFrameSuite extends QueryTest
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/17717#discussion_r112803234
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
---
@@ -1732,4 +1732,10 @@ class DataFrameSuite extends QueryTest
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/17712#discussion_r112803097
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala
---
@@ -45,14 +45,33 @@ import
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/17712#discussion_r112800640
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala
---
@@ -45,14 +45,33 @@ import
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/17648
I was saying rather than implementing them, just rewrite them into an
aggregate on the conditions and compare them against the value.
---
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/17712#discussion_r112754224
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala
---
@@ -47,12 +47,20 @@ case class UserDefinedFunction protected
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/17710
Merging in master/branch-2.2.
---
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/17712#discussion_r112622098
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala
---
@@ -47,12 +47,20 @@ case class UserDefinedFunction protected
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/17711
can you add a test case in sql query file tests?
---
Github user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/17711#discussion_r112590613
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -1483,4 +1483,12 @@ class SparkSqlAstBuilder(conf: SQLConf
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/17705
LGTM
---