[GitHub] spark pull request #18273: [SPARK-21059][SQL] LikeSimplification can NPE on ...

2017-06-12 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/18273 [SPARK-21059][SQL] LikeSimplification can NPE on null pattern ## What changes were proposed in this pull request? This patch fixes a bug that can cause NullPointerException in LikeSimplification
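
The snippet above is cut off before the fix is described. As a rough illustration only, the following Python toy (not Spark code; `simplify_like` and its tuple return values are hypothetical names) shows the general shape of a LIKE-simplification rewrite that checks for a null pattern before calling any string methods on it:

```python
import re
from typing import Optional, Tuple


def simplify_like(pattern: Optional[str]) -> Tuple[str, Optional[str]]:
    """Toy rewrite of `col LIKE pattern` into a cheaper predicate.

    Returns (predicate_kind, argument). This is a hypothetical model,
    not the Catalyst rule itself.
    """
    if pattern is None:
        # Assumption: `col LIKE NULL` evaluates to NULL, so the rewrite
        # returns a null literal instead of inspecting the pattern.
        return ("null_literal", None)
    # Patterns containing `_` or the escape character `\` are left alone.
    if re.fullmatch(r"[^_%\\]+%", pattern):
        return ("starts_with", pattern[:-1])
    if re.fullmatch(r"%[^_%\\]+", pattern):
        return ("ends_with", pattern[1:])
    if re.fullmatch(r"%[^_%\\]+%", pattern):
        return ("contains", pattern[1:-1])
    return ("like", pattern)
```

Without the `None` check, the first `re.fullmatch` call would raise a `TypeError`, which is the Python analogue of the NullPointerException named in the PR title.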

[GitHub] spark issue #18257: [SPARK-21044][SPARK-21041][SQL] Add RemoveInvalidRange o...

2017-06-10 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18257 Sorry it doesn't make sense to do this. Range is used primarily for testing, and it doesn't make sense to have an optimizer rule that removes it. If there is a correctness issue in it, we should fix

[GitHub] spark issue #18258: [SPARK-20953][SQL] Add hash map metrics to aggregate

2017-06-10 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18258 That's a good idea. In that case, create a subtask on jira for this and another one for join? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub

[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...

2017-06-10 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18258 If there is no regression, I'd remove the flag.

[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...

2017-06-10 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18258 Thanks!

[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...

2017-06-09 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18258 Can you run it a few more times to tell? Right now it's a difference of 7% almost

[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...

2017-06-09 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18258 16.8 vs 15.8?

[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...

2017-06-09 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18258 Can you test the perf degradation?

[GitHub] spark issue #18258: [SPARK-20953][SQL][WIP] Add hash map metrics to aggregat...

2017-06-09 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18258 Why would the tracking have perf impact? It's just a simple counter increase isn't it.

[GitHub] spark issue #18209: [SPARK-20992][Scheduler] Add support for Nomad as a sche...

2017-06-09 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18209 The next one to add is probably Kubernetes. Even the Spark on Kubernetes is going through this cycle of maintaining a separate project for it first.

[GitHub] spark issue #18228: [SPARK-21007][SQL]Add SQL function - RIGHT && LEFT

2017-06-09 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18228 Are these ANSI SQL functions? If it is just some esoteric MySQL function I don't think we should add them.

[GitHub] spark issue #18236: [SPARK-21015] Check field name is not null and empty in ...

2017-06-09 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18236 Why do we want this check? If the user passes in null value, it is ok if it is not found, isn't it?

[GitHub] spark pull request #18252: [SPARK-17914][SQL] Fix parsing of timestamp strin...

2017-06-09 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18252#discussion_r121246583 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala --- @@ -32,7 +32,7 @@ import

[GitHub] spark issue #18256: [SPARK-21042][SQL] Document Dataset.union is resolution ...

2017-06-09 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18256 Merging in master/branch-2.2. Thanks.

[GitHub] spark pull request #18256: [SPARK-21042][SQL] Document Dataset.union is reso...

2017-06-09 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/18256 [SPARK-21042][SQL] Document Dataset.union is resolution by position ## What changes were proposed in this pull request? Document Dataset.union is resolution by position, not by name, since
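
To make the by-position point concrete, here is a small pure-Python sketch (a toy model with plain tuples, not the Spark API; `union_by_position` and `union_by_name` are hypothetical helper names). It shows how a positional union silently puts values under the wrong column names when the two inputs list the same columns in a different order:

```python
from typing import List, Sequence, Tuple

Row = Tuple[object, ...]


def union_by_position(
    left_cols: Sequence[str], left_rows: Sequence[Row],
    right_cols: Sequence[str], right_rows: Sequence[Row],
) -> Tuple[List[str], List[Row]]:
    """Append right rows as-is; the result keeps the left column names."""
    if len(left_cols) != len(right_cols):
        raise ValueError("union requires the same number of columns")
    return list(left_cols), list(left_rows) + list(right_rows)


def union_by_name(
    left_cols: Sequence[str], left_rows: Sequence[Row],
    right_cols: Sequence[str], right_rows: Sequence[Row],
) -> Tuple[List[str], List[Row]]:
    """Reorder right rows to the left column order before appending."""
    if sorted(left_cols) != sorted(right_cols):
        raise ValueError("both sides must have the same column names")
    order = [list(right_cols).index(c) for c in left_cols]
    reordered = [tuple(row[i] for i in order) for row in right_rows]
    return list(left_cols), list(left_rows) + reordered


left = (["id", "name"], [(1, "a")])
right = (["name", "id"], [("b", 2)])
positional = union_by_position(*left, *right)
by_name = union_by_name(*left, *right)
```

In this toy, `positional` ends up with the row `("b", 2)` under the columns `id, name`, while `by_name` holds `(2, "b")`. With a positional union, the caller has to put the columns in the same order on both sides first.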

[GitHub] spark issue #18142: [SPARK-20918] [SQL] Use FunctionIdentifier as function i...

2017-06-09 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18142 Guys - please in the future separate bug fixes from refactoring. Don't mix a bunch of cosmetic changes with actual bug fixes.

[GitHub] spark pull request #18113: [SPARK-20890][SQL] Added min and max typed aggreg...

2017-06-08 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18113#discussion_r121025561 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/typedaggregators.scala --- @@ -26,43 +26,64 @@ import

[GitHub] spark issue #18217: [SPARK-20854][TESTS] Removing duplicate test case

2017-06-06 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18217 Merging in master/branch-2.2.

[GitHub] spark issue #18221: [SPARK-20655][core] In-memory KVStore implementation.

2017-06-06 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18221 Question: why are these files written in Java?

[GitHub] spark issue #18207: [MINOR][DOC] Update deprecation notes on Python/Hadoop/S...

2017-06-05 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18207 OK great then we have officially deprecated it, haven't we?

[GitHub] spark issue #18207: [MINOR][DOC] Update deprecation notes on Python/Hadoop/S...

2017-06-05 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18207 I believe we still support Python 2.6, given Jenkins runs 2.6... There seems to be no point in removing that support this late in the release cycle.

[GitHub] spark issue #18202: [SPARK-20980] [SQL] Rename `wholeFile` to `multiLine` fo...

2017-06-05 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18202 Wouldn't this break compatibility?

[GitHub] spark issue #18189: [SPARK-20972][SQL] rename HintInfo.isBroadcastable to fo...

2017-06-04 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18189 But isn't it in a hint? If you are worried about user, I'd just change it to "broadcast".

[GitHub] spark issue #18159: [SPARK-20703][SQL] Associate metrics with data writes on...

2017-06-03 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18159 hmm anyway to shorten the change? this change is a bit too big for metrics ...

[GitHub] spark pull request #18159: [SPARK-20703][SQL] Associate metrics with data wr...

2017-06-03 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18159#discussion_r119995109 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/commands.scala --- @@ -17,38 +17,97 @@ package

[GitHub] spark issue #18189: [SPARK-20972][SQL] rename HintInfo.isBroadcastable to fo...

2017-06-03 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18189 tbh the difference is so small that i don't think it is worth spending time here ... as pointed out it is not forceBroadcast either.

[GitHub] spark pull request #18086: [SPARK-20854][SQL] Extend hint syntax to support ...

2017-05-30 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18086#discussion_r119271262 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -533,13 +533,16 @@ class AstBuilder(conf: SQLConf) extends

[GitHub] spark issue #16598: [SPARK-19236][Core] Added createOrReplaceGlobalTempView ...

2017-05-30 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16598 @gatorsmile this didn't run any tests!!!

[GitHub] spark issue #18132: [SPARK-8184][SQL] Add additional function description fo...

2017-05-29 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18132 Thanks - merging in master/branch-2.2.

[GitHub] spark pull request #18086: [SPARK-20854][SQL] Extend hint syntax to support ...

2017-05-25 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18086#discussion_r118473083 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -533,13 +533,16 @@ class AstBuilder(conf: SQLConf) extends

[GitHub] spark issue #18042: [SPARK-20817][core] Fix to return "Unknown processor" on...

2017-05-25 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18042 Does this really matter? I'd rather not complicate the actual code for it to display properly in some niche hardware that very few people use.

[GitHub] spark issue #18086: [SPARK-20854][SQL] Extend hint syntax to support express...

2017-05-25 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18086 cc @gatorsmile @cloud-fan @hvanhovell

[GitHub] spark issue #18016: [SPARK-20786][SQL]Improve ceil and floor handle the valu...

2017-05-25 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18016 hm guys please don’t use the end-to-end tests to test expression behavior. use unit tests, which automatically test code gen, interpreted, and different data types.

[GitHub] spark pull request #18087: [SPARK-20867][SQL] Move hints from Statistics int...

2017-05-24 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18087#discussion_r118353924 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala --- @@ -195,9 +195,9 @@ case class Intersect

[GitHub] spark issue #18087: [SPARK-20867][SQL] Move hints from Statistics into HintI...

2017-05-24 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18087 cc @hvanhovell, @bogdanrdc

[GitHub] spark pull request #18087: [SPARK-20867][SQL] Move hints from Statistics int...

2017-05-24 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/18087 [SPARK-20867][SQL] Move hints from Statistics into HintInfo class ## What changes were proposed in this pull request? This is a follow-up to SPARK-20857 to move the broadcast hint from Statistics

[GitHub] spark issue #18082: [SPARK-20665][SQL][FOLLOW-UP]Move test case to SQLQueryT...

2017-05-24 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18082 Hm I'm not sure if it is a good idea to run so many "unit test" style tests for expressions in the end-to-end suites. It takes a lot more time than just running unit tests.

[GitHub] spark issue #18072: [SPARK-20857][SQL] Generic resolved hint node

2017-05-23 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18072 Merging in master / branch-2.2 ...

[GitHub] spark pull request #18072: [SPARK-20857][SQL] Generic resolved hint node

2017-05-23 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/18072 [SPARK-20857][SQL] Generic resolved hint node ## What changes were proposed in this pull request? This patch renames BroadcastHint to ResolvedHint so it is more generic and would allow us

[GitHub] spark issue #18064: [SPARK-20213][SQL] Fix DataFrameWriter operations in SQL...

2017-05-23 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18064 That works too, if we can attach metrics to these commands.

[GitHub] spark issue #18070: [SPARK-20713][Spark Core] Convert CommitDenied to TaskKi...

2017-05-23 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18070 cc @ericl

[GitHub] spark issue #17999: [SPARK-20751][SQL] Add built-in SQL Function - COT

2017-05-19 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17999 hmm seems like we should be following how we test tan, cos, etc in MathExpressionsSuite?

[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-19 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18023#discussion_r117540055 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -2624,4 +2624,92 @@ class SQLQuerySuite extends QueryTest

[GitHub] spark pull request #18023: [SPARK-12139] [SQL] REGEX Column Specification

2017-05-19 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18023#discussion_r117539904 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -795,6 +795,12 @@ object SQLConf { .intConf

[GitHub] spark issue #16478: [SPARK-7768][SQL] Revise user defined types (UDT)

2017-05-18 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16478 I don't know how important it is. It seems like it's primarily used by MLlib and very few other things ...

[GitHub] spark pull request #17997: [SPARK-20763][SQL]The function of `month` and `da...

2017-05-16 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/17997#discussion_r116878495 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala --- @@ -601,22 +601,32 @@ object DateTimeUtils

[GitHub] spark issue #15821: [SPARK-13534][PySpark] Using Apache Arrow to increase pe...

2017-05-15 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15821 @BryanCutler even though the json is long, it is still so much clearer than reading a pile of code that generates json ...

[GitHub] spark issue #17941: [SPARK-20684][R] Expose createGlobalTempView and dropGlo...

2017-05-15 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17941 @felixcheung what's your concern with this one? seems like just for api parity sake we should add this?

[GitHub] spark issue #17711: [SPARK-19951][SQL] Add string concatenate operator || to...

2017-05-12 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17711 I feel both are pretty complicated. Can we just do something similar to CombineUnion: ``` /** * Combines all adjacent [[Union]] operators into a single [[Union]]. */ object
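
The quoted rule is cut off above, so its body is not reproduced here. As a sketch of the general idea only (flattening adjacent nodes of the same kind into one), the following Python toy collapses nested concatenations the way `a || b || c` might be represented after parsing. The `Col` and `Concat` classes and `combine_concats` are hypothetical stand-ins, not Catalyst classes:

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class Concat:
    children: Tuple[Union["Col", "Concat"], ...]


def combine_concats(expr: Union[Col, Concat]) -> Union[Col, Concat]:
    """Flatten directly nested Concat nodes into a single Concat."""
    if not isinstance(expr, Concat):
        return expr
    flat = []
    # An explicit stack keeps left-to-right order without recursion.
    stack = list(reversed(expr.children))
    while stack:
        child = stack.pop()
        if isinstance(child, Concat):
            stack.extend(reversed(child.children))
        else:
            flat.append(child)
    return Concat(tuple(flat))


# `a || b || c` modeled as a left-nested tree: Concat(Concat(a, b), c)
nested = Concat((Concat((Col("a"), Col("b"))), Col("c")))
flattened = combine_concats(nested)
```

Here `flattened` is a single `Concat` with the three columns in their original order.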

[GitHub] spark pull request #17942: [SPARK-20702][Core]TaskContextImpl.markTaskComple...

2017-05-11 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/17942#discussion_r116143097 --- Diff: core/src/main/scala/org/apache/spark/util/taskListeners.scala --- @@ -55,14 +55,16 @@ class TaskCompletionListenerException( extends

[GitHub] spark issue #17923: [SPARK-20591][WEB UI] Succeeded tasks num not equal in a...

2017-05-10 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17923 sry too long ago

[GitHub] spark issue #17931: [SPARK-12837][CORE][FOLLOWUP] getting name should not fa...

2017-05-09 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17931 What's the issue with SQL metrics?

[GitHub] spark issue #16781: [SPARK-12297][SQL] Hive compatibility for Parquet Timest...

2017-05-09 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16781 Did we conduct any performance tests on this patch?

[GitHub] spark pull request #17915: [SPARK-20674][SQL] Support registering UserDefine...

2017-05-09 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/17915 [SPARK-20674][SQL] Support registering UserDefinedFunction as named UDF ## What changes were proposed in this pull request? For some reason we don't have an API to register UserDefinedFunction

[GitHub] spark issue #17875: [SPARK-20616] RuleExecutor logDebug of batch results sho...

2017-05-05 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17875 Merging in master/branch-2.2.

[GitHub] spark issue #17875: [SPARK-20616] RuleExecutor logDebug of batch results sho...

2017-05-05 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17875 Jenkins, test this please.

[GitHub] spark issue #17851: [SPARK-20585][SPARKR] R generic hint support

2017-05-05 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17851 @felixcheung was this merged only in master but not branch-2.2?

[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

2017-05-04 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17770 @srinathshankar also thinks it's weird to add a barrier node. I suggest @hvanhovell and @srinathshankar duke it out.

[GitHub] spark issue #17723: [SPARK-20434][YARN][CORE] Move kerberos delegation token...

2017-05-04 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17723 I'm saying avoid exposing Hadoop APIs. Wrap them around something if possible.

[GitHub] spark issue #17723: [SPARK-20434][YARN][CORE] Move kerberos delegation token...

2017-05-04 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17723 I didn't read through the super long debate here, but I have a strong preference to not expose Hadoop APIs directly. I'm seeing more and more deployments out there that do not use Hadoop (e.g. connect

[GitHub] spark issue #17850: [SPARK-20584][PYSPARK][SQL] Python generic hint support

2017-05-03 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17850 Merging in master/2.2.

[GitHub] spark issue #17850: [SPARK-20584][PYSPARK][SQL] Python generic hint support

2017-05-03 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17850 LGTM pending Jenkins.

[GitHub] spark pull request #17850: [SPARK-20584][PYSPARK][SQL] Python generic hint s...

2017-05-03 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/17850#discussion_r114677412 --- Diff: python/pyspark/sql/dataframe.py --- @@ -380,6 +380,35 @@ def withWatermark(self, eventTime, delayThreshold): jdf = self

[GitHub] spark issue #17842: [MINOR][SQL] Fix the test title from =!= to <=>, remove ...

2017-05-03 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17842 Merging in master/branch-2.2.

[GitHub] spark issue #17839: [SPARK-20576][SQL] Support generic hint function in Data...

2017-05-03 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17839 BTW I filed follow-up tickets for Python/R at https://issues.apache.org/jira/browse/SPARK-20576

[GitHub] spark issue #17839: [SPARK-20576][SQL] Support generic hint function in Data...

2017-05-03 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17839 Merging in master/branch-2.2.

[GitHub] spark issue #17839: [SPARK-20576][SQL] Support generic hint function in Data...

2017-05-03 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17839 @felixcheung do you worry about conflicts?

[GitHub] spark issue #17678: [SPARK-20381][SQL] Add SQL metrics of numOutputRows for ...

2017-05-03 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17678 cc @gatorsmile can you review this?

[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

2017-05-03 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17770 Let's see what other people say before going too far... cc @cloud-fan / @hvanhovell / @marmbrus / @gatorsmile see my proposal: https://github.com/apache/spark/pull/17770#issuecomment-298833348

[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

2017-05-03 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17770 What self join case are you talking about? The one that we manually rewrite half of the plan? That one would be a special case anyway, wouldn't it?

[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

2017-05-03 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17770 I'm actually wondering if we should just introduce a variant of transform that takes a stop condition, e.g. ``` def transform(stopCondition: BaseType => Boolean)(rule: PartialFunct
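
The proposed signature is truncated above. The idea of a transform that takes a stop condition can be sketched in plain Python as follows; this is a toy tree under stated assumptions, not the Catalyst `TreeNode` API. `Node`, its `resolved` flag, and `transform_down` are hypothetical names, and the rule returns `None` where a Scala partial function would be undefined:

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class Node:
    name: str
    children: Tuple["Node", ...] = ()
    resolved: bool = False


def transform_down(
    node: Node,
    stop_condition: Callable[[Node], bool],
    rule: Callable[[Node], Optional[Node]],
) -> Node:
    """Apply `rule` top-down, skipping any subtree whose root meets
    `stop_condition`."""
    if stop_condition(node):
        # Neither this node nor anything below it is rewritten.
        return node
    rewritten = rule(node)
    if rewritten is None:  # the rule does not apply to this node
        rewritten = node
    new_children = tuple(
        transform_down(child, stop_condition, rule)
        for child in rewritten.children
    )
    return replace(rewritten, children=new_children)


def upper_scan(node: Node) -> Optional[Node]:
    """Example rule: rename `scan` nodes to `SCAN`."""
    return replace(node, name="SCAN") if node.name == "scan" else None


plan = Node("project", (
    Node("filter", (Node("scan"),), resolved=True),
    Node("scan"),
))
result = transform_down(plan, lambda n: n.resolved, upper_scan)
```

In `result`, the `scan` under the already-resolved `filter` is left untouched, while the sibling `scan` is rewritten. That is the effect a barrier node would otherwise be used to get, achieved here without adding a node to the tree.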

[GitHub] spark issue #17839: [SPARK-20576][SQL] Support generic hint function in Data...

2017-05-03 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17839 Actually somebody should add the Python / R wrapper. cc @felixcheung

[GitHub] spark issue #17770: [SPARK-20392][SQL] Set barrier to prevent re-entering a ...

2017-05-03 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17770 why don't we always add this to the dataset's logicalPlan? we can change that in one place.

[GitHub] spark pull request #17770: [SPARK-20392][SQL] Set barrier to prevent re-ente...

2017-05-03 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/17770#discussion_r114478015 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -1134,7 +1138,7 @@ class Dataset[T] private[sql

[GitHub] spark pull request #17839: [SPARK-20576][SQL] Support generic hint function ...

2017-05-03 Thread rxin
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/17839 [SPARK-20576][SQL] Support generic hint function in Dataset/DataFrame ## What changes were proposed in this pull request? We allow users to specify hints (currently only "broadcast" is

[GitHub] spark issue #17806: [SPARK-20487][SQL] Display `serde` for `HiveTableScan` n...

2017-04-28 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17806 @gatorsmile i will let you merge ...

[GitHub] spark issue #17806: [SPARK-20487][SQL] Display `serde` for `HiveTableScan` n...

2017-04-28 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17806 Maybe get rid of the Some? If it is not defined, we probably just shouldn't show anything.

[GitHub] spark issue #17780: [SPARK-20487][SQL] `HiveTableScan` node is quite verbose...

2017-04-28 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17780 Can we at least include the serde?

[GitHub] spark issue #17773: [SPARK-20474] Fixing OnHeapColumnVector reallocation

2017-04-26 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17773 Merging in master/branch-2.2.

[GitHub] spark issue #17772: [SPARK-20473] Enabling missing types in ColumnVector.Arr...

2017-04-26 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17772 Merging in master / branch-2.2.

[GitHub] spark issue #17770: [SPARK-20392][SQL][WIP] Set barrier to prevent re-enteri...

2017-04-26 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17770 Can we fix the description? It is really confusing since it uses the word exchange. Also can we just skip a plan if it is resolved in transform?

[GitHub] spark issue #17727: [SQL][MINOR] Remove misleading comment (and tags do bett...

2017-04-25 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17727 Hm I don't think the comment makes sense ...

[GitHub] spark issue #17753: [SPARK-20453] Bump master branch version to 2.3.0-SNAPSH...

2017-04-24 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17753 Merging in master.

[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...

2017-04-24 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14731 Steve, I think the main point is you should also respect the time of reviewers. The way most of your pull requests manifest has been suboptimal: they often start with a very early WIP (which

[GitHub] spark issue #17648: [SPARK-19851] Add support for EVERY and ANY (SOME) aggre...

2017-04-24 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17648 sgtm

[GitHub] spark issue #17736: [SPARK-20399][SQL] Can't use same regex pattern between ...

2017-04-24 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17736 cc @hvanhovell for review ...

[GitHub] spark issue #17712: [SPARK-20416][SQL] Print UDF names in EXPLAIN

2017-04-23 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17712 Why use a map? That's super unstructured and easy to break ...

[GitHub] spark issue #17712: [SPARK-20416][SQL] Print UDF names in EXPLAIN

2017-04-22 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17712 cc @gatorsmile This is related to the deterministic thing you want to do?

[GitHub] spark issue #17717: [SPARK-20430][SQL] Initialise RangeExec parameters in a ...

2017-04-21 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17717 LGTM pending Jenkins.

[GitHub] spark pull request #17717: [SPARK-20430][SQL] Initialise RangeExec parameter...

2017-04-21 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/17717#discussion_r112803232 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala --- @@ -1732,4 +1732,10 @@ class DataFrameSuite extends QueryTest

[GitHub] spark pull request #17717: [SPARK-20430][SQL] Initialise RangeExec parameter...

2017-04-21 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/17717#discussion_r112803234 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala --- @@ -1732,4 +1732,10 @@ class DataFrameSuite extends QueryTest

[GitHub] spark pull request #17712: [SPARK-20416][SQL] Print UDF names in EXPLAIN

2017-04-21 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/17712#discussion_r112803097 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala --- @@ -45,14 +45,33 @@ import

[GitHub] spark pull request #17712: [SPARK-20416][SQL] Print UDF names in EXPLAIN

2017-04-21 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/17712#discussion_r112800640 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala --- @@ -45,14 +45,33 @@ import

[GitHub] spark issue #17648: [SPARK-19851] Add support for EVERY and ANY (SOME) aggre...

2017-04-21 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17648 I was saying rather than implementing them, just rewrite them into an aggregate on the conditions and compare them against the value.

[GitHub] spark pull request #17712: [SPARK-20416][SQL] Print UDF names in EXPLAIN

2017-04-21 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/17712#discussion_r112754224 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala --- @@ -47,12 +47,20 @@ case class UserDefinedFunction protected

[GitHub] spark issue #17710: [SPARK-20420][SQL] Add events to the external catalog

2017-04-21 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17710 Merging in master/branch-2.2.

[GitHub] spark pull request #17712: [SPARK-20416][SQL] Print UDF names in EXPLAIN

2017-04-21 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/17712#discussion_r112622098 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala --- @@ -47,12 +47,20 @@ case class UserDefinedFunction protected

[GitHub] spark issue #17711: [SPARK-19951][SQL] Add string concatenate operator || to...

2017-04-20 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17711 can you add a test case in sql query file tests?

[GitHub] spark pull request #17711: [SPARK-19951][SQL] Add string concatenate operato...

2017-04-20 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/17711#discussion_r112590613 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala --- @@ -1483,4 +1483,12 @@ class SparkSqlAstBuilder(conf: SQLConf

[GitHub] spark issue #17705: [SPARK-20410][SQL] Make sparkConf a def in SharedSQLCont...

2017-04-20 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/17705 LGTM
