date:20161104

[GitHub] spark issue #15432: [SPARK-17854][SQL] rand/randn allows null/long as input ...

2016-11-04 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15432 Thanks @gatorsmile! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes

[GitHub] spark issue #15432: [SPARK-17854][SQL] rand/randn allows null/long as input ...

2016-11-04 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15432 **[Test build #68185 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68185/consoleFull)** for PR 15432 at commit

[GitHub] spark pull request #15432: [SPARK-17854][SQL] rand/randn allows null/long as...

2016-11-04 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/15432#discussion_r86658659 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala --- @@ -97,17 +101,15 @@ case class Rand(seed:

[GitHub] spark pull request #15432: [SPARK-17854][SQL] rand/randn allows null/long as...

2016-11-04 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/15432#discussion_r86658562 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala --- @@ -97,17 +101,15 @@ case class Rand(seed:

[GitHub] spark pull request #15432: [SPARK-17854][SQL] rand/randn allows null/long as...

2016-11-04 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/15432#discussion_r86658461 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala --- @@ -97,17 +101,15 @@ case class Rand(seed:

[GitHub] spark issue #15767: [SPARK-18269][SQL] CSV datasource should read null prope...

2016-11-04 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15767 **[Test build #68184 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68184/consoleFull)** for PR 15767 at commit

[GitHub] spark issue #15767: [SPARK-18269][SQL] CSV datasource should read null prope...

2016-11-04 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15767 **[Test build #68182 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68182/consoleFull)** for PR 15767 at commit

[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark

2016-11-04 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11105 **[Test build #68183 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68183/consoleFull)** for PR 11105 at commit

[GitHub] spark issue #15637: [SPARK-18000] [SQL] Aggregation function for computing b...

2016-11-04 Thread rxin

Github user rxin commented on the issue: https://github.com/apache/spark/pull/15637 We already have those, don't we? Spark's own hash expression. On Friday, November 4, 2016, Zhenhua Wang wrote: > In that way, we can only get the hashed value

[GitHub] spark issue #15132: [SPARK-17510][STREAMING][KAFKA] config max rate on a per...

2016-11-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15132 Merged build finished. Test PASSed.

[GitHub] spark issue #15132: [SPARK-17510][STREAMING][KAFKA] config max rate on a per...

2016-11-04 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15132 **[Test build #68179 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68179/consoleFull)** for PR 15132 at commit

[GitHub] spark issue #15132: [SPARK-17510][STREAMING][KAFKA] config max rate on a per...

2016-11-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15132 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68179/ Test PASSed.

[GitHub] spark issue #14750: [SPARK-17183][SPARK-17983][SPARK-18101][SQL] put hive se...

2016-11-04 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14750 **[Test build #68181 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68181/consoleFull)** for PR 14750 at commit

[GitHub] spark issue #15766: [SPARK-18271][SQL]hash udf in HiveSessionCatalog.hiveFun...

2016-11-04 Thread cloud-fan

Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15766 Only the owner (yourself) can close this PR.

[GitHub] spark issue #15432: [SPARK-17854][SQL] rand/randn allows null/long as input ...

2016-11-04 Thread gatorsmile

Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15432 LGTM except a few minor comments.

[GitHub] spark pull request #14498: [SPARK-16904] [SQL] Removal of Hive Built-in Hash...

2016-11-04 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14498#discussion_r86658169 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala --- @@ -487,24 +487,6 @@ private[hive] class TestHiveQueryExecution(

[GitHub] spark pull request #15432: [SPARK-17854][SQL] rand/randn allows null/long as...

2016-11-04 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15432#discussion_r86658154 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala --- @@ -97,17 +101,15 @@ case class Rand(seed:

[GitHub] spark pull request #15432: [SPARK-17854][SQL] rand/randn allows null/long as...

2016-11-04 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15432#discussion_r86658156 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala --- @@ -87,6 +87,10 @@ case class Rand(seed:

[GitHub] spark pull request #15432: [SPARK-17854][SQL] rand/randn allows null/long as...

2016-11-04 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15432#discussion_r86658157 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/randomExpressions.scala --- @@ -64,17 +66,15 @@ abstract class RDG

[GitHub] spark issue #12135: [SPARK-14352][SQL] approxQuantile should support multi c...

2016-11-04 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/12135 **[Test build #68180 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68180/consoleFull)** for PR 12135 at commit

[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark

2016-11-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11105 Merged build finished. Test FAILed.

[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark

2016-11-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11105 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68178/ Test FAILed.

[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark

2016-11-04 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11105 **[Test build #68178 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68178/consoleFull)** for PR 11105 at commit

[GitHub] spark issue #15132: [SPARK-17510][STREAMING][KAFKA] config max rate on a per...

2016-11-04 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15132 **[Test build #68179 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68179/consoleFull)** for PR 15132 at commit

[GitHub] spark pull request #15767: [SPARK-18269][SQL] CSV datasource should read nul...

2016-11-04 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/15767#discussion_r86658040 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala --- @@ -232,7 +232,7 @@ private[csv] object

[GitHub] spark issue #15132: [SPARK-17510][STREAMING][KAFKA] config max rate on a per...

2016-11-04 Thread koeninger

Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15132 @rxin thanks, changed to abstract class. If you think that's sufficient future proofing I otherwise think this is a worthwhile change, seems like it meets a real user need.

[GitHub] spark issue #15693: [SPARK-18125][SQL] Fix a compilation error in codegen du...

2016-11-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15693 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68173/ Test PASSed.

[GitHub] spark issue #15693: [SPARK-18125][SQL] Fix a compilation error in codegen du...

2016-11-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15693 Merged build finished. Test PASSed.

[GitHub] spark issue #15693: [SPARK-18125][SQL] Fix a compilation error in codegen du...

2016-11-04 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15693 **[Test build #68173 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68173/consoleFull)** for PR 15693 at commit

[GitHub] spark issue #14660: [SPARK-17071][SQL] Add an option to support for reading ...

2016-11-04 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14660 @rxin, I thought reading a small file is a possible corner case. In this case, it would not be only a small fraction. Improving it without a regression might be a legitimate optimization by

[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark

2016-11-04 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11105 **[Test build #68178 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68178/consoleFull)** for PR 11105 at commit

[GitHub] spark issue #14750: [SPARK-17183][SPARK-17983][SPARK-18101][SQL] put hive se...

2016-11-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14750 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68176/ Test FAILed.

[GitHub] spark issue #14750: [SPARK-17183][SPARK-17983][SPARK-18101][SQL] put hive se...

2016-11-04 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14750 **[Test build #68176 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68176/consoleFull)** for PR 14750 at commit

[GitHub] spark issue #14750: [SPARK-17183][SPARK-17983][SPARK-18101][SQL] put hive se...

2016-11-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14750 Merged build finished. Test FAILed.

[GitHub] spark issue #15766: [SPARK-18271][SQL]hash udf in HiveSessionCatalog.hiveFun...

2016-11-04 Thread windpiger

Github user windpiger commented on the issue: https://github.com/apache/spark/pull/15766 @cloud-fan I would appreciate it if you could help close this PR.

[GitHub] spark issue #15766: [SPARK-18271][SQL]hash udf in HiveSessionCatalog.hiveFun...

2016-11-04 Thread windpiger

Github user windpiger commented on the issue: https://github.com/apache/spark/pull/15766 @rxin @cloud-fan you are right, hash should be unregistered and replaced with Hive's hash, or we could put the failed hash test case into the blacklist, as in @gatorsmile's work in #14498. I will close the

[GitHub] spark pull request #14750: [SPARK-17183][SPARK-17983][SPARK-18101][SQL] put ...

2016-11-04 Thread ericl

Github user ericl commented on a diff in the pull request: https://github.com/apache/spark/pull/14750#discussion_r86657628 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -583,6 +633,50 @@ private[spark] class HiveExternalCatalog(conf:

[GitHub] spark pull request #15746: [SPARK-18239][SPARKR] Gradient Boosted Tree for R

2016-11-04 Thread felixcheung

Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/15746#discussion_r86657586 --- Diff: R/pkg/R/mllib.R --- @@ -1828,13 +1849,13 @@ setMethod("summary", signature(object = "RandomForestRegressionModel"), #' @note

[GitHub] spark issue #15314: [SPARK-17747][ML] WeightCol support non-double numeric d...

2016-11-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15314 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68175/ Test PASSed.

[GitHub] spark issue #15314: [SPARK-17747][ML] WeightCol support non-double numeric d...

2016-11-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15314 Merged build finished. Test PASSed.

[GitHub] spark issue #15314: [SPARK-17747][ML] WeightCol support non-double numeric d...

2016-11-04 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15314 **[Test build #68175 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68175/consoleFull)** for PR 15314 at commit

[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark

2016-11-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11105 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68177/ Test FAILed.

[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark

2016-11-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11105 Merged build finished. Test FAILed.

[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark

2016-11-04 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11105 **[Test build #68177 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68177/consoleFull)** for PR 11105 at commit

[GitHub] spark issue #12904: [SPARK-15125][SQL] Changing CSV data source mapping of e...

2016-11-04 Thread sureshthalamati

Github user sureshthalamati commented on the issue: https://github.com/apache/spark/pull/12904 I was testing the fix with the different scenarios mentioned in the comments. I cannot make the CSV writer write a quoted empty string for empty strings in the data. One of the issues I filed

[GitHub] spark issue #15637: [SPARK-18000] [SQL] Aggregation function for computing b...

2016-11-04 Thread wzhfy

Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/15637 In that way, we can only get the hashed value instead of the real value of the column, right? So I think we still need to implement a hash code for fractional types.

[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark

2016-11-04 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11105 **[Test build #68177 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68177/consoleFull)** for PR 11105 at commit

[GitHub] spark issue #15776: [SPARK-17710][Follow UP] Add comments to state why 'Util...

2016-11-04 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15776 **[Test build #3416 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3416/consoleFull)** for PR 15776 at commit

[GitHub] spark issue #14750: [SPARK-17183][SPARK-17983][SPARK-18101][SQL] put hive se...

2016-11-04 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14750 **[Test build #68176 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68176/consoleFull)** for PR 14750 at commit

[GitHub] spark issue #15763: [SPARK-17348][SQL] Incorrect results from subquery trans...

2016-11-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15763 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68170/ Test PASSed.

[GitHub] spark issue #15763: [SPARK-17348][SQL] Incorrect results from subquery trans...

2016-11-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15763 Merged build finished. Test PASSed.

[GitHub] spark issue #15637: [SPARK-18000] [SQL] Aggregation function for computing b...

2016-11-04 Thread rxin

Github user rxin commented on the issue: https://github.com/apache/spark/pull/15637 Yea good point - the underlying implementation really only needs a hash code, it would be trivial to support all types. But even easier, I think you can just compute the hash (using sql expression)

[GitHub] spark issue #15763: [SPARK-17348][SQL] Incorrect results from subquery trans...

2016-11-04 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15763 **[Test build #68170 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68170/consoleFull)** for PR 15763 at commit

[GitHub] spark issue #15637: [SPARK-18000] [SQL] Aggregation function for computing b...

2016-11-04 Thread wzhfy

Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/15637 Yes, but in its impl, it actually only supports above types. ``` @Override public void add(Object item, long count) { if (item instanceof String) { addString((String)

[GitHub] spark issue #15779: [SPARK-17748][ML] Minor cleanups to one-pass linear regr...

2016-11-04 Thread sethah

Github user sethah commented on the issue: https://github.com/apache/spark/pull/15779 +1 on removing the use of exceptions. I thought it was a bit of an awkward solution to begin with. Thanks a lot for this pr, I will take a look soon.

[GitHub] spark issue #15637: [SPARK-18000] [SQL] Aggregation function for computing b...

2016-11-04 Thread rxin

Github user rxin commented on the issue: https://github.com/apache/spark/pull/15637 It supports arbitrary objects ``` /** * Increments {@code item}'s count by one. */ public abstract void add(Object item); ```

[GitHub] spark issue #15637: [SPARK-18000] [SQL] Aggregation function for computing b...

2016-11-04 Thread wzhfy

Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/15637 OK, I'll try to use count min sketch. BTW, seems it only supports Integral and String types for now?

[GitHub] spark issue #15314: [SPARK-17747][ML] WeightCol support non-double numeric d...

2016-11-04 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15314 **[Test build #68175 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68175/consoleFull)** for PR 15314 at commit

[GitHub] spark issue #15766: [SPARK-18271][SQL]hash udf in HiveSessionCatalog.hiveFun...

2016-11-04 Thread gatorsmile

Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15766 I am working on the related issue in https://github.com/apache/spark/pull/14498

[GitHub] spark issue #15314: [SPARK-17747][ML] WeightCol support non-double numeric d...

2016-11-04 Thread zhengruifeng

Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/15314 Typo fixed.

[GitHub] spark issue #15766: [SPARK-18271][SQL]hash udf in HiveSessionCatalog.hiveFun...

2016-11-04 Thread cloud-fan

Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15766 @rxin good catch! We do unregister the spark builtin hash in test:

[GitHub] spark issue #15637: [SPARK-18000] [SQL] Aggregation function for computing b...

2016-11-04 Thread rxin

Github user rxin commented on the issue: https://github.com/apache/spark/pull/15637 It supports any data types, uses less space and can provide probabilistic frequencies for a large number of distinct values, and it's already implemented in Spark. We just need to add a wrapper for it

[GitHub] spark issue #15637: [SPARK-18000] [SQL] Aggregation function for computing b...

2016-11-04 Thread wzhfy

Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/15637 Oh, do you mean using count min sketch for equi-width histogram? Actually I don't know about the sketch, I need to look into it to see if it's easy to use as an agg function and also support

[GitHub] spark pull request #14750: [SPARK-17183][SPARK-17983][SPARK-18101][SQL] put ...

2016-11-04 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14750#discussion_r86656669 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala --- @@ -521,15 +521,15 @@ class SQLQuerySuite extends QueryTest

[GitHub] spark pull request #14750: [SPARK-17183][SPARK-17983][SPARK-18101][SQL] put ...

2016-11-04 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14750#discussion_r86656654 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -442,7 +468,9 @@ private[spark] class

[GitHub] spark pull request #14750: [SPARK-17183][SPARK-17983][SPARK-18101][SQL] put ...

2016-11-04 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14750#discussion_r86656644 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -417,11 +437,17 @@ private[spark] class

[GitHub] spark issue #15637: [SPARK-18000] [SQL] Aggregation function for computing b...

2016-11-04 Thread wzhfy

Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/15637 For equi-height histogram, we need extra info like ndv's in each bin. Does count min sketch also have this information? I had a discussion with Herman and Tim about histogram construction before, we

[GitHub] spark pull request #15756: [SPARK-18256] Improve the performance of event lo...

2016-11-04 Thread asfgit

Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15756

[GitHub] spark issue #15779: [SPARK-17748][ML] Minor cleanups to one-pass linear regr...

2016-11-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15779 Merged build finished. Test FAILed.

[GitHub] spark issue #15779: [SPARK-17748][ML] Minor cleanups to one-pass linear regr...

2016-11-04 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15779 **[Test build #68174 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68174/consoleFull)** for PR 15779 at commit

[GitHub] spark issue #15779: [SPARK-17748][ML] Minor cleanups to one-pass linear regr...

2016-11-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15779 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68174/ Test FAILed.

[GitHub] spark issue #15756: [SPARK-18256] Improve the performance of event log repla...

2016-11-04 Thread yhuai

Github user yhuai commented on the issue: https://github.com/apache/spark/pull/15756 Cool. Merging to master!

[GitHub] spark issue #15779: [SPARK-17748][ML] Minor cleanups to one-pass linear regr...

2016-11-04 Thread jkbradley

Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15779 @sethah @yanboliang I just saw your PRs for SPARK-17748. Awesome change. I just saw a few nits along the way. The only major item is making SingularMatrixException private ml. This

[GitHub] spark issue #15779: [SPARK-17748][ML] Minor cleanups to one-pass linear regr...

2016-11-04 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15779 **[Test build #68174 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68174/consoleFull)** for PR 15779 at commit

[GitHub] spark pull request #15779: [SPARK-17748][ML] Minor cleanups to one-pass line...

2016-11-04 Thread jkbradley

GitHub user jkbradley opened a pull request: https://github.com/apache/spark/pull/15779 [SPARK-17748][ML] Minor cleanups to one-pass linear regression with elastic net ## What changes were proposed in this pull request? * Made SingularMatrixException private ml *

[GitHub] spark issue #15637: [SPARK-18000] [SQL] Aggregation function for computing b...

2016-11-04 Thread rxin

Github user rxin commented on the issue: https://github.com/apache/spark/pull/15637 Why not use count min sketch then? You would get more signal (with some error) for a much larger range of values.

[GitHub] spark issue #15637: [SPARK-18000] [SQL] Aggregation function for computing b...

2016-11-04 Thread wzhfy

Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/15637 Yes, in our design, equi-width histogram is a seq of single valued bins, which is used for columns with low cardinality, so that we can get accurate estimation. When cardinality is high, equi-height

[GitHub] spark issue #15693: [SPARK-18125][SQL] Fix a compilation error in codegen du...

2016-11-04 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15693 **[Test build #68173 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68173/consoleFull)** for PR 15693 at commit

[GitHub] spark issue #15726: [SPARK-18107][SQL][FOLLOW-UP] Insert overwrite statement...

2016-11-04 Thread ericl

Github user ericl commented on the issue: https://github.com/apache/spark/pull/15726 @viirya that makes sense to me

[GitHub] spark pull request #15693: [SPARK-18125][SQL] Fix a compilation error in cod...

2016-11-04 Thread viirya

Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15693#discussion_r86656043 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ReferenceToExpressions.scala --- @@ -63,15 +63,33 @@ case class

[GitHub] spark issue #15769: [SPARK-18191][CORE] Port RDD API to use commit protocol

2016-11-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15769 Merged build finished. Test FAILed.

[GitHub] spark issue #15769: [SPARK-18191][CORE] Port RDD API to use commit protocol

2016-11-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15769 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68168/ Test FAILed.

[GitHub] spark issue #15769: [SPARK-18191][CORE] Port RDD API to use commit protocol

2016-11-04 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15769 **[Test build #68168 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68168/consoleFull)** for PR 15769 at commit

[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark

2016-11-04 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11105 **[Test build #68172 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68172/consoleFull)** for PR 11105 at commit

[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark

2016-11-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11105 Merged build finished. Test FAILed.

[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark

2016-11-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11105 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68172/ Test FAILed. ---

[GitHub] spark issue #15637: [SPARK-18000] [SQL] Aggregation function for computing b...

2016-11-04 Thread rxin

Github user rxin commented on the issue: https://github.com/apache/spark/pull/15637 How would this pr help you with equi-width histogram? This function just gives you the frequency count for keys when the cardinality is low.

[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark

2016-11-04 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11105 **[Test build #68171 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68171/consoleFull)** for PR 11105 at commit

[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark

2016-11-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11105 Merged build finished. Test FAILed.

[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark

2016-11-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11105 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68171/ Test FAILed. ---

[GitHub] spark issue #15726: [SPARK-18107][SQL][FOLLOW-UP] Insert overwrite statement...

2016-11-04 Thread viirya

Github user viirya commented on the issue: https://github.com/apache/spark/pull/15726 @ericl Currently I prefer the first one, let `HiveClientImpl` create multiple internal thrift clients, since I don't like to change external catalog for this.

[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark

2016-11-04 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11105 **[Test build #68172 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68172/consoleFull)** for PR 11105 at commit

[GitHub] spark issue #15637: [SPARK-18000] [SQL] Aggregation function for computing b...

2016-11-04 Thread wzhfy

Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/15637 @rxin Moreover, with this pr we can compute accurate equi-width histogram, while count min sketch only gets an estimated result? I think it's better to have an accurate result given that both methods

[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark

2016-11-04 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11105 **[Test build #68171 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68171/consoleFull)** for PR 11105 at commit

[GitHub] spark issue #15637: [SPARK-18000] [SQL] Aggregation function for computing b...

2016-11-04 Thread wzhfy

Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/15637 @rxin About the name "map_aggregate", actually it was suggested by srinath, do you have a better name?

[GitHub] spark issue #15637: [SPARK-18000] [SQL] Aggregation function for computing b...

2016-11-04 Thread wzhfy

Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/15637 @rxin By "histograms" for numeric columns, do you mean equi-height histogram? The main purpose of this pr is to construct equi-width histogram without prior knowledge of ndv, so that we can compute

[GitHub] spark issue #15778: [SPARK-18283][Structured Streaming][Kafka] Added test to...

2016-11-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15778 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68169/ Test PASSed. ---

[GitHub] spark issue #15778: [SPARK-18283][Structured Streaming][Kafka] Added test to...

2016-11-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15778 Merged build finished. Test PASSed.

[GitHub] spark issue #15778: [SPARK-18283][Structured Streaming][Kafka] Added test to...

2016-11-04 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15778 **[Test build #68169 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68169/consoleFull)** for PR 15778 at commit

[GitHub] spark issue #15771: [SPARK-18260] Make from_json null safe

2016-11-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15771 Merged build finished. Test PASSed.


