[GitHub] spark pull request #12575: [SPARK-14803][SQL][Optimizer] A bug in EliminateS...

2016-09-10 Thread sun-rui
Github user sun-rui closed the pull request at: https://github.com/apache/spark/pull/12575

[GitHub] spark issue #14783: SPARK-16785 R dapply doesn't return array or raw columns

2016-08-30 Thread sun-rui
Github user sun-rui commented on the issue: https://github.com/apache/spark/pull/14783 @clarkfitzg, your patch is a bug fix rather than a performance improvement, right? If so, since there is no performance regression according to your benchmark, let's focus on the functionality. We

[GitHub] spark issue #14744: [SPARK-17178][SPARKR][SPARKSUBMIT] Allow to set sparkr s...

2016-08-30 Thread sun-rui
Github user sun-rui commented on the issue: https://github.com/apache/spark/pull/14744 LGTM

[GitHub] spark pull request #14046: [SPARK-16366][SPARKR] Fix time comparison failure...

2016-08-24 Thread sun-rui
Github user sun-rui closed the pull request at: https://github.com/apache/spark/pull/14046

[GitHub] spark issue #14744: [SPARK-17178][SPARKR][SPARKSUBMIT] Allow to set sparkr s...

2016-08-24 Thread sun-rui
Github user sun-rui commented on the issue: https://github.com/apache/spark/pull/14744 @felixcheung, I guess that spark conf is preferred over env variable.

[GitHub] spark issue #14744: [SPARK-17178][SPARKR][SPARKSUBMIT] Allow to set sparkr s...

2016-08-24 Thread sun-rui
Github user sun-rui commented on the issue: https://github.com/apache/spark/pull/14744 @zjffdu, basically LGTM

[GitHub] spark pull request #14744: [SPARK-17178][SPARKR][SPARKSUBMIT] Allow to set s...

2016-08-24 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/14744#discussion_r76067417 --- Diff: docs/configuration.md --- @@ -1752,6 +1752,13 @@ showDF(properties, numRows = 200, truncate = FALSE) Executable for executing R scripts

[GitHub] spark pull request #14775: [SPARK-16581][SPARKR] Make JVM backend calling fu...

2016-08-24 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/14775#discussion_r76008164 --- Diff: R/pkg/NAMESPACE --- @@ -363,4 +363,9 @@ S3method(structField, jobj) S3method(structType, jobj) S3method(structType, structField

[GitHub] spark pull request #14783: SPARK-16785 R dapply doesn't return array or raw ...

2016-08-24 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/14783#discussion_r76007826 --- Diff: R/pkg/inst/worker/worker.R --- @@ -36,7 +36,14 @@ compute <- function(mode, partition, serializer, deserializer, key, # available si

[GitHub] spark pull request #14775: [SPARK-16581][SPARKR] Make JVM backend calling fu...

2016-08-24 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/14775#discussion_r76007335 --- Diff: R/pkg/NAMESPACE --- @@ -363,4 +363,9 @@ S3method(structField, jobj) S3method(structType, jobj) S3method(structType, structField

[GitHub] spark pull request #14783: SPARK-16785 R dapply doesn't return array or raw ...

2016-08-24 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/14783#discussion_r76005367 --- Diff: R/pkg/inst/worker/worker.R --- @@ -36,7 +36,14 @@ compute <- function(mode, partition, serializer, deserializer, key, # available si

[GitHub] spark pull request #14783: SPARK-16785 R dapply doesn't return array or raw ...

2016-08-24 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/14783#discussion_r76000970 --- Diff: R/pkg/inst/worker/worker.R --- @@ -36,7 +36,14 @@ compute <- function(mode, partition, serializer, deserializer, key, # available si

[GitHub] spark pull request #14783: SPARK-16785 R dapply doesn't return array or raw ...

2016-08-24 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/14783#discussion_r76000958 --- Diff: R/pkg/inst/tests/testthat/test_utils.R --- @@ -183,4 +183,13 @@ test_that("overrideEnvs", { expect_equal(config[["conf

[GitHub] spark pull request #14783: SPARK-16785 R dapply doesn't return array or raw ...

2016-08-24 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/14783#discussion_r76000918 --- Diff: R/pkg/R/SQLContext.R --- @@ -183,6 +183,8 @@ getDefaultSqlSource <- function() { # TODO(davies): support sampling and infer type from

[GitHub] spark issue #14639: [SPARK-17054][SPARKR] SparkR can not run in yarn-cluster...

2016-08-19 Thread sun-rui
Github user sun-rui commented on the issue: https://github.com/apache/spark/pull/14639 Does this API get only the Spark SQL configurations, or does it include SparkConf as well?

[GitHub] spark issue #14639: [SPARK-17054][SPARKR] SparkR can not run in yarn-cluster...

2016-08-18 Thread sun-rui
Github user sun-rui commented on the issue: https://github.com/apache/spark/pull/14639 If SparkConf is needed in the future, instead of passing the whole Spark conf to R via env variables, we can expose an API for accessing SparkConf in the R backend, similar to that in PySpark. https

[GitHub] spark issue #14639: [SPARK-17054][SPARKR] SparkR can not run in yarn-cluster...

2016-08-18 Thread sun-rui
Github user sun-rui commented on the issue: https://github.com/apache/spark/pull/14639 I think there may be a simpler solution. As per my comment in the JIRA, the "EXISTING_SPARKR_BACKEND_PORT" env variable can be checked instead of getting the whole Spark conf from
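
For illustration, a minimal R sketch of the kind of check being suggested; the env variable name comes from the comment above, while the surrounding logic and the launchBackend() helper are assumptions, not the actual SparkR code:

    # If a backend is already running (e.g. launched by spark-submit in
    # yarn-cluster mode), its port is exported in this env variable, so
    # there is no need to download or re-launch Spark.
    existingPort <- Sys.getenv("EXISTING_SPARKR_BACKEND_PORT", "")
    if (nchar(existingPort) > 0) {
      # Reuse the existing backend and skip any Spark download.
      backendPort <- as.integer(existingPort)
    } else {
      # No existing backend: fall back to the normal launch/download path.
      backendPort <- launchBackend()  # hypothetical helper, not a real SparkR function
    }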

[GitHub] spark issue #14639: [SPARK-17054][SPARKR] SparkR can not run in yarn-cluster...

2016-08-17 Thread sun-rui
Github user sun-rui commented on the issue: https://github.com/apache/spark/pull/14639 @zjffdu, yes, no need to download Spark in yarn-client mode provided that spark-submit is called to launch an R script. I just want to verify that your change works in this case. But note that if yarn

[GitHub] spark issue #14639: [SPARK-17054][SPARKR] SparkR can not run in yarn-cluster...

2016-08-17 Thread sun-rui
Github user sun-rui commented on the issue: https://github.com/apache/spark/pull/14639 @zjffdu, does your change work when launching an R script in yarn-client mode? It seems that it won't.

[GitHub] spark issue #14639: [SPARK-17054][SPARKR] SparkR can not run in yarn-cluster...

2016-08-16 Thread sun-rui
Github user sun-rui commented on the issue: https://github.com/apache/spark/pull/14639 This is not only about the correct cache dir under macOS; also, in yarn-cluster mode there should be no downloading of Spark.

[GitHub] spark issue #11157: [SPARK-11714][Mesos] Make Spark on Mesos honor port rest...

2016-08-15 Thread sun-rui
Github user sun-rui commented on the issue: https://github.com/apache/spark/pull/11157 Has this change been documented in the Spark on Mesos guide?

[GitHub] spark pull request #14575: [SPARK-16522][MESOS] Spark application throws exc...

2016-08-10 Thread sun-rui
Github user sun-rui closed the pull request at: https://github.com/apache/spark/pull/14575

[GitHub] spark issue #14575: [SPARK-16522][MESOS] Spark application throws exception ...

2016-08-09 Thread sun-rui
Github user sun-rui commented on the issue: https://github.com/apache/spark/pull/14575 @mgummelt, @srowen

[GitHub] spark pull request #14575: [SPARK-16522][MESOS] Spark application throws exc...

2016-08-09 Thread sun-rui
GitHub user sun-rui opened a pull request: https://github.com/apache/spark/pull/14575 [SPARK-16522][MESOS] Spark application throws exception on exit. This is a backport of https://github.com/apache/spark/pull/14175 to branch-2.0.

[GitHub] spark issue #14175: [SPARK-16522][MESOS] Spark application throws exception ...

2016-08-09 Thread sun-rui
Github user sun-rui commented on the issue: https://github.com/apache/spark/pull/14175 OK, will submit another PR for the 2.0 branch

[GitHub] spark pull request #14175: [SPARK-16522][MESOS] Spark application throws exc...

2016-08-07 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/14175#discussion_r73823225 --- Diff: core/src/test/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackendSuite.scala --- @@ -396,6 +425,10 @@ class

[GitHub] spark pull request #14175: [SPARK-16522][MESOS] Spark application throws exc...

2016-08-07 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/14175#discussion_r73823074 --- Diff: core/src/test/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackendSuite.scala --- @@ -341,6 +344,32 @@ class

[GitHub] spark pull request #14175: [SPARK-16522][MESOS] Spark application throws exc...

2016-08-07 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/14175#discussion_r73823040 --- Diff: core/src/test/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackendSuite.scala --- @@ -341,6 +344,32 @@ class

[GitHub] spark issue #14175: [SPARK-16522][MESOS] Spark application throws exception ...

2016-08-07 Thread sun-rui
Github user sun-rui commented on the issue: https://github.com/apache/spark/pull/14175 rebased to master

[GitHub] spark issue #14175: [SPARK-16522][MESOS] Spark application throws exception ...

2016-08-06 Thread sun-rui
Github user sun-rui commented on the issue: https://github.com/apache/spark/pull/14175 @mgummelt, regression test case added. Not sure it is the expected one.

[GitHub] spark issue #14175: [SPARK-16522][MESOS] Spark application throws exception ...

2016-08-04 Thread sun-rui
Github user sun-rui commented on the issue: https://github.com/apache/spark/pull/14175 @mgummelt, will do it soon

[GitHub] spark pull request #14175: [SPARK-16522][MESOS] Spark application throws exc...

2016-07-26 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/14175#discussion_r72368797 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala --- @@ -552,7 +552,12 @@ private[spark

[GitHub] spark issue #14175: [SPARK-16522][MESOS] Spark application throws exception ...

2016-07-26 Thread sun-rui
Github user sun-rui commented on the issue: https://github.com/apache/spark/pull/14175 Sure, will add it

[GitHub] spark issue #14309: [SPARK-11977][SQL] Support accessing a column contains "...

2016-07-25 Thread sun-rui
Github user sun-rui commented on the issue: https://github.com/apache/spark/pull/14309 @cloud-fan, could you help review it?

[GitHub] spark pull request #14175: [SPARK-16522][MESOS] Spark application throws exc...

2016-07-25 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/14175#discussion_r72176972 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala --- @@ -552,7 +552,12 @@ private[spark

[GitHub] spark issue #14309: [SPARK-11977][SQL] Support accessing a column contains "...

2016-07-24 Thread sun-rui
Github user sun-rui commented on the issue: https://github.com/apache/spark/pull/14309 Does this solution work with the "table.column" access pattern?

[GitHub] spark pull request #14309: [SPARK-11977][SQL] Support accessing a column con...

2016-07-23 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/14309#discussion_r71982269 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala --- @@ -641,6 +641,10 @@ class DataFrameSuite extends QueryTest

[GitHub] spark pull request #14175: [SPARK-16522][MESOS] Spark application throws exc...

2016-07-23 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/14175#discussion_r71982037 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala --- @@ -552,7 +552,12 @@ private[spark

[GitHub] spark pull request #14258: [Spark-16579][SparkR] add install_spark function

2016-07-21 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/14258#discussion_r71728669 --- Diff: R/pkg/inst/extdata/spark_download.csv --- @@ -0,0 +1,2 @@ +"url","default" +"http://apache.osuosl.org",TRUE

[GitHub] spark pull request #14258: [Spark-16579][SparkR] add install_spark function

2016-07-21 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/14258#discussion_r71726598 --- Diff: R/pkg/R/install.R --- @@ -0,0 +1,160 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

[GitHub] spark pull request #14258: [Spark-16579][SparkR] add install_spark function

2016-07-21 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/14258#discussion_r71726450 --- Diff: R/pkg/R/install.R --- @@ -0,0 +1,160 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license

[GitHub] spark issue #14264: [SPARK-11976][SPARKR] Support "." character in DataFrame...

2016-07-21 Thread sun-rui
Github user sun-rui commented on the issue: https://github.com/apache/spark/pull/14264 @rerngvit, I modified the title of SPARK-11977 to a narrower scope. You can go for it.

[GitHub] spark issue #14264: [SPARK-11976][SPARKR] Support "." character in DataFrame...

2016-07-20 Thread sun-rui
Github user sun-rui commented on the issue: https://github.com/apache/spark/pull/14264 @rerngvit, sorry, I meant https://issues.apache.org/jira/browse/SPARK-11977. If your PR can enable access to columns with "." in their names without backticks, please first submit a PR

[GitHub] spark issue #14264: [SPARK-11976][SPARKR] Support "." character in DataFrame...

2016-07-19 Thread sun-rui
Github user sun-rui commented on the issue: https://github.com/apache/spark/pull/14264 @rerngvit, could you share the background on how this PR fixes the issue? I see that https://issues.apache.org/jira/browse/SPARK-11976 is still open. Did any other PR in Spark 2.0 make this possible

[GitHub] spark issue #14243: [SPARK-10683][SPARK-16510][SPARKR] Move SparkR include j...

2016-07-19 Thread sun-rui
Github user sun-rui commented on the issue: https://github.com/apache/spark/pull/14243 LGTM

[GitHub] spark pull request #14175: [SPARK-16522][MESOS] Spark application throws exc...

2016-07-19 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/14175#discussion_r71284361 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala --- @@ -552,7 +552,9 @@ private[spark

[GitHub] spark issue #14243: [SPARK-10683][SPARK-16510][SPARKR] Move SparkR include j...

2016-07-19 Thread sun-rui
Github user sun-rui commented on the issue: https://github.com/apache/spark/pull/14243 Better to add the SparkR installation check to the existing disabled test? ignore("correctly builds R packages included in a jar with --packages") { ...} My expected solu

[GitHub] spark issue #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on DataFra...

2016-07-18 Thread sun-rui
Github user sun-rui commented on the issue: https://github.com/apache/spark/pull/12836 no, go ahead and submit one :)

[GitHub] spark issue #14243: [SPARK-10683][SPARK-16510][SPARKR] Move SparkR include j...

2016-07-18 Thread sun-rui
Github user sun-rui commented on the issue: https://github.com/apache/spark/pull/14243 Will this test always run, regardless of whether the "sparkr" profile is specified? In other words, does R need to be installed for all Spark tests to pass?

[GitHub] spark pull request #14175: [SPARK-16522][MESOS] Spark application throws exc...

2016-07-17 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/14175#discussion_r71095619 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala --- @@ -552,7 +552,9 @@ private[spark

[GitHub] spark pull request #14192: [SPARK-16509][SPARKR] Rename window.partitionBy a...

2016-07-13 Thread sun-rui
GitHub user sun-rui opened a pull request: https://github.com/apache/spark/pull/14192 [SPARK-16509][SPARKR] Rename window.partitionBy and window.orderBy to windowPartitionBy and windowOrderBy. ## What changes were proposed in this pull request? Rename window.partitionBy
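
For context, a small usage sketch of the renamed API as it would look after this change, assuming a SparkR session and that over(), rank(), orderBy() and mutate() behave as in the SparkR documentation; this example is not taken from the PR itself:

    # Rank rows within each group using the renamed window functions.
    df <- createDataFrame(mtcars)
    ws <- orderBy(windowPartitionBy("cyl"), "mpg")  # formerly window.partitionBy / window.orderBy
    ranked <- mutate(df, mpg_rank = over(rank(), ws))
    head(ranked)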

[GitHub] spark pull request #14175: [SPARK-16522][MESOS] Spark application throws exc...

2016-07-13 Thread sun-rui
GitHub user sun-rui opened a pull request: https://github.com/apache/spark/pull/14175 [SPARK-16522][MESOS] Spark application throws exception on exit. ## What changes were proposed in this pull request? Spark applications running on Mesos throw an exception upon exit. For details

[GitHub] spark pull request #14046: [SPARK-16366][SPARKR] Fix time comparison failure...

2016-07-04 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/14046#discussion_r69501135 --- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R --- @@ -1258,10 +1258,12 @@ test_that("date functions on a DataFrame", { df2 <- creat

[GitHub] spark pull request #14046: [SPARK-16366][SPARKR] Fix time comparison failure...

2016-07-04 Thread sun-rui
GitHub user sun-rui opened a pull request: https://github.com/apache/spark/pull/14046 [SPARK-16366][SPARKR] Fix time comparison failures in SparkR unit tests. ## What changes were proposed in this pull request? Fix time comparison failures in SparkR unit tests. For details

[GitHub] spark issue #13760: [SPARK-16012][SparkR] Implement gapplyCollect which will...

2016-06-30 Thread sun-rui
Github user sun-rui commented on the issue: https://github.com/apache/spark/pull/13760 no

[GitHub] spark pull request #13975: [SPARK-16299][SPARKR] Capture errors from R worke...

2016-06-30 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/13975#discussion_r69089989 --- Diff: R/pkg/inst/worker/daemon.R --- @@ -44,7 +44,7 @@ while (TRUE) { if (inherits(p, "masterProcess")) { clos

[GitHub] spark issue #13975: [SPARK-16299][SPARKR] Capture errors from R workers in d...

2016-06-29 Thread sun-rui
Github user sun-rui commented on the issue: https://github.com/apache/spark/pull/13975 @shivaram, @davies

[GitHub] spark pull request #13975: [SPARK-16299][SPARKR] Capture errors from R worke...

2016-06-29 Thread sun-rui
GitHub user sun-rui opened a pull request: https://github.com/apache/spark/pull/13975 [SPARK-16299][SPARKR] Capture errors from R workers in daemon.R to avoid deletion of R session temporary directory. ## What changes were proposed in this pull request? Capture errors from R
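
As a rough illustration of the idea only (not the actual daemon.R change): the worker body can be wrapped in tryCatch so that an error in one forked worker is reported instead of unwinding in a way that removes the shared R session temporary directory. The runWorker() name below is a hypothetical stand-in:

    # Minimal sketch: run the worker logic and trap errors so cleanup of the
    # parent session's temporary directory is not triggered by a failed child.
    status <- tryCatch({
      runWorker()  # hypothetical stand-in for the real worker entry point
      0L
    }, error = function(e) {
      cat("R worker failed:", conditionMessage(e), "\n", file = stderr())
      1L
    })
    quit(save = "no", status = status)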

[GitHub] spark issue #13760: [SPARK-16012][SparkR] gapplyCollect - applies a R functi...

2016-06-21 Thread sun-rui
Github user sun-rui commented on the issue: https://github.com/apache/spark/pull/13760 LGTM except one minor comment

[GitHub] spark pull request #13760: [SPARK-16012][SparkR] gapplyCollect - applies a R...

2016-06-21 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/13760#discussion_r67983701 --- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R --- @@ -2236,12 +2236,15 @@ test_that("gapply() on a DataFrame", { actual <

[GitHub] spark pull request #13660: [SPARK-15672][R][DOC] R programming guide update

2016-06-21 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/13660#discussion_r67881287 --- Diff: docs/sparkr.md --- @@ -262,6 +262,83 @@ head(df) {% endhighlight %} +### Applying User-defined Function +In SparkR, we

[GitHub] spark pull request #13660: [SPARK-15672][R][DOC] R programming guide update

2016-06-21 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/13660#discussion_r67880977 --- Diff: docs/sparkr.md --- @@ -262,6 +262,83 @@ head(df) {% endhighlight %} +### Applying User-defined Function +In SparkR, we

[GitHub] spark pull request #13660: [SPARK-15672][R][DOC] R programming guide update

2016-06-21 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/13660#discussion_r67880749 --- Diff: docs/sparkr.md --- @@ -262,6 +262,83 @@ head(df) {% endhighlight %} +### Applying User-defined Function +In SparkR, we

[GitHub] spark pull request #13660: [SPARK-15672][R][DOC] R programming guide update

2016-06-21 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/13660#discussion_r67880530 --- Diff: docs/sparkr.md --- @@ -262,6 +262,83 @@ head(df) {% endhighlight %} +### Applying User-defined Function +In SparkR, we

[GitHub] spark pull request #13660: [SPARK-15672][R][DOC] R programming guide update

2016-06-21 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/13660#discussion_r67880209 --- Diff: docs/sparkr.md --- @@ -262,6 +262,83 @@ head(df) {% endhighlight %} +### Applying User-defined Function +In SparkR, we

[GitHub] spark pull request #13660: [SPARK-15672][R][DOC] R programming guide update

2016-06-21 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/13660#discussion_r67880117 --- Diff: docs/sparkr.md --- @@ -262,6 +262,83 @@ head(df) {% endhighlight %} +### Applying User-defined Function +In SparkR, we

[GitHub] spark issue #13660: [SPARK-15672][R][DOC] R programming guide update

2016-06-21 Thread sun-rui
Github user sun-rui commented on the issue: https://github.com/apache/spark/pull/13660 Can you add documentation for gapply() and gapplyCollect() together here, or will @NarineK do that in another PR?

[GitHub] spark issue #13752: [SPARK-16028][SPARKR] spark.lapply can work with active ...

2016-06-21 Thread sun-rui
Github user sun-rui commented on the issue: https://github.com/apache/spark/pull/13752 I think spark.lapply() is a case that demonstrates the need for supporting Dataset in SparkR. Removing the explicit sc parameter is quite helpful for moving to Dataset internally in the future
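
For reference, a minimal sketch of spark.lapply() using the active session rather than an explicit sc argument; it assumes a session has already been started (e.g. with sparkR.session()):

    # Each list element is processed on the cluster; no SparkContext handle
    # is passed explicitly -- the active session is picked up internally.
    sparkR.session()
    squares <- spark.lapply(1:4, function(x) x * x)
    # squares is a local R list: list(1, 4, 9, 16)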

[GitHub] spark pull request #13760: [SPARK-16012][SparkR] GapplyCollect - applies a R...

2016-06-20 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/13760#discussion_r67799970 --- Diff: R/pkg/R/DataFrame.R --- @@ -1347,6 +1347,65 @@ setMethod("gapply", gapply(grouped, fu

[GitHub] spark pull request #13760: [SPARK-16012][SparkR] GapplyCollect - applies a R...

2016-06-20 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/13760#discussion_r67799929 --- Diff: R/pkg/R/DataFrame.R --- @@ -1347,6 +1347,65 @@ setMethod("gapply", gapply(grouped, fu

[GitHub] spark pull request #13760: [SPARK-16012][SparkR] GapplyCollect - applies a R...

2016-06-20 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/13760#discussion_r67799741 --- Diff: R/pkg/R/DataFrame.R --- @@ -1347,6 +1347,65 @@ setMethod("gapply", gapply(grouped, fu

[GitHub] spark issue #13790: remove duplicated docs in dapply

2016-06-20 Thread sun-rui
Github user sun-rui commented on the issue: https://github.com/apache/spark/pull/13790 LGTM

[GitHub] spark pull request #13763: [SPARK-16051][R] Add `read.orc/write.orc` to Spar...

2016-06-19 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/13763#discussion_r67610942 --- Diff: R/pkg/R/SQLContext.R --- @@ -330,6 +330,30 @@ jsonRDD <- function(sqlContext, rdd, schema = NULL, samplingRatio =

[GitHub] spark pull request #13763: [SPARK-16051][R] Add `read.orc/write.orc` to Spar...

2016-06-19 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/13763#discussion_r67610931 --- Diff: R/pkg/R/SQLContext.R --- @@ -330,6 +330,30 @@ jsonRDD <- function(sqlContext, rdd, schema = NULL, samplingRatio =

[GitHub] spark pull request #13763: [SPARK-16051][R] Add `read.orc/write.orc` to Spar...

2016-06-19 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/13763#discussion_r67610926 --- Diff: R/pkg/R/DataFrame.R --- @@ -701,6 +701,33 @@ setMethod("write.json", invisible(callJMethod(write, &q

[GitHub] spark issue #13684: [SPARK-15908][R] Add varargs-type dropDuplicates() funct...

2016-06-16 Thread sun-rui
Github user sun-rui commented on the issue: https://github.com/apache/spark/pull/13684 LGTM

[GitHub] spark pull request #13635: [SPARK-15159][SPARKR] SparkR SparkSession API

2016-06-16 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/13635#discussion_r67385138 --- Diff: R/pkg/R/sparkR.R --- @@ -270,27 +291,97 @@ sparkRSQL.init <- function(jsc = NULL) { #'} sparkRHive.init <- function(jsc

[GitHub] spark pull request #13635: [SPARK-15159][SPARKR] SparkR SparkSession API

2016-06-16 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/13635#discussion_r67384711 --- Diff: R/pkg/NAMESPACE --- @@ -6,10 +6,15 @@ importFrom(methods, setGeneric, setMethod, setOldClass) #useDynLib(SparkR, stringHashCode

[GitHub] spark pull request #13684: [SPARK-15908][R] Add varargs-type dropDuplicates(...

2016-06-15 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/13684#discussion_r67287926 --- Diff: R/pkg/R/DataFrame.R --- @@ -1856,10 +1856,11 @@ setMethod("where", #' the subset of columns. #' #' @param x A Spar

[GitHub] spark issue #13635: [SPARK-15159][SPARKR] SparkR SparkSession API

2016-06-15 Thread sun-rui
Github user sun-rui commented on the issue: https://github.com/apache/spark/pull/13635 @shivaram, I will probably take a look at this tonight.

[GitHub] spark issue #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on DataFra...

2016-06-15 Thread sun-rui
Github user sun-rui commented on the issue: https://github.com/apache/spark/pull/12836 @shivaram, LGTM

[GitHub] spark pull request #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on ...

2016-06-15 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/12836#discussion_r67161527 --- Diff: R/pkg/inst/worker/worker.R --- @@ -79,75 +127,72 @@ if (numBroadcastVars > 0) { # Timing broadcast broadcastElap <- elaps

[GitHub] spark issue #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on DataFra...

2016-06-15 Thread sun-rui
Github user sun-rui commented on the issue: https://github.com/apache/spark/pull/12836 @NarineK, there is one comment left unaddressed

[GitHub] spark pull request #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on ...

2016-06-12 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/12836#discussion_r66733080 --- Diff: core/src/main/scala/org/apache/spark/api/r/RRunner.scala --- @@ -40,7 +40,8 @@ private[spark] class RRunner[U]( broadcastVars: Array

[GitHub] spark issue #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on DataFra...

2016-06-12 Thread sun-rui
Github user sun-rui commented on the issue: https://github.com/apache/spark/pull/12836 @shivaram, I think we are reaching the final version :). It would be good if you could do a detailed review of the examples and test cases.

[GitHub] spark pull request #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on ...

2016-06-12 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/12836#discussion_r66721674 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/objects.scala --- @@ -325,6 +330,71 @@ case class MapGroupsExec

[GitHub] spark pull request #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on ...

2016-06-12 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/12836#discussion_r66721643 --- Diff: R/pkg/inst/worker/worker.R --- @@ -79,75 +127,72 @@ if (numBroadcastVars > 0) { # Timing broadcast broadcastElap <- elaps

[GitHub] spark pull request #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on ...

2016-06-12 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/12836#discussion_r66721611 --- Diff: R/pkg/inst/worker/worker.R --- @@ -79,75 +127,72 @@ if (numBroadcastVars > 0) { # Timing broadcast broadcastElap <- elaps

[GitHub] spark pull request #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on ...

2016-06-12 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/12836#discussion_r66721551 --- Diff: R/pkg/inst/worker/worker.R --- @@ -27,6 +27,54 @@ elapsedSecs <- function() { proc.time()[3] } +compute <- functio

[GitHub] spark pull request #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on ...

2016-06-12 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/12836#discussion_r66721485 --- Diff: R/pkg/R/group.R --- @@ -142,3 +142,58 @@ createMethods <- function() { } createMethods() + +#' gap

[GitHub] spark pull request #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on ...

2016-06-12 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/12836#discussion_r66721436 --- Diff: R/pkg/R/group.R --- @@ -142,3 +142,58 @@ createMethods <- function() { } createMethods() + +#' gap

[GitHub] spark issue #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on DataFra...

2016-06-12 Thread sun-rui
Github user sun-rui commented on the issue: https://github.com/apache/spark/pull/12836 yes, let's do it in a separate PR.

[GitHub] spark pull request #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on ...

2016-06-12 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/12836#discussion_r66721354 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala --- @@ -381,6 +385,50 @@ class RelationalGroupedDataset protected[sql

[GitHub] spark pull request #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on ...

2016-06-12 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/12836#discussion_r66721347 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala --- @@ -381,6 +385,50 @@ class RelationalGroupedDataset protected[sql

[GitHub] spark pull request #13394: [SPARK-15490][R][DOC] SparkR 2.0 QA: New R APIs a...

2016-06-07 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/13394#discussion_r66019351 --- Diff: R/pkg/R/functions.R --- @@ -249,6 +249,10 @@ col <- function(x) { #' #' Returns a Column based on the given column n

[GitHub] spark issue #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on DataFra...

2016-06-07 Thread sun-rui
Github user sun-rui commented on the issue: https://github.com/apache/spark/pull/12836 I guess the byte array of the serialized R function is dumped. Let me find which commit caused this. I guess something like overriding toString may solve this

[GitHub] spark issue #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on DataFra...

2016-06-02 Thread sun-rui
Github user sun-rui commented on the issue: https://github.com/apache/spark/pull/12836 @NarineK, thanks for the hard work. Left some comments for you. @shivaram, do we still have a time window for this to be in 2.0?

[GitHub] spark pull request #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on ...

2016-06-02 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/12836#discussion_r65563636 --- Diff: R/pkg/inst/worker/worker.R --- @@ -84,68 +136,51 @@ broadcastElap <- elapsedSecs() # as number of partitions to create. numPartiti

[GitHub] spark pull request #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on ...

2016-06-02 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/12836#discussion_r65563441 --- Diff: R/pkg/inst/worker/worker.R --- @@ -27,6 +27,58 @@ elapsedSecs <- function() { proc.time()[3] } +computeHelper <- fu

[GitHub] spark pull request #12836: [SPARK-12922][SparkR][WIP] Implement gapply() on ...

2016-06-02 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/12836#discussion_r65563298 --- Diff: R/pkg/R/DataFrame.R --- @@ -1266,6 +1266,83 @@ setMethod("dapplyCollect", ldf })
