Github user MLnick commented on the pull request:
https://github.com/apache/spark/pull/13176#issuecomment-222454756
LGTM - my internet connection is a bit patchy as I'm traveling. Will merge
later today.
---
If your project is set up for it, you can reply to this email and have your
Github user MLnick commented on a diff in the pull request:
https://github.com/apache/spark/pull/13176#discussion_r65036342
--- Diff: docs/ml-features.md ---
@@ -145,9 +148,11 @@ for more details on the API.
passed to other algorithms like LDA.
During the fitting
Github user GayathriMurali commented on the pull request:
https://github.com/apache/spark/pull/13176#issuecomment-222409058
@MLnick Please let me know if there is anything else that I can help with
this PR
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/13176#issuecomment-221966410
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/13176#issuecomment-221966408
Merged build finished. Test PASSed.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/13176#issuecomment-221966136
**[Test build #59399 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59399/consoleFull)**
for PR 13176 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/13176#issuecomment-221962590
**[Test build #59399 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59399/consoleFull)**
for PR 13176 at commit
Github user GayathriMurali commented on a diff in the pull request:
https://github.com/apache/spark/pull/13176#discussion_r64799207
--- Diff: docs/ml-features.md ---
@@ -145,9 +148,11 @@ for more details on the API.
passed to other algorithms like LDA.
During the
Github user MLnick commented on a diff in the pull request:
https://github.com/apache/spark/pull/13176#discussion_r64698276
--- Diff: docs/ml-features.md ---
@@ -145,9 +148,11 @@ for more details on the API.
passed to other algorithms like LDA.
During the fitting
Github user MLnick commented on a diff in the pull request:
https://github.com/apache/spark/pull/13176#discussion_r64698136
--- Diff: docs/ml-features.md ---
@@ -53,7 +53,10 @@ collisions, where different raw features may become the
same term after hashing.
chance of
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/13176#issuecomment-221761348
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/13176#issuecomment-221761350
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/13176#issuecomment-221761281
**[Test build #59328 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59328/consoleFull)**
for PR 13176 at commit
Github user GayathriMurali commented on a diff in the pull request:
https://github.com/apache/spark/pull/13176#discussion_r64683245
--- Diff: docs/ml-features.md ---
@@ -53,7 +53,10 @@ collisions, where different raw features may become the
same term after hashing.
chance of
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/13176#issuecomment-221760381
**[Test build #59328 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59328/consoleFull)**
for PR 13176 at commit
Github user yanboliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/13176#discussion_r64535542
--- Diff: docs/ml-features.md ---
@@ -53,7 +53,10 @@ collisions, where different raw features may become the
same term after hashing.
chance of
Github user MLnick commented on a diff in the pull request:
https://github.com/apache/spark/pull/13176#discussion_r64523585
--- Diff: docs/ml-features.md ---
@@ -151,7 +151,7 @@ for more details on the API.
term frequency across the corpus. An optional parameter `minDF` also
Github user MLnick commented on a diff in the pull request:
https://github.com/apache/spark/pull/13176#discussion_r64523407
--- Diff: docs/ml-features.md ---
@@ -1100,7 +1100,7 @@ for more details on the API.
categorical features. The number of bins is set by the `numBuckets`
Github user MLnick commented on a diff in the pull request:
https://github.com/apache/spark/pull/13176#discussion_r64523300
--- Diff: docs/ml-features.md ---
@@ -1098,9 +1098,9 @@ for more details on the API.
`QuantileDiscretizer` takes a column with continuous features
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/13176#issuecomment-221410132
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/13176#issuecomment-221410130
Merged build finished. Test PASSed.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/13176#issuecomment-221409973
**[Test build #59222 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59222/consoleFull)**
for PR 13176 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/13176#issuecomment-221408041
**[Test build #59222 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59222/consoleFull)**
for PR 13176 at commit
Github user GayathriMurali commented on a diff in the pull request:
https://github.com/apache/spark/pull/13176#discussion_r64476690
--- Diff: docs/ml-features.md ---
@@ -1098,9 +1098,9 @@ for more details on the API.
`QuantileDiscretizer` takes a column with continuous
Github user MLnick commented on the pull request:
https://github.com/apache/spark/pull/13176#issuecomment-221222998
@GayathriMurali thanks. Made another small comment to make the description
of the binary parameter consistent. Also please check the `QuantileDiscretizer`
example in
Github user MLnick commented on a diff in the pull request:
https://github.com/apache/spark/pull/13176#discussion_r64362121
--- Diff: docs/ml-features.md ---
@@ -151,7 +151,7 @@ for more details on the API.
term frequency across the corpus. An optional parameter `minDF` also
Github user MLnick commented on a diff in the pull request:
https://github.com/apache/spark/pull/13176#discussion_r64358698
--- Diff: docs/ml-features.md ---
@@ -1098,9 +1098,9 @@ for more details on the API.
`QuantileDiscretizer` takes a column with continuous features
Github user MLnick commented on a diff in the pull request:
https://github.com/apache/spark/pull/13176#discussion_r64355101
--- Diff: docs/ml-features.md ---
@@ -1098,9 +1098,9 @@ for more details on the API.
`QuantileDiscretizer` takes a column with continuous features
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/13176#issuecomment-221145417
**[Test build #59171 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59171/consoleFull)**
for PR 13176 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/13176#issuecomment-221145486
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/13176#issuecomment-221145485
Merged build finished. Test PASSed.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/13176#issuecomment-221144374
**[Test build #59171 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59171/consoleFull)**
for PR 13176 at commit
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/13176#discussion_r64299667
--- Diff: docs/ml-features.md ---
@@ -145,9 +148,11 @@ for more details on the API.
passed to other algorithms like LDA.
During the
Github user MLnick commented on a diff in the pull request:
https://github.com/apache/spark/pull/13176#discussion_r64273246
--- Diff: docs/ml-features.md ---
@@ -1092,14 +1097,11 @@ for more details on the API.
## QuantileDiscretizer
`QuantileDiscretizer` takes a
Github user MLnick commented on a diff in the pull request:
https://github.com/apache/spark/pull/13176#discussion_r64273057
--- Diff: docs/ml-features.md ---
@@ -1092,14 +1097,11 @@ for more details on the API.
## QuantileDiscretizer
`QuantileDiscretizer` takes a
Github user GayathriMurali commented on the pull request:
https://github.com/apache/spark/pull/13176#issuecomment-221020100
@MLnick I fixed all review comments. Can you please let me know if there is
anything else to be done to help get this merged?
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/13176#issuecomment-220727902
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/13176#issuecomment-220727900
Merged build finished. Test PASSed.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/13176#issuecomment-220727811
**[Test build #59036 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59036/consoleFull)**
for PR 13176 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/13176#issuecomment-220726383
**[Test build #59036 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59036/consoleFull)**
for PR 13176 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/13176#issuecomment-220725160
Merged build finished. Test PASSed.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/13176#issuecomment-220725078
**[Test build #59032 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59032/consoleFull)**
for PR 13176 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/13176#issuecomment-220725163
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/13176#discussion_r64111073
--- Diff: docs/ml-features.md ---
@@ -145,9 +148,11 @@ for more details on the API.
passed to other algorithms like LDA.
During the
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/13176#discussion_r64110989
--- Diff: docs/ml-features.md ---
@@ -53,7 +53,10 @@ collisions, where different raw features may become the
same term after hashing.
chance of
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/13176#issuecomment-220723607
**[Test build #59032 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59032/consoleFull)**
for PR 13176 at commit
Github user GayathriMurali commented on the pull request:
https://github.com/apache/spark/pull/13176#issuecomment-220723548
@MLnick The latest commit includes just the ml-features.md changes. I
removed all the other example files and feature.py.
Github user GayathriMurali commented on a diff in the pull request:
https://github.com/apache/spark/pull/13176#discussion_r64101972
--- Diff: docs/ml-features.md ---
@@ -1093,13 +,10 @@ for more details on the API.
`QuantileDiscretizer` takes a column with continuous
Github user MLnick commented on a diff in the pull request:
https://github.com/apache/spark/pull/13176#discussion_r64100861
--- Diff: docs/ml-features.md ---
@@ -1093,13 +,10 @@ for more details on the API.
`QuantileDiscretizer` takes a column with continuous
Github user MLnick commented on a diff in the pull request:
https://github.com/apache/spark/pull/13176#discussion_r64100666
--- Diff: docs/ml-features.md ---
@@ -1093,13 +,10 @@ for more details on the API.
`QuantileDiscretizer` takes a column with continuous
Github user MLnick commented on the pull request:
https://github.com/apache/spark/pull/13176#issuecomment-220704753
@oliverpierson @GayathriMurali I opened #13228 for the `relativeError`
param as well as cleaned up doc for `QuantileDiscretizer`
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/13176#issuecomment-220703852
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/13176#issuecomment-220703848
Merged build finished. Test PASSed.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/13176#issuecomment-220703632
**[Test build #59014 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59014/consoleFull)**
for PR 13176 at commit
Github user GayathriMurali commented on the pull request:
https://github.com/apache/spark/pull/13176#issuecomment-220698824
Something messed up the `git push`. I will send another commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/13176#issuecomment-220692810
**[Test build #59014 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59014/consoleFull)**
for PR 13176 at commit
Github user GayathriMurali commented on a diff in the pull request:
https://github.com/apache/spark/pull/13176#discussion_r64087912
--- Diff: docs/ml-features.md ---
@@ -26,7 +26,9 @@ This section covers algorithms for working with features,
roughly divided into t
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/13176#discussion_r64080140
--- Diff: docs/ml-features.md ---
@@ -114,7 +116,10 @@ for more details on the API.
During the fitting process, `CountVectorizer` will select the
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/13176#discussion_r64079535
--- Diff: docs/ml-features.md ---
@@ -1064,7 +1069,8 @@ categorical features.
The bin ranges are chosen by taking a sample of the data and dividing it
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/13176#discussion_r64079652
--- Diff: docs/ml-features.md ---
@@ -26,7 +26,9 @@ This section covers algorithms for working with features,
roughly divided into t
Github user thunterdb commented on a diff in the pull request:
https://github.com/apache/spark/pull/13176#discussion_r64079509
--- Diff: docs/ml-features.md ---
@@ -1064,7 +1069,8 @@ categorical features.
The bin ranges are chosen by taking a sample of the data and dividing it
Github user GayathriMurali commented on a diff in the pull request:
https://github.com/apache/spark/pull/13176#discussion_r64079147
--- Diff: docs/ml-features.md ---
@@ -114,7 +116,10 @@ for more details on the API.
During the fitting process, `CountVectorizer` will select
Github user GayathriMurali commented on a diff in the pull request:
https://github.com/apache/spark/pull/13176#discussion_r64078981
--- Diff: docs/ml-features.md ---
@@ -26,7 +26,9 @@ This section covers algorithms for working with features,
roughly divided into t
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/13176#discussion_r64078816
--- Diff:
examples/src/main/java/org/apache/spark/examples/ml/JavaCountVectorizerExample.java
---
@@ -54,6 +54,7 @@ public static void main(String[]
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/13176#discussion_r64077742
--- Diff: docs/ml-features.md ---
@@ -114,7 +116,10 @@ for more details on the API.
During the fitting process, `CountVectorizer` will select the
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/13176#discussion_r64077625
--- Diff: docs/ml-features.md ---
@@ -26,7 +26,9 @@ This section covers algorithms for working with features,
roughly divided into t
Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/13176#discussion_r64077367
--- Diff: docs/ml-features.md ---
@@ -26,7 +26,9 @@ This section covers algorithms for working with features,
roughly divided into t
Github user MLnick commented on a diff in the pull request:
https://github.com/apache/spark/pull/13176#discussion_r64076959
--- Diff:
examples/src/main/java/org/apache/spark/examples/ml/JavaCountVectorizerExample.java
---
@@ -54,6 +54,7 @@ public static void main(String[] args) {
Github user GayathriMurali commented on a diff in the pull request:
https://github.com/apache/spark/pull/13176#discussion_r64075252
--- Diff: docs/ml-features.md ---
@@ -1064,7 +1069,8 @@ categorical features.
The bin ranges are chosen by taking a sample of the data and
Github user GayathriMurali commented on a diff in the pull request:
https://github.com/apache/spark/pull/13176#discussion_r64073253
--- Diff:
examples/src/main/java/org/apache/spark/examples/ml/JavaCountVectorizerExample.java
---
@@ -54,6 +54,7 @@ public static void main(String[]
Github user oliverpierson commented on a diff in the pull request:
https://github.com/apache/spark/pull/13176#discussion_r64053917
--- Diff: docs/ml-features.md ---
@@ -1064,7 +1069,8 @@ categorical features.
The bin ranges are chosen by taking a sample of the data and
Github user MLnick commented on a diff in the pull request:
https://github.com/apache/spark/pull/13176#discussion_r64051081
--- Diff: docs/ml-features.md ---
@@ -1064,7 +1069,8 @@ categorical features.
The bin ranges are chosen by taking a sample of the data and dividing it
Github user oliverpierson commented on a diff in the pull request:
https://github.com/apache/spark/pull/13176#discussion_r64050222
--- Diff: docs/ml-features.md ---
@@ -1064,7 +1069,8 @@ categorical features.
The bin ranges are chosen by taking a sample of the data and
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/13176#issuecomment-220566097
Build finished. Test PASSed.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/13176#issuecomment-220566098
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/13176#issuecomment-220565892
**[Test build #58967 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58967/consoleFull)**
for PR 13176 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/13176#issuecomment-220538455
**[Test build #58967 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58967/consoleFull)**
for PR 13176 at commit
Github user MLnick commented on the pull request:
https://github.com/apache/spark/pull/13176#issuecomment-220537993
ok to test
Github user MLnick commented on a diff in the pull request:
https://github.com/apache/spark/pull/13176#discussion_r64001816
--- Diff: docs/ml-features.md ---
@@ -114,7 +116,10 @@ for more details on the API.
During the fitting process, `CountVectorizer` will select the top
Github user MLnick commented on a diff in the pull request:
https://github.com/apache/spark/pull/13176#discussion_r64001752
--- Diff: docs/ml-features.md ---
@@ -114,7 +116,10 @@ for more details on the API.
During the fitting process, `CountVectorizer` will select the top
Github user MLnick commented on a diff in the pull request:
https://github.com/apache/spark/pull/13176#discussion_r64001546
--- Diff: docs/ml-features.md ---
@@ -1064,7 +1069,8 @@ categorical features.
The bin ranges are chosen by taking a sample of the data and dividing it
Github user MLnick commented on a diff in the pull request:
https://github.com/apache/spark/pull/13176#discussion_r64001345
--- Diff: docs/ml-features.md ---
@@ -1064,7 +1069,8 @@ categorical features.
The bin ranges are chosen by taking a sample of the data and dividing it
Github user MLnick commented on a diff in the pull request:
https://github.com/apache/spark/pull/13176#discussion_r64001064
--- Diff:
examples/src/main/java/org/apache/spark/examples/ml/JavaQuantileDiscretizerExample.java
---
@@ -58,7 +58,8 @@ public static void main(String[]
Github user MLnick commented on a diff in the pull request:
https://github.com/apache/spark/pull/13176#discussion_r64001008
--- Diff:
examples/src/main/java/org/apache/spark/examples/ml/JavaCountVectorizerExample.java
---
@@ -54,6 +54,7 @@ public static void main(String[] args) {
Github user MLnick commented on a diff in the pull request:
https://github.com/apache/spark/pull/13176#discussion_r64000156
--- Diff: docs/ml-features.md ---
@@ -1064,7 +1069,8 @@ categorical features.
The bin ranges are chosen by taking a sample of the data and dividing it
Github user GayathriMurali commented on the pull request:
https://github.com/apache/spark/pull/13176#issuecomment-220513197
@hhbyyh Can you please help review this? I will resolve the branch conflict
along with review comments
Github user HyukjinKwon commented on the pull request:
https://github.com/apache/spark/pull/13176#issuecomment-220202679
(@GayathriMurali It seems the title is incomplete ending with ... Maybe it
would be nicer if the title is complete and rebased for the conflict)
GitHub user GayathriMurali opened a pull request:
https://github.com/apache/spark/pull/13176
[SPARK-15100][DOC] Modified user guide and examples for CountVectoriz…
## What changes were proposed in this pull request?
This is partial document changes to ml.feature. Made
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/13176#issuecomment-220125864
Can one of the admins verify this patch?