e setosa, virginica would be created with 0 and 1 as values
On Mon, Jan 25, 2016 at 12:37 PM, Deborah Siegel <deborah.sie...@gmail.com>
wrote:
> Maybe not ideal, but since read.df infers any csv column containing "NA"
> as string type, one could filter
you are right.
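To make that filtering workaround concrete, a minimal sketch (assuming the
spark-csv package and a hypothetical column Sepal_Length that was inferred
as string because of literal "NA" entries):

    df <- read.df(sqlContext, "iris_with_na.csv",
                  source = "com.databricks.spark.csv", header = "true")
    # drop rows holding the literal string "NA", then cast to the intended type
    cleaned <- filter(df, df$Sepal_Length != "NA")
    cleaned$Sepal_Length <- cast(cleaned$Sepal_Length, "double")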
>
> I think the problem is with the reading of csv files. read.df is not
> recognizing NAs in the CSV file
>
> So what would be a workable solution in dealing with NAs in csv files?
>
>
>
> On Mon, Jan 25, 2016 at 2:31 PM, Deborah Siegel <deborah.sie...@gmail
Hi,
Can PCA be implemented in a SparkR-MLlib integration?
There are perhaps two separate issues:
1) Having the methods in SparkRWrapper and RFormula which will send the
right input types through the pipeline.
MLlib PCA operates on either a RowMatrix or the feature vector of an
RDD[LabeledPoint]. The
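Until such an integration exists, one non-scalable workaround is to collect
a small SparkR DataFrame to the driver and run base R's prcomp on it; a
sketch, where df and the column names are hypothetical:

    localDF <- collect(df)              # pulls all rows to the driver; small data only
    pca <- prcomp(localDF[, c("Sepal_Length", "Sepal_Width")],
                  center = TRUE, scale. = TRUE)
    pca$rotation                        # the principal component loadings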
exists? The
error message seems to indicate it is trying to pick up Spark from
that location and can't find Spark installed there.
Thanks
Shivaram
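A quick way to sanity-check that from the R session (plain base R; the path
is whatever you set):

    Sys.getenv("SPARK_HOME")                                 # what SparkR will try to use
    file.exists(file.path(Sys.getenv("SPARK_HOME"), "bin"))  # is a Spark install actually there?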
On Thu, Aug 20, 2015 at 3:30 PM, Deborah Siegel
deborah.sie...@gmail.com wrote:
Hello,
I have previously successfully run SparkR in RStudio, with:
Sys.setenv(SPARK_HOME = "~/software/spark-1.4.1-bin-hadoop2.4")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)
sc <- sparkR.init(master = "local[2]", appName = "SparkR-example")
Then I tried putting some
I think I just answered my own question. The privatization of the RDD API
might have resulted in my error, because this worked:
randomMatBr <- SparkR:::broadcast(sc, randomMat)
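For reference, the full round trip in SparkR's own broadcast tests looks
roughly like this (all private API in 1.4.x, hence the ::: accessors, and
subject to change between releases):

    rdd <- SparkR:::parallelize(sc, 1:2, 2L)
    randomMat <- matrix(nrow = 10, ncol = 10, data = rnorm(100))
    randomMatBr <- SparkR:::broadcast(sc, randomMat)
    # read the broadcast value inside the function shipped to the workers
    useBroadcast <- function(x) { sum(SparkR:::value(randomMatBr) * x) }
    actual <- SparkR:::collect(SparkR:::lapply(rdd, useBroadcast))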
On Mon, Aug 3, 2015 at 4:59 PM, Deborah Siegel deborah.sie...@gmail.com
wrote:
Hello,
In looking at the SparkR codebase, it seems as if broadcast variables ought
to be working based on the tests.
I have tried the following in the sparkR shell, and similar code in RStudio,
but in both cases got the same message
randomMat <- matrix(nrow = 10, ncol = 10, data = rnorm(100))
Hi,
I selected a starter task in JIRA, and made changes to my github fork of
the current code.
I assumed I would be able to build and test.
% mvn clean compile was fine
but
% mvn package failed
[ERROR] Failed to execute goal
org.apache.maven.plugins:maven-surefire-plugin:2.18:test (default-test)
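If the immediate goal is just to get a package built while iterating on a
starter task, the Spark build documentation suggests skipping tests during
packaging and running the affected suite separately:
% mvn clean package -DskipTests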
Hello,
I'm new to EC2. I've set up a Spark cluster on EC2 and am using
persistent-hdfs with the data nodes mounting EBS. I launched my cluster
using spot instances:
./spark-ec2 -k mykeypair -i ~/aws/mykeypair.pem -t m3.xlarge -s 4 -z
us-east-1c --spark-version=1.2.0 --spot-price=.0321
Harika,
I think you can modify an existing Spark on EC2 cluster to run YARN
MapReduce; not sure if this is what you are looking for.
To try:
1) log on to the master
2) go into either ephemeral-hdfs/conf/ or persistent-hdfs/conf/
and add this to mapred-site.xml:
property
Hi,
Someone else will have a better answer. I think that for standalone mode,
executors will grab whatever cores they can based on either configurations
on the worker or application-specific configurations. Could be wrong, but
I believe Mesos is similar to this, and that YARN is alone in the
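As one concrete application-side knob: in standalone mode a job can cap the
cores it grabs with spark.cores.max. From SparkR that could be passed at
init time (a sketch against the 1.x sparkR.init API; the master URL is a
placeholder):

    sc <- sparkR.init(master = "spark://master:7077",
                      appName = "core-capped-app",
                      sparkEnvir = list(spark.cores.max = "4"))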
Hello,
I am running through examples given on
http://spark.apache.org/docs/1.2.1/graphx-programming-guide.html
The section "Map Reduce Triplets Transition Guide (Legacy)" indicates
that one can run the following aggregateMessages code:
val graph: Graph[Int, Float] = ...
def msgFun(triplet:
Hi Michael,
Would you help me understand the apparent difference here?
The Spark 1.2.1 programming guide indicates:
Note that if you call schemaRDD.cache() rather than
sqlContext.cacheTable(...), tables will *not* be cached using the in-memory
columnar format, and therefore
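For what it's worth, the SparkR API that later shipped (1.4+) exposes both
entry points as well; a sketch, assuming a DataFrame df registered as a temp
table named "people":

    registerTempTable(df, "people")
    cacheTable(sqlContext, "people")   # cache via the catalog
    cache(df)                          # cache via the DataFrame handle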
Hi Abe,
I'm new to Spark as well, so someone else could answer better. A few
thoughts which may or may not be the right line of thinking..
1) Spark properties can be set on the SparkConf and with flags in
spark-submit, but settings on SparkConf take precedence. I think your --jars
flag for