I'm trying to understand the basics of Spark internals. The Spark
documentation for submitting applications in local mode says for the
spark-submit --master setting:
local[K] Run Spark locally with K worker threads (ideally, set this to the
number of cores on your machine).
local[*] Run Spark locally with as many worker threads as logical cores on
your machine.
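The master URL can also be set programmatically instead of on the spark-submit command line. A minimal sketch, assuming a plain local Spark installation (the app name and the thread count are arbitrary examples):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LocalModeExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("LocalModeExample")
      .setMaster("local[4]") // 4 worker threads; "local[*]" uses all logical cores
    val sc = new SparkContext(conf)
    // A trivial job to confirm the local executor threads are working.
    val sum = sc.parallelize(1 to 100).reduce(_ + _)
    println(s"sum = $sum")
    sc.stop()
  }
}
```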
The question is about ways to create a Windows desktop-based and/or
web-based client application that is able to connect and talk at run-time to
a server running a Spark application (either local or on-premise cloud
distributions).
Any language/architecture may work. So far, I've seen
Hi there,
Is there a way to specify an AWS AMI with a particular OS (say, Ubuntu) when
launching Spark on the Amazon cloud with the provided scripts?
What is the default AMI and operating system launched by the EC2 script?
Thanks
Hi there,
I'm trying out Spark Job Server (REST) to submit jobs to a Spark cluster. I
believe that my problem is unrelated to this specific software, but is
rather a generic issue with missing jars on paths. Every application
implements the trait with the SparkJob class:
object LongPiJob extends
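For context, a sketch of what such a job looks like, assuming the spark.jobserver API of that era (the SparkJob trait with validate/runJob; the Pi estimation body here is just illustrative):

```scala
import com.typesafe.config.Config
import org.apache.spark.SparkContext
import spark.jobserver.{SparkJob, SparkJobValid, SparkJobValidation}

object LongPiJob extends SparkJob {
  // Called by the job server before runJob to sanity-check the input config.
  override def validate(sc: SparkContext, config: Config): SparkJobValidation =
    SparkJobValid

  // Monte Carlo estimate of Pi: fraction of random points inside the unit circle.
  override def runJob(sc: SparkContext, config: Config): Any = {
    val n = 1000000
    val inside = sc.parallelize(1 to n).map { _ =>
      val (x, y) = (math.random, math.random)
      if (x * x + y * y < 1) 1 else 0
    }.reduce(_ + _)
    4.0 * inside / n
  }
}
```

If jobs like this fail with ClassNotFoundException at run-time, the usual cause is that the application jar (or a dependency) was not uploaded to the job server or is missing from the executor classpath.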
I've set up the EC2 cluster with Spark. Everything works, all master/slaves
are up and running.
I'm trying to submit a sample job (SparkPi). When I ssh to the cluster and
submit it from there, everything works fine. However, when the driver is
created on a remote host (my laptop), it doesn't work. I've
I'm trying to launch Spark cluster on AWS EC2 with custom AMI (Ubuntu) using
the following:
./ec2/spark-ec2 --key-pair=*** --identity-file='/home/***.pem'
--region=us-west-2 --zone=us-west-2b --spark-version=1.2.1 --slaves=2
--instance-type=t2.micro --ami=ami-29ebb519 --user=ubuntu launch
Assuming there is a text file with an unknown number of columns, how would
one create a DataFrame? I have followed the example in the Spark docs where
one first creates an RDD of Rows, but it seems that you have to know the
exact number of columns in the file and can't just do this:
val rowRDD =
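One workaround sketch, assuming a comma-delimited file with string-typed columns: derive the column count from the first line and build the schema programmatically, so nothing about the width is hard-coded. (The path and delimiter are placeholders.)

```scala
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val sqlContext = new SQLContext(sc)
val lines = sc.textFile("path/file")

// Infer the column count from the first line and name columns col1..colN.
val numCols = lines.first().split(",").length
val schema = StructType((1 to numCols).map(i => StructField(s"col$i", StringType)))

// -1 keeps trailing empty fields so every Row has the same arity as the schema.
val rowRDD = lines.map(line => Row.fromSeq(line.split(",", -1).toSeq))
val df = sqlContext.createDataFrame(rowRDD, schema)
```

This assumes every line has the same number of fields as the first; ragged files would need padding or filtering before createDataFrame.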
What would be the neatest, most efficient method to add a column with row
IDs to a DataFrame?
I can think of something like the below, but it completes with errors (at
line 3), and anyway doesn't look like the best route possible:
var dataDF = sc.textFile(path/file).toDF()
val rowDF = sc.parallelize(1 to
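Rather than parallelizing a separate range and trying to glue it on, one workaround sketch is to zip the DataFrame's underlying RDD with indices and rebuild it with an extra field appended to the schema (this assumes dataDF and sqlContext are already defined as above):

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{LongType, StructField, StructType}

// Pair each Row with its 0-based index, then append the index as a new field.
val withIdRDD = dataDF.rdd.zipWithIndex.map { case (row, id) =>
  Row.fromSeq(row.toSeq :+ id)
}
val withIdSchema = StructType(dataDF.schema.fields :+ StructField("row_id", LongType))
val dataWithId = sqlContext.createDataFrame(withIdRDD, withIdSchema)
```

zipWithIndex guarantees consecutive IDs but triggers an extra pass over the data; that is the price of stable, gap-free row numbers on a distributed dataset.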
A more generic version of the question below:
Is it possible to append a column to an existing DataFrame at all? I
understand that this is not an easy task in the Spark environment, but is
there any workaround?
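For columns that can be computed from existing ones (or from a constant), withColumn returns a new DataFrame with the column appended. A sketch, where "col1" is a hypothetical numeric column:

```scala
import org.apache.spark.sql.functions.lit

// Append a constant column.
val withConst = dataDF.withColumn("flag", lit(1))

// Append a column derived from an existing one (assumes a numeric "col1").
val withDerived = dataDF.withColumn("doubled", dataDF("col1") * 2)
```

Appending values that come from a *different* dataset is the hard case: DataFrames are immutable and partitioned, so that generally requires a join on a shared key, or the zipWithIndex rebuild approach for positional alignment.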