Yes, did you see the PR for SPARK-2808?
https://github.com/apache/spark/pull/3631/files
It requires more than just changing the version.
On Tue, Feb 10, 2015 at 3:11 PM, Ted Yu yuzhih...@gmail.com wrote:
Compiling Spark master branch against Kafka 0.8.2, I got:
[WARNING]
Thanks, Akhil. I had high hopes for #2, but tried all and no luck.
I was looking at the source and found something interesting. The Stack
Trace (below) directs me to FileInputDStream.scala (line 141). This is
version 1.1.1, btw. Line 141 has:
private def fs: FileSystem = {
if (fs_ ==
Compiling Spark master branch against Kafka 0.8.2, I got:
[WARNING]
/home/hbase/spark/external/kafka/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala:62:
no valid targets for annotation on value ssc_ - it is discarded unused.
You may specify targets with
That PR hasn't been updated since the new Kafka streaming stuff (including
KafkaCluster) got merged to master; it will require more changes than
what's in there currently.
On Tue, Feb 10, 2015 at 9:25 AM, Sean Owen so...@cloudera.com wrote:
Yes, did you see the PR for SPARK-2808?
Try the following:
1. Set the access key and secret key in the sparkContext:
ssc.sparkContext.hadoopConfiguration.set("AWS_ACCESS_KEY_ID", yourAccessKey)
ssc.sparkContext.hadoopConfiguration.set("AWS_SECRET_ACCESS_KEY", yourSecretKey)
2. Set the access key and secret key in the environment before
I'm using ElasticSearch with elasticsearch-spark-BUILD-SNAPSHOT and
Spark/SparkSQL 1.2.0, from Costin Leau's advice.
I want to query ElasticSearch for a bunch of JSON documents from within
SparkSQL, and then use a SQL query to simply query for a column, which is
actually a JSON key -- normal
Hi,
I'm new to Spark. I have built a small Spark-on-YARN cluster, which contains
1 master (20GB RAM, 8 cores) and 3 workers (4GB RAM, 4 cores each). When trying to run the
command sc.parallelize(1 to 1000).count() through
$SPARK_HOME/bin/spark-shell, sometimes the command can submit a job
successfully, sometimes
[Additional info] I was using the master branch of 9 Feb 2015, the latest
commit in git info is:
commit 0793ee1b4dea1f4b0df749e8ad7c1ab70b512faf
Author: Sandy Ryza sa...@cloudera.com
Date: Mon Feb 9 10:12:12 2015 +
SPARK-2149. [MLLIB] Univariate kernel density estimation
Author:
Hi Silvio,
So the Initial SQL is executing now. I did not have the *; adding it made
it work fine. FWIW the * is not needed for the parquet files:
create temporary table test
using org.apache.spark.sql.json
options (path '/data/out/*')
;
cache table test;
select count(1) from test;
I'm running a small job on a cluster with 15G of mem and 8G of disk per
machine.
The job always gets into a deadlock where the last error message is:
java.io.IOException: No space left on device
at java.io.FileOutputStream.writeBytes(Native Method)
at
Hi,
Now I need to upgrade my Spark cluster from version 1.1.0 to 1.2.1. Is there
a convenient way to do this, something like ./start-dfs.sh -upgrade in Hadoop?
Best Wishes
THX
--
qiaou
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
It looks like this is related to the underlying Hadoop configuration.
Try deploying the Hadoop configuration with your job via --files and
--driver-class-path, or add it to the default /etc/hadoop/conf core-site.xml.
If that is not an option (depending on how your Hadoop cluster is set up), then
hard
org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdConfOnlyAuthorizerFactory
was introduced in Hive 0.14 and Spark SQL only supports Hive 0.12 and
0.13.1. Can you change the setting of hive.security.authorization.manager
to something accepted by 0.12 or 0.13.1?
On Thu, Feb 5, 2015
Hi Zsolt,
spark.executor.memory, spark.executor.cores, and spark.executor.instances
are only honored when launching through spark-submit. Marcelo is working
on a Spark launcher (SPARK-4924) that will enable using these
programmatically.
That's correct that the error comes up when
Is the SparkContext you're using the same one that the StreamingContext
wraps? If not, I don't think using two is supported.
-Sandy
On Tue, Feb 10, 2015 at 9:58 AM, Jon Gregg jonrgr...@gmail.com wrote:
I'm still getting an error. Here's my code, which works successfully when
tested using
First off, I'd recommend using the latest es-hadoop beta (2.1.0.Beta3) or, even
better, the dev build [1].
Second, use the native Java/Scala API [2], since the configuration is easier and
the performance better.
Third, when you are using JSON input, tell es-hadoop/Spark that. The connector
can work
Hi Costin, I upgraded the es-hadoop connector, and at this point I can't
use Scala, but I'm still getting the same error.
On Tue, Feb 10, 2015 at 10:34 PM, Costin Leau costin.l...@gmail.com wrote:
Hi shahid,
I've sent the reply to the group - for some reason I replied to your
address instead of the
One more question: Is there a reason why Spark throws an error when
requesting too much memory instead of capping it to the maximum value (as
YARN would do by default)?
Thanks!
2015-02-10 17:32 GMT+01:00 Zsolt Tóth toth.zsolt@gmail.com:
Hi,
I'm using Spark in yarn-cluster mode and submit
I was looking at the updateStateByKey documentation.
It passes in a values Seq which contains the values that have the same key.
I would like to know if there is any ordering to these values. My feeling is
that there is no ordering, but maybe it does preserve RDD ordering.
Example: RDD[ (a,2), (a,3),
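For reference, a minimal sketch of the call in question (the pairs DStream of
(String, Int) and the checkpointed StreamingContext are assumptions; whether
the Seq carries any ordering is exactly the open question above):
// Assuming an existing DStream[(String, Int)] named pairs and
// ssc.checkpoint(...) already set; the Seq[Int] holds every new value
// seen for the key in the current batch.
val updateFunc = (values: Seq[Int], state: Option[Int]) =>
  Some(state.getOrElse(0) + values.sum)
val runningCounts = pairs.updateStateByKey[Int](updateFunc)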
I'm still getting an error. Here's my code, which works successfully when
tested using spark-shell:
val badIPs = sc.textFile("/user/sb/badfullIPs.csv").collect
val badIpSet = badIPs.toSet
val badIPsBC = sc.broadcast(badIpSet)
The job looks OK from my end:
15/02/07 18:59:58
Since the stack trace shows Kryo is being used, maybe you could also try
increasing spark.kryoserializer.buffer.max.mb. Hope this helps.
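For example (a sketch only; the 512 MB figure is an illustration, not a tuned
value):
import org.apache.spark.{SparkConf, SparkContext}

// Raise the ceiling for Kryo's serialization buffer (value in MB).
val conf = new SparkConf().set("spark.kryoserializer.buffer.max.mb", "512")
val sc = new SparkContext(conf)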
Kelvin
On Tue, Feb 10, 2015 at 1:26 AM, Akhil Das ak...@sigmoidanalytics.com
wrote:
You could try increasing the driver memory. Also, can you be more specific
Hi,
I'm using Spark in yarn-cluster mode and submit the jobs programmatically
from the client in Java. I ran into a few issues when I tried to set the
resource allocation properties.
1. It looks like setting spark.executor.memory, spark.executor.cores and
spark.executor.instances has no effect
Can you try using backticks to quote the field name? Like `f:price`.
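For instance, a sketch assuming an existing sqlContext and a hypothetical
registered table named docs:
// Backticks make the parser treat the whole string, colon included,
// as a single column identifier.
val prices = sqlContext.sql("SELECT `f:price` FROM docs")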
On Tue, Feb 10, 2015 at 5:47 AM, presence2001 neil.andra...@thefilter.com
wrote:
Hi list,
I have some data with a field name of f:price (it's actually part of a JSON
structure loaded from ElasticSearch via
Also long overdue given our usage of Spark...
Radius Intelligence:
URL: radius.com
Description:
Spark, MLlib
Using Scala, Spark and MLlib for the Radius Marketing and Sales intelligence
platform, including data aggregation, data processing, data clustering, data
analysis and predictive modeling of
Unfortunately no. I just removed the persist statements to get the job to run,
but now it sometimes fails with
Job aborted due to stage failure: Task 162 in stage 2.1 failed 4 times, most
recent failure: Lost task 162.3 in stage 2.1 (TID 1105, xxx.compute.internal):
Agree. PySpark would call spark-submit. Check out the command line there.
--- Original Message ---
From: Mohit Singh mohit1...@gmail.com
Sent: February 9, 2015 11:26 PM
To: Ashish Kumar ashish.ku...@innovaccer.com
Cc: user@spark.apache.org
Subject: Re: ImportError: No module named pyspark, when
You could try increasing the driver memory. Also, can you be more specific
about the data volume?
Thanks
Best Regards
On Mon, Feb 9, 2015 at 3:30 PM, Yifan LI iamyifa...@gmail.com wrote:
Hi,
I just found the following errors during computation (GraphX). Does anyone have
ideas on this? Thanks so
Hello,
as the original message never got accepted to the mailing list, I quote it
here completely:
Kevin Jung wrote
Hi all,
The size of the shuffle write shown in the Spark web UI is very different when I
execute the same Spark job on the same input data (100GB) in Spark 1.1 and
Spark 1.2.
At the
Hello,
as the original message from Kevin Jung never got accepted to the
mailing list, I quote it here completely:
Kevin Jung wrote
Hi all,
The size of the shuffle write shown in the Spark web UI is very different when I
execute the same Spark job on the same input data (100GB) in Spark 1.1 and
Spark
When can we expect the latest kafka and scala 2.11 support in spark
streaming?
Hi,
I have a cluster setup with three slaves, 4 cores each(12 cores in total).
When I try to run multiple applications, using 4 cores each, only the first
application is running (with 2, 1, 1 cores used on the corresponding slaves). Every
other application goes into WAITING state. Following the solution
Not quite sure, but one guess would be that you don't have sufficient
memory to hold that much data, so the process gets busy collecting garbage;
that could be the reason it works when you set
MEMORY_AND_DISK_SER_2.
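For reference, a sketch of setting that storage level on an existing RDD (the
rdd name is hypothetical):
import org.apache.spark.storage.StorageLevel

// Keep partitions serialized in memory, spill to disk when they don't fit,
// and replicate each partition on two nodes.
rdd.persist(StorageLevel.MEMORY_AND_DISK_SER_2)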
Thanks
Best Regards
On Mon, Feb 9, 2015 at 8:38 PM, Jong Wook
I have a RowMatrix x and I would like to apply a function to each element of
x.
I was thinking of something like x.map(u => exp(-u*u)). How can I do
something like that?
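One possible approach, sketched under the assumption that x is an MLlib
RowMatrix: map over its rows RDD and rebuild the matrix from the transformed
vectors.
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.RowMatrix

// Apply u => exp(-u*u) to every element of every row vector.
val mapped = new RowMatrix(
  x.rows.map(v => Vectors.dense(v.toArray.map(u => math.exp(-u * u))))
)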
Hi Akhil,
Excuse me, I am trying a random-walk algorithm over a not-that-large graph (~1GB
raw dataset, including ~5 million vertices and ~60 million edges) on a cluster
of 20 machines.
The property of each vertex in the graph is a hash map, whose size will
increase dramatically
You can look at http://spark.apache.org/docs/1.2.0/job-scheduling.html
I would go with Mesos:
http://spark.apache.org/docs/1.2.0/running-on-mesos.html
Thanks
Best Regards
On Tue, Feb 10, 2015 at 2:59 PM, matha.harika matha.har...@gmail.com
wrote:
Hi,
I have a cluster setup with three slaves,
Have you been able to confirm this behaviour since posting? Have you tried
this out on multiple workers and viewed their memory consumption?
I'm new to Spark and don't have a cluster to play with at present, and want
to do similar loading from NFS files.
My understanding is that calls to
Yes, I have read it, and am trying to find some way to do that… Thanks :)
Best,
Yifan LI
On 10 Feb 2015, at 12:06, Akhil Das ak...@sigmoidanalytics.com wrote:
Did you have a chance to look at this doc
http://spark.apache.org/docs/1.2.0/tuning.html
Did you have a chance to look at this doc
http://spark.apache.org/docs/1.2.0/tuning.html
Thanks
Best Regards
On Tue, Feb 10, 2015 at 4:13 PM, Yifan LI iamyifa...@gmail.com wrote:
Hi Akhil,
Excuse me, I am trying a random-walk algorithm over a not-that-large
graph (~1GB raw dataset, including
I believe Kafka 0.8.2 is imminent for master / 1.4.0 -- see SPARK-2808.
Anecdotally I have used Spark 1.2 with Kafka 0.8.2 without problem already.
On Feb 10, 2015 10:53 AM, critikaled isasmani@gmail.com wrote:
When can we expect the latest kafka and scala 2.11 support in spark
streaming?
Hi list,
I have some data with a field name of f:price (it's actually part of a JSON
structure loaded from ElasticSearch via elasticsearch-hadoop connector, but
I don't think that's significant here). I'm struggling to figure out how to
express that in a Spark SQL SELECT statement without
I am getting the following error when I kill the spark driver and restart
the job:
15/02/10 17:31:05 INFO CheckpointReader: Attempting to load checkpoint from
file
hdfs://hdfs-namenode.vagrant:9000/reporting/SampleJob$0.1.0/checkpoint-142358910.bk
15/02/10 17:31:05 WARN CheckpointReader:
What's the signature of your RDD? It looks to be an RDD of Lists, which can't be mapped
automatically to a document - you are probably thinking of a tuple or, better yet, a PairRDD.
Convert your RDD of Lists to a PairRDD and use that instead.
This is a guess - a gist with a simple test/code would make it easier
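A minimal sketch of that conversion (the field layout and id position are
hypothetical; assumes an existing SparkContext sc):
// Hypothetical RDD[List[String]] where the first element is a document id;
// keying each record turns it into a PairRDD that maps cleanly to documents.
val listsRdd = sc.parallelize(Seq(
  List("1", "foo", "9.99"),
  List("2", "bar", "4.50")
))
val pairs = listsRdd.map(fields => (fields.head, fields.tail))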
Just in case you are in San Francisco, we are having a meetup by Prof John
Canny
http://www.meetup.com/SF-Big-Analytics/events/220427049/
Chester
Please take a look at:
examples/scala-2.10/src/main/java/org/apache/spark/examples/streaming/JavaDirectKafkaWordCount.java
which was checked in yesterday.
On Sat, Feb 7, 2015 at 10:53 AM, Eduardo Costa Alfaia
e.costaalf...@unibs.it wrote:
Hi Ted,
I’ve seen the code; I am using
They're separate in my code; how can I combine them? Here's what I have:
val sparkConf = new SparkConf()
val ssc = new StreamingContext(sparkConf, Seconds(bucketSecs))
val sc = new SparkContext()
On Tue, Feb 10, 2015 at 1:02 PM, Sandy Ryza sandy.r...@cloudera.com wrote:
Is
I'm trying to use a broadcasted dictionary inside a map function and am
consistently getting Java null pointer exceptions. This is inside an IPython
session connected to a standalone spark cluster. I seem to recall being able
to do this before but at the moment I am at a loss as to what to try
You should be able to replace that second line with
val sc = ssc.sparkContext
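i.e., a sketch of the combined setup (bucketSecs as in your snippet below):
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val sparkConf = new SparkConf()
val ssc = new StreamingContext(sparkConf, Seconds(bucketSecs))
val sc = ssc.sparkContext  // reuse the context the StreamingContext wraps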
On Tue, Feb 10, 2015 at 10:04 AM, Jon Gregg jonrgr...@gmail.com wrote:
They're separate in my code, how can I combine them? Here's what I have:
val sparkConf = new SparkConf()
val ssc = new
OK, that worked and I'm getting close here ... the job ran successfully for a
bit and I got output for the first couple buckets before getting a
java.lang.Exception: Could not compute split, block input-0-1423593163000
not found error.
So I bumped up the memory at the command line from 2 GB to 5 GB,
INFO scheduler.TaskSetManager: Starting task 2.1 in stage 2.0 (TID 9,
ip-10-80-98-118.ec2.internal, PROCESS_LOCAL, 1025 bytes)
15/02/10 15:54:08 INFO scheduler.TaskSetManager: Lost task 1.0 in stage 2.0
(TID 6) on executor ip-10-80-15-145.ec2.internal:
org.apache.spark.SparkException (Data of type
I am new to Spark. I am trying to compile and run a Spark application that
requires classes from an (external) jar file on my local machine. If I open
the jar (on ~/Desktop) I can see the missing class in the local jar but when
I run spark I get
NoClassDefFoundError:
Hi,
I'm running Spark on YARN from an edge node, and the tasks run on the Data
Nodes. My job fails with a "Too many open files" error once it gets to
groupByKey(). Alternatively I can make it fail immediately if I repartition
the data when I create the RDD.
Where do I need to make sure that
Spark is a framework for doing things in parallel very easily; it
will definitely help your case.
def read_file(path):
    lines = open(path).readlines()  # bzip2
    return lines

filesRDD = sc.parallelize(path_to_files, N)
lines = filesRDD.flatMap(read_file)
Then you could do other transforms on
Hi,
I am trying a very simple registerFunction and it is giving me errors.
I have a parquet file which I register as temp table.
Then I define a UDF.
def toSeconds(timestamp: Long): Long = timestamp/10
sqlContext.registerFunction("toSeconds", toSeconds _)
val result = sqlContext.sql("select
I had a similar use case before. I found:
1. textFile() produced one partition per file. It can result in many
partitions. I found that calling coalesce() without shuffle helped (sketched
below).
2. If you use persist(), count() will do the I/O and put the result into the
cache. Transformations later did their computation out
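A sketch of points 1 and 2, with hypothetical path and partition count
(assumes an existing SparkContext sc):
// Many small files give many partitions; merge them without a shuffle,
// then persist so the count() pass leaves the data cached.
val lines = sc.textFile("/mnt/nfs/logs/*.bz2")
val merged = lines.coalesce(32, shuffle = false)
merged.persist()
merged.count()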
Hi,
I'm trying to understand how and what the Tableau connector to SparkSQL is
able to access. My understanding is that it needs to connect to the
Thrift server, and I am not sure how or if it exposes parquet, json, or
schemaRDDs, or whether it only exposes schemas defined in the metastore / hive.
For
Hi all,
I have the following use case:
One job consists of reading from 500-2000 small bzipped logs that are on an
nfs.
(Small means that the zipped logs are between 0 and 100KB; the average file size is
20KB.)
We read the log lines, do some transformations, and write them to one output
file.
When we
Hi Su,
Out of the box, no. But I know people who integrate it with Spark Streaming
to do real-time visualization. It will take some work though.
Kelvin
On Mon, Feb 9, 2015 at 5:04 PM, Su She suhsheka...@gmail.com wrote:
Hello Everyone,
I was reading this blog post:
sometimes I'm getting this exception:
Traceback (most recent call last):
  File "/opt/spark-1.2.0-bin-hadoop2.4/python/pyspark/daemon.py", line 162, in manager
    code = worker(sock)
  File "/opt/spark-1.2.0-bin-hadoop2.4/python/pyspark/daemon.py", line 64, in worker
    outfile.flush()
IOError:
Looks like the latest version 1.2.1 actually does use the configured hadoop
conf. I tested it out and that does resolve my problem.
thanks,
marc
On Tue, Feb 10, 2015 at 10:57 AM, Marc Limotte mslimo...@gmail.com wrote:
Thanks, Akhil. I had high hopes for #2, but tried all and no luck.
I
Hi,
I am getting the following error when I try to query a Hive table from
the Spark shell.
I have placed my hive-site.xml in the spark/conf directory.
Please suggest how to resolve this error.
scala> sqlContext.sql("select count(*) from
offers_new").collect().foreach(println)
15/02/11 01:48:01
You may also go through these posts
https://docs.sigmoidanalytics.com/index.php/Spark_Installation
Thanks
Best Regards
On Fri, Feb 6, 2015 at 9:39 PM, King sami kgsam...@gmail.com wrote:
Hi,
I'm new to Spark and I'd like to install Spark with Scala. The aim is to
build a data processing system
See this answer by Josh
http://stackoverflow.com/questions/26692658/cant-connect-from-application-to-the-standalone-cluster
You may also find this post useful
http://mail-archives.apache.org/mod_mbox/spark-user/201407.mbox/%3c7a889b1c-aa14-4cf2-8375-37f9cf827...@gmail.com%3E
Thanks
Best Regards
Hi,
I have built a sentiment analyzer using the Naive Bayes model. The
model works fine, learning from a list of 200 movie reviews and correctly
predicting with an accuracy of close to 77% to 80%.
After a while of predicting I get the following stacktrace...
By the way...I have only one
Hi Mike,
I developed a solution with Cassandra and Spark, using DSE.
The main difficulty is Cassandra: you need to understand its data model and
its query patterns very well.
Cassandra has better performance than HDFS, and it has DR and stronger
availability.
HDFS is a filesystem; Cassandra
Refer to this blog
http://blog.prabeeshk.com/blog/2014/10/31/install-apache-spark-on-ubuntu-14-dot-04/
for a step-by-step installation of Spark on Ubuntu.
On 7 February 2015 at 03:12, Matei Zaharia matei.zaha...@gmail.com wrote:
You don't need HDFS or virtual machines to run Spark. You can just
I think we made the binary protocol compatible across all versions, so you
should be fine with using any one of them. 1.2.1 is probably the best since
it is the most recent stable release.
On Tue, Feb 10, 2015 at 8:43 PM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
Hi,
I need to use
It's the timezone actually; you can either use NTP to maintain an accurate
system clock, or you can adjust your system time to match the AWS one. You
can do it like this:
telnet s3.amazonaws.com 80
GET / HTTP/1.0
[image: Inline image 1]
Thanks
Best Regards
On Wed, Feb 11, 2015 at 6:43 AM,
I am a little confused here: why do you want to create the tables in Hive?
You want to create the tables in Spark SQL, right?
If you are not able to find the same tables through Tableau, then Thrift is
connecting to a different metastore than your spark-shell.
One way to specify a metastore to
It seems that the HDFS path for the table doesn't contain any files/data.
Does the metastore contain the right path for the HDFS data?
You can find the HDFS path in TBLS in your metastore.
On Wed, Feb 11, 2015 at 12:20 PM, kundan kumar iitr.kun...@gmail.com
wrote:
Hi ,
I am getting the following
BTW, what Tableau connector are you using?
On Wed, Feb 11, 2015 at 12:55 PM, Arush Kharbanda
ar...@sigmoidanalytics.com wrote:
I am a little confused here: why do you want to create the tables in
Hive? You want to create the tables in Spark SQL, right?
If you are not able to find the same
Hi,
I need to use branch-1.2 and sometimes master builds of Spark for my
project. However, the only Spark version officially supported by our Hadoop
admin is 1.2.0.
So, my question is which version/build of spark-yarn-shuffle.jar should I
use that works for all four versions? (1.2.0, 1.2.1,
You should look at https://github.com/amplab/spark-indexedrdd
On Tue, Feb 10, 2015 at 2:27 PM, Debasish Das debasish.da...@gmail.com
wrote:
Hi Michael,
I want to cache an RDD and define get() and set() operators on it.
Basically like memcached. Is it possible to build a memcached-like
Hi Yitong,
It's not as simple as that.
In your very simple example, the only things referenced by the closure
are (i) the input arguments and (ii) a Scala object. So there are no
external references to serialize in that case, just the closure
instance itself - see, there is still something being
Hi Michael,
I want to cache an RDD and define get() and set() operators on it. Basically
like memcached. Is it possible to build a memcached-like distributed cache
using Spark SQL? If not, what do you suggest we should use for such
operations...
Thanks.
Deb
On Fri, Jul 18, 2014 at 1:00 PM,
2015-02-06 17:28 GMT+00:00 King sami kgsam...@gmail.com:
The purpose is to build a data processing system for door events. An event
will describe a door unlocking
with a badge system. This event will differentiate unlocking by somebody
from the inside and by somebody
from the outside.
I get this in the driver log:
java.lang.NullPointerException
at org.apache.spark.api.python.PythonRDD$.writeUTF(PythonRDD.scala:590)
at
org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(PythonRDD.scala:233)
at
No, the ZeroMQ API is not supported in Python as of now.
On 5 Feb 2015 21:27, Sasha Kacanski skacan...@gmail.com wrote:
Does PySpark support ZeroMQ?
I see that Java does, but I am not sure about Python.
regards
--
Aleksandar Kacanski
1. Can the connector fetch or query SchemaRDDs saved to Parquet or JSON
files? No.
2. Do I need to do something to expose these via hive / metastore other
than creating a table in hive? Create a table in Spark SQL to expose it via
Spark SQL.
3. Does the thriftserver need to be configured to expose
The simple SQL parser doesn't yet support UDFs. Try using a HiveContext.
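A sketch of that change applied to the toSeconds example from the original
message (the queried table and column names are hypothetical; registerFunction
is the same call the original code uses):
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)
def toSeconds(timestamp: Long): Long = timestamp / 10
hiveContext.registerFunction("toSeconds", toSeconds _)
// HiveContext's parser resolves the UDF where the simple parser errors out.
val result = hiveContext.sql("select toSeconds(ts) from logs")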
On Tue, Feb 10, 2015 at 1:44 PM, Mohnish Kodnani mohnish.kodn...@gmail.com
wrote:
Hi,
I am trying a very simple registerFunction and it is giving me errors.
I have a parquet file which I register as temp table.
Then I
Hi Todd,
What you could do is run some SparkSQL commands immediately after the Thrift
server starts up. Or does Tableau have some init SQL commands you could run?
You can actually load data using SQL, such as:
create temporary table people using org.apache.spark.sql.json options (path
The inaugural Spark Summit East, an event to bring the Apache Spark
community together, will be in New York City on March 18, 2015. We are
excited about the growth of Spark and to bring the event to the east coast.
At Spark Summit East you can look forward to hearing from Matei Zaharia,
I see that SparkContext has a hadoopConfiguration() method, which can
be used like this sample I found:
sc.hadoopConfiguration().set("fs.s3.awsAccessKeyId", "XX");
sc.hadoopConfiguration().set("fs.s3.awsSecretAccessKey", "XX");
But StreamingContext doesn't have the same thing. I want to
Todd,
I just tried it in the bin/spark-sql shell. I created a folder json and just put 2
copies of the same people.json file in it.
This is what I ran:
spark-sql> create temporary table people
using org.apache.spark.sql.json
options (path 'examples/src/main/resources/json/*')
It's brave to broadcast 8G of pickled data; it will take more than 15G of
memory for each Python worker. How much memory do you have in the executors
and the driver?
Do you see any other exceptions in the driver and executors? Something
related to serialization in the JVM.
On Tue, Feb 10, 2015 at 2:16 PM, Rok Roskar
Arush,
As for #2, do you mean something like this from the docs:
// sc is an existing SparkContext.
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
sqlContext.sql("LOAD DATA LOCAL INPATH
Hi,
I'm new to Spark. I want to install it on my local machine (Ubuntu 12.04).
Could you please help me install Spark step by step on my machine and
run some Scala programs.
Thanks
Thanks...this is what I was looking for...
It would be great if Ankur could give brief details about it... Basically, how
does it contrast with memcached, for example?
On Tue, Feb 10, 2015 at 2:32 PM, Michael Armbrust mich...@databricks.com
wrote:
You should look at
Arush,
Thank you, I will take a look at that approach in the morning. I sort of
figured the answer to #1 was NO and that I would need to do 2 and 3; thanks
for clarifying it for me.
-Todd
On Tue, Feb 10, 2015 at 5:24 PM, Arush Kharbanda ar...@sigmoidanalytics.com
wrote:
1. Can the connector
Also, I wanted to run get() and set() from mapPartitions (from the Spark workers
and not the master)...
To be able to do that I think I have to create a separate spark context for
the cache...
But I am not sure how SparkContext from job1 can access SparkContext from
job2 !
On Tue, Feb 10, 2015 at 3:25
For a local machine, I don't think there is anything to install. Just unzip it and
run $SPARK_DIR/bin/spark-shell, and that will open up a REPL...
On Tue, Feb 10, 2015 at 3:25 PM, King sami kgsam...@gmail.com wrote:
Hi,
I'm new to Spark. I want to install it on my local machine (Ubuntu 12.04).
Could
Hi Silvio,
Ah, I like that. There is a section in Tableau for Initial SQL to be
executed upon connecting; this would fit well there. I guess I will need to
issue a collect(), coalesce(1, true).saveAsTextFile(...) or use
repartition(1), as the file is currently being broken into multiple parts.
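i.e., something along these lines (result and the output path are
placeholders):
// Collapse to one partition so saveAsTextFile writes a single part file.
result.coalesce(1, shuffle = true).saveAsTextFile("/data/out/single")
// or, equivalently:
result.repartition(1).saveAsTextFile("/data/out/single")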
Hi Arun,
The limit for the YARN user on the cluster nodes should be all that
matters. What version of Spark are you using? If you can turn on
sort-based shuffle, it should solve this problem.
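For reference, a sketch of turning that on explicitly (sort is already the
default from 1.2 onward, so this mainly matters on 1.1):
import org.apache.spark.{SparkConf, SparkContext}

// Sort-based shuffle keeps far fewer files open per task than hash-based.
val conf = new SparkConf().set("spark.shuffle.manager", "sort")
val sc = new SparkContext(conf)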
-Sandy
On Tue, Feb 10, 2015 at 1:16 PM, Arun Luthra arun.lut...@gmail.com wrote:
Hi,
I'm running
I am able to get around the problem by doing a map and getting the Event
out of the EventWritable before I do my collect. I think I'll do that for
now.
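A sketch of that workaround, assuming the RDD holds (Key, EventWritable) pairs
and that EventWritable exposes its payload through a get()-style accessor
(both are assumptions, not confirmed by the thread):
// Copy the Event out of each Writable on the executors, so the Writable
// itself never has to be serialized back to the driver.
val events = rdd.map { case (key, writable) => writable.get() }
val collected = events.collect()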
On Tue, Feb 10, 2015 at 6:04 PM, Corey Nolet cjno...@gmail.com wrote:
I am using an input format to load data from Accumulo [1] in to a Spark
I'm getting this warning when using s3 input:
15/02/11 00:58:37 WARN RestStorageService: Adjusted time offset in response
to
RequestTimeTooSkewed error. Local machine and S3 server disagree on the
time by approximately 0 seconds. Retrying connection.
After that there are tons of 403/forbidden