Hi TD,
I tried on v1.0.0-rc3 and still got the error.
Thank you for your prompt reply.
Regards,
prabeesh
On Tue, May 6, 2014 at 11:44 AM, Mayur Rustagi mayur.rust...@gmail.com wrote:
All three have different use cases. If you are looking for more of a
warehouse, you are better off with Shark.
Spark SQL is a way to query regular data in a SQL-like
Add export SPARK_JAVA_OPTS="-Xss16m" to conf/spark-env.sh. Then it should apply
to the executor.
Matei
On May 5, 2014, at 2:20 PM, Andrea Esposito and1...@gmail.com wrote:
Hi there,
I'm doing an iterative algorithm and sometimes I end up with a
StackOverflowError; it doesn't matter if I do
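Besides raising the stack size as Matei suggests above, a common fix for
deep-lineage StackOverflowErrors in iterative algorithms is to truncate the
RDD lineage periodically with checkpoint(). A minimal sketch; every name
here is illustrative, not taken from this thread:

    sc.setCheckpointDir("hdfs:///tmp/checkpoints") // hypothetical location
    var rdd = initialRdd
    for (i <- 1 to numIterations) {
      rdd = rdd.map(step).persist()
      if (i % 10 == 0) {
        rdd.checkpoint() // must be marked before the first action on this RDD
        rdd.count()      // force materialization; the lineage is cut here
      }
    }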
Hi all,
I have made HADOOP_CONF_DIR or YARN_CONF_DIR point to the directory which
contains the (client-side) configuration files for the Hadoop cluster.
The command I run to launch the YARN Client is like this:
#
Hi Jacob,
I agree, we need to address both driver and workers bidirectionally.
If the subnet is isolated and self-contained, and only limited ports are
configured to access the driver via a dedicated gateway for the user, could
you explain your concern? What doesn't satisfy the security criteria?
Hi all,
#./sbt/sbt assembly
Launching sbt from sbt/sbt-launch-0.12.4.jar
Invalid or corrupt jarfile sbt/sbt-launch-0.12.4.jar
Why can't I run sbt properly?
Best regards,
OK Andrew,
Thanks.
I sent information about a test with 8 workers, and the gap has grown.
On May 4, 2014, at 2:31, Andrew Ash and...@andrewash.com wrote:
From the logs, I see that the print() starts printing stuff 10 seconds
after the context is started. And that 10 seconds is taken by the
Thanks Koert, very useful!
On Tue, Apr 29, 2014 at 6:41 PM, Koert Kuipers ko...@tresata.com wrote:
SparkContext.getRDDStorageInfo
On Tue, Apr 29, 2014 at 12:34 PM, Andras Nemeth
andras.nem...@lynxanalytics.com wrote:
Hi,
Is it possible to know from code about an RDD if it is cached,
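A minimal sketch of both checks, assuming an existing SparkContext sc and an
RDD rdd (the storage-info call is the one Koert points to above):

    import org.apache.spark.storage.StorageLevel
    val isCached = rdd.getStorageLevel != StorageLevel.NONE // per-RDD check
    sc.getRDDStorageInfo.foreach(println) // everything currently in storage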
Thanks all for helping.
Following Earthson's tip, I resolved it. I have to report that if you
materialize the RDD and afterwards try to checkpoint it, the checkpoint
doesn't happen.
newRdd = oldRdd.map(myFun).persist(myStorageLevel)
newRdd.foreach(x => myFunLogic(x)) // Here materialized for other
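A sketch of the order that works, reusing the names from the snippet above
(the corrected sequencing is an inference from the report, not the poster's
exact code):

    val newRdd = oldRdd.map(myFun).persist(myStorageLevel)
    newRdd.checkpoint()                // mark BEFORE any action materializes it
    newRdd.foreach(x => myFunLogic(x)) // first action persists and checkpoints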
Hi there,
sorry if I'm posting a lot lately.
I'm trying to add the KryoSerializer, but I receive this exception:
2014-05-06 11:45:23 WARN TaskSetManager:62 - Loss was due to
java.io.EOFException
java.io.EOFException
at
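For reference, a minimal sketch of the stock Kryo setup (the standard
configuration, not a diagnosis of the EOFException above; the registrator
class is hypothetical):

    val conf = new org.apache.spark.SparkConf()
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryo.registrator", "mypackage.MyRegistrator") // hypothetical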
Hi all,
[root@sophia spark-0.9.1]#
SPARK_JAR=./assembly/target/scala-2.10/spark-assembly_2.10-0.9.1-hadoop2.2.0.jar \
./bin/spark-class org.apache.spark.deploy.yarn.Client \
  --jar examples/target/scala-2.10/spark-examples_2.10-assembly-0.9.1.jar \
  --class org.apache.spark.examples.SparkPi \
  --args
I found that the small broadcast variable always took about 10s, not 5s or
anything else.
Is there some property/conf (whose default is 10) that controls this
timeout?
Most likely your JAVA_HOME variable is wrong. Can you configure it in the
spark-env.sh file?
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi https://twitter.com/mayur_rustagi
On Tue, May 6, 2014 at 5:53 PM, Sophia sln-1...@163.com wrote:
Hi all,
[root@sophia
I have modified it in spark-env.sh, but it turns out that it does not work.
So confused.
Best Regards
Hi
I just read an article [1] about Spark, CDH5, and Java 8, but did not get
exactly how Spark can run Java 8 on a YARN cluster at runtime. Is Spark
using a separate JVM that runs on the data nodes, or is it reusing the YARN
JVM runtime somehow, like Hadoop 1?
CDH5 only supports Java 7 [2] as far as I
Howdy,
You might find the discussion Andrew and I have been having about Docker
and network security [1] applicable.
Also, I posted an answer [2] to your stackoverflow question.
[1]
Howdy Scott,
Please see the discussions about securing the Spark network [1] [2].
In a nutshell, Spark opens up a couple of well-known ports. And then the
workers and the shell open up dynamic ports for each job. These dynamic
ports make securing the Spark network difficult.
Jacob
[1]
Hi Kristoffer,
You're correct that CDH5 only supports up to Java 7 at the moment. But
YARN apps do not run in the same JVM as YARN itself (and I believe MR1
doesn't either), so it might be possible to pass arguments in a way
that tells YARN to launch the application master / executors with the
I think the distinction there might be that they never said they ran that
code under CDH5, just that Spark supports it and Spark runs under CDH5, not
that you can use these features while running under CDH5.
They could use mesos or the standalone scheduler to run them
On Tue, May 6, 2014 at 6:16 AM,
Hi,
I get a `no space left on device` exception when pulling some 22GB of data
from S3 block storage to the ephemeral HDFS. The cluster is on EC2, created
by the spark-ec2 script with 4 m1.large instances.
The code is basically:
val in = sc.textFile("s3://...")
in.saveAsTextFile("hdfs://...")
Spark creates 750 input
I guess it's due to missing documentation and a quite complicated setup.
Continuous integration would be nice!
By the way, is it possible to use Spark as a shared library rather than
fetching the Spark tarball for each task?
Do you point SPARK_EXECUTOR_URI to an HDFS URL?
Please check JAVA_HOME. Usually it should point to /usr/java/default on
CentOS/Linux.
or FYI: http://stackoverflow.com/questions/1117398/java-home-directory
Date: Tue, 6 May 2014 00:23:02 -0700
From: sln-1...@163.com
To: u...@spark.incubator.apache.org
Subject: run spark0.9.1 on yarn with
I wonder why your / is full. Try clearing out /tmp, and also make sure that
in spark-env.sh you have put
export SPARK_JAVA_OPTS+=" -Dspark.local.dir=/mnt/spark"
Thanks
Best Regards
On Tue, May 6, 2014 at 9:35 PM, Han JU ju.han.fe...@gmail.com wrote:
Hi,
I've a `no space left on device` exception
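The same setting can also be made from application code; a sketch (the reply
above sets it via SPARK_JAVA_OPTS in spark-env.sh, which is what spark-ec2
clusters use):

    val conf = new org.apache.spark.SparkConf()
      .set("spark.local.dir", "/mnt/spark") // scratch space on the big volume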
After some investigation, I found out that there are lots of temp files under
/tmp/hadoop-root/s3/
But this is strange, since in both conf files,
~/ephemeral-hdfs/conf/core-site.xml and ~/spark/conf/core-site.xml, the
setting `hadoop.tmp.dir` is set to `/mnt/ephemeral-hdfs/`. Why do Spark jobs
still
Java 8 support is a feature in Spark, but vendors need to decide for
themselves when they'd like to support Java 8 commercially. You can still run
Spark on Java 7 or 6 without taking advantage of the new features (indeed,
our builds are always done against Java 6).
Matei
On May 6, 2014, at 8:59 AM,
Hi Spark users,
Do you guys plan to go to the Spark Summit? Can you recommend any hotel near
the conference? I'm not familiar with the area.
Thanks!
Jerry
What should I do if I want to log something as part of a task?
This is what I tried. To set up a logger, I followed the advice here:
http://py4j.sourceforge.net/faq.html#how-to-turn-logging-on-off
import logging
logger = logging.getLogger("py4j")
logger.setLevel(logging.INFO)
I think you're looking for RDD.foreach()
[http://spark.apache.org/docs/latest/api/pyspark/pyspark.rdd.RDD-class.html#foreach].
According to the programming guide
[http://spark.apache.org/docs/latest/scala-programming-guide.html]:
Run a function func on each element of the dataset. This is usually
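A minimal Scala sketch of that call (myFunc is hypothetical; the function
runs on the executors, so its side effects happen on the workers, not the
driver, which is also why task-side logging shows up in the executor logs):

    rdd.foreach(x => myFunc(x)) // e.g. update an accumulator, write to storage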
Hi there,
Why can't I seem to kick the executor memory higher? See below from an EC2
deployment using m1.large.
And in the spark-env.sh
export SPARK_MEM=6154m
And in the spark context
sconf.setExecutorEnv("spark.executor.memory", "4g")
Cheers
- Ian
If you're using standalone mode, you need to make sure the Spark Workers
know about the extra memory. This can be configured in spark-env.sh on the
workers as
export SPARK_WORKER_MEMORY=4g
On Tue, May 6, 2014 at 5:29 PM, Ian Ferreira ianferre...@hotmail.com wrote:
Hi there,
Why can’t I seem
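A sketch of the application-side half (an assumption about the original
post: setExecutorEnv sets an executor environment variable, whereas
spark.executor.memory is a configuration property, so it belongs on the
SparkConf itself):

    val sconf = new org.apache.spark.SparkConf()
      .set("spark.executor.memory", "4g") // config property, not an env var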
Try using s3n instead of s3
On 06/05/2014 21:19, kamatsuoka ken...@gmail.com wrote:
I have a Spark app that writes out a file, s3://mybucket/mydir/myfile.txt.
Behind the scenes, the S3 driver creates a bunch of files like
s3://mybucket//mydir/myfile.txt/part-, as well as the block
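A sketch of the suggested change (paths from the post; s3n:// is Hadoop's
native S3 filesystem and stores plain files, while s3:// is the block-based
filesystem, which is what writes the extra block objects described above):

    myRDD.saveAsTextFile("s3n://mybucket/mydir/myfile.txt") // myRDD is hypothetical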
I'd like to override the logic of comparing keys for equality in
groupByKey. Kinda like how combineByKey allows you to pass in the combining
logic for values, I'd like to do the same for keys.
My code looks like this:
val res = rdd.groupBy(myPartitioner)
Here, rdd is of type RDD[(MyKey,
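One approach, sketched (all names hypothetical): a Partitioner only controls
which partition a key lands in; groupByKey still compares keys with
equals/hashCode, so the custom equality has to live on the key type itself,
for example via a wrapper:

    import org.apache.spark.SparkContext._ // pair-RDD functions (pre-1.3)

    // Wrap MyKey so equality is defined by the field we care about.
    class MyKeyWrapper(val k: MyKey) extends Serializable {
      override def hashCode: Int = k.someField.hashCode // someField is hypothetical
      override def equals(o: Any): Boolean = o match {
        case w: MyKeyWrapper => w.k.someField == k.someField
        case _ => false
      }
    }

    val res = rdd.map { case (k, v) => (new MyKeyWrapper(k), v) }.groupByKey()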