Can anyone comment on the anticipated date or worst-case timeframe for when
Spark 1.0.0 will be released?
Thanks for the reply~~
I solved the problem and found the reason: I was using the Master
node to upload files to HDFS, and this action may take up a lot of the Master's
network resources. When I switched to uploading these files from another
computer that is not part of the cluster, I got the correct result.
You need to look at the log files for YARN. Generally this can be done with
yarn logs -applicationId your_app_id. That only works if you have log
aggregation enabled, though. You should be able to see at least the application
master logs through the YARN ResourceManager web UI. I would try
My configuration is as follows. The slave nodes have been configured, but I
do not know what is happening with Shark. Can you help me, sir?
shark-env.sh
export SPARK_USER_HOME=/root
export SPARK_MEM=2g
export SCALA_HOME=/root/scala-2.11.0-RC4
export SHARK_MASTER_MEM=1g
export
Hi,
I've been trying to run my newly created Spark job on my local master instead
of just running it using Maven, and I haven't been able to make it work. My main
issue seems to be related to this error:
14/05/14 09:34:26 ERROR EndpointWriter: AssociationError
Hey Brian,
We've had a fairly stable 1.0 branch for a while now. I started the vote
on the dev list last night... voting can take some time, but it
usually wraps up anywhere from a few days to a few weeks.
However, you can get started right now with the release candidates.
These are likely to be
I don't know whether this would fix the problem. In v0.9, you need
`yarn-standalone` instead of `yarn-cluster`.
See
https://github.com/apache/spark/commit/328c73d037c17440c2a91a6c88b4258fbefa0c08
On Tue, May 13, 2014 at 11:36 PM, Xiangrui Meng men...@gmail.com wrote:
Does v0.9 support
Hi,
I am trying to find a way to fill in missing values in an RDD. The RDD is a
sorted sequence.
For example, (1, 2, 3, 5, 8, 11, ...)
I need to fill in the missing numbers and get (1,2,3,4,5,6,7,8,9,10,11)
One way to do this is to slide and zip
rdd1 = sc.parallelize(List(1, 2, 3, 5, 8, 11,
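For what it's worth, a minimal sketch of that slide-and-zip idea, assuming a Spark
version that has RDD.zipWithIndex (variable names and the final sort are illustrative):
val rdd1 = sc.parallelize(List(1, 2, 3, 5, 8, 11))
// pair every element with its successor by joining on a shifted index
val indexed = rdd1.zipWithIndex().map { case (v, i) => (i, v) }
val nexts   = indexed.map { case (i, v) => (i - 1, v) }
// expand each gap: for (current, next) emit current, current+1, ..., next-1
val filled  = indexed.join(nexts).values.flatMap { case (cur, next) => cur until next }
// the last element has no successor, so add it back; sort because join loses ordering
val result  = filled.union(sc.parallelize(Seq(11))).collect().sorted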
The issue of console:12: error: not found: type Text is resolved by an import
statement, but I am still facing an issue with the imports of VectorWritable.
The Mahout math jar is added to the classpath, as I can check on the WebUI as
well as in the shell:
scala> System.getenv
res1: java.util.Map[String,String] = {TERM=xterm,
We can create a standalone Spark application by simply adding
spark-core_2.x to build.sbt/pom.xml and connecting it to a Spark master.
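For illustration, a minimal build.sbt along those lines might look like this (the
Scala and Spark version numbers are placeholders, not taken from this thread):
name := "my-spark-app"
scalaVersion := "2.10.4"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.0.0"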
We can also compile a custom version of Spark (e.g. compiled against Hadoop
2.x) from source and deploy it to the cluster manually.
But what is a proper way to use _custom
foreach vs. map isn't the issue. Both require serializing the called
function, so the pickle error would still apply, yes?
And at the moment, I'm just testing. Definitely wouldn't want to log
something for each element, but may want to detect something and log for
SOME elements.
So my question
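(The closure-serialization point above can be illustrated with a small Scala
equivalent; the original issue is PySpark's pickle error, and the helper class and
path below are purely illustrative.)
// a non-serializable helper captured by both map and foreach
class Detector { def check(s: String): Boolean = s.contains("ERROR") }
val detector = new Detector
val lines = sc.textFile("hdfs:///tmp/app.log")   // placeholder path
// both operations ship their closure (and thus detector) to the executors,
// so both hit the same "Task not serializable" error
lines.map(l => detector.check(l)).count()
lines.foreach(l => if (detector.check(l)) println(l))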
Hi Xiangrui,
I actually used `yarn-standalone`; sorry for the confusion. I did some debugging
over the last couple of days, and everything up to updateDependency in
executor.scala works. I also checked the file size and md5sum in the
executors, and they are the same as the ones on the driver. Going to do more
testing
Would cache() + count() every N iterations work just as well as
checkpoint() + count() to get around this issue?
We're basically trying to get Spark to avoid working on too lengthy a
lineage at once, right?
Nick
On Tue, May 13, 2014 at 12:04 PM, Xiangrui Meng men...@gmail.com wrote:
After
If we do cache() + count() after, say, every 50 iterations, the whole process
becomes very slow.
I have tried checkpoint(), cache() + count(), and saveAsObjectFiles().
Nothing works.
Materializing RDDs leads to a drastic decrease in performance; if we don't
materialize, we face a StackOverflowError.
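For reference, the pattern being discussed looks roughly like the sketch below;
the step function, iteration count, and checkpoint directory are illustrative, not
taken from this thread:
sc.setCheckpointDir("hdfs:///tmp/checkpoints")   // placeholder path
var rdd = initialRdd                             // hypothetical starting RDD
for (i <- 1 to numIterations) {
  rdd = step(rdd)                                // one iteration of the algorithm
  if (i % 50 == 0) {
    rdd.cache()
    rdd.checkpoint()   // truncates the lineage once the RDD is materialized
    rdd.count()        // forces materialization (and the checkpoint) now
  }
}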
On
Is your Spark working? Can you try running the Spark shell?
http://spark.apache.org/docs/0.9.1/quick-start.html
If Spark is working, we can move this to the Shark user list (copied here).
Also, I am anything but a sir :)
Regards
Mayur
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
Hi DB,
I've added the Breeze jars to the workers using sc.addJar().
The Breeze jars include:
breeze-natives_2.10-0.7.jar
breeze-macros_2.10-0.3.jar
breeze-macros_2.10-0.3.1.jar
breeze_2.10-0.8-SNAPSHOT.jar
breeze_2.10-0.7.jar
almost all the Breeze jars I can find, but still
Hi,
Thanks François, but this didn't change much. I'm not even sure what this
reference.conf is. It isn't mentioned anywhere in the Spark documentation. Should
I have one in my resources?
Thanks
Laurent
Hi Professor Lin,
On our internal datasets, I am getting accuracy on par with glmnet-R for
sparse feature selection from liblinear. The default MLlib-based gradient
descent was way off. I did not tune the learning rate, but I ran with varying
lambda. The feature selection was weak.
I used liblinear
Hi Xiangrui,
Thanks for the response. I tried a few ways to include the mahout-math jar while
launching the Spark shell, but with no success. Can you please point out what I am
doing wrong?
1. mahout-math.jar exported in CLASSPATH and PATH
2. Tried launching the Spark shell with: MASTER=spark://HOSTNAME:PORT
Hi,
Can we override the default file-replication factor while using
saveAsTextFile() to HDFS?
My default replication factor is 1, but intermediate files that I want to put in
HDFS while running a Spark query need not be replicated, so is there a way?
Thanks !
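One commonly used approach, not confirmed in this thread, is to lower
dfs.replication on the Hadoop configuration the job uses before writing. A minimal
sketch, where rdd and the output path are placeholders:
// set the HDFS replication factor for files written by this job
sc.hadoopConfiguration.set("dfs.replication", "1")
rdd.saveAsTextFile("hdfs:///tmp/intermediate-output")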
Hi all,
Just 2 questions:
1. Is there a way to automatically re-spawn Spark workers? We've had
situations where an executor OOM causes the worker process to be marked DEAD, and it
does not come back automatically.
2. How can we dynamically add (or remove) worker machines to (from) the
cluster? We'd like to
There's an undocumented mode that looks like it simulates a cluster:
SparkContext.scala:
// Regular expression for simulating a Spark cluster of [N, cores,
memory] locally
val LOCAL_CLUSTER_REGEX =
  """local-cluster\[\s*([0-9]+)\s*,\s*([0-9]+)\s*,\s*([0-9]+)\s*]""".r
Can you try running your tests
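If that mode is what you need, a minimal sketch of using it (the worker count,
cores, and memory-per-worker values are illustrative):
import org.apache.spark.SparkContext
// "local-cluster[2,1,512]" = 2 workers, 1 core each, 512 MB each
val sc = new SparkContext("local-cluster[2,1,512]", "cluster-sim-test")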
Have you actually found this to be true? I have found Spark local mode
to be quite good about blowing up if there is something non-serializable
and so my unit tests have been great for detecting this. I have never
seen something that worked in local mode that didn't work on the cluster
I have some settings that I think are relevant for my application. They are
spark.akka settings, so I assume they are relevant for both the executors and my
driver program.
I used to do:
SPARK_JAVA_OPTS=-Dspark.akka.frameSize=1
Now this is deprecated. The alternatives mentioned are:
* some
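One commonly suggested replacement (not necessarily the one elided above) is setting
the property programmatically through SparkConf; a minimal sketch, where the app name
and frame size value are illustrative:
import org.apache.spark.{SparkConf, SparkContext}
val conf = new SparkConf()
  .setAppName("my-app")
  .set("spark.akka.frameSize", "100")   // value in MB, illustrative
val sc = new SparkContext(conf)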
I have a similar objective to use Maven as our build tool and ran into the
same issue.
The idea is that your config file is actually not found: your fat-jar
assembly does not contain the reference.conf resource.
I added the following to the resources section of my pom to make it work:
resource
Hey all, trying to set up a pretty simple streaming app and getting some
weird behavior.
First, a non-streaming job that works fine: I'm trying to pull out lines
of a log file that match a regex, for which I've set up a function:
def getRequestDoc(s: String): String = {
Hi all,
When I run the ZeroMQWordCount example on the cluster, the worker log says: Caused
by: com.typesafe.config.ConfigException$Missing: No configuration setting
found for key 'akka.zeromq'
Actually, I can see that the reference.conf in
spark-examples-assembly-0.9.1.jar contains the below
I used spark-submit to run the MovieLensALS example from the examples
module.
Here is the command:
$spark-submit --master local
/home/phoenix/spark/spark-dev/examples/target/scala-2.10/spark-examples-1.0.0-SNAPSHOT-hadoop1.0.4.jar
--class org.apache.spark.examples.mllib.MovieLensALS u.data
Also,
Hi all,
My Spark code is running on yarn-standalone.
The last three lines of the code are as below:
val result = model.predict(prdctpairs)
result.map(x => x.user + "," + x.product + "," + x.rating).saveAsTextFile(output)
sc.stop()
The same code sometimes is able to run successfully and could give
Hi Jacob,
Thanks for the helpful answer on the Docker question. Have you already
experimented with the new link feature in Docker? That does not help the
HDFS issue, as the DataNode needs the NameNode and vice versa, but it does
facilitate simpler client-server interactions.
My issue described at
Hi Cheney,
Which mode are you running, YARN or standalone?
I got the same exception when I ran Spark on YARN.
On Tue, May 6, 2014 at 10:06 PM, Cheney Sun sun.che...@gmail.com wrote:
Hi Nan,
In the worker's log, I see the following exception thrown when trying to launch
an executor. (The SPARK_HOME
Hello Sophia,
You are only providing the Spark jar here (a Spark jar that does contain Hadoop
libraries in it, admittedly, but that is not sufficient). Where is your
Hadoop installed? (Most probably /usr/lib/hadoop/*.)
So you need to add that to your classpath (by using -cp), I guess. Let me
know
Hi,
I've wanted to play with Spark. I wanted to fast-track things and just use
one of the vendors' express VMs. I've tried Cloudera CDH 5.0 and
Hortonworks HDP 2.1.
I've not written down all of my issues, but for certain, when I try to run
spark-shell it doesn't work. Cloudera seems to crash,
I've forgotten most of my French.
You can download a Spark binary or build from source.
This is how I build from source:
Download and install sbt:
http://www.scala-sbt.org/
I installed in C:\sbt
Check C:\sbt\conf\sbtconfig.txt, use these options:
-Xmx512M
-XX:MaxPermSize=256m