Can someone with experience briefly share or summarize the differences
between Ignite and Spark? Are they complementary? Totally unrelated?
Overlapping? It seems Ignite has reached version 1.0; I had never heard
of it until a few days ago, and given what is advertised, it sounds pretty
-
From: Jay Vyas [mailto:jayunit100.apa...@gmail.com]
Sent: Thursday, February 26, 2015 3:40 PM
To: Sean Owen
Cc: Ognen Duzlevski; user@spark.apache.org
Subject: Re: Apache Ignite vs Apache Spark
https://wiki.apache.org/incubator/IgniteProposal has, I think, been updated
recently and has a good
On Sat, Feb 21, 2015 at 8:54 AM, Deep Pradhan pradhandeep1...@gmail.com
wrote:
No, I am talking about work parallel to the prediction work done on
GPUs. Say, given the data for a smaller number of nodes in a
Spark cluster, a prediction needs to be made about the time that the
On Sun, Nov 23, 2014 at 1:03 PM, Ashish Rangole arang...@gmail.com wrote:
Java or Scala : I knew Java already yet I learnt Scala when I came across
Spark. As others have said, you can get started with a little bit of Scala
and learn more as you progress. Once you have started using Scala for a
Ashic,
Thanks for your email.
Two things:
1. I think a whole lot of data scientists and other people would love
it if they could just fire off jobs from their laptops. It is, in my
opinion, a common desired use case.
2. Did anyone actually get the Ooyala job server to work? I asked that
in mind there is a non-trivial amount of traffic between the
driver and cluster. It's not something I would do by default, running
the driver so remotely. With enough ports open it should work though.
On Sun, Sep 7, 2014 at 7:05 PM, Ognen Duzlevski
ognen.duzlev...@gmail.com wrote:
Horacio,
Thanks
On 9/7/2014 7:27 AM, Tomer Benyamini wrote:
2. What should I do to increase the quota? Should I bring down the
existing slaves and upgrade to ones with more storage? Is there a way
to add disks to existing slaves? I'm using the default m1.large slaves
set up using the spark-ec2 script.
Take a
I keep getting the below reply every time I send a message to the Spark user
list. Can this person be taken off the list by the powers that be?
Thanks!
Ognen
Forwarded Message
Subject: DELIVERY FAILURE: Error transferring to
QCMBSJ601.HERMES.SI.SOCGEN; Maximum hop count exceeded.
Have you actually tested this?
I have two instances; one is a standalone master and the other one just
has Spark installed, the same version of Spark (1.0.0).
The security group on the master allows all (0-65535) TCP and UDP
traffic from the other machine and the other machine allows all TCP/UDP
Horacio,
Thanks, I have not tried that, however, I am not after security right
now - I am just wondering why something so obvious won't work ;)
Ognen
On 9/7/2014 12:38 PM, Horacio G. de Oro wrote:
Have you tried with ssh? It will be much more secure (only 1 port open),
and you'll be able to run
Is this possible? If I have a cluster set up on EC2 and I want to run
spark-shell --master my master IP on EC2:7077 from my home computer -
is this possible at all, or am I wasting my time ;)? I am seeing a
connection timeout when I try it.
Thanks!
Ognen
On 9/5/2014 3:27 PM, anthonyjschu...@gmail.com wrote:
I think that should be possible. Make sure spark is installed on your local
machine and is the same version as on the cluster.
It is the same version, I can telnet to master:7077 but when I run the
spark-shell it times out.
That is the command I ran and it still times out. Besides 7077, is there
any other port that needs to be open?
Thanks!
Ognen
On 9/5/2014 4:10 PM, qihong wrote:
the command should be spark-shell --master spark://<master ip on EC2>:7077.
Ah. So there is some kind of a back and forth going on. Thanks!
Ognen
On 9/5/2014 5:34 PM, qihong wrote:
Since you are using your home computer, it's probably not reachable by EC2
from the internet.
You can try setting spark.driver.host to your WAN IP and spark.driver.port
to a fixed port in
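That suggestion can be sketched as follows (a hypothetical fragment, not a tested setup: the master address, WAN IP, and port are all placeholders, your router must forward the chosen port to the driver machine, and on pre-1.0 releases these properties were typically passed via SPARK_JAVA_OPTS rather than --conf):

```
# Placeholder values throughout; untested against any particular release.
spark-shell --master spark://<master-ec2-ip>:7077 \
  --conf spark.driver.host=<your-WAN-ip> \
  --conf spark.driver.port=51000
```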
Hello all,
Can anyone offer any insight on the below?
Both are legal Spark, but the first one works and the latter one does not.
They both work on a local machine but in a standalone cluster the one
with countByValue fails.
Thanks!
Ognen
On 7/15/14, 2:23 PM, Ognen Duzlevski wrote:
Hello,
I
Hello,
I am curious about something:
val result = for {
  (dt, evrdd) <- evrdds
  val ct = evrdd.count
} yield (dt -> ct)
works.
val result = for {
  (dt, evrdd) <- evrdds
  val ct = evrdd.countByValue
} yield (dt -> ct)
does not work. I get:
14/07/15 16:46:33 WARN
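For intuition, here is a local-collections sketch (plain Scala, not Spark; the names are mine) of what the two calls return. count collapses everything to a single Long, while countByValue builds a value-to-count map that, on a real RDD, must be shipped back whole to the driver - a much heavier, serialization-sensitive result, which is one plausible reason the cluster run behaves differently:

```scala
// Plain-Scala sketch of what the two RDD operations return.
object CountByValueSketch {
  // count: collapses everything to one Long.
  def count[A](xs: Seq[A]): Long = xs.size.toLong

  // countByValue: a map from each distinct value to its number of
  // occurrences; on an RDD this entire map is returned to the driver.
  def countByValue[A](xs: Seq[A]): Map[A, Long] =
    xs.groupBy(identity).map { case (v, occ) => (v, occ.size.toLong) }

  def main(args: Array[String]): Unit = {
    val xs = Seq("a", "b", "a", "c", "a")
    println(count(xs))        // 5
    println(countByValue(xs))
  }
}
```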
the shell, I
don’t have any more pointers for you. :(
On Sun, Jul 13, 2014 at 12:57 PM, Ognen Duzlevski
ognen.duzlev...@gmail.com wrote:
Nicholas,
Thanks!
How do I make spark assemble against a local version of Hadoop?
I have 2.4.1 running
Hello,
I have been trying to play with the Google ngram dataset provided by
Amazon in form of LZO compressed files.
I am having trouble understanding what is going on ;). I have added the
compression jar and native library to the underlying Hadoop/HDFS
installation, restarted the name node
[org.apache.hadoop.io.Text])
|
On a side note, here’s a related JIRA issue: SPARK-2394: Make it
easier to read LZO-compressed files from EC2 clusters
https://issues.apache.org/jira/browse/SPARK-2394
Nick
On Sun, Jul 13, 2014 at 10:49 AM, Ognen Duzlevski
ognen.duzlev...@gmail.com
I only ran HDFS on the same nodes as Spark and that worked out great
performance and robustness wise. However, I did not run Hadoop itself to
do any computations/jobs on the same nodes. My expectation is that if
you actually ran both at the same time with your configuration, the
performance
How exciting! Congratulations! :-)
Ognen
On 5/30/14, 5:12 AM, Patrick Wendell wrote:
I'm thrilled to announce the availability of Spark 1.0.0! Spark 1.0.0
is a milestone release as the first in the 1.0 line of releases,
providing API stability for Spark's core interfaces.
Spark 1.0.0 is
In the spirit of everything being bigger and better in TX ;) - if
anyone is in Austin and interested in meeting up over Spark, contact
me! There seems to be a Spark meetup group in Austin that has never met,
and my initial email to organize a first gathering was never acknowledged.
Ognen
On
There is also this quote from the Tuning guide
(http://spark.incubator.apache.org/docs/latest/tuning.html):
Finally, if you don't register your classes, Kryo will still work, but
it will have to store the full class name with each object, which is
wasteful.
It implies that you don't really
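The registration the guide refers to looked roughly like this in the 0.9-era docs (a sketch only; MyClass1, MyClass2, and MyRegistrator are placeholder names, and later releases replaced the system properties with SparkConf settings):

```scala
// Sketch after the 0.9-era tuning guide; names are placeholders.
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.serializer.KryoRegistrator

class MyRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo) {
    kryo.register(classOf[MyClass1])
    kryo.register(classOf[MyClass2])
  }
}

// Then, before creating the SparkContext:
System.setProperty("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
System.setProperty("spark.kryo.registrator", "MyRegistrator")
```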
Look at the tuning guide on Spark's webpage for strategies to cope with
this.
I have run into quite a few memory issues like these, some are resolved
by changing the StorageLevel strategy and employing things like Kryo,
some are solved by specifying the number of tasks to break down a given
Have you looked at the individual nodes logs? Can you post a bit more of
the exception's output?
On 3/26/14, 8:42 AM, Jaonary Rabarisoa wrote:
Hi all,
I got java.lang.ClassNotFoundException even with addJar called. The
jar file is present in each node.
I use the version of spark from
On Wed, Mar 26, 2014 at 3:34 PM, Ognen Duzlevski
og...@plainvanillagames.com
wrote:
Have you looked at the individual nodes logs? Can you post a
bit more of the exception's output?
On 3/26/14, 8:42 AM, Jaonary Rabarisoa wrote
spark.executor.memory. Just curious if I did something wrong.
On Mon, Mar 24, 2014 at 7:48 PM, Ognen Duzlevski
og...@plainvanillagames.com wrote:
Just so I can close this thread (in case anyone else runs into this stuff) -
I did sleep through the basics of Spark ;). The answer on why my job
? If so, that's good to know because it's definitely counterintuitive.
On Sun, Mar 23, 2014 at 8:36 PM, Ognen Duzlevski
og...@nengoiksvelzud.com wrote:
I would love to work on this (and other) stuff if I can bother someone with
questions offline or on a dev mailing list.
Ognen
On 3/23/14, 10:04 PM
the rest of the slaves+master) and increasing.
Ognen
On 3/24/14, 7:00 AM, Ognen Duzlevski wrote:
Patrick, correct. I have a 16 node cluster. On 14 machines out of 16,
the inode usage was about 50%. On two of the slaves, one had inode
usage of 96% and on the other it was 100%. When I went
Diana,
Anywhere on the filesystem you have read/write access (you need not be
in your spark home directory):
mkdir myproject
cd myproject
mkdir project
mkdir target
mkdir -p src/main/scala
cp $mypath/$mymysource.scala src/main/scala/
cp $mypath/myproject.sbt .
Make sure that myproject.sbt
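The truncated advice presumably covered the contents of myproject.sbt; a minimal 0.9-era build file looked roughly like this (a hypothetical sketch, with illustrative names and versions):

```scala
// Hypothetical minimal myproject.sbt; adjust names and versions.
name := "myproject"

version := "0.1"

scalaVersion := "2.10.3"

libraryDependencies += "org.apache.spark" %% "spark-core" % "0.9.0-incubating"

resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
```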
/usr/lib/spark/sbt/sbt: line 33: sbt/sbt-launch-.jar: No such file or
directory
Our attempt to download sbt locally to sbt/sbt-launch-.jar failed.
Please install sbt manually from http://www.scala-sbt.org/
On Mon, Mar 24, 2014 at 4:25 PM, Ognen Duzlevski
og...@plainvanillagames.com
, to minimize network traffic. It's how Hadoop
works, too.)
On Mon, Mar 24, 2014 at 5:00 PM, Ognen Duzlevski
og...@nengoiksvelzud.com wrote:
Is
someRDD.saveAsTextFile("hdfs://ip:port/path/final_filename.txt")
supposed to work? Meaning, can I save files
.)
(Presumably it does this because it allows each partition to be saved
on the local disk, to minimize network traffic. It's how Hadoop
works, too.)
On Mon, Mar 24, 2014 at 5:00 PM, Ognen Duzlevski
og...@nengoiksvelzud.com wrote
Hello,
I have a weird error showing up when I run a job on my Spark cluster.
The version of spark is 0.9 and I have 3+ GB free on the disk when this
error shows up. Any ideas what I should be looking for?
[error] (run-main-0) org.apache.spark.SparkException: Job aborted: Task
167.0:3 failed
On 3/23/14, 5:49 PM, Matei Zaharia wrote:
You can set spark.local.dir to put this data somewhere other than /tmp
if /tmp is full. Actually it’s recommended to have multiple local
disks and set it to a comma-separated list of directories, one per disk.
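Concretely, the comma-separated setting looks like this (paths are placeholders; in that era it was typically passed as a Java system property, e.g. via SPARK_JAVA_OPTS in spark-env.sh):

```
# Placeholder paths, one directory per physical disk:
SPARK_JAVA_OPTS="-Dspark.local.dir=/mnt/disk1/spark,/mnt/disk2/spark"
```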
Matei, does the number of tasks/partitions
partition is particularly small.
You might look at the actual executors' logs, as it's possible that
this error was caused by an earlier exception, such as too many open
files.
On Sun, Mar 23, 2014 at 4:46 PM, Ognen Duzlevski
og...@plainvanillagames.com wrote
(and sorry for the noise)!
Ognen
On 3/23/14, 9:52 PM, Ognen Duzlevski wrote:
Aaron, thanks for replying. I am very much puzzled as to what is going
on. A job that used to run on the same cluster is failing with this
mysterious message about not having enough disk space when in fact I
can see
which is not
on our current roadmap for state cleanup (cleaning up data which was
not fully cleaned up from a crashed process).
On Sun, Mar 23, 2014 at 7:57 PM, Ognen Duzlevski
og...@plainvanillagames.com wrote:
Bleh, strike that, one of my slaves
Hello,
I have a task that runs on a week's worth of data (let's say) and
produces a Set of tuples such as Set[(String,Long)] (essentially output
of countByValue.toMap)
I want to produce 4 sets, one each for a different week and run an
intersection of the 4 sets.
I have the sequential
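The intersection step itself is simple once each week's Set is materialized; a plain-Scala sketch (illustrative data, names mine). Note that intersecting full (value, count) pairs keeps only entries whose counts match exactly in every week; if the goal is common values regardless of count, intersect the key sets instead:

```scala
// Sketch: intersect one Set[(String, Long)] per week.
object WeeklyIntersection {
  def intersectAll[A](sets: Seq[Set[A]]): Set[A] =
    sets.reduce(_ intersect _)

  def main(args: Array[String]): Unit = {
    val w1 = Set(("alice", 3L), ("bob", 1L))
    val w2 = Set(("alice", 3L), ("carol", 2L))
    val w3 = Set(("alice", 3L), ("bob", 4L))
    // Only pairs present (with identical counts) in every week survive.
    println(intersectAll(Seq(w1, w2, w3))) // Set((alice,3))
  }
}
```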
On 3/18/14, 4:49 AM, dmpou...@gmail.com wrote:
On Sunday, 2 March 2014 19:19:49 UTC+2, Aureliano Buendia wrote:
Is there a reason for spark using the older akka?
On Sun, Mar 2, 2014 at 1:53 PM, 1esha alexey.r...@gmail.com wrote:
The problem is in akka remote. It contains files compiled
Hello,
Is there anything special about calling functions that parse json lines
from filter?
I have code that looks like this:
def jsonMatches(line: String): Boolean = {
  // take a line in json format
  val jline = parse(line)
  val je = jline \ "event"
  if (je != JNothing && je.values.toString ==
.
—
p...@mult.ifario.us | Multifarious, Inc. |
http://mult.ifario.us/
On Thu, Mar 13, 2014 at 8:04 AM, Ognen Duzlevski
og...@nengoiksvelzud.com wrote:
Hello,
Is there anything special about calling functions that parse json
| Multifarious, Inc. |
http://mult.ifario.us/
On Thu, Mar 13, 2014 at 9:20 AM, Ognen Duzlevski
og...@plainvanillagames.com wrote:
Hmm.
The whole thing is packaged in a .jar file and I execute .addJar
on the SparkContext. My expectation is that the whole
I even tried this:
def jsonMatches(line:String):Boolean = true
It is still failing with the same error.
Ognen
On 3/13/14, 11:45 AM, Ognen Duzlevski wrote:
I must be really dense! :)
Here is the most simplified version of the code, I removed a bunch of
stuff and hard-coded the event
OK, problem solved.
Interesting thing - I separated the jsonMatches function below and put
it in as a method to a separate file/object. Once done that way, it all
serializes and works.
Ognen
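The fix generalizes: a closure built from a method on a non-serializable enclosing class captures the whole instance, while a function on a standalone top-level object captures nothing. A plain-Scala sketch of that difference (names are mine, no Spark involved):

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Plain-Scala sketch of the jsonMatches fix; names are illustrative.
object SerializationSketch {
  def isSerializable(x: AnyRef): Boolean =
    try {
      new ObjectOutputStream(new ByteArrayOutputStream).writeObject(x)
      true
    } catch { case _: NotSerializableException => false }

  class Driver { // deliberately NOT Serializable
    def jsonMatches(line: String): Boolean = true
    // This closure captures `this`, so serializing it drags in Driver.
    val closureOverMethod: String => Boolean = line => jsonMatches(line)
  }

  object Helpers { // standalone object: nothing to capture
    def jsonMatches(line: String): Boolean = true
  }
  val standaloneClosure: String => Boolean = line => Helpers.jsonMatches(line)

  def main(args: Array[String]): Unit = {
    println(isSerializable(new Driver().closureOverMethod))
    println(isSerializable(standaloneClosure))
  }
}
```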
On 3/13/14, 11:52 AM, Ognen Duzlevski wrote:
I even tried this:
def jsonMatches(line:String
Are you using it with HDFS? What version of Hadoop? 1.0.4?
Ognen
On 3/10/14, 8:49 PM, abhinav chowdary wrote:
for any one who is interested to know about job server from Ooyala..
we started using it recently and been working great so far..
On Feb 25, 2014 9:23 PM, Ognen Duzlevski og
What is wrong with this code?
A condensed set of this code works in the spark-shell.
It does not work when deployed via a jar.
def calcSimpleRetention(start: String, end: String, event1: String, event2: String): List[Double] = {
  val spd = new PipelineDate(start)
  val epd = new
Strike that. Figured it out. Don't you just hate it when you fire off an
email and you figure it out as it is being sent? ;)
Ognen
On 3/7/14, 12:41 PM, Ognen Duzlevski wrote:
What is wrong with this code?
A condensed set of this code works in the spark-shell.
It does not work when deployed
@mayur_rustagi https://twitter.com/mayur_rustagi
On Fri, Mar 7, 2014 at 10:43 AM, Ognen Duzlevski
og...@plainvanillagames.com wrote:
Strike that. Figured it out. Don't you just hate it when you fire
off an email and you figure it out as it is being
Nice, thanks :)
Ognen
On 3/7/14, 2:48 PM, Brian O'Neill wrote:
FWIW - I posted some notes to help people get started quickly with
Spark on C*.
http://brianoneill.blogspot.com/2014/03/spark-on-cassandra-w-calliope.html
(tnx again to Rohit and team for all of their help)
-brian
--
Brian
On Thu, Mar 6, 2014 at 9:50 PM, Ognen Duzlevski
og...@plainvanillagames.com wrote:
It looks like the problem is in the filter task - is there
anything special about filter()?
I have removed the filter line from the loops just
Hello,
What is the general approach people take when trying to do analysis
across multiple large files where the data to be extracted from a
successive file depends on the data extracted from a previous file or
set of files?
For example:
I have the following: a group of HDFS files each
It looks like the problem is in the filter task - is there anything
special about filter()?
I have removed the filter line from the loops just to see if things will
work and they do.
Anyone has any ideas?
Thanks!
Ognen
On 3/6/14, 9:39 PM, Ognen Duzlevski wrote:
Hello,
What is the general
Rob,
I have seen this too. I have 16 nodes in my spark cluster and for some
reason (after app failures) one of the workers will go offline. I will
ssh to the machine in question and find that the java process is running
but for some reason the master is not noticing this. I have not had the
Deb,
On 3/4/14, 9:02 AM, Debasish Das wrote:
Hi Ognen,
Any particular reason for choosing Scalatra over options like Play or
Spray?
Is Scalatra much better at serving APIs, or is it due to its similarity
with Ruby's Sinatra?
Did you try the other options and then pick Scalatra?
Not really.
Can someone point me to a simple, short code example of creating a basic
Actor that gets a context and runs an operation such as .textFile.count?
I am trying to figure out how to create just a basic actor that gets a
message like this:
case class Msg(filename:String, ctx: SparkContext)
and
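In the spirit of what's being asked, a rough sketch using Akka classic actors (entirely untested; Spark/Akka version pairings of that era were finicky, and CountActor and the setup names are mine). Note a SparkContext is not serializable, so passing it in a message only makes sense to a local actor in the same JVM:

```scala
import akka.actor.{Actor, ActorSystem, Props}
import org.apache.spark.SparkContext

case class Msg(filename: String, ctx: SparkContext)

// Hypothetical actor that runs a count when it receives a Msg.
class CountActor extends Actor {
  def receive = {
    case Msg(filename, ctx) =>
      val n = ctx.textFile(filename).count()
      sender() ! n // reply with the line count
  }
}

// Usage sketch (sc is an existing SparkContext):
// val system = ActorSystem("demo")
// val counter = system.actorOf(Props[CountActor], "counter")
// counter ! Msg("hdfs://host:port/path/file.txt", sc)
```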