Apache Ignite vs Apache Spark

2015-02-26 Thread Ognen Duzlevski
Can someone with experience briefly share or summarize the differences between Ignite and Spark? Are they complementary? Totally unrelated? Overlapping? It seems Ignite has reached version 1.0; I had never heard of it until a few days ago, and given what is advertised, it sounds pretty

Re: Apache Ignite vs Apache Spark

2015-02-26 Thread Ognen Duzlevski
From: Jay Vyas [mailto:jayunit100.apa...@gmail.com] Sent: Thursday, February 26, 2015 3:40 PM To: Sean Owen Cc: Ognen Duzlevski; user@spark.apache.org Subject: Re: Apache Ignite vs Apache Spark - https://wiki.apache.org/incubator/IgniteProposal has, I think, been updated recently and has a good

Re: Perf Prediction

2015-02-21 Thread Ognen Duzlevski
On Sat, Feb 21, 2015 at 8:54 AM, Deep Pradhan pradhandeep1...@gmail.com wrote: No, I am talking about work parallel to the prediction work that is done on GPUs. Say, given the data for a smaller number of nodes in a Spark cluster, a prediction needs to be made about the time that the

Re: Spark or MR, Scala or Java?

2014-11-23 Thread Ognen Duzlevski
On Sun, Nov 23, 2014 at 1:03 PM, Ashish Rangole arang...@gmail.com wrote: Java or Scala: I already knew Java, yet I learned Scala when I came across Spark. As others have said, you can get started with a little bit of Scala and learn more as you progress. Once you have started using Scala for a

Re: Submitting Python Applications from Remote to Master

2014-11-15 Thread Ognen Duzlevski
Ashic, thanks for your email. Two things: 1. I think a whole lot of data scientists and other people would love it if they could just fire off jobs from their laptops. It is, in my opinion, a commonly desired use case. 2. Did anyone actually get the Ooyala job server to work? I asked that

Re: Running spark-shell (or queries) over the network (not from master)

2014-09-08 Thread Ognen Duzlevski
in mind there is a non-trivial amount of traffic between the driver and the cluster. It's not something I would do by default, running the driver so remotely. With enough ports open it should work, though. On Sun, Sep 7, 2014 at 7:05 PM, Ognen Duzlevski ognen.duzlev...@gmail.com wrote: Horacio, Thanks

Re: Adding quota to the ephemeral hdfs on a standalone spark cluster on ec2

2014-09-07 Thread Ognen Duzlevski
On 9/7/2014 7:27 AM, Tomer Benyamini wrote: 2. What should I do to increase the quota? Should I bring down the existing slaves and upgrade to ones with more storage? Is there a way to add disks to existing slaves? I'm using the default m1.large slaves set up using the spark-ec2 script. Take a

Fwd: DELIVERY FAILURE: Error transferring to QCMBSJ601.HERMES.SI.SOCGEN; Maximum hop count exceeded. Message probably in a routing loop.

2014-09-07 Thread Ognen Duzlevski
I keep getting the reply below every time I send a message to the Spark user list. Can this person be taken off the list by the powers that be? Thanks! Ognen Forwarded Message Subject: DELIVERY FAILURE: Error transferring to QCMBSJ601.HERMES.SI.SOCGEN; Maximum hop count exceeded.

Re: Running spark-shell (or queries) over the network (not from master)

2014-09-07 Thread Ognen Duzlevski
Have you actually tested this? I have two instances: one is a standalone master and the other just has Spark installed, the same version of Spark (1.0.0) on both. The security group on the master allows all (0-65535) TCP and UDP traffic from the other machine, and the other machine allows all TCP/UDP

Re: Running spark-shell (or queries) over the network (not from master)

2014-09-07 Thread Ognen Duzlevski
Horacio, thanks, I have not tried that; however, I am not after security right now - I am just wondering why something so obvious won't work ;) Ognen On 9/7/2014 12:38 PM, Horacio G. de Oro wrote: Have you tried with ssh? It will be much more secure (only 1 port open), and you'll be able to run

Running spark-shell (or queries) over the network (not from master)

2014-09-05 Thread Ognen Duzlevski
Is this possible? If I have a cluster set up on EC2 and I want to run spark-shell --master spark://<master EC2 IP>:7077 from my home computer - is this possible at all, or am I wasting my time ;)? I am seeing a connection timeout when I try it. Thanks! Ognen

Re: Running spark-shell (or queries) over the network (not from master)

2014-09-05 Thread Ognen Duzlevski
On 9/5/2014 3:27 PM, anthonyjschu...@gmail.com wrote: I think that should be possible. Make sure Spark is installed on your local machine and is the same version as on the cluster. It is the same version; I can telnet to master:7077, but when I run spark-shell it times out.

Re: Running spark-shell (or queries) over the network (not from master)

2014-09-05 Thread Ognen Duzlevski
That is the command I ran and it still times out. Besides 7077, is there any other port that needs to be open? Thanks! Ognen On 9/5/2014 4:10 PM, qihong wrote: the command should be spark-shell --master spark://<master IP on EC2>:7077.

Re: Running spark-shell (or queries) over the network (not from master)

2014-09-05 Thread Ognen Duzlevski
Ah. So there is some kind of a back and forth going on. Thanks! Ognen On 9/5/2014 5:34 PM, qihong wrote: Since you are using your home computer, it's probably not reachable by EC2 from the internet. You can try to set spark.driver.host to your WAN IP and spark.driver.port to a fixed port in
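
For concreteness, a minimal sketch of the workaround described above: pinning the driver's address and port so the EC2 cluster can connect back to it. This is an illustration, not code from the thread; the master address, WAN IP, and port are placeholders, and the chosen port must be forwarded on the home router and open in the EC2 security group.

    // Hypothetical Spark 1.0-era driver configuration; all addresses are placeholders.
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setMaster("spark://<master-ec2-ip>:7077")
      .setAppName("remote-driver-test")
      .set("spark.driver.host", "<your-wan-ip>") // address the executors use to reach the driver
      .set("spark.driver.port", "51000")         // fixed port instead of a random one
    val sc = new SparkContext(conf)
    println(sc.parallelize(1 to 100).count())    // quick sanity check of the round trip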

Re: count vs countByValue in for/yield

2014-07-16 Thread Ognen Duzlevski
Hello all, can anyone offer any insight on the below? Both are legal Spark, but the first one works and the latter does not. They both work on a local machine, but in a standalone cluster the one with countByValue fails. Thanks! Ognen On 7/15/14, 2:23 PM, Ognen Duzlevski wrote: Hello, I

count vs countByValue in for/yield

2014-07-15 Thread Ognen Duzlevski
Hello, I am curious about something:

    val result = for {
      (dt, evrdd) <- evrdds
      ct = evrdd.count
    } yield (dt -> ct)

works.

    val result = for {
      (dt, evrdd) <- evrdds
      ct = evrdd.countByValue
    } yield (dt -> ct)

does not work. I get: 14/07/15 16:46:33 WARN
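
For context, a sketch of the two variants being compared, assuming evrdds is a collection of (date, RDD) pairs, which the snippet implies but does not show. count returns a single Long per RDD, while countByValue builds a value-to-count map on the driver, which is where cluster-only failures tend to surface.

    // Illustrative only; assumes evrdds: Seq[(String, RDD[String])].
    import org.apache.spark.rdd.RDD

    def totals(evrdds: Seq[(String, RDD[String])]): Seq[(String, Long)] =
      for ((dt, evrdd) <- evrdds) yield dt -> evrdd.count() // one Long per RDD

    def histograms(evrdds: Seq[(String, RDD[String])]): Seq[(String, Map[String, Long])] =
      for ((dt, evrdd) <- evrdds) yield dt -> evrdd.countByValue().toMap // whole map shipped to the driver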

Re: Problem reading in LZO compressed files

2014-07-14 Thread Ognen Duzlevski
the shell, I don't have any more pointers for you. :( On Sun, Jul 13, 2014 at 12:57 PM, Ognen Duzlevski ognen.duzlev...@gmail.com wrote: Nicholas, thanks! How do I make Spark assemble against a local version of Hadoop? I have 2.4.1 running

Problem reading in LZO compressed files

2014-07-13 Thread Ognen Duzlevski
Hello, I have been trying to play with the Google ngram dataset provided by Amazon in the form of LZO-compressed files. I am having trouble understanding what is going on ;). I have added the compression jar and native library to the underlying Hadoop/HDFS installation, restarted the name node

Re: Problem reading in LZO compressed files

2014-07-13 Thread Ognen Duzlevski
[org.apache.hadoop.io.Text]) | On a side note, here's a related JIRA issue: SPARK-2394: Make it easier to read LZO-compressed files from EC2 clusters https://issues.apache.org/jira/browse/SPARK-2394 Nick On Sun, Jul 13, 2014 at 10:49 AM, Ognen Duzlevski ognen.duzlev...@gmail.com
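
For readers hitting the same problem: one commonly used approach in this era was the hadoop-lzo input format via newAPIHadoopFile. The sketch below is an assumption-laden illustration, not the thread's resolution; it presumes the hadoop-lzo jar and its native libraries are visible to Spark, and the path is a placeholder.

    // Hypothetical; requires the twitter/hadoop-lzo classes on the classpath.
    import com.hadoop.mapreduce.LzoTextInputFormat
    import org.apache.hadoop.io.{LongWritable, Text}

    val lines = sc.newAPIHadoopFile(
      "hdfs://namenode:8020/data/ngrams/*.lzo",
      classOf[LzoTextInputFormat], classOf[LongWritable], classOf[Text]
    ).map(_._2.toString) // keep the line text, drop the byte-offset keys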

Re: Running Spark alongside Hadoop

2014-06-20 Thread Ognen Duzlevski
I only ran HDFS on the same nodes as Spark, and that worked out great, performance- and robustness-wise. However, I did not run Hadoop itself to do any computations/jobs on the same nodes. My expectation is that if you actually ran both at the same time with your configuration, the performance

Re: Announcing Spark 1.0.0

2014-05-30 Thread Ognen Duzlevski
How exciting! Congratulations! :-) Ognen On 5/30/14, 5:12 AM, Patrick Wendell wrote: I'm thrilled to announce the availability of Spark 1.0.0! Spark 1.0.0 is a milestone release as the first in the 1.0 line of releases, providing API stability for Spark's core interfaces. Spark 1.0.0 is

Calling Spark enthusiasts in Austin, TX

2014-03-31 Thread Ognen Duzlevski
In the spirit of everything being bigger and better in TX ;) - if anyone is in Austin and interested in meeting up over Spark, contact me! There seems to be a Spark meetup group in Austin that has never met, and my initial email to organize the first gathering was never acknowledged. Ognen On

Re: Do all classes involving RDD operation need to be registered?

2014-03-28 Thread Ognen Duzlevski
There is also this quote from the Tuning guide (http://spark.incubator.apache.org/docs/latest/tuning.html): Finally, if you don't register your classes, Kryo will still work, but it will have to store the full class name with each object, which is wasteful. It implies that you don't really
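
For illustration, the registration pattern the quoted guide refers to, in its Spark 0.9-era form; MyEvent is a hypothetical application class, not one from the thread.

    import com.esotericsoftware.kryo.Kryo
    import org.apache.spark.serializer.KryoRegistrator

    case class MyEvent(user: String, ts: Long) // hypothetical class to register

    class MyRegistrator extends KryoRegistrator {
      override def registerClasses(kryo: Kryo) {
        kryo.register(classOf[MyEvent]) // avoids storing the full class name with each object
      }
    }

    // Enabled via (0.9-era style):
    //   System.setProperty("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    //   System.setProperty("spark.kryo.registrator", "MyRegistrator")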

Re: GC overhead limit exceeded

2014-03-27 Thread Ognen Duzlevski
Look at the tuning guide on Spark's web page for strategies to cope with this. I have run into quite a few memory issues like these; some are resolved by changing the StorageLevel strategy and employing things like Kryo, and some are solved by specifying the number of tasks to break down a given
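
A short sketch of the two remedies mentioned, for concreteness; the path and the task count are illustrative values, not from the thread.

    import org.apache.spark.storage.StorageLevel

    val data = sc.textFile("hdfs://namenode:8020/data/events")
    data.persist(StorageLevel.MEMORY_AND_DISK_SER) // serialized blocks, spilled to disk under memory pressure
    // More reduce tasks means less data held per task during the shuffle:
    val counts = data.map(line => (line, 1L)).reduceByKey(_ + _, 512)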

Re: java.lang.ClassNotFoundException

2014-03-26 Thread Ognen Duzlevski
Have you looked at the individual nodes' logs? Can you post a bit more of the exception's output? On 3/26/14, 8:42 AM, Jaonary Rabarisoa wrote: Hi all, I got a java.lang.ClassNotFoundException even with addJar called. The jar file is present on each node. I use the version of Spark from

Re: java.lang.ClassNotFoundException

2014-03-26 Thread Ognen Duzlevski
On Wed, Mar 26, 2014 at 3:34 PM, Ognen Duzlevski og...@plainvanillagames.com wrote: Have you looked at the individual nodes' logs? Can you post a bit more of the exception's output? On 3/26/14, 8:42 AM, Jaonary Rabarisoa wrote

Re: Writing RDDs to HDFS

2014-03-25 Thread Ognen Duzlevski
spark.executor.memory. Just curious if I did something wrong. On Mon, Mar 24, 2014 at 7:48 PM, Ognen Duzlevski og...@plainvanillagames.com wrote: Just so I can close this thread (in case anyone else runs into this stuff) - I did sleep through the basics of Spark ;). The answer to why my job

Re: No space left on device exception

2014-03-24 Thread Ognen Duzlevski
? If so, that's good to know because it's definitely counterintuitive. On Sun, Mar 23, 2014 at 8:36 PM, Ognen Duzlevski og...@nengoiksvelzud.com wrote: I would love to work on this (and other) stuff if I can bother someone with questions offline or on a dev mailing list. Ognen On 3/23/14, 10:04 PM

Re: No space left on device exception

2014-03-24 Thread Ognen Duzlevski
the rest of the slaves+master) and increasing. Ognen On 3/24/14, 7:00 AM, Ognen Duzlevski wrote: Patrick, correct. I have a 16-node cluster. On 14 machines out of 16, the inode usage was about 50%. On two of the slaves, one had inode usage of 96% and on the other it was 100%. When I went

Re: quick start guide: building a standalone scala program

2014-03-24 Thread Ognen Duzlevski
Diana, anywhere on the filesystem you have read/write access (you need not be in your Spark home directory):

    mkdir myproject
    cd myproject
    mkdir project
    mkdir target
    mkdir -p src/main/scala
    cp $mypath/$mymysource.scala src/main/scala/
    cp $mypath/myproject.sbt .

Make sure that myproject.sbt
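
For completeness, a guess at what a minimal myproject.sbt of that era might contain, following the Spark 0.9 quick start; the project name and version numbers are illustrative.

    name := "MyProject"

    version := "0.1"

    scalaVersion := "2.10.3"

    libraryDependencies += "org.apache.spark" %% "spark-core" % "0.9.0-incubating"

    resolvers += "Akka Repository" at "http://repo.akka.io/releases/"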

Re: quick start guide: building a standalone scala program

2014-03-24 Thread Ognen Duzlevski
/usr/lib/spark/sbt/sbt: line 33: sbt/sbt-launch-.jar: No such file or directory Our attempt to download sbt locally to sbt/sbt-launch-.jar failed. Please install sbt manually from http://www.scala-sbt.org/ On Mon, Mar 24, 2014 at 4:25 PM, Ognen Duzlevski og...@plainvanillagames.com

Re: Writing RDDs to HDFS

2014-03-24 Thread Ognen Duzlevski
, to minimize network traffic. It's how Hadoop works, too.) On Mon, Mar 24, 2014 at 5:00 PM, Ognen Duzlevski og...@nengoiksvelzud.com wrote: Is someRDD.saveAsTextFile("hdfs://ip:port/path/final_filename.txt") supposed to work? Meaning, can I save files

Re: Writing RDDs to HDFS

2014-03-24 Thread Ognen Duzlevski
.) (Presumably it does this because it allows each partition to be saved on the local disk, to minimize network traffic. It's how Hadoop works, too.) On Mon, Mar 24, 2014 at 5:00 PM, Ognen Duzlevski og...@nengoiksvelzud.com wrote
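
To make the behavior discussed above concrete, a small example under assumed names (the namenode address and path are placeholders). Note that saveAsTextFile writes a directory of part-files, one per partition, rather than a single file.

    val someRDD = sc.parallelize(1 to 1000).map(_.toString)
    someRDD.saveAsTextFile("hdfs://namenode:8020/user/ognen/final_output")
    // Creates final_output/part-00000, part-00001, ... plus a _SUCCESS marker.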

No space left on device exception

2014-03-23 Thread Ognen Duzlevski
Hello, I have a weird error showing up when I run a job on my Spark cluster. The version of Spark is 0.9 and I have 3+ GB free on the disk when this error shows up. Any ideas what I should be looking for? [error] (run-main-0) org.apache.spark.SparkException: Job aborted: Task 167.0:3 failed

Re: No space left on device exception

2014-03-23 Thread Ognen Duzlevski
On 3/23/14, 5:49 PM, Matei Zaharia wrote: You can set spark.local.dir to put this data somewhere other than /tmp if /tmp is full. Actually it's recommended to have multiple local disks and set it to a comma-separated list of directories, one per disk. Matei, does the number of tasks/partitions
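
As an illustration of the suggestion, with placeholder mount points:

    // Hypothetical: spread shuffle/spill files across several physical disks.
    val conf = new org.apache.spark.SparkConf()
      .set("spark.local.dir", "/mnt1/spark,/mnt2/spark,/mnt3/spark")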

Re: No space left on device exception

2014-03-23 Thread Ognen Duzlevski
partition is particularly small. You might look at the actual executors' logs, as it's possible that this error was caused by an earlier exception, such as too many open files. On Sun, Mar 23, 2014 at 4:46 PM, Ognen Duzlevski og...@plainvanillagames.com wrote

Re: No space left on device exception

2014-03-23 Thread Ognen Duzlevski
(and sorry for the noise)! Ognen On 3/23/14, 9:52 PM, Ognen Duzlevski wrote: Aaron, thanks for replying. I am very much puzzled as to what is going on. A job that used to run on the same cluster is failing with this mysterious message about not having enough disk space, when in fact I can see

Re: No space left on device exception

2014-03-23 Thread Ognen Duzlevski
which is not on our current roadmap for state cleanup (cleaning up data which was not fully cleaned up from a crashed process). On Sun, Mar 23, 2014 at 7:57 PM, Ognen Duzlevski og...@plainvanillagames.com wrote: Bleh, strike that, one of my slaves

Parallelizing job execution

2014-03-21 Thread Ognen Duzlevski
Hello, I have a task that runs on a week's worth of data (say) and produces a Set of tuples such as Set[(String,Long)] (essentially the output of countByValue.toMap). I want to produce four sets, one for each of four different weeks, and run an intersection of the four sets. I have the sequential
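
One plausible way to run the four weekly jobs concurrently against a shared SparkContext is with Scala futures; this is a sketch under assumed inputs (the per-week HDFS paths are placeholders), not the solution from the thread.

    import scala.concurrent.{Await, Future}
    import scala.concurrent.ExecutionContext.Implicits.global
    import scala.concurrent.duration.Duration

    val weeks = Seq("week1", "week2", "week3", "week4") // hypothetical per-week directories
    val futures = weeks.map { w =>
      Future {
        // Each future submits an independent job; Spark's scheduler accepts concurrent submissions.
        sc.textFile(s"hdfs://namenode:8020/data/$w/*").countByValue().toSet // Set[(String, Long)]
      }
    }
    val sets = Await.result(Future.sequence(futures), Duration.Inf)
    val common = sets.reduce(_ intersect _) // intersection of the four weekly sets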

Re: Error reading HDFS file using spark 0.9.0 / hadoop 2.2.0 - incompatible protobuf 2.5 and 2.4.1

2014-03-18 Thread Ognen Duzlevski
On 3/18/14, 4:49 AM, dmpou...@gmail.com wrote: On Sunday, 2 March 2014 19:19:49 UTC+2, Aureliano Buendia wrote: Is there a reason for Spark using the older Akka? On Sun, Mar 2, 2014 at 1:53 PM, 1esha alexey.r...@gmail.com wrote: The problem is in Akka remote. It contains files compiled

parsing json within rdd's filter()

2014-03-13 Thread Ognen Duzlevski
Hello, is there anything special about calling functions that parse JSON lines from filter()? I have code that looks like this:

    def jsonMatches(line: String): Boolean = {
      // take a line in json format
      val jline = parse(line)
      val je = jline \ "event"
      if (je != JNothing && je.values.toString ==

Re: parsing json within rdd's filter()

2014-03-13 Thread Ognen Duzlevski
p...@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/ On Thu, Mar 13, 2014 at 8:04 AM, Ognen Duzlevski og...@nengoiksvelzud.com wrote: Hello, Is there anything special about calling functions that parse json

Re: parsing json within rdd's filter()

2014-03-13 Thread Ognen Duzlevski
| Multifarious, Inc. | http://mult.ifario.us/ On Thu, Mar 13, 2014 at 9:20 AM, Ognen Duzlevski og...@plainvanillagames.com wrote: Hmm. The whole thing is packaged in a .jar file and I execute .addJar on the SparkContext. My expectation is that the whole

Re: parsing json within rdd's filter()

2014-03-13 Thread Ognen Duzlevski
I even tried this: def jsonMatches(line: String): Boolean = true. It is still failing with the same error. Ognen On 3/13/14, 11:45 AM, Ognen Duzlevski wrote: I must be really dense! :) Here is the most simplified version of the code; I removed a bunch of stuff and hard-coded the event

Re: parsing json within rdd's filter()

2014-03-13 Thread Ognen Duzlevski
OK, problem solved. Interesting thing: I separated the jsonMatches function below and put it into a separate file/object as a method. Once done that way, it all serializes and works. Ognen On 3/13/14, 11:52 AM, Ognen Duzlevski wrote: I even tried this: def jsonMatches(line:String
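
A sketch of the fix described above, reconstructed from the thread's fragments: keeping the predicate in a standalone object means the closure passed to filter() no longer drags in a non-serializable enclosing class. The json4s imports and the "login" event name are assumptions.

    import org.json4s._
    import org.json4s.native.JsonMethods._

    object JsonFilters { // standalone object: nothing extra is captured by the closure
      def jsonMatches(line: String, wanted: String): Boolean = {
        val jline = parse(line)
        val je = jline \ "event"
        je != JNothing && je.values.toString == wanted
      }
    }

    // Usage:
    val matching = rdd.filter(line => JsonFilters.jsonMatches(line, "login"))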

Re: Sharing SparkContext

2014-03-10 Thread Ognen Duzlevski
Are you using it with HDFS? What version of Hadoop? 1.0.4? Ognen On 3/10/14, 8:49 PM, abhinav chowdary wrote: for anyone who is interested in the job server from Ooyala: we started using it recently and it has been working great so far. On Feb 25, 2014 9:23 PM, Ognen Duzlevski og

Can anyone offer any insight at all?

2014-03-07 Thread Ognen Duzlevski
What is wrong with this code? A condensed set of this code works in the spark-shell. It does not work when deployed via a jar.

    def calcSimpleRetention(start: String, end: String, event1: String, event2: String): List[Double] = {
      val spd = new PipelineDate(start)
      val epd = new

Re: Can anyone offer any insight at all?

2014-03-07 Thread Ognen Duzlevski
Strike that. Figured it out. Don't you just hate it when you fire off an email and you figure it out as it is being sent? ;) Ognen On 3/7/14, 12:41 PM, Ognen Duzlevski wrote: What is wrong with this code? A condensed set of this code works in the spark-shell. It does not work when deployed

Re: Can anyone offer any insight at all?

2014-03-07 Thread Ognen Duzlevski
@mayur_rustagi https://twitter.com/mayur_rustagi On Fri, Mar 7, 2014 at 10:43 AM, Ognen Duzlevski og...@plainvanillagames.com wrote: Strike that. Figured it out. Don't you just hate it when you fire off an email and you figure it out as it is being

Re: [BLOG] Spark on Cassandra w/ Calliope

2014-03-07 Thread Ognen Duzlevski
Nice, thanks :) Ognen On 3/7/14, 2:48 PM, Brian O'Neill wrote: FWIW - I posted some notes to help people get started quickly with Spark on C*. http://brianoneill.blogspot.com/2014/03/spark-on-cassandra-w-calliope.html (tnx again to Rohit and team for all of their help) -brian -- Brian

Re: Running actions in loops

2014-03-07 Thread Ognen Duzlevski
https://twitter.com/mayur_rustagi On Thu, Mar 6, 2014 at 9:50 PM, Ognen Duzlevski og...@plainvanillagames.com wrote: It looks like the problem is in the filter task - is there anything special about filter()? I have removed the filter line from the loops just

Running actions in loops

2014-03-06 Thread Ognen Duzlevski
Hello, what is the general approach people take when trying to do analysis across multiple large files, where the data to be extracted from a successive file depends on the data extracted from a previous file or set of files? For example, I have the following: a group of HDFS files, each

Re: Running actions in loops

2014-03-06 Thread Ognen Duzlevski
It looks like the problem is in the filter task - is there anything special about filter()? I have removed the filter line from the loops just to see if things will work, and they do. Does anyone have any ideas? Thanks! Ognen On 3/6/14, 9:39 PM, Ognen Duzlevski wrote: Hello, What is the general

Re: Spark Worker crashing and Master not seeing recovered worker

2014-03-05 Thread Ognen Duzlevski
Rob, I have seen this too. I have 16 nodes in my Spark cluster, and for some reason (after app failures) one of the workers will go offline. I will ssh to the machine in question and find that the Java process is running, but for some reason the master is not noticing this. I have not had the

Re: Actors and sparkcontext actions

2014-03-04 Thread Ognen Duzlevski
Deb, On 3/4/14, 9:02 AM, Debasish Das wrote: Hi Ognen, Any particular reason for choosing Scalatra over options like Play or Spray? Is Scalatra much better at serving APIs, or is it due to its similarity with Ruby's Sinatra? Did you try the other options and then pick Scalatra? Not really.

Actors and sparkcontext actions

2014-02-26 Thread Ognen Duzlevski
Can someone point me to a simple, short code example of creating a basic Actor that gets a context and runs an operation such as .textFile.count? I am trying to figure out how to create just a basic actor that gets a message like this: case class Msg(filename: String, ctx: SparkContext) and
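
A minimal sketch of such an actor, using Akka as bundled with Spark at the time; all names are illustrative and the HDFS path is a placeholder.

    import akka.actor.{Actor, ActorSystem, Props}
    import org.apache.spark.SparkContext

    case class Msg(filename: String, ctx: SparkContext)

    class CountActor extends Actor {
      def receive = {
        case Msg(filename, ctx) =>
          val n = ctx.textFile(filename).count() // blocks this actor until the job finishes
          sender ! n
      }
    }

    // Usage sketch:
    //   val system = ActorSystem("demo")
    //   val counter = system.actorOf(Props[CountActor], "counter")
    //   counter ! Msg("hdfs://namenode:8020/data/file.txt", sc)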