Re: Visualizing Spark Streaming data

2015-03-20 Thread Roger Hoover
Hi Harut, Jeff's right that Kibana + Elasticsearch can take you quite far out of the box. Depending on your volume of data, you may only be able to keep recent data around though. Another option that is custom-built for handling many dimensions at query time (not as separate metrics) is Druid

Re: Is Spark the right tool for me?

2014-12-02 Thread Roger Hoover
I’ve also considered using Kafka for messaging between the Web UI and the pipes; I think it will fit. Chaining the pipes together as a workflow and implementing, managing, and monitoring these long-running user tasks with locality as I need them is still causing me a headache. You can look at Apache

Re: Low Level Kafka Consumer for Spark

2014-08-30 Thread Roger Hoover
I have this same question. Isn't there somewhere that the Kafka range metadata can be saved? From my naive perspective, it seems like it should be very similar to HDFS lineage. The original HDFS blocks are kept somewhere (in the driver?) so that if an RDD partition is lost, it can be

Re: Using Spark on Data size larger than Memory size

2014-06-06 Thread Roger Hoover
as the work that Aaron mentioned is happening, I think he might be referring to the discussion and code surrounding https://issues.apache.org/jira/browse/SPARK-983 Cheers! Andrew On Thu, Jun 5, 2014 at 5:16 PM, Roger Hoover roger.hoo...@gmail.com wrote: I think it would be very handy to be able

Re: Using Spark on Data size larger than Memory size

2014-06-05 Thread Roger Hoover
Hi Aaron, When you say that sorting is being worked on, can you elaborate a little more please? In particular, I want to sort the items within each partition (not globally) without necessarily bringing them all into memory at once. Thanks, Roger On Sat, May 31, 2014 at 11:10 PM, Aaron
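The in-partition sort being asked about can be sketched as a pure function over each partition's iterator, handed to `RDD.mapPartitions`. This is a minimal sketch only: it materializes each partition in memory at once, which is exactly the limitation under discussion, and the key/value types are illustrative.

```scala
// Sort the elements of one partition by key. toVector pulls the whole
// partition into memory, so this does not address the spill-to-disk case.
def sortPartition[K: Ordering, V](iter: Iterator[(K, V)]): Iterator[(K, V)] =
  iter.toVector.sortBy(_._1).iterator

// Intended use on a pair RDD (requires a SparkContext, not shown here):
//   rdd.mapPartitions(sortPartition[Int, String], preservesPartitioning = true)
```

Later Spark releases added `repartitionAndSortWithinPartitions` on pair RDDs, which pushes the sort into the shuffle machinery instead of doing it per partition afterward.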

Re: Using Spark on Data size larger than Memory size

2014-06-05 Thread Roger Hoover
I think it would be very handy to be able to specify that you want sorting during a partitioning stage. On Thu, Jun 5, 2014 at 4:42 PM, Roger Hoover roger.hoo...@gmail.com wrote: Hi Aaron, When you say that sorting is being worked on, can you elaborate a little more please? In particular, I

Re: Running a spark-submit compatible app in spark-shell

2014-06-04 Thread Roger Hoover
AM, Roger Hoover roger.hoo...@gmail.com wrote: Thanks, Andrew. I'll give it a try. On Mon, May 26, 2014 at 2:22 PM, Andrew Or and...@databricks.com wrote: Hi Roger, This was due to a bug in the Spark shell code, and is fixed in the latest master (and RC11). Here is the commit that fixed

Re: Running a spark-submit compatible app in spark-shell

2014-05-27 Thread Roger Hoover
/8edbee7d1b4afc192d97ba192a5526affc464205. Try it now and it should work. :) Andrew 2014-05-26 10:35 GMT+02:00 Perttu Ranta-aho ranta...@iki.fi: Hi Roger, Were you able to solve this? -Perttu On Tue, Apr 29, 2014 at 8:11 AM, Roger Hoover roger.hoo...@gmail.com wrote: Patrick, Thank you

Re: How to declare Tuple return type for a function

2014-04-29 Thread Roger Hoover
The return type should be RDD[(Int, Int, Int)] because sc.textFile() returns an RDD. Try adding an import for the RDD type to get rid of the compile error. import org.apache.spark.rdd.RDD On Mon, Apr 28, 2014 at 6:22 PM, SK skrishna...@gmail.com wrote: Hi, I am a new user of Spark. I have
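The advice above can be sketched as follows. The three-column integer layout, the field separator, and the function names are made up for illustration; the point is the `RDD[(Int, Int, Int)]` annotation and the import that makes it resolve.

```scala
// The import that resolves the RDD type in a signature like RDD[(Int, Int, Int)]:
//   import org.apache.spark.rdd.RDD
// The row-parsing step is plain Scala and is shown standalone here.
def parseTriple(line: String): (Int, Int, Int) = {
  val f = line.split(",").map(_.trim.toInt)
  (f(0), f(1), f(2))
}

// Intended use (requires a SparkContext):
//   def loadTriples(sc: SparkContext, path: String): RDD[(Int, Int, Int)] =
//     sc.textFile(path).map(parseTriple)
```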

Re: Running a spark-submit compatible app in spark-shell

2014-04-28 Thread Roger Hoover
that method from the SBT shell, that should work. Matei On Apr 27, 2014, at 3:14 PM, Roger Hoover roger.hoo...@gmail.com wrote: Hi, From the meetup talk about the 1.0 release, I saw that spark-submit will be the preferred way to launch apps going forward. How do you recommend launching

Re: Running a spark-submit compatible app in spark-shell

2014-04-28 Thread Roger Hoover
When I do that in the scala repl, it works. BTW, I'm using the latest code from the master branch (8421034e793c0960373a0a1d694ce334ad36e747) On Mon, Apr 28, 2014 at 3:40 PM, Roger Hoover roger.hoo...@gmail.com wrote: Matei, thank you. That seemed to work but I'm not able to import a class

Re: Running a spark-submit compatible app in spark-shell

2014-04-28 Thread Roger Hoover
this or the --jars flag should work, but it's possible there is a bug with the --jars flag when calling the Repl. On Mon, Apr 28, 2014 at 4:30 PM, Roger Hoover roger.hoo...@gmail.com wrote: A couple of issues: 1) the jar doesn't show up on the classpath even though SparkSubmit had it in the --jars

Running a spark-submit compatible app in spark-shell

2014-04-27 Thread Roger Hoover
Hi, From the meetup talk about the 1.0 release, I saw that spark-submit will be the preferred way to launch apps going forward. How do you recommend launching such jobs in a development cycle? For example, how can I load an app that's expecting to be given to spark-submit into spark-shell?
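The approach discussed in the replies to this thread is to build the application jar and put it on the shell's classpath so its classes can be imported interactively. A sketch with an illustrative jar path (note that at the time, a bug in the shell's --jars handling got in the way and was fixed in a later master/RC build):

```shell
# Package the application, then start a shell with the jar on the classpath.
# Paths and Scala version are illustrative.
sbt package
spark-shell --jars target/scala-2.10/myapp_2.10-0.1.jar
```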

Re: How to cogroup/join pair RDDs with different key types?

2014-04-16 Thread Roger Hoover
need help with? On Wed, Apr 16, 2014 at 7:11 PM, Roger Hoover roger.hoo...@gmail.com wrote: Ah, in case this helps others, looks like RDD.zipPartitions will accomplish step 4. On Tue, Apr 15, 2014 at 10:44 AM, Roger Hoover roger.hoo...@gmail.com wrote: Andrew, Thank you very much for your

Re: How to cogroup/join pair RDDs with different key types?

2014-04-15 Thread Roger Hoover
to have the cartesian product work against you at scale at that point. Andrew On Tue, Apr 15, 2014 at 1:07 AM, Roger Hoover roger.hoo...@gmail.com wrote: Hi, I'm trying to figure out how to join two RDDs with different key types and appreciate any suggestions. Say I have two RDDs

Re: How to cogroup/join pair RDDs with different key types?

2014-04-15 Thread Roger Hoover
I'm thinking of creating a union type for the key so that IPRange and IP types can be joined. On Tue, Apr 15, 2014 at 10:44 AM, Roger Hoover roger.hoo...@gmail.com wrote: Andrew, Thank you very much for your feedback. Unfortunately, the ranges are not of predictable size but you gave me

How to cogroup/join pair RDDs with different key types?

2014-04-14 Thread Roger Hoover
Hi, I'm trying to figure out how to join two RDDs with different key types and appreciate any suggestions. Say I have two RDDs: ipToUrl of type (IP, String) ipRangeToZip of type (IPRange, String) How can I join/cogroup these two RDDs together to produce a new RDD of type (IP, (String,
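The replies in this thread point out that a plain cartesian product scales poorly. When the range side is small enough to collect, one common alternative is a broadcast-style lookup instead of a join. This is a sketch under that assumption; `IPRange` as a numeric interval, its `contains` test, and the `lookup` helper are all made-up illustrations, not types from the original emails.

```scala
// Made-up range type: an inclusive interval over numeric IP addresses.
case class IPRange(lo: Long, hi: Long) {
  def contains(ip: Long): Boolean = lo <= ip && ip <= hi
}

// Return the value attached to the first range containing the given IP, if any.
def lookup(ranges: Seq[(IPRange, String)], ip: Long): Option[String] =
  ranges.collectFirst { case (r, v) if r.contains(ip) => v }

// Intended use with Spark, broadcasting the small side (not compiled here):
//   val bc = sc.broadcast(ipRangeToZip.collect().toSeq)
//   ipToUrl.map { case (ip, url) => (ip, (url, lookup(bc.value, ip))) }
```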

Re: Spark - ready for prime time?

2014-04-10 Thread Roger Hoover
Can anyone comment on their experience running Spark Streaming in production? On Thu, Apr 10, 2014 at 10:33 AM, Dmitriy Lyubimov dlie...@gmail.com wrote: On Thu, Apr 10, 2014 at 9:24 AM, Andrew Ash and...@andrewash.com wrote: The biggest issue I've come across is that the cluster is