Hi Harut,
Jeff's right that Kibana + Elasticsearch can take you quite far out of the
box. Depending on your volume of data, you may only be able to keep recent
data around though.
Another option that is custom-built for handling many dimensions at query
time (not as separate metrics) is Druid.
I've also considered using Kafka to pass messages between the Web UI and the
pipes; I think it will fit. Chaining the pipes together as a workflow, and
implementing, managing, and monitoring these long-running user tasks with
the locality I need, is still causing me headaches.
You can look at Apache
I have this same question. Isn't there somewhere that the Kafka range
metadata can be saved? From my naive perspective, it seems like it should
be very similar to HDFS lineage. The original HDFS blocks are kept
somewhere (in the driver?) so that if an RDD partition is lost, it can be
recomputed.
As for the work that Aaron mentioned is happening, I think he might be
referring to the discussion and code surrounding
https://issues.apache.org/jira/browse/SPARK-983
Cheers!
Andrew
On Thu, Jun 5, 2014 at 5:16 PM, Roger Hoover roger.hoo...@gmail.com
wrote:
I think it would be very handy to be able
Hi Aaron,
When you say that sorting is being worked on, can you elaborate a little
more please?
In particular, I want to sort the items within each partition (not
globally) without necessarily bringing them all into memory at once.
Thanks,
Roger
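The per-partition sort Roger describes can be sketched with plain Scala collections (no Spark dependency; the function name is mine). Note that this toy version does pull each partition into memory via `sorted`, which is exactly what the question hopes to avoid; the work being discussed would instead sort during the shuffle.

```scala
// Sketch: sort each partition independently, leaving the global order alone.
// In Spark this logic would run inside rdd.mapPartitions; here plain Seqs
// stand in for partitions. Sorting in memory per partition is the limitation
// the question wants to get past.
def sortWithinPartitions[T](partitions: Seq[Seq[T]])(implicit ord: Ordering[T]): Seq[Seq[T]] =
  partitions.map(_.sorted)
```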
On Sat, May 31, 2014 at 11:10 PM, Aaron
I think it would be very handy to be able to specify that you want sorting
during a partitioning stage.
AM, Roger Hoover roger.hoo...@gmail.com
wrote:
Thanks, Andrew. I'll give it a try.
On Mon, May 26, 2014 at 2:22 PM, Andrew Or and...@databricks.com wrote:
Hi Roger,
This was due to a bug in the Spark shell code, and it is fixed in the latest
master (and RC11). Here is the commit that fixed it:
/8edbee7d1b4afc192d97ba192a5526affc464205.
Try it now and it should work. :)
Andrew
2014-05-26 10:35 GMT+02:00 Perttu Ranta-aho ranta...@iki.fi:
Hi Roger,
Were you able to solve this?
-Perttu
On Tue, Apr 29, 2014 at 8:11 AM, Roger Hoover roger.hoo...@gmail.com wrote:
Patrick,
Thank you
The return type should be RDD[(Int, Int, Int)] because sc.textFile()
returns an RDD. Try adding an import for the RDD type to get rid of the
compile error.
import org.apache.spark.rdd.RDD
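As an illustration of the parsing such a method might do, here is a plain-Scala sketch (parseTriples is a hypothetical name; in Spark the input would be the RDD[String] from sc.textFile and the result an RDD[(Int, Int, Int)]):

```scala
// Hypothetical sketch: parse comma-separated lines into (Int, Int, Int)
// tuples. Plain Seqs stand in for RDDs so the example runs without Spark;
// the same .map would apply unchanged to an RDD[String].
def parseTriples(lines: Seq[String]): Seq[(Int, Int, Int)] =
  lines.map { line =>
    val Array(a, b, c) = line.split(",").map(_.trim.toInt)
    (a, b, c)
  }
```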
On Mon, Apr 28, 2014 at 6:22 PM, SK skrishna...@gmail.com wrote:
Hi,
I am a new user of Spark. I have
that method from the SBT shell, that should work.
Matei
On Apr 27, 2014, at 3:14 PM, Roger Hoover roger.hoo...@gmail.com wrote:
Hi,
From the meetup talk about the 1.0 release, I saw that spark-submit will
be the preferred way to launch apps going forward.
How do you recommend launching
When I do that in the Scala REPL, it works.
BTW, I'm using the latest code from the master branch
(8421034e793c0960373a0a1d694ce334ad36e747)
On Mon, Apr 28, 2014 at 3:40 PM, Roger Hoover roger.hoo...@gmail.com wrote:
Matei, thank you. That seemed to work but I'm not able to import a class
this or the --jars flag should work, but it's possible
there is a bug with the --jars flag when calling the Repl.
On Mon, Apr 28, 2014 at 4:30 PM, Roger Hoover roger.hoo...@gmail.com wrote:
A couple of issues:
1) the jar doesn't show up on the classpath even though SparkSubmit had
it in the --jars
Hi,
From the meetup talk about the 1.0 release, I saw that spark-submit will be
the preferred way to launch apps going forward.
How do you recommend launching such jobs in a development cycle? For
example, how can I load an app that's expecting to be given to spark-submit
into spark-shell?
need help with?
On Wed, Apr 16, 2014 at 7:11 PM, Roger Hoover roger.hoo...@gmail.com wrote:
Ah, in case this helps others, looks like RDD.zipPartitions will
accomplish step 4.
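For readers following along, the idea behind RDD.zipPartitions can be sketched with plain Scala collections (the names and the pairwise-zip combiner below are mine, not Spark's implementation):

```scala
// Sketch of what zipPartitions does per partition pair: line up
// corresponding partitions from two datasets and let a user function
// combine the two iterators into one output iterator. This assumes both
// sides have the same number of partitions, as Spark requires.
def zipPartitions[A, B, C](left: Seq[Seq[A]], right: Seq[Seq[B]])
                          (f: (Iterator[A], Iterator[B]) => Iterator[C]): Seq[Seq[C]] =
  left.zip(right).map { case (l, r) => f(l.iterator, r.iterator).toSeq }
```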
On Tue, Apr 15, 2014 at 10:44 AM, Roger Hoover roger.hoo...@gmail.com wrote:
Andrew,
Thank you very much for your
to have the
cartesian product work against you at scale at that point.
Andrew
On Tue, Apr 15, 2014 at 1:07 AM, Roger Hoover roger.hoo...@gmail.com wrote:
Hi,
I'm trying to figure out how to join two RDDs with different key types
and appreciate any suggestions.
Say I have two RDDs
I'm thinking of creating a union type for the key so that IPRange and IP
types can be joined.
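A minimal sketch of such a union key, with hypothetical names (the source doesn't show the actual types):

```scala
// Hypothetical union key: a sealed trait with one case per original key
// type, so both RDDs can be re-keyed to JoinKey before a join/cogroup.
sealed trait JoinKey
final case class IpKey(ip: String) extends JoinKey
final case class RangeKey(startIp: String, endIp: String) extends JoinKey
```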
On Tue, Apr 15, 2014 at 10:44 AM, Roger Hoover roger.hoo...@gmail.com wrote:
Andrew,
Thank you very much for your feedback. Unfortunately, the ranges are not
of predictable size but you gave me
Hi,
I'm trying to figure out how to join two RDDs with different key types and
appreciate any suggestions.
Say I have two RDDs:
ipToUrl of type (IP, String)
ipRangeToZip of type (IPRange, String)
How can I join/cogroup these two RDDs together to produce a new RDD of type
(IP, (String,
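Purely for illustration, here is that range join in plain Scala with IPs simplified to Ints (IpRange and rangeJoin are hypothetical names; the nested loop below is precisely the cartesian-product blowup a real Spark job would need to avoid, e.g. by bucketing ranges):

```scala
// Simplified range join: match each IP against a list of ranges. IPs are
// Ints for brevity; IpRange is a stand-in for the real range type. This is
// O(n*m) — fine as a spec of the desired output, not as an implementation.
final case class IpRange(start: Int, end: Int)

def rangeJoin(ipToUrl: Seq[(Int, String)],
              rangeToZip: Seq[(IpRange, String)]): Seq[(Int, (String, String))] =
  for {
    (ip, url)    <- ipToUrl
    (range, zip) <- rangeToZip
    if range.start <= ip && ip <= range.end
  } yield (ip, (url, zip))
```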
Can anyone comment on their experience running Spark Streaming in
production?
On Thu, Apr 10, 2014 at 10:33 AM, Dmitriy Lyubimov dlie...@gmail.com wrote:
On Thu, Apr 10, 2014 at 9:24 AM, Andrew Ash and...@andrewash.com wrote:
The biggest issue I've come across is that the cluster is