Re: [ERROR] Insufficient Space

2015-06-19 Thread Vadim Bichutskiy
if there is, it probably still requires manual work on the new nodes. This would be the advantage of EMR over EC2, as we take care of all of that configuration. ~ Jonathan From: Vadim Bichutskiy vadim.bichuts...@gmail.com Date: Friday, June 19, 2015 at 5:21 PM To: Jonathan Kelly jonat

Re: [ERROR] Insufficient Space

2015-06-19 Thread Vadim Bichutskiy
: Would you be able to use Spark on EMR rather than on EC2? EMR clusters allow easy resizing of the cluster, and EMR also now supports Spark 1.3.1 as of EMR AMI 3.8.0. See http://aws.amazon.com/emr/spark ~ Jonathan From: Vadim Bichutskiy vadim.bichuts...@gmail.com Date: Friday, June 19, 2015

[ERROR] Insufficient Space

2015-06-19 Thread Vadim Bichutskiy
Hello Spark Experts, I've been running a standalone Spark cluster on EC2 for a few months now, and today I get this error: IOError: [Errno 28] No space left on device Spark assembly has been built with Hive, including Datanucleus jars on classpath OpenJDK 64-Bit Server VM warning: Insufficient

Re: Is anyone using Amazon EC2?

2015-05-23 Thread Vadim Bichutskiy
Yes, we're running Spark on EC2. Will transition to EMR soon. -Vadim On Sat, May 23, 2015 at 2:22 PM, Johan Beisser j...@caustic.org wrote: Yes. We're looking at bootstrapping in EMR... On Sat, May 23, 2015 at 07:21 Joe Wass jw...@crossref.org wrote: I used Spark on EC2 a while ago

Re: textFileStream Question

2015-05-17 Thread Vadim Bichutskiy
/FileInputDStream.scala#L172 Thanks Best Regards On Fri, May 15, 2015 at 2:25 AM, Vadim Bichutskiy vadim.bichuts...@gmail.com wrote: How does textFileStream work behind the scenes? How does Spark Streaming know what files are new and need to be processed? Is it based on time stamp, file

Re: DStream Union vs. StreamingContext Union

2015-05-14 Thread Vadim Bichutskiy
, 2015 at 9:53 AM, Evo Eftimov evo.efti...@isecc.com wrote: I can confirm it does work in Java From: Vadim Bichutskiy [mailto:vadim.bichuts...@gmail.com] Sent: Tuesday, May 12, 2015 5:53 PM To: Evo Eftimov Cc: Saisai Shao; user@spark.apache.org Subject: Re: DStream Union vs

textFileStream Question

2015-05-14 Thread Vadim Bichutskiy
How does textFileStream work behind the scenes? How does Spark Streaming know what files are new and need to be processed? Is it based on time stamp, file name? Thanks, Vadim
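
For reference, a minimal PySpark sketch of the pattern under discussion (the S3 path and batch interval are placeholders, not from this thread). Per the FileInputDStream code linked in the reply to this message, file streams select files by modification time, so only files that appear in the monitored directory after the stream starts are processed, each exactly once:

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="TextFileStreamSketch")
    ssc = StreamingContext(sc, 10)  # 10-second batches (placeholder interval)

    # Files dropped into the monitored directory after the stream starts are
    # detected by modification time and processed once.
    lines = ssc.textFileStream("s3n://my-bucket/incoming/")  # placeholder path
    lines.pprint()

    ssc.start()
    ssc.awaitTermination()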

Re: DStream Union vs. StreamingContext Union

2015-05-12 Thread Vadim Bichutskiy
...@isecc.com wrote: I can confirm it does work in Java From: Vadim Bichutskiy [mailto:vadim.bichuts...@gmail.com] Sent: Tuesday, May 12, 2015 5:53 PM To: Evo Eftimov Cc: Saisai Shao; user@spark.apache.org Subject: Re: DStream Union vs. StreamingContext Union Thanks Evo. I tried

Re: DStream Union vs. StreamingContext Union

2015-05-12 Thread Vadim Bichutskiy
Vadim Bichutskiy vadim.bichuts...@gmail.com: Can someone explain to me the difference between DStream union and StreamingContext union? When do you use one vs the other? Thanks, Vadim

Re: DStream Union vs. StreamingContext Union

2015-05-12 Thread Vadim Bichutskiy
multiple DstreamRDDs in this way DstreamRDD1.union(DstreamRDD2).union(DstreamRDD3) etc etc Ps: the API is not “redundant” it offers several ways for achieving the same thing as a convenience depending on the situation From: Vadim Bichutskiy [mailto:vadim.bichuts...@gmail.com] Sent

DStream Union vs. StreamingContext Union

2015-05-11 Thread Vadim Bichutskiy
Can someone explain to me the difference between DStream union and StreamingContext union? When do you use one vs the other? Thanks, Vadim
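
A small sketch contrasting the two calls (stream names and paths are made up): DStream.union merges exactly two streams of the same type, so combining several streams means chaining it, whereas StreamingContext.union accepts them all in one call. Both produce the same unified stream.

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="UnionSketch")
    ssc = StreamingContext(sc, 5)

    # Hypothetical input streams watching different directories.
    s1 = ssc.textFileStream("s3n://my-bucket/dir1/")
    s2 = ssc.textFileStream("s3n://my-bucket/dir2/")
    s3 = ssc.textFileStream("s3n://my-bucket/dir3/")

    # DStream.union is pairwise, so it has to be chained for three streams.
    merged_pairwise = s1.union(s2).union(s3)

    # StreamingContext.union takes any number of streams at once.
    merged_all = ssc.union(s1, s2, s3)

    merged_all.pprint()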

Re: Re: Spark streaming - textFileStream/fileStream - Get file name

2015-04-28 Thread Vadim Bichutskiy
I was wondering about the same thing. Vadim On Tue, Apr 28, 2015 at 10:19 PM, bit1...@163.com bit1...@163.com wrote: Looks to me that the same thing also applies to SparkContext.textFile or SparkContext.wholeTextFiles, there is no way in RDD to figure out the file information where the

Re: Weird error/exception

2015-04-28 Thread Vadim Bichutskiy
I was having this issue when my batch interval was very big -- like 5 minutes. When my batch interval is smaller, I don't get this exception. Can someone explain to me why this might be happening? Vadim On Tue, Apr 28, 2015 at 4:26 PM, Vadim Bichutskiy vadim.bichuts...@gmail.com wrote: I am

Weird error/exception

2015-04-28 Thread Vadim Bichutskiy
I am using Spark Streaming to monitor an S3 bucket. Everything appears to be fine. But every batch interval I get the following: 15/04/28 16:12:36 WARN HttpMethodReleaseInputStream: Attempting to release HttpMethod in finalize() as its response data stream has gone out of scope. This attempt

Re: Map Question

2015-04-23 Thread Vadim Bichutskiy
def get_metadata(): ... return mylist On Wed, Apr 22, 2015 at 6:47 PM, Tathagata Das t...@databricks.com wrote: Can you give full code? especially the myfunc? On Wed, Apr 22, 2015 at 2:20 PM, Vadim Bichutskiy vadim.bichuts...@gmail.com wrote: Here's what I did: print 'BROADCASTING

Re: Map Question

2015-04-23 Thread Vadim Bichutskiy
as you share a spark context all will work as expected. http://stackoverflow.com/questions/142545/python-how-to-make-a-cross-module-variable Sent with Good (www.good.com) -Original Message- From: Vadim Bichutskiy [vadim.bichuts...@gmail.com] Sent: Thursday, April 23, 2015

Re: Map Question

2015-04-22 Thread Vadim Bichutskiy
it will be immutable at the executors, and if you update the list at the driver, you will have to broadcast it again. TD On Wed, Apr 22, 2015 at 9:28 AM, Vadim Bichutskiy vadim.bichuts...@gmail.com wrote: I am using Spark Streaming with Python. For each RDD, I call a map, i.e., myrdd.map(myfunc
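
A rough sketch of the pattern TD describes, with hypothetical names (get_metadata, myfunc): the broadcast value is read-only on the executors, so updated metadata has to be re-broadcast from the driver rather than mutated in place.

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    def get_metadata():
        # Hypothetical stand-in for however the metadata list is produced.
        return ["a", "b", "c"]

    sc = SparkContext(appName="BroadcastSketch")
    ssc = StreamingContext(sc, 10)

    broadcast_var = sc.broadcast(get_metadata())

    def myfunc(record):
        # The broadcast value is read-only on the executors; changes made there
        # are not visible to the driver or to other executors.
        return (record, len(broadcast_var.value))

    stream = ssc.textFileStream("s3n://my-bucket/incoming/")  # placeholder path
    stream.map(myfunc).pprint()

    # If the metadata changes at the driver, broadcast it again and use the new
    # handle; the old broadcast never sees the update.
    # broadcast_var = sc.broadcast(get_metadata())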

Re: Map Question

2015-04-22 Thread Vadim Bichutskiy
is in a different module. How do I make it aware of broadcastVar? On Wed, Apr 22, 2015 at 2:13 PM, Vadim Bichutskiy vadim.bichuts...@gmail.com wrote: Great. Will try to modify the code. Always room to optimize! On Wed, Apr 22, 2015 at 2:11 PM, Tathagata Das t...@databricks.com wrote

saveAsTextFile

2015-04-16 Thread Vadim Bichutskiy
I am using Spark Streaming where during each micro-batch I output data to S3 using saveAsTextFile. Right now each batch of data is put into its own directory containing 2 objects, _SUCCESS and part-0. How do I output each batch into a common directory? Thanks, Vadim

Re: saveAsTextFile

2015-04-16 Thread Vadim Bichutskiy
containing partitions, as is common in Hadoop. You can move them later, or just read them where they are. On Thu, Apr 16, 2015 at 6:32 PM, Vadim Bichutskiy vadim.bichuts...@gmail.com wrote: I am using Spark Streaming where during each micro-batch I output data to S3 using saveAsTextFile

Re: saveAsTextFile

2015-04-16 Thread Vadim Bichutskiy
find them easily. Or consider somehow sending the batches of data straight into Redshift? no idea how that is done but I imagine it's doable. On Thu, Apr 16, 2015 at 6:38 PM, Vadim Bichutskiy vadim.bichuts...@gmail.com wrote: Thanks Sean. I want to load each batch into Redshift. What's

Re: saveAsTextFile

2015-04-16 Thread Vadim Bichutskiy
the time as part of a DStream If you want fine / detailed management of the writing to HDFS you can implement your own HDFS adapter and invoke it in forEachRDD and foreach Regards Evo Eftimov From: Vadim Bichutskiy [mailto:vadim.bichuts...@gmail.com] Sent: Thursday, April 16, 2015 6
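
A hedged sketch of the foreachRDD approach mentioned above, writing each non-empty batch under one common prefix keyed by a timestamp (bucket and prefix are placeholders): saveAsTextFile always creates a fresh directory with _SUCCESS and part-* files per call, so the output path is computed inside the per-batch function.

    import datetime

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="SaveBatchesSketch")
    ssc = StreamingContext(sc, 60)

    stream = ssc.textFileStream("s3n://my-bucket/incoming/")  # placeholder path

    def save_batch(rdd):
        # Skip empty batches; each saveAsTextFile call still creates its own
        # directory, but all of them land under the common output/ prefix.
        if rdd.take(1):
            stamp = datetime.datetime.utcnow().strftime("%Y%m%d-%H%M%S")
            rdd.saveAsTextFile("s3n://my-bucket/output/batch-" + stamp)

    stream.foreachRDD(save_batch)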

Re: sbt-assembly spark-streaming-kinesis-asl error

2015-04-14 Thread Vadim Bichutskiy
(Constructor.java:526) at java.lang.Class.newInstance(Class.java:379) Has anyone else run into this issue? On Mon, Apr 13, 2015 at 6:46 PM, Vadim Bichutskiy vadim.bichuts...@gmail.com wrote: I don't believe the Kinesis asl should be provided. I used mergeStrategy successfully to produce an uber jar

Re: sbt-assembly spark-streaming-kinesis-asl error

2015-04-13 Thread Vadim Bichutskiy
I don't believe the Kinesis asl should be provided. I used mergeStrategy successfully to produce an uber jar. Fyi, I've been having trouble consuming data out of Kinesis with Spark with no success :( Would be curious to know if you got it working. Vadim On Apr 13, 2015, at 9:36 PM, Mike
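
A rough build.sbt sketch of what is described here, with illustrative versions and merge rules rather than the exact file from this thread: spark-streaming-kinesis-asl stays at the default compile scope (not "provided") so it gets bundled into the uber jar, and a mergeStrategy is supplied so sbt-assembly can resolve duplicate entries (sbt-assembly 0.11-style syntax).

    import AssemblyKeys._

    assemblySettings

    name := "kinesis-consumer"

    scalaVersion := "2.10.4"

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core"      % "1.3.0" % "provided",
      "org.apache.spark" %% "spark-streaming" % "1.3.0" % "provided",
      // Not "provided": kinesis-asl is not part of the Spark distribution,
      // so it must be assembled into the jar.
      "org.apache.spark" %% "spark-streaming-kinesis-asl" % "1.3.0"
    )

    mergeStrategy in assembly <<= (mergeStrategy in assembly) { old =>
      {
        case PathList("META-INF", xs @ _*) => MergeStrategy.discard
        case x => old(x)
      }
    }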

Re: sbt-assembly spark-streaming-kinesis-asl error

2015-04-13 Thread Vadim Bichutskiy
a spark-submit job via uber jar). Feel free to add me to gmail chat and maybe we can help each other. On Mon, Apr 13, 2015 at 6:46 PM, Vadim Bichutskiy vadim.bichuts...@gmail.com wrote: I don't believe the Kinesis asl should be provided. I used mergeStrategy successfully to produce an uber

Spark Streaming and SQL

2015-04-08 Thread Vadim Bichutskiy
Hi all, I am using Spark Streaming to monitor an S3 bucket for objects that contain JSON. I want to import that JSON into a Spark SQL DataFrame. Here's my current code:

    from pyspark import SparkContext, SparkConf
    from pyspark.streaming import StreamingContext
    import json
    from pyspark.sql
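
A rough sketch of one way to do what is described here, using sqlContext.jsonRDD from the Spark 1.3 API (paths, table name, and query are placeholders; this is not necessarily the code the thread converged on):

    from pyspark import SparkContext
    from pyspark.sql import SQLContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="S3JsonToDataFrameSketch")
    ssc = StreamingContext(sc, 60)
    sqlContext = SQLContext(sc)

    # Assumes each file in the bucket holds one JSON document per line.
    stream = ssc.textFileStream("s3n://my-bucket/json/")  # placeholder path

    def process(rdd):
        if rdd.take(1):
            # jsonRDD infers a schema from the JSON strings and returns a DataFrame.
            df = sqlContext.jsonRDD(rdd)
            df.registerTempTable("events")  # hypothetical table name
            sqlContext.sql("SELECT COUNT(*) AS n FROM events").show()

    stream.foreachRDD(process)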

Empty RDD?

2015-04-08 Thread Vadim Bichutskiy
When I call transform or foreachRDD on a DStream, I keep getting an error that I have an empty RDD, which makes sense since my batch interval may be smaller than the rate at which new data are coming in. How do I guard against it? Thanks, Vadim

Re: Spark Streaming and SQL

2015-04-08 Thread Vadim Bichutskiy
Hi all, I figured it out! The DataFrames and SQL example in the Spark Streaming docs was useful. Best, Vadim On Wed, Apr 8, 2015 at 2:38 PM, Vadim Bichutskiy vadim.bichuts...@gmail.com wrote: Hi all, I am using Spark Streaming to monitor an S3 bucket for objects that contain JSON. I want

Re: Empty RDD?

2015-04-08 Thread Vadim Bichutskiy
Thanks TD! On Apr 8, 2015, at 9:36 PM, Tathagata Das t...@databricks.com wrote: Aah yes. The jsonRDD method needs to walk through the whole RDD to understand the schema, and does not work if there is no data in it. Making sure there is data in it using take(1) should work. TD
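
A minimal sketch of the take(1) guard TD suggests (paths and names are hypothetical): the check happens inside the per-batch function, so an empty batch simply falls through instead of failing inside jsonRDD.

    from pyspark import SparkContext
    from pyspark.sql import SQLContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="EmptyRDDGuardSketch")
    ssc = StreamingContext(sc, 60)
    sqlContext = SQLContext(sc)

    stream = ssc.textFileStream("s3n://my-bucket/json/")  # placeholder path

    def process(rdd):
        # take(1) returns an empty list for an empty RDD, so empty batches
        # are skipped.
        if rdd.take(1):
            sqlContext.jsonRDD(rdd).show()

    stream.foreachRDD(process)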

Re: Spark + Kinesis

2015-04-07 Thread Vadim Bichutskiy
Hey y'all, While I haven't been able to get Spark + Kinesis integration working, I pivoted to plan B: I now push data to S3 where I set up a DStream to monitor an S3 bucket with textFileStream, and that works great. I <3 Spark! Best, Vadim On Mon, Apr 6, 2015 at 12:23 PM, Vadim Bichutskiy

Re: Spark + Kinesis

2015-04-06 Thread Vadim Bichutskiy
Hi all, I am wondering, has anyone on this list been able to successfully implement Spark on top of Kinesis? Best, Vadim On Sun, Apr 5, 2015 at 1:50 PM, Vadim Bichutskiy vadim.bichuts...@gmail.com wrote: Hi all, Below is the output that I am getting. My Kinesis stream has 1 shard

Re: Spark + Kinesis

2015-04-05 Thread Vadim Bichutskiy
*** 15/04/05 17:14:50 INFO scheduler.ReceivedBlockTracker: Deleting batches ArrayBuffer(142825407 ms) On Sat, Apr 4, 2015 at 3:13 PM, Vadim Bichutskiy vadim.bichuts...@gmail.com wrote: Hi all, More good news! I was able to utilize mergeStrategy to assembly my Kinesis consumer into an uber

Re: Spark + Kinesis

2015-04-04 Thread Vadim Bichutskiy
-kinesis-asl libraryDependencies += org.apache.spark %% spark-streaming-kinesis-asl % 1.3.0 On Fri, Apr 3, 2015 at 12:45 PM, Vadim Bichutskiy vadim.bichuts...@gmail.com wrote: Thanks. So how do I fix it? On Fri, Apr 3, 2015 at 3:43 PM, Kelly, Jonathan jonat...@amazon.com wrote: spark

Re: Spark + Kinesis

2015-04-03 Thread Vadim Bichutskiy
% 1.3.0 On Fri, Apr 3, 2015 at 12:45 PM, Vadim Bichutskiy vadim.bichuts...@gmail.com wrote: Thanks. So how do I fix it? On Fri, Apr 3, 2015 at 3:43 PM, Kelly, Jonathan jonat...@amazon.com wrote: spark-streaming-kinesis-asl is not part of the Spark distribution on your cluster, so you

Re: Spark + Kinesis

2015-04-03 Thread Vadim Bichutskiy
were not included in the assembly (but yes, they should be). ~ Jonathan Kelly From: Vadim Bichutskiy vadim.bichuts...@gmail.com Date: Friday, April 3, 2015 at 12:26 PM To: Jonathan Kelly jonat...@amazon.com Cc: user@spark.apache.org user@spark.apache.org Subject: Re: Spark + Kinesis Hi

Re: How to learn Spark ?

2015-04-02 Thread Vadim Bichutskiy
You can start with http://spark.apache.org/docs/1.3.0/index.html Also get the Learning Spark book http://amzn.to/1NDFI5x. It's great. Enjoy! Vadim On Thu, Apr 2, 2015 at 4:19 AM, Star Guo st...@ceph.me wrote: Hi all, I am new here. Could you give me some suggestions on learning Spark?

Spark + Kinesis

2015-04-02 Thread Vadim Bichutskiy
Hi all, I am trying to write an Amazon Kinesis consumer Scala app that processes data in the Kinesis stream. Is this the correct way to specify build.sbt:

    import AssemblyKeys._
    name := Kinesis Consumer
    version := 1.0
    organization := com.myconsumer
    scalaVersion :=

Re: How to learn Spark ?

2015-04-02 Thread Vadim Bichutskiy
http://polyglotprogramming.com On Thu, Apr 2, 2015 at 8:33 AM, Vadim Bichutskiy vadim.bichuts...@gmail.com wrote: You can start with http://spark.apache.org/docs/1.3.0/index.html Also get the Learning Spark book http://amzn.to/1NDFI5x. It's

Re: Spark + Kinesis

2015-04-02 Thread Vadim Bichutskiy
for that, and I temporarily moved on to other things for now. ~ Jonathan Kelly From: 'Vadim Bichutskiy' vadim.bichuts...@gmail.com Date: Thursday, April 2, 2015 at 9:53 AM To: user@spark.apache.org user@spark.apache.org Subject: Spark + Kinesis Hi all, I am trying to write an Amazon

Spark on EC2

2015-04-01 Thread Vadim Bichutskiy
Hi all, I just tried launching a Spark cluster on EC2 as described in http://spark.apache.org/docs/1.3.0/ec2-scripts.html I got the following response:

    <Response><Errors><Error><Code>PendingVerification</Code><Message>Your account is currently being verified. Verification normally takes less than 2