Re: Disable queuing of spark job on Mesos cluster if sufficient resources are not found

2017-05-26 Thread Michael Gummelt
Nope, sorry. On Fri, May 26, 2017 at 4:38 AM, Mevada, Vatsal wrote: > Hello, > > I am using Mesos with cluster deployment mode to submit my jobs. > > When sufficient resources are not available on Mesos cluster, I can see > that my jobs are queuing up on Mesos

[Spark Streaming] DAG Execution Model Clarification

2017-05-26 Thread Nipun Arora
Hi, I would like some clarification on the execution model for Spark Streaming. Broadly, I am trying to understand whether output operations in a DAG are only processed after all intermediate operations are finished for all parts of the DAG. Let me give an example: I have a DStream A, I do a map
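For concreteness, a hypothetical sketch of the kind of branched DStream pipeline being asked about; every name, source, and operation here is an assumption and not taken from the thread:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("dstream-dag-question")
    val ssc = new StreamingContext(conf, Seconds(10))

    val a = ssc.socketTextStream("localhost", 9999)  // the "DStream A" of the question (source is made up)
    val b = a.map(_.toUpperCase)                     // an intermediate transformation on A
    val c = a.filter(_.nonEmpty)                     // a second branch off the same DStream

    // The question: do these output operations run only after all intermediate
    // transformations across the whole DAG of the batch have finished?
    b.foreachRDD(rdd => rdd.foreach(println))
    c.foreachRDD(rdd => rdd.foreach(println))

    ssc.start()
    ssc.awaitTermination()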

user-unsubscr...@spark.apache.org

2017-05-26 Thread williamtellme123
user-unsubscr...@spark.apache.org From: ANEESH .V.V [mailto:aneeshnair.ku...@gmail.com] Sent: Friday, May 26, 2017 1:50 AM To: user@spark.apache.org Subject: unsubscribe unsubscribe

Temp checkpoint directory for EMR (S3 or HDFS)

2017-05-26 Thread Everett Anderson
Hi, I need to set a checkpoint directory as I'm starting to use GraphFrames. (Also, occasionally my regular DataFrame lineages get too long so it'd be nice to use checkpointing to squash the lineage.) I don't actually need this checkpointed data to live beyond the life of the job, however. I'm
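For what it's worth, a minimal sketch of pointing checkpoints at the cluster-local HDFS on EMR, on the assumption that job-lifetime durability is enough; the path and application name are made up:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("graphframes-job").getOrCreate()

    // Keep checkpoint data on the cluster's HDFS rather than S3, since it only
    // needs to survive for the duration of this job. Note the directory is not
    // removed automatically when the application exits.
    spark.sparkContext.setCheckpointDir("hdfs:///tmp/checkpoints")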

Re: Spark checkpoint - nonstreaming

2017-05-26 Thread Jörn Franke
Just load it as you would from any other directory. > On 26. May 2017, at 17:26, Priya PM wrote: > > > -- Forwarded message -- > From: Priya PM > Date: Fri, May 26, 2017 at 8:54 PM > Subject: Re: Spark checkpoint - nonstreaming > To: Jörn Franke

Fwd: Spark checkpoint - nonstreaming

2017-05-26 Thread Priya PM
-- Forwarded message -- From: Priya PM Date: Fri, May 26, 2017 at 8:54 PM Subject: Re: Spark checkpoint - nonstreaming To: Jörn Franke Oh, how do I do it? I don't see it mentioned anywhere in the documentation. I have followed this link

[no subject]

2017-05-26 Thread Anton Kravchenko
import org.apache.spark.sql.Row

df.rdd.foreachPartition(convert_to_sas_single_partition _)

def convert_to_sas_single_partition(ipartition: Iterator[Row]): Unit = {
  for (irow <- ipartition) {
    // per-row conversion logic elided in the original message
  }
}

Re: Spark checkpoint - nonstreaming

2017-05-26 Thread Jörn Franke
Do you have some source code? Did you set the checkpoint directory? > On 26. May 2017, at 16:06, Priya wrote: > > Hi, > > With a non-streaming Spark application, I checkpointed the RDD and could see > the RDD getting checkpointed. I have killed the application after >

Re: Spark checkpoint - nonstreaming

2017-05-26 Thread Holden Karau
In non-streaming Spark, checkpoints aren't for inter-application recovery; rather, you can think of them as doing a persist, but to HDFS rather than to each node's local memory/storage. On Fri, May 26, 2017 at 3:06 PM Priya wrote: > Hi, > > With a non-streaming Spark application,
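A minimal sketch of the distinction being drawn here, with illustrative paths and data rather than anything from the thread:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.storage.StorageLevel

    val spark = SparkSession.builder().appName("checkpoint-vs-persist").getOrCreate()
    val sc = spark.sparkContext
    sc.setCheckpointDir("hdfs:///tmp/rdd-checkpoints")

    val rdd = sc.parallelize(1 to 1000)

    // persist(): blocks live in each executor's memory / local disk, for this application only.
    rdd.persist(StorageLevel.MEMORY_AND_DISK)

    // checkpoint(): the RDD's data is written to the checkpoint directory (typically HDFS)
    // and its lineage is truncated; it is not an inter-application recovery mechanism.
    rdd.checkpoint()

    rdd.count()  // an action materializes both the persisted blocks and the checkpoint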

using pandas and pyspark to run ETL job - always failing after about 40 minutes

2017-05-26 Thread Zeming Yu
Hi, I tried running the ETL job a few times. It always fails after 40 minutes or so. When I relaunch Jupyter and rerun the job, it runs without error. Then it fails again after some time. Just wondering if anyone else has encountered this before? Here's the error message:

Re: Documentation on "Automatic file coalescing for native data sources"?

2017-05-26 Thread Daniel Siegmann
Thanks for the help everyone. It seems the automatic coalescing doesn't happen when accessing ORC data through a Hive metastore unless you configure spark.sql.hive.convertMetastoreOrc to be true (it is false by default). I'm not sure if this is documented somewhere, or if there's any reason not
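For reference, a sketch of what enabling that setting can look like when building the session (the table name is hypothetical):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("orc-coalescing")
      .enableHiveSupport()
      // Read Hive metastore ORC tables through Spark's native ORC reader,
      // which is the code path where the automatic file coalescing applies.
      .config("spark.sql.hive.convertMetastoreOrc", "true")
      .getOrCreate()

    val df = spark.table("my_db.my_orc_table")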

Spark checkpoint - nonstreaming

2017-05-26 Thread Priya
Hi, With a non-streaming Spark application, I checkpointed the RDD and could see the RDD getting checkpointed. I killed the application after checkpointing the RDD and restarted the same application immediately, but it doesn't seem to pick up from the checkpoint and it again checkpoints the

convert ps to jpg file

2017-05-26 Thread Selvam Raman
Hi, is there any good open-source tool to convert PS to JPG? I am running a Spark job within which I am using ImageMagick/GraphicsMagick with Ghostscript to convert/resize images. IM/GM takes a lot of memory/map memory/disk to convert a KB-sized image file and takes a lot of time. Because of this issue
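One workaround that is sometimes suggested is to call Ghostscript directly and skip the IM/GM layer; a rough sketch, assuming gs is installed on every executor (paths and resolution are placeholders):

    import scala.sys.process._

    // Convert one PostScript file to JPEG by invoking Ghostscript directly,
    // avoiding the extra ImageMagick/GraphicsMagick process in between.
    def psToJpg(input: String, output: String, dpi: Int = 150): Int = {
      Seq(
        "gs", "-dBATCH", "-dNOPAUSE", "-dQUIET",
        "-sDEVICE=jpeg", s"-r$dpi",
        s"-sOutputFile=$output", input
      ).!  // exit code of the gs process; non-zero means the conversion failed
    }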

Disable queuing of spark job on Mesos cluster if sufficient resources are not found

2017-05-26 Thread Mevada, Vatsal
Hello, I am using Mesos with cluster deployment mode to submit my jobs. When sufficient resources are not available on the Mesos cluster, I can see that my jobs are queuing up on the Mesos dispatcher UI. Is it possible to tweak some configuration so that my job submission fails gracefully (instead of

unsubscribe

2017-05-26 Thread ANEESH .V.V
unsubscribe

Re: Running into the same problem as JIRA SPARK-19268

2017-05-26 Thread kant kodali
https://issues.apache.org/jira/browse/SPARK-20894 On Thu, May 25, 2017 at 4:31 PM, Shixiong(Ryan) Zhu wrote: > I don't know what happened in your case, so I cannot provide any workaround. > It would be great if you can provide the logs output > by