Nope, sorry.
On Fri, May 26, 2017 at 4:38 AM, Mevada, Vatsal
wrote:
> Hello,
>
> I am using Mesos with cluster deployment mode to submit my jobs.
>
> When sufficient resources are not available on the Mesos cluster, I can see
> that my jobs are queuing up on Mesos
Hi,
I would like some clarification on the execution model for Spark Streaming.
Broadly, I am trying to understand whether output operations in a DAG are only
processed after all intermediate operations have finished for all parts of
the DAG.
Let me give an example:
I have a DStream A, and I do a map
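Since the rest of the example is cut off, here is a minimal sketch of the kind
of DAG I mean (the socket source, the names, and the two output operations are
placeholders for illustration, not my actual code):

import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(sc, Seconds(10))   // assumes an existing SparkContext "sc"
val a = ssc.socketTextStream("localhost", 9999)   // DStream A
val b = a.map(_.toUpperCase)                      // intermediate map

b.print()                         // output operation 1
b.saveAsTextFiles("/tmp/out")     // output operation 2

ssc.start()
ssc.awaitTermination()

(If I recall the programming guide correctly, output operations within a batch
are executed one at a time, in the order they are defined in the application.)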
From: ANEESH .V.V [mailto:aneeshnair.ku...@gmail.com]
Sent: Friday, May 26, 2017 1:50 AM
To: user@spark.apache.org
Subject: unsubscribe
unsubscribe
Hi,
I need to set a checkpoint directory as I'm starting to use GraphFrames.
(Also, occasionally my regular DataFrame lineages get too long so it'd be
nice to use checkpointing to squash the lineage.)
I don't actually need this checkpointed data to live beyond the life of the
job, however. I'm
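Roughly, the setup I have in mind looks like this (a sketch assuming Spark
2.1+; the session name and the /tmp path are placeholders, and GraphFrames
only needs the SparkContext checkpoint directory set the same way):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("checkpoint-example").getOrCreate()
spark.sparkContext.setCheckpointDir("/tmp/spark-checkpoints")   // placeholder path

val df = spark.range(0, 1000000).toDF("id")
// checkpoint() materializes the DataFrame to the checkpoint directory and
// returns a new DataFrame whose lineage is truncated at that point.
val squashed = df.checkpoint()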
Just load it as you would from any other directory.
> On 26. May 2017, at 17:26, Priya PM wrote:
>
>
> -- Forwarded message --
> From: Priya PM
> Date: Fri, May 26, 2017 at 8:54 PM
> Subject: Re: Spark checkpoint - nonstreaming
> To: Jörn Franke
-- Forwarded message --
From: Priya PM
Date: Fri, May 26, 2017 at 8:54 PM
Subject: Re: Spark checkpoint - nonstreaming
To: Jörn Franke
Oh, how do I do that? I don't see it mentioned anywhere in the documentation.
I have followed this link
df.rdd.foreachPartition(convert_to_sas_single_partition)
def convert_to_sas_single_partition(ipartition: Iterator[Row]): Unit = {
  for (irow <- ipartition) {
    // ... (per-row conversion body truncated in the original message)
  }
}
Do you have some source code?
Did you set the checkpoint directory?
> On 26. May 2017, at 16:06, Priya wrote:
>
> Hi,
>
> With a non-streaming Spark application, I did checkpoint the RDD and I could see
> the RDD getting checkpointed. I have killed the application after
>
In non-streaming Spark, checkpoints aren't for inter-application recovery;
rather, you can think of them as doing a persist, but to HDFS rather than to
each node's local memory/storage.
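For example, in a non-streaming job it looks roughly like this (a sketch; the
SparkContext "sc", the paths, and the sample transformation are placeholders):

sc.setCheckpointDir("hdfs:///tmp/rdd-checkpoints")   // placeholder HDFS path

val rdd = sc.textFile("hdfs:///data/input.txt").map(_.toUpperCase)
rdd.checkpoint()   // marks the RDD; nothing is written yet
rdd.count()        // the first action triggers the write to the checkpoint directory

// A restarted application does not automatically resume from these files;
// they behave more like a persisted copy kept on reliable storage.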
On Fri, May 26, 2017 at 3:06 PM Priya wrote:
> Hi,
>
> With a non-streaming Spark application,
Hi,
I tried running the ETL job a few times. It always fails after 40 minutes
or so. When I relaunch Jupyter and rerun the job, it runs without error.
Then it fails again after some time. Just wondering if anyone else has
encountered this before?
Here's the error message:
Thanks for the help everyone.
It seems the automatic coalescing doesn't happen when accessing ORC data
through a Hive metastore unless you configure
spark.sql.hive.convertMetastoreOrc to be true (it is false by default). I'm
not sure if this is documented somewhere, or if there's any reason not
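For anyone hitting the same thing, this is roughly how I enabled it (a sketch;
the session and table names are placeholders, the config key is the one above):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("orc-read")
  .config("spark.sql.hive.convertMetastoreOrc", "true")
  .enableHiveSupport()
  .getOrCreate()

// With the conversion enabled, the table is read through Spark's native ORC path.
val df = spark.table("some_orc_table")   // placeholder table name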
Hi,
With a non-streaming Spark application, I did checkpoint the RDD and I could see
the RDD getting checkpointed. I have killed the application after checkpointing
the RDD and restarted the same application again immediately, but it doesn't seem
to pick up from the checkpoint and it checkpoints the RDD again.
Hi,
Is there any good open-source tool to convert PS to JPG?
I am running a Spark job, within which I am using ImageMagick/GraphicsMagick
with Ghostscript to convert/resize images.
IM/GM takes a lot of memory/map memory/disk and a lot of time to convert even a
KB-sized image file. Because of this issue
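For context, the conversion step amounts to something like this direct
Ghostscript call (a sketch, not my exact code; the paths, resolution, helper
name, and the assumption that the gs binary is installed on the executors are
all placeholders):

import scala.sys.process._

// Convert a single PostScript file to JPEG with Ghostscript; returns the exit code.
def psToJpg(in: String, out: String, dpi: Int = 150): Int =
  Seq("gs", "-dBATCH", "-dNOPAUSE", "-dSAFER",
      s"-r$dpi", "-sDEVICE=jpeg", s"-sOutputFile=$out", in).!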
Hello,
I am using Mesos with cluster deployment mode to submit my jobs.
When sufficient resources are not available on the Mesos cluster, I can see that my
jobs are queuing up on the Mesos dispatcher UI.
Is it possible to tweak some configuration so that my job submission fails
gracefully (instead of
unsubscribe
https://issues.apache.org/jira/browse/SPARK-20894
On Thu, May 25, 2017 at 4:31 PM, Shixiong(Ryan) Zhu wrote:
> I don't know what happened in your case, so I cannot provide any workaround.
> It would be great if you can provide the logs output
> by