Re: Datasource API V2 and checkpointing

2018-05-01 Thread Thakrar, Jayesh
From: Joseph Torres <joseph.tor...@databricks.com> Sent: Tuesday, May 1, 2018 1:58:54 PM To: Ryan Blue Cc: Thakrar, Jayesh; dev@spark.apache.org Subject: Re: Datasource API V2 and checkpointing I agree that Spark should fully handle state serialization and re

Re: Datasource API V2 and checkpointing

2018-05-01 Thread Joseph Torres
Reader keeps a log of all the files it's seen, so its offsets can be simply indices into the log rather than huge strings containing all the paths. SPARK-23323 is orthogonal. That commit coordinator
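
The point quoted above, that a file source can hand out small integer offsets instead of strings listing every path, can be illustrated with a minimal sketch. This is not Spark's implementation; the class `FileLog`, its methods, and the example paths are all hypothetical names chosen for illustration.

```python
class FileLog:
    """Hypothetical append-only log of the file paths a source has seen."""

    def __init__(self):
        self._paths = []      # position in this list is the file's log index
        self._seen = set()    # fast membership check to skip duplicates

    def record(self, paths):
        """Append unseen paths; return the latest offset (-1 if the log is empty)."""
        for path in paths:
            if path not in self._seen:
                self._seen.add(path)
                self._paths.append(path)
        return len(self._paths) - 1

    def files_between(self, start, end):
        """Return the paths whose log index i satisfies start < i <= end."""
        return self._paths[start + 1 : end + 1]


log = FileLog()
first = log.record(["/data/a.json", "/data/b.json"])   # offset after first listing
second = log.record(["/data/b.json", "/data/c.json"])  # b.json is already logged
batch = log.files_between(first, second)               # only the newly seen file
```

Only the two integers `first` and `second` would need to be persisted as offsets; the paths themselves live in the log.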

Re: Datasource API V2 and checkpointing

2018-05-01 Thread Ryan Blue
containing all the paths. SPARK-23323 is orthogonal. That commit coordinator is responsible for ensuring that, within a single Spark job, two different tasks can't commit the same partition.
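
The guarantee described above (within one job, at most one task attempt may commit a given partition) can be sketched as a first-asker-wins registry. This is a simplified illustration under my own assumptions, not the code behind SPARK-23323; `CommitCoordinator` and `can_commit` are hypothetical names here.

```python
import threading


class CommitCoordinator:
    """Hypothetical coordinator: the first attempt to ask for a partition wins."""

    def __init__(self):
        self._lock = threading.Lock()
        self._winners = {}  # partition id -> attempt id authorized to commit

    def can_commit(self, partition, attempt):
        """Return True only for the attempt that was authorized for this partition."""
        with self._lock:
            # setdefault stores the attempt only if no winner is recorded yet
            winner = self._winners.setdefault(partition, attempt)
            return winner == attempt


coordinator = CommitCoordinator()
results = [
    coordinator.can_commit(0, 0),  # first attempt for partition 0 is authorized
    coordinator.can_commit(0, 1),  # a second attempt for partition 0 is refused
    coordinator.can_commit(1, 1),  # partition 1 is decided independently
]
```

A real coordinator would also have to handle the authorized attempt failing before it commits; that case is deliberately left out of this sketch.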

Re: Datasource API V2 and checkpointing

2018-04-30 Thread Joseph Torres
ingle Spark job, two different tasks can't commit the same partition. On Fri, Apr 27, 2018 at 8:53 AM, Thakrar, Jayesh <jthak...@conversantmedia.com> wrote: Wondering if this issue is related to SPAR

Re: Datasource API V2 and checkpointing

2018-04-30 Thread Ryan Blue
Jayesh <jthak...@conversantmedia.com> wrote: Wondering if this issue is related to SPARK-23323? Any pointers will be greatly appreciated….

Re: Datasource API V2 and checkpointing

2018-04-30 Thread Joseph Torres
Any pointers will be greatly appreciated…. Thanks, Jayesh From: "Thakrar, Jayesh" <jthak...@conversantmedia.com>

Re: Datasource API V2 and checkpointing

2018-04-30 Thread Ryan Blue
Jayesh From: "Thakrar, Jayesh" <jthak...@conversantmedia.com> Date: Monday, April 23, 2018 at 9:49 PM To: "dev@spark.apache.org" <dev@spark.apache.org> Subject: Datasource API V2 and checkpoint

Re: Datasource API V2 and checkpointing

2018-04-27 Thread Thakrar, Jayesh
Thanks Joseph! From: Joseph Torres <joseph.tor...@databricks.com> Date: Friday, April 27, 2018 at 11:23 AM To: "Thakrar, Jayesh" <jthak...@conversantmedia.com> Cc: "dev@spark.apache.org" <dev@spark.apache.org> Subject: Re: Datasource API V2 and c

Re: Datasource API V2 and checkpointing

2018-04-27 Thread Joseph Torres
Thanks, Jayesh From: "Thakrar, Jayesh" <jthak...@conversantmedia.com> Date: Monday, April 23, 2018 at 9:49 PM To: "dev@spark.apache.org" <dev@spark.apache.org> Subject: Datasource API V2 and checkpointing

Re: Datasource API V2 and checkpointing

2018-04-27 Thread Thakrar, Jayesh
Wondering if this issue is related to SPARK-23323? Any pointers will be greatly appreciated…. Thanks, Jayesh From: "Thakrar, Jayesh" <jthak...@conversantmedia.com> Date: Monday, April 23, 2018 at 9:49 PM To: "dev@spark.apache.org" <dev@spark.apache.or

Datasource API V2 and checkpointing

2018-04-23 Thread Thakrar, Jayesh
When checkpointing is enabled, who does the actual work: the streaming datasource or the execution engine/driver? I have written a small, trivial datasource that just generates strings. After enabling checkpointing, I do see a folder being created under the checkpoint folder, but