Ross, I think the intent is to create a single transaction on the driver,
write as part of it in each task, and then commit the transaction once the
tasks complete. Is that possible in your implementation?
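
For illustration, a rough sketch of that flow against the 2.3-style V2
writer interfaces might look like the following. TxnClient is a
hypothetical stand-in for whatever driver-side transaction handle your
sink can provide; note that only a transaction id is shipped to the
tasks, since a live session can't be serialized:

  // Sketch only: TxnClient is a hypothetical handle for a transaction
  // that is started on the driver and joined by id from the executors.
  import org.apache.spark.sql.Row
  import org.apache.spark.sql.sources.v2.writer._

  class TxnDataSourceWriter extends DataSourceWriter {
    private val txnId: String = TxnClient.begin() // runs on the driver

    override def createWriterFactory(): DataWriterFactory[Row] =
      new TxnWriterFactory(txnId) // ship the id, not a live session

    // Runs once on the driver after every task's DataWriter has committed.
    override def commit(messages: Array[WriterCommitMessage]): Unit =
      TxnClient.commit(txnId)

    override def abort(messages: Array[WriterCommitMessage]): Unit =
      TxnClient.rollback(txnId)
  }

  class TxnWriterFactory(txnId: String) extends DataWriterFactory[Row] {
    override def createDataWriter(partitionId: Int,
                                  attemptNumber: Int): DataWriter[Row] =
      new TxnDataWriter(txnId)
  }

  class TxnDataWriter(txnId: String) extends DataWriter[Row] {
    // Each record is written as part of the shared transaction.
    override def write(record: Row): Unit = TxnClient.write(txnId, record)

    // Per-task commit just reports success; visibility waits for the driver.
    override def commit(): WriterCommitMessage = new WriterCommitMessage {}
    override def abort(): Unit = () // driver-side rollback cleans up
  }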

I think that part of this is made more difficult by not having a clear
starting point for a write, which we are fixing in the redesign of the v2
API. The redesigned API will have a method that creates a Write to track
the operation; that Write can create your transaction when it is created
and commit the transaction when commit is called on it.
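
As a purely hypothetical sketch of that lifecycle (none of these names
are final, and TxnClient is the same stand-in as above):

  // Hypothetical shape only; the real redesigned API may differ.
  trait Write {
    def commit(messages: Array[WriterCommitMessage]): Unit
    def abort(messages: Array[WriterCommitMessage]): Unit
  }

  class MongoWrite extends Write {
    private val txnId = TxnClient.begin() // opened when the Write is created
    def commit(messages: Array[WriterCommitMessage]): Unit =
      TxnClient.commit(txnId)             // committed when commit() is called
    def abort(messages: Array[WriterCommitMessage]): Unit =
      TxnClient.rollback(txnId)
  }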

rb

On Mon, Sep 10, 2018 at 9:05 AM Reynold Xin <r...@databricks.com> wrote:

> Typically people do it via transactions, or staging tables.
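>
> A staging-table sketch for MongoDB might look like this (collection
> names are illustrative; the $merge stage needs MongoDB 4.2+, and the
> final copy itself is not atomic):
>
>   import com.mongodb.client.MongoClients
>   import org.bson.Document
>
>   val client = MongoClients.create("mongodb://localhost")
>   val db = client.getDatabase("mydb")
>
>   // Executor side: each task writes only to a job-scoped staging
>   // collection, so nothing shows up in the target yet.
>   db.getCollection("target_staging_job42")
>     .insertOne(new Document("_id", 1).append("value", "x"))
>
>   // Driver side, at final commit: publish the staged data, then clean up.
>   db.getCollection("target_staging_job42")
>     .aggregate(java.util.Arrays.asList(
>       new Document("$merge", new Document("into", "target"))))
>     .toCollection()
>   db.getCollection("target_staging_job42").drop()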
>
>
> On Mon, Sep 10, 2018 at 2:07 AM Ross Lawley <ross.law...@gmail.com> wrote:
>
>> Hi all,
>>
>> I've been prototyping an implementation of the DataSource V2 writer for
>> the MongoDB Spark Connector and I have a couple of questions about how it's
>> intended to be used with database systems. According to the Javadoc for
>> DataWriter.commit():
>>
>>
>> *"this method should still "hide" the written data and ask the
>> DataSourceWriter at driver side to do the final commit via
>> WriterCommitMessage"*
>>
>> Although MongoDB now has transactions, it doesn't have a way to "hide"
>> the data once it has been written. As soon as the DataWriter has
>> committed the data, it has been inserted/updated in the collection and is
>> discoverable, thereby breaking the documented contract.
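>>
>> For concreteness, a per-task transaction (MongoDB 4.0+, replica set
>> required) is a sketch of the best I can do: it hides a partition's
>> writes until DataWriter.commit(), but can't defer visibility to the
>> driver-side commit:
>>
>>   import com.mongodb.client.MongoClients
>>   import org.bson.Document
>>
>>   // Java driver 3.8+; transactions need a MongoDB 4.0+ replica set.
>>   val client = MongoClients.create("mongodb://localhost")
>>   val session = client.startSession() // one session per DataWriter/task
>>   session.startTransaction()
>>
>>   // DataWriter.write(): staged in the task's transaction, not yet visible.
>>   client.getDatabase("mydb").getCollection("target")
>>     .insertOne(session, new Document("_id", 1))
>>
>>   // DataWriter.commit(): this partition becomes visible here, before the
>>   // driver-side DataSourceWriter.commit() ever runs.
>>   session.commitTransaction()
>>   session.close()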
>>
>> I was wondering how other database systems plan to implement this API
>> and meet the contract as per the Javadoc?
>>
>> Many thanks
>>
>> Ross
>>
>

-- 
Ryan Blue
Software Engineer
Netflix
