Thanks all for the clarification.

Regards,
Shiv
On Sat, Aug 3, 2019 at 12:49 PM Ryan Blue <rb...@netflix.com.invalid> wrote:
>
> What you could try instead is intermediate output: inserting into a
> temporary table in the executors, and moving the inserted records to the
> final table in the driver (must be atomic).
>
> I think that this is the approach that other systems (maybe sqoop?) have
> taken. Insert into independent temporary tables, which can be done quickly.
> Then, for the final commit operation, union and insert into the final table.
> In a lot of cases, JDBC databases can do that quickly as well, because the
> data is already on disk and just needs to be added to the final table.
>
> On Fri, Aug 2, 2019 at 7:25 PM Jungtaek Lim <kabh...@gmail.com> wrote:
>
>> I asked a similar question for end-to-end exactly-once with Kafka, and
>> you're correct that distributed transactions are not supported. Introducing
>> a distributed transaction protocol like "two-phase commit" would require
>> huge changes to the Spark codebase, and the feedback was not positive.
>>
>> What you could try instead is intermediate output: inserting into a
>> temporary table in the executors, and moving the inserted records to the
>> final table in the driver (must be atomic).
>>
>> Thanks,
>> Jungtaek Lim (HeartSaVioR)
>>
>> On Sat, Aug 3, 2019 at 4:56 AM Shiv Prashant Sood <shivprash...@gmail.com>
>> wrote:
>>
>>> All,
>>>
>>> I understood that DataSourceV2 supports transactional writes and wanted
>>> to implement that in the JDBC DataSourceV2 connector (PR #25211
>>> <https://github.com/apache/spark/pull/25211>).
>>>
>>> I don't see how this is feasible for a JDBC-based connector. The
>>> framework suggests that each EXECUTOR sends a commit message to the
>>> DRIVER, and the actual commit should only be done by the DRIVER after
>>> receiving all commit confirmations. This will not work for JDBC, as
>>> commits have to happen on the JDBC Connection, which is maintained by the
>>> EXECUTORS, and the JDBC Connection is not serializable, so it cannot be
>>> sent to the DRIVER.
>>>
>>> Am I right in thinking that this cannot be supported for JDBC? My goal
>>> is to either fully write or fully roll back the dataframe write operation.
>>>
>>> Thanks in advance for your help.
>>>
>>> Regards,
>>> Shiv
>>
>>
>> --
>> Name : Jungtaek Lim
>> Blog : http://medium.com/@heartsavior
>> Twitter : http://twitter.com/heartsavior
>> LinkedIn : http://www.linkedin.com/in/heartsavior
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
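For reference, here is a minimal sketch of the staging-table approach that
Jungtaek and Ryan describe, using plain JDBC rather than the actual Spark
DataSourceV2 interfaces. The JDBC URL, table names, schema, and the
writeTask/commitJob helpers are all hypothetical illustrations; real code
would also have to handle retried tasks and orphaned staging tables.

    import java.sql.DriverManager

    object StagingTableCommit {

      // Executor side: each task writes its rows into its own staging
      // table and reports the staging table name back to the driver as
      // its "commit message".
      def writeTask(url: String, taskId: Int,
                    rows: Seq[(Int, String)]): String = {
        val staging = s"target_staging_$taskId"  // hypothetical name scheme
        val conn = DriverManager.getConnection(url)
        try {
          val stmt = conn.createStatement()
          stmt.executeUpdate(
            s"CREATE TABLE $staging (id INT, name VARCHAR(100))")
          val insert =
            conn.prepareStatement(s"INSERT INTO $staging VALUES (?, ?)")
          rows.foreach { case (id, name) =>
            insert.setInt(1, id)
            insert.setString(2, name)
            insert.executeUpdate()
          }
          staging
        } finally conn.close()
      }

      // Driver side: once every task has reported success, move all
      // staged rows into the final table in a single local transaction
      // (the atomic step), then drop the staging tables as cleanup.
      def commitJob(url: String, stagingTables: Seq[String]): Unit = {
        val conn = DriverManager.getConnection(url)
        try {
          conn.setAutoCommit(false)              // one atomic commit
          val stmt = conn.createStatement()
          stagingTables.foreach { t =>
            stmt.executeUpdate(s"INSERT INTO target SELECT * FROM $t")
          }
          conn.commit()                          // all-or-nothing
          stagingTables.foreach { t =>
            stmt.executeUpdate(s"DROP TABLE $t") // best-effort cleanup
          }
        } catch {
          case e: Exception => conn.rollback(); throw e
        } finally conn.close()
      }
    }

Note that only the driver-side INSERT ... SELECT statements plus the final
commit() form the transaction, so the visible result in the target table is
all-or-nothing; the executor-side inserts and the DROP TABLE cleanup do not
need to be transactional.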