zhaomin1423 commented on issue #988: URL: https://github.com/apache/incubator-seatunnel/issues/988#issuecomment-1049481572
Dirty data management has two aspects. First, we can handle data one record at a time. The database must support transactions, because when a batch containing a few dirty records is written, the whole batch must be rolled back; after the rollback, we can replay the batch record by record to catch the dirty data. In Spark, we can add a datasource strategy that transforms WriteToDataSourceV2 into an extended WriteToDataSourceV2Exec, so the data can be handled record by record to manage dirty data. Then, we can implement a JDBC connector based on the DataSourceV2 API. Comments are welcome.
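A minimal sketch of the strategy described above: attempt the whole batch as one transaction, and on failure roll back and replay record by record, diverting the failing records to a dirty-data collection. The sink is simulated with an in-memory list (a stand-in for a JDBC connection with auto-commit disabled); the class and method names are hypothetical, not part of SeaTunnel or Spark.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the batch-then-row-by-row dirty-data strategy.
public class DirtyDataBatchWriter {

    // Simulated write: rejects records containing "bad",
    // standing in for a database constraint violation.
    static void writeRecord(String record, List<String> sink) {
        if (record.contains("bad")) {
            throw new IllegalArgumentException("dirty record: " + record);
        }
        sink.add(record);
    }

    // Writes a batch; returns the dirty records that were diverted.
    static List<String> writeBatch(List<String> batch, List<String> sink) {
        List<String> dirty = new ArrayList<>();
        List<String> staging = new ArrayList<>(sink); // snapshot acts as the "transaction"
        try {
            for (String r : batch) writeRecord(r, staging);
            sink.clear();
            sink.addAll(staging);                     // "commit" the whole batch
        } catch (RuntimeException batchFailure) {     // "rollback": staging is discarded
            for (String r : batch) {                  // replay the batch one record at a time
                try {
                    writeRecord(r, sink);
                } catch (RuntimeException e) {
                    dirty.add(r);                     // divert the dirty record, keep going
                }
            }
        }
        return dirty;
    }

    public static void main(String[] args) {
        List<String> sink = new ArrayList<>();
        List<String> dirty = writeBatch(List.of("a", "bad-1", "b"), sink);
        System.out.println("written=" + sink + " dirty=" + dirty);
        // → written=[a, b] dirty=[bad-1]
    }
}
```

With a real JDBC sink the same shape applies: `executeBatch()` inside a transaction, and on `BatchUpdateException` roll back and retry each row with a single-row `executeUpdate()`.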
