Hi All,

I would like your views and suggestions on the JdbcPOJOInsertOutput operator.

This operator opens a transaction at the start of each window, executes batches of SQL inserts, and commits the transaction at the end of the window. Each tuple corresponds to one SQL insert statement. The operator groups the inserts into a batch and submits them to the database in a single call. To write each tuple exactly once, the transaction is committed in endWindow(), only after all the inserts have been executed.

For this to work as expected, the underlying database (or table) must support transactions. In particular, the insert statements must not be auto-committed; they should be committed only when a commit is issued in endWindow(). If the connection is closed (or rolled back) before a commit is issued, none of the inserts should remain in the table. This is essential for exactly-once to work correctly.

For example, consider a batch size of 10000. If the table does not support transactions and the operator/container is killed after inserting 4000 rows, those 4000 rows stay in the table. When the operator comes back up it inserts them again, leaving 4000 redundant rows.
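
To make the failure concrete, here is a minimal in-memory sketch of the scenario, not the operator's actual code. SimulatedTable and ExactlyOnceDemo are hypothetical names used only for illustration, and the sketch assumes the window holds 10000 tuples and the kill happens after 4000 inserts. In real JDBC terms, the transactional case would correspond to a connection with Connection.setAutoCommit(false) on an engine that honours rollback.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory stand-in for a database table; not a real JDBC driver. */
class SimulatedTable {
    private final boolean transactional;
    private final List<Integer> committed = new ArrayList<>();
    private final List<Integer> pending = new ArrayList<>();

    SimulatedTable(boolean transactional) {
        this.transactional = transactional;
    }

    /** A transactional table buffers the row; a non-transactional one persists it at once. */
    void insert(int row) {
        if (transactional) {
            pending.add(row);
        } else {
            committed.add(row);
        }
    }

    /** Makes buffered rows durable (a no-op for the non-transactional table). */
    void commit() {
        committed.addAll(pending);
        pending.clear();
    }

    /** Discards uncommitted rows; rows already persisted cannot be undone. */
    void rollback() {
        pending.clear();
    }

    int rowCount() {
        return committed.size();
    }
}

class ExactlyOnceDemo {
    static final int WINDOW_ROWS = 10_000;     // assumed tuples in the window
    static final int ROWS_BEFORE_KILL = 4_000; // rows inserted before the kill

    /** Returns the number of rows in the table after a kill and a full replay of the window. */
    static int rowsAfterKillAndReplay(boolean transactional) {
        SimulatedTable table = new SimulatedTable(transactional);

        // First attempt: the operator is killed before endWindow() can commit.
        for (int i = 0; i < ROWS_BEFORE_KILL; i++) {
            table.insert(i);
        }
        table.rollback(); // the connection is closed without a commit

        // Recovery: the whole window is replayed and committed in endWindow().
        for (int i = 0; i < WINDOW_ROWS; i++) {
            table.insert(i);
        }
        table.commit();
        return table.rowCount();
    }

    public static void main(String[] args) {
        System.out.println("transactional table:     " + rowsAfterKillAndReplay(true));
        System.out.println("non-transactional table: " + rowsAfterKillAndReplay(false));
    }
}
```

In this sketch the transactional table ends with 10000 rows, while the non-transactional one ends with 14000, that is, the 4000 redundant rows described above.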

To handle this scenario, we can document the requirement and make it the user's responsibility to provide a table/database that supports transactions. For example, if the database is MySQL, it is the user's responsibility to provide a table whose storage engine is InnoDB.

Please let me know if you have any other solution conforming to SQL standards.

Regards,
Hitesh Kapoor
