Hi All,

I would like your views and suggestions regarding the JdbcPOJOInsertOutput
operator.
This operator opens a transaction at the start of a window, executes
batches of SQL inserts, and commits the transaction at the end of the window.
Each tuple corresponds to one SQL insert statement. The operator groups the
inserts into batches and submits each batch with one call to the database.
To write each tuple exactly once, the transaction is committed in the
endWindow() call, only after all the inserts of the window have been executed.
For this to work as expected, the underlying database (or table) that we
are writing to must support transactions.
For example, the insert statements must not be auto-committed; they should
only be committed when a commit is issued in endWindow(). If no commit is
issued and the connection is closed (or rolled back), there should not
be any inserts in the table.
This is important for exactly-once to work correctly. For example, consider
a batch size of 10000. If the operator/container is killed after inserting
4000 rows and those inserts are not rolled back, the 4000 rows stay in the
table, and when the operator comes back they will be inserted again as
duplicates.
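
To make the pattern above concrete, here is a minimal plain-JDBC sketch of
one window written as a single transaction. It is not the operator's actual
code: the class WindowedBatchInsert, the method writeWindow and its
parameters are hypothetical names used only for illustration, and the
beginWindow/endWindow steps are collapsed into one method.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

/** Hypothetical helper for illustration; not part of the operator. */
public class WindowedBatchInsert {

    /**
     * Inserts all rows of one window inside a single transaction and
     * returns the number of executeBatch() calls that were made.
     */
    public static int writeWindow(Connection conn, String insertSql,
                                  List<Object[]> rows, int batchSize) throws SQLException {
        // Start of window: turn off auto-commit so nothing is committed per statement.
        conn.setAutoCommit(false);
        int batchesSent = 0;
        try (PreparedStatement ps = conn.prepareStatement(insertSql)) {
            int pending = 0;
            for (Object[] row : rows) {
                for (int i = 0; i < row.length; i++) {
                    ps.setObject(i + 1, row[i]); // JDBC parameters are 1-based
                }
                ps.addBatch();
                if (++pending == batchSize) {
                    ps.executeBatch(); // sent to the database, not yet committed
                    batchesSent++;
                    pending = 0;
                }
            }
            if (pending > 0) {
                ps.executeBatch(); // flush the last, partially filled batch
                batchesSent++;
            }
            // End of window: all inserts of the window are committed together.
            conn.commit();
        } catch (SQLException e) {
            // Discards the uncommitted inserts, but only if the table is transactional.
            conn.rollback();
            throw e;
        }
        return batchesSent;
    }
}
```

With a transactional table (for example InnoDB in MySQL), a failure before
conn.commit() leaves no rows from the window behind. With a non-transactional
engine such as MyISAM, the rows sent by executeBatch() are persisted
immediately and the rollback has no effect, which is exactly the duplicate
scenario described above.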

To handle the above scenario, we can document this and make it the user's
responsibility to provide a table/database that supports transactions. For
example, if the database is MySQL, it is the user's responsibility to create
the table with the InnoDB storage engine.
Please let me know if you have any other solution that conforms to the SQL
standard.

Regards,
Hitesh Kapoor
