Hello everyone
Lukas and I started a discussion on the manual pages which Lukas proposed 
to move over to the user group instead. The issue was: what is the perfect 
pattern with JOOQ for a DB batch programm that is supposed to process a 
significant amount of data - let's say: 1 million rows per night. I added a 
real-world usecase at the bottom of this post. We assume that there are 
good reasons why the business logic for this batch operation cannot be 
ported to a stored procedure, so it must run in Java.
Due to the large amount of data, you would like to select from an open DB 
cursor (ResultSet in JDBC) without the need to produce intermediate lists 
in memory which make your JVM run out of memory. This is perfectly 
addressed by JOOQ with something like that:

create.selectFrom(EMAILCONTACT)....stream()
>

Now a got all the source data provided as a stream. Very nice.
For the inserts and updates we would need batched prepared statements for 
maximum writing speed. Our Oracle DB makes 3.000 inserts per second with 
batch prepared inserts, so we are fine with 1 million records per night. 
And here comes the difficult part: how can I feed multiple batched prepared 
statements from one source stream. JOOQ provides a few different features 
for batched prepared statements (batchStore, batch, Loader API) but it 
seems that there is a missing link to elegantly build a kind of processing 
pipeline from the single source and the various outputs. Java streams 
cannot be "plit" somehow, so there is no straight-forward solution. Lukas 
brought the term "dispatching" into the discussion. Something like that 
might be the link.

Any suggestions?

Regards,
Jan



The real-world usecase
An external e-mail fullfillment system provides us a DB table about which 
customers got send an e-mail for advertisment campaigns. Sometimes a few 
thousand per day, sometimes a few million, depending on the marketing 
department's actiity. A second table holds information about e-mail 
bounces. Dependening on this information, our system has to write entries 
in the corresponding customers' contact history as an input for the service 
center application. If the e-mail bounced, the history entry has different 
type and content. Additionally the e-mail bounce leads to an update of the 
customer's e-mail account status.
So we have a single source, consisting of

   - The e-mail information records
   - joined with the e-mail bounce information if present (i.e. outer join) 
   and
   - joined with the customer's e-mail account record (inner join). In case 
   the e-mail status must be changed, we don't want to select the e-mail 
   account separately. So the join is reasonable

The results of the process are

   - inserts into a customer contact history table (nearly the same amount 
   as provided e-mail information records) and
   - updates for e-mail accounts (an unpredictable sub-amount but still 
   lots of)


-- 
You received this message because you are subscribed to the Google Groups "jOOQ 
User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.

Reply via email to