If we are concerned about both performance and the JDBC rules violation, then we can easily do the following:
1) Add a boolean flag "*batch_streaming*" to the JDBC connection string.
2) If it is "*false*" (the default) - we copy all updates locally and flush them only on the "*executeBatch*" call. This way JDBC semantics are preserved.
3) If it is "*true*", all adds to the batch go to the streamer directly. This way it might be faster, but it violates JDBC. E.g. a call to "*clearBatch*" doesn't work anymore and we should throw an exception.

Bottom line is that normal non-batched operations should never go through the streamer. The streamer is only involved when:
a) the user explicitly declared that he performs a batch update;
b) a special flag in the connection string is set.

Vladimir.

On Thu, Dec 8, 2016 at 3:20 PM, Alexander Paschenko <
alexander.a.pasche...@gmail.com> wrote:

> Sergi,
>
> JDBC batching might work quite differently from driver to driver. Say,
> MySQL happily rewrites queries as I had suggested in the beginning of
> this thread (it's not the only strategy, but one of the possible
> options) - and, BTW, I would like to hear at least an opinion about it.
>
> On your first approach, the section before the streamer: you suggest
> that we send a single statement and multiple param sets as a single
> query task, am I right? (Just to make sure that I got you properly.)
> If so, do you also mean that the API (namely JdbcQueryTask) between
> server and client should also change? Or should new API means be added
> to facilitate batching tasks?
>
> - Alex
>
> 2016-12-08 15:05 GMT+03:00 Sergi Vladykin <sergi.vlady...@gmail.com>:
> > Guys,
> >
> > I discussed this feature with Dmitriy and we came to the conclusion
> > that batching in JDBC and Data Streaming in Ignite have different
> > semantics and performance characteristics. Thus they are independent
> > features (they may work together, or separately, but this is another
> > story).
> >
> > Let me explain.
> >
> > This is how JDBC batching works:
> > - Add N sets of parameters to a prepared statement.
> > - Manually execute the prepared statement.
> > - Repeat until all the data is loaded.
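The two-mode "*batch_streaming*" behavior proposed at the top of this message could be sketched in plain Java as below. All the names here (BatchSketch, Mode, the Consumer sink) are hypothetical stand-ins to keep the sketch self-contained; a real driver would hand the parameter sets to IgniteDataStreamer rather than a Consumer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical model of the proposed "batch_streaming" connection flag.
// BUFFERED (flag=false): parameter sets are copied locally and flushed
// only on executeBatch(), preserving JDBC semantics.
// STREAMING (flag=true): every addBatch() goes straight to the sink
// (standing in for the data streamer), so clearBatch() must throw.
class BatchSketch {
    enum Mode { BUFFERED, STREAMING }

    private final Mode mode;
    private final Consumer<Object[]> sink; // stands in for the streamer
    private final List<Object[]> buffer = new ArrayList<>();

    BatchSketch(Mode mode, Consumer<Object[]> sink) {
        this.mode = mode;
        this.sink = sink;
    }

    void addBatch(Object... params) {
        if (mode == Mode.STREAMING)
            sink.accept(params); // sent before executeBatch(): violates JDBC
        else
            buffer.add(params);
    }

    void clearBatch() {
        if (mode == Mode.STREAMING)
            throw new IllegalStateException("clearBatch not supported in streaming mode");
        buffer.clear();
    }

    int executeBatch() {
        for (Object[] p : buffer)
            sink.accept(p);
        int n = buffer.size();
        buffer.clear();
        return n;
    }
}
```

In BUFFERED mode nothing reaches the sink until executeBatch(), which is exactly the "copy updates locally, single flush" contract described above.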
> >
> > This is how the data streamer works:
> > - Keep adding data.
> > - The streamer will buffer the data and load the buffered per-node
> > batches when they are big enough.
> > - Close the streamer to make sure that everything is loaded.
> >
> > As you can see, we have a difference in the semantics of when we send
> > data: if in our JDBC we allow sending batches to nodes without calling
> > `execute` (and probably we will need to make `execute` a no-op here),
> > then we are violating the semantics of JDBC; if we disallow this
> > behavior, then this batching will underperform.
> >
> > Thus I suggest keeping these features (JDBC Batching and JDBC
> > Streaming) as separate features.
> >
> > As I already said, they can work together: Batching will batch the
> > parameters, on `execute` they will go to the Streamer in one shot, and
> > the Streamer will deal with the rest.
> >
> > Sergi
> >
> > 2016-12-08 14:16 GMT+03:00 Vladimir Ozerov <voze...@gridgain.com>:
> >
> >> Hi Alex,
> >>
> >> To my understanding there are two possible approaches to batching in
> >> the JDBC layer:
> >>
> >> 1) Rely on the default batching API, specifically
> >> *PreparedStatement.addBatch()* [1] and others. This is a nice and
> >> clear API, users are used to it, and its adoption will minimize user
> >> code changes when migrating from other JDBC sources. We simply copy
> >> updates locally and then execute them all at once with only a single
> >> network hop to the servers. *IgniteDataStreamer* can be used
> >> underneath.
> >>
> >> 2) Or we can have a separate connection flag which will move all
> >> INSERT/UPDATE/DELETE statements through the streamer.
> >>
> >> I prefer the first approach.
> >>
> >> Also we need to keep in mind that the data streamer has poor
> >> performance when adding single key-value pairs due to the high
> >> overhead of concurrency and other bookkeeping. Instead, it is better
> >> to pre-batch key-value pairs before giving them to the streamer.
> >>
> >> Vladimir.
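The pre-batching advice above (hand whole batches to the streamer instead of single pairs) could be sketched as a small accumulator. StreamerSink here is a hypothetical stand-in so the sketch has no dependencies; the real IgniteDataStreamer exposes a similar bulk addData(Map) overload alongside the per-pair addData(K, V).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the streamer's bulk-add entry point.
interface StreamerSink<K, V> {
    void addData(Map<K, V> entries);
}

// Pre-batches key-value pairs locally and hands them to the streamer
// one map at a time, avoiding per-pair bookkeeping overhead.
class PreBatcher<K, V> {
    private final StreamerSink<K, V> sink;
    private final int batchSize;
    private Map<K, V> pending = new HashMap<>();

    PreBatcher(StreamerSink<K, V> sink, int batchSize) {
        this.sink = sink;
        this.batchSize = batchSize;
    }

    void add(K key, V val) {
        pending.put(key, val);
        if (pending.size() >= batchSize)
            flush();
    }

    void flush() {
        if (!pending.isEmpty()) {
            sink.addData(pending); // one call for the whole batch
            pending = new HashMap<>();
        }
    }
}
```

This is also how Sergi's "work together" scenario reads: JDBC batching accumulates, and `execute` becomes one flush() into the streamer.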
> >>
> >> [1]
> >> https://docs.oracle.com/javase/8/docs/api/java/sql/PreparedStatement.html#addBatch--
> >>
> >> On Thu, Dec 8, 2016 at 1:21 PM, Alexander Paschenko <
> >> alexander.a.pasche...@gmail.com> wrote:
> >>
> >> > Hello Igniters,
> >> >
> >> > One of the major improvements to DML has to be support for batch
> >> > statements. I'd like to discuss its implementation. The suggested
> >> > approach is to rewrite the given query, turning it from a few
> >> > INSERTs into a single statement and processing the arguments
> >> > accordingly. I suggest this because the whole point of batching is
> >> > to make as few interactions with the cluster as possible and to make
> >> > operations as condensed as possible, and in the case of Ignite that
> >> > means we should send as few JdbcQueryTasks as possible. And, as long
> >> > as a query task holds a single query and its arguments, this
> >> > approach will not require any changes to the current design and
> >> > won't break any backward compatibility - all the dirty work of
> >> > rewriting will be done by the JDBC driver.
> >> > Without rewriting, we could introduce some new query task for batch
> >> > operations, but that would make it impossible to send such requests
> >> > from newer clients to older servers (say, servers of version 1.8.0,
> >> > which do not know about batching, let alone older versions).
> >> > I'd like to hear comments and suggestions from the community.
> >> > Thanks!
> >> >
> >> > - Alex
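The rewrite Alex proposes (and that MySQL's driver performs when rewriting batched statements) could be sketched as follows. BatchRewriter is hypothetical; a production version would need a real SQL parser rather than this naive search for the VALUES keyword, which would misfire if "VALUES" appeared inside a string literal.

```java
// Hypothetical sketch: turn one "INSERT ... VALUES (?, ?)" plus N
// parameter sets into a single multi-row statement, so that only one
// JdbcQueryTask needs to be sent to the cluster.
class BatchRewriter {
    static String rewrite(String sql, int paramSets) {
        int idx = sql.toUpperCase().lastIndexOf("VALUES");
        if (idx < 0 || paramSets < 1)
            throw new IllegalArgumentException("expected single-row INSERT ... VALUES (...)");

        String head = sql.substring(0, idx);                        // "INSERT INTO t (a, b) "
        String row = sql.substring(idx + "VALUES".length()).trim(); // "(?, ?)"

        StringBuilder sb = new StringBuilder(head).append("VALUES ").append(row);
        for (int i = 1; i < paramSets; i++)
            sb.append(", ").append(row); // repeat the placeholder row per param set
        return sb.toString();
    }
}
```

The driver would then flatten the batched parameter sets into one argument array matching the repeated placeholders, keeping the existing single-query JdbcQueryTask wire format intact.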