Vladimir, I see no reason to forbid Streamer usage for non-batched statement execution. It is common for users to already have their own ETL tools, and you can't be sure whether they use batching or not.
Alex, I guess we have to decide on Streaming first and then discuss Batching separately, OK? This decision may become important for the batching implementation.

Sergi

2016-12-08 15:31 GMT+03:00 Andrey Gura <ag...@apache.org>:

> Alex,
>
> In most cases JdbcQueryTask should be executed locally on the client node
> started by the JDBC driver:
>
>     JdbcQueryTask.QueryResult res = loc
>         ? qryTask.call()
>         : ignite.compute(ignite.cluster().forNodeId(nodeId)).call(qryTask);
>
> Is this behavior still valid after introducing DML functionality?
>
> In cases when a user wants to execute a query on a specific node, he
> should fully understand what he wants and what can go wrong.
>
> On Thu, Dec 8, 2016 at 3:20 PM, Alexander Paschenko
> <alexander.a.pasche...@gmail.com> wrote:
> > Sergi,
> >
> > JDBC batching might work quite differently from driver to driver. Say,
> > MySQL happily rewrites queries as I suggested at the beginning of this
> > thread (it's not the only strategy, but one of the possible options),
> > and, BTW, I would like to hear at least an opinion about that.
> >
> > On your first approach, the section before the streamer: you suggest
> > that we send a single statement and multiple parameter sets as a single
> > query task, am I right? (Just to make sure I got you properly.) If so,
> > do you also mean that the API (namely JdbcQueryTask) between server and
> > client should also change? Or should new API means be added to
> > facilitate batching tasks?
> >
> > - Alex
> >
> > 2016-12-08 15:05 GMT+03:00 Sergi Vladykin <sergi.vlady...@gmail.com>:
> >> Guys,
> >>
> >> I discussed this feature with Dmitriy and we came to the conclusion
> >> that batching in JDBC and Data Streaming in Ignite have different
> >> semantics and performance characteristics. Thus they are independent
> >> features (they may work together or separately, but that is another
> >> story).
> >>
> >> Let me explain.
> >> This is how JDBC batching works:
> >> - Add N sets of parameters to a prepared statement.
> >> - Manually execute the prepared statement.
> >> - Repeat until all the data is loaded.
> >>
> >> This is how the data streamer works:
> >> - Keep adding data.
> >> - The streamer buffers the data and loads the buffered per-node
> >>   batches when they are big enough.
> >> - Close the streamer to make sure that everything is loaded.
> >>
> >> As you can see, there is a difference in the semantics of when we send
> >> the data: if in our JDBC driver we allow sending batches to nodes
> >> without calling `execute` (and probably make `execute` a no-op here),
> >> then we are violating JDBC semantics; if we disallow this behavior,
> >> then batching will underperform.
> >>
> >> Thus I suggest keeping these features (JDBC Batching and JDBC
> >> Streaming) separate.
> >>
> >> As I already said, they can work together: Batching will batch the
> >> parameters, on `execute` they will go to the Streamer in one shot, and
> >> the Streamer will deal with the rest.
> >>
> >> Sergi
> >>
> >> 2016-12-08 14:16 GMT+03:00 Vladimir Ozerov <voze...@gridgain.com>:
> >>
> >>> Hi Alex,
> >>>
> >>> To my understanding there are two possible approaches to batching in
> >>> the JDBC layer:
> >>>
> >>> 1) Rely on the default batching API, specifically
> >>> *PreparedStatement.addBatch()* [1] and friends. This is a nice and
> >>> clear API, users are used to it, and its adoption will minimize user
> >>> code changes when migrating from other JDBC sources. We simply
> >>> accumulate the updates locally and then execute them all at once with
> >>> only a single network hop to the servers. *IgniteDataStreamer* can be
> >>> used underneath.
> >>>
> >>> 2) Or we can have a separate connection flag which will move all
> >>> INSERT/UPDATE/DELETE statements through the streamer.
> >>> I prefer the first approach.
> >>>
> >>> Also, we need to keep in mind that the data streamer has poor
> >>> performance when adding single key-value pairs, due to the high
> >>> overhead of concurrency and other bookkeeping. Instead, it is better
> >>> to pre-batch key-value pairs before giving them to the streamer.
> >>>
> >>> Vladimir.
> >>>
> >>> [1]
> >>> https://docs.oracle.com/javase/8/docs/api/java/sql/PreparedStatement.html#addBatch--
> >>>
> >>> On Thu, Dec 8, 2016 at 1:21 PM, Alexander Paschenko <
> >>> alexander.a.pasche...@gmail.com> wrote:
> >>>
> >>> > Hello Igniters,
> >>> >
> >>> > One of the major improvements to DML has to be support for batch
> >>> > statements, and I'd like to discuss its implementation. The
> >>> > suggested approach is to rewrite the given query, turning it from a
> >>> > few INSERTs into a single statement and processing the arguments
> >>> > accordingly. I suggest this because the whole point of batching is
> >>> > to interact with the cluster as little as possible and to make the
> >>> > operations as condensed as possible; in the case of Ignite this
> >>> > means we should send as few JdbcQueryTasks as possible. And since a
> >>> > query task holds a single query and its arguments, this approach
> >>> > will not require any changes to the current design and won't break
> >>> > backward compatibility: all the dirty work of rewriting will be
> >>> > done by the JDBC driver.
> >>> > Without rewriting, we could introduce a new query task for batch
> >>> > operations, but that would make it impossible to send such requests
> >>> > from newer clients to older servers (say, servers of version 1.8.0,
> >>> > which know nothing about batching, let alone older versions).
> >>> > I'd like to hear comments and suggestions from the community.
> >>> > Thanks!
> >>> >
> >>> > - Alex
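For readers following the thread: the routing snippet Andrey quotes (run the task in-process when `loc` is true, otherwise ship it to the target node via the compute grid) can be illustrated outside of Ignite with a plain `Callable`. The `submitRemotely` helper below is a hypothetical stand-in for `ignite.compute(...).call(qryTask)`, not real Ignite API; only the two-branch dispatch shape comes from the thread.

```java
import java.util.concurrent.Callable;

// Sketch of the loc-vs-remote dispatch discussed above. The "remote" path
// is simulated in-process so the example runs standalone.
public class QueryDispatchSketch {
    static String dispatch(boolean loc, Callable<String> qryTask) throws Exception {
        // Mirrors: loc ? qryTask.call() : ignite.compute(...).call(qryTask)
        return loc ? qryTask.call() : submitRemotely(qryTask);
    }

    static String submitRemotely(Callable<String> task) throws Exception {
        // Placeholder for compute-grid execution: tag the result so the
        // branch taken is visible.
        return "remote:" + task.call();
    }

    public static void main(String[] args) throws Exception {
        Callable<String> task = () -> "result";
        System.out.println(dispatch(true, task));  // result
        System.out.println(dispatch(false, task)); // remote:result
    }
}
```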
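The semantic difference Sergi describes, an explicit flush point on `execute` versus size-triggered flushing with a final flush on close, can be sketched with two toy in-memory buffers. `JdbcBatch` and `Streamer` below are illustrative classes written for this sketch, not JDBC or Ignite API; they only model *when* data leaves the client.

```java
import java.util.ArrayList;
import java.util.List;

public class FlushSemanticsSketch {

    // JDBC-style batching: nothing is sent until executeBatch() is called.
    static class JdbcBatch {
        final List<Object[]> pending = new ArrayList<>();
        final List<Object[]> sent = new ArrayList<>();

        void addBatch(Object... params) { pending.add(params); }

        int executeBatch() {            // the explicit flush point
            int n = pending.size();
            sent.addAll(pending);
            pending.clear();
            return n;
        }
    }

    // Streamer-style buffering: data is flushed automatically once the
    // buffer reaches bufSize; close() flushes the remainder.
    static class Streamer {
        final int bufSize;
        final List<Object[]> buf = new ArrayList<>();
        final List<Object[]> sent = new ArrayList<>();

        Streamer(int bufSize) { this.bufSize = bufSize; }

        void addData(Object... entry) {
            buf.add(entry);
            if (buf.size() >= bufSize) flush(); // implicit flush, no execute()
        }

        void flush() { sent.addAll(buf); buf.clear(); }

        void close() { flush(); }
    }

    public static void main(String[] args) {
        JdbcBatch batch = new JdbcBatch();
        batch.addBatch(1, "a");
        batch.addBatch(2, "b");
        // batch.sent is still empty here: JDBC semantics.
        System.out.println("batch flushed " + batch.executeBatch() + " rows");

        Streamer streamer = new Streamer(2);
        streamer.addData(1, "a");
        streamer.addData(2, "b"); // buffer full: sent without any execute()
        streamer.addData(3, "c");
        streamer.close();         // flushes the tail
        System.out.println("streamer sent " + streamer.sent.size() + " rows");
    }
}
```

This also shows why Sergi's combined mode is coherent: `executeBatch` can simply hand `pending` to a streamer in one shot and let it handle per-node delivery.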
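The rewriting Alex proposes (the strategy MySQL Connector/J applies under its `rewriteBatchedStatements` option) can be sketched as follows: N batched parameter sets for a single-row INSERT collapse into one multi-row statement, so the driver ships one JdbcQueryTask instead of N. `BatchRewriteSketch` is a hypothetical illustration, not the actual driver code, and its parsing is deliberately naive: it assumes the statement ends with a single `VALUES (...)` clause.

```java
import java.util.Arrays;
import java.util.List;

public class BatchRewriteSketch {

    // Repeat the VALUES row template once per batched parameter set.
    static String rewrite(String sql, int batchSize) {
        int idx = sql.toUpperCase().lastIndexOf("VALUES");
        String head = sql.substring(0, idx + "VALUES".length());
        String row = sql.substring(idx + "VALUES".length()).trim(); // "(?, ?)"
        StringBuilder sb = new StringBuilder(head);
        for (int i = 0; i < batchSize; i++)
            sb.append(i == 0 ? " " : ", ").append(row);
        return sb.toString();
    }

    // Flatten the per-row argument arrays to match the expanded placeholders.
    static Object[] flattenArgs(List<Object[]> paramSets) {
        return paramSets.stream().flatMap(Arrays::stream).toArray();
    }

    public static void main(String[] args) {
        List<Object[]> batch = Arrays.asList(
            new Object[] {1, "one"},
            new Object[] {2, "two"},
            new Object[] {3, "three"});

        // Prints: INSERT INTO t (id, val) VALUES (?, ?), (?, ?), (?, ?)
        System.out.println(rewrite("INSERT INTO t (id, val) VALUES (?, ?)",
            batch.size()));
        System.out.println(flattenArgs(batch).length); // 6
    }
}
```

One rewritten statement plus one flattened argument array fits the existing single-query JdbcQueryTask shape, which is the backward-compatibility point Alex makes above.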