If we are concerned about both performance and the JDBC rules violation, then we can easily do the following:
1) Add a boolean flag "*batch_streaming*" to the JDBC connection string.
2) If it is "*false*" (the default) - we copy all updates locally and flush them only on the "*executeBatch*" call. This way JDBC semantics are preserved.
3) If it is "*true*", all adds to the batch go to the streamer directly. This way it might be faster, but it violates JDBC. E.g. a call to "*clearBatch*" doesn't work anymore and we should throw an exception.

Bottom line is that normal non-batched operations should never go through the streamer. The streamer is only involved when:
a) the user explicitly declared that he performs a batch update;
b) a special flag in the connection string is set.

Vladimir.

On Thu, Dec 8, 2016 at 3:20 PM, Alexander Paschenko <
alexander.a.pasche...@gmail.com> wrote:

> Sergi,
>
> JDBC batching might work quite differently from driver to driver. Say,
> MySQL happily rewrites queries as I had suggested in the beginning of
> this thread (it's not the only strategy, but one of the possible
> options) - and, BTW, I would like to hear at least an opinion about it.
>
> On your first approach, the section before the streamer: you suggest
> that we send a single statement and multiple param sets as a single
> query task, am I right? (Just to make sure that I got you properly.)
> If so, do you also mean that the API (namely JdbcQueryTask) between
> server and client should also change? Or should new API means be added
> to facilitate batching tasks?
>
> - Alex
>
> 2016-12-08 15:05 GMT+03:00 Sergi Vladykin <sergi.vlady...@gmail.com>:
> > Guys,
> >
> > I discussed this feature with Dmitriy and we came to the conclusion
> > that batching in JDBC and Data Streaming in Ignite have different
> > semantics and performance characteristics. Thus they are independent
> > features (they may work together, or separately, but this is another
> > story).
> >
> > Let me explain.
> >
> > This is how JDBC batching works:
> > - Add N sets of parameters to a prepared statement.
> > - Manually execute the prepared statement.
> > - Repeat until all the data is loaded.
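The two-mode "*batch_streaming*" behavior proposed at the top of this message could be sketched in plain Java as below. All the names here (BatchSketch, Mode, the Consumer sink) are hypothetical stand-ins to keep the sketch self-contained; a real driver would hand the parameter sets to IgniteDataStreamer rather than a Consumer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical model of the proposed "batch_streaming" connection flag.
// BUFFERED (flag=false): parameter sets are copied locally and flushed
// only on executeBatch(), preserving JDBC semantics.
// STREAMING (flag=true): every addBatch() goes straight to the sink
// (standing in for the data streamer), so clearBatch() must throw.
class BatchSketch {
    enum Mode { BUFFERED, STREAMING }

    private final Mode mode;
    private final Consumer<Object[]> sink; // stands in for the streamer
    private final List<Object[]> buffer = new ArrayList<>();

    BatchSketch(Mode mode, Consumer<Object[]> sink) {
        this.mode = mode;
        this.sink = sink;
    }

    void addBatch(Object... params) {
        if (mode == Mode.STREAMING)
            sink.accept(params); // sent before executeBatch(): violates JDBC
        else
            buffer.add(params);
    }

    void clearBatch() {
        if (mode == Mode.STREAMING)
            throw new IllegalStateException("clearBatch not supported in streaming mode");
        buffer.clear();
    }

    int executeBatch() {
        for (Object[] p : buffer)
            sink.accept(p);
        int n = buffer.size();
        buffer.clear();
        return n;
    }
}
```

In BUFFERED mode nothing reaches the sink until executeBatch(), which is exactly the "copy updates locally, single flush" contract described above.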
> >
> > This is how the data streamer works:
> > - Keep adding data.
> > - The streamer will buffer the data and load the buffered per-node
> > batches when they are big enough.
> > - Close the streamer to make sure that everything is loaded.
> >
> > As you can see, we have a difference in the semantics of when we send
> > data: if in our JDBC we allow sending batches to nodes without calling
> > `execute` (and probably we will need to make `execute` a no-op here),
> > then we are violating the semantics of JDBC; if we disallow this
> > behavior, then this batching will underperform.
> >
> > Thus I suggest keeping these features (JDBC Batching and JDBC
> > Streaming) as separate features.
> >
> > As I already said, they can work together: Batching will batch the
> > parameters, on `execute` they will go to the Streamer in one shot, and
> > the Streamer will deal with the rest.
> >
> > Sergi
> >
> > 2016-12-08 14:16 GMT+03:00 Vladimir Ozerov <voze...@gridgain.com>:
> >
> >> Hi Alex,
> >>
> >> To my understanding there are two possible approaches to batching in
> >> the JDBC layer:
> >>
> >> 1) Rely on the default batching API, specifically
> >> *PreparedStatement.addBatch()* [1] and others. This is a nice and
> >> clear API, users are used to it, and its adoption will minimize user
> >> code changes when migrating from other JDBC sources. We simply copy
> >> updates locally and then execute them all at once with only a single
> >> network hop to the servers. *IgniteDataStreamer* can be used
> >> underneath.
> >>
> >> 2) Or we can have a separate connection flag which will move all
> >> INSERT/UPDATE/DELETE statements through the streamer.
> >>
> >> I prefer the first approach.
> >>
> >> Also we need to keep in mind that the data streamer has poor
> >> performance when adding single key-value pairs due to the high
> >> overhead of concurrency and other bookkeeping. Instead, it is better
> >> to pre-batch key-value pairs before giving them to the streamer.
> >>
> >> Vladimir.
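The pre-batching advice above (hand whole batches to the streamer instead of single pairs) could be sketched as a small accumulator. StreamerSink here is a hypothetical stand-in so the sketch has no dependencies; the real IgniteDataStreamer exposes a similar bulk addData(Map) overload alongside the per-pair addData(K, V).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the streamer's bulk-add entry point.
interface StreamerSink<K, V> {
    void addData(Map<K, V> entries);
}

// Pre-batches key-value pairs locally and hands them to the streamer
// one map at a time, avoiding per-pair bookkeeping overhead.
class PreBatcher<K, V> {
    private final StreamerSink<K, V> sink;
    private final int batchSize;
    private Map<K, V> pending = new HashMap<>();

    PreBatcher(StreamerSink<K, V> sink, int batchSize) {
        this.sink = sink;
        this.batchSize = batchSize;
    }

    void add(K key, V val) {
        pending.put(key, val);
        if (pending.size() >= batchSize)
            flush();
    }

    void flush() {
        if (!pending.isEmpty()) {
            sink.addData(pending); // one call for the whole batch
            pending = new HashMap<>();
        }
    }
}
```

This is also how Sergi's "work together" scenario reads: JDBC batching accumulates, and `execute` becomes one flush() into the streamer.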
> >>
> >> [1]
> >> https://docs.oracle.com/javase/8/docs/api/java/sql/PreparedStatement.html#addBatch--
> >>
> >> On Thu, Dec 8, 2016 at 1:21 PM, Alexander Paschenko <
> >> alexander.a.pasche...@gmail.com> wrote:
> >>
> >> > Hello Igniters,
> >> >
> >> > One of the major improvements to DML has to be support for batch
> >> > statements. I'd like to discuss its implementation. The suggested
> >> > approach is to rewrite the given query, turning it from a few
> >> > INSERTs into a single statement and processing the arguments
> >> > accordingly. I suggest this because the whole point of batching is
> >> > to make as few interactions with the cluster as possible and to make
> >> > operations as condensed as possible, and in the case of Ignite that
> >> > means we should send as few JdbcQueryTasks as possible. And, as long
> >> > as a query task holds a single query and its arguments, this
> >> > approach will not require any changes to the current design and
> >> > won't break any backward compatibility - all the dirty work of
> >> > rewriting will be done by the JDBC driver.
> >> > Without rewriting, we could introduce some new query task for batch
> >> > operations, but that would make it impossible to send such requests
> >> > from newer clients to older servers (say, servers of version 1.8.0,
> >> > which do not know about batching, let alone older versions).
> >> > I'd like to hear comments and suggestions from the community.
> >> > Thanks!
> >> >
> >> > - Alex
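The rewrite Alex proposes (and that MySQL's driver performs when rewriting batched statements) could be sketched as follows. BatchRewriter is hypothetical; a production version would need a real SQL parser rather than this naive search for the VALUES keyword, which would misfire if "VALUES" appeared inside a string literal.

```java
// Hypothetical sketch: turn one "INSERT ... VALUES (?, ?)" plus N
// parameter sets into a single multi-row statement, so that only one
// JdbcQueryTask needs to be sent to the cluster.
class BatchRewriter {
    static String rewrite(String sql, int paramSets) {
        int idx = sql.toUpperCase().lastIndexOf("VALUES");
        if (idx < 0 || paramSets < 1)
            throw new IllegalArgumentException("expected single-row INSERT ... VALUES (...)");

        String head = sql.substring(0, idx);                        // "INSERT INTO t (a, b) "
        String row = sql.substring(idx + "VALUES".length()).trim(); // "(?, ?)"

        StringBuilder sb = new StringBuilder(head).append("VALUES ").append(row);
        for (int i = 1; i < paramSets; i++)
            sb.append(", ").append(row); // repeat the placeholder row per param set
        return sb.toString();
    }
}
```

The driver would then flatten the batched parameter sets into one argument array matching the repeated placeholders, keeping the existing single-query JdbcQueryTask wire format intact.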