Guys,

I discussed this feature with Dmitriy and we came to the conclusion that
batching in JDBC and Data Streaming in Ignite have different semantics and
performance characteristics. Thus they are independent features (they may
work together or separately, but that is another story).

Let me explain.

This is how JDBC batching works:
- Add N sets of parameters to a prepared statement.
- Manually execute prepared statement.
- Repeat until all the data is loaded.
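
The semantics above can be sketched with a tiny stand-in for a prepared
statement (the class and its counters are made up for illustration; a real
driver sends the batch over the wire on execute):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative stub of JDBC-style batching semantics (not a real driver):
// parameter sets accumulate locally and nothing is sent until the user
// explicitly calls executeBatch().
public class JdbcStyleBatch {
    final List<Object[]> pending = new ArrayList<>();
    int sentRows;      // rows that have reached the "server"
    int networkHops;   // one hop per explicit executeBatch() call

    void addBatch(Object... params) {
        pending.add(params);               // buffered client-side only
    }

    int[] executeBatch() {
        networkHops++;                     // whole batch goes in one shot
        sentRows += pending.size();
        int[] counts = new int[pending.size()];
        Arrays.fill(counts, 1);            // pretend each row updated once
        pending.clear();
        return counts;
    }

    public static void main(String[] args) {
        JdbcStyleBatch ps = new JdbcStyleBatch();
        for (int i = 0; i < 100; i++) {
            ps.addBatch(i, "name-" + i);   // step 1: add N parameter sets
            if ((i + 1) % 25 == 0)
                ps.executeBatch();         // step 2: manual execute
        }                                  // step 3: repeat until loaded
        System.out.println(ps.sentRows + " rows in " + ps.networkHops + " hops");
    }
}
```

The point is that the user controls exactly when data leaves the client:
here 100 rows travel in 4 hops, one per explicit execute.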


This is how data streamer works:
- Keep adding data.
- Streamer will buffer and load buffered per-node batches when they are big
enough.
- Close streamer to make sure that everything is loaded.
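
For contrast, the streamer's buffer-and-auto-flush behavior can be sketched
the same way (the threshold and per-node map are illustrative; the real
IgniteDataStreamer has its own buffering and tuning):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of streamer semantics (not the real IgniteDataStreamer):
// entries are buffered per node and flushed automatically once a per-node
// buffer is big enough; close() flushes whatever is left.
public class StreamerSketch implements AutoCloseable {
    static final int BUFFER_SIZE = 4;          // illustrative threshold
    final Map<Integer, List<Object>> buffers = new HashMap<>();
    int flushes;   // automatic flushes plus flushes on close
    int loaded;    // entries that reached the "cluster"

    void addData(int node, Object row) {       // step 1: keep adding data
        List<Object> buf = buffers.computeIfAbsent(node, n -> new ArrayList<>());
        buf.add(row);
        if (buf.size() >= BUFFER_SIZE)         // step 2: auto-flush when full,
            flush(buf);                        // no explicit execute involved
    }

    void flush(List<Object> buf) {
        flushes++;
        loaded += buf.size();
        buf.clear();
    }

    @Override public void close() {            // step 3: close loads the rest
        for (List<Object> buf : buffers.values())
            if (!buf.isEmpty())
                flush(buf);
    }

    public static void main(String[] args) {
        try (StreamerSketch s = new StreamerSketch()) {
            for (int i = 0; i < 10; i++)
                s.addData(i % 2, "row-" + i);  // 5 rows to each of 2 "nodes"
        }
    }
}
```

Here the user never triggers a send directly: flushes happen when the
streamer decides the buffers are big enough, and close() drains the rest.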

As you can see, the difference is in when the data is sent: if our JDBC
driver sends batches to nodes without an explicit `execute` call (and we
would probably have to turn `execute` into a no-op here), then we violate
JDBC semantics; if we disallow this behavior, then this batching will
underperform.

Thus I suggest keeping JDBC Batching and JDBC Streaming as separate
features.

As I already said, they can work together: Batching will collect the
parameters, on `execute` they will go to the Streamer in one shot, and the
Streamer will deal with the rest.
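
A minimal sketch of that interplay (all names are illustrative, and a plain
list stands in for the streamer):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of batching on top of streaming (names made up):
// the batch collects parameter sets, and execute hands them to the
// streamer in one shot; the streamer then buffers and loads as usual.
public class BatchOverStreamer {
    final List<Object[]> batch = new ArrayList<>();    // JDBC-side batch
    final List<Object[]> streamer = new ArrayList<>(); // stand-in for a streamer

    void addBatch(Object... params) {
        batch.add(params);           // batching batches the parameters
    }

    void executeBatch() {
        streamer.addAll(batch);      // on execute: one shot to the streamer
        batch.clear();               // the streamer deals with the rest
    }
}
```

This keeps JDBC semantics intact (nothing leaves the batch without an
execute) while still letting the streamer do its own per-node buffering.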

Sergi

2016-12-08 14:16 GMT+03:00 Vladimir Ozerov <voze...@gridgain.com>:

> Hi Alex,
>
> To my understanding there are two possible approaches to batching in JDBC
> layer:
>
> 1) Rely on default batching API. Specifically
> *PreparedStatement.addBatch()* [1]
> and others. This is a nice and clear API, users are used to it, and its
> adoption will minimize user code changes when migrating from other JDBC
> sources. We simply copy updates locally and then execute them all at once
> with only a single network hop to servers. *IgniteDataStreamer* can be used
> underneath.
>
> 2) Or we can have a separate connection flag that routes all
> INSERT/UPDATE/DELETE statements through the streamer.
>
> I prefer the first approach.
>
> Also we need to keep in mind that the data streamer has poor performance
> when adding single key-value pairs due to the high overhead of concurrency
> and other bookkeeping. Instead, it is better to pre-batch key-value pairs
> before handing them to the streamer.
>
> Vladimir.
>
> [1]
> https://docs.oracle.com/javase/8/docs/api/java/sql/PreparedStatement.html#addBatch--
>
> On Thu, Dec 8, 2016 at 1:21 PM, Alexander Paschenko <
> alexander.a.pasche...@gmail.com> wrote:
>
> > Hello Igniters,
> >
> > One of the major improvements to DML has to be support for batch
> > statements, and I'd like to discuss its implementation. The suggested
> > approach is to rewrite the given query, turning it from a few INSERTs
> > into a single statement and processing the arguments accordingly. I
> > suggest this because the whole point of batching is to interact with
> > the cluster as little as possible and to keep operations as condensed
> > as possible; in the case of Ignite this means we should send as few
> > JdbcQueryTasks as possible. And since a query task holds a single
> > query and its arguments, this approach requires no changes to the
> > current design and won't break backward compatibility - all the dirty
> > rewriting work will be done by the JDBC driver.
> > Without rewriting, we could introduce a new query task for batch
> > operations, but that would make it impossible to send such requests
> > from newer clients to older servers (say, servers of version 1.8.0,
> > which knows nothing about batching, let alone older versions).
> > I'd like to hear comments and suggestions from the community. Thanks!
> >
> > - Alex
> >
>
