Vladimir, I see no reason to forbid Streamer usage for non-batched statement execution. It is common for users to already have their own ETL tools, and you can't be sure whether they use batching or not.
Alex, I guess we have to decide on Streaming first and then discuss Batching separately, OK? This decision may become important for the batching implementation.

Sergi

2016-12-08 15:31 GMT+03:00 Andrey Gura <ag...@apache.org>:

> Alex,
>
> In most cases JdbcQueryTask should be executed locally on the client node
> started by the JDBC driver:
>
>     JdbcQueryTask.QueryResult res = loc
>         ? qryTask.call()
>         : ignite.compute(ignite.cluster().forNodeId(nodeId)).call(qryTask);
>
> Is this behavior still valid after introducing DML functionality?
>
> In cases when a user wants to execute a query on a specific node, he
> should fully understand what he wants and what can go wrong.
>
> On Thu, Dec 8, 2016 at 3:20 PM, Alexander Paschenko
> <alexander.a.pasche...@gmail.com> wrote:
> > Sergi,
> >
> > JDBC batching might work quite differently from driver to driver. Say,
> > MySQL happily rewrites queries as I suggested at the beginning of this
> > thread (it's not the only strategy, but one of the possible options),
> > and, BTW, I would like to hear at least an opinion about that.
> >
> > On your first approach, the section before the streamer: you suggest
> > that we send a single statement and multiple parameter sets as a single
> > query task, am I right? (Just to make sure I got you properly.) If so,
> > do you also mean that the API (namely JdbcQueryTask) between server and
> > client should also change? Or should new API means be added to
> > facilitate batching tasks?
> >
> > - Alex
> >
> > 2016-12-08 15:05 GMT+03:00 Sergi Vladykin <sergi.vlady...@gmail.com>:
> >> Guys,
> >>
> >> I discussed this feature with Dmitriy and we came to the conclusion
> >> that batching in JDBC and Data Streaming in Ignite have different
> >> semantics and performance characteristics. Thus they are independent
> >> features (they may work together or separately, but that is another
> >> story).
> >>
> >> Let me explain.
> >> This is how JDBC batching works:
> >> - Add N sets of parameters to a prepared statement.
> >> - Manually execute the prepared statement.
> >> - Repeat until all the data is loaded.
> >>
> >> This is how the data streamer works:
> >> - Keep adding data.
> >> - The streamer buffers the data and loads the buffered per-node
> >>   batches when they are big enough.
> >> - Close the streamer to make sure that everything is loaded.
> >>
> >> As you can see, there is a difference in the semantics of when we send
> >> the data: if in our JDBC driver we allow sending batches to nodes
> >> without calling `execute` (and probably make `execute` a no-op here),
> >> then we are violating JDBC semantics; if we disallow this behavior,
> >> then batching will underperform.
> >>
> >> Thus I suggest keeping these features (JDBC Batching and JDBC
> >> Streaming) separate.
> >>
> >> As I already said, they can work together: Batching will batch the
> >> parameters, on `execute` they will go to the Streamer in one shot, and
> >> the Streamer will deal with the rest.
> >>
> >> Sergi
> >>
> >> 2016-12-08 14:16 GMT+03:00 Vladimir Ozerov <voze...@gridgain.com>:
> >>
> >>> Hi Alex,
> >>>
> >>> To my understanding there are two possible approaches to batching in
> >>> the JDBC layer:
> >>>
> >>> 1) Rely on the default batching API, specifically
> >>> *PreparedStatement.addBatch()* [1] and friends. This is a nice and
> >>> clear API, users are used to it, and its adoption will minimize user
> >>> code changes when migrating from other JDBC sources. We simply
> >>> accumulate the updates locally and then execute them all at once with
> >>> only a single network hop to the servers. *IgniteDataStreamer* can be
> >>> used underneath.
> >>>
> >>> 2) Or we can have a separate connection flag which will move all
> >>> INSERT/UPDATE/DELETE statements through the streamer.
> >>> I prefer the first approach.
> >>>
> >>> Also, we need to keep in mind that the data streamer has poor
> >>> performance when adding single key-value pairs, due to the high
> >>> overhead of concurrency and other bookkeeping. Instead, it is better
> >>> to pre-batch key-value pairs before giving them to the streamer.
> >>>
> >>> Vladimir.
> >>>
> >>> [1]
> >>> https://docs.oracle.com/javase/8/docs/api/java/sql/PreparedStatement.html#addBatch--
> >>>
> >>> On Thu, Dec 8, 2016 at 1:21 PM, Alexander Paschenko <
> >>> alexander.a.pasche...@gmail.com> wrote:
> >>>
> >>> > Hello Igniters,
> >>> >
> >>> > One of the major improvements to DML has to be support for batch
> >>> > statements, and I'd like to discuss its implementation. The
> >>> > suggested approach is to rewrite the given query, turning it from a
> >>> > few INSERTs into a single statement and processing the arguments
> >>> > accordingly. I suggest this because the whole point of batching is
> >>> > to interact with the cluster as little as possible and to make the
> >>> > operations as condensed as possible; in the case of Ignite this
> >>> > means we should send as few JdbcQueryTasks as possible. And since a
> >>> > query task holds a single query and its arguments, this approach
> >>> > will not require any changes to the current design and won't break
> >>> > backward compatibility: all the dirty work of rewriting will be
> >>> > done by the JDBC driver.
> >>> > Without rewriting, we could introduce a new query task for batch
> >>> > operations, but that would make it impossible to send such requests
> >>> > from newer clients to older servers (say, servers of version 1.8.0,
> >>> > which know nothing about batching, let alone older versions).
> >>> > I'd like to hear comments and suggestions from the community.
> >>> > Thanks!
> >>> >
> >>> > - Alex
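For readers following the thread: the routing snippet Andrey quotes (run the task in-process when `loc` is true, otherwise ship it to the target node via the compute grid) can be illustrated outside of Ignite with a plain `Callable`. The `submitRemotely` helper below is a hypothetical stand-in for `ignite.compute(...).call(qryTask)`, not real Ignite API; only the two-branch dispatch shape comes from the thread.

```java
import java.util.concurrent.Callable;

// Sketch of the loc-vs-remote dispatch discussed above. The "remote" path
// is simulated in-process so the example runs standalone.
public class QueryDispatchSketch {
    static String dispatch(boolean loc, Callable<String> qryTask) throws Exception {
        // Mirrors: loc ? qryTask.call() : ignite.compute(...).call(qryTask)
        return loc ? qryTask.call() : submitRemotely(qryTask);
    }

    static String submitRemotely(Callable<String> task) throws Exception {
        // Placeholder for compute-grid execution: tag the result so the
        // branch taken is visible.
        return "remote:" + task.call();
    }

    public static void main(String[] args) throws Exception {
        Callable<String> task = () -> "result";
        System.out.println(dispatch(true, task));  // result
        System.out.println(dispatch(false, task)); // remote:result
    }
}
```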
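The semantic difference Sergi describes, an explicit flush point on `execute` versus size-triggered flushing with a final flush on close, can be sketched with two toy in-memory buffers. `JdbcBatch` and `Streamer` below are illustrative classes written for this sketch, not JDBC or Ignite API; they only model *when* data leaves the client.

```java
import java.util.ArrayList;
import java.util.List;

public class FlushSemanticsSketch {

    // JDBC-style batching: nothing is sent until executeBatch() is called.
    static class JdbcBatch {
        final List<Object[]> pending = new ArrayList<>();
        final List<Object[]> sent = new ArrayList<>();

        void addBatch(Object... params) { pending.add(params); }

        int executeBatch() {            // the explicit flush point
            int n = pending.size();
            sent.addAll(pending);
            pending.clear();
            return n;
        }
    }

    // Streamer-style buffering: data is flushed automatically once the
    // buffer reaches bufSize; close() flushes the remainder.
    static class Streamer {
        final int bufSize;
        final List<Object[]> buf = new ArrayList<>();
        final List<Object[]> sent = new ArrayList<>();

        Streamer(int bufSize) { this.bufSize = bufSize; }

        void addData(Object... entry) {
            buf.add(entry);
            if (buf.size() >= bufSize) flush(); // implicit flush, no execute()
        }

        void flush() { sent.addAll(buf); buf.clear(); }

        void close() { flush(); }
    }

    public static void main(String[] args) {
        JdbcBatch batch = new JdbcBatch();
        batch.addBatch(1, "a");
        batch.addBatch(2, "b");
        // batch.sent is still empty here: JDBC semantics.
        System.out.println("batch flushed " + batch.executeBatch() + " rows");

        Streamer streamer = new Streamer(2);
        streamer.addData(1, "a");
        streamer.addData(2, "b"); // buffer full: sent without any execute()
        streamer.addData(3, "c");
        streamer.close();         // flushes the tail
        System.out.println("streamer sent " + streamer.sent.size() + " rows");
    }
}
```

This also shows why Sergi's combined mode is coherent: `executeBatch` can simply hand `pending` to a streamer in one shot and let it handle per-node delivery.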
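The rewriting Alex proposes (the strategy MySQL Connector/J applies under its `rewriteBatchedStatements` option) can be sketched as follows: N batched parameter sets for a single-row INSERT collapse into one multi-row statement, so the driver ships one JdbcQueryTask instead of N. `BatchRewriteSketch` is a hypothetical illustration, not the actual driver code, and its parsing is deliberately naive: it assumes the statement ends with a single `VALUES (...)` clause.

```java
import java.util.Arrays;
import java.util.List;

public class BatchRewriteSketch {

    // Repeat the VALUES row template once per batched parameter set.
    static String rewrite(String sql, int batchSize) {
        int idx = sql.toUpperCase().lastIndexOf("VALUES");
        String head = sql.substring(0, idx + "VALUES".length());
        String row = sql.substring(idx + "VALUES".length()).trim(); // "(?, ?)"
        StringBuilder sb = new StringBuilder(head);
        for (int i = 0; i < batchSize; i++)
            sb.append(i == 0 ? " " : ", ").append(row);
        return sb.toString();
    }

    // Flatten the per-row argument arrays to match the expanded placeholders.
    static Object[] flattenArgs(List<Object[]> paramSets) {
        return paramSets.stream().flatMap(Arrays::stream).toArray();
    }

    public static void main(String[] args) {
        List<Object[]> batch = Arrays.asList(
            new Object[] {1, "one"},
            new Object[] {2, "two"},
            new Object[] {3, "three"});

        // Prints: INSERT INTO t (id, val) VALUES (?, ?), (?, ?), (?, ?)
        System.out.println(rewrite("INSERT INTO t (id, val) VALUES (?, ?)",
            batch.size()));
        System.out.println(flattenArgs(batch).length); // 6
    }
}
```

One rewritten statement plus one flattened argument array fits the existing single-query JdbcQueryTask shape, which is the backward-compatibility point Alex makes above.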