Alex,

In most cases, JdbcQueryTask should be executed locally on the client node
started by the JDBC driver.

JdbcQueryTask.QueryResult res = loc
    ? qryTask.call()
    : ignite.compute(ignite.cluster().forNodeId(nodeId)).call(qryTask);

Is this behavior still valid after the introduction of DML functionality?

When a user wants to execute a query on a specific node, they should
fully understand what they want and what can go wrong.


On Thu, Dec 8, 2016 at 3:20 PM, Alexander Paschenko
<alexander.a.pasche...@gmail.com> wrote:
> Sergi,
>
> JDBC batching might work quite differently from driver to driver. Say,
> MySQL happily rewrites queries as I suggested at the beginning of this
> thread (it's not the only strategy, but one of the possible options) -
> and, by the way, I would like to hear at least an opinion about it.
>
> Regarding your first approach, the section before the streamer: you
> suggest that we send a single statement and multiple parameter sets as a
> single query task, am I right? (Just to make sure that I got you
> properly.) If so, do you also mean that the API (namely JdbcQueryTask)
> between server and client should also change? Or should new API methods
> be added to facilitate batching tasks?
>
> - Alex
>
> 2016-12-08 15:05 GMT+03:00 Sergi Vladykin <sergi.vlady...@gmail.com>:
>> Guys,
>>
>> I discussed this feature with Dmitriy, and we came to the conclusion that
>> batching in JDBC and Data Streaming in Ignite have different semantics and
>> performance characteristics. Thus they are independent features (they may
>> work together or separately, but that is another story).
>>
>> Let me explain.
>>
>> This is how JDBC batching works:
>> - Add N sets of parameters to a prepared statement.
>> - Manually execute prepared statement.
>> - Repeat until all the data is loaded.
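The steps above map directly onto the standard java.sql API. A minimal sketch; the table name and column count are illustrative, and the Connection is assumed to come from the Ignite JDBC driver:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class BatchInsertExample {
    // Build the single-row INSERT used for batching; pure helper.
    static String insertSql(String table, int cols) {
        StringBuilder sb = new StringBuilder("INSERT INTO " + table + " VALUES (");
        for (int i = 0; i < cols; i++)
            sb.append(i == 0 ? "?" : ", ?");
        return sb.append(')').toString();
    }

    static void load(Connection conn, Object[][] rows) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(insertSql("Person", 2))) {
            for (Object[] row : rows) {
                for (int i = 0; i < row.length; i++)
                    ps.setObject(i + 1, row[i]);
                ps.addBatch();     // buffer one parameter set locally
            }
            ps.executeBatch();     // nothing is sent until this call
        }
    }
}
```

Note that executeBatch() is the only point where data leaves the client, which is exactly the semantic difference from the streamer discussed below.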
>>
>>
>> This is how data streamer works:
>> - Keep adding data.
>> - Streamer will buffer and load buffered per-node batches when they are big
>> enough.
>> - Close streamer to make sure that everything is loaded.
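For comparison, the streamer flow above looks roughly like this; the cache name "myCache" is illustrative and the cache is assumed to exist. The streamer decides on its own when per-node buffers are flushed; close() guarantees everything is loaded:

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.Ignition;

public class StreamerExample {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            try (IgniteDataStreamer<Integer, String> stmr =
                     ignite.dataStreamer("myCache")) {
                // Keep adding data; the streamer buffers and loads
                // per-node batches when they are big enough.
                for (int i = 0; i < 1_000_000; i++)
                    stmr.addData(i, "value-" + i);
            } // close() flushes any remaining buffers
        }
    }
}
```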
>>
>> As you can see, the two differ in the semantics of when data is sent: if
>> our JDBC driver allows sending batches to nodes without calling
>> `execute` (and we would probably need to make `execute` a no-op here),
>> then we violate JDBC semantics; if we disallow this behavior, then
>> batching will underperform.
>>
>> Thus I suggest keeping JDBC Batching and JDBC Streaming as separate
>> features.
>>
>> As I already said, they can work together: batching will accumulate the
>> parameters, on `execute` they will go to the streamer in one shot, and
>> the streamer will deal with the rest.
>>
>> Sergi
>>
>> 2016-12-08 14:16 GMT+03:00 Vladimir Ozerov <voze...@gridgain.com>:
>>
>>> Hi Alex,
>>>
>>> To my understanding there are two possible approaches to batching in JDBC
>>> layer:
>>>
>>> 1) Rely on the default batching API, specifically
>>> *PreparedStatement.addBatch()* [1] and others. This is a nice and clear
>>> API, users are used to it, and its adoption will minimize user code
>>> changes when migrating from other JDBC sources. We simply accumulate
>>> updates locally and then execute them all at once with only a single
>>> network hop to the servers. *IgniteDataStreamer* can be used underneath.
>>>
>>> 2) Or we can have a separate connection flag that will route all
>>> INSERT/UPDATE/DELETE statements through the streamer.
>>>
>>> I prefer the first approach.
>>>
>>> Also, we need to keep in mind that the data streamer has poor
>>> performance when adding single key-value pairs, due to the high
>>> overhead of concurrency and other bookkeeping. Instead, it is better to
>>> pre-batch key-value pairs before handing them to the streamer.
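The pre-batching idea can be sketched with a small buffer that only hands data to a sink (for example, IgniteDataStreamer::addData, which accepts a Map) once the buffer is full. The class and its names are hypothetical, purely to illustrate the pattern:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical pre-batching buffer: collects key-value pairs locally and
// passes them to the sink only in chunks of batchSize, so the streamer is
// not hit with one-at-a-time additions.
class PreBatcher<K, V> {
    private final int batchSize;
    private final Consumer<Map<K, V>> sink;
    private Map<K, V> buf = new HashMap<>();

    PreBatcher(int batchSize, Consumer<Map<K, V>> sink) {
        this.batchSize = batchSize;
        this.sink = sink;
    }

    void add(K key, V val) {
        buf.put(key, val);
        if (buf.size() >= batchSize)
            flush();
    }

    // Push whatever is buffered; call once more before closing.
    void flush() {
        if (!buf.isEmpty()) {
            sink.accept(buf);
            buf = new HashMap<>();
        }
    }
}
```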
>>>
>>> Vladimir.
>>>
>>> [1]
>>> https://docs.oracle.com/javase/8/docs/api/java/sql/PreparedStatement.html#addBatch--
>>>
>>> On Thu, Dec 8, 2016 at 1:21 PM, Alexander Paschenko <
>>> alexander.a.pasche...@gmail.com> wrote:
>>>
>>> > Hello Igniters,
>>> >
>>> > One of the major improvements to DML has to be support for batch
>>> > statements, and I'd like to discuss its implementation. The suggested
>>> > approach is to rewrite the given query, turning several INSERTs into
>>> > a single statement and processing the arguments accordingly. I
>>> > suggest this because the whole point of batching is to interact with
>>> > the cluster as little as possible and to make operations as condensed
>>> > as possible, which in Ignite's case means sending as few
>>> > JdbcQueryTasks as possible. And, since a query task holds a single
>>> > query and its arguments, this approach will not require any changes
>>> > to the current design and won't break backward compatibility - all
>>> > the dirty work of rewriting will be done by the JDBC driver.
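The rewriting idea sketched above can be illustrated as follows: N identical single-row INSERTs collapse into one multi-row INSERT, so the driver sends one JdbcQueryTask instead of N. The class name and method are hypothetical, not the actual driver internals:

```java
import java.util.StringJoiner;

class BatchRewriter {
    // Collapse `rows` single-row INSERTs (each with paramsPerRow
    // placeholders) into one multi-row INSERT statement.
    static String rewrite(String tableAndCols, int paramsPerRow, int rows) {
        StringJoiner allRows = new StringJoiner(", ");
        for (int r = 0; r < rows; r++) {
            StringJoiner placeholders = new StringJoiner(", ", "(", ")");
            for (int p = 0; p < paramsPerRow; p++)
                placeholders.add("?");
            allRows.add(placeholders.toString());
        }
        return "INSERT INTO " + tableAndCols + " VALUES " + allRows;
    }
}
```

The batched argument sets would then be flattened into a single argument array matching the placeholder order.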
>>> > Without rewriting, we could introduce a new query task for batch
>>> > operations, but that would make it impossible to send such requests
>>> > from newer clients to older servers (say, servers of version 1.8.0,
>>> > which do not know about batching, let alone older versions).
>>> > I'd like to hear comments and suggestions from the community. Thanks!
>>> >
>>> > - Alex
>>> >
>>>
