Re: IGNITE-2294 implementation details

Alexander Paschenko Tue, 26 Jul 2016 08:24:48 -0700

I see your point. But what about my concerns from initial post?
Particularly about signatures of existing methods? I personally don't
like an option of query() method always returning an empty iterator
for any non-select query, it seems ugly design wise.


- Alex

2016-07-26 18:15 GMT+03:00 Sergi Vladykin <sergi.vlady...@gmail.com>:
> BTW, the simplest way to solve this issue is to allow running SQL commands
> inside of SqlFieldsQuery.
>
> We may add some additional convenience API for updates if we want, but JDBC
> client will always call it like this:
>
> cache.query(new SqlFieldsQuery("INSERT INTO MY_TABLE
> VALUES(?,?)").setArgs(1,2));
>
> This will resolve any ambiguity.
>
> Sergi
>
> 2016-07-26 17:56 GMT+03:00 Sergi Vladykin <sergi.vlady...@gmail.com>:
>
>> I don't like any pre-parsing, especially with some libraries other than
>> H2. H2 itself has enough quirks to multiply it on quirks of another library.
>>
>> This is exactly what I was talking about - we need some single entry point
>> on API for all the SQL commands and queries. Thats why I suggested
>> SqlUpdate to extend Query. To me its is the cleanest approach. May be we
>> need to change in some backward compatible way this Query hierarchy to get
>> rid of extra methods but the idea is still the same.
>>
>> Sergi
>>
>> 2016-07-26 14:34 GMT+03:00 Alexander Paschenko <
>> alexander.a.pasche...@gmail.com>:
>>
>>> Guys,
>>>
>>> I would like to advance the discussion further. There's one quite
>>> important question that arose based on current state of work on this
>>> issue. If we use some kind of interactive console, like Visor, then
>>> how should it know whether SQL query it is requested to execute
>>> returns a result set or not? In JDBC world, solution is quite simple -
>>> there's base interface called Statement that all commands implement,
>>> and it has magic isResultSet method that tells whether statement is a
>>> query or an update command. The API proposed now has separate Query
>>> and Update operations which I believe to be a right thing by the
>>> reasons I outlined in the beginning of this thread. However, their
>>> lack of common ancestor prevents possible console clients from running
>>> text SQL commands in a fully transparent manner - like
>>> IgniteCache.execute(String sql). Therefore I see two possible ways of
>>> solving this:
>>>
>>> - we change API so that it includes new class or interface parenting
>>> both Query and Update, and clients use it to communicate with cache
>>> - we let (or make :) ) the client determine command type independently
>>> and behave accordingly - for it to work it will have some kind of
>>> command parsing by itself just to determine its type. Visor console
>>> may use simple library like JSqlParser
>>> (https://github.com/JSQLParser/JSqlParser; dual LGPL 2.1/ASF 2.0
>>> licensed) to determine request type in terms of JDBC, and behave
>>> accordingly.
>>>
>>> Personally, I think that the second approach is better - and here's why.
>>>
>>> First, it does not seem wise to change API simply to make console (or
>>> any other) clients simpler. Programmatic APIs should be concise and
>>> short for programmatic use, console clients should be easy to use from
>>> console - and that's it: after all, console client exists to free a
>>> user from burden of doing things programmatically, so its aim is to
>>> adapt API to console or whatever UI.
>>> Second, possible complications in client implied by such approach
>>> certainly won't be dramatic - I don't think that additional single
>>> query parsing operation in client code will make it much harder to
>>> develop.
>>> Third, as I see it now, adding a new "synthetic" entity and new method
>>> would take more effort to adapting the client to new API.
>>>
>>> Dmitry, Sergi, I would like to hear what you think about it all. Thanks.
>>>
>>> - Alex
>>>
>>> 2016-07-21 21:17 GMT+03:00 Dmitriy Setrakyan <dsetrak...@apache.org>:
>>> > OK, then using your analogy, the current behavior in Ignite is MERGE for
>>> > the most part.
>>> >
>>> > My preference is that Ignite SQL should work no different from
>>> traditional
>>> > databases, which means:
>>> >
>>> > - INSERT is translated into *putIfAbsent()* call in Ignite
>>> > - UPDATE is translated into *replace()* call in Ignite
>>> > - MERGE is translated into *put()* call in Ignite
>>> > - For SQL BATCH calls we should delegate to Ignite batch operations,
>>> e.g.
>>> > *putAll()*
>>> >
>>> > The above should hold true for atomic and transactional put/putAll
>>> calls,
>>> > as well as for the data streamer.
>>> >
>>> > Does this make sense?
>>> >
>>> > D.
>>> >
>>> > On Thu, Jul 21, 2016 at 4:06 AM, Sergi Vladykin <
>>> sergi.vlady...@gmail.com>
>>> > wrote:
>>> >
>>> >> No, this does not make sense.
>>> >>
>>> >> There is no upsert mode in databases. There are operations: INSERT,
>>> UPDATE,
>>> >> DELETE, MERGE.
>>> >>
>>> >> I want to have clear understanding of how they have to behave in SQL
>>> >> databases and how they will actually behave in Ignite in different
>>> >> scenarios. Also I want to have clear understanding of performance
>>> >> implications of each decision here.
>>> >>
>>> >> Anything wrong with that?
>>> >>
>>> >> Sergi
>>> >>
>>> >> On Thu, Jul 21, 2016 at 1:04 PM, Dmitriy Setrakyan <
>>> dsetrak...@apache.org>
>>> >> wrote:
>>> >>
>>> >> > Serj, are you asking what will happen as of today? Then the answer
>>> to all
>>> >> > your questions is that duplicate keys are not an issue, and Ignite
>>> always
>>> >> > operates in **upsert** mode (which is essentially a *“put(…)”
>>> *method).
>>> >> >
>>> >> > However, the *“insert”* that is suggested by Alex would delegate to
>>> >> > *“putIfAbsent(…)”*, which in database world makes more sense.
>>> However, in
>>> >> > this case, the *“update”* syntax should delegate to *“replace(…)”*,
>>> as
>>> >> > update should fail in case if a key is absent.
>>> >> >
>>> >> > Considering the above, a notion of “*upsert”* or “*merge” *operation
>>> is
>>> >> > very much needed, as it will give a user an option to perform
>>> >> > “insert-or-update” in 1 call.
>>> >> >
>>> >> > Does this make sense?
>>> >> >
>>> >> > D.
>>> >> >
>>> >> > On Wed, Jul 20, 2016 at 9:39 PM, Sergi Vladykin <
>>> >> sergi.vlady...@gmail.com>
>>> >> > wrote:
>>> >> >
>>> >> > > I'd prefer to do MERGE operation last because in H2 it is not
>>> standard
>>> >> > ANSI
>>> >> > > SQL MERGE. Or may be not implement it at all, or may be contribute
>>> ANSI
>>> >> > > correct version to H2, then implement it on Ignite. Need to
>>> investigate
>>> >> > the
>>> >> > > semantics deeper before making any decisions here.
>>> >> > >
>>> >> > > Lets start with simple scenarios for INSERT and go through all the
>>> >> > possible
>>> >> > > cases and answer the questions:
>>> >> > > - What will happen on key conflict in TX cache?
>>> >> > > - What will happen on key conflict in Atomic cache?
>>> >> > > - What will happen with the previous two if we use DataLoader?
>>> >> > > - How to make these operations efficient (it will be simple enough
>>> to
>>> >> > > implement them with separate put/putIfAbsent operations but
>>> probably we
>>> >> > > will need some batching like putAllIfAbsent for efficiency)?
>>> >> > >
>>> >> > > As for API, we still will need to have a single entry point for
>>> all SQL
>>> >> > > queries/commands to allow any console work with it transparently.
>>> It
>>> >> > would
>>> >> > > be great if we will be able to come up with something consistent
>>> with
>>> >> > this
>>> >> > > idea on public API.
>>> >> > >
>>> >> > > Sergi
>>> >> > >
>>> >> > >
>>> >> > >
>>> >> > >
>>> >> > >
>>> >> > >
>>> >> > >
>>> >> > >
>>> >> > > On Wed, Jul 20, 2016 at 2:23 PM, Dmitriy Setrakyan <
>>> >> > > dsetrak...@gridgain.com>
>>> >> > > wrote:
>>> >> > >
>>> >> > > > Like the idea of merge and insert. I need more time to think
>>> about
>>> >> the
>>> >> > > API
>>> >> > > > changes.
>>> >> > > >
>>> >> > > > Sergi, what do you think?
>>> >> > > >
>>> >> > > > Dmitriy
>>> >> > > >
>>> >> > > >
>>> >> > > >
>>> >> > > > On Jul 20, 2016, at 12:36 PM, Alexander Paschenko <
>>> >> > > > alexander.a.pasche...@gmail.com> wrote:
>>> >> > > >
>>> >> > > > >> Thus, I suggest that we implement MERGE as a separate
>>> operation
>>> >> > backed
>>> >> > > > by putIfAbsent operation, while INSERT will be implemented via
>>> put.
>>> >> > > > >
>>> >> > > > > Sorry, of course I meant that MERGE has to be put-based, while
>>> >> INSERT
>>> >> > > > > has to be putIfAbsent-based.
>>> >> > > > >
>>> >> > > > > 2016-07-20 12:30 GMT+03:00 Alexander Paschenko
>>> >> > > > > <alexander.a.pasche...@gmail.com>:
>>> >> > > > >> Hell Igniters,
>>> >> > > > >>
>>> >> > > > >> In this thread I would like to share and discuss some
>>> thoughts on
>>> >> > DML
>>> >> > > > >> operations' implementation, so let's start and keep it here.
>>> >> > Everyone
>>> >> > > > >> is of course welcome to share their suggestions.
>>> >> > > > >>
>>> >> > > > >> For starters, I was thinking about semantics of INSERT. In
>>> >> > traditional
>>> >> > > > >> RDBMSs, INSERT works only for records whose primary keys don't
>>> >> > > > >> conflict with those of records that are already persistent -
>>> you
>>> >> > can't
>>> >> > > > >> try to insert the same key more than once because you'll get
>>> an
>>> >> > error.
>>> >> > > > >> However, semantics of cache put is obviously different - it
>>> does
>>> >> not
>>> >> > > > >> have anything about duplicate keys, it just quietly updates
>>> values
>>> >> > in
>>> >> > > > >> case of keys' duplication. Still, cache has putIfAbsent
>>> operation
>>> >> > that
>>> >> > > > >> is closer to traditional notion of INSERT, and H2's SQL
>>> dialect
>>> >> has
>>> >> > > > >> MERGE operation which corresponds to semantics of cache put.
>>> >> Thus, I
>>> >> > > > >> suggest that we implement MERGE as a separate operation
>>> backed by
>>> >> > > > >> putIfAbsent operation, while INSERT will be implemented via
>>> put.
>>> >> > > > >>
>>> >> > > > >> And one more, probably more important thing: I suggest that we
>>> >> > create
>>> >> > > > >> separate class Update and corresponding operation update() in
>>> >> > > > >> IgniteCache. The reasons are as follows:
>>> >> > > > >>
>>> >> > > > >> - Query bears some flags that are clearly redundant for Update
>>> >> (page
>>> >> > > > >> size, locality)
>>> >> > > > >> - query() method in IgniteCache (one that accepts Query) and
>>> >> query()
>>> >> > > > >> methods in GridQueryIndexing return iterators. So, if we
>>> strive to
>>> >> > > > >> leave interfaces unchanged, we still will introduce some
>>> design
>>> >> > > > >> ugliness like query methods returning empty iterators for
>>> certain
>>> >> > > > >> queries, and/or query flags that indicate whether it's an
>>> update
>>> >> > query
>>> >> > > > >> or not, etc.
>>> >> > > > >> - If some Queries are update queries, then continuous queries
>>> >> can't
>>> >> > be
>>> >> > > > >> based on them - more design-wise ugly checks and stuff like
>>> that.
>>> >> > > > >> - I'm pretty sure there's more I don't know about.
>>> >> > > > >>
>>> >> > > > >> Comments and suggestions are welcome. Sergi Vladykin, Dmitry
>>> >> > > > >> Setrakyan, your opinions are of particular interest, please
>>> >> advise.
>>> >> > > > >>
>>> >> > > > >> Regards,
>>> >> > > > >> Alex
>>> >> > > >
>>> >> > >
>>> >> >
>>> >>
>>>
>>
>>

Re: IGNITE-2294 implementation details

Reply via email to