Re: DML data streaming

Alexander Paschenko Fri, 10 Feb 2017 00:49:52 -0800

Dima,
>
> There are several ways to handle it. I would check how other databases
> handle it, maybe we can borrow something. At the very least, we should log
> such errors for now.
>

Logging errors would require some kind of stream receiver, and that
brings the same performance penalty for successful operations as a
receiver that throws. I think we should go with the optional flag for
strict semantics after all.
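
To make the flag idea concrete, here is a minimal sketch of the two
INSERT behaviors. It is not Ignite code: a plain Map stands in for the
cache, and the insert method and its strict parameter are hypothetical
names. It only models the semantics, not the performance cost of a
receiver.

```java
import java.util.HashMap;
import java.util.Map;

public class StreamingInsertSketch {
    /**
     * Hypothetical stand-in for streaming INSERT; a plain Map plays the cache.
     * strict == false: a duplicate key is silently skipped (current behavior).
     * strict == true:  a duplicate key raises an error (the proposed optional flag).
     * Returns true if the row was inserted.
     */
    static <K, V> boolean insert(Map<K, V> cache, K key, V val, boolean strict) {
        V prev = cache.putIfAbsent(key, val);
        if (prev == null)
            return true;
        if (strict)
            throw new IllegalStateException("Duplicate key: " + key);
        return false; // existing value is left untouched
    }

    public static void main(String[] args) {
        Map<Integer, String> cache = new HashMap<>();
        insert(cache, 1, "a", false);
        System.out.println(insert(cache, 1, "b", false)); // duplicate skipped
        try {
            insert(cache, 1, "c", true); // duplicate rejected in strict mode
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```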

> You don't have to use _key. Primary key is usually a field in the class, so
> you can use a normal column name. In any case, we should remove any usage
> of _key before 2.0 is released.
>
> Again, if user does not have to specify _key on INSERT, then it is very
> unclear to me, why user would need to specify _key for UPDATE or DELETE.
> Something smells here. Can you please provide an example?
>

UPDATE and DELETE _in streaming mode_ are carried out _only_ for "fast"
optimized cases - i.e. those where _key (and possibly _val) are
explicitly specified by the user thus allowing us to map UPDATE and
DELETE directly to cache's replace and remove operations without
messing with entry processors and doing map-reduce SELECT by given
criteria.

Say, we have Person { firstName, secondName } with key class Key { id1, id2 }

If I say DELETE from Person WHERE _key = ? and specify arg via JDBC,
there's no need to do any SELECT - we can just call IgniteCache.remove
on that key.

But if I say DELETE from Person WHERE id1 = 5 then there's no way to
avoid MR - we first have to find all matching keys with a SELECT,
since we know only part of each key the user wants to affect.
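
A rough illustration of the two paths, with a plain Map standing in
for the cache (this is a sketch, not the actual implementation; the
class and method names are made up, and the value is simplified to a
String):

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Objects;

public class StreamingDeleteSketch {
    /** Composite key from the example: Key { id1, id2 }. */
    static final class Key {
        final int id1, id2;
        Key(int id1, int id2) { this.id1 = id1; this.id2 = id2; }
        @Override public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).id1 == id1 && ((Key) o).id2 == id2;
        }
        @Override public int hashCode() { return Objects.hash(id1, id2); }
    }

    /** Fast path: DELETE ... WHERE _key = ? is a single remove by key. */
    static boolean deleteByKey(Map<Key, String> cache, Key key) {
        return cache.remove(key) != null;
    }

    /** Slow path: DELETE ... WHERE id1 = ? has to visit entries to find matching keys. */
    static int deleteById1(Map<Key, String> cache, int id1) {
        int removed = 0;
        for (Iterator<Key> it = cache.keySet().iterator(); it.hasNext(); ) {
            if (it.next().id1 == id1) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        Map<Key, String> cache = new HashMap<>();
        cache.put(new Key(5, 6), "Ann");
        cache.put(new Key(5, 7), "Bob");
        cache.put(new Key(8, 9), "Eve");
        System.out.println(deleteByKey(cache, new Key(8, 9)));
        System.out.println(deleteById1(cache, 5));
    }
}
```

In a distributed cache the scan in the slow path is what turns into
the map-reduce SELECT; the local loop here only stands in for it.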

It works the same way for UPDATE. And I hope it's clear how this
differs from INSERT, which involves no MR by definition (we don't
allow INSERT FROM SELECT in streaming mode).

AGAIN: all of this applies only to streaming mode; non-streaming mode
does those optimizations too, but it also allows complex conditions,
while streaming mode rejects them to keep things fast and avoid MR.

That's the reason why I suggest that we drop UPDATE and DELETE from
DML streaming, as they mean messing with those soon-to-be-hidden columns.

Still we could optimize stuff like DELETE from Person WHERE id1 = 5
AND id2 = 6 - query involves ALL fields of key AND compares only for
equality AND has no complex expressions - we can construct key
unambiguously and still call remove directly.
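
As a sketch of that check (hypothetical names, and assuming the parser
has already reduced the WHERE clause to a column -> constant map of
AND-ed equalities), the key can be built only when the conditions
cover every key field. Rejecting conditions on non-key fields is an
extra assumption of this sketch:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class KeyFromWhereSketch {
    /** Key fields from the example: Key { id1, id2 }. */
    static final List<String> KEY_FIELDS = Arrays.asList("id1", "id2");

    /**
     * eqConds holds the WHERE clause as column -> constant, assuming it is
     * a pure AND of equality comparisons with no complex expressions.
     * Returns the key field values in declaration order, or empty if the
     * conditions do not pin down exactly one key.
     */
    static Optional<List<Object>> tryBuildKey(Map<String, Object> eqConds) {
        // Assumption of this sketch: conditions must cover the key fields and nothing else.
        if (!eqConds.keySet().equals(new HashSet<>(KEY_FIELDS)))
            return Optional.empty();
        List<Object> key = new ArrayList<>();
        for (String field : KEY_FIELDS)
            key.add(eqConds.get(field));
        return Optional.of(key);
    }

    public static void main(String[] args) {
        Map<String, Object> full = new HashMap<>();
        full.put("id1", 5);
        full.put("id2", 6);
        System.out.println(tryBuildKey(full).isPresent());

        Map<String, Object> partial = new HashMap<>();
        partial.put("id1", 5);
        System.out.println(tryBuildKey(partial).isPresent());
    }
}
```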

But to me that does not sound like a good enough reason to keep UPDATE
and DELETE in DML streaming - users would have to write very specific
queries to benefit, while everything else would just be rejected in
that mode. And, as I said before, UPDATE and DELETE probably don't fit
the primary data streamer use cases well - after all, modifying
existing data is not what the data streamer is about.

And regarding hiding columns: it's unclear what things will look like
for caches like <int, int> once we remove _key and _val, since tables
for such caches currently have nothing but those two columns.

- Alex

>> On Feb 8, 2017 at 11:33 PM, "Dmitriy Setrakyan" <
>> dsetrak...@apache.org> wrote:
>>
>> > Alexander,
>> >
>> > Are you suggesting that currently to execute a simple INSERT for 1 row we
>> > invoke a data streamer on Ignite API? How about an update by a primary
>> > key? Why not execute a simple cache put in either case?
>> >
>> > I think we had a separate thread where we agreed that the streamer should
>> > only be turned on if a certain flag on a JDBC connection is set, no?
>> >
>> > D.
>> >
>> > On Wed, Feb 8, 2017 at 7:00 AM, Alexander Paschenko <
>> > alexander.a.pasche...@gmail.com> wrote:
>> >
>> > > Hello Igniters,
>> > >
>> > > I'd like to raise a few questions regarding data streaming via DML
>> > > statements.
>> > >
>> > > Currently, all types of DML statements are supported (INSERT, UPDATE,
>> > > DELETE, MERGE).
>> > >
>> > > UPDATE and DELETE are supported in streaming mode only when their
>> > > WHERE condition is restricted to the _key and/or _val columns, and
>> > > UPDATE works only on the _val column directly.
>> > >
>> > > Seeing some activity in the direction of hiding _key and _val from the
>> > > user as far as possible, these features seem pointless and should not
>> > > be released, what do you think?
>> > >
>> > > Also, INSERT in streaming mode currently does not throw errors on
>> > > duplicate keys and silently ignores such new records (since that is
>> > > faster than introducing a receiver that would throw exceptions) -
>> > > this can be fixed with an additional flag that could _optionally_
>> > > make INSERT slower but semantically more accurate.
>> > >
>> > > And MERGE in streaming mode is currently not fully accurate
>> > > semantically, either - if the key is present, it just replaces the whole
>> > > value with the new one, potentially losing the values of some
>> > > columns/fields - this is analogous to
>> > > https://issues.apache.org/jira/browse/IGNITE-4489, but can hardly be
>> > > fixed, as a fix would probably hit performance and would be
>> > > unreasonably complex to implement.
>> > >
>> > > I suggest that we drop everything except INSERT and introduce an optional
>> > > flag for its fully correct semantics, as described above.
>> > >
>> > > - Alex
>> > >
>> >
>>
