Re: DDL implementation details

Alexander Paschenko Thu, 12 Jan 2017 11:41:01 -0800

BTW, I have also did some H2 guts inspection, and, as I see it now,
it's not impossible to do custom parsing without (an awful lot) of
ugliness. What we would have to do is basically spoof private
singleton org.h2.engine.Engine#INSTANCE on node start via reflection
with our custom implementation that could invoke custom parsing when
needed.


Not that we need it now - just in case.

- Alex

2017-01-13 2:16 GMT+08:00 Alexander Paschenko <alexander.a.pasche...@gmail.com>:
> Sergi,
>
> OK, great. Still, what's up with CREATE TABLE? After a bit of code
> digging, I currently don't see major obstacles against registering
> query entities (i.e. type descriptors) on the fly - CREATE TABLE will
> essentially boil down to *GridQueryIndexing#registerType* call.
>
> But, as you have justly noted, we have to keep nodes joining the
> cluster up-to-date about what schemas and tables need to be created in
> order for those nodes to participate in distributed queries. And,
> correct me if I'm wrong, but this is relevant even outside of context
> of 2.0 and page memory and persistent stores, amirite?
>
> - Alex
>
> 2017-01-13 1:47 GMT+08:00 Sergi Vladykin <sergi.vlady...@gmail.com>:
>> The xml config was only for example. We can put in this configuration
>> string cache config parameters directly like this:
>>
>> CREATE SCHEMA "MyCacheName" WITH
>> "cacheMode=REPLICATED;atomicityMode=ATOMIC"
>>
>> Sergi
>>
>> 2017-01-12 20:21 GMT+03:00 Alexander Paschenko <
>> alexander.a.pasche...@gmail.com>:
>>
>>> Sergi, Alexey G.,
>>>
>>> I see your point and am rather inclined to agree that we should let
>>> current notion of "single schema - multiple tables" live.
>>>
>>> Still, if we create schema with cache config file, what's the whole
>>> point of SQL then if the user anyway has to write XML? This probably
>>> could be useful to propagate configuration to all cluster nodes tho.
>>>
>>> And if we skip CREATE TABLE now, it means that we leave user facing
>>> the need to write XML configuration, no other options. Is this what we
>>> want?
>>>
>>> Still I must admit that leaving user with his familiar XML stuff looks
>>> attractive - no messing with bunch of unknown new params, just write
>>> your XML and go. Also it's portable and allows to re-use
>>> configurations easily, so undoubtedly is a good approach from some
>>> point.
>>>
>>> - Alex
>>>
>>> 2017-01-12 23:51 GMT+08:00 Sergi Vladykin <sergi.vlady...@gmail.com>:
>>> > Hi,
>>> >
>>> > 1. For now I'm against inventing any custom SQL syntax and implementing
>>> > parsing.
>>> > Currently H2 supports the following syntax:
>>> >
>>> > CREATE TABLE test(...) WITH "myCustomParamString"
>>> >
>>> > This is enough for us to pass the needed parameters.
>>> >
>>> > 2. Agree with AG, we have to separate cache creation from table creation.
>>> > Cache == SQL schema for us. We just have to add the same WITH syntax in
>>> H2
>>> > for schema creation like this:
>>> >
>>> > CREATE SCHEMA "MyCacheName" WITH "cacheConfig=myCache.xml"
>>> >
>>> > 3. If we want to create tables then I suggest to put this functionality
>>> to
>>> > 2.0+PageMemory right away and think where and how we are going to store
>>> all
>>> > the related metadata.This is especially important for persistent
>>> storages.
>>> >
>>> > Sergi
>>> >
>>> >
>>> > 2017-01-12 16:56 GMT+03:00 Vladimir Ozerov <voze...@gridgain.com>:
>>> >
>>> >> I am afraid in this case user will have to define too much schemes -
>>> >> boilerplate.
>>> >> Does it make sense at all to pack multiple tuples into a single cache
>>> from
>>> >> user perspective?
>>> >>
>>> >> On Thu, Jan 12, 2017 at 4:40 PM, Alexey Goncharuk <
>>> >> alexey.goncha...@gmail.com> wrote:
>>> >>
>>> >> > Alexander,
>>> >> >
>>> >> > Will we keep the old option to have multiple tables in one cache? If
>>> so,
>>> >> > how will create table statement know which cache to choose?
>>> >> >
>>> >> > It seems to me that to be consistent with the current DML
>>> implementation
>>> >> we
>>> >> > should have a CREATE SCHEMA statement which will define the cache and
>>> >> cache
>>> >> > configuration, and CREATE TABLE should specify the schema name.
>>> >> >
>>> >> > Otherwise, we should enforce the single type per cache rule at the
>>> >> > configuration level and in runtime.
>>> >> >
>>> >> > As for affinity and primary key - agree with Vladimir.
>>> >> >
>>> >> > --
>>> >> > AG
>>> >> >
>>> >> > 2017-01-12 11:41 GMT+03:00 Sergey Kozlov <skoz...@gridgain.com>:
>>> >> >
>>> >> > > As first stage of DDL we can implement following CREATE TABLE
>>> statement
>>> >> > > support:
>>> >> > >  - CREATE TABLE without cache properties (use default cache
>>> properties
>>> >> or
>>> >> > > cache properties defined in SQL Schema)
>>> >> > >  - CREATE TABLE .. LIKE where we can create a cache based on an
>>> another
>>> >> > > existing cache.
>>> >> > >
>>> >> > > On Thu, Jan 12, 2017 at 5:54 AM, Dmitriy Setrakyan <
>>> >> > dsetrak...@apache.org>
>>> >> > > wrote:
>>> >> > >
>>> >> > > > Agree with Sergey. We should be able to specify cache properties
>>> >> inside
>>> >> > > of
>>> >> > > > SQL statements. Does H2 have any support to process SQL hints?
>>> Can we
>>> >> > > > change it?
>>> >> > > >
>>> >> > > > Having said that, while we finalize the above, I think we should
>>> >> start
>>> >> > > > working on DDL implementation to use the default settings, as
>>> >> specified
>>> >> > > in
>>> >> > > > Alexander's email.
>>> >> > > >
>>> >> > > > Also agree with the stop-the-world on the cache for index
>>> creation.
>>> >> We
>>> >> > > can
>>> >> > > > always improve on it in future.
>>> >> > > >
>>> >> > > > D.
>>> >> > > >
>>> >> > > > On Wed, Jan 11, 2017 at 11:28 AM, Sergey Kozlov <
>>> >> skoz...@gridgain.com>
>>> >> > > > wrote:
>>> >> > > >
>>> >> > > > > Hi
>>> >> > > > >
>>> >> > > > > I suppose we should put any ignite cache properties as
>>> additional
>>> >> > > > > non-standard attributes after CREATE TABLE () clause as it does
>>> >> > > > Postgress,
>>> >> > > > > MySQL and other RDBMS.
>>> >> > > > > Take a look on CREATE TABLE with using TABLESPACE (Postgess) or
>>> for
>>> >> > > > CREATE
>>> >> > > > > TABLE with using PARTITIONS (MySQL).
>>> >> > > > >
>>> >> > > > >
>>> >> > > > >
>>> >> > > > >
>>> >> > > > >
>>> >> > > > > On Wed, Jan 11, 2017 at 10:05 PM, Vladimir Ozerov <
>>> >> > > voze...@gridgain.com>
>>> >> > > > > wrote:
>>> >> > > > >
>>> >> > > > > > I believe custom synthax and parsing is a *must* for us, as
>>> well
>>> >> as
>>> >> > > for
>>> >> > > > > any
>>> >> > > > > > distributed database. At the very least we need to specify
>>> >> affinity
>>> >> > > key
>>> >> > > > > > column somehow. Any cache property can be specified at the
>>> very
>>> >> end
>>> >> > > of
>>> >> > > > > > table definition. Key columns can be determined as the ones
>>> with
>>> >> > > > PRIMARY
>>> >> > > > > > KEY constraint (Alex K. idea) + affinity column(s):
>>> >> > > > > >
>>> >> > > > > > CREATE TABLE employee (
>>> >> > > > > >     id BIGINT PRIMARY KEY,
>>> >> > > > > >     dept_id BIGINT AFFINITY KEY,
>>> >> > > > > >     name VARCHAR(128),
>>> >> > > > > >     address VARCHAR(256)
>>> >> > > > > >     BACKUPS 2,
>>> >> > > > > >     ATOMICITY_MODE ATOMIC,
>>> >> > > > > > );
>>> >> > > > > >
>>> >> > > > > > "id" and "dept_id" form key type, "name" and "address" form
>>> value
>>> >> > > type.
>>> >> > > > > >
>>> >> > > > > > Vladimir.
>>> >> > > > > >
>>> >> > > > > > On Wed, Jan 11, 2017 at 9:08 PM, Alexey Kuznetsov <
>>> >> > > > akuznet...@apache.org
>>> >> > > > > >
>>> >> > > > > > wrote:
>>> >> > > > > >
>>> >> > > > > > > Hi, Alex!
>>> >> > > > > > >
>>> >> > > > > > > As far as I know most RDBMS allow something like: create
>>> table
>>> >> t1
>>> >> > > (id
>>> >> > > > > > > integer primary key, ....)
>>> >> > > > > > > How about to take as key field that marked as "primary key"?
>>> >> > > > > > >
>>> >> > > > > > > As of atomicity and replication - I think it is a cache
>>> >> > properties
>>> >> > > > and
>>> >> > > > > > with
>>> >> > > > > > > create table we will create "types" in cache. No?
>>> >> > > > > > > I thought that cache it is a kind of "schema" in RDBMS.
>>> >> > > > > > >
>>> >> > > > > > > Could you describe what will be created with CREATE TABLE?
>>> >> > > > > > >
>>> >> > > > > > > On Thu, Jan 12, 2017 at 12:54 AM, Alexander Paschenko <
>>> >> > > > > > > alexander.a.pasche...@gmail.com> wrote:
>>> >> > > > > > >
>>> >> > > > > > > > Hello Igniters,
>>> >> > > > > > > >
>>> >> > > > > > > > I would like to start discussion about implementation of
>>> SQL
>>> >> > DDL
>>> >> > > > > > > commands.
>>> >> > > > > > > >
>>> >> > > > > > > > At the first stage, the most important ones seem to be
>>> CREATE
>>> >> > > TABLE
>>> >> > > > > > > > (that will obviously correspond to creation of a cache)
>>> and
>>> >> > > CREATE
>>> >> > > > > > > > INDEX.
>>> >> > > > > > > >
>>> >> > > > > > > > Regarding first one: SQL command for CREATE TABLE does not
>>> >> > > contain
>>> >> > > > > any
>>> >> > > > > > > > hints about cache settings (atomicity, replication,
>>> etc.), so
>>> >> > > these
>>> >> > > > > > > > will probably be defined by some configuration properties
>>> >> (like
>>> >> > > > > > > > ignite.ddl.default_cache_atomiticity, etc).
>>> >> > > > > > > >
>>> >> > > > > > > > Also it does not allow to distinguish between key and
>>> value
>>> >> > > > columns -
>>> >> > > > > > > > currently it is handled by keyFields property of
>>> QueryEntity,
>>> >> > but
>>> >> > > > it
>>> >> > > > > > > > is unclear how to declare key fields via CREATE TABLE.
>>> >> > > > > > > >
>>> >> > > > > > > > So at a first glance it seems like we should either
>>> implement
>>> >> > > some
>>> >> > > > > > > > sort of custom parsing (I believe Sergi will be against
>>> it)
>>> >> or
>>> >> > > > > > > > introduce some kind of name prefix that would tell SQL
>>> engine
>>> >> > > that
>>> >> > > > > > > > certain column is a key field column.
>>> >> > > > > > > >
>>> >> > > > > > > > Of course, this problem disappears is key is of SQL type.
>>> >> > > > > > > >
>>> >> > > > > > > > Regarding CREATE INDEX: probably at first we will have to
>>> >> > > implement
>>> >> > > > > > > > this in "stop-the-world" manner, i.e. all cache will be
>>> >> blocked
>>> >> > > > > during
>>> >> > > > > > > > the index's initial buildup.
>>> >> > > > > > > >
>>> >> > > > > > > > Any thoughts?
>>> >> > > > > > > >
>>> >> > > > > > > > Currently I'm working on parsing of those commands as that
>>> >> will
>>> >> > > be
>>> >> > > > > > > > needed anyway and does not affect further implementation.
>>> >> > > > > > > >
>>> >> > > > > > > > - Alex
>>> >> > > > > > > >
>>> >> > > > > > >
>>> >> > > > > > >
>>> >> > > > > > >
>>> >> > > > > > > --
>>> >> > > > > > > Alexey Kuznetsov
>>> >> > > > > > >
>>> >> > > > > >
>>> >> > > > >
>>> >> > > > >
>>> >> > > > >
>>> >> > > > > --
>>> >> > > > > Sergey Kozlov
>>> >> > > > > GridGain Systems
>>> >> > > > > www.gridgain.com
>>> >> > > > >
>>> >> > > >
>>> >> > >
>>> >> > >
>>> >> > >
>>> >> > > --
>>> >> > > Sergey Kozlov
>>> >> > > GridGain Systems
>>> >> > > www.gridgain.com
>>> >> > >
>>> >> >
>>> >>
>>>

Re: DDL implementation details

Reply via email to