On 21.9.2016 12:00, Stanislav Muhametsin wrote:
On 21.9.2016 0:08, Jiri Jetmar wrote:
Independently of that, things start to get complicated in the SQL
world with large data when you are submitting e.g. an inner JOIN
statement in a transactional INSERT expression, where the tables (the
data) are located on different nodes, simply because of a single-node
limit.
Well... there are ways to circumvent this, too. :)
Unfortunately, I can't tell much more about that, as it is a key
concept of the company where I work right now.
But very large and globally distributed RDBMS nodes, which don't lag
under high stress, are very much possible!
I would like to work on this task. Therefore, smart ideas are highly
welcome! :-)
I am responsible for the current SQL indexing code, and Paul is
responsible for the current SQL entitystore code.
The indexing code is, unfortunately, quite spaghettified, and I would
probably do lots of things very differently now than I did them
several years ago.
I think one strategy would be to re-write the whole thing from
scratch, while keeping the main principles:
1. Each property and association should be stored in a separate table.
This is required because of the very dynamic nature of Zest queries.
This implies that the whole DB schema will be owned by this
indexing service - it is not compatible with some external tool
generating the tables (unless those tables are, of course, in exactly
this required format).
Example: if you have interface MyEntity { Property<String>
myProp(); Association<SomeOtherEntity> myAsso(); },
that would end up with 3 tables (see the SQL sketch after this list):
    1.1. The table for the implicit 'identity' property. The current
indexing code has a separate DB-specific entity primary key of type
'long' (64-bit integer), since that makes foreign key references much
faster and more compact. Not sure if that is a good idea anymore.
    1.2. The table for the 'myProp' property with 2 columns: a foreign
key reference to the identity table and the actual column of type 'text'
containing the contents of the 'myProp' property.
    1.3. The table for the 'myAsso' association with 2 columns: a foreign
key reference to the identity table of 'MyEntity' and a foreign key
reference to the identity table of 'SomeOtherEntity'. Obviously this
covers 'ManyAssociation's as well, if they are still present in Zest.
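To make that concrete, here is a minimal DDL sketch of the tables that
MyEntity could map to, assuming PostgreSQL; every table and column name
here is a hypothetical illustration, not what the current indexing code
actually generates:

    -- Implicit 'identity' table: one row per MyEntity instance.
    CREATE TABLE my_entity_identity (
        entity_pk        BIGINT PRIMARY KEY,   -- DB-specific surrogate key ('long')
        entity_identity  TEXT NOT NULL UNIQUE  -- the Zest identity string
    );

    -- Identity table for SomeOtherEntity, same shape.
    CREATE TABLE some_other_entity_identity (
        entity_pk        BIGINT PRIMARY KEY,
        entity_identity  TEXT NOT NULL UNIQUE
    );

    -- One table per property: 'myProp'.
    CREATE TABLE my_entity_myprop (
        entity_pk  BIGINT NOT NULL REFERENCES my_entity_identity (entity_pk),
        value      TEXT                        -- contents of myProp
    );

    -- One table per association: 'myAsso', pointing at the other entity's
    -- identity table. A ManyAssociation would just add an ordering column.
    CREATE TABLE my_entity_myasso (
        entity_pk  BIGINT NOT NULL REFERENCES my_entity_identity (entity_pk),
        target_pk  BIGINT NOT NULL REFERENCES some_other_entity_identity (entity_pk)
    );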
I forgot to mention one thing: the collectionized properties were a cause
of major headaches in SQL indexing.
IIRC I made separate tables and/or columns to model collections
properly, in order to support queries which had conditions like
"property X is a list containing entity Y with property Z at value Ö".
However, with the JSON datatype, it might be a better (?) idea to store
collectionized properties as a single column of JSON datatype, and just
JSON-ize the collections.
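A hedged sketch of that alternative, assuming PostgreSQL and its jsonb
type (table and column names again hypothetical): the whole collection
lives in one column, and simple containment conditions can use the jsonb
containment operator:

    -- Collection-valued property stored as a single JSON column.
    CREATE TABLE my_entity_mylistprop (
        entity_pk  BIGINT NOT NULL REFERENCES my_entity_identity (entity_pk),
        value      JSONB   -- whole collection serialized as JSON
    );

    -- "property X is a list containing the value 'Ö'"
    SELECT entity_pk
    FROM my_entity_mylistprop
    WHERE value @> '["Ö"]'::jsonb;

Nested conditions like "entity Y with property Z at value Ö" would still
need either a join back to Y's property tables or JSON path operators, so
this is a trade-off rather than a free win.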
2. The SQL query will then be created by translating Zest query
objects into SQL query objects. You could also just build SQL strings,
but that would be quite a nasty and most likely very unmaintainable task
- the current code uses java-sql-generator, which is my library on GitHub,
to create SQL statements through a DSL rather than by concatenating strings.
IIRC Zest AND-conditions are SQL INTERSECTs of single
property/asso queries, Zest OR-conditions are SQL UNIONs of single
property/asso queries, and Zest NOT-conditions are SQL negations (set
differences) of single property/asso queries.
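For example, a Zest condition along the lines of "myProp equals 'foo'
AND myAsso points at the entity with identity 'xyz'" could come out as
set operations over the per-property/per-association tables sketched
earlier (again, purely illustrative names):

    SELECT entity_pk
    FROM my_entity_myprop
    WHERE value = 'foo'
    INTERSECT
    SELECT mea.entity_pk
    FROM my_entity_myasso mea
    JOIN some_other_entity_identity soe ON soe.entity_pk = mea.target_pk
    WHERE soe.entity_identity = 'xyz';

    -- An OR-condition would use UNION instead, and a NOT-condition a set
    -- difference (EXCEPT) against the full identity table.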
3. Keeping the DB schema in sync with the Zest application schema.
    This is a very tough one - if someone adds/removes
properties/associations, the database schema should be kept up-to-date.
I remember having problems with this one - it also implies that
you have to walk through the *whole* application structure in order to
generate tables for all the properties/assos of all the entities
visible to this indexing service.
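One way to soften that, assuming PostgreSQL, is to make the generated
DDL idempotent, so the walk over the application structure can simply
re-emit it on every startup; a minimal sketch (hypothetical names):

    -- Re-runnable DDL for a newly discovered property.
    CREATE TABLE IF NOT EXISTS my_entity_newprop (
        entity_pk  BIGINT NOT NULL REFERENCES my_entity_identity (entity_pk),
        value      TEXT
    );

    -- Removed properties are the hard part: their tables either have to be
    -- dropped explicitly or left behind as orphans.
    -- DROP TABLE IF EXISTS my_entity_removedprop;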