Re: Materialization performance

Christian Beikov Thu, 31 Aug 2017 00:29:58 -0700

My CAS scheme was merely meant for Materialization registration. The retry is an implementation detail and would happen inside the registration method; the user wouldn't notice it. APIs stay the way they are; I'd only change the way the MaterializationActor is accessed.

The (root) schema is one part that I'd like to see being shared, but I guess the type factory as well as the CalciteServer should be shared between connections too. Is there anything else you think can or should be shared?

I could implement the discussed sharing as a javax.sql.DataSource if you want, so we can discuss specifics. Along the way I'd try to make some general performance improvements to the concurrency synchronization mechanisms. Would that be ok?



Kind regards,
------------------------------------------------------------------------
*Christian Beikov*
On 31.08.2017 at 00:02, Julian Hyde wrote:

Schema is the context you are referring to. Schema has a longer
lifespan than Connection, and if you make an immutable one (which we
recommend) you can share it among connections.

Your CAS scheme would work but requires each user to create a copy of
all of the materialization state. This is potentially large (thousands
of tables) and rapidly changing. Also, your scheme requires the user
to re-try. I think the actor model is better suited for this.

On Wed, Aug 30, 2017 at 2:14 PM, Christian Beikov
<[email protected]> wrote:

Of course steps 2 and 3 depend on what you read, but if a change happens in
the meantime, your CAS will fail in step 3, since all changes are done
through such a CAS, so you have to "redo" the transaction or parts of it.
That's basically optimistic locking :)

The important part is that the whole holder is replaced, so you can
guarantee safety with a single CAS. Imagine the actor field in
MaterializationService is wrapped by an AtomicReference and all maps in the
MaterializationActor are immutable. The only way to change a thing is to
read the actor, create a copy of it with the new state and do a CAS through
the atomic reference. That would already solve all thread safety issues that
the current design has.
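
The copy-and-CAS registration described above could look roughly like the following minimal sketch. The class names MaterializationState and CasMaterializationRegistry are hypothetical and are not part of Calcite; a single String-to-String map stands in for the several maps held by MaterializationActor.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical immutable holder; stands in for the maps inside MaterializationActor. */
final class MaterializationState {
  final Map<String, String> sqlByName; // unmodifiable view over a private copy

  MaterializationState(Map<String, String> sqlByName) {
    this.sqlByName = Collections.unmodifiableMap(new HashMap<>(sqlByName));
  }

  /** Returns a new state with one more entry; this instance is left untouched. */
  MaterializationState with(String name, String sql) {
    Map<String, String> copy = new HashMap<>(sqlByName);
    copy.put(name, sql);
    return new MaterializationState(copy);
  }
}

/** Hypothetical registry; an assumption for illustration, not the MaterializationService API. */
final class CasMaterializationRegistry {
  private final AtomicReference<MaterializationState> ref =
      new AtomicReference<>(new MaterializationState(new HashMap<>()));

  /** Readers get a consistent snapshot without taking a lock. */
  MaterializationState snapshot() {
    return ref.get();
  }

  /** Read, copy with the new entry, CAS; retry if another writer got in first. */
  void register(String name, String sql) {
    while (true) {
      MaterializationState current = ref.get();
      MaterializationState next = current.with(name, sql);
      if (ref.compareAndSet(current, next)) {
        return;
      }
    }
  }
}
```

The retry loop lives inside register, so callers never see a failed CAS. The cost is one full copy of the holder per registration, which is the trade-off raised elsewhere in this thread.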

Could you maybe comment on the context sharing between connections part too?


Kind regards,
------------------------------------------------------------------------
*Christian Beikov*
On 30.08.2017 at 21:31, Julian Hyde wrote:

Consider a “transaction” that involves reads and writes:

    1. Read from a data structure
    2. Do some stuff
    3. Write to the data structure

If steps 2 and 3 depend on what you read in step 1, then you need to
prevent anyone from writing until you have written. A simple CAS won’t solve
this. The simplest solution is for the whole transaction to be in a critical
section. It doesn’t really matter whether that is implemented using an actor
or synchronized blocks.
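
As a minimal sketch of the "whole transaction in a critical section" option, assuming a hypothetical SynchronizedRegistry class that is not part of Calcite, the three steps can share one monitor:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical registry guarded by a single monitor (illustration only). */
final class SynchronizedRegistry {
  private final Map<String, Integer> versions = new HashMap<>();

  /** Read, compute and write happen while holding the same lock. */
  synchronized int bumpVersion(String name) {
    int current = versions.getOrDefault(name, 0); // step 1: read
    int next = current + 1;                       // step 2: do some stuff
    versions.put(name, next);                     // step 3: write
    return next;
  }
}
```

No other thread can write between step 1 and step 3, because every access goes through the same synchronized method.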

We are mostly in agreement - especially about using immutable data
structures for anything shared between threads.

Julian

On Aug 29, 2017, at 2:01 PM, Christian Beikov
<[email protected]> wrote:

Imagine the holder of the various hash maps is immutable, let's call it
"actor". When a new registration is done, we create a copy of that holder
and CAS it. When we query, we simply get the current value and access its
maps. So MaterializationService could have an AtomicReference to a holder
"actor" just like right now, but we make the maps immutable and create
copies whenever a change occurs. We could hide such details behind a message
passing interface so that remote models can be implemented too, but that
seems like a next step.

The materialization concurrency issue isn't the only problem. What about
the general usage in multithreaded environments? The whole schema is
currently bound to a CalciteConnection. It would be nice if all the context
could be shared between multiple connections so that we avoid having to
initialize every connection. Do you have any plans to tackle that or am I
not seeing how to achieve this?


Kind regards,
------------------------------------------------------------------------
*Christian Beikov*
On 29.08.2017 at 19:40, Julian Hyde wrote:

I'd rather have immutable state being CASed (compare-and-swap) to make
the querying cheap and do updates in an optimistic concurrency control
manner.

Compare and swap only works for one memory address. You can't use it
to, say, debit one bank account and credit another.

The set of valid materializations is just about the only mutable state
in Calcite and I think it will need to be several interconnected data
structures. So, compare-and-swap (or its high-level equivalent,
ConcurrentHashMap) won't cut it.

So we could use locks/monitors (the "synchronized" keyword) or we
could use an actor. The key difference between the two is who does the
work. With a monitor, each customer grabs the key (there is only one
key), walks into the bank vault, and moves the money from one deposit
box to another. With an actor, there is a bank employee in the vault
who is the only person allowed to move money around.

The work done is the same in both models. There are performance
advantages of the actor model (the data structures will tend to exist
in one core's cache) and there are code simplicity advantages (the
critical code is all in one class or package).

The overhead of two puts/gets on an ArrayBlockingQueue per request is
negligible. And besides, you can switch to a non-actor implementation
of the service if Calcite is single-threaded.
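
To make the bank-vault analogy concrete, here is a minimal sketch of an actor built on ArrayBlockingQueue. VaultActor and its methods are hypothetical names chosen for illustration and do not reflect how MaterializationActor is implemented. Each request costs one put/take on the request queue and one put/take on a one-slot reply queue.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;

/** Hypothetical actor: only its worker thread ever touches the balances map. */
final class VaultActor {
  /** A unit of work plus a one-slot queue for its answer. */
  private static final class Request {
    final Function<Map<String, Integer>, Integer> work;
    final BlockingQueue<Integer> reply = new ArrayBlockingQueue<>(1);

    Request(Function<Map<String, Integer>, Integer> work) {
      this.work = work;
    }
  }

  private final BlockingQueue<Request> requests = new ArrayBlockingQueue<>(16);
  private final Map<String, Integer> balances = new HashMap<>();

  VaultActor(Map<String, Integer> initial) {
    balances.putAll(initial);
    Thread worker = new Thread(this::run, "vault-actor");
    worker.setDaemon(true); // do not keep the JVM alive for this sketch
    worker.start();
  }

  /** The "bank employee": processes one request at a time. */
  private void run() {
    try {
      while (true) {
        Request r = requests.take();
        // Sketch only: a throwing task would stop the worker.
        r.reply.put(r.work.apply(balances));
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  /** Sends work to the actor thread and blocks until it answers. */
  private int ask(Function<Map<String, Integer>, Integer> work) {
    Request r = new Request(work);
    try {
      requests.put(r);
      return r.reply.take();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }

  /** Debits one account and credits another as a single step. */
  boolean transfer(String from, String to, int amount) {
    return ask(b -> {
      int available = b.getOrDefault(from, 0);
      if (available < amount) {
        return 0;
      }
      b.put(from, available - amount);
      b.merge(to, amount, Integer::sum);
      return 1;
    }) == 1;
  }

  int balance(String account) {
    return ask(b -> b.getOrDefault(account, 0));
  }
}
```

Because every read and write runs on the worker thread, a multi-map update such as transfer needs no locks, and the plain HashMap is never shared between threads.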

I haven't thought out the details of multi-tenant. It is not true to
say that this is "not a primary requirement for
the Calcite project." Look at the "data grid (cache)" on the diagram
in my "Optiq" talk [1] from 2013. Dynamic materialized views were in
from the very start. There can be multiple instances of the actor
(each with their own request/response queues), so you could have one
per tenant. Also, it is very straightforward to make the actors
remote, replacing the queues with RPC over a message broker. Remote
actors are called services.

Julian

[1]
https://www.slideshare.net/julianhyde/optiq-a-dynamic-data-management-framework

On Tue, Aug 29, 2017 at 8:25 AM, Jesus Camacho Rodriguez
<[email protected]> wrote:

LGTM. I think by the time we have support for the outer joins, I might have
had time to finish the filter tree index implementation too.

-Jesús



On 8/29/17, 3:11 AM, "Christian Beikov" <[email protected]>
wrote:

I'd like to stick to trying to figure out how to support outer joins for
now, and when I have an implementation for that, I'd look into the filter
tree index if you haven't done it by then.


Kind regards,

------------------------------------------------------------------------
*Christian Beikov*
On 28.08.2017 at 20:01, Jesus Camacho Rodriguez wrote:

Christian,

The implementation of the filter tree index is indeed what I was referring
to. In the initial implementation I focused on the rewriting coverage, but
now that the first part is finished, it is at the top of my list, as I think
it is critical to make the whole query rewriting algorithm work at scale.
However, I have not started yet.

The filter tree index will help to filter not only based on the tables used
by a given query, but also for queries that do not meet the equivalence
classes conditions, filter conditions, etc. We could implement all the
preconditions mentioned in the paper, and we could add our own additional
ones. I also think that in a second version we may need to add some kind of
ranking/limit, as many views might meet the preconditions for a given query.

It seems you understood how it should work, so if you could help to
kick-start that work by implementing a first version of the filter tree
index with a couple of basic conditions (table matching and EC matching?),
that would be great. I could review any of the contributions you make.

-Jesús





On 8/28/17, 3:22 AM, "Christian Beikov" <[email protected]>
wrote:

If the metadata was cached, that would be awesome, especially because that
would also improve the performance of metadata retrieval for the query
currently being planned, although I am not sure how the caching would work
since the RelNodes are mutable.

Have you considered implementing the filter tree index explained in the
paper? As far as I understood, the whole thing only works when a redundant
table elimination is implemented. Is that the case? If so, or if it can be
done easily, I'd propose we initialize all the lookup structures during
registration and use them during planning. This will improve planning time
drastically and essentially handle the scalability problem you mention.

What other MV-related issues are on your personal to-do list, Jesus? I have
now read the paper and think I can help you in one place or another if you
want.


Kind regards,

------------------------------------------------------------------------
*Christian Beikov*
On 28.08.2017 at 08:13, Jesus Camacho Rodriguez wrote:

Hive does not use the Calcite SQL parser, thus we follow a different path
and did not experience the problem on the Calcite end. However, FWIW we
avoided reparsing the SQL every time a query was being planned by
creating/managing our own cache too.

The metadata providers implement some caching, thus I would expect that once
you avoid reparsing every MV, the retrieval time of predicates, lineage, etc.
would improve (at least after using the MV for the first time). However, I
agree that the information should be inferred when the MV is loaded. In
fact, maybe just making some calls to the metadata providers while the MVs
are being loaded would do the trick (Julian should confirm this).

Btw, you will probably find another scalability issue as the number of MVs
grows large with the current implementation of the rewriting, since the
pre-filtering implementation in place does not discard many of the views that
are not valid to rewrite a given query, and rewriting is attempted with all
of them. This last bit is work that I would like to tackle shortly, but I
have not created the corresponding JIRA yet.

-Jesús




On 8/27/17, 10:43 PM, "Rajat Venkatesh" <[email protected]>
wrote:

Thread safety and repeated parsing are a problem. We have experience with
managing tens of materialized views. Repeated parsing takes more time than
execution of the query itself. We also have a similar problem where
concurrent queries (potentially with a different set of materialized views)
may be planned at the same time. We solved it by maintaining a cache and
carefully setting the cache in a thread local.
Relevant code for inspiration:

https://github.com/qubole/quark/blob/master/optimizer/src/main/java/org/apache/calcite/prepare/Materializer.java

https://github.com/qubole/quark/blob/master/optimizer/src/main/java/org/apache/calcite/plan/QuarkMaterializeCluster.java



On Sun, Aug 27, 2017 at 6:50 PM Christian Beikov
<[email protected]>
wrote:

Hey, I have been looking a bit into how materialized views perform during
planning because of a very long test run
(MaterializationTest#testJoinMaterializationUKFK6), and the current state is
problematic.

CalcitePrepareImpl#getMaterializations always reparses the SQL, and down the
line there is a lot of expensive work (e.g. predicate and lineage
determination) done during planning that could easily be pre-calculated and
cached during materialization creation.
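
A minimal sketch of the "parse once, then reuse" idea, assuming a hypothetical ParsedMaterializationCache class and a caller-supplied parser function (neither is part of Calcite). The type parameter P stands in for whatever the parse/prepare step produces, and the sketch assumes that result is safe to share between planner runs.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical cache keyed by the view's SQL text (illustration only). */
final class ParsedMaterializationCache<P> {
  private final Map<String, P> bySql = new ConcurrentHashMap<>();
  private final Function<String, P> parser; // supplied by the caller

  ParsedMaterializationCache(Function<String, P> parser) {
    this.parser = parser;
  }

  /** Parses on first use; later calls with the same SQL return the cached value. */
  P get(String sql) {
    return bySql.computeIfAbsent(sql, parser);
  }
}
```

Calling get at registration time would move the parsing cost out of planning, so later planner runs only pay for a map lookup.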

There is also a bit of a thread safety problem with the current
implementation. Unless there is a different safety mechanism that I don't
see, the sharing of the MaterializationService, and thus also the maps in
MaterializationActor, via a static instance between multiple threads is
problematic.

Since I mentioned thread safety, how is Calcite supposed to be used in a
multi-threaded environment? Currently I use a connection pool that
initializes the schema on new connections, but that is not really nice. I
suppose caches are also bound to the connection? A thread-safe context that
can be shared between connections would be nice to avoid all that repetitive
work.

Are these known issues that you have already thought about how to fix, or
should I log JIRAs for these and fix them to the best of my knowledge? I'd
more or less keep the service shared but would implement it using a
copy-on-write strategy, since I'd expect schema changes to be rare after
startup.

Regarding the repetitive work that partly happens during planning, I'd
suggest doing that during materialization registration instead, as is
already mentioned in CalcitePrepareImpl#populateMaterializations. Would that
be ok?

--

Kind regards,

------------------------------------------------------------------------
*Christian Beikov*
