Re: Materialization performance

Julian Hyde Thu, 31 Aug 2017 10:57:10 -0700

No, type factories belong to a connection (or shorter duration if you like). By 
design, the tables in a schema don’t have row types; they have a method you 
call to get a row type for a particular type factory. So type factories are 
“short term” objects, schemas and tables are “long term”.
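
To illustrate the split, here is a minimal Java sketch. SketchTypeFactory, SketchRowType, SketchTable, and EmpTable are hypothetical stand-ins written for this example, not Calcite's own classes. The only point is that the long-lived table stores no row type and builds one on request for whichever short-lived factory it is handed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a row type; it remembers which factory made it.
final class SketchRowType {
    final SketchTypeFactory owner;
    final List<String> fields;

    SketchRowType(SketchTypeFactory owner, List<String> fields) {
        this.owner = owner;
        this.fields = Collections.unmodifiableList(new ArrayList<>(fields));
    }
}

// Hypothetical short-lived factory, e.g. one per connection.
final class SketchTypeFactory {
    SketchRowType createRowType(List<String> fields) {
        return new SketchRowType(this, fields);
    }
}

// Hypothetical long-lived table: no stored row type, only a method to get one.
interface SketchTable {
    SketchRowType getRowType(SketchTypeFactory typeFactory);
}

final class EmpTable implements SketchTable {
    @Override
    public SketchRowType getRowType(SketchTypeFactory typeFactory) {
        return typeFactory.createRowType(
            Arrays.asList("EMPNO INTEGER", "NAME VARCHAR"));
    }
}
```

With this shape, one EmpTable instance can be shared by many connections, and each connection gets row-type objects that belong to its own factory.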


> On Aug 31, 2017, at 12:29 AM, Christian Beikov <[email protected]> 
> wrote:
> 
> My CAS scheme was meant only for materialization registration. The retry is
> an implementation detail that would happen inside the registration method,
> so the user wouldn't notice it. The APIs stay the way they are; I'd only
> change the way the MaterializationActor is accessed.
> 
> The (root) schema is one part that I'd like to see shared, but I guess
> the type factory as well as the CalciteServer should be shared between
> connections too. Is there anything else you think can or should be shared?
> 
> I could implement the discussed sharing as a javax.sql.DataSource if you want 
> so we can discuss specifics. Along the way I'd try to do some general 
> performance improvements regarding concurrency synchronization mechanisms. 
> Would that be ok?
> 
> 
> Kind regards,
> ------------------------------------------------------------------------
> *Christian Beikov*
> On 31.08.2017 at 00:02, Julian Hyde wrote:
>> Schema is the context you are referring to. Schema has a longer
>> lifespan than Connection, and if you make an immutable one (which we
>> recommend) you can share it among connections.
>> 
>> Your CAS scheme would work but requires each user to create a copy of
>> all of the materialization state. This is potentially large (thousands
>> of tables) and rapidly changing. Also, your scheme requires the user
>> to re-try. I think the actor model is better suited for this.
>> 
>> On Wed, Aug 30, 2017 at 2:14 PM, Christian Beikov
>> <[email protected]> wrote:
>>> Of course steps 2 and 3 depend on what you read, but if a change happens in
>>> the meantime your CAS will fail in step 3, since all changes are made
>>> through such a CAS, so you have to "redo" the transaction or parts of it.
>>> That's basically optimistic locking :)
>>> 
>>> The important part is that the whole holder is replaced, so you can
>>> guarantee safety with a single CAS. Imagine the actor field in
>>> MaterializationService is wrapped by an AtomicReference and all maps in the
>>> MaterializationActor are immutable. The only way to change a thing is to
>>> read the actor, create a copy of it with the new state and do a CAS through
>>> the atomic reference. That would already solve all thread safety issues that
>>> the current design has.
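
The single-CAS idea described above could look roughly like the following Java sketch. ActorState, CasRegistry, and the two maps are hypothetical names chosen for illustration; the real MaterializationActor holds different state. Both maps live in one immutable holder, so replacing the holder through one AtomicReference updates them together.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical immutable holder with two interconnected maps.
final class ActorState {
    final Map<String, String> sqlByName;    // materialization name -> SQL
    final Map<String, String> nameByTable;  // backing table -> materialization name

    ActorState(Map<String, String> sqlByName, Map<String, String> nameByTable) {
        this.sqlByName = Collections.unmodifiableMap(new HashMap<>(sqlByName));
        this.nameByTable = Collections.unmodifiableMap(new HashMap<>(nameByTable));
    }

    // Returns a new holder with one more entry in both maps; this one is unchanged.
    ActorState with(String name, String sql, String table) {
        Map<String, String> a = new HashMap<>(sqlByName);
        Map<String, String> b = new HashMap<>(nameByTable);
        a.put(name, sql);
        b.put(table, name);
        return new ActorState(a, b);
    }
}

final class CasRegistry {
    private final AtomicReference<ActorState> ref =
        new AtomicReference<>(new ActorState(new HashMap<>(), new HashMap<>()));

    // Readers get a consistent snapshot without taking a lock.
    ActorState snapshot() {
        return ref.get();
    }

    // Writers copy, modify, and retry until their compare-and-set wins.
    void register(String name, String sql, String table) {
        while (true) {
            ActorState current = ref.get();
            ActorState next = current.with(name, sql, table);
            if (ref.compareAndSet(current, next)) {
                return;
            }
            // Another writer replaced the holder first; redo against the new state.
        }
    }
}
```

The retry loop stays inside register, so callers never see a failed CAS. The cost is one full copy of the holder per write, which is the trade-off raised elsewhere in this thread.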
>>> 
>>> Could you maybe comment on the context sharing between connections part too?
>>> 
>>> 
>>> Kind regards,
>>> ------------------------------------------------------------------------
>>> *Christian Beikov*
>>> On 30.08.2017 at 21:31, Julian Hyde wrote:
>>>> Consider a “transaction” that involves reads and writes:
>>>> 
>>>>    1. Read from a data structure
>>>>    2. Do some stuff
>>>>    3. Write to the data structure
>>>> 
>>>> If steps 2 and 3 depend on what you read in step 1, then you need to
>>>> prevent anyone from writing until you have written. A simple CAS won’t
>>>> solve this. The simplest solution is for the whole transaction to be in a
>>>> critical section. It doesn’t really matter whether that is implemented
>>>> using an actor or synchronized blocks.
>>>> 
>>>> We are mostly in agreement - especially about using immutable data
>>>> structures for anything shared between threads.
>>>> 
>>>> Julian
>>>> 
>>>> 
>>>>> On Aug 29, 2017, at 2:01 PM, Christian Beikov
>>>>> <[email protected]> wrote:
>>>>> 
>>>>> Imagine the holder of the various hash maps is immutable; let's call it
>>>>> "actor". When a new registration is done, we create a copy of that holder
>>>>> and CAS it. When we query, we simply get the current value and access its
>>>>> maps. So MaterializationService could have an AtomicReference to a holder
>>>>> "actor" just like right now, but we make the maps immutable and create
>>>>> copies whenever a change occurs. We could hide such details behind a
>>>>> message-passing interface so that remote models can be implemented too,
>>>>> but that seems like a next step.
>>>>> 
>>>>> The materialization concurrency issue isn't the only problem. What about
>>>>> general usage in multithreaded environments? The whole schema is
>>>>> currently bound to a CalciteConnection. It would be nice if all the
>>>>> context could be shared between multiple connections so that we avoid
>>>>> having to initialize every connection. Do you have any plans to tackle
>>>>> that, or am I not seeing how to achieve this?
>>>>> 
>>>>> 
>>>>> Kind regards,
>>>>> ------------------------------------------------------------------------
>>>>> *Christian Beikov*
>>>>> On 29.08.2017 at 19:40, Julian Hyde wrote:
>>>>>>> I'd rather have immutable state being CASed (compare-and-swap) to make
>>>>>>> the querying cheap and do updates in an optimistic concurrency control
>>>>>>> manner.
>>>>>> Compare and swap only works for one memory address. You can't use it
>>>>>> to, say, debit one bank account and credit another.
>>>>>> 
>>>>>> The set of valid materializations is just about the only mutable state
>>>>>> in Calcite and I think it will need to be several interconnected data
>>>>>> structures. So, compare-and-swap (or its high-level equivalent,
>>>>>> ConcurrentHashMap) won't cut it.
>>>>>> 
>>>>>> So we could use locks/monitors (the "synchronized" keyword) or we
>>>>>> could use an actor. The key difference between the two is who does the
>>>>>> work. With a monitor, each customer grabs the key (there is only one
>>>>>> key), walks into the bank vault, and moves the money from one deposit
>>>>>> box to another. With an actor, there is a bank employee in the vault
>>>>>> who is the only person allowed to move money around.
>>>>>> 
>>>>>> The work done is the same in both models. There are performance
>>>>>> advantages of the actor model (the data structures will tend to exist
>>>>>> in one core's cache) and there are code simplicity advantages (the
>>>>>> critical code is all in one class or package).
>>>>>> 
>>>>>> The overhead of two puts/gets on an ArrayBlockingQueue per request is
>>>>>> negligible. And besides, you can switch to a non-actor implementation
>>>>>> of the service if Calcite is single-threaded.
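
The bank-vault picture can be sketched in Java as follows. VaultActor and its methods are hypothetical and written only for this illustration; they are not taken from Calcite's MaterializationActor. One thread owns the map, and each request costs one put/take on the request queue plus one put/take on a one-slot reply queue.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;

// Hypothetical actor: the only thread that ever touches the deposit boxes.
final class VaultActor implements Runnable {
    private static final class Request {
        final Function<Map<String, Long>, Long> work;
        final BlockingQueue<Long> reply = new ArrayBlockingQueue<>(1);

        Request(Function<Map<String, Long>, Long> work) {
            this.work = work;
        }
    }

    private final BlockingQueue<Request> requests = new ArrayBlockingQueue<>(16);
    private final Map<String, Long> boxes = new HashMap<>();

    VaultActor(Map<String, Long> initial) {
        boxes.putAll(initial);
    }

    // The "bank employee": takes one request at a time and answers it.
    @Override
    public void run() {
        try {
            while (true) {
                Request r = requests.take();
                r.reply.put(r.work.apply(boxes));
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // stop when interrupted
        }
    }

    // Customers never touch the map; they enqueue work and wait for the reply.
    private long ask(Function<Map<String, Long>, Long> work) throws InterruptedException {
        Request r = new Request(work);
        requests.put(r);
        return r.reply.take();
    }

    // Debit one box and credit another as a single step on the actor thread.
    boolean transfer(String from, String to, long amount) throws InterruptedException {
        return ask(b -> {
            long available = b.getOrDefault(from, 0L);
            if (available < amount) {
                return 0L;
            }
            b.put(from, available - amount);
            b.put(to, b.getOrDefault(to, 0L) + amount);
            return 1L;
        }) == 1L;
    }

    long balance(String box) throws InterruptedException {
        return ask(b -> b.getOrDefault(box, 0L));
    }
}
```

A caller would start the actor with new Thread(actor).start() and then call transfer or balance from any thread. Because every change runs on the actor thread, the two-box update needs no lock.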
>>>>>> 
>>>>>> I haven't thought out the details of multi-tenant. It is not true to
>>>>>> say that this is "not a primary requirement for the Calcite project."
>>>>>> Look at the "data grid (cache)" on the diagram
>>>>>> in my "Optiq" talk [1] from 2013. Dynamic materialized views were in
>>>>>> from the very start. There can be multiple instances of the actor
>>>>>> (each with their own request/response queues), so you could have one
>>>>>> per tenant. Also, it is very straightforward to make the actors
>>>>>> remote, replacing the queues with RPC over a message broker. Remote
>>>>>> actors are called services.
>>>>>> 
>>>>>> Julian
>>>>>> 
>>>>>> [1]
>>>>>> https://www.slideshare.net/julianhyde/optiq-a-dynamic-data-management-framework
>>>>>> 
>>>>>> On Tue, Aug 29, 2017 at 8:25 AM, Jesus Camacho Rodriguez
>>>>>> <[email protected]> wrote:
>>>>>>> LGTM. I think by the time we have support for outer joins, I might
>>>>>>> have had time to finish the filter tree index implementation too.
>>>>>>> 
>>>>>>> -Jesús
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On 8/29/17, 3:11 AM, "Christian Beikov" <[email protected]>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> I'd like to stick to figuring out how to support outer joins for now.
>>>>>>>> Once I have an implementation for that, I'd look into the filter tree
>>>>>>>> index if you haven't done it by then.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Kind regards,
>>>>>>>> 
>>>>>>>> ------------------------------------------------------------------------
>>>>>>>> *Christian Beikov*
>>>>>>>> On 28.08.2017 at 20:01, Jesus Camacho Rodriguez wrote:
>>>>>>>>> Christian,
>>>>>>>>> 
>>>>>>>>> The implementation of the filter tree index is indeed what I was
>>>>>>>>> referring to. In the initial implementation I focused on the
>>>>>>>>> rewriting coverage, but now that the first part is finished, it is
>>>>>>>>> at the top of my list, as I think it is critical to making the whole
>>>>>>>>> query rewriting algorithm work at scale. However, I have not started
>>>>>>>>> yet.
>>>>>>>>> 
>>>>>>>>> The filter tree index will help to filter not only based on the
>>>>>>>>> tables used by a given query, but also for queries that do not meet
>>>>>>>>> the equivalence class conditions, filter conditions, etc. We could
>>>>>>>>> implement all the preconditions mentioned in the paper, and we could
>>>>>>>>> add our own additional ones. I also think that in a second version
>>>>>>>>> we might need to add some kind of ranking/limit, as many views might
>>>>>>>>> meet the preconditions for a given query.
>>>>>>>>> 
>>>>>>>>> It seems you understood how it should work, so if you could help to
>>>>>>>>> kick-start that work by implementing a first version of the filter
>>>>>>>>> tree index with a couple of basic conditions (table matching and EC
>>>>>>>>> matching?), that would be great. I could review any of the
>>>>>>>>> contributions you make.
>>>>>>>>> 
>>>>>>>>> -Jesús
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On 8/28/17, 3:22 AM, "Christian Beikov" <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> If the metadata were cached, that would be awesome, especially
>>>>>>>>>> because that would also improve the performance of metadata
>>>>>>>>>> retrieval for the query currently being planned, although I am not
>>>>>>>>>> sure how the caching would work since the RelNodes are mutable.
>>>>>>>>>> 
>>>>>>>>>> Have you considered implementing the filter tree index explained in
>>>>>>>>>> the paper? As far as I understood, the whole thing only works when
>>>>>>>>>> redundant table elimination is implemented. Is that the case? If so,
>>>>>>>>>> or if it can be done easily, I'd propose we initialize all the
>>>>>>>>>> lookup structures during registration and use them during planning.
>>>>>>>>>> This will improve planning time drastically and essentially handle
>>>>>>>>>> the scalability problem you mention.
>>>>>>>>>> 
>>>>>>>>>> What other MV-related issues are on your personal to-do list, Jesus?
>>>>>>>>>> I have now read the paper and think I can help you in one place or
>>>>>>>>>> another if you want.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Kind regards,
>>>>>>>>>> 
>>>>>>>>>> ------------------------------------------------------------------------
>>>>>>>>>> *Christian Beikov*
>>>>>>>>>> On 28.08.2017 at 08:13, Jesus Camacho Rodriguez wrote:
>>>>>>>>>>> Hive does not use the Calcite SQL parser, thus we follow a
>>>>>>>>>>> different path and did not experience the problem on the Calcite
>>>>>>>>>>> end. However, FWIW we avoided reparsing the SQL every time a query
>>>>>>>>>>> was being planned by creating/managing our own cache too.
>>>>>>>>>>> 
>>>>>>>>>>> The metadata providers implement some caching, thus I would expect
>>>>>>>>>>> that once you avoid reparsing every MV, the retrieval time of
>>>>>>>>>>> predicates, lineage, etc. would improve (at least after using the
>>>>>>>>>>> MV for the first time). However, I agree that the information
>>>>>>>>>>> should be inferred when the MV is loaded. In fact, maybe just
>>>>>>>>>>> making some calls to the metadata providers while the MVs are being
>>>>>>>>>>> loaded would do the trick (Julian should confirm this).
>>>>>>>>>>> 
>>>>>>>>>>> Btw, you will probably find another scalability issue with the
>>>>>>>>>>> current implementation of the rewriting as the number of MVs grows
>>>>>>>>>>> large, since the pre-filtering implementation in place does not
>>>>>>>>>>> discard many of the views that are not valid to rewrite a given
>>>>>>>>>>> query, and rewriting is attempted with all of them.
>>>>>>>>>>> This last bit is work that I would like to tackle shortly, but I
>>>>>>>>>>> have not created the corresponding JIRA yet.
>>>>>>>>>>> 
>>>>>>>>>>> -Jesús
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On 8/27/17, 10:43 PM, "Rajat Venkatesh" <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Thread safety and repeated parsing are problems. We have
>>>>>>>>>>>> experience with managing tens of materialized views. Repeated
>>>>>>>>>>>> parsing takes more time than execution of the query itself. We
>>>>>>>>>>>> also have a similar problem where concurrent queries (potentially
>>>>>>>>>>>> with different sets of materialized views) may be planned at the
>>>>>>>>>>>> same time. We solved it by maintaining a cache and carefully
>>>>>>>>>>>> setting the cache in a thread local.
>>>>>>>>>>>> Relevant code for inspiration:
>>>>>>>>>>>> 
>>>>>>>>>>>> https://github.com/qubole/quark/blob/master/optimizer/src/main/java/org/apache/calcite/prepare/Materializer.java
>>>>>>>>>>>> 
>>>>>>>>>>>> https://github.com/qubole/quark/blob/master/optimizer/src/main/java/org/apache/calcite/plan/QuarkMaterializeCluster.java
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On Sun, Aug 27, 2017 at 6:50 PM Christian Beikov
>>>>>>>>>>>> <[email protected]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> Hey, I have been looking a bit into how materialized views
>>>>>>>>>>>>> perform during planning because of a very long test run
>>>>>>>>>>>>> (MaterializationTest#testJoinMaterializationUKFK6), and the
>>>>>>>>>>>>> current state is problematic.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> CalcitePrepareImpl#getMaterializations always reparses the SQL,
>>>>>>>>>>>>> and down the line there is a lot of expensive work (e.g.
>>>>>>>>>>>>> predicate and lineage determination) done during planning that
>>>>>>>>>>>>> could easily be pre-calculated and cached during materialization
>>>>>>>>>>>>> creation.
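
As a rough illustration of doing the expensive work once at registration rather than on every planning call, here is a minimal Java sketch. RegistrationCache and its methods are hypothetical names, and the type parameter P stands in for whatever a parsed and analyzed view would be; none of this is Calcite code.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical cache keyed by materialization name.
final class RegistrationCache<P> {
    private final ConcurrentHashMap<String, P> parsedByName = new ConcurrentHashMap<>();
    private final Function<String, P> expensiveParse;

    RegistrationCache(Function<String, P> expensiveParse) {
        this.expensiveParse = expensiveParse;
    }

    // Pay the parsing cost once, when the materialization is registered.
    void register(String name, String sql) {
        parsedByName.put(name, expensiveParse.apply(sql));
    }

    // Planning only looks up the stored result; null means "not registered".
    P planFor(String name) {
        return parsedByName.get(name);
    }
}
```

This only helps if the cached value is safe to share between planner invocations, which the mutability concern raised elsewhere in this thread calls into question.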
>>>>>>>>>>>>> 
>>>>>>>>>>>>> There is also a bit of a thread safety problem with the current
>>>>>>>>>>>>> implementation. Unless there is a different safety mechanism that
>>>>>>>>>>>>> I don't see, the sharing of the MaterializationService, and thus
>>>>>>>>>>>>> also the maps in MaterializationActor, via a static instance
>>>>>>>>>>>>> between multiple threads is problematic.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Since I mentioned thread safety, how is Calcite supposed to be
>>>>>>>>>>>>> used in a multi-threaded environment? Currently I use a
>>>>>>>>>>>>> connection pool that initializes the schema on new connections,
>>>>>>>>>>>>> but that is not really nice. I suppose caches are also bound to
>>>>>>>>>>>>> the connection? A thread-safe context that can be shared between
>>>>>>>>>>>>> connections would be nice to avoid all that repetitive work.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Are these known issues that you have already thought about how to
>>>>>>>>>>>>> fix, or should I log JIRAs for them and fix them to the best of
>>>>>>>>>>>>> my knowledge? I'd more or less keep the service shared but would
>>>>>>>>>>>>> implement it using a copy-on-write strategy, since I'd expect
>>>>>>>>>>>>> schema changes to be rare after startup.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Regarding the repetitive work that partly happens during
>>>>>>>>>>>>> planning, I'd suggest doing that during materialization
>>>>>>>>>>>>> registration instead, as is already mentioned in
>>>>>>>>>>>>> CalcitePrepareImpl#populateMaterializations. Would that be ok?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> --
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Kind regards,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> ------------------------------------------------------------------------
>>>>>>>>>>>>> *Christian Beikov*
>>>>>>>>>>>>> 
>
