Consider a “transaction” that involves reads and writes:

  Read from a data structure
  Do some stuff
  Write to the data structure

If steps 2 and 3 depend on what you read in step 1, then you need to prevent 
anyone from writing until you have written. A simple CAS won’t solve this. The 
simplest solution is for the whole transaction to be in a critical section. It 
doesn’t really matter whether that is implemented using an actor or 
synchronized blocks.
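
For illustration, a minimal sketch of such a critical section (the class
and field names are hypothetical, not Calcite's API):

  import java.util.HashMap;
  import java.util.Map;

  class MaterializationRegistry {
    private final Map<String, String> tableToView = new HashMap<>();

    // The read, the "stuff" and the write all happen while holding the
    // monitor, so no other thread can write between steps 1 and 3.
    synchronized void register(String table, String view) {
      String existing = tableToView.get(table);   // 1. read
      String chosen = existing == null            // 2. do some stuff,
          ? view                                  //    based on the read
          : existing + "," + view;
      tableToView.put(table, chosen);             // 3. write
    }
  }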

We are mostly in agreement - especially about using immutable data structures 
for anything shared between threads. 

Julian


> On Aug 29, 2017, at 2:01 PM, Christian Beikov <[email protected]> 
> wrote:
> 
> Imagine the holder of the various hash maps is immutable, let's call it 
> "actor". When a new registration is done, we create a copy of that holder and 
> CAS it. When we query, we simply get the current value and access its maps. 
> So MaterializationService could have an AtomicReference to a holder "actor" 
> just like right now, but we make the maps immutable and create copies 
> whenever a change occurs. We could hide such details behind a message passing 
> interface so that remote models can be implemented too, but that seems like a 
> next step.
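> 
> A minimal sketch of that idea (class and field names here are invented
> for illustration, not the actual Calcite API):
> 
>   import java.util.HashMap;
>   import java.util.Map;
>   import java.util.concurrent.atomic.AtomicReference;
> 
>   // Immutable holder; every registration builds a copy and CASes it in.
>   class Holder {
>     final Map<String, String> materializations;
>     Holder(Map<String, String> m) { this.materializations = m; }
>   }
> 
>   class Service {
>     private final AtomicReference<Holder> actor =
>         new AtomicReference<>(new Holder(new HashMap<>()));
> 
>     void register(String key, String value) {
>       for (;;) {
>         Holder current = actor.get();
>         Map<String, String> copy = new HashMap<>(current.materializations);
>         copy.put(key, value);
>         if (actor.compareAndSet(current, new Holder(copy))) {
>           return;
>         }
>         // else another writer won the race; retry against the new holder
>       }
>     }
> 
>     String lookup(String key) {
>       return actor.get().materializations.get(key); // no locking on reads
>     }
>   }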
> 
> The materialization concurrency issue isn't the only problem; what about 
> general usage in multithreaded environments? The whole schema is currently 
> bound to a CalciteConnection. It would be nice if all the context could be 
> shared between multiple connections so that we avoid having to initialize 
> every connection. Do you have any plans to tackle that or am I not seeing how 
> to achieve this?
> 
> 
> Kind regards,
> ------------------------------------------------------------------------
> *Christian Beikov*
> On 29.08.2017 at 19:40, Julian Hyde wrote:
>>> I'd rather have immutable state being CASed (compare-and-swap) to make
>>> the querying cheap and do updates in an optimistic concurrency control 
>>> manner.
>> Compare and swap only works for one memory address. You can't use it
>> to, say, debit one bank account and credit another.
>> 
>> The set of valid materializations is just about the only mutable state
>> in Calcite and I think it will need to be several interconnected data
>> structures. So, compare-and-swap (or its high-level equivalent,
>> ConcurrentHashMap) won't cut it.
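>> 
>> A toy illustration (not Calcite code): each CAS below is atomic on its
>> own, but nothing makes the pair atomic, so a concurrent reader can see
>> the invariant (checking + savings == 100) broken:
>> 
>>   import java.util.concurrent.atomic.AtomicLong;
>> 
>>   class Transfer {
>>     static final AtomicLong checking = new AtomicLong(100);
>>     static final AtomicLong savings = new AtomicLong(0);
>> 
>>     static void debitAndCredit(long amount) {
>>       long c = checking.get();
>>       checking.compareAndSet(c, c - amount);   // debit: atomic
>>       // <-- a reader here sees the debit but not the credit
>>       long s = savings.get();
>>       savings.compareAndSet(s, s + amount);    // credit: atomic
>>     }
>>   }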
>> 
>> So we could use locks/monitors (the "synchronized" keyword) or we
>> could use an actor. The key difference between the two is who does the
>> work. With a monitor, each customer grabs the key (there is only one
>> key), walks into the bank vault, and moves the money from one deposit
>> box to another. With an actor, there is a bank employee in the vault
>> who is the only person allowed to move money around.
>> 
>> The work done is the same in both models. There are performance
>> advantages of the actor model (the data structures will tend to exist
>> in one core's cache) and there are code simplicity advantages (the
>> critical code is all in one class or package).
>> 
>> The overhead of two puts/gets on an ArrayBlockingQueue per request is
>> negligible. And besides, you can switch to a non-actor implementation
>> of the service if Calcite is single-threaded.
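>> 
>> A minimal sketch of such an actor (the names are invented for
>> illustration; this is not Calcite's MaterializationActor):
>> 
>>   import java.util.HashMap;
>>   import java.util.Map;
>>   import java.util.concurrent.ArrayBlockingQueue;
>>   import java.util.concurrent.BlockingQueue;
>> 
>>   // One "employee in the vault": a single thread owns the map, so the
>>   // data structure needs no locks and stays hot in one core's cache.
>>   class Actor implements Runnable {
>>     static class Request {
>>       final String key, value; // value == null means "lookup"
>>       final BlockingQueue<String> response = new ArrayBlockingQueue<>(1);
>>       Request(String key, String value) { this.key = key; this.value = value; }
>>     }
>> 
>>     private final BlockingQueue<Request> mailbox = new ArrayBlockingQueue<>(100);
>>     private final Map<String, String> state = new HashMap<>();
>> 
>>     public void run() {
>>       try {
>>         for (;;) {
>>           Request r = mailbox.take();
>>           if (r.value != null) state.put(r.key, r.value);
>>           r.response.put(String.valueOf(state.get(r.key)));
>>         }
>>       } catch (InterruptedException e) {
>>         // shut down
>>       }
>>     }
>> 
>>     String send(Request r) throws InterruptedException {
>>       mailbox.put(r);            // one put...
>>       return r.response.take();  // ...one get per request
>>     }
>>   }
>> 
>> Start it once with "new Thread(actor).start()"; callers then only ever
>> touch the two queues via send().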
>> 
>> I haven't thought out the details of multi-tenant. It is not true to
>> say that this is "not a primary requirement for
>> the Calcite project." Look at the "data grid (cache)" on the diagram
>> in my "Optiq" talk [1] from 2013. Dynamic materialized views were in
>> from the very start. There can be multiple instances of the actor
>> (each with their own request/response queues), so you could have one
>> per tenant. Also, it is very straightforward to make the actors
>> remote, replacing the queues with RPC over a message broker. Remote
>> actors are called services.
>> 
>> Julian
>> 
>> [1] 
>> https://www.slideshare.net/julianhyde/optiq-a-dynamic-data-management-framework
>> 
>> On Tue, Aug 29, 2017 at 8:25 AM, Jesus Camacho Rodriguez
>> <[email protected]> wrote:
>>> LGTM. I think by the time we have support for outer joins, I might have
>>> had time to finish the filter tree index implementation too.
>>> 
>>> -Jesús
>>> 
>>> 
>>> 
>>> On 8/29/17, 3:11 AM, "Christian Beikov" <[email protected]> wrote:
>>> 
>>>> I'd like to stick to trying to figure out how to support outer joins for
>>>> now, and when I have an implementation for that, I'd look into the filter
>>>> tree index if you haven't done it by then.
>>>> 
>>>> 
>>>> Kind regards,
>>>> ------------------------------------------------------------------------
>>>> *Christian Beikov*
>>>> On 28.08.2017 at 20:01, Jesus Camacho Rodriguez wrote:
>>>>> Christian,
>>>>> 
>>>>> The implementation of the filter tree index is what I was referring to
>>>>> indeed. In the initial implementation I focused on the rewriting coverage,
>>>>> but now that the first part is finished, it is at the top of my list as
>>>>> I think it is critical to make the whole query rewriting algorithm work
>>>>> at scale. However, I have not started yet.
>>>>> 
>>>>> The filter tree index will help to filter views not only based on the
>>>>> tables used by a given query, but also to discard views for queries
>>>>> that do not meet the equivalence class conditions, filter conditions,
>>>>> etc. We could implement all the
>>>>> preconditions mentioned in the paper, and we could add our own additional
>>>>> ones. I also think that in a second version, we might need to add
>>>>> some kind of ranking/limit as many views might meet the preconditions for
>>>>> a given query.
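>>>>> 
>>>>> As a sketch of what a first cut could look like (purely illustrative,
>>>>> and much simpler than the paper's full filter tree): index each view
>>>>> by the set of tables it uses, and only hand a view to the rewriting
>>>>> algorithm if that set is contained in the query's tables.
>>>>> 
>>>>>   import java.util.ArrayList;
>>>>>   import java.util.HashMap;
>>>>>   import java.util.List;
>>>>>   import java.util.Map;
>>>>>   import java.util.Set;
>>>>> 
>>>>>   // Table-matching pre-filter for materialized views.
>>>>>   class ViewIndex {
>>>>>     private final Map<Set<String>, List<String>> viewsByTables =
>>>>>         new HashMap<>();
>>>>> 
>>>>>     void register(String viewName, Set<String> tables) {
>>>>>       viewsByTables.computeIfAbsent(tables, k -> new ArrayList<>())
>>>>>           .add(viewName);
>>>>>     }
>>>>> 
>>>>>     // Return only views whose tables all occur in the query.
>>>>>     List<String> candidatesFor(Set<String> queryTables) {
>>>>>       List<String> result = new ArrayList<>();
>>>>>       for (Map.Entry<Set<String>, List<String>> e
>>>>>           : viewsByTables.entrySet()) {
>>>>>         if (queryTables.containsAll(e.getKey())) {
>>>>>           result.addAll(e.getValue());
>>>>>         }
>>>>>       }
>>>>>       return result;
>>>>>     }
>>>>>   }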
>>>>> 
>>>>> It seems you understood how it should work, so if you could help to
>>>>> quickstart that work by maybe implementing a first version of the filter
>>>>> tree index with a couple of basic conditions (table matching and EC 
>>>>> matching?),
>>>>> that would be great. I could review any of the contributions you make.
>>>>> 
>>>>> -Jesús
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> On 8/28/17, 3:22 AM, "Christian Beikov" <[email protected]> 
>>>>> wrote:
>>>>> 
>>>>>> If the metadata was cached, that would be awesome, especially because
>>>>>> that would also improve the performance regarding the metadata retrieval
>>>>>> for the query currently being planned, although I am not sure how the
>>>>>> caching would work since the RelNodes are mutable.
>>>>>> 
>>>>>> Have you considered implementing the filter tree index explained in the
>>>>>> paper? As far as I understood, the whole thing only works when
>>>>>> redundant table elimination is implemented. Is that the case? If so, or
>>>>>> if it can be done easily, I'd propose we initialize all the lookup
>>>>>> structures during registration and use them during planning. This will
>>>>>> improve planning time drastically and essentially handle the scalability
>>>>>> problem you mention.
>>>>>> 
>>>>>> What other MV-related issues are on your personal todo list, Jesus? I
>>>>>> read the paper now and think I can help you in one place or another if
>>>>>> you want.
>>>>>> 
>>>>>> 
>>>>>> Kind regards,
>>>>>> ------------------------------------------------------------------------
>>>>>> *Christian Beikov*
>>>>>> On 28.08.2017 at 08:13, Jesus Camacho Rodriguez wrote:
>>>>>>> Hive does not use the Calcite SQL parser, thus we follow a different 
>>>>>>> path
>>>>>>> and did not experience the problem on the Calcite end. However, FWIW we
>>>>>>> avoided reparsing the SQL every time a query was being planned by
>>>>>>> creating/managing our own cache too.
>>>>>>> 
>>>>>>> The metadata providers implement some caching, thus I would expect that 
>>>>>>> once
>>>>>>> you avoid reparsing every MV, the retrieval time of predicates, 
>>>>>>> lineage, etc.
>>>>>>> would improve (at least after using the MV for the first time). However,
>>>>>>> I agree that the information should be inferred when the MV is loaded.
>>>>>>> In fact, maybe just making some calls to the metadata providers while 
>>>>>>> the MVs
>>>>>>> are being loaded would do the trick (Julian should confirm this).
>>>>>>> 
>>>>>>> Btw, probably you will find another scalability issue as the number of 
>>>>>>> MVs
>>>>>>> grows large with the current implementation of the rewriting, since the
>>>>>>> pre-filtering implementation in place does not discard many of the views
>>>>>>> that are not valid to rewrite a given query, and rewriting is attempted
>>>>>>> with all of them.
>>>>>>> This last bit is work that I would like to tackle shortly, but I have 
>>>>>>> not
>>>>>>> created the corresponding JIRA yet.
>>>>>>> 
>>>>>>> -Jesús
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On 8/27/17, 10:43 PM, "Rajat Venkatesh" <[email protected]> wrote:
>>>>>>> 
>>>>>>>> Thread Safety and repeated parsing is a problem. We have experience 
>>>>>>>> with
>>>>>>>> managing 10s of materialized views. Repeated parsing takes more time 
>>>>>>>> than
>>>>>>>> execution of the query itself. We also have a similar problem where
>>>>>>>> concurrent queries (potentially with different sets of materialized
>>>>>>>> views) may be planned at the same time. We solved it by maintaining a
>>>>>>>> cache and carefully setting the cache in a thread local.
>>>>>>>> Relevant code for inspiration:
>>>>>>>> https://github.com/qubole/quark/blob/master/optimizer/src/main/java/org/apache/calcite/prepare/Materializer.java
>>>>>>>> https://github.com/qubole/quark/blob/master/optimizer/src/main/java/org/apache/calcite/plan/QuarkMaterializeCluster.java
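>>>>>>>> 
>>>>>>>> A rough sketch of that thread-local pattern (names are illustrative,
>>>>>>>> simplified from the code linked above):
>>>>>>>> 
>>>>>>>>   import java.util.HashMap;
>>>>>>>>   import java.util.Map;
>>>>>>>> 
>>>>>>>>   // Each planning thread sees only the materializations set for it,
>>>>>>>>   // so concurrent queries with different view sets don't interfere.
>>>>>>>>   class PlannerContext {
>>>>>>>>     private static final ThreadLocal<Map<String, String>> CACHE =
>>>>>>>>         ThreadLocal.withInitial(HashMap::new);
>>>>>>>> 
>>>>>>>>     static void setMaterializations(Map<String, String> views) {
>>>>>>>>       CACHE.set(views);
>>>>>>>>     }
>>>>>>>> 
>>>>>>>>     static Map<String, String> currentMaterializations() {
>>>>>>>>       return CACHE.get();
>>>>>>>>     }
>>>>>>>> 
>>>>>>>>     static void clear() {
>>>>>>>>       CACHE.remove(); // avoid leaking state across pooled threads
>>>>>>>>     }
>>>>>>>>   }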
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Sun, Aug 27, 2017 at 6:50 PM Christian Beikov 
>>>>>>>> <[email protected]>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Hey, I have been looking a bit into how materialized views perform
>>>>>>>>> during planning because of a very long test run
>>>>>>>>> (MaterializationTest#testJoinMaterializationUKFK6), and the current
>>>>>>>>> state is problematic.
>>>>>>>>> 
>>>>>>>>> CalcitePrepareImpl#getMaterializations always reparses the SQL, and
>>>>>>>>> down the line there is a lot of expensive work (e.g. predicate and
>>>>>>>>> lineage determination) done during planning that could easily be
>>>>>>>>> pre-calculated and cached during materialization creation.
>>>>>>>>> 
>>>>>>>>> There is also a bit of a thread safety problem with the current
>>>>>>>>> implementation. Unless there is a different safety mechanism that I
>>>>>>>>> don't see, the sharing of the MaterializationService and thus also the
>>>>>>>>> maps in MaterializationActor via a static instance between multiple
>>>>>>>>> threads is problematic.
>>>>>>>>> 
>>>>>>>>> Since I mentioned thread safety, how is Calcite supposed to be used 
>>>>>>>>> in a
>>>>>>>>> multi-threaded environment? Currently I use a connection pool that
>>>>>>>>> initializes the schema on new connections, but that is not really 
>>>>>>>>> nice.
>>>>>>>>> I suppose caches are also bound to the connection? A thread-safe
>>>>>>>>> context that can be shared between connections would be nice to avoid
>>>>>>>>> all that repetitive work.
>>>>>>>>> 
>>>>>>>>> Are these known issues which you have thought about how to fix, or
>>>>>>>>> should I log JIRAs for them and fix them to the best of my knowledge?
>>>>>>>>> I'd more or less keep the service shared but would implement it using
>>>>>>>>> a copy-on-write strategy, since I'd expect schema changes to be rare
>>>>>>>>> after startup.
>>>>>>>>> 
>>>>>>>>> Regarding the repetitive work that partly happens during planning, I'd
>>>>>>>>> suggest doing that during materialization registration instead, as is
>>>>>>>>> already mentioned in CalcitePrepareImpl#populateMaterializations.
>>>>>>>>> Would that be OK?
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> 
>>>>>>>>> Kind regards,
>>>>>>>>> ------------------------------------------------------------------------
>>>>>>>>> *Christian Beikov*
>>>>>>>>> 
> 
