Re: Materialization performance

Rajat Venkatesh Sun, 27 Aug 2017 22:44:37 -0700

Thread safety and repeated parsing are both problems. We have experience
managing tens of materialized views, and repeated parsing takes more time
than executing the query itself. We also have a similar problem where
concurrent queries (potentially with different sets of materialized views)
may be planned at the same time. We solved it by maintaining a cache
and carefully setting that cache in a thread local.
Relevant code for inspiration:
https://github.com/qubole/quark/blob/master/optimizer/src/main/java/org/apache/calcite/prepare/Materializer.java
https://github.com/qubole/quark/blob/master/optimizer/src/main/java/org/apache/calcite/plan/QuarkMaterializeCluster.java
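A minimal sketch of the general per-thread cache idea, assuming Java 16+ for records. This is not Quark's code (the links above are the real reference); `ParsedView` and `parse` are hypothetical stand-ins for whatever expensive parsing and analysis the planner would otherwise repeat. Each thread keeps its own map, so no locking is needed and each view's SQL is parsed at most once per thread:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

public class ThreadLocalMaterializationCache {

    // Hypothetical stand-in for the expensive result of parsing a view's SQL
    // (parse tree, predicates, lineage, ...). Not a Calcite or Quark class.
    public record ParsedView(String sql) {}

    // Counts how often the expensive step really runs (illustration only).
    public static final AtomicInteger PARSE_COUNT = new AtomicInteger();

    // One map per thread: planners on different threads never share this
    // mutable state, so no synchronization is required.
    private static final ThreadLocal<Map<String, ParsedView>> CACHE =
            ThreadLocal.withInitial(HashMap::new);

    // Hypothetical expensive step standing in for SQL parsing and analysis.
    private static ParsedView parse(String sql) {
        PARSE_COUNT.incrementAndGet();
        return new ParsedView(sql);
    }

    // Returns this thread's cached result, parsing only on first use.
    public static ParsedView get(String sql) {
        return CACHE.get().computeIfAbsent(sql, ThreadLocalMaterializationCache::parse);
    }

    // Drops this thread's entries, e.g. after the set of views changes.
    public static void clear() {
        CACHE.remove();
    }

    public static void main(String[] args) throws InterruptedException {
        String mv = "select deptno, count(*) from emps group by deptno";
        get(mv);
        get(mv); // second call is served from this thread's cache
        Thread other = new Thread(() -> get(mv)); // a new thread parses its own copy
        other.start();
        other.join();
        System.out.println(PARSE_COUNT.get()); // prints 2
    }
}
```

The trade-off is duplicated work and memory across threads: with N planner threads a view may be parsed up to N times, and pooled threads would need to call `clear()` when the set of materializations changes.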




On Sun, Aug 27, 2017 at 6:50 PM Christian Beikov <[email protected]>
wrote:

> Hey, I have been looking a bit into how materialized views perform
> during planning because of a very long test
> run (MaterializationTest#testJoinMaterializationUKFK6), and the current
> state is problematic.
>
> CalcitePrepareImpl#getMaterializations always reparses the SQL, and down
> the line a lot of expensive work (e.g. predicate and lineage
> determination) is done during planning that could easily be precalculated
> and cached when the materialization is created.
>
> There is also a bit of a thread safety problem with the current
> implementation. Unless there is a safety mechanism that I
> don't see, sharing the MaterializationService (and thus also the
> maps in MaterializationActor) between multiple threads via a static
> instance is problematic.
>
> Since I mentioned thread safety, how is Calcite supposed to be used in a
> multi-threaded environment? Currently I use a connection pool that
> initializes the schema on each new connection, but that is not really nice.
> I suppose caches are also bound to the connection? A thread-safe context
> that can be shared between connections would be nice, to avoid all that
> repetitive work.
>
> Are these known issues that you have already thought about fixing, or should
> I log JIRAs for them and fix them to the best of my knowledge? I'd more
> or less keep the service shared but would implement it using a copy-on-write
> strategy, since I'd expect schema changes to be rare after startup.
>
> Regarding the repetitive work that partly happens during planning, I'd
> suggest doing it during materialization registration instead, as is
> already mentioned in CalcitePrepareImpl#populateMaterializations. Would
> that be OK?
>
> --
>
> Kind regards,
> ------------------------------------------------------------------------
> *Christian Beikov*
>
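The copy-on-write approach suggested in the quoted message, combined with doing the expensive analysis at registration time, might look roughly like this. It is a hypothetical sketch assuming Java 16+, not the MaterializationService or MaterializationActor API; `Entry`, `analyze`, `register` and `current` are illustrative names, and whitespace/case normalization merely stands in for predicate and lineage derivation. Writers build a new immutable map and swap it in atomically; readers never lock:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

public class CopyOnWriteMaterializationRegistry {

    // Hypothetical precomputed entry. In a real planner this might hold the
    // parsed plan, predicates and lineage; here it holds normalized SQL.
    public record Entry(String name, String sql, String normalizedSql) {}

    // Readers only ever see an immutable snapshot of all registered views.
    private final AtomicReference<Map<String, Entry>> snapshot =
            new AtomicReference<>(Map.of());

    // Hypothetical "expensive" analysis, run once per view at registration
    // time instead of on every planning call.
    private static String analyze(String sql) {
        return sql.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }

    // Copy-on-write update: build a new map and swap it in atomically.
    public void register(String name, String sql) {
        Entry entry = new Entry(name, sql, analyze(sql)); // done outside the retry loop
        snapshot.updateAndGet(old -> {
            Map<String, Entry> copy = new HashMap<>(old);
            copy.put(name, entry);
            return Map.copyOf(copy); // immutable, safe to share between threads
        });
    }

    // Lock-free read; callers should hold on to one snapshot per planning run.
    public Map<String, Entry> current() {
        return snapshot.get();
    }

    public static void main(String[] args) {
        CopyOnWriteMaterializationRegistry registry = new CopyOnWriteMaterializationRegistry();
        registry.register("mv1", "SELECT  deptno FROM emps");
        Map<String, Entry> held = registry.current(); // snapshot held by an in-flight planner
        registry.register("mv2", "select empid from emps");
        System.out.println(held.size() + " " + registry.current().size()); // prints "1 2"
        System.out.println(registry.current().get("mv1").normalizedSql()); // prints "select deptno from emps"
    }
}
```

A planner would read `current()` once at the start of planning and use that snapshot throughout, so a concurrent registration never changes the set of views mid-plan. Copying the whole map on every write is only cheap when registrations are rare, which matches the expectation of few schema changes after startup.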
