Hey, I have been looking into how materialized views perform during planning because of a very long test run (MaterializationTest#testJoinMaterializationUKFK6), and the current state is problematic.

CalcitePrepareImpl#getMaterializations always reparses the SQL, and down the line a lot of expensive work (e.g. predicate and lineage determination) is done during planning that could easily be pre-calculated and cached when the materialization is created.

There is also a bit of a thread safety problem with the current implementation. Unless there is a different safety mechanism that I don't see, the sharing of the MaterializationService and thus also the maps in MaterializationActor via a static instance between multiple threads is problematic.

Since I mentioned thread safety, how is Calcite supposed to be used in a multi-threaded environment? Currently I use a connection pool that initializes the schema on new connections, but that is not really nice. I suppose caches are also bound to the connection? A thread-safe context that can be shared between connections would be nice to avoid all that repetitive work.

Are these known issues that you have already thought about how to fix, or should I log JIRAs for them and fix them to the best of my knowledge? I'd more or less keep the service shared, but would implement it using a copy-on-write strategy, since I'd expect schema changes to be rare after startup.
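
To make the copy-on-write idea concrete, here is a rough sketch of what I have in mind. CopyOnWriteRegistry is a made-up name and not an existing Calcite class; the real maps in MaterializationActor would need their own key and value types. Readers only ever see an immutable snapshot, and writers swap in a new copy atomically.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical copy-on-write registry; not part of Calcite. */
final class CopyOnWriteRegistry<K, V> {
  // Always points at an immutable map, so readers need no locking.
  private final AtomicReference<Map<K, V>> ref =
      new AtomicReference<>(Collections.<K, V>emptyMap());

  /** Lock-free read against the current snapshot. */
  V get(K key) {
    return ref.get().get(key);
  }

  /** Returns the current immutable snapshot. */
  Map<K, V> snapshot() {
    return ref.get();
  }

  /** Copies the current map, applies the change, and swaps it in atomically. */
  void put(K key, V value) {
    ref.updateAndGet(old -> {
      Map<K, V> copy = new HashMap<>(old);
      copy.put(key, value);
      return Collections.unmodifiableMap(copy);
    });
  }
}
```

Writes cost a full copy of the map, which seems acceptable if registrations mostly happen at startup.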

Regarding the repetitive work that partly happens during planning, I'd suggest doing it during materialization registration instead, as is already mentioned in CalcitePrepareImpl#populateMaterializations. Would that be ok?
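
Roughly, the shape I am thinking of is the following. This is only a sketch: PreparedMaterializations and the prepare function are placeholders I made up for illustration, and P stands in for whatever parsed or analysed form the planner actually needs. The point is just that the expensive step runs once at registration and planning only does a lookup.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/** Hypothetical cache of pre-computed materialization data; not part of Calcite. */
final class PreparedMaterializations<P> {
  // Stand-in for the expensive work (parsing, predicate and lineage determination).
  private final Function<String, P> prepare;
  private final ConcurrentMap<String, P> prepared = new ConcurrentHashMap<>();

  PreparedMaterializations(Function<String, P> prepare) {
    this.prepare = prepare;
  }

  /** Runs the expensive step once, when the materialization is registered. */
  void register(String name, String sql) {
    prepared.put(name, prepare.apply(sql));
  }

  /** Planning-time lookup; never re-runs the expensive step. */
  P get(String name) {
    return prepared.get(name);
  }
}
```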

--

Kind regards,
------------------------------------------------------------------------
*Christian Beikov*
