Materialization performance

Christian Beikov Sun, 27 Aug 2017 06:20:35 -0700

Hey, I have been looking a bit into how materialized views perform during planning because of a very long test run (MaterializationTest#testJoinMaterializationUKFK6), and the current state is problematic.

CalcitePrepareImpl#getMaterializations always reparses the SQL, and down the line there is a lot of expensive work (e.g. predicate and lineage determination) done during planning that could easily be pre-calculated and cached during materialization creation.

There is also a bit of a thread safety problem with the current implementation. Unless there is a different safety mechanism that I don't see, the sharing of the MaterializationService and thus also the maps in MaterializationActor via a static instance between multiple threads is problematic.

Since I mentioned thread safety, how is Calcite supposed to be used in a multi-threaded environment? Currently I use a connection pool that initializes the schema on new connections, but that is not really nice. I suppose caches are also bound to the connection? A thread safe context that can be shared between connections would be nice to avoid all that repetitive work.

Are these known issues that you have already thought about how to fix, or should I log JIRAs for these and fix them to the best of my knowledge? I'd more or less keep the service shared but would implement it using a copy-on-write strategy, since I'd expect schema changes to be rare after startup.
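To make the copy-on-write idea concrete, here is a minimal sketch of what I have in mind. CopyOnWriteRegistry is a hypothetical stand-in for the maps in MaterializationActor, not existing Calcite code, and the key and value types are placeholders.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for the shared materialization maps; not Calcite's API.
final class CopyOnWriteRegistry<K, V> {
  // The current map is never mutated after publication; it is only replaced.
  private final AtomicReference<Map<K, V>> ref =
      new AtomicReference<>(Collections.<K, V>emptyMap());

  /** Readers get an immutable snapshot without taking any lock. */
  Map<K, V> snapshot() {
    return ref.get();
  }

  /** Lookup against the currently published map; null if the key is absent. */
  V get(K key) {
    return ref.get().get(key);
  }

  /** Writers copy the current map, modify the copy, and publish it atomically. */
  void put(K key, V value) {
    // The update function has no side effects, so a retry under contention is safe.
    ref.updateAndGet(old -> {
      Map<K, V> copy = new HashMap<>(old);
      copy.put(key, value);
      return Collections.unmodifiableMap(copy);
    });
  }
}
```

Writes cost a full copy of the map, which seems acceptable if registrations mostly happen at startup. Readers during planning never block and never see a half-updated map.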

Regarding the repetitive work that partly happens during planning, I'd suggest doing that during materialization registration instead, as is already mentioned in CalcitePrepareImpl#populateMaterializations. Would that be ok?
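Roughly, the shape I am thinking of looks like the sketch below. PreparedMaterializations and the analyzer function are hypothetical names for illustration only; the analyzer stands in for whatever parsing and predicate or lineage determination is currently repeated during planning.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical cache of per-materialization analysis results; not Calcite's API.
final class PreparedMaterializations<P> {
  private final ConcurrentHashMap<String, P> prepared = new ConcurrentHashMap<>();
  private final Function<String, P> analyzer;

  /** The analyzer represents the expensive work (parsing, predicates, lineage). */
  PreparedMaterializations(Function<String, P> analyzer) {
    this.analyzer = analyzer;
  }

  /** Called at registration time: the analysis runs at most once per SQL string. */
  P register(String sql) {
    return prepared.computeIfAbsent(sql, analyzer);
  }

  /** Called during planning: a plain lookup, null if the SQL was never registered. */
  P lookup(String sql) {
    return prepared.get(sql);
  }
}
```

With this, planning only pays for a map lookup, and the expensive part is paid once when the materialization is registered.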


--

Best regards,
------------------------------------------------------------------------
*Christian Beikov*
