Thread safety and repeated parsing are both problems. We have experience managing tens of materialized views, and repeated parsing takes more time than executing the query itself.

We also have a similar problem where concurrent queries (potentially with different sets of materialized views) may be planned at the same time. We solved it by maintaining a cache and carefully setting that cache in a thread local.

Relevant code for inspiration:
https://github.com/qubole/quark/blob/master/optimizer/src/main/java/org/apache/calcite/prepare/Materializer.java
https://github.com/qubole/quark/blob/master/optimizer/src/main/java/org/apache/calcite/plan/QuarkMaterializeCluster.java
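To make the thread-local cache idea concrete, here is a minimal sketch. It is not the Quark code linked above, only an illustration of the general approach: the class name ParsedViewCache, its methods, and the parser function are all hypothetical, and a real version would store Calcite's parsed/planned objects rather than a generic type.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch: caches the parsed form of each view's SQL, one map per thread. */
final class ParsedViewCache<P> {
    // Each thread gets its own map, so concurrent planners never share mutable state.
    private final ThreadLocal<Map<String, P>> cache = ThreadLocal.withInitial(HashMap::new);

    // Assumed stand-in for the expensive parse step (not a Calcite API).
    private final Function<String, P> parser;

    ParsedViewCache(Function<String, P> parser) {
        this.parser = parser;
    }

    /** Returns this thread's cached result for the SQL text, parsing only on first use. */
    P get(String viewSql) {
        return cache.get().computeIfAbsent(viewSql, parser);
    }

    /** Drops the calling thread's entries, e.g. when its set of materialized views changes. */
    void clear() {
        cache.remove();
    }
}
```

The trade-off is that every thread pays the parse cost once and holds its own copy, in exchange for needing no locking during planning.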
On Sun, Aug 27, 2017 at 6:50 PM Christian Beikov <[email protected]> wrote:

> Hey, I have been looking a bit into how materialized views perform
> during the planning because of a very long test
> run (MaterializationTest#testJoinMaterializationUKFK6) and the current
> state is problematic.
>
> CalcitePrepareImpl#getMaterializations always reparses the SQL and down
> the line, there is a lot of expensive work (e.g. predicate and lineage
> determination) done during planning that could easily be pre-calculated
> and cached during materialization creation.
>
> There is also a bit of a thread safety problem with the current
> implementation. Unless there is a different safety mechanism that I
> don't see, the sharing of the MaterializationService and thus also the
> maps in MaterializationActor via a static instance between multiple
> threads is problematic.
>
> Since I mentioned thread safety, how is Calcite supposed to be used in a
> multi-threaded environment? Currently I use a connection pool that
> initializes the schema on new connections, but that is not really nice.
> I suppose caches are also bound to the connection? A thread safe context
> that can be shared between connections would be nice to avoid all that
> repetitive work.
>
> Are these known issues which you have thought about how to fix or should
> I log JIRAs for these and fix them to the best of my knowledge? I'd more
> or less keep the service shared but would implement it using a copy on
> write strategy since I'd expect seldom schema changes after startup.
>
> Regarding the repetitive work that partly happens during planning, I'd
> suggest doing that during materialization registration instead, as is
> already mentioned in CalcitePrepareImpl#populateMaterializations. Would
> that be ok?
>
> --
>
> Kind regards,
> ------------------------------------------------------------------------
> *Christian Beikov*
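
For reference, the copy-on-write strategy suggested in the quoted message could look roughly like the sketch below. It does not reflect how MaterializationService or MaterializationActor are implemented; CopyOnWriteRegistry and its methods are hypothetical names, and the sketch assumes writes (schema changes) are rare compared with reads.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch of a shared registry that copies its map on every write. */
final class CopyOnWriteRegistry<K, V> {
    // Readers only ever see fully built, unmodifiable maps.
    private final AtomicReference<Map<K, V>> current =
        new AtomicReference<>(Collections.emptyMap());

    /** Lock-free read: returns the current immutable snapshot. */
    Map<K, V> snapshot() {
        return current.get();
    }

    /** Copies the map, adds the entry, and swaps the reference atomically. */
    void register(K key, V value) {
        current.updateAndGet(old -> {
            Map<K, V> copy = new HashMap<>(old);
            copy.put(key, value);
            return Collections.unmodifiableMap(copy);
        });
    }
}
```

A planner thread would call snapshot() once per query and work against that map, so a registration that happens in the middle of planning cannot change what the planner sees.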
