Hi Sandeep, Thank you for sharing your experience. Your feedback matches with what I usually see in practice. IMO there are two main problems with Apache Calcite:
1. Documentation that focuses on a single edge case - the end-to-end execution of federated queries with Enumerable. In practice, this approach rarely could be used for anything but simple projects. Instead, major Apache Calcite users (Apache Hive, Apache Flink, etc) do what you mentioned as "unconventional use": they use separate Calcite components for query optimization, translation, DSLs, etc. And there is virtually no documentation on how to use them. So your use case is not unusual; there is just no docs for it. 2. Another big problem is stability and quality. There are many bugs in optimization rules, rel-to-sql conversion, etc. When doing a sophisticated Apache Calcite integration, you should be ready to find a creative workarounds for various bugs: method overrides, unusual class hierarchies, copy-pasting of Apache Calcite codebase, etc. Unfortunately, there is no major commercial vendor behind the community that systematically invests in documentation and quality, so it is what it is. However, if you manage to get through these difficulties, you will get an extremely powerful tool, and I personally not aware of any other project that is even close to Apache Calcite in query optimization and translation capabilities. This is why there are more and more database and data management projects that choose Apache Calcite as the backbone of their query processing. If your use case is simple, then perhaps Apache Calcite might be too heavy-weight. However, if you plan to make your system more complicated, I would recommend you to keep Apache Calcite in mind still. Regards, Vladimir. чт, 4 авг. 2022 г. в 04:42, Sandeep N <[email protected]>: > Hi all, > > I wanted to share my experience with one of the current projects that I > tried to use Calcite with and through this experience hope to shed light on > some of the challenges I ran into and share my thoughts on how they could > be addressed. > > Apologies for the long drawn explanation but I think context will help you > guys understand the challenges I hope to surface. > > For the tldr version jump to the paragraph with heading "In Summary". > > Let me first note that my use of Calcite is unconventional in the sense > that I was looking to convert queries written in a homegrown DSL to SQL > that could be run on Snowflake. There were two main challenges : (a) > Mapping the DSL to SQL and (b) Generating SQL compliant to Snowflake with > support for custom data-types like VARIANT. > > The approach I landed on based on feedback from this forum was to take the > AST from the DSL, use a tree walker to generate a relational algebra tree > which then can generate SQL using one of the Dialects (Snowflake to be > added). > > The first task involved asking questions on the dev forum and reading > through previously asked questions. Here I have to extend thanks to > Stamatis for the pointers and links he provided which helped immensely. In > the end it took those pointers + step through debugging of the code via > unit-tests to figure how to use the relational algebra tree (RelNode and > RexNode). > > The second task was to add Snowflake specific constructs and here again > thanks to Stamatis and Julian I had pointers to start looking at the > pieces. But before I went down the path of extending relational algebra or > figuring adding snowflake functions I shared my prototype work with my team > and talked through the approach and the Calcite integration. My goal was to > get a general sense of whether my team thought this was a good path forward > (some of the reason for this will be obvious in the In Summary section). > > A few of the team members looked at extending my prototype with Calcite and > after a couple of weeks of working with it we as a team arrived at the > decision that we will hold off on using Calcite and instead build > something simpler and homegrown. "Simpler" because the types of queries we > needed to translate to SQL were very limited and did not exercise all the > possible types of SQL queries. We may revisit this in the future and come > back to Calcite. > > In Summary : The feedback I have is as follows > (a) Calcite does not have enough examples, docs and how-to guides so it is > difficult to get started. For this one I think our usage of Calcite being > unconventional added to the complexity as the samples might not directly > map to what we were looking for - one needs to extrapolate from what is > done from SQL to applying the same for another DSL. > > (b) Learning curve for Calcite is steep. - It takes a lot of time and > patience to debug through code and understand Calcite before using it. It > takes a certain mind-set of "don't give up" and "must figure this out" to > make headway. My experience with some of the technologies (having written > micro-containers and classloaders) have left me with a higher threshold of > patience - but that is not for everyone. > > I will say that the dev forums are prompt to respond to queries but at > times it might not be adequate for newbies - One suggestion I can think of > is to pair a newbie with a seasoned Calcite contributor as a mentor for > tasks which need such a pairing e.g. adding a new Database Dialect. It is a > fairly big commitment/investment on the part of the mentor but it might > make inroads into the project easier. Mentors could assign tasks of > improving docs/samples/how-to guides to the newbies as they work with them. > It would address the challenges for future adopters and also build > confidence in the newbies to contribute. > > I will end on a thank you note, just the process of figuring out the > relational algebra tree and translation taught me a lot. Thanks to you guys > for all the contributions. Like I said earlier I am still hopeful we will > come back to Calcite and thus plan to stay plugged in. > > Cheers, > > --Sandeep Nayak >
