Hi all, I wanted to share my experience with one of the current projects that I tried to use Calcite with and through this experience hope to shed light on some of the challenges I ran into and share my thoughts on how they could be addressed.
Apologies for the long drawn explanation but I think context will help you guys understand the challenges I hope to surface. For the tldr version jump to the paragraph with heading "In Summary". Let me first note that my use of Calcite is unconventional in the sense that I was looking to convert queries written in a homegrown DSL to SQL that could be run on Snowflake. There were two main challenges : (a) Mapping the DSL to SQL and (b) Generating SQL compliant to Snowflake with support for custom data-types like VARIANT. The approach I landed on based on feedback from this forum was to take the AST from the DSL, use a tree walker to generate a relational algebra tree which then can generate SQL using one of the Dialects (Snowflake to be added). The first task involved asking questions on the dev forum and reading through previously asked questions. Here I have to extend thanks to Stamatis for the pointers and links he provided which helped immensely. In the end it took those pointers + step through debugging of the code via unit-tests to figure how to use the relational algebra tree (RelNode and RexNode). The second task was to add Snowflake specific constructs and here again thanks to Stamatis and Julian I had pointers to start looking at the pieces. But before I went down the path of extending relational algebra or figuring adding snowflake functions I shared my prototype work with my team and talked through the approach and the Calcite integration. My goal was to get a general sense of whether my team thought this was a good path forward (some of the reason for this will be obvious in the In Summary section). A few of the team members looked at extending my prototype with Calcite and after a couple of weeks of working with it we as a team arrived at the decision that we will hold off on using Calcite and instead build something simpler and homegrown. "Simpler" because the types of queries we needed to translate to SQL were very limited and did not exercise all the possible types of SQL queries. We may revisit this in the future and come back to Calcite. In Summary : The feedback I have is as follows (a) Calcite does not have enough examples, docs and how-to guides so it is difficult to get started. For this one I think our usage of Calcite being unconventional added to the complexity as the samples might not directly map to what we were looking for - one needs to extrapolate from what is done from SQL to applying the same for another DSL. (b) Learning curve for Calcite is steep. - It takes a lot of time and patience to debug through code and understand Calcite before using it. It takes a certain mind-set of "don't give up" and "must figure this out" to make headway. My experience with some of the technologies (having written micro-containers and classloaders) have left me with a higher threshold of patience - but that is not for everyone. I will say that the dev forums are prompt to respond to queries but at times it might not be adequate for newbies - One suggestion I can think of is to pair a newbie with a seasoned Calcite contributor as a mentor for tasks which need such a pairing e.g. adding a new Database Dialect. It is a fairly big commitment/investment on the part of the mentor but it might make inroads into the project easier. Mentors could assign tasks of improving docs/samples/how-to guides to the newbies as they work with them. It would address the challenges for future adopters and also build confidence in the newbies to contribute. I will end on a thank you note, just the process of figuring out the relational algebra tree and translation taught me a lot. Thanks to you guys for all the contributions. Like I said earlier I am still hopeful we will come back to Calcite and thus plan to stay plugged in. Cheers, --Sandeep Nayak
