Hi all,

I wanted to share my experience with one of the current projects that I
tried to use Calcite with and through this experience hope to shed light on
some of the challenges I ran into and share my thoughts on how they could
be addressed.

Apologies for the long drawn explanation but I think context will help you
guys understand the challenges I hope to surface.

For the tldr version jump to the paragraph with heading "In Summary".

Let me first note that my use of Calcite  is unconventional in the sense
that I was looking to convert queries written in a homegrown DSL to SQL
that could be run on Snowflake. There were two main challenges : (a)
Mapping the DSL to SQL and (b) Generating SQL compliant to Snowflake with
support for custom data-types like VARIANT.

The approach I landed on based on feedback from this forum was to take the
AST from the DSL, use a tree walker to generate a relational algebra tree
which then can generate SQL using one of the Dialects (Snowflake to be
added).

The first task involved asking questions on the dev forum and reading
through previously asked questions. Here I have to extend thanks to
Stamatis for the pointers and links he provided which helped immensely. In
the end it took those pointers + step through debugging of the code via
unit-tests to figure how to use the relational algebra tree (RelNode and
RexNode).

The second task was to add Snowflake specific constructs and here again
thanks to  Stamatis and Julian I had pointers to start looking at the
pieces. But before I went down the path of extending relational algebra or
figuring adding snowflake functions I shared my prototype work with my team
and talked through the approach and the Calcite integration. My goal was to
get a general sense of whether my team thought this was a good path forward
(some of the reason for this will be obvious in the In Summary section).

A few of the team members looked at extending my prototype with Calcite and
after a couple of weeks of working with it we as a team arrived at the
decision that we will hold off on using  Calcite and instead build
something simpler and homegrown. "Simpler" because the types of queries we
needed to translate to SQL were very limited and did not exercise all the
possible types of SQL queries. We may revisit this in the future and come
back to Calcite.

In Summary : The feedback I have is as follows
(a) Calcite does not have enough examples, docs and how-to guides so it is
difficult to get started. For this one I think our usage of Calcite being
unconventional added to the complexity as the samples might not directly
map to what we were looking for - one needs to extrapolate from what is
done from SQL to applying the same for another DSL.

(b) Learning curve for Calcite is steep. - It takes a lot of time and
patience to debug through code and understand Calcite before using it. It
takes a certain mind-set of "don't give up" and "must figure this out" to
make headway. My experience with some of the technologies (having written
micro-containers and classloaders) have left me with a higher threshold of
patience - but that is not for everyone.

I will say that the dev forums are prompt to respond to queries but at
times it might not be adequate for newbies - One suggestion I can think of
is to pair a newbie with a seasoned Calcite contributor as a mentor for
tasks which need such a pairing e.g. adding a new Database Dialect. It is a
fairly big commitment/investment on the part of the mentor but it might
make inroads into the project easier. Mentors could assign tasks of
improving docs/samples/how-to guides to the newbies as they work with them.
It would address the challenges for future adopters and also build
confidence in the newbies to contribute.

I will end on a thank you note, just the process of figuring out the
relational algebra tree and translation taught me a lot. Thanks to you guys
for all the contributions. Like I said earlier I am still hopeful we will
come back to Calcite and thus plan to stay plugged in.

Cheers,

--Sandeep Nayak

Reply via email to