If I understand your situation correctly, I would typically model this as a rule in Calcite that pushes partial predicates (whatever subset the underlying library supports) directly into the table scan operator. At that point, the input references point directly at the original table schema, so the column names are known. Then convert that scan-with-predicate operator into something that can be consumed by the lower levels.
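A minimal Python sketch of the partial pushdown step, assuming the filter already sits directly on the scan. The `InputRef`/`Literal`/`Call` classes are hypothetical stand-ins for Calcite's `RexInputRef`, `RexLiteral` and `RexCall`. `SUPPORTED_OPS` is an assumed list of what the underlying reader can evaluate, and `split_for_scan` is a made-up name. In Calcite itself this logic would live inside a rule that matches a `Filter` on a `TableScan`. Because the filter's input is the scan, `$n` indexes straight into the table's own column list.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple, Union


# Hypothetical stand-ins for Calcite's RexInputRef, RexLiteral and RexCall.
@dataclass(frozen=True)
class InputRef:
    index: int  # ordinal into the input row type, printed as $index


@dataclass(frozen=True)
class Literal:
    value: Any


@dataclass(frozen=True)
class Call:
    op: str
    operands: Tuple["Expr", ...]


Expr = Union[InputRef, Literal, Call]

# Assumption: the comparisons the underlying reader can evaluate.
SUPPORTED_OPS = {"=", "!=", "<", "<=", ">", ">="}


def conjuncts(expr: Expr) -> List[Expr]:
    """Flatten nested ANDs into a flat list of conjuncts."""
    if isinstance(expr, Call) and expr.op == "AND":
        out: List[Expr] = []
        for operand in expr.operands:
            out.extend(conjuncts(operand))
        return out
    return [expr]


def split_for_scan(
    condition: Expr, scan_fields: List[str]
) -> Tuple[List[Tuple[str, str, Any]], List[Expr]]:
    """Split a filter sitting directly on a scan into a pushable part
    ($n resolved to column names) and a residual part that has to stay
    in a Filter above the scan."""
    pushed: List[Tuple[str, str, Any]] = []
    residual: List[Expr] = []
    for c in conjuncts(condition):
        if (
            isinstance(c, Call)
            and c.op in SUPPORTED_OPS
            and len(c.operands) == 2
            and isinstance(c.operands[0], InputRef)
            and isinstance(c.operands[1], Literal)
        ):
            # Directly on the scan, $n is column n of the table itself.
            name = scan_fields[c.operands[0].index]
            pushed.append((name, c.op, c.operands[1].value))
        else:
            residual.append(c)
    return pushed, residual


# Usage with a hypothetical employees schema.
fields = ["id", "fname", "lname"]
cond = Call("AND", (
    Call("=", (InputRef(1), Literal("adam"))),
    Call("LIKE", (InputRef(2), Literal("S%"))),
))
pushed, residual = split_for_scan(cond, fields)
# pushed   == [("fname", "=", "adam")]
# residual == [Call("LIKE", (InputRef(2), Literal("S%")))]
```

Only top-level AND conjuncts are split off. Pushing one branch of an OR on its own would drop rows that only the other branch matches, so a non-conjunctive condition stays in the residual filter as a whole.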
On Tue, Dec 21, 2021 at 10:25 AM Jeremy Dyer wrote:
> Hi Vladimir,
>
> I'm certain my design has room for improvement and would love any suggestions. Here is the use case.
>
> I'm working on Dask-SQL [1]. We wrap Calcite with a Python layer and use Calcite to parse, validate, and generate relational algebra. We in turn convert the generated relational algebra into Dask Python (and therefore Dataframe) API calls. Leaving out a lot of detail, in a nutshell this is the order of what happens:
>
> 1.) Parse the SQL Python str into a SqlNode
> 2.) Generate a RelNode from the SqlNode
> 3.) Convert each RexNode into a Python Pandas/cuDF Dataframe: this is the step where I want to get the original SQL identifier
>
> For step 3, there are some large performance gains that can be achieved by using "predicate pushdown" in the IO readers, for example only reading certain columns from a Parquet or ORC file. The format needed to achieve this is DNF, and it requires the original column names so those predicates can be passed down into the implementation libraries. The problem is that those libraries already exist as CUDA C/C++ implementations and cannot be modified.
>
> Does that make sense? If there is a more intelligent way to extract conditional predicates from the SQL query, even if it isn't at the Rex level, I would love to hear suggestions.
>
> [1] - https://github.com/dask-contrib/dask-sql
>
> On Tue, Dec 21, 2021 at 1:05 PM Vladimir Ozerov wrote:
>
> > Hi Jeremy,
> >
> > Could you please share the use case behind this requirement? In the general case, it is not possible to link RelNode's attributes to specific identifiers. For this reason, an attempt to extract such an identifier from any "rel" except for the RelRoot might indicate a design issue.
> >
> > Regards,
> > Vladimir.
> >
> > On Tue, Dec 21, 2021 at 20:34, Jeremy Dyer wrote:
> >
> > > Hello,
> > >
> > > Is it possible to get the original SQL identifier from an instance of RexInputRef? For example, given a simple query like
> > >
> > > SELECT id FROM employees WHERE fname = 'adam'
> > >
> > > instead of the ordinal name generated by RexInputRef ($11, for example), I would like to find the original SQL identifier (fname, for example).
> > >
> > > Thanks,
> > > Jeremy Dyer
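For the DNF step described in the thread, here is a minimal sketch. It uses hypothetical `InputRef`/`Literal`/`Call` stand-ins for the Rex classes and a made-up `to_dnf` helper. It assumes the condition has already been pushed onto the scan, so `$n` names column `n` of the table. The output is the list-of-lists-of-`(column, op, value)` form, with the outer list as OR and the inner lists as AND. This is the shape that, for example, the `filters` argument of `pyarrow.parquet.read_table` accepts; check the exact form expected by the reader you target.

```python
from dataclasses import dataclass
from itertools import product
from typing import Any, List, Tuple, Union


# Hypothetical stand-ins for Calcite's RexInputRef, RexLiteral and RexCall.
@dataclass(frozen=True)
class InputRef:
    index: int  # $index: ordinal into the scan's row type


@dataclass(frozen=True)
class Literal:
    value: Any


@dataclass(frozen=True)
class Call:
    op: str
    operands: Tuple["Expr", ...]


Expr = Union[InputRef, Literal, Call]
Predicate = Tuple[str, str, Any]  # (column, op, value)
DNF = List[List[Predicate]]       # outer list: OR, inner lists: AND

COMPARISONS = {"=", "!=", "<", "<=", ">", ">="}


def to_dnf(expr: Expr, scan_fields: List[str]) -> DNF:
    """Convert an AND/OR tree of `$n op literal` comparisons into DNF,
    naming each column from the scan's field list."""
    if isinstance(expr, Call) and expr.op == "OR":
        # OR: concatenate the operands' disjuncts.
        return [conj for operand in expr.operands for conj in to_dnf(operand, scan_fields)]
    if isinstance(expr, Call) and expr.op == "AND":
        # AND distributes over OR: one conjunction per combination of disjuncts.
        parts = [to_dnf(operand, scan_fields) for operand in expr.operands]
        return [[p for conj in combo for p in conj] for combo in product(*parts)]
    if (
        isinstance(expr, Call)
        and expr.op in COMPARISONS
        and len(expr.operands) == 2
        and isinstance(expr.operands[0], InputRef)
        and isinstance(expr.operands[1], Literal)
    ):
        ref, lit = expr.operands
        return [[(scan_fields[ref.index], expr.op, lit.value)]]
    raise NotImplementedError(f"cannot push down: {expr!r}")


# Usage with a hypothetical employees schema.
fields = ["id", "fname", "lname", "dept"]
cond = Call("AND", (
    Call("=", (InputRef(1), Literal("adam"))),
    Call("OR", (
        Call(">", (InputRef(0), Literal(10))),
        Call("=", (InputRef(3), Literal("eng"))),
    )),
))
print(to_dnf(cond, fields))
# [[('fname', '=', 'adam'), ('id', '>', 10)], [('fname', '=', 'adam'), ('dept', '=', 'eng')]]
```

Distributing AND over OR can grow the output exponentially for deeply nested conditions, and anything outside the supported shape raises `NotImplementedError`. In both cases, the safer fallback is probably to leave that part of the condition in the Calcite filter rather than push it to the reader.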
