Also, forgive my ignorance, but is it possible to get the sort of predicate
information I mentioned earlier in this thread directly from a SqlNode? I
don't see an obvious way in the API, but that might solve my use case much
more easily if it is possible.
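
For concreteness, something like the following (untested, possibly naive)
sketch is what I am imagining - walking the parsed statement's WHERE clause
with nothing but the parser API and collecting the simple column-vs-literal
comparisons. The class name is just a placeholder, and the identifiers come
back with the parser's casing since nothing has been validated yet:

import org.apache.calcite.sql.SqlCall;
import org.apache.calcite.sql.SqlIdentifier;
import org.apache.calcite.sql.SqlKind;
import org.apache.calcite.sql.SqlLiteral;
import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.sql.SqlSelect;
import org.apache.calcite.sql.parser.SqlParser;

public class SqlPredicateSketch {
  public static void main(String[] args) throws Exception {
    SqlNode stmt = SqlParser
        .create("SELECT id FROM employees WHERE fname = 'adam'")
        .parseQuery();
    if (stmt instanceof SqlSelect) {
      collect(((SqlSelect) stmt).getWhere());
    }
  }

  // Split AND/OR recursively and print the leaf comparisons.
  static void collect(SqlNode node) {
    if (node == null) {
      return;
    }
    if (node.getKind() == SqlKind.AND || node.getKind() == SqlKind.OR) {
      ((SqlCall) node).getOperandList().forEach(SqlPredicateSketch::collect);
    } else if (node instanceof SqlCall && ((SqlCall) node).operandCount() == 2) {
      SqlCall call = (SqlCall) node;
      SqlNode left = call.operand(0);
      SqlNode right = call.operand(1);
      if (left instanceof SqlIdentifier && right instanceof SqlLiteral) {
        System.out.println(left + " " + call.getKind() + " "
            + ((SqlLiteral) right).toValue());
      }
    }
  }
}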

On Tue, Dec 21, 2021 at 4:14 PM Jeremy Dyer <[email protected]> wrote:

> Jacques, interesting ... I'm not familiar with the table scan operator. Is
> there a good unit test or some example you could point me to that uses it?
> Of course it doesn't have to be exactly the same thing, just some sort of
> example that would help me familiarize myself with that part of the code
> base and see how I could extend it.
>
> - Jeremy Dyer
>
> On Tue, Dec 21, 2021 at 3:53 PM Jacques Nadeau <[email protected]> wrote:
>
>> If I understand your situation correctly, I would typically model this as a
>> rule in Calcite that pushes partial predicates (whatever is supported in the
>> underlying library) directly into the table scan operator. At that point,
>> the references relate directly to the original table schema (and thus the
>> names are known). Then convert that scan-with-predicate operator into
>> something that can be consumed by the lower levels.
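>>
>> I believe the closest in-tree analogue is the FilterableTable contract plus
>> CoreRules.FILTER_SCAN (FilterTableScanRule): the rule pushes what it can
>> into the scan, and your table then sees predicates whose references are
>> against its own row type. A minimal, untested sketch of the shape
>> (EmployeesTable is made up purely for illustration):
>>
>> import java.util.List;
>> import org.apache.calcite.DataContext;
>> import org.apache.calcite.linq4j.Enumerable;
>> import org.apache.calcite.linq4j.Linq4j;
>> import org.apache.calcite.rel.type.RelDataType;
>> import org.apache.calcite.rel.type.RelDataTypeFactory;
>> import org.apache.calcite.rex.RexNode;
>> import org.apache.calcite.schema.FilterableTable;
>> import org.apache.calcite.schema.impl.AbstractTable;
>> import org.apache.calcite.sql.type.SqlTypeName;
>>
>> public class EmployeesTable extends AbstractTable implements FilterableTable {
>>   @Override public RelDataType getRowType(RelDataTypeFactory typeFactory) {
>>     return typeFactory.builder()
>>         .add("ID", SqlTypeName.INTEGER)
>>         .add("FNAME", SqlTypeName.VARCHAR)
>>         .build();
>>   }
>>
>>   @Override public Enumerable<Object[]> scan(DataContext root,
>>       List<RexNode> filters) {
>>     // The planner hands the pushable predicates to this method. RexInputRef
>>     // indexes inside "filters" refer to this table's own row type, so index
>>     // 1 is FNAME. Removing an entry from "filters" signals that it has been
>>     // handled here. Returning no rows keeps the sketch self-contained.
>>     return Linq4j.emptyEnumerable();
>>   }
>> }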
>>
>> On Tue, Dec 21, 2021 at 10:25 AM Jeremy Dyer <[email protected]> wrote:
>>
>> > Hi Vladimir,
>> >
>> > I'm certain my design has room for improvement and would love any
>> > suggestions. Here is the use case.
>> >
>> > I'm working on Dask-SQL [1]. We wrap Calcite with a Python layer and use
>> > Calcite to parse, validate, and generate relational algebra. From the
>> > generated relational algebra we in turn produce Dask Python (and therefore
>> > DataFrame) API calls. Leaving out a lot of detail, in a nutshell this is
>> > the order of what happens:
>> >
>> > 1.) Parse the SQL string (a Python str) into a SqlNode
>> > 2.) Generate a RelNode from the SqlNode
>> > 3.) Convert each RexNode into Python Pandas/cuDF DataFrame API calls -
>> > this is the step where I want to get at the original SQL identifiers
>> > (see the sketch below)
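>> >
>> > (Untested sketch of what I do today for step 3; the helper name is made
>> > up, and it only really holds when the RexInputRef's input is the TableScan
>> > itself, or something that preserves its field names - otherwise you get
>> > derived names like $f0.)
>> >
>> > import java.util.List;
>> > import org.apache.calcite.rel.RelNode;
>> > import org.apache.calcite.rex.RexInputRef;
>> >
>> > final class RefNames {
>> >   // Resolve a RexInputRef against the row type of the rel it refers to.
>> >   static String columnName(RexInputRef ref, RelNode input) {
>> >     List<String> names = input.getRowType().getFieldNames();
>> >     return names.get(ref.getIndex());
>> >   }
>> > }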
>> >
>> > For step 3 there are some large performance gains that can be achieved by
>> > using "predicate pushdown" in the IO readers - for example, only reading
>> > certain columns from a Parquet or ORC file. The format needed to achieve
>> > this is DNF, and it requires the original column names so that the
>> > predicates can be passed down into the implementation libraries. The
>> > problem is that those libraries already exist as CUDA C/C++
>> > implementations and cannot be modified.
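>> >
>> > What I have in mind for that hand-off is roughly the following (untested
>> > sketch; it only flattens a conjunction of simple column-vs-literal
>> > comparisons and leaves everything else in the Calcite Filter):
>> >
>> > import java.util.ArrayList;
>> > import java.util.List;
>> > import org.apache.calcite.rex.RexCall;
>> > import org.apache.calcite.rex.RexInputRef;
>> > import org.apache.calcite.rex.RexLiteral;
>> > import org.apache.calcite.rex.RexNode;
>> > import org.apache.calcite.sql.SqlKind;
>> >
>> > final class PushdownTriples {
>> >   // Flatten "col <op> literal AND ..." into (column, op, value) triples
>> >   // that the IO layer can translate into its own DNF structure.
>> >   static List<String[]> toTriples(RexNode condition, List<String> fieldNames) {
>> >     List<String[]> out = new ArrayList<>();
>> >     collect(condition, fieldNames, out);
>> >     return out;
>> >   }
>> >
>> >   private static void collect(RexNode node, List<String> fieldNames,
>> >       List<String[]> out) {
>> >     if (node.getKind() == SqlKind.AND) {
>> >       for (RexNode operand : ((RexCall) node).getOperands()) {
>> >         collect(operand, fieldNames, out);
>> >       }
>> >     } else if (node instanceof RexCall) {
>> >       RexCall call = (RexCall) node;
>> >       List<RexNode> ops = call.getOperands();
>> >       if (ops.size() == 2 && ops.get(0) instanceof RexInputRef
>> >           && ops.get(1) instanceof RexLiteral) {
>> >         String column = fieldNames.get(((RexInputRef) ops.get(0)).getIndex());
>> >         Object value = ((RexLiteral) ops.get(1)).getValue2();
>> >         out.add(new String[] {column, call.getKind().toString(),
>> >             String.valueOf(value)});
>> >       }
>> >     }
>> >   }
>> > }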
>> >
>> > Does that make sense? If there is a more intelligent way to extract
>> > conditional predicates from the SQL query, even if it isn't at the Rex
>> > level, I would love to hear suggestions.
>> >
>> > [1] - https://github.com/dask-contrib/dask-sql
>> >
>> > On Tue, Dec 21, 2021 at 1:05 PM Vladimir Ozerov <[email protected]>
>> > wrote:
>> >
>> > > Hi Jeremy,
>> > >
>> > > Could you please share the use case behind this requirement? In the
>> > > general case, it is not possible to link a RelNode's attributes to
>> > > specific identifiers. For this reason, an attempt to extract such an
>> > > identifier from any "rel" except the RelRoot might indicate a design
>> > > issue.
>> > >
>> > > Regards,
>> > > Vladimir.
>> > >
>> > > On Tue, Dec 21, 2021 at 8:34 PM Jeremy Dyer <[email protected]> wrote:
>> > >
>> > > > Hello,
>> > > >
>> > > > Is it possible to get the original SQL identifier from an instance of
>> > > > RexInputRef? For example, given a simple query like
>> > > >
>> > > > SELECT id FROM employees WHERE fname = 'adam'
>> > > >
>> > > > Instead of the ordinal name generated by RexInputRef ($11, for
>> > > > example), I would like to find the original SQL identifier (fname,
>> > > > for example).
>> > > >
>> > > > Thanks,
>> > > > Jeremy Dyer
>> > > >
>> > >
>> >
>>
>
