Also, forgive my ignorance, but is it possible to get the sort of predicate information I mentioned earlier in this thread from a SqlNode? I don't see an obvious way in the API, but if it is possible it would solve my use case much more easily.
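In case it helps clarify what I'm after, here is a minimal, untested sketch of what I'd hoped to do at the SqlNode level: parse the statement, pull the WHERE clause off the SqlSelect, and collect the identifiers it references. This only covers the trivial single-table case, and the names at this point are raw parser output (unvalidated, aliases unresolved, and subject to the parser's casing config). I've also appended, below the quoted thread, a sketch of the table-scan approach Jacques mentioned.

import org.apache.calcite.sql.SqlIdentifier;
import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.sql.SqlSelect;
import org.apache.calcite.sql.parser.SqlParser;
import org.apache.calcite.sql.util.SqlBasicVisitor;

import java.util.ArrayList;
import java.util.List;

public class WherePredicateSketch {
  public static void main(String[] args) throws Exception {
    // Untested sketch; assumes the statement parses to a plain SqlSelect
    // (no ORDER BY wrapper, no joins or subqueries).
    SqlNode parsed = SqlParser
        .create("SELECT id FROM employees WHERE fname = 'adam'")
        .parseQuery();

    if (parsed instanceof SqlSelect) {
      SqlNode where = ((SqlSelect) parsed).getWhere();   // raw WHERE predicate
      List<String> columns = new ArrayList<>();
      where.accept(new SqlBasicVisitor<Void>() {
        @Override public Void visit(SqlIdentifier id) {
          // Identifier exactly as parsed, e.g. "FNAME" with the default
          // upper-casing config; not resolved against any table schema.
          columns.add(id.toString());
          return null;
        }
      });
      System.out.println(where + " references " + columns);
    }
  }
}

If this is the wrong layer to be doing this at, that is exactly the kind of feedback I'm looking for.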
On Tue, Dec 21, 2021 at 4:14 PM Jeremy Dyer <[email protected]> wrote:

> Jacques, interesting ... I'm not familiar with the table scan operator.
> Is there a good unit test or some example you could point me to that uses
> it? It doesn't have to be exactly the same thing, just some sort of
> example to familiarize myself with that part of the code base and how I
> could extend it.
>
> - Jeremy Dyer
>
> On Tue, Dec 21, 2021 at 3:53 PM Jacques Nadeau <[email protected]> wrote:
>
>> If I understand your situation correctly, I would typically model this
>> as a rule in Calcite that pushes partial predicates (whatever is
>> supported in the underlying library) directly into the table scan
>> operator. At that point, the references relate directly to the original
>> table schema (and thus the names are known). Then convert that
>> scan-with-predicate operator into something that can be consumed by the
>> lower levels.
>>
>> On Tue, Dec 21, 2021 at 10:25 AM Jeremy Dyer <[email protected]> wrote:
>>
>> > Hi Vladimir,
>> >
>> > I'm certain my design has room for improvement and would love any
>> > suggestions. Here is the use case.
>> >
>> > I'm working on Dask-SQL [1]. We wrap Calcite with a Python layer and
>> > use Calcite to parse, validate, and generate relational algebra. From
>> > the relational algebra we in turn generate Dask Python (and therefore
>> > Dataframe) API calls. Leaving out a lot of detail, in a nutshell this
>> > is the order of what happens:
>> >
>> > 1.) Parse the SQL (a Python str) into a SqlNode
>> > 2.) Generate a RelNode from the SqlNode
>> > 3.) Convert each RexNode into a Python Pandas/cuDF Dataframe - this is
>> > the step where I want to get the original SQL identifier
>> >
>> > For step 3 there are large performance gains to be had from "predicate
>> > pushdown" in the IO readers, for example reading only certain columns
>> > from a Parquet or ORC file. The format needed to achieve this is DNF
>> > and requires the original column names so the predicates can be passed
>> > down into the implementation libraries. The problem is that those
>> > libraries already exist as CUDA C/C++ implementations and cannot be
>> > modified.
>> >
>> > Does that make sense? If there is a more intelligent way to extract
>> > conditional predicates from the SQL query, even if it isn't at the Rex
>> > level, I would love to hear suggestions.
>> >
>> > [1] - https://github.com/dask-contrib/dask-sql
>> >
>> > On Tue, Dec 21, 2021 at 1:05 PM Vladimir Ozerov <[email protected]>
>> > wrote:
>> >
>> > > Hi Jeremy,
>> > >
>> > > Could you please share the use case behind this requirement? In the
>> > > general case, it is not possible to link a RelNode's attributes to
>> > > specific identifiers. For this reason, an attempt to extract such an
>> > > identifier from any "rel" except the RelRoot might indicate a design
>> > > issue.
>> > >
>> > > Regards,
>> > > Vladimir.
>> > >
>> > > On Tue, Dec 21, 2021 at 20:34 Jeremy Dyer <[email protected]> wrote:
>> > >
>> > > > Hello,
>> > > >
>> > > > Is it possible to get the original SQL identifier from an instance
>> > > > of RexInputRef? For example, given a simple query like
>> > > >
>> > > > SELECT id FROM employees WHERE fname = 'adam'
>> > > >
>> > > > instead of the ordinal name generated by RexInputRef ($11, for
>> > > > example), I would like to find the original SQL identifier (fname,
>> > > > for example).
>> > > >
>> > > > Thanks,
>> > > > Jeremy Dyer
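P.S. So I can make sure I understand the table-scan approach before digging in: below is a rough, untested sketch of what I *think* Jacques is describing, using Calcite's FilterableTable so that the built-in filter-into-scan rule (FilterTableScanRule, I believe) hands the pushable predicates to the scan. At that point the RexInputRef ordinals index directly into this table's row type, so mapping $n back to a column name is just rowType.getFieldNames().get(n). ParquetBackedTable and readParquetWithFilters are placeholders I made up, not Dask-SQL code - please correct me if I've misread the suggestion.

import org.apache.calcite.DataContext;
import org.apache.calcite.linq4j.Enumerable;
import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.rel.type.RelDataTypeFactory;
import org.apache.calcite.rex.RexNode;
import org.apache.calcite.schema.FilterableTable;
import org.apache.calcite.schema.impl.AbstractTable;
import org.apache.calcite.sql.type.SqlTypeName;

import java.util.List;

// Rough, untested sketch; class and reader names are made up for illustration.
public class ParquetBackedTable extends AbstractTable implements FilterableTable {

  @Override public RelDataType getRowType(RelDataTypeFactory typeFactory) {
    return typeFactory.builder()
        .add("id", SqlTypeName.INTEGER)
        .add("fname", SqlTypeName.VARCHAR)
        .build();
  }

  @Override public Enumerable<Object[]> scan(DataContext root, List<RexNode> filters) {
    // Each RexNode here refers to this table's columns by ordinal. The idea
    // (as I understand the contract) is to translate the filters the reader
    // understands into its own format - e.g. the DNF form the cuDF readers
    // want - and remove them from `filters` so Calcite knows they are
    // handled; whatever remains in the list is evaluated by Calcite above
    // the scan.
    // return readParquetWithFilters(translate(filters));   // hypothetical
    throw new UnsupportedOperationException("sketch only");
  }
}

If that contract is right, the DNF translation for the cuDF readers would live entirely inside that translate step, and the column names would already be known from the table's own row type.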
