Re: Implementing HASH(*)

Julian Hyde Tue, 22 Aug 2023 13:45:13 -0700

Fwiw, the “*” in the syntax of "COUNT(*)” was a mistake.

By analogy with “SELECT *” and with how Unix treats commands like “ls -l *”, 
people expect “*” to be expanded to all columns. But suppose I have T (x, y) 
with values (0, null), (1, 0), (null, 0). Do I expect “select count(*) from t” 
to return the same results as “select count(x, y) from t”?


No. The first query returns 3, and the second query returns 1 (because there is 
only one row where neither x and y is null). You can try it:

  $ ./sqlline -u 
'jdbc:calcite:model=core/src/test/resources/hsqldb-model.json;fun=calcite' -n 
'sa' -p 'sa'
  > with t (x, y) as (values (0,null),(1, 0),(null,1)) select count(x) from t;
  2
  1 row selected
  > with t (x, y) as (values (0,null),(1, 0),(null,1)) select count(x,y) from t;
  1
  1 row selected

When someone writes “COUNT(*)” Calcite converts it to “COUNT()” internally.

Julian


> On Aug 16, 2023, at 6:45 AM, Kian Nassre <[email protected]> wrote:
> 
> Also I had a few more follow up questions, if you have the availability to
> respond:
> - Is it `RelToSqlConverter` or `SqlToRelConverter` where the changes need
> to be made?
> - I already have the `HASH` function working in its non-star format (e.g.
> `HASH(A, B, C)` or `HASH(X)`). Would using `FUNCTION_STAR` interfere with
> that?
> Thank you so much for your time!
> 
> On Wed, Aug 16, 2023 at 9:35 AM Kian Nassre <[email protected]> wrote:
> 
>> Hello Hongyu Guo,
>> 
>> Interesting! I’ll definitely explore this as an option. I’m curious
>> though, do the convertlet tables in RelToSqlConverter have access to enough
>> information to be able to look at a call to HASH(*) and derive the table
>> names (+ their column names) corresponding the scope? My understanding is
>> that the context objects available have access to the validator, but then
>> we’re right back where we started with the problem of how to access the
>> table/column names within the validator.
>> 
>> On 2023/08/16 13:27:06 Hongyu Guo wrote:
>>> Hi Kian Nassre,
>>> I try to create a HASH function instance to solve this problem
>>> ```
>>> public static final SqlFunction HASH =
>>>    new SqlFunction("HASH",
>>>    SqlKind.OTHER_FUNCTION,
>>>    ReturnTypes.BIGINT,
>>>    null,
>>>    OperandTypes.ONE_OR_MORE,
>>>    SqlFunctionCategory.NUMERIC) {
>>>  @Override public SqlSyntax getSyntax() {
>>>    return SqlSyntax.FUNCTION_STAR;
>>>  }
>>> 
>>>  @Override public RelDataType deriveType(
>>>      SqlValidator validator, SqlValidatorScope scope, SqlCall call) {
>>>    return validator.getTypeFactory().createSqlType(
>>>        SqlTypeName.BIGINT);
>>>  }
>>> };
>>> ```
>>> It can pass validation, but it will throw an exception in the
>>> `RelToSqlConverter`
>>> stage. I notice `count(*)` is a special case similar to `hash(*)`, so I
>>> have looked into the
>>> code for COUNT function and AggConverter, where a trick is used to remove
>>> the `*` identifier in `count(*)`
>>> (org/apache/calcite/sql2rel/AggConverter.java:447),
>>> we cannot use similar implementations in `hash(*)`.
>>> 
>>> So i think one possible solution could be expanding `*` in `hash(*)`
>> during
>>> the
>>> `SqlToRelConverter` stage (a litter complex, you should derive the type
>> of
>>> `*`
>>> or `tbl1.*`), or check why `hash(*)` throws an exception in the
>>> `RelToSqlConverter`
>>> stage.
>>> 
>>> I hope this helps :)
>>> 
>>> Best
>>> Hongyu Guo
>>> 
>>> On 2023/08/15 04:31:41 Kian Nassre wrote:
>>>> Hello,
>>>> 
>>>> I’m working on a project that utilizes Calcite and I have interest in
>>>> supporting functions that allow using * to specify all of the existing
>>>> columns in a function call. For example, one function that I am
>> looking at
>>>> is the variadic Snowflake HASH function which can accept all columns
>> with
>>>> HASH(*). Trying to add this function as I would any other function I
>>>> encounter issues in the validator because it does not recognize * as a
>>>> valid identifier.
>>>> 
>>>> My understanding is that the *  is essentially syntactic sugar which
>>> should
>>>> be expanded in the same way that SELECT * FROM t is expanded early by
>> the
>>>> validator.
>>>> 
>>>> I’m wondering if anyone has any advice on the best way to add support
>> for
>>>> this functionality. I have done some experimentation with replicating
>> the
>>>> approach that the validator uses for expanding SELECT * FROM T into
>>>> multiple columns. I do this using a variant of the private method
>>>> expandStar.
>>>> 
>>>> The first problem comes when I try to write my variant of expandStar
>>>> because it seems I need to access the package private method
>>> getChildNames,
>>>> or to the .names attribute of the package private class ScopeChild, so
>> I’m
>>>> not understanding if that’s what I should be doing or if there’s an
>>>> alternate method. I’m using it because I need some way to access the
>> table
>>>> names from a scope.
>>>> 
>>>> The other problem is that to get my version of expandStar called in all
>>> the
>>>> places I need it to be called (e.g. inside of function calls, or
>> inside of
>>>> ORDER BY terms) I think I need to add logic that calls it to the
>> Expander
>>>> classes or its subclasses. My initial idea was to subclass all of the
>>>> expander classes and have my subclasses call my expandStar method, then
>>>> have all methods that use the expanders instead use my subclasses. This
>>>> would require changing the classes from private/package-private to
>>>> protected, which makes me wonder if I'm barking up the wrong tree.
>>>> 
>>>> The approach seems promising, but I’m not sure if I’m headed in the
>> right
>>>> direction?
>>>> 
>>>> Sincerely,
>>>> - Kian Nassre
>>>> 
>>>

Re: Implementing HASH(*)

Reply via email to