Fwiw, the “*” in the syntax of "COUNT(*)” was a mistake. By analogy with “SELECT *” and with how Unix treats commands like “ls -l *”, people expect “*” to be expanded to all columns. But suppose I have T (x, y) with values (0, null), (1, 0), (null, 0). Do I expect “select count(*) from t” to return the same results as “select count(x, y) from t”?
No. The first query returns 3, and the second query returns 1 (because there is only one row where neither x and y is null). You can try it: $ ./sqlline -u 'jdbc:calcite:model=core/src/test/resources/hsqldb-model.json;fun=calcite' -n 'sa' -p 'sa' > with t (x, y) as (values (0,null),(1, 0),(null,1)) select count(x) from t; 2 1 row selected > with t (x, y) as (values (0,null),(1, 0),(null,1)) select count(x,y) from t; 1 1 row selected When someone writes “COUNT(*)” Calcite converts it to “COUNT()” internally. Julian > On Aug 16, 2023, at 6:45 AM, Kian Nassre <[email protected]> wrote: > > Also I had a few more follow up questions, if you have the availability to > respond: > - Is it `RelToSqlConverter` or `SqlToRelConverter` where the changes need > to be made? > - I already have the `HASH` function working in its non-star format (e.g. > `HASH(A, B, C)` or `HASH(X)`). Would using `FUNCTION_STAR` interfere with > that? > Thank you so much for your time! > > On Wed, Aug 16, 2023 at 9:35 AM Kian Nassre <[email protected]> wrote: > >> Hello Hongyu Guo, >> >> Interesting! I’ll definitely explore this as an option. I’m curious >> though, do the convertlet tables in RelToSqlConverter have access to enough >> information to be able to look at a call to HASH(*) and derive the table >> names (+ their column names) corresponding the scope? My understanding is >> that the context objects available have access to the validator, but then >> we’re right back where we started with the problem of how to access the >> table/column names within the validator. >> >> On 2023/08/16 13:27:06 Hongyu Guo wrote: >>> Hi Kian Nassre, >>> I try to create a HASH function instance to solve this problem >>> ``` >>> public static final SqlFunction HASH = >>> new SqlFunction("HASH", >>> SqlKind.OTHER_FUNCTION, >>> ReturnTypes.BIGINT, >>> null, >>> OperandTypes.ONE_OR_MORE, >>> SqlFunctionCategory.NUMERIC) { >>> @Override public SqlSyntax getSyntax() { >>> return SqlSyntax.FUNCTION_STAR; >>> } >>> >>> @Override public RelDataType deriveType( >>> SqlValidator validator, SqlValidatorScope scope, SqlCall call) { >>> return validator.getTypeFactory().createSqlType( >>> SqlTypeName.BIGINT); >>> } >>> }; >>> ``` >>> It can pass validation, but it will throw an exception in the >>> `RelToSqlConverter` >>> stage. I notice `count(*)` is a special case similar to `hash(*)`, so I >>> have looked into the >>> code for COUNT function and AggConverter, where a trick is used to remove >>> the `*` identifier in `count(*)` >>> (org/apache/calcite/sql2rel/AggConverter.java:447), >>> we cannot use similar implementations in `hash(*)`. >>> >>> So i think one possible solution could be expanding `*` in `hash(*)` >> during >>> the >>> `SqlToRelConverter` stage (a litter complex, you should derive the type >> of >>> `*` >>> or `tbl1.*`), or check why `hash(*)` throws an exception in the >>> `RelToSqlConverter` >>> stage. >>> >>> I hope this helps :) >>> >>> Best >>> Hongyu Guo >>> >>> On 2023/08/15 04:31:41 Kian Nassre wrote: >>>> Hello, >>>> >>>> I’m working on a project that utilizes Calcite and I have interest in >>>> supporting functions that allow using * to specify all of the existing >>>> columns in a function call. For example, one function that I am >> looking at >>>> is the variadic Snowflake HASH function which can accept all columns >> with >>>> HASH(*). Trying to add this function as I would any other function I >>>> encounter issues in the validator because it does not recognize * as a >>>> valid identifier. >>>> >>>> My understanding is that the * is essentially syntactic sugar which >>> should >>>> be expanded in the same way that SELECT * FROM t is expanded early by >> the >>>> validator. >>>> >>>> I’m wondering if anyone has any advice on the best way to add support >> for >>>> this functionality. I have done some experimentation with replicating >> the >>>> approach that the validator uses for expanding SELECT * FROM T into >>>> multiple columns. I do this using a variant of the private method >>>> expandStar. >>>> >>>> The first problem comes when I try to write my variant of expandStar >>>> because it seems I need to access the package private method >>> getChildNames, >>>> or to the .names attribute of the package private class ScopeChild, so >> I’m >>>> not understanding if that’s what I should be doing or if there’s an >>>> alternate method. I’m using it because I need some way to access the >> table >>>> names from a scope. >>>> >>>> The other problem is that to get my version of expandStar called in all >>> the >>>> places I need it to be called (e.g. inside of function calls, or >> inside of >>>> ORDER BY terms) I think I need to add logic that calls it to the >> Expander >>>> classes or its subclasses. My initial idea was to subclass all of the >>>> expander classes and have my subclasses call my expandStar method, then >>>> have all methods that use the expanders instead use my subclasses. This >>>> would require changing the classes from private/package-private to >>>> protected, which makes me wonder if I'm barking up the wrong tree. >>>> >>>> The approach seems promising, but I’m not sure if I’m headed in the >> right >>>> direction? >>>> >>>> Sincerely, >>>> - Kian Nassre >>>> >>>
