Ahh, numeric with unspecified precision has the special meaning that it will
retain whatever precision is stored in the column. From
https://www.postgresql.org/docs/9.1/static/datatype-numeric.html#DATATYPE-NUMERIC-DECIMAL
:
"Specifying: NUMERIC without any precision or scale creates a column in which
numeric values of any precision and scale can be stored, up to the
implementation limit on precision. A column of this kind will not coerce input
values to any particular scale, whereas numeric columns with a declared scale
will coerce input values to that scale. (The SQL standard requires a default
scale of 0, i.e., coercion to integer precision. We find this a bit useless. If
you're concerned about portability, always specify the precision and scale
explicitly.)"
So even though the JDBC driver reports a precision of zero, it actually means
arbitrary precision when it is on a numeric/decimal column. I'm guessing that
extending SqlDialect.getCastSpec(RelDataType) in PostgresqlSqlDialect is the
right place to fix this?
-Marc
> On Oct 31, 2017, at 11:38 AM, Julian Hyde <[email protected]> wrote:
>
> I don’t recall whether DECIMAL without precision + scale is even valid. (Or,
> if Calcite treats it a “valid”, maybe Calcite is wrong, and should be giving
> an error.)
>
>> On Oct 31, 2017, at 7:07 AM, Marc Prud'hommeaux <[email protected]> wrote:
>>
>>
>> I understand, and I agree that the behavior is a sensible compromise. But in
>> this case, price is a decimal column, and so the average call should also be
>> a decimal, wheres it is being rounded to an integer (actually, a DECIMAL(19,
>> 0) as you can see in the server log).
>>
>> However, it looks like this might only be an issue with PostgreSQL:
>>
>> 0: jdbc:calcite:schemaType=JDBC> select "price" from "products" limit 1;
>> +---------------------+
>> | price |
>> +---------------------+
>> | 25.99 | <-- price is a decimal...
>> +---------------------+
>> 1 row selected (0.217 seconds)
>>
>> 0: jdbc:calcite:schemaType=JDBC> select avg("price") from "products";
>> +---------------------+
>> | EXPR$0 |
>> +---------------------+
>> | 20 | <-- ... but the average isn't
>> +---------------------+
>>
>> 1 row selected (0.063 seconds)
>>
>> 0: jdbc:calcite:schemaType=JDBC> select avg(cast("price" as decimal)) from
>> "products";
>> +---------------------+
>> | EXPR$0 |
>> +---------------------+
>> | 20 | <-- casting to a non-precision decimal doesn't
>> help...
>> +---------------------+
>> 1 row selected (0.067 seconds)
>>
>> 0: jdbc:calcite:schemaType=JDBC> select avg(cast("price" as decimal(6,4)))
>> from "products";
>> +--------+
>> | EXPR$0 |
>> +--------+
>> | 20.0151| <-- but specifying the precision does
>> +--------+
>> 1 row selected (0.066 seconds)
>>
>>
>>
>> Perhaps Calcite is missing the precision of the column when it reads the
>> metadata, or the driver is misreporting the precision? Because executing
>> directly against the PostgreSQL driver yields the correct behavior:
>>
>> 0: jdbc:postgresql://localhost/dvdstore> select avg(cast("price" as
>> decimal)) from "products";
>> +-----------------------+
>> | 20.0151000000000000 |
>> +-----------------------+
>> 1 row selected (0.022 seconds)
>>
>>
>> I’ll dig a bit further and submit a PR if I can find a fix.
>>
>> Thanks for your help!
>>
>> -Marc
>>
>>
>>
>>> On Oct 30, 2017, at 11:16 AM, Julian Hyde <[email protected]> wrote:
>>>
>>> The bug explains the current behavior. The behavior is not what everyone
>>> would like, but it is what it is. I happen to like it because it is simple.
>>> The behavior is this: if have a column c of type T, then AVG(c) will have
>>> type T. If c is an INTEGER, then AVG will return an INTEGER. If you cast
>>> that result to DOUBLE, surprise surprise, that DOUBLE has no fractional
>>> part.
>>>
>>> The solution is to convert the column before applying AVG: AVG(CAST(c AS
>>> DOUBLE)) will return DOUBLE.
>>>
>>> Julian
>>>
>>>
>>>> On Oct 30, 2017, at 5:54 AM, Marc Prud'hommeaux <[email protected]> wrote:
>>>>
>>>>
>>>> I had noticed that issue, but it purports to be closed for 1.14.0, which I
>>>> am using.
>>>>
>>>> It only seems to affect AVG; other aggregates don’t appear to be rounded.
>>>> E.g.:
>>>>
>>>> 0: jdbc:calcite:schemaType=JDBC> select min("price"), sum("price"),
>>>> avg("price") from "products";
>>>> +---------------------+---------------------+---------------------+
>>>> | EXPR$0 | EXPR$1 | EXPR$2 |
>>>> +---------------------+---------------------+---------------------+
>>>> | 9.99 | 200151.00 | 20 |
>>>> +---------------------+---------------------+---------------------+
>>>>
>>>> yields the server-side log:
>>>>
>>>> 2017-10-30 08:16:00 EDT [4272-13] dgdemo@dvdstore LOG: execute <unnamed>:
>>>> SELECT MIN("price"), CASE WHEN COUNT(*) = 0 THEN NULL ELSE SUM("price")
>>>> END, CAST(CASE WHEN COUNT(*) = 0 THEN NULL ELSE SUM("price") END /
>>>> COUNT(*) AS DECIMAL(19, 0)) FROM “products"
>>>>
>>>> Is this something I can get around by implementing my own
>>>> RelDataTypeSystem? If so, I’ll experiment with that.
>>>>
>>>> -Marc
>>>>
>>>>
>>>>
>>>>> On Oct 29, 2017, at 7:11 PM, Julian Hyde <[email protected]> wrote:
>>>>>
>>>>> See https://issues.apache.org/jira/browse/CALCITE-1945
>>>>> <https://issues.apache.org/jira/browse/CALCITE-1945>.
>>>>>
>>>>>> On Oct 29, 2017, at 3:31 PM, Marc Prud'hommeaux <[email protected]> wrote:
>>>>>>
>>>>>>
>>>>>> When I run the following against a Calcite connection containing the
>>>>>> PostgreSQL "dvdstore" sample database:
>>>>>>
>>>>>> select avg(products.price) from dvdstore.products group by
>>>>>> products.category
>>>>>>
>>>>>> The following SQL is executed on the server:
>>>>>>
>>>>>> SELECT CAST(SUM("price") / COUNT(*) AS DECIMAL(19, 0)) FROM "products"
>>>>>> GROUP BY “category"
>>>>>>
>>>>>> Is there some way I can prevent Calcite from rounding it (price is a
>>>>>> decimal type)? Is there some reason it isn’t just sending the aggregate
>>>>>> as an AVG?
>>>>>>
>>>>>> -Marc
>>>>>>
>>>>>
>>>>
>>>
>>
>