[
https://issues.apache.org/jira/browse/PHOENIX-2789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alok Singh updated PHOENIX-2789:
--------------------------------
Description:
Table:
{code}
COST (
ACCOUNT_ID VARCHAR NOT NULL,
PRECISION TINYINT NOT NULL,
START_DATE TIMESTAMP NOT NULL,
SECONDARY_ACCOUNT_ID VARCHAR NOT NULL,
TAG VARCHAR NOT NULL,
VENDOR_ID VARCHAR NOT NULL,
SERVICE VARCHAR NOT NULL,
....
CONSTRAINT PK PRIMARY KEY (
ACCOUNT_ID,
PRECISION,
START_DATE,
SECONDARY_ACCOUNT_ID,
TAG,
VENDOR_ID,
SERVICE
) COMPRESSION='SNAPPY', SALT_BUCKETS=16;
{code}
When querying this table with a GROUP BY clause that contains the 'START_DATE"
column, the results returned are incorrectly aggregated. I find multiple rows
with the same "START_DATE" which should have been aggregated to a single row.
The workaround is to do a TO_CHAR(START_DATE) which causes the query to return
the correct results.
e.g:
{code}
----------------
select start_date, account_id,
sum(quantity) as total,
sum(discounted_quantity) as disc_total
from COST
where
start_date >= (to_date('2016-03-01'))
and precision = 1
and account_id in ('1234', '5678')
group start_date, account_id
--RESULT--
START_DATE ACCOUNT_ID
TOTAL DISC_TOTAL
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2016-03-01 16:00:00.0 1234
312 0
....
2016-03-07 16:00:00.0 1234
312 0
2016-03-08 16:00:00.0 1234
312 0
2016-03-09 16:00:00.0 1234
216 0
2016-02-29 16:00:00.0 5678
194 24
2016-03-01 16:00:00.0 5678
262 0
....
2016-03-07 16:00:00.0 5678
237 48
2016-03-08 16:00:00.0 5678
178 0
2016-03-09 16:00:00.0 1234
96 0
2016-03-09 16:00:00.0 5678
173 50
2016-03-10 16:00:00.0 1234
324 0
....
2016-03-07 16:00:00.0 5678
178 24
2016-03-08 16:00:00.0 5678
218 0
2016-03-09 16:00:00.0 5678
218 48
^^^^^^^^^^^^^^^^^^^^^^
Repeated rows
--------------------
{code}
Workaround:
{code}
select to_char(start_date), account_id,
sum(quantity) as total,
sum(discounted_quantity) as disc_total
from COST
where
start_date >= (to_date('2016-03-01'))
and precision = 1
and account_id in ('1234', '5678')
group to_char(start_date), account_id
--RESULT--
START_DATE ACCOUNT_ID
TOTAL DISC_TOTAL
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2016-03-01 00:00:00.000 1234
312 0
2016-03-01 00:00:00.000 5678
3,465 384
2016-03-02 00:00:00.000 1234
312 0
2016-03-02 00:00:00.000 5678
3,238 384
2016-03-03 00:00:00.000 1234
312 0
2016-03-03 00:00:00.000 5678
3,243 384
2016-03-04 00:00:00.000 1234
312 0
2016-03-04 00:00:00.000 5678
3,212 384
2016-03-05 00:00:00.000 1234
312 0
2016-03-05 00:00:00.000 5678
2,907 384
2016-03-06 00:00:00.000 1234
312 0
2016-03-06 00:00:00.000 5678
2,907 384
2016-03-07 00:00:00.000 1234
312 0
2016-03-07 00:00:00.000 5678
2,919 384
2016-03-08 00:00:00.000 1234
312 0
2016-03-08 00:00:00.000 5678
2,955 384
2016-03-09 00:00:00.000 1234
312 0
2016-03-09 00:00:00.000 5678
2,955 384
2016-03-10 00:00:00.000 1234
312 0
2016-03-10 00:00:00.000 5678
2,978 384
2016-03-11 00:00:00.000 1234
324 0
2016-03-11 00:00:00.000 5678
3,125 384
2016-03-12 00:00:00.000 1234
336 0
2016-03-12 00:00:00.000 5678
3,102 384
2016-03-13 00:00:00.000 1234
336 0
2016-03-13 00:00:00.000 5678
3,117 384
2016-03-14 00:00:00.000 1234
153 0
2016-03-14 00:00:00.000 5678
2,187 278
-----------------
{code}
was:
Table:
COST (
ACCOUNT_ID VARCHAR NOT NULL,
PRECISION TINYINT NOT NULL,
START_DATE TIMESTAMP NOT NULL,
SECONDARY_ACCOUNT_ID VARCHAR NOT NULL,
TAG VARCHAR NOT NULL,
VENDOR_ID VARCHAR NOT NULL,
SERVICE VARCHAR NOT NULL,
....
CONSTRAINT PK PRIMARY KEY (
ACCOUNT_ID,
PRECISION,
START_DATE,
SECONDARY_ACCOUNT_ID,
TAG,
VENDOR_ID,
SERVICE
) COMPRESSION='SNAPPY', SALT_BUCKETS=16;
When querying this table with a GROUP BY clause that contains the 'START_DATE"
column, the results returned are incorrectly aggregated. I find multiple rows
with the same "START_DATE" which should have been aggregated to a single row.
The workaround is to do a TO_CHAR(START_DATE) which causes the query to return
the correct results.
e.g:
----------------
select start_date, account_id,
sum(quantity) as total,
sum(discounted_quantity) as disc_total
from COST
where
start_date >= (to_date('2016-03-01'))
and precision = 1
and account_id in ('1234', '5678')
group start_date, account_id
--RESULT--
START_DATE ACCOUNT_ID
TOTAL DISC_TOTAL
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2016-03-01 16:00:00.0 1234
312 0
....
2016-03-07 16:00:00.0 1234
312 0
2016-03-08 16:00:00.0 1234
312 0
2016-03-09 16:00:00.0 1234
216 0
2016-02-29 16:00:00.0 5678
194 24
2016-03-01 16:00:00.0 5678
262 0
....
2016-03-07 16:00:00.0 5678
237 48
2016-03-08 16:00:00.0 5678
178 0
2016-03-09 16:00:00.0 1234
96 0
2016-03-09 16:00:00.0 5678
173 50
2016-03-10 16:00:00.0 1234
324 0
....
2016-03-07 16:00:00.0 5678
178 24
2016-03-08 16:00:00.0 5678
218 0
2016-03-09 16:00:00.0 5678
218 48
^^^^^^^^^^^^^^^^^^^^^^
Repeated rows
--------------------
Workaround:
select to_char(start_date), account_id,
sum(quantity) as total,
sum(discounted_quantity) as disc_total
from COST
where
start_date >= (to_date('2016-03-01'))
and precision = 1
and account_id in ('1234', '5678')
group to_char(start_date), account_id
--RESULT--
START_DATE ACCOUNT_ID
TOTAL DISC_TOTAL
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2016-03-01 00:00:00.000 1234
312 0
2016-03-01 00:00:00.000 5678
3,465 384
2016-03-02 00:00:00.000 1234
312 0
2016-03-02 00:00:00.000 5678
3,238 384
2016-03-03 00:00:00.000 1234
312 0
2016-03-03 00:00:00.000 5678
3,243 384
2016-03-04 00:00:00.000 1234
312 0
2016-03-04 00:00:00.000 5678
3,212 384
2016-03-05 00:00:00.000 1234
312 0
2016-03-05 00:00:00.000 5678
2,907 384
2016-03-06 00:00:00.000 1234
312 0
2016-03-06 00:00:00.000 5678
2,907 384
2016-03-07 00:00:00.000 1234
312 0
2016-03-07 00:00:00.000 5678
2,919 384
2016-03-08 00:00:00.000 1234
312 0
2016-03-08 00:00:00.000 5678
2,955 384
2016-03-09 00:00:00.000 1234
312 0
2016-03-09 00:00:00.000 5678
2,955 384
2016-03-10 00:00:00.000 1234
312 0
2016-03-10 00:00:00.000 5678
2,978 384
2016-03-11 00:00:00.000 1234
324 0
2016-03-11 00:00:00.000 5678
3,125 384
2016-03-12 00:00:00.000 1234
336 0
2016-03-12 00:00:00.000 5678
3,102 384
2016-03-13 00:00:00.000 1234
336 0
2016-03-13 00:00:00.000 5678
3,117 384
2016-03-14 00:00:00.000 1234
153 0
2016-03-14 00:00:00.000 5678
2,187 278
-----------------
> Bug in GROUP BY with timestamp column
> -------------------------------------
>
> Key: PHOENIX-2789
> URL: https://issues.apache.org/jira/browse/PHOENIX-2789
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.5.1
> Environment: * Phoenix 4.5.1
> * Hbase 1.1.2
> * Hadoop 2.7.1
> * JDK 1.7
> Reporter: Alok Singh
>
> Table:
> {code}
> COST (
> ACCOUNT_ID VARCHAR NOT NULL,
> PRECISION TINYINT NOT NULL,
> START_DATE TIMESTAMP NOT NULL,
> SECONDARY_ACCOUNT_ID VARCHAR NOT NULL,
> TAG VARCHAR NOT NULL,
> VENDOR_ID VARCHAR NOT NULL,
> SERVICE VARCHAR NOT NULL,
> ....
> CONSTRAINT PK PRIMARY KEY (
> ACCOUNT_ID,
> PRECISION,
> START_DATE,
> SECONDARY_ACCOUNT_ID,
> TAG,
> VENDOR_ID,
> SERVICE
> ) COMPRESSION='SNAPPY', SALT_BUCKETS=16;
> {code}
> When querying this table with a GROUP BY clause that contains the
> 'START_DATE" column, the results returned are incorrectly aggregated. I find
> multiple rows with the same "START_DATE" which should have been aggregated to
> a single row. The workaround is to do a TO_CHAR(START_DATE) which causes the
> query to return the correct results.
> e.g:
> {code}
> ----------------
> select start_date, account_id,
> sum(quantity) as total,
> sum(discounted_quantity) as disc_total
> from COST
> where
> start_date >= (to_date('2016-03-01'))
> and precision = 1
> and account_id in ('1234', '5678')
> group start_date, account_id
> --RESULT--
> START_DATE ACCOUNT_ID
> TOTAL DISC_TOTAL
>
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> 2016-03-01 16:00:00.0 1234
> 312 0
>
> ....
> 2016-03-07 16:00:00.0 1234
> 312 0
>
> 2016-03-08 16:00:00.0 1234
> 312 0
>
> 2016-03-09 16:00:00.0 1234
> 216 0
>
> 2016-02-29 16:00:00.0 5678
> 194 24
>
> 2016-03-01 16:00:00.0 5678
> 262 0
>
> ....
> 2016-03-07 16:00:00.0 5678
> 237 48
>
> 2016-03-08 16:00:00.0 5678
> 178 0
>
> 2016-03-09 16:00:00.0 1234
> 96 0
>
> 2016-03-09 16:00:00.0 5678
> 173 50
>
> 2016-03-10 16:00:00.0 1234
> 324 0
>
> ....
> 2016-03-07 16:00:00.0 5678
> 178 24
>
> 2016-03-08 16:00:00.0 5678
> 218 0
>
> 2016-03-09 16:00:00.0 5678
> 218 48
>
> ^^^^^^^^^^^^^^^^^^^^^^
> Repeated rows
> --------------------
> {code}
> Workaround:
> {code}
> select to_char(start_date), account_id,
> sum(quantity) as total,
> sum(discounted_quantity) as disc_total
> from COST
> where
> start_date >= (to_date('2016-03-01'))
> and precision = 1
> and account_id in ('1234', '5678')
> group to_char(start_date), account_id
> --RESULT--
> START_DATE ACCOUNT_ID
> TOTAL DISC_TOTAL
>
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> 2016-03-01 00:00:00.000 1234
> 312 0
>
> 2016-03-01 00:00:00.000 5678
> 3,465 384
>
> 2016-03-02 00:00:00.000 1234
> 312 0
>
> 2016-03-02 00:00:00.000 5678
> 3,238 384
>
> 2016-03-03 00:00:00.000 1234
> 312 0
>
> 2016-03-03 00:00:00.000 5678
> 3,243 384
>
> 2016-03-04 00:00:00.000 1234
> 312 0
>
> 2016-03-04 00:00:00.000 5678
> 3,212 384
>
> 2016-03-05 00:00:00.000 1234
> 312 0
>
> 2016-03-05 00:00:00.000 5678
> 2,907 384
>
> 2016-03-06 00:00:00.000 1234
> 312 0
>
> 2016-03-06 00:00:00.000 5678
> 2,907 384
>
> 2016-03-07 00:00:00.000 1234
> 312 0
>
> 2016-03-07 00:00:00.000 5678
> 2,919 384
>
> 2016-03-08 00:00:00.000 1234
> 312 0
>
> 2016-03-08 00:00:00.000 5678
> 2,955 384
>
> 2016-03-09 00:00:00.000 1234
> 312 0
>
> 2016-03-09 00:00:00.000 5678
> 2,955 384
>
> 2016-03-10 00:00:00.000 1234
> 312 0
>
> 2016-03-10 00:00:00.000 5678
> 2,978 384
>
> 2016-03-11 00:00:00.000 1234
> 324 0
>
> 2016-03-11 00:00:00.000 5678
> 3,125 384
>
> 2016-03-12 00:00:00.000 1234
> 336 0
>
> 2016-03-12 00:00:00.000 5678
> 3,102 384
>
> 2016-03-13 00:00:00.000 1234
> 336 0
>
> 2016-03-13 00:00:00.000 5678
> 3,117 384
>
> 2016-03-14 00:00:00.000 1234
> 153 0
>
> 2016-03-14 00:00:00.000 5678
> 2,187 278
>
> -----------------
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
