[
https://issues.apache.org/jira/browse/KUDU-2463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16534764#comment-16534764
]
Rodion Myronov commented on KUDU-2463:
--------------------------------------
[~tlipcon] , it seems you've hit the root cause. We've inserted and then
deleted some records into each and every partition and it seemed to help. We
are still checking the data, but at least now we get stable results for our
Impala queries. Direct kudu-python reads in different modes return the
consistent result as well.
Thanks a lot, that was a brilliant catch!
> Different results returned by group by on count() metric
> --------------------------------------------------------
>
> Key: KUDU-2463
> URL: https://issues.apache.org/jira/browse/KUDU-2463
> Project: Kudu
> Issue Type: Bug
> Components: impala
> Affects Versions: 1.5.0
> Reporter: Tomas Farkas
> Priority: Critical
> Fix For: n/a
>
> Attachments: checksum_scan.txt, fs_check_master_servers.txt,
> fs_check_tablet_servers.txt
>
>
> Hi,
> I have a static table in Kudu, no inserts/updates or deletes are running on
> the cluster. The query returns DIFFERENT result when I change the where
> condition on one of the primary key columns, which is in the group_by list.
> The created_date is part of the PK and is type of int.
> PK contains subscriber, time, date, identifier and created_date.
> I tried to check if the inserted count is equal to the HDFS table, and
> noticed on one day, that the count differs based on the where criteria!!
> {quote}
> {{[10.197.0.164:21000] > select created_date, count(*) from base.usage_kudu
> where created_date >= 20180601 group by created_date;}}
> {{Query: select created_date, count(*) from base.usage_kudu where
> created_date >= 20180601 group by created_date}}
> {{Query submitted at: 2018-06-04 21:06:30 (Coordinator:
> [http://ip-10-197-0-164.eu-west-1.compute.internal:25000|http://ip-10-197-0-164.eu-west-1.compute.internal:25000/])}}
> {{Query progress can be monitored at:
> [http://ip-10-197-0-164.eu-west-1.compute.internal:25000/query_plan?query_id=ce4e92eda5aaa02f:ea07aa4600000000]}}
> {{+---------------+---------+}}
> {{| created_date | count(*) |}}
> {{+---------------+---------+}}
> {{| 20180603 | 12145131 |}}
> {{| 20180601 | 18076448 | -> 195k MORE!!!}}
> {{| 20180602 | 13325080 |}}
> {{| 20180604 | 3788161 |}}
> {{+---------------+---------+}}
> {{Fetched 4 row(s) in 0.37s}}
> {{[10.197.0.164:21000] >}}
> {{[10.197.0.164:21000] >}}
> {{[10.197.0.164:21000] > select created_date, count(*) from base.usage_kudu
> where created_date >= 20180601 group by created_date order by 1;}}
> {{Query: select created_date, count(*) from base.usage_kudu where
> created_date >= 20180601 group by created_date order by 1}}
> {{Query submitted at: 2018-06-04 21:06:55 (Coordinator:
> [http://ip-10-197-0-164.eu-west-1.compute.internal:25000|http://ip-10-197-0-164.eu-west-1.compute.internal:25000/])}}
> {{Query progress can be monitored at:
> [http://ip-10-197-0-164.eu-west-1.compute.internal:25000/query_plan?query_id=d541a9dda19e28e4:be4a2ca000000000]}}
> {{+---------------+---------+}}
> {{| created_date | count(*) |}}
> {{+---------------+---------+}}
> {{| 20180601 | 18076448 | -> 195k MORE!!!}}
> {{| 20180602 | 13325080 |}}
> {{| 20180603 | 12145131 |}}
> {{| 20180604 | 3788161 |}}
> {{+---------------+---------+}}
> {{Fetched 4 row(s) in 1.14s}}
> {{[10.197.0.164:21000] > select created_date, count(*) from base.usage_kudu
> where created_date >= 20180528 group by created_date order by 1;}}
> {{Query: select created_date, count(*) from base.usage_kudu where
> created_date >= 20180528 group by created_date order by 1}}
> {{Query submitted at: 2018-06-04 21:07:12 (Coordinator:
> [http://ip-10-197-0-164.eu-west-1.compute.internal:25000|http://ip-10-197-0-164.eu-west-1.compute.internal:25000/])}}
> {{Query progress can be monitored at:
> [http://ip-10-197-0-164.eu-west-1.compute.internal:25000/query_plan?query_id=774a261fb94ad2bb:aab28b8b00000000]}}
> {{+---------------+---------+}}
> {{| created_date | count(*) |}}
> {{+---------------+---------+}}
> {{| 20180528 | 17607943 |}}
> {{| 20180529 | 20741097 |}}
> {{| 20180530 | 17362364 |}}
> {{| 20180531 | 16877228 |}}
> \{{| 20180601 | 17925671 | -> 44k MORE!! }}
> {{| 20180602 | 13325080 |}}
> {{| 20180603 | 12145131 |}}
> {{| 20180604 | 3788161 |}}
> {{+---------------+---------+}}
> {{Fetched 8 row(s) in 0.67s}}
> {{[10.197.0.164:21000] > select created_date, count(*) from base.usage_kudu
> where created_date >= 20180525 group by created_date order by 1;}}
> {{Query: select created_date, count(*) from base.usage_kudu where
> created_date >= 20180525 group by created_date order by 1}}
> {{Query submitted at: 2018-06-04 21:07:25 (Coordinator:
> [http://ip-10-197-0-164.eu-west-1.compute.internal:25000|http://ip-10-197-0-164.eu-west-1.compute.internal:25000/])}}
> {{Query progress can be monitored at:
> [http://ip-10-197-0-164.eu-west-1.compute.internal:25000/query_plan?query_id=38483ad3ae5c8eb9:a538cb6300000000]}}
> {{+---------------+---------+}}
> {{| created_date | count(*) |}}
> {{+---------------+---------+}}
> {{| 20180525 | 22309857 |}}
> {{| 20180526 | 15268520 |}}
> {{| 20180527 | 14939691 |}}
> {{| 20180528 | 17607943 |}}
> {{| 20180529 | 20741097 |}}
> {{| 20180530 | 17362364 |}}
> {{| 20180531 | 16903829 |}}
> {{| 20180601 | 18047010 | -> 165k MORE!!!}}
> {{| 20180602 | 13325080 |}}
> {{| 20180603 | 12145131 |}}
> {{| 20180604 | 3788161 |}}
> {{+---------------+---------+}}
> {{Fetched 11 row(s) in 0.85s}}
> {{[10.197.0.164:21000] > select created_date, count(*) from base.usage_kudu
> where created_date = 20180601 group by created_date;}}
> {{Query: select created_date, count(*) from base.usage_kudu where
> created_date = 20180601 group by created_date}}
> {{Query submitted at: 2018-06-04 21:07:42 (Coordinator:
> [http://ip-10-197-0-164.eu-west-1.compute.internal:25000|http://ip-10-197-0-164.eu-west-1.compute.internal:25000/])}}
> {{Query progress can be monitored at:
> [http://ip-10-197-0-164.eu-west-1.compute.internal:25000/query_plan?query_id=7343ba31f6b4c86f:621a7b8c00000000]}}
> {{+---------------+---------+}}
> {{| created_date | count(*) |}}
> {{+---------------+---------+}}
> {{| 20180601 | 17881253 | -> CORRECT ONE}}
> {{+---------------+---------+}}
> {{Fetched 1 row(s) in 0.27s}}
> {{[10.197.0.164:21000] > select created_date, count(*) from base.usage_kudu
> where created_date >= 20180525 group by created_date order by 1;}}
> {{Query: select created_date, count(*) from base.usage_kudu where
> created_date >= 20180525 group by created_date order by 1}}
> {{Query submitted at: 2018-06-04 21:12:02 (Coordinator:
> [http://ip-10-197-0-164.eu-west-1.compute.internal:25000|http://ip-10-197-0-164.eu-west-1.compute.internal:25000/])}}
> {{Query progress can be monitored at:
> [http://ip-10-197-0-164.eu-west-1.compute.internal:25000/query_plan?query_id=4141df26117f35c3:9ab2f0700000000]}}
> {{+---------------+---------+}}
> {{| created_date | count(*) |}}
> {{+---------------+---------+}}
> {{| 20180525 | 22309857 |}}
> {{| 20180526 | 15268520 |}}
> {{| 20180527 | 14939691 |}}
> {{| 20180528 | 17607943 |}}
> {{| 20180529 | 20741097 |}}
> {{| 20180530 | 17362364 |}}
> {{| 20180531 | 16903829 |}}
> {{| 20180601 | 18047010 | -> AGAIN WRONG RESULT!!}}
> {{| 20180602 | 13325080 |}}
> {{| 20180603 | 12145131 |}}
> {{| 20180604 | 3788161 |}}
> {{+---------------+---------+}}
> {{Fetched 11 row(s) in 1.04s}}{{}}{{}}
> Again, no other inserts/selects/updates or deletes were running between these
> statements on the cluster.
>
> I checked the explain, if there is a difference,but it looks ok. But the
> result is different!
>
> {{[10.197.0.164:21000] > explain select created_date, count(*) from
> base.usage_kudu where created_date = 20180601 group by created_date;}}
> {{Query: explain select created_date, count(*) from base.usage_kudu where
> created_date = 20180601 group by created_date}}
> {{+--------------------------------------------------+}}
> {{| Explain String |}}
> {{+--------------------------------------------------+}}
> {{| Max Per-Host Resource Reservation: Memory=3.94MB |}}
> {{| Per-Host Resource Estimates: Memory=20.00MB |}}
> {{| |}}
> {{| PLAN-ROOT SINK |}}
> {{| | |}}
> {{| 04:EXCHANGE [UNPARTITIONED] |}}
> {{| | |}}
> {{| 03:AGGREGATE [FINALIZE] |}}
> {{| | output: count:merge(*) |}}
> {{| | group by: created_date |}}
> {{| | |}}
> {{| 02:EXCHANGE [HASH(created_date)] |}}
> {{| | |}}
> {{| 01:AGGREGATE [STREAMING] |}}
> {{| | output: count(*) |}}
> {{| | group by: created_date |}}
> {{| | |}}
> {{| 00:SCAN KUDU [base.usage_kudu] |}}
> {{| kudu predicates: created_date = 20180601 |}}
> {{+--------------------------------------------------+}}
> {{Fetched 19 row(s) in 0.06s}}
> {{[10.197.0.164:21000] > select created_date, count(*) from base.usage_kudu
> where created_date = 20180601 group by created_date;}}
> {{Query: select created_date, count(*) from base.usage_kudu where
> created_date = 20180601 group by created_date}}
> {{Query submitted at: 2018-06-04 21:17:21 (Coordinator:
> [http://ip-10-197-0-164.eu-west-1.compute.internal:25000|http://ip-10-197-0-164.eu-west-1.compute.internal:25000/])}}
> {{Query progress can be monitored at:
> [http://ip-10-197-0-164.eu-west-1.compute.internal:25000/query_plan?query_id=c449aabea51e7456:612f096400000000]}}
> {{+---------------+---------+}}
> {{| created_date | count(*) |}}
> {{+---------------+---------+}}
> {{| 20180601 | 17881253 |}}
> {{+---------------+---------+}}
> {{Fetched 1 row(s) in 0.38s}}
> {{[10.197.0.164:21000] > explain select created_date, count(*) from
> base.usage_kudu where created_date >= 20180525 group by created_date order by
> 1;}}
> {{Query: explain select created_date, count(*) from base.usage_kudu where
> created_date >= 20180525 group by created_date order by 1}}
> {{+--------------------------------------------------+}}
> {{| Explain String |}}
> {{+--------------------------------------------------+}}
> {{| Max Per-Host Resource Reservation: Memory=9.94MB |}}
> {{| Per-Host Resource Estimates: Memory=26.00MB |}}
> {{| |}}
> {{| PLAN-ROOT SINK |}}
> {{| | |}}
> {{| 05:MERGING-EXCHANGE [UNPARTITIONED] |}}
> {{| | order by: created_date ASC |}}
> {{| | |}}
> {{| 02:SORT |}}
> {{| | order by: created_date ASC |}}
> {{| | |}}
> {{| 04:AGGREGATE [FINALIZE] |}}
> {{| | output: count:merge(*) |}}
> {{| | group by: created_date |}}
> {{| | |}}
> {{| 03:EXCHANGE [HASH(created_date)] |}}
> {{| | |}}
> {{| 01:AGGREGATE [STREAMING] |}}
> {{| | output: count(*) |}}
> {{| | group by: created_date |}}
> {{| | |}}
> {{| 00:SCAN KUDU [base.usage_kudu] |}}
> {{| kudu predicates: created_date >= 20180525 |}}
> {{+--------------------------------------------------+}}
> {{Fetched 23 row(s) in 0.05s}}
> {{[10.197.0.164:21000] > select created_date, count(*) from base.usage_kudu
> where created_date >= 20180525 group by created_date order by 1;}}
> {{Query: select created_date, count(*) from base.usage_kudu where
> created_date >= 20180525 group by created_date order by 1}}
> {{Query submitted at: 2018-06-04 21:17:32 (Coordinator:
> [http://ip-10-197-0-164.eu-west-1.compute.internal:25000|http://ip-10-197-0-164.eu-west-1.compute.internal:25000/])}}
> {{Query progress can be monitored at:
> [http://ip-10-197-0-164.eu-west-1.compute.internal:25000/query_plan?query_id=bc4a36f2a7ad3280:c7b09a5100000000]}}
> {{+---------------+---------+}}
> {{| created_date | count(*) |}}
> {{+---------------+---------+}}
> {{| 20180525 | 22309857 |}}
> {{| 20180526 | 15268520 |}}
> {{| 20180527 | 14939691 |}}
> {{| 20180528 | 17607943 |}}
> {{| 20180529 | 20741097 |}}
> {{| 20180530 | 17362364 |}}
> {{| 20180531 | 16903829 |}}
> {{| 20180601 | 18047010 |}}
> {{| 20180602 | 13325080 |}}
> {{| 20180603 | 12145131 |}}
> {{| 20180604 | 3788161 |}}
> {{+---------------+---------+}}
> {{Fetched 11 row(s) in 0.88s}}
>
> {quote}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)