[ 
https://issues.apache.org/jira/browse/KUDU-2463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16534009#comment-16534009
 ] 

Todd Lipcon commented on KUDU-2463:
-----------------------------------

[~Rodion Myronov] thanks for reporting this new information. I am wondering if 
this is due to timestamp assignment. For 'ksck', the client picks some 
particular snapshot timestamp so that the scan is guaranteed to use the same 
snapshot at different replicas. For a scan in the default scan mode, it won't. 
I'm guessing that as another side effect of KUDU-2233, scans in this default 
mode might be reading at some time in the past and therefore not seeing all 
rows appropriately. As a test of this, would it be possible to attempt to 
insert or update one row in each of the tablets? That should "kick" the time 
advancement forward and confirm whether or not that's what we're seeing.

> Different results returned by group by on count() metric
> --------------------------------------------------------
>
>                 Key: KUDU-2463
>                 URL: https://issues.apache.org/jira/browse/KUDU-2463
>             Project: Kudu
>          Issue Type: Bug
>          Components: impala
>    Affects Versions: 1.5.0
>            Reporter: Tomas Farkas
>            Priority: Critical
>             Fix For: n/a
>
>         Attachments: checksum_scan.txt, fs_check_master_servers.txt, 
> fs_check_tablet_servers.txt
>
>
> Hi, 
> I have a static table in Kudu, no inserts/updates or deletes are running on 
> the cluster. The query returns DIFFERENT result when I change the where 
> condition on one of the primary key columns, which is in the group_by list.
> The created_date is part of the PK and is type of int.
> PK contains subscriber, time, date, identifier and created_date.
> I tried to check if the inserted count is equal to the HDFS table, and 
> noticed on one day, that the count differs based on the where criteria!!
> {quote} 
> {{[10.197.0.164:21000] > select created_date, count(*) from base.usage_kudu 
> where created_date >= 20180601 group by created_date;}}
>  {{Query: select created_date, count(*) from base.usage_kudu where 
> created_date >= 20180601 group by created_date}}
>  {{Query submitted at: 2018-06-04 21:06:30 (Coordinator: 
> [http://ip-10-197-0-164.eu-west-1.compute.internal:25000|http://ip-10-197-0-164.eu-west-1.compute.internal:25000/])}}
>  {{Query progress can be monitored at: 
> [http://ip-10-197-0-164.eu-west-1.compute.internal:25000/query_plan?query_id=ce4e92eda5aaa02f:ea07aa4600000000]}}
>  {{+---------------+---------+}}
>  {{| created_date | count(*) |}}
>  {{+---------------+---------+}}
>  {{| 20180603 | 12145131 |}}
>  {{| 20180601 | 18076448 | -> 195k MORE!!!}}
>  {{| 20180602 | 13325080 |}}
>  {{| 20180604 | 3788161 |}}
>  {{+---------------+---------+}}
>  {{Fetched 4 row(s) in 0.37s}}
>  {{[10.197.0.164:21000] >}}
>  {{[10.197.0.164:21000] >}}
>  {{[10.197.0.164:21000] > select created_date, count(*) from base.usage_kudu 
> where created_date >= 20180601 group by created_date order by 1;}}
>  {{Query: select created_date, count(*) from base.usage_kudu where 
> created_date >= 20180601 group by created_date order by 1}}
>  {{Query submitted at: 2018-06-04 21:06:55 (Coordinator: 
> [http://ip-10-197-0-164.eu-west-1.compute.internal:25000|http://ip-10-197-0-164.eu-west-1.compute.internal:25000/])}}
>  {{Query progress can be monitored at: 
> [http://ip-10-197-0-164.eu-west-1.compute.internal:25000/query_plan?query_id=d541a9dda19e28e4:be4a2ca000000000]}}
>  {{+---------------+---------+}}
>  {{| created_date | count(*) |}}
>  {{+---------------+---------+}}
>  {{| 20180601 | 18076448 | -> 195k MORE!!!}}
>  {{| 20180602 | 13325080 |}}
>  {{| 20180603 | 12145131 |}}
>  {{| 20180604 | 3788161 |}}
>  {{+---------------+---------+}}
>  {{Fetched 4 row(s) in 1.14s}}
>  {{[10.197.0.164:21000] > select created_date, count(*) from base.usage_kudu 
> where created_date >= 20180528 group by created_date order by 1;}}
>  {{Query: select created_date, count(*) from base.usage_kudu where 
> created_date >= 20180528 group by created_date order by 1}}
>  {{Query submitted at: 2018-06-04 21:07:12 (Coordinator: 
> [http://ip-10-197-0-164.eu-west-1.compute.internal:25000|http://ip-10-197-0-164.eu-west-1.compute.internal:25000/])}}
>  {{Query progress can be monitored at: 
> [http://ip-10-197-0-164.eu-west-1.compute.internal:25000/query_plan?query_id=774a261fb94ad2bb:aab28b8b00000000]}}
>  {{+---------------+---------+}}
>  {{| created_date | count(*) |}}
>  {{+---------------+---------+}}
>  {{| 20180528 | 17607943 |}}
>  {{| 20180529 | 20741097 |}}
>  {{| 20180530 | 17362364 |}}
>  {{| 20180531 | 16877228 |}}
>  \{{| 20180601 | 17925671 | -> 44k MORE!! }}
>  {{| 20180602 | 13325080 |}}
>  {{| 20180603 | 12145131 |}}
>  {{| 20180604 | 3788161 |}}
>  {{+---------------+---------+}}
>  {{Fetched 8 row(s) in 0.67s}}
>  {{[10.197.0.164:21000] > select created_date, count(*) from base.usage_kudu 
> where created_date >= 20180525 group by created_date order by 1;}}
>  {{Query: select created_date, count(*) from base.usage_kudu where 
> created_date >= 20180525 group by created_date order by 1}}
>  {{Query submitted at: 2018-06-04 21:07:25 (Coordinator: 
> [http://ip-10-197-0-164.eu-west-1.compute.internal:25000|http://ip-10-197-0-164.eu-west-1.compute.internal:25000/])}}
>  {{Query progress can be monitored at: 
> [http://ip-10-197-0-164.eu-west-1.compute.internal:25000/query_plan?query_id=38483ad3ae5c8eb9:a538cb6300000000]}}
>  {{+---------------+---------+}}
>  {{| created_date | count(*) |}}
>  {{+---------------+---------+}}
>  {{| 20180525 | 22309857 |}}
>  {{| 20180526 | 15268520 |}}
>  {{| 20180527 | 14939691 |}}
>  {{| 20180528 | 17607943 |}}
>  {{| 20180529 | 20741097 |}}
>  {{| 20180530 | 17362364 |}}
>  {{| 20180531 | 16903829 |}}
>  {{| 20180601 | 18047010 | -> 165k MORE!!!}}
>  {{| 20180602 | 13325080 |}}
>  {{| 20180603 | 12145131 |}}
>  {{| 20180604 | 3788161 |}}
>  {{+---------------+---------+}}
>  {{Fetched 11 row(s) in 0.85s}}
>  {{[10.197.0.164:21000] > select created_date, count(*) from base.usage_kudu 
> where created_date = 20180601 group by created_date;}}
>  {{Query: select created_date, count(*) from base.usage_kudu where 
> created_date = 20180601 group by created_date}}
>  {{Query submitted at: 2018-06-04 21:07:42 (Coordinator: 
> [http://ip-10-197-0-164.eu-west-1.compute.internal:25000|http://ip-10-197-0-164.eu-west-1.compute.internal:25000/])}}
>  {{Query progress can be monitored at: 
> [http://ip-10-197-0-164.eu-west-1.compute.internal:25000/query_plan?query_id=7343ba31f6b4c86f:621a7b8c00000000]}}
>  {{+---------------+---------+}}
>  {{| created_date | count(*) |}}
>  {{+---------------+---------+}}
>  {{| 20180601 | 17881253 | -> CORRECT ONE}}
>  {{+---------------+---------+}}
>  {{Fetched 1 row(s) in 0.27s}}
>  {{[10.197.0.164:21000] > select created_date, count(*) from base.usage_kudu 
> where created_date >= 20180525 group by created_date order by 1;}}
>  {{Query: select created_date, count(*) from base.usage_kudu where 
> created_date >= 20180525 group by created_date order by 1}}
>  {{Query submitted at: 2018-06-04 21:12:02 (Coordinator: 
> [http://ip-10-197-0-164.eu-west-1.compute.internal:25000|http://ip-10-197-0-164.eu-west-1.compute.internal:25000/])}}
>  {{Query progress can be monitored at: 
> [http://ip-10-197-0-164.eu-west-1.compute.internal:25000/query_plan?query_id=4141df26117f35c3:9ab2f0700000000]}}
>  {{+---------------+---------+}}
>  {{| created_date | count(*) |}}
>  {{+---------------+---------+}}
>  {{| 20180525 | 22309857 |}}
>  {{| 20180526 | 15268520 |}}
>  {{| 20180527 | 14939691 |}}
>  {{| 20180528 | 17607943 |}}
>  {{| 20180529 | 20741097 |}}
>  {{| 20180530 | 17362364 |}}
>  {{| 20180531 | 16903829 |}}
>  {{| 20180601 | 18047010 | -> AGAIN WRONG RESULT!!}}
>  {{| 20180602 | 13325080 |}}
>  {{| 20180603 | 12145131 |}}
>  {{| 20180604 | 3788161 |}}
>  {{+---------------+---------+}}
>  {{Fetched 11 row(s) in 1.04s}}{{}}{{}}
> Again, no other inserts/selects/updates or deletes were running between these 
> statements on the cluster. 
>  
> I checked the explain, if there is a difference,but it looks ok. But the 
> result is different!
>  
> {{[10.197.0.164:21000] > explain select created_date, count(*) from 
> base.usage_kudu where created_date = 20180601 group by created_date;}}
>  {{Query: explain select created_date, count(*) from base.usage_kudu where 
> created_date = 20180601 group by created_date}}
>  {{+--------------------------------------------------+}}
>  {{| Explain String |}}
>  {{+--------------------------------------------------+}}
>  {{| Max Per-Host Resource Reservation: Memory=3.94MB |}}
>  {{| Per-Host Resource Estimates: Memory=20.00MB |}}
>  {{| |}}
>  {{| PLAN-ROOT SINK |}}
>  {{| | |}}
>  {{| 04:EXCHANGE [UNPARTITIONED] |}}
>  {{| | |}}
>  {{| 03:AGGREGATE [FINALIZE] |}}
>  {{| | output: count:merge(*) |}}
>  {{| | group by: created_date |}}
>  {{| | |}}
>  {{| 02:EXCHANGE [HASH(created_date)] |}}
>  {{| | |}}
>  {{| 01:AGGREGATE [STREAMING] |}}
>  {{| | output: count(*) |}}
>  {{| | group by: created_date |}}
>  {{| | |}}
>  {{| 00:SCAN KUDU [base.usage_kudu] |}}
>  {{| kudu predicates: created_date = 20180601 |}}
>  {{+--------------------------------------------------+}}
>  {{Fetched 19 row(s) in 0.06s}}
>  {{[10.197.0.164:21000] > select created_date, count(*) from base.usage_kudu 
> where created_date = 20180601 group by created_date;}}
>  {{Query: select created_date, count(*) from base.usage_kudu where 
> created_date = 20180601 group by created_date}}
>  {{Query submitted at: 2018-06-04 21:17:21 (Coordinator: 
> [http://ip-10-197-0-164.eu-west-1.compute.internal:25000|http://ip-10-197-0-164.eu-west-1.compute.internal:25000/])}}
>  {{Query progress can be monitored at: 
> [http://ip-10-197-0-164.eu-west-1.compute.internal:25000/query_plan?query_id=c449aabea51e7456:612f096400000000]}}
>  {{+---------------+---------+}}
>  {{| created_date | count(*) |}}
>  {{+---------------+---------+}}
>  {{| 20180601 | 17881253 |}}
>  {{+---------------+---------+}}
>  {{Fetched 1 row(s) in 0.38s}}
>  {{[10.197.0.164:21000] > explain select created_date, count(*) from 
> base.usage_kudu where created_date >= 20180525 group by created_date order by 
> 1;}}
>  {{Query: explain select created_date, count(*) from base.usage_kudu where 
> created_date >= 20180525 group by created_date order by 1}}
>  {{+--------------------------------------------------+}}
>  {{| Explain String |}}
>  {{+--------------------------------------------------+}}
>  {{| Max Per-Host Resource Reservation: Memory=9.94MB |}}
>  {{| Per-Host Resource Estimates: Memory=26.00MB |}}
>  {{| |}}
>  {{| PLAN-ROOT SINK |}}
>  {{| | |}}
>  {{| 05:MERGING-EXCHANGE [UNPARTITIONED] |}}
>  {{| | order by: created_date ASC |}}
>  {{| | |}}
>  {{| 02:SORT |}}
>  {{| | order by: created_date ASC |}}
>  {{| | |}}
>  {{| 04:AGGREGATE [FINALIZE] |}}
>  {{| | output: count:merge(*) |}}
>  {{| | group by: created_date |}}
>  {{| | |}}
>  {{| 03:EXCHANGE [HASH(created_date)] |}}
>  {{| | |}}
>  {{| 01:AGGREGATE [STREAMING] |}}
>  {{| | output: count(*) |}}
>  {{| | group by: created_date |}}
>  {{| | |}}
>  {{| 00:SCAN KUDU [base.usage_kudu] |}}
>  {{| kudu predicates: created_date >= 20180525 |}}
>  {{+--------------------------------------------------+}}
>  {{Fetched 23 row(s) in 0.05s}}
>  {{[10.197.0.164:21000] > select created_date, count(*) from base.usage_kudu 
> where created_date >= 20180525 group by created_date order by 1;}}
>  {{Query: select created_date, count(*) from base.usage_kudu where 
> created_date >= 20180525 group by created_date order by 1}}
>  {{Query submitted at: 2018-06-04 21:17:32 (Coordinator: 
> [http://ip-10-197-0-164.eu-west-1.compute.internal:25000|http://ip-10-197-0-164.eu-west-1.compute.internal:25000/])}}
>  {{Query progress can be monitored at: 
> [http://ip-10-197-0-164.eu-west-1.compute.internal:25000/query_plan?query_id=bc4a36f2a7ad3280:c7b09a5100000000]}}
>  {{+---------------+---------+}}
>  {{| created_date | count(*) |}}
>  {{+---------------+---------+}}
>  {{| 20180525 | 22309857 |}}
>  {{| 20180526 | 15268520 |}}
>  {{| 20180527 | 14939691 |}}
>  {{| 20180528 | 17607943 |}}
>  {{| 20180529 | 20741097 |}}
>  {{| 20180530 | 17362364 |}}
>  {{| 20180531 | 16903829 |}}
>  {{| 20180601 | 18047010 |}}
>  {{| 20180602 | 13325080 |}}
>  {{| 20180603 | 12145131 |}}
>  {{| 20180604 | 3788161 |}}
>  {{+---------------+---------+}}
>  {{Fetched 11 row(s) in 0.88s}}
>  
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to