huajian created KUDU-2648:
-----------------------------

             Summary: compaction does not run
                 Key: KUDU-2648
                 URL: https://issues.apache.org/jira/browse/KUDU-2648
             Project: Kudu
          Issue Type: Bug
          Components: tablet
    Affects Versions: 1.7.0
         Environment: 3 master nodes, 4c32g, ubuntu16.04
3 data nodes, 8c64g, 1.8T ssd, ubuntu16.04
            Reporter: huajian


Here is a table: project_construction_record, with 62 columns, ~170k records, and no
partitioning.

The table receives many CRUD operations every day.

I ran a simple SQL query on it (using Impala):

{code:java}
SELECT * FROM project_construction_record ORDER BY id LIMIT 1{code}
It takes 7 seconds.

By checking the profile, I found this:
{quote}
h4. KUDU_SCAN_NODE (id=0) (6.06 sec)
 * BytesRead: *0 bytes*
 * CollectionItemsRead: *0*
 * InactiveTotalTime: *0 ns*
 * KuduRemoteScanTokens: *0*
 * NumScannerThreadsStarted: *1*
 * PeakMemoryUsage: *3.4 MB*
 * RowsRead: *177,007*
 * RowsReturned: *177,007*
 * RowsReturnedRate: *29,188/sec*
 * ScanRangesComplete: *1*
 * ScannerThreadsInvoluntaryContextSwitches: *0*
 * ScannerThreadsTotalWallClockTime: *6.09 sec*
 ** MaterializeTupleTime(*): *6.06 sec*
 ** ScannerThreadsSysTime: *48 ms*
 ** ScannerThreadsUserTime: *172 ms*{quote}
So I checked the scan metrics for this query and found the following:
|column|cells read|bytes read|blocks read|
|id|176.92k|1.91M|19.96k|
|org_id|176.92k|1.91M|19.96k|
|work_date|176.92k|2.03M|19.96k|
|description|176.92k|1.21M|19.96k|
|user_name|176.92k|775.9K|19.96k|
|spot_name|176.92k|825.8K|19.96k|
|spot_start_pile|176.92k|778.7K|19.96k|
|spot_end_pile|176.92k|780.4K|19.96k|
|......|......|......|......|

That is far too many blocks read for a table of this size.
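For context, those numbers imply the data is spread extremely thin across blocks. A rough back-of-the-envelope calculation from the id column's figures in the table above:

```python
# Density check using the "id" column figures from the scan table above.
cells_read = 176_920   # ~176.92k cells read
blocks_read = 19_960   # ~19.96k blocks read

cells_per_block = cells_read / blocks_read
print(f"{cells_per_block:.1f} cells per block")  # -> 8.9 cells per block
```

Fewer than ten cells per block read strongly suggests the tablet is fragmented into many tiny rowsets rather than a few well-compacted ones.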

Then I ran the _*kudu fs list*_ command and got a ~70 MB report; here is
the bottom of it:

 
{code:java}
0b6ac30b449043a68905e02b797144fc | 25024 | 40310988 | column
 0b6ac30b449043a68905e02b797144fc | 25024 | 40310989 | column
 0b6ac30b449043a68905e02b797144fc | 25024 | 40310990 | column
 0b6ac30b449043a68905e02b797144fc | 25024 | 40310991 | column
 0b6ac30b449043a68905e02b797144fc | 25024 | 40310992 | column
 0b6ac30b449043a68905e02b797144fc | 25024 | 40310993 | column
 0b6ac30b449043a68905e02b797144fc | 25024 | 40310996 | undo
 0b6ac30b449043a68905e02b797144fc | 25024 | 40310994 | bloom
 0b6ac30b449043a68905e02b797144fc | 25024 | 40310995 | adhoc-index{code}
 

There are 25,024 rowsets and more than 1 million blocks in this tablet.
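As a side note, a large {{kudu fs list}} report like this one can be summarized with a short script. This is a minimal sketch (not an official tool), assuming the pipe-separated column layout shown in the excerpt above (tablet id | rowset id | block id | block kind):

```python
from collections import defaultdict

def summarize_fs_list(lines):
    """Count distinct rowsets and total blocks per tablet from
    `kudu fs list` output shaped as: tablet | rowset | block | kind."""
    rowsets = defaultdict(set)   # tablet id -> set of rowset ids
    blocks = defaultdict(int)    # tablet id -> total block count
    for line in lines:
        parts = [p.strip() for p in line.split("|")]
        if len(parts) != 4:
            continue  # skip headers, separators, and blank lines
        tablet, rowset, _block, _kind = parts
        rowsets[tablet].add(rowset)
        blocks[tablet] += 1
    return {t: (len(rowsets[t]), blocks[t]) for t in rowsets}

sample = """\
0b6ac30b449043a68905e02b797144fc | 25024 | 40310988 | column
0b6ac30b449043a68905e02b797144fc | 25024 | 40310989 | column
0b6ac30b449043a68905e02b797144fc | 25024 | 40310994 | bloom
"""
print(summarize_fs_list(sample.splitlines()))
# -> {'0b6ac30b449043a68905e02b797144fc': (1, 3)}
```

Run against the full 70 MB report, this kind of summary makes the 25,024-rowset count easy to reproduce.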


I left the maintenance and compaction flags at their defaults, changing only
tablet_history_max_age_sec to one day:

{code:java}
--maintenance_manager_history_size=8
--maintenance_manager_num_threads=1
--maintenance_manager_polling_interval_ms=250
--budgeted_compaction_target_rowset_size=33554432
--compaction_approximation_ratio=1.0499999523162842
--compaction_minimum_improvement=0.0099999997764825821
--deltafile_default_block_size=32768
--deltafile_default_compression_codec=lz4
--default_composite_key_index_block_size_bytes=4096
--tablet_delta_store_major_compact_min_ratio=0.10000000149011612
--tablet_delta_store_minor_compact_max=1000
--mrs_use_codegen=true
--compaction_policy_dump_svgs_pattern=
--enable_undo_delta_block_gc=true
--fault_crash_before_flush_tablet_meta_after_compaction=0
--fault_crash_before_flush_tablet_meta_after_flush_mrs=0
--max_cell_size_bytes=65536
--max_encoded_key_size_bytes=16384
--tablet_bloom_block_size=4096
--tablet_bloom_target_fp_rate=9.9999997473787516e-05
--tablet_compaction_budget_mb=128
--tablet_history_max_age_sec=86400{code}
So my questions are: *why does compaction not run? Is this a bug? And how can
I trigger a compaction manually?*

This is a production environment, and many other tables have the same issue;
performance is getting slower and slower.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
