[ 
https://issues.apache.org/jira/browse/KUDU-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated KUDU-1341:
-------------------------------------
    Fix Version/s: 0.7.1

> Out of order UNDO in delta file, possibly related to REINSERT
> -------------------------------------------------------------
>
>                 Key: KUDU-1341
>                 URL: https://issues.apache.org/jira/browse/KUDU-1341
>             Project: Kudu
>          Issue Type: Bug
>          Components: tablet
>    Affects Versions: 0.6.0
>            Reporter: Mike Percy
>            Assignee: Todd Lipcon
>             Fix For: 0.7.1, 0.8.0
>
>
> As reported by Bruce Song Zhang, there is a crasher in debug mode in tablet 
> compaction under load.
> Here is a snippet of the log file he sent:
> {noformat}
> I0219 10:03:27.503398 31721 delta_tracker.cc:456] Flushing 8 deltas from DMS 
> 114...
> I0219 10:03:27.504659 31721 delta_tracker.cc:410] Flushed delta block: 
> 2109978395164279242
> I0219 10:03:27.504781 31721 delta_tracker.cc:416] Reopened delta block for 
> read: 2109978395164279242
> I0219 10:03:28.150115 31721 maintenance_manager.cc:359] Time spent running 
> FlushDeltaMemStoresOp(c6069618c9864c1d8422912bf69efd44): real 0.667s       
> user 0.647s     sys 0.020s
> I0219 10:03:28.665895 31721 log_reader.cc:415] T 
> c6069618c9864c1d8422912bf69efd44: removed 1 log segments from log reader
> I0219 10:03:28.665935 31721 log.cc:736] Deleting log segment in path: 
> /export/servers/kudu/tserver_wal_data/wals/c6069618c9864c1d8422912bf69efd44/wal-000000223
>  (GCed ops < 61476431)
> I0219 10:03:28.704988 31721 maintenance_manager.cc:359] Time spent running 
> LogGCOp(c6069618c9864c1d8422912bf69efd44): real 0.039s     user 0.000s     
> sys 0.039s
> I0219 10:03:29.456583 31721 tablet.cc:653] Flush: entering stage 1 (old 
> memrowset already frozen for inserts)
> I0219 10:03:29.456614 31721 compaction.cc:591] Selected 1 rowsets to compact:
> I0219 10:03:29.456630 31721 compaction.cc:594] memrowset(current size on 
> disk: ~0 bytes)
> I0219 10:03:29.456686 31721 tablet.cc:655] Memstore in-memory size: 33095447 
> bytes
> I0219 10:03:29.456742 31721 tablet.cc:1132] Flush: entering phase 1 (flushing 
> snapshot). Phase 1 snapshot: MvccSnapshot[committed={T|T < 
> 5963147244092280832 or (T in {5963147244092280832})}]
> I0219 10:03:29.457515 31721 multi_column_writer.cc:82] Opened CFile writer 
> for column country_id[int32 NOT NULL]
> I0219 10:03:29.457700 31721 multi_column_writer.cc:82] Opened CFile writer 
> for column sku_id[int64 NOT NULL]
> I0219 10:03:29.457865 31721 multi_column_writer.cc:82] Opened CFile writer 
> for column cur_day[int32 NOT NULL]
> I0219 10:03:29.458027 31721 multi_column_writer.cc:82] Opened CFile writer 
> for column country_name[string NOT NULL]
> I0219 10:03:29.458173 31721 multi_column_writer.cc:82] Opened CFile writer 
> for column sku_name[string NOT NULL]
> I0219 10:03:29.458333 31721 multi_column_writer.cc:82] Opened CFile writer 
> for column brand_id[int32 NOT NULL]
> I0219 10:03:29.458493 31721 multi_column_writer.cc:82] Opened CFile writer 
> for column brand_name[string NOT NULL]
> I0219 10:03:29.458636 31721 multi_column_writer.cc:82] Opened CFile writer 
> for column product_id[int64 NOT NULL]
> I0219 10:03:29.458777 31721 multi_column_writer.cc:82] Opened CFile writer 
> for column product_name[string NOT NULL]
> I0219 10:03:29.458927 31721 multi_column_writer.cc:82] Opened CFile writer 
> for column cid1[int32 NOT NULL]
> I0219 10:03:29.459066 31721 multi_column_writer.cc:82] Opened CFile writer 
> for column cid1_name[string NOT NULL]
> I0219 10:03:29.459198 31721 multi_column_writer.cc:82] Opened CFile writer 
> for column cid2[int32 NOT NULL]
> I0219 10:03:29.459347 31721 multi_column_writer.cc:82] Opened CFile writer 
> for column cid2_name[string NOT NULL]
> I0219 10:03:29.459491 31721 multi_column_writer.cc:82] Opened CFile writer 
> for column cid3[int32 NOT NULL]
> I0219 10:03:29.459630 31721 multi_column_writer.cc:82] Opened CFile writer 
> for column cid3_name[string NOT NULL]
> I0219 10:03:29.459774 31721 multi_column_writer.cc:82] Opened CFile writer 
> for column data_type[int8 NOT NULL]
> I0219 10:03:29.459910 31721 multi_column_writer.cc:82] Opened CFile writer 
> for column price[double NOT NULL]
> I0219 10:03:29.460060 31721 multi_column_writer.cc:82] Opened CFile writer 
> for column sale_date[timestamp NOT NULL]
> I0219 10:03:29.460197 31721 multi_column_writer.cc:82] Opened CFile writer 
> for column sale_number[int64 NOT NULL]
> I0219 10:03:29.460340 31721 multi_column_writer.cc:82] Opened CFile writer 
> for column gmv[double NOT NULL]
> W0219 10:03:29.572443 31721 compaction.cc:714] Found REINSERT REDO truncating 
> row history for Source Row: (int32 country_id=387, int64 sku_id=50804, int32 
> cur_day=387, string country_name=中桥乡, string sku_name=纯贞美胸精油 美胸紧致美胸仪器产品 美胸贴 
> 产后下垂增大美胸膏霜 少女美胸部护理, int32 brand_id=74398, string brand_name=纯贞, int64 
> product_id=3143432413, string product_name=纯贞美胸精油 美胸紧致美胸仪器产品 美胸贴 产后下垂增大美胸膏霜 
> 少女美胸部护理, int32 cid1=123, string cid1_name=个护化妆, int32 cid2=123, string 
> cid2_name=身体护肤, int32 cid3=123, string cid3_name=美胸, int8 data_type=12, 
> double price=1234, timestamp sale_date=Sat, 23 May 1970 15:07:24 GMT, int64 
> sale_number=12332, double gmv=12233) Redo Mutations: 
> [@5963145475847274496(DELETE), @5963145476833153024(REINSERT (int32 
> country_id=387, int64 sku_id=50804, int32 cur_day=387, string 
> country_name=中桥乡, string sku_name=纯贞美胸精油 美胸紧致美胸仪器产品 美胸贴 产后下垂增大美胸膏霜 少女美胸部护理, 
> int32 brand_id=74398, string brand_name=纯贞, int64 product_id=3143432413, 
> string product_name=纯贞美胸精油 美胸紧致美胸仪器产品 美胸贴 产后下垂增大美胸膏霜 少女美胸部护理, int32 cid1=123, 
> string cid1_name=个护化妆, int32 cid2=123, string cid2_name=身体护肤, int32 cid3=123, 
> string cid3_name=美胸, int8 data_type=12, double price=1234, timestamp 
> sale_date=Sat, 23 May 1970 15:07:24 GMT, int64 sale_number=12332, double 
> gmv=12233))] Undo Mutations: [@5963144433666650112(DELETE)]
> Dest Row: (int32 country_id=387, int64 sku_id=50804, int32 cur_day=387, 
> string country_name=中桥乡, string sku_name=纯贞美胸精油 美胸紧致美胸仪器产品 美胸贴 产后下垂增大美胸膏霜 
> 少女美胸部护理, int32 brand_id=74398, string brand_name=纯贞, int64 
> product_id=3143432413, string product_name=纯贞美胸精油 美胸紧致美胸仪器产品 美胸贴 产后下垂增大美胸膏霜 
> 少女美胸部护理, int32 cid1=123, string cid1_name=个护化妆, int32 cid2=123, string 
> cid2_name=身体护肤, int32 cid3=123, string cid3_name=美胸, int8 data_type=12, 
> double price=1234, timestamp sale_date=Sat, 23 May 1970 15:07:24 GMT, int64 
> sale_number=12332, double gmv=12233) Redo Mutations: [] Undo Mutations: 
> [@5963145476833153024(DELETE)] Note: this warning will appear only for the 
> first truncated row
> W0219 10:03:32.302188 31721 compaction.cc:834] Total 7 rows lost some history 
> due to REINSERT after DELETE
> I0219 10:03:32.346256 31721 tablet.cc:1222] Flush: entering phase 2 (starting 
> to duplicate updates in new rowsets)
> I0219 10:03:32.846774 31721 tablet.cc:1277] Flush Phase 2: carrying over any 
> updates which arrived during Phase 1
> I0219 10:03:32.846839 31721 tablet.cc:1278] Phase 2 snapshot: 
> MvccSnapshot[committed={T|T < 5963147244092280832 or (T in 
> {5963147244092280832})}]
> I0219 10:03:34.485435 31721 tablet.cc:1311] Flush successful on 52051 rows 
> (19462877 bytes)
> I0219 10:03:34.501569 31721 maintenance_manager.cc:359] Time spent running 
> FlushMRSOp(c6069618c9864c1d8422912bf69efd44): real 5.541s  user 5.454s     
> sys 0.081s
> I0219 10:05:45.657285 31721 tablet.cc:1333] Compaction: stage 1 complete, 
> picked 6 rowsets to compact
> I0219 10:05:45.657618 31721 compaction.cc:591] Selected 6 rowsets to compact:
> I0219 10:05:45.657727 31721 compaction.cc:594] RowSet(15418)(current size on 
> disk: ~30348706 bytes)
> I0219 10:05:45.657793 31721 compaction.cc:594] RowSet(15544)(current size on 
> disk: ~1123338 bytes)
> I0219 10:05:45.657855 31721 compaction.cc:594] RowSet(15551)(current size on 
> disk: ~26151132 bytes)
> I0219 10:05:45.657886 31721 compaction.cc:594] RowSet(15610)(current size on 
> disk: ~30315579 bytes)
> I0219 10:05:45.657913 31721 compaction.cc:594] RowSet(15611)(current size on 
> disk: ~32233272 bytes)
> I0219 10:05:45.657934 31721 compaction.cc:594] RowSet(15617)(current size on 
> disk: ~18863054 bytes)
> I0219 10:05:45.657984 31721 tablet.cc:1132] Compaction: entering phase 1 
> (flushing snapshot). Phase 1 snapshot: MvccSnapshot[committed={T|T < 
> 5963147244092280832 or (T in {5963147244092280832})}]
> I0219 10:05:45.661224 31721 multi_column_writer.cc:82] Opened CFile writer 
> for column country_id[int32 NOT NULL]
> I0219 10:05:45.661432 31721 multi_column_writer.cc:82] Opened CFile writer 
> for column sku_id[int64 NOT NULL]
> I0219 10:05:45.661598 31721 multi_column_writer.cc:82] Opened CFile writer 
> for column cur_day[int32 NOT NULL]
> I0219 10:05:45.661761 31721 multi_column_writer.cc:82] Opened CFile writer 
> for column country_name[string NOT NULL]
> I0219 10:05:45.661933 31721 multi_column_writer.cc:82] Opened CFile writer 
> for column sku_name[string NOT NULL]
> I0219 10:05:45.662081 31721 multi_column_writer.cc:82] Opened CFile writer 
> for column brand_id[int32 NOT NULL]
> I0219 10:05:45.662228 31721 multi_column_writer.cc:82] Opened CFile writer 
> for column brand_name[string NOT NULL]
> I0219 10:05:45.662384 31721 multi_column_writer.cc:82] Opened CFile writer 
> for column product_id[int64 NOT NULL]
> I0219 10:05:45.662542 31721 multi_column_writer.cc:82] Opened CFile writer 
> for column product_name[string NOT NULL]
> I0219 10:05:45.662695 31721 multi_column_writer.cc:82] Opened CFile writer 
> for column cid1[int32 NOT NULL]
> I0219 10:05:45.662832 31721 multi_column_writer.cc:82] Opened CFile writer 
> for column cid1_name[string NOT NULL]
> I0219 10:05:45.662962 31721 multi_column_writer.cc:82] Opened CFile writer 
> for column cid2[int32 NOT NULL]
> I0219 10:05:45.663100 31721 multi_column_writer.cc:82] Opened CFile writer 
> for column cid2_name[string NOT NULL]
> I0219 10:05:45.663239 31721 multi_column_writer.cc:82] Opened CFile writer 
> for column cid3[int32 NOT NULL]
> I0219 10:05:45.663393 31721 multi_column_writer.cc:82] Opened CFile writer 
> for column cid3_name[string NOT NULL]
> I0219 10:05:45.663538 31721 multi_column_writer.cc:82] Opened CFile writer 
> for column data_type[int8 NOT NULL]
> I0219 10:05:45.663679 31721 multi_column_writer.cc:82] Opened CFile writer 
> for column price[double NOT NULL]
> I0219 10:05:45.663830 31721 multi_column_writer.cc:82] Opened CFile writer 
> for column sale_date[timestamp NOT NULL]
> I0219 10:05:45.663964 31721 multi_column_writer.cc:82] Opened CFile writer 
> for column sale_number[int64 NOT NULL]
> I0219 10:05:45.664091 31721 multi_column_writer.cc:82] Opened CFile writer 
> for column gmv[double NOT NULL]
> F0219 10:06:01.038265 31721 deltafile.cc:136] Check failed: 
> last_key_.CompareTo<UNDO>(key) <= 0 must insert undo deltas in sorted order 
> (ascending key, then descending ts): got key (row 
> 278573@tx5962430420551352320) after (row 278573@tx5962430179277082624)
> {noformat}
> It appears that the timestamps are misordered (in ascending order instead of 
> descending order), triggering a DCHECK. We don't normally see this in load 
> tests because we run with RELEASE mode in most of our long running stress 
> tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to