zhannngchen opened a new pull request, #19619:
URL: https://github.com/apache/doris/pull/19619
# Proposed changes
Issue Number: close #xxx
## Problem summary
When segment compaction is enabled, our p1 case tpcds_sf1_unique_p1 causes a BE core dump:
```
0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /mnt/disk2/zhangchen/doris-dev/be/src/common/signal_handler.h:413
1# 0x00007FEBC3D35570 in /lib64/libc.so.6
2# __GI_raise in /lib64/libc.so.6
3# abort in /lib64/libc.so.6
4# 0x0000556D14814819 in /mnt/disk2/zhangchen/deploy/be/lib/doris_be
5# 0x0000556D14809E2D in /mnt/disk2/zhangchen/deploy/be/lib/doris_be
6# google::LogMessage::SendToLog() in /mnt/disk2/zhangchen/deploy/be/lib/doris_be
7# google::LogMessage::Flush() in /mnt/disk2/zhangchen/deploy/be/lib/doris_be
8# google::LogMessageFatal::~LogMessageFatal() in /mnt/disk2/zhangchen/deploy/be/lib/doris_be
9# doris::Tablet::calc_delete_bitmap(std::shared_ptr<doris::Rowset>, std::vector<std::shared_ptr<doris::segment_v2::Segment>, std::allocator<std::shared_ptr<doris::segment_v2::Segment> > > const&, std::unordered_set<doris::RowsetId, doris::HashOfRowsetId, std::equal_to<doris::RowsetId>, std::allocator<doris::RowsetId> > const*, std::shared_ptr<doris::DeleteBitmap>, long, bool, doris::RowsetWriter*) in /mnt/disk2/zhangchen/deploy/be/lib/doris_be
10# doris::DeltaWriter::close_wait(doris::PSlaveTabletNodes const&, bool) at /mnt/disk2/zhangchen/doris-dev/be/src/olap/delta_writer.cpp:403
11# doris::TabletsChannel::_close_wait(doris::DeltaWriter*, google::protobuf::RepeatedPtrField<doris::PTabletInfo>*, google::protobuf::RepeatedPtrField<doris::PTabletError>*, doris::PSlaveTabletNodes, bool) at /mnt/disk2/zhangchen/doris-dev/be/src/runtime/tablets_channel.cpp:228
12# doris::TabletsChannel::close(doris::LoadChannel*, int, long, bool*, google::protobuf::RepeatedField<long> const&, google::protobuf::RepeatedPtrField<doris::PTabletInfo>*, google::protobuf::RepeatedPtrField<doris::PTabletError>*, google::protobuf::Map<long, doris::PSlaveTabletNodes> const&, google::protobuf::Map<long, doris::PSuccessSlaveTabletNodeIds>*, bool) at /mnt/disk2/zhangchen/doris-dev/be/src/runtime/tablets_channel.cpp:193
13# doris::LoadChannel::_handle_eos(std::shared_ptr<doris::TabletsChannel>&, doris::PTabletWriterAddBlockRequest const&, doris::PTabletWriterAddBlockResult*) at /mnt/disk2/zhangchen/doris-dev/be/src/runtime/load_channel.h:125
14# doris::LoadChannel::add_batch(doris::PTabletWriterAddBlockRequest const&, doris::PTabletWriterAddBlockResult*) at /mnt/disk2/zhangchen/doris-dev/be/src/runtime/load_channel.cpp:126
15# doris::LoadChannelMgr::add_batch(doris::PTabletWriterAddBlockRequest const&, doris::PTabletWriterAddBlockResult*) at /mnt/disk2/zhangchen/doris-dev/be/src/runtime/load_channel_mgr.cpp:183
16# doris::PInternalServiceImpl::_tablet_writer_add_block(google::protobuf::RpcController*, doris::PTabletWriterAddBlockRequest const*, doris::PTabletWriterAddBlockResult*, google::protobuf::Closure*)::$_13::operator()() const at /mnt/disk2/zhangchen/doris-dev/be/src/service/internal_service.cpp:355
```
The core dump is caused by a DCHECK failure:
```
F0513 22:48:56.059758 3996895 tablet.cpp:2690] Check failed: num_to_read == num_read
```
Eventually, we found that the DCHECK failure is caused by the page cache:
1. At first we have 20 segments, whose ids are 0-19.
2. For a merge-on-write (MoW) table, the memtable flush process calculates the delete bitmap. During this procedure, the index pages and data pages of the PrimaryKeyIndex are loaded into the page cache.
3. Segment compaction compacts all these segments into 2 segments and renames them to ids 0 and 1.
4. Finally, before the load commits, we calculate the delete bitmap between the segments in the current rowset. This procedure needs to iterate over the primary key index of each segment, but when we access the data of the newly compacted segments, we actually read the data of the old segments from the page cache (illustrated by the sketch below).
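
To make the failure mode concrete, here is a minimal sketch of a page cache keyed only by file path and offset. The type and names are hypothetical simplifications, not the actual Doris StoragePageCache interface; it only demonstrates why a renamed file that reuses a path hits stale entries:
```cpp
#include <cstdint>
#include <iostream>
#include <map>
#include <string>
#include <tuple>

// Hypothetical simplification: the cache key identifies a page only by
// file path and offset, with no notion of a file "version".
struct PageCacheKey {
    std::string file_path;
    int64_t offset;
    bool operator<(const PageCacheKey& rhs) const {
        return std::tie(file_path, offset) <
               std::tie(rhs.file_path, rhs.offset);
    }
};

int main() {
    std::map<PageCacheKey, std::string> page_cache;

    // 1. Memtable flush loads a primary-key-index page of the OLD segment 0.
    page_cache[{"rowset_0.dat", 0}] = "page of OLD segment 0";

    // 2. Segment compaction writes a new file and renames it to the SAME
    //    path "rowset_0.dat"; the cache knows nothing about the rename.

    // 3. Delete-bitmap calculation reads the NEW segment 0, but the key
    //    {path, offset} still matches, so the stale page is returned.
    std::cout << page_cache[{"rowset_0.dat", 0}] << std::endl;
}
```
With this keying, the rename in step 3 silently aliases the new segment onto the pages cached for the old one.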
To fix this issue, there are two candidate policies:
1. Add a crc32 checksum or a last-modified time to the CacheKey.
2. Invalidate the related cache keys after segment compaction.
For policy 1, we don't have a crc32 in the segment footer, and getting the last-modified time requires one additional disk IO (see the helper sketched below).
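
For reference, fetching the last-modified time means an extra metadata syscall (and, on a cold inode, a disk IO) every time a cache key is built. This helper is purely illustrative, not an existing Doris function:
```cpp
#include <sys/stat.h>
#include <cstdint>
#include <string>

// Illustrative only: deriving a file "version" from the last-modified
// time costs one extra stat() per cache key construction.
int64_t last_modified_seconds(const std::string& path) {
    struct stat st;
    if (::stat(path.c_str(), &st) != 0) {
        return -1;  // let the caller handle the lookup failure
    }
    return static_cast<int64_t>(st.st_mtime);
}
```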
For policy 2, we would need to add extra page cache invalidation methods, which may make the page cache less stable.
So I think we can simply add the file size to the cache key to identify that the file has changed. In an LSM-Tree, every modification generates new files; reusing a file name is not the normal case (as far as I know, only segment compaction does it), so the file size is enough to identify a file change.
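
A minimal sketch of the idea, assuming a simplified key type (the field names are illustrative, not the exact ones in the Doris cache key): with the file size in the key, pages cached for the old file can no longer match lookups for the compacted file that reuses its name.
```cpp
#include <cstdint>
#include <string>
#include <tuple>

// Hypothetical extended key: adding file_size gives each path a cheap
// "version" that is already known in memory, so no extra disk IO is
// needed to build the key.
struct PageCacheKey {
    std::string file_path;
    int64_t file_size;  // NEW: distinguishes old vs. compacted file
    int64_t offset;

    bool operator<(const PageCacheKey& rhs) const {
        return std::tie(file_path, file_size, offset) <
               std::tie(rhs.file_path, rhs.file_size, rhs.offset);
    }
};
```
Stale entries keyed with the old size simply become unreachable and age out through normal LRU eviction, so no explicit invalidation path is needed.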
## Checklist(Required)
* [ ] Does it affect the original behavior
* [ ] Has unit tests been added
* [ ] Has document been added or modified
* [ ] Does it need to update dependencies
* [ ] Is this PR support rollback (If NO, please explain WHY)
## Further comments
If this is a relatively large or complex change, kick off the discussion at
[[email protected]](mailto:[email protected]) by explaining why you
chose the solution you did and what alternatives you considered, etc...