zclllyybb commented on issue #64227:
URL: https://github.com/apache/doris/issues/64227#issuecomment-4647957349
Initial triage result: this looks like a real master build/linkage bug, not
an environment-only TSAN report.
I checked refreshed `upstream/master` at
`8255f94bc5fbee08c45f133a3d2fad87667e3e03`. `BlockFileCache::get_cell` is
declared as a constrained template in `be/src/io/cache/block_file_cache.h` and
defined only in `be/src/io/cache/block_file_cache.cpp`. The failing
`BlockFileCacheTest.late_holder_remove_skips_replaced_cache_cell` path calls
`cache.get_cell(key, 0, cache_lock)` directly from the test translation unit,
and `SCOPED_CACHE_LOCK` creates `std::lock_guard<std::mutex> cache_lock`.
That matches the undefined symbol:
```cpp
doris::io::BlockFileCache::get_cell<std::lock_guard<std::mutex>>(...)
```
I found an explicit instantiation for `BlockFileCache::remove(...
std::lock_guard<std::mutex>&, std::lock_guard<std::mutex>&, ...)`, and the same
explicit-instantiation pattern exists for `LRUQueue`, but I did not find an
explicit instantiation for
`BlockFileCache::get_cell<std::lock_guard<std::mutex>>` in `be/src/io/cache`.
So the reported root cause is consistent with the source. Because the template
definition is not visible in the header, the test translation unit compiles
against the declaration alone and relies on `block_file_cache.cpp` to emit the
instantiation it calls. A focused fix is to add the missing explicit
instantiation next to the existing ones:
```cpp
template FileBlockCell* BlockFileCache::get_cell(const UInt128Wrapper& hash,
                                                 size_t offset,
                                                 std::lock_guard<std::mutex>& cache_lock);
```
No behavior change should be expected; this is a
template-instantiation/linkage fix.
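
For readers less familiar with the pattern, here is a minimal sketch of why the explicit instantiation matters. It is not Doris code: `DemoCache`, its `put` helper, and the `int` cell type are hypothetical stand-ins, and both "files" are collapsed into one listing so it compiles on its own. In a real split build, deleting the last statement would leave any other translation unit that calls `get_cell` with a `std::lock_guard<std::mutex>` with an undefined symbol at link time.

```cpp
#include <cstddef>
#include <map>
#include <mutex>

// ---- would live in a header (hypothetical demo_cache.h) ----
class DemoCache {
public:
    // Hypothetical helper so the sketch has something to look up.
    void put(std::size_t offset, int value) { cells_[offset] = value; }

    // Declaration only: code that includes the header cannot see the body,
    // so it cannot instantiate the template itself.
    template <class Lock>
    int* get_cell(std::size_t offset, Lock& cache_lock);

private:
    std::map<std::size_t, int> cells_;
};

// ---- would live in the source file (hypothetical demo_cache.cpp) ----
template <class Lock>
int* DemoCache::get_cell(std::size_t offset, Lock& /*cache_lock*/) {
    // The lock parameter only proves that the caller holds the lock.
    auto it = cells_.find(offset);
    return it == cells_.end() ? nullptr : &it->second;
}

// Explicit instantiation definition: forces this translation unit to emit
// DemoCache::get_cell<std::lock_guard<std::mutex>> so that other translation
// units can link against it. The template argument is deduced from the
// parameter list.
template int* DemoCache::get_cell(std::size_t offset,
                                  std::lock_guard<std::mutex>& cache_lock);
```

A caller in another file would then only need the header, a `std::mutex`, and a `std::lock_guard<std::mutex>` to pass as `cache_lock`, which mirrors how the failing test uses `SCOPED_CACHE_LOCK`.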
There is already a matching public PR:
https://github.com/apache/doris/pull/64228. Suggested next steps for
maintainers:
1. Review that PR as the direct fix for this issue.
2. Ask for `bash build.sh --be --ut --tsan` or the failing BE UT link step
to be rerun after the fix.
3. Split out any test-file changes in the PR that are not required for this
TSAN linkage failure, so the issue fix remains narrow.
No additional reproduction information is needed to accept the diagnosis. If
the proposed fix still fails, the next useful evidence would be the full TSAN
link command and the `block_file_cache.cpp.o` symbol table for the `get_cell`
specialization.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]