zclllyybb commented on issue #64227:
URL: https://github.com/apache/doris/issues/64227#issuecomment-4647957349

   Breakwater-GitHub-Analysis-Slot: slot_59868e2812ba
   
   Initial triage result: this looks like a real master build/linkage bug, not 
an environment-only TSAN report.
   
   I checked refreshed `upstream/master` at 
`8255f94bc5fbee08c45f133a3d2fad87667e3e03`. `BlockFileCache::get_cell` is 
declared as a constrained template in `be/src/io/cache/block_file_cache.h` and 
defined only in `be/src/io/cache/block_file_cache.cpp`. The failing 
`BlockFileCacheTest.late_holder_remove_skips_replaced_cache_cell` path calls 
`cache.get_cell(key, 0, cache_lock)` directly from the test translation unit, 
and `SCOPED_CACHE_LOCK` creates `std::lock_guard<std::mutex> cache_lock`.
   
   That matches the undefined symbol:
   
   ```cpp
   doris::io::BlockFileCache::get_cell<std::lock_guard<std::mutex>>(...)
   ```
   
   I found an explicit instantiation for `BlockFileCache::remove(... 
std::lock_guard<std::mutex>&, std::lock_guard<std::mutex>&, ...)`, and the same 
explicit-instantiation pattern exists for `LRUQueue`, but I did not find an 
explicit instantiation for 
`BlockFileCache::get_cell<std::lock_guard<std::mutex>>` in `be/src/io/cache`.
   
   So the reported root cause is consistent with the source. Because the template 
definition is not in the header, the test translation unit compiles against the 
declaration alone and cannot instantiate `get_cell` itself, so the linker needs 
an instantiation emitted from `block_file_cache.cpp`. A focused fix is to add 
the missing explicit instantiation near the existing explicit template 
instantiations:
   
   ```cpp
   template FileBlockCell* BlockFileCache::get_cell(const UInt128Wrapper& hash, size_t offset,
                                                    std::lock_guard<std::mutex>& cache_lock);
   ```
   
   No behavior change is expected; this is purely a 
template-instantiation/linkage fix.
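   
   For readers less familiar with the pattern, here is a minimal sketch of the 
same mechanism outside Doris. `DemoCache`, `Cell`, and `put` are hypothetical 
names invented for illustration; the real `get_cell` is constrained and takes a 
hash key, both of which are omitted here. The comments mark which parts would 
sit in the header and which in the `.cpp` file:
   
   ```cpp
   #include <cstddef>
   #include <map>
   #include <mutex>

   // ---- would live in the header: declaration only ----
   struct Cell {
       std::size_t offset;
   };

   class DemoCache {
   public:
       // Helper for the sketch only: store a cell at the given offset.
       void put(std::size_t offset) { cells_.emplace(offset, Cell{offset}); }

       // Declared but not defined here, so a caller that includes only the
       // header has no body from which to instantiate the template.
       template <class Lock>
       Cell* get_cell(std::size_t offset, Lock& cache_lock);

   private:
       std::map<std::size_t, Cell> cells_;
   };

   // ---- would live in the .cpp file: definition ----
   template <class Lock>
   Cell* DemoCache::get_cell(std::size_t offset, Lock& /*cache_lock*/) {
       auto it = cells_.find(offset);
       return it == cells_.end() ? nullptr : &it->second;
   }

   // Explicit instantiation definition: asks the compiler to emit
   // DemoCache::get_cell<std::lock_guard<std::mutex>> in this object file,
   // which is the symbol an external caller would otherwise fail to link.
   template Cell* DemoCache::get_cell(std::size_t offset,
                                      std::lock_guard<std::mutex>& cache_lock);
   ```
   
   In a single file the compiler can see the definition and would instantiate 
the template implicitly, so the undefined-symbol error only appears once the 
declaration and definition are split across translation units as described 
above.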
   
   There is already a matching public PR: 
https://github.com/apache/doris/pull/64228. Suggested next steps for 
maintainers:
   
   1. Review that PR as the direct fix for this issue.
   2. Ask for `bash build.sh --be --ut --tsan` or the failing BE UT link step 
to be rerun after the fix.
   3. Consider splitting out any test-file changes in the PR that are not 
required for this TSAN linkage failure, so the issue fix remains narrow.
   
   No additional reproduction information is needed to accept the diagnosis. If 
the proposed fix still fails, the next useful evidence would be the full TSAN 
link command and the `block_file_cache.cpp.o` symbol table for the `get_cell` 
specialization.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]