guoliushui opened a new issue, #3274:
URL: https://github.com/apache/brpc/issues/3274

   **Describe the bug**
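   While porting to Huawei ARM platforms, we found that on the ARM (aarch64) architecture bthread_join() does not guarantee that memory writes made by the joined thread are visible to the joining thread. After bthread_join() returns, the joining thread may read stale data instead of the values the joined thread wrote.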
   
   Root cause analysis:
   In TaskGroup::join() (src/bthread/task_group.cpp:502-507):
   
   ```cpp
   while (*m->version_butex == expected_version) {
       if (butex_wait(m->version_butex, expected_version, NULL) < 0 &&
           errno != EWOULDBLOCK && errno != EINTR) {
           return errno;
       }
   }
   // No acquire fence here
   return 0;
   ```
   
   On the producer side (when the bthread finishes, task_group.cpp:340-345), the unlock of the pthread_spinlock_t provides release semantics:
   ```cpp
   {
       BAIDU_SCOPED_LOCK(m->version_lock);  // spinlock lock (acquire)
       if (0 == ++*m->version_butex) {      // plain write
           ++*m->version_butex;
       }
   }                                         // spinlock unlock (release)
   ```
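   For comparison, here is a minimal sketch of a producer whose counter update carries release semantics by itself. It assumes the counter were a `std::atomic<std::uint32_t>` (brpc's actual type may differ) and uses `std::mutex` in place of the spinlock; `VersionSlot` and `bump_version` are hypothetical names. In the C++ memory model, an acquire load of the counter synchronizes with a release store to that same counter, not with the unlock of a separate lock, so the release is placed on the store itself.

```cpp
#include <atomic>
#include <cstdint>
#include <mutex>

// Hypothetical model of the producer side: an atomic counter instead of a
// plain integer, and std::mutex standing in for the pthread spinlock that
// serializes writers.
struct VersionSlot {
    std::mutex version_lock;
    std::atomic<std::uint32_t> version{1};
};

// Bump the version, skipping 0 on wrap-around as the original code does.
// All writers hold version_lock, so a relaxed load of the current value is
// enough; the release store publishes every write this thread made before
// the bump to any reader that observes the new value with acquire ordering.
std::uint32_t bump_version(VersionSlot& s) {
    std::lock_guard<std::mutex> guard(s.version_lock);
    std::uint32_t next = s.version.load(std::memory_order_relaxed) + 1;
    if (next == 0) {
        next = 1;  // mirror the original "if (0 == ++v) ++v;" skip of zero
    }
    s.version.store(next, std::memory_order_release);
    return next;
}
```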
   
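   However, the consumer side (join) exits the while loop through a plain read of *m->version_butex, without any acquire barrier. As a result, the release performed by the writer's version_lock unlock has no matching acquire on the reader side.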
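   A minimal sketch of one possible fix, modeled with standard C++ rather than brpc: `std::atomic<std::uint32_t>` stands in for `*m->version_butex`, `std::this_thread::yield()` stands in for `butex_wait()`, and `joined_body`, `join_model` and `results` are hypothetical names. The wait loop keeps relaxed loads, and an acquire fence is issued once the loop exits, at the spot marked `// No acquire fence here`. The fence pairs with the release increment on the producer side, so the vector filled with `emplace_back()` (as in the report) is fully visible after the "join".

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-ins: this counter models *m->version_butex, and the
// vector models data written by the joined bthread.
std::atomic<std::uint32_t> version{1};
std::vector<int> results;

// Body of the "joined" thread: write data, then publish with a release
// increment of the version counter.
void joined_body() {
    for (int i = 0; i < 1000; ++i) {
        results.emplace_back(i);
    }
    version.fetch_add(1, std::memory_order_release);
}

// Model of the join loop. The loop condition uses relaxed loads, like the
// plain read in TaskGroup::join(); the acquire fence after the loop makes the
// relaxed load that saw the new version synchronize with the release above.
void join_model(std::uint32_t expected_version) {
    while (version.load(std::memory_order_relaxed) == expected_version) {
        std::this_thread::yield();  // stands in for butex_wait()
    }
    std::atomic_thread_fence(std::memory_order_acquire);
}
```

   Loading with `std::memory_order_acquire` directly in the loop condition would be an equivalent alternative; the fence variant mirrors the original code structure and issues the barrier only once, after the wait ends.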
   
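   On x86 (TSO), the strong hardware memory ordering masks this problem. Under ARM's weak memory model, the joining thread may observe stale cache-line data because pending entries in its invalidate queue have not yet been processed.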
   
   
   **To Reproduce**
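   In a production environment (aarch64), after bthread_join() returned, the joining thread read 0xa5a5a5a5a5a5a5a5 (jemalloc's JEMALLOC_ALLOC_JUNK fill value) from a vector that the joined thread had populated via std::vector::emplace_back(). Inside the joined thread, the pointer had been verified as valid at the time of the write.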
   
   **Expected behavior**
   
   
   **Versions**
   OS: openEuler 22.03 (LTS-SP4)
   Compiler: gcc (GCC) 14.3.1
   brpc: 1.10
   protobuf:
   
   **Additional context/screenshots**
   
   


-- 
This is an automated message from the Apache Git Service.