jemc opened a new issue, #2853:
URL: https://github.com/apache/kvrocks/issues/2853

   ### Search before asking
   
   - [x] I had searched in the 
[issues](https://github.com/apache/kvrocks/issues) and found no similar issues.
   
   
   ### Version
   
   This is data race bug is present both on the latest release tag (2.11.1) and 
on the latest unstable branch commit 
(1639c90f094728fee860c3c0f518350df40ecb90), here:
   
https://github.com/apache/kvrocks/blob/1639c90f094728fee860c3c0f518350df40ecb90/src/server/server.h#L202
   
   ### Minimal reproduce step
   
   This bug was discovered just by reading the code.
   
   Data races are usually difficult to trigger reliably in practice, so I don't 
have steps which would reliably trigger misbehavior at runtime.
   
   
   ### What did you expect to see?
   
   I expect to see writable fields of `Server` to be guarded by either locks or 
atomic operations, for both writes and reads, if that field is 
writable/readable from concurrent threads, to avoid data races.
   
   Failing to do so results in undefined behavior, up to and including SIGSEGV.
   
   ### What did you see instead?
   
   The `Server::IsSlave()` method is implemented by reading the `master_host_` 
field and checking if it's an empty string:
   
https://github.com/apache/kvrocks/blob/1639c90f094728fee860c3c0f518350df40ecb90/src/server/server.h#L202
   
   This field can be written to at runtime, whenever a new master is assigned 
(e.g. via the `REPLICAOF`/`SLAVEOF` commands), inside the `Server::AddMaster` 
method here:
   
https://github.com/apache/kvrocks/blob/a2051946b655d46ffbdfc93b71b8f4d91d9d4868/src/server/server.cc#L305
   
   That method which writes to this field is guarded by the `slaveof_mu_` lock 
here:
   
https://github.com/apache/kvrocks/blob/a2051946b655d46ffbdfc93b71b8f4d91d9d4868/src/server/server.cc#L278
   
   But the `Server::IsSlave()` method is _not_ guarded by the `slaveof_mu_` 
lock: 
   
https://github.com/apache/kvrocks/blob/1639c90f094728fee860c3c0f518350df40ecb90/src/server/server.h#L202
   
   And it is called from call sites that definitely originate in other threads, 
and are _not_ guarded by the `slaveof_mu_` lock.
   
   One such example (though there are probably more) is inside the 
`Connection::ExecuteCommands()` method, here:
   
https://github.com/apache/kvrocks/blob/1639c90f094728fee860c3c0f518350df40ecb90/src/server/redis_connection.cc#L488-L491
   
   In that example, if a `REPLICAOF`/`SLAVEOF` command (which writes this field 
via the `Server::AddMaster()` method) is being processed in parallel (on 
multiple worker threads) with any other command (which reads this field via the 
`Server::IsSlave()` method that is checked before executing), then we will have 
a write colliding with a parallel read to the same memory - a classic example 
of a data race. Undefined behavior will result, up to and including a potential 
`SIGSEGV`.
   
   ### Anything Else?
   
   Note that introducing a guard with the `slaveof_mu_` lock inside 
`Server::IsSlave` would solve this issue, but it will also have a performance 
cost. It might be worth coming up with a design based on atomic operations 
instead, though that will require more subtle thinking and will still not be 
zero-overhead.
   
   ### Are you willing to submit a PR?
   
   - [ ] I'm willing to submit a PR!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to