On Wed, Mar 14, 2018 at 11:13 AM, Jason Dillaman <jdill...@redhat.com> wrote:
> Maxim, can you provide steps for a reproducer?
Yes, but it involves adding two artificial delays: one in tcmu-runner and
another in kernel iscsi. If you're willing to take the pains of recompiling the
kernel and tcmu-runner on one of the gateway nodes, I'll help you reproduce it.
Generally, the idea of the reproducer is simple: let's model a situation where
two stale requests got stuck in the kernel mailbox waiting to be consumed by
tcmu-runner, and another one got stuck in the iscsi layer, immediately after
reading the iscsi request from the socket. If we unblock tcmu-runner after
newer data went through another gateway, the first stale request will
switch the tcmu-runner state from LOCKED to UNLOCKED, then the second
stale request will trigger alua_thread to re-acquire the lock, so when the
third request comes to tcmu-runner, the lock is already reacquired and it
goes to the OSD, smoothly overwriting the newer data.
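To make the sequence easier to follow, here is a toy Python model of it. It is a sketch under loud assumptions, not tcmu-runner or librbd code: `Image`, `Gateway`, `acquire` and `handle` are hypothetical names; the "OSD" only rejects blacklisted clients; taking the lock blacklists the previous owner; and "cleaning up the blacklisting" is modelled as simply removing the gateway's own blacklist entry (standing in for reconnecting as a new client). The gateway trusts its cached LOCKED/UNLOCKED state, which is the weak spot described above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional, Set, Tuple


class LockState(Enum):
    LOCKED = "LOCKED"
    UNLOCKED = "UNLOCKED"


@dataclass
class Image:
    """Toy RBD image (hypothetical): the 'OSD' only rejects blacklisted clients."""
    data: Dict[int, str] = field(default_factory=dict)
    owner: Optional[str] = None
    blacklist: Set[str] = field(default_factory=set)

    def write(self, client: str, offset: int, value: str) -> None:
        if client in self.blacklist:
            raise PermissionError(f"{client} is blacklisted")
        self.data[offset] = value


@dataclass
class Gateway:
    """Toy gateway (hypothetical) that trusts its cached lock state."""
    name: str
    image: Image
    state: LockState = LockState.UNLOCKED

    def acquire(self) -> None:
        # Assumption: taking the lock blacklists the previous owner.
        if self.image.owner is not None and self.image.owner != self.name:
            self.image.blacklist.add(self.image.owner)
        self.image.owner = self.name
        self.state = LockState.LOCKED

    def handle(self, offset: int, value: str) -> str:
        if self.state is LockState.UNLOCKED:
            # Stand-in for alua_thread: an IO arriving while UNLOCKED triggers
            # lock re-acquisition; the IO itself is failed back for retry.
            self.acquire()
            return "retry"
        try:
            self.image.write(self.name, offset, value)
            return "ok"
        except PermissionError:
            # Blacklist error: drop to UNLOCKED and "clean up" the blacklisting
            # (modelled here as removing our own entry).
            self.state = LockState.UNLOCKED
            self.image.blacklist.discard(self.name)
            return "retry"


def run_scenario() -> Tuple[List[str], str]:
    image = Image()
    gw_a, gw_b = Gateway("A", image), Gateway("B", image)
    gw_a.acquire()          # A owns the lock initially
    gw_b.acquire()          # failover: B takes the lock, A is blacklisted
    gw_b.handle(0, "new")   # newer data written through B
    # Three stale requests that were stuck on A are now unblocked.
    results = [gw_a.handle(0, f"stale{i}") for i in (1, 2, 3)]
    return results, image.data[0]


if __name__ == "__main__":
    print(run_scenario())
```

Running it prints `(['retry', 'retry', 'ok'], 'stale3')`: the first stale request flips A to UNLOCKED, the second makes A re-acquire the lock (blacklisting B), and the third passes the blacklist check and overwrites "new". In this model nothing ties a request to the lock epoch it was issued under, which is why the third one gets through.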
> On Wed, Mar 14, 2018 at 2:06 PM, Maxim Patlasov <mpatla...@skytap.com> wrote:
> > On Sun, Mar 11, 2018 at 5:10 PM, Mike Christie <mchri...@redhat.com> wrote:
> >> On 03/11/2018 08:54 AM, shadow_lin wrote:
> >> > Hi Jason,
> > >> > How is the old target gateway blacklisted? Is it a feature that the
> > >> > gateway (which can support active/passive multipath) should provide,
> > >> > or is it done only by the rbd exclusive lock?
> > >> > I think the exclusive lock only lets one client write to rbd at a
> > >> > time, but another client can obtain the lock later when the lock is
> > >> > released.
> >> For the case where we had the lock and it got taken:
> > >> If IO was blocked, then unjammed, and it has already passed the
> > >> target-level checks, then the IO will be failed by the OSD due to the
> > >> blacklisting. When we get IO errors from ceph indicating we are
> > >> blacklisted, the tcmu rbd layer will fail the IO, indicating the state
> > >> change and that the IO can be retried. We will also tell the target
> > >> layer that rbd does not have the lock anymore and to just stop the
> > >> iscsi connection while we clean up the blacklisting and running
> > >> commands and update our state.
> > Mike, can you please give more details on how you tell the target layer
> > that it does not have the lock and to stop the iscsi connection? Which
> > tcmu-runner/kernel-target functions are used for that?
> > In fact, I performed an experiment with three stale write requests stuck
> > on a blacklisted gateway, and one of them managed to overwrite newer data.
> > I followed all the instructions from
> > http://docs.ceph.com/docs/master/rbd/iscsi-target-cli-manual-install/ and
> > http://docs.ceph.com/docs/master/rbd/iscsi-target-cli/, so I'm wondering
> > what I'm missing...
> > Thanks,
> > Maxim
ceph-users mailing list