RE: 'Racing read got wrong version' during proxy write testing

Wang, Zhiqiang Mon, 08 Jun 2015 04:33:05 -0700

Oops, we have another check in do_op which returns ealier when r is ENOENT and 
obc doesn't exist.


  if (r && (r != -ENOENT || !obc)) {
    dout(20) << __func__ << "find_object_context got error " << r << dendl;
    osd->reply_op_error(op, r);
    return;
  }

We can't simply remove this check as with the obs not existing case because the 
obc is used later...

-----Original Message-----
From: Wang, Zhiqiang 
Sent: Friday, June 5, 2015 8:29 AM
To: Samuel Just
Cc: David Zafman; Sage Weil; [email protected]
Subject: RE: 'Racing read got wrong version' during proxy write testing

Hi Sam,

You are right, making the 'copy get' as a cache op works for the promotion 
case, but it has some troubles in the copy from operation. I tried it yesterday 
and ran into some failures on the copy from operation. I think removing the 
check in do_op is the right answer. For coy_get, I think we can still return 
ENOENT along with the reqids in the payload, instead of returning 0 and 
indicating it doesn't exist in the payload. This can keep most of the 
'finishing promote' code unchanged. What do you think?

-----Original Message-----
From: Samuel Just [mailto:[email protected]] 
Sent: Friday, June 5, 2015 2:20 AM
To: Wang, Zhiqiang
Cc: David Zafman; Sage Weil; [email protected]
Subject: Re: 'Racing read got wrong version' during proxy write testing

Yeah, I think the right answer is to remove that check and restructure all of 
the existing reader ops which rely on it to return ENOENT from do_osd_ops.  For 
copy_get, we instead return 0 and indicate in the structured payload that the 
object logically does not exist (along with the relevant log entries).
-Sam

----- Original Message -----
From: "Samuel Just" <[email protected]>
To: "Zhiqiang Wang" <[email protected]>
Cc: "David Zafman" <[email protected]>, "Sage Weil" <[email protected]>, 
[email protected]
Sent: Thursday, June 4, 2015 11:12:14 AM
Subject: Re: 'Racing read got wrong version' during proxy write testing

copy_get can happen outside of the context of a cache op, so we can't 
(shouldn't) just flag it as a cache op.  I wonder whether it wouldn't be better 
to live without that check at all and let all of the ops in do_osd_ops 
individually return ENOENT as appropriate.
-Sam

----- Original Message -----
From: "Zhiqiang Wang" <[email protected]>
To: "David Zafman" <[email protected]>, "Sage Weil" <[email protected]>
Cc: [email protected]
Sent: Wednesday, June 3, 2015 7:08:35 PM
Subject: RE: 'Racing read got wrong version' during proxy write testing

Hi David,

Proxy write hasn't been merge into master yet. It's not likely this is causing 
#11511.

-----Original Message-----
From: David Zafman [mailto:[email protected]] 
Sent: Thursday, June 4, 2015 9:46 AM
To: Wang, Zhiqiang; Sage Weil
Cc: [email protected]
Subject: Re: 'Racing read got wrong version' during proxy write testing


I'm wonder if this issue could be the cause of #11511.  Could a proxy write 
have raced with the fill_in_copy_get() so object_info_t size doesn't correspond 
with the size of the object in the filestore?

David


On 6/3/15 6:22 PM, Wang, Zhiqiang wrote:
> Making the 'copy get' op to be a cache op seems like a good idea.
>
> -----Original Message-----
> From: Sage Weil [mailto:[email protected]]
> Sent: Thursday, June 4, 2015 9:14 AM
> To: Wang, Zhiqiang
> Cc: [email protected]
> Subject: RE: 'Racing read got wrong version' during proxy write 
> testing
>
> On Wed, 3 Jun 2015, Wang, Zhiqiang wrote:
>> I ran into the 'op not idempotent' problem during the testing today.
>> There is one bug in the previous fix. In that fix, we copy the reqids 
>> in the final step of 'fill_in_copy_get'. If the object is deleted, 
>> since the 'copy get' op is a read op, it returns earlier with ENOENT in 
>> do_op.
>> No reqids will be copied during promotion in this case. This again 
>> leads to the 'op not idempotent' problem. We need a 'smart' way to 
>> detect the op is a 'copy get' op (looping the ops vector doesn't seem
>> smart?) and copy the reqids in this case.
> Hmm.  I think the idea here is/was that that ENOENT would somehow include the 
> reqid list from PGLog::get_object_reqids().
>
> I think teh trick is getting it past the generic check in do_op:
>
>    if (!op->may_write() &&
>        !op->may_cache() &&
>        (!obc->obs.exists ||
>         ((m->get_snapid() != CEPH_SNAPDIR) &&
>       obc->obs.oi.is_whiteout()))) {
>      reply_ctx(ctx, -ENOENT);
>      return;
>    }
>
> Maybe we mark these as cache operations so that may_cache is true?
>
> Sam, what do you think?
>
> sage
>
>
>> -----Original Message-----
>> From: Sage Weil [mailto:[email protected]]
>> Sent: Tuesday, May 26, 2015 12:27 AM
>> To: Wang, Zhiqiang
>> Cc: [email protected]
>> Subject: Re: 'Racing read got wrong version' during proxy write 
>> testing
>>
>> On Mon, 25 May 2015, Wang, Zhiqiang wrote:
>>> Hi all,
>>>
>>> I ran into a problem during the teuthology test of proxy write. It is like 
>>> this:
>>>
>>> - Client sends 3 writes and a read on the same object to base tier
>>> - Set up cache tiering
>>> - Client retries ops and sends the 3 writes and 1 read to the cache 
>>> tier
>>> - The 3 writes finished on the base tier, say with versions v1, v2 
>>> and
>>> v3
>>> - Cache tier proxies the 1st write, and start to promote the object 
>>> for the 2nd write, the 2nd and 3rd writes and the read are blocked
>>> - The proxied 1st write finishes on the base tier with version v4, 
>>> and returns to cache tier. But somehow the cache tier fails to send 
>>> the reply due to socket failure injecting
>>> - Client retries the writes and the read again, the writes are 
>>> identified as dup ops
>>> - The promotion finishes, it copies the pg_log entries from the base 
>>> tier and put it in the cache tier's pg_log. This includes the 3 
>>> writes on the base tier and the proxied write
>>> - The writes dispatches after the promotion, they are identified as 
>>> completed dup ops. Cache tier replies these write ops with the 
>>> version from the base tier (v1, v2 and v3)
>>> - In the last, the read dispatches, it reads the version of the 
>>> proxied write (v4) and replies to client
>>> - Client complains that 'racing read got wrong version'
>>>
>>> In a previous discussion of the 'ops not idempotent' problem, we solved it 
>>> by copying the pg_log entries in the base tier to cache tier during 
>>> promotion. Seems like there is still a problem with this approach in the 
>>> above scenario. My first thought is that when proxying the write, the cache 
>>> tier should use the original reqid from the client. But currently we don't 
>>> have a way to pass the original reqid from cache to base. Any ideas?
>> I agree--I think the correct fix here is to make the proxied op be 
>> recognized as a dup.  We can either do that by passing in an optional reqid 
>> to the Objecter, or extending the op somehow so that both reqids are listed. 
>>  I think the first option will be cleaner, but I think we will also need to 
>> make sure the 'retry' count is preserved as (I think) we skip the dup check 
>> if retry==0.  And we probably want to preserve the behavior that a given 
>> (reqid, retry) only exists once in the system.
>>
>> This probably means adding more optional args to Objecter::read()...?
>>
>> sage
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel"
>> in the body of a message to [email protected] More majordomo 
>> info at  http://vger.kernel.org/majordomo-info.html
>>
>>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" 
> in the body of a message to [email protected] More majordomo 
> info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: 'Racing read got wrong version' during proxy write testing

Reply via email to