Re: [Gluster-devel] Some questions about requisites of translators

Xavier Hernandez Mon, 07 May 2012 01:08:15 -0700

On 05/05/2012 08:02 AM, Anand Avati wrote:

On Wed, May 2, 2012 at 3:55 AM, Xavier Hernandez wrote:
    Hello,

    I'm wondering if there are any requisites that translators must
    satisfy to work correctly inside glusterfs.

    In particular I need to know two things:

    1. Are translators required to respect the order in which they
    receive the requests?

    This is especially important in translators such as
    performance/io-threads or caching ones. It seems that these
    translators can reorder requests. If this is the case, is there
    any way to force some order between requests? Can inodelk/entrylk
    be used to force the order?
Translators are not expected to maintain ordering of requests. The
only translator which takes care of ordering calls is write-behind.
After acknowledging back write requests it has to make sure future
requests see the true "effect" as though the previous write actually
completed. To that end, it queues future "dependent" requests till the
write acknowledgement is received from the server.

inodelk/entrylk calls help achieve synchronization among clients (by
getting into a critical section) - just like a mutex. It is an
arbitrator. It does not help for ordering of two calls. If one call
must strictly complete after another call from your translator's point
of view (i.e., if it has such a requirement), then the later call's
STACK_WIND must be issued from the callback of the earlier call (its
STACK_UNWIND path). There are no guarantees maintained by the system
to ensure that a second STACK_WIND issued right after a first
STACK_WIND will complete and call back in the same order. Write-behind
does all its ordering gimmicks only because it STACK_UNWINDs a write
call prematurely and therefore must maintain the causal effects by
means of queueing new requests behind the downcall towards the server.
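
The chaining rule described above can be illustrated with a small, self-contained C sketch. It does not use the real GlusterFS macros: toy_wind(), toy_complete_all() and the callback type are hypothetical stand-ins, and the toy lower layer deliberately completes pending calls in reverse order to model a stack that gives no ordering guarantee. It only shows the idea that the second call is issued from the first call's callback.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-ins for illustration only; none of this is the GlusterFS API. */
typedef void (*toy_cbk_t)(void *state, int op_ret, int op_errno);

struct toy_pending { toy_cbk_t cbk; void *state; };

#define TOY_MAX_PENDING 8
static struct toy_pending toy_queue[TOY_MAX_PENDING];
static size_t toy_queued;

/* Records completion order, one letter per finished call. */
struct trace { char log[8]; };

static void record(struct trace *t, char c)
{
    size_t n = strlen(t->log);
    assert(n + 1 < sizeof(t->log));
    t->log[n] = c;
    t->log[n + 1] = '\0';
}

/* "Wind" a call: it is only queued here and completes later. */
static void toy_wind(toy_cbk_t cbk, void *state)
{
    assert(toy_queued < TOY_MAX_PENDING);
    toy_queue[toy_queued].cbk = cbk;
    toy_queue[toy_queued].state = state;
    toy_queued++;
}

/* Complete pending calls newest-first, i.e. NOT in issue order. */
static void toy_complete_all(void)
{
    while (toy_queued > 0) {
        struct toy_pending p = toy_queue[--toy_queued];
        p.cbk(p.state, 0, 0);
    }
}

static void second_cbk(void *state, int op_ret, int op_errno)
{
    (void)op_ret; (void)op_errno;
    record(state, 'B');
}

static void first_cbk_plain(void *state, int op_ret, int op_errno)
{
    (void)op_ret; (void)op_errno;
    record(state, 'A');
}

/* Chained variant: the second call is wound only after the first completed. */
static void first_cbk_chained(void *state, int op_ret, int op_errno)
{
    (void)op_errno;
    record(state, 'A');
    if (op_ret == 0)
        toy_wind(second_cbk, state);
}

/* Both calls wound back to back: completion order is up to the lower layer. */
static const char *run_unordered(struct trace *t)
{
    toy_wind(first_cbk_plain, t);
    toy_wind(second_cbk, t);
    toy_complete_all();
    return t->log;
}

/* Second call wound from the first callback: order is A then B. */
static const char *run_chained(struct trace *t)
{
    toy_wind(first_cbk_chained, t);
    toy_complete_all();
    return t->log;
}
```

Called with a zero-initialized struct trace, run_unordered() yields "BA" with this toy lower layer, while run_chained() yields "AB" regardless of how the lower layer schedules completions.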

Good to know

    2. Are translators required to propagate callback arguments even
    if the result of the operation is an error? And what if an
    internal translator error occurs?
Usually no. If op_ret is -1, only op_errno is expected to be a usable
value. The rest of the callback parameters are junk.
    When a translator has multiple subvolumes, I've seen that some
    arguments, such as xdata, are replaced with NULL. This can be
    understood, but are regular translators (those that only have one
    subvolume) allowed to do that or must they preserve the value of
    xdata, even in the case of an internal error?
It is best to preserve the arguments unless you know specifically what
you are doing. In case of error, all the non-op_{ret,errno} arguments
are typically junk, including xdata.
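
As a minimal sketch of the convention stated above (on op_ret == -1 only op_errno is usable), a callback can check op_ret before touching any other argument. The types and the function name below are hypothetical and much simpler than real GlusterFS callback signatures.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical reply payload; a real callback carries fop-specific types. */
struct toy_stat { long size; };

/* What the caller ends up with after the callback has run. */
struct toy_result { int ok; int err; long size; };

static void toy_stat_cbk(struct toy_result *out, int op_ret, int op_errno,
                         const struct toy_stat *buf)
{
    if (op_ret == -1) {
        /* Error path: only op_errno is meaningful. buf may be NULL or
         * point to stale data, so it is never read here. */
        out->ok = 0;
        out->err = op_errno;
        out->size = -1;
        return;
    }

    /* Success path: the remaining arguments are expected to be valid. */
    assert(buf != NULL);
    out->ok = 1;
    out->err = 0;
    out->size = buf->size;
}
```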
    If this is not a requisite, xdata loses its function of
    delivering back extra information.
Can you explain? Are you seeing a use case for having a valid xdata in
the callback even with op_ret == -1?

As part of a translator that I'm developing that works with multiple
subvolumes, I need to implement some healing support to maintain data
coherency (similar to AFR). After some thought, I decided that it could
be advantageous to use a dedicated healing translator located near the
bottom of the translator stack on the servers. This translator won't
work by itself; it only adds support to be used by a higher-level
translator, which has to manage the healing logic and decide when a
node needs to be healed.

To do this, sometimes I need to return an error because an operation
cannot be completed due to some condition related to healing itself
(not to the underlying storage). However, I need to send some specific
healing information to let the upper translator know how it has to
handle the detected condition.

I cannot send a success answer because intermediate translators could
take the fake data as valid and begin to operate incorrectly or even
create inconsistencies. The other alternative is to use op_errno to
encode the extra data, but this would also be difficult, even
impossible in some cases, due to the amount of data and the complexity
of combining it with an error code without misleading intermediate
translators with strange or invalid error codes.
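
To make the requirement concrete, here is a rough sketch of an error reply that still carries extra data. Everything in it is hypothetical: struct toy_xdata is a one-entry stand-in for an xdata dictionary, and the key "toy.heal-hint" is invented for the example. It only shows why the hint reaches the upper translator when intermediate translators preserve xdata on error, and is lost when one of them replaces it with NULL.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical one-entry stand-in for an xdata dictionary. */
struct toy_xdata { const char *key; const char *value; };

struct toy_reply { int op_ret; int op_errno; const struct toy_xdata *xdata; };

/* Intermediate translator that preserves every argument, even on error. */
static struct toy_reply pass_through(struct toy_reply r)
{
    return r;
}

/* Intermediate translator that treats xdata as junk on error. */
static struct toy_reply drop_on_error(struct toy_reply r)
{
    if (r.op_ret == -1)
        r.xdata = NULL;
    return r;
}

/* Upper translator: returns the healing hint attached to an error reply,
 * or NULL if the reply is a success or carries no recognizable hint. */
static const char *heal_hint(const struct toy_reply *r)
{
    if (r->op_ret != -1 || r->xdata == NULL)
        return NULL;
    if (strcmp(r->xdata->key, "toy.heal-hint") != 0)
        return NULL;
    return r->xdata->value;
}
```

The error code itself stays an ordinary errno value, so intermediate translators that do not know about healing still see a normal failure.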

I talked with John Mark about this translator and he suggested that I
discuss it on the list. Therefore I'll start another thread to explain
in more detail how it works, and I would very much appreciate your
opinion, and that of the other developers, about it, especially on
whether it can really be faster/safer than other solutions, or if you
find any problem or have any suggestion to improve it. I think it could
also be used by AFR and any future translator that may need some
healing capabilities.


Thank you very much,

Xavi

_______________________________________________
Gluster-devel mailing list
[email protected]
https://lists.nongnu.org/mailman/listinfo/gluster-devel
