[Nbd] libnbd

Wouter Verhelst Thu, 18 Apr 2013 02:37:15 -0700

So, it's probably time I start thinking about how to implement that.

Since I want to end up with a clean API, it's probably best to start
from what an ideal API would look like, and then implement that, rather
than to start from the current code and try to nudge that into a clean
API. That would probably fail; and having a clean API to work towards
would also encourage me to clean up the current code where necessary.


To get a clean API, it's probably a good idea to first figure out what
we would like to allow library users to do. I'm envisioning several use
cases:

* Replacing the backend: something like qemu-nbd or gznbd would be
implemented by having libnbd do "almost everything", except that the
actual reads and writes are performed by that application.
* Extending the backend: you'd notify libnbd somehow that you support
an extra option in the protocol (which can then be negotiated with the
client) or in the config file, and that, if that option is enabled, a
particular function needs to be called at a particular place during
the handling of a request. We could use this to implement, say, the
copy-on-write feature. This should be implemented carefully enough so
that users can optionally choose to replace the copy-on-write
implementation by something else; this could make sense for backends
that natively support snapshots or similar features.
* Extending the protocol: something like xnbd (which has an additional
protocol message for synchronization with failover NBD servers) would
notify libnbd that it supports an extra option (which can then be
negotiated with the client). If this option is enabled during
negotiation and the client then sends a particular message type or a
message with a particular flag, a particular function should be called
to handle it.
* Alternate implementation of particular bits of the library, for
performance improvements. For instance, one might wish to replace the
select() etc. calls with things like libevent.
* Alternate protocol handling. For instance, someone might wish to
implement unix domain socket handling, rather than TCP sockets.
* Embedding. You would use everything from libnbd, but the main loop
would be implemented elsewhere since you have an application that just
happens to be exporting something over NBD, but does a load of other
things as well.

I believe that's about it.
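
As a rough illustration of the "replacing the backend" case, here is a
minimal sketch of a callback table that an application could hand to
the library. Every name in it (nbd_backend_ops, nbd_export,
nbd_export_read, nbd_export_write, the in-memory mem_* backend) is
hypothetical and only there to show the shape; it is not existing nbd
code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical callback table; none of these names exist in nbd today. */
struct nbd_backend_ops {
    /* Each callback returns 0 on success, -1 on failure. */
    int (*read)(void *priv, void *buf, size_t len, uint64_t offset);
    int (*write)(void *priv, const void *buf, size_t len, uint64_t offset);
};

struct nbd_export {
    const struct nbd_backend_ops *ops;
    void *priv;      /* opaque pointer owned by the application */
    uint64_t size;   /* export size in bytes */
};

/* Returns 1 if [off, off + len) lies inside the export, without overflow. */
static int in_range(const struct nbd_export *e, size_t len, uint64_t off)
{
    return off <= e->size && len <= e->size - off;
}

/* The library would call this once it has parsed a read request. */
int nbd_export_read(struct nbd_export *e, void *buf, size_t len, uint64_t off)
{
    assert(e != NULL && e->ops != NULL && e->ops->read != NULL);
    if (!in_range(e, len, off))
        return -1;
    return e->ops->read(e->priv, buf, len, off);
}

/* Same for a write request. */
int nbd_export_write(struct nbd_export *e, const void *buf, size_t len,
                     uint64_t off)
{
    assert(e != NULL && e->ops != NULL && e->ops->write != NULL);
    if (!in_range(e, len, off))
        return -1;
    return e->ops->write(e->priv, buf, len, off);
}

/* Example application backend: a plain in-memory buffer. */
struct mem_backend {
    unsigned char *data;
};

static int mem_read(void *priv, void *buf, size_t len, uint64_t off)
{
    struct mem_backend *m = priv;
    memcpy(buf, m->data + off, len);
    return 0;
}

static int mem_write(void *priv, const void *buf, size_t len, uint64_t off)
{
    struct mem_backend *m = priv;
    memcpy(m->data + off, buf, len);
    return 0;
}

const struct nbd_backend_ops mem_ops = { mem_read, mem_write };
```

The library keeps the range check and everything protocol-related; the
application only supplies the two functions that touch the data.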

To get at something like that, I think it's pretty obvious we'll need a
state machine. This would need to have the following features:
* A function pointer to be invoked when entering a given state.
* A condition which would cause the state to be entered. This could be
things like "socket X is ready to be read", "we have an outstanding
request and flag Y was negotiated with the client", "we have an
outstanding request and flag Z was enabled in configuration".
* For reasons of performance, *most* transitions in the state machine
would preferably not go through a hash table lookup but follow direct
pointers instead. E.g., we could have a hash table of possible states
out of which an actual state machine is built at accept() time for a
client socket, which then uses ->next_state pointers or some such.
* We'd need some functions to create new states.
* There should be some API to be able to explicitly set a particular
state, or to set a particular flag in the state machine.
* Some states may need the state machine to skip or ignore if conditions
aren't satisfied (e.g., a copy-on-write state would need to be
skipped/ignored if the option isn't enabled; or a "sync data for this
request after the write" state would need to be skipped if the request
we're handling doesn't have the FUA flag set) while others may need the
state machine to wait until all conditions are met (e.g., the "read
data" state shouldn't be entered until the socket actually has data
waiting). Maybe one and the same state may even need the state machine
to wait in some cases but skip in others?
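
To make the list above a bit more concrete, here is a minimal sketch of
such a state machine, under the assumption that each state carries an
entry condition, an entry function, a skip-or-wait policy and a direct
pointer to the next state. All names (nbd_state, nbd_conn, nbd_sm_link,
nbd_sm_run, the demo_* and DEMO_* bits) are made up for illustration,
and a plain array stands in for the table of possible states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* All names below are hypothetical; nothing here is existing nbd code. */
struct nbd_conn;

enum nbd_unmet { NBD_UNMET_SKIP, NBD_UNMET_WAIT };
enum nbd_run   { NBD_RUN_DONE, NBD_RUN_WAITING };

struct nbd_state {
    const char *name;
    bool (*ready)(struct nbd_conn *c);  /* entry condition; NULL means always */
    void (*enter)(struct nbd_conn *c);  /* called when the state is entered */
    enum nbd_unmet on_unmet;            /* what to do if ready() is false */
    struct nbd_state *next;             /* direct link, no lookup at run time */
};

struct nbd_conn {
    unsigned flags;         /* negotiated or configured options */
    bool readable;          /* set by the main loop, e.g. after select() */
    struct nbd_state *cur;  /* next state to consider; NULL when finished */
    char log[8];            /* demo only: records which states ran */
    size_t nlog;
};

/* Link an array of states into a chain, e.g. once per accept(). */
struct nbd_state *nbd_sm_link(struct nbd_state *s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        s[i].next = (i + 1 < n) ? &s[i + 1] : NULL;
    return n ? &s[0] : NULL;
}

/* Run until the chain ends or a WAIT state is not ready yet. */
enum nbd_run nbd_sm_run(struct nbd_conn *c)
{
    assert(c != NULL);
    while (c->cur != NULL) {
        struct nbd_state *s = c->cur;
        if (s->ready != NULL && !s->ready(c)) {
            if (s->on_unmet == NBD_UNMET_WAIT)
                return NBD_RUN_WAITING;  /* call again after the next event */
            c->cur = s->next;            /* condition unmet: skip this state */
            continue;
        }
        c->cur = s->next;  /* advance first so enter() may override c->cur */
        if (s->enter != NULL)
            s->enter(c);
    }
    return NBD_RUN_DONE;
}

/* Demo states: read request -> copy-on-write -> write. */
#define DEMO_FLAG_COW 0x1u

static void note(struct nbd_conn *c, char ch)
{
    if (c->nlog < sizeof c->log)
        c->log[c->nlog++] = ch;
}

static bool sock_readable(struct nbd_conn *c) { return c->readable; }
static bool cow_enabled(struct nbd_conn *c) { return (c->flags & DEMO_FLAG_COW) != 0; }
static void enter_read(struct nbd_conn *c)  { note(c, 'R'); }
static void enter_cow(struct nbd_conn *c)   { note(c, 'C'); }
static void enter_write(struct nbd_conn *c) { note(c, 'W'); }

struct nbd_state *demo_chain(struct nbd_state st[3])
{
    st[0] = (struct nbd_state){ "read",  sock_readable, enter_read,  NBD_UNMET_WAIT, NULL };
    st[1] = (struct nbd_state){ "cow",   cow_enabled,   enter_cow,   NBD_UNMET_SKIP, NULL };
    st[2] = (struct nbd_state){ "write", NULL,          enter_write, NBD_UNMET_SKIP, NULL };
    return nbd_sm_link(st, 3);
}
```

In this sketch the "read" state waits until the main loop marks the
socket readable, while the "cow" state is silently skipped unless the
flag is set. Because nbd_sm_run() advances c->cur before calling
enter(), an entry function can also set a particular state explicitly.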

I'm undecided whether the state machine should be primarily linked to
the socket or primarily linked to a request. In the former case,
select() would just need to ensure that a state machine is moved from
the "waiting for data" to the "ready to read" state (or not touched if
it isn't in the "waiting for data" state), which would be fairly easy to
implement and shouldn't have a lot of performance issues, but would make
handling requests in parallel fairly complicated. In the latter case,
handling requests in parallel should be fairly trivial (we just read
requests from a socket and create a new state machine instance), but
doing so quickly might be an issue.

Negotiation would need to be pretty much rewritten, but that needs to
happen regardless, so it's not really an issue. I'm thinking of:
* Having a data structure in which the key is the NBD_OPT_* value that
the client would send.
* The value in that data structure would contain a function pointer (for
options that need to calculate something) or just some data to send back
(for options that only affect the state machine later on).
* The negotiate() function would then just do the initial negotiation
(NBDMAGIC, flags, etc.) and loop over option haggling, looking up each
option in a hash table rather than going through a switch() statement.
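
A minimal sketch of such an option table follows. For brevity a small
array scanned linearly stands in for the hash table, and the reply is
reduced to a single reply type. NBD_OPT_LIST, NBD_REP_ACK and
NBD_REP_ERR_UNSUP carry the values from the NBD protocol document;
everything else (nbd_opt_entry, nbd_client, nbd_opt_dispatch,
DEMO_OPT_COW, DEMO_FLAG_COW) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values as defined by the NBD protocol. */
#define NBD_OPT_LIST       3u
#define NBD_REP_ACK        1u
#define NBD_REP_ERR_UNSUP  ((1u << 31) | 1u)

/* Hypothetical extension option and the flag it enables. */
#define DEMO_OPT_COW   0x1000u
#define DEMO_FLAG_COW  0x1u

struct nbd_client {
    unsigned flags;         /* consulted by the state machine later on */
    unsigned lists_served;  /* demo only */
};

struct nbd_opt_entry {
    uint32_t opt;  /* key: the NBD_OPT_* value sent by the client */
    /* For options that need to calculate something; returns the reply type. */
    uint32_t (*handler)(struct nbd_client *cl, const void *data, uint32_t len);
    /* For options that only affect later behaviour: a fixed reply plus flags. */
    uint32_t fixed_reply;
    unsigned set_flags;
};

static uint32_t handle_list(struct nbd_client *cl, const void *data,
                            uint32_t len)
{
    (void)data;
    (void)len;
    /* A real handler would send one NBD_REP_SERVER reply per export here. */
    cl->lists_served++;
    return NBD_REP_ACK;
}

const struct nbd_opt_entry demo_opts[] = {
    { NBD_OPT_LIST, handle_list, 0,           0 },
    { DEMO_OPT_COW, NULL,        NBD_REP_ACK, DEMO_FLAG_COW },
};

/* Look up one option and act on it; unknown options are rejected. */
uint32_t nbd_opt_dispatch(const struct nbd_opt_entry *tab, size_t n,
                          struct nbd_client *cl, uint32_t opt,
                          const void *data, uint32_t len)
{
    assert(cl != NULL && (tab != NULL || n == 0));
    for (size_t i = 0; i < n; i++) {
        if (tab[i].opt != opt)
            continue;
        if (tab[i].handler != NULL)
            return tab[i].handler(cl, data, len);
        cl->flags |= tab[i].set_flags;
        return tab[i].fixed_reply;
    }
    return NBD_REP_ERR_UNSUP;
}
```

negotiate() would then call nbd_opt_dispatch() once per option the
client sends, and an application extending the protocol would only need
to add an entry to the table.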

...I think that pretty much covers it.

Thoughts? Anything I missed?

Thanks,

-- 
Copyshops should do vouchers. So that next time some bureaucracy
requires you to mail a form in triplicate, you can mail it just once,
add a voucher, and save on postage.

_______________________________________________
Nbd-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nbd-general
