Goswin,

> Isn't there a way to get the drive to tell you when it has actually
> committed the data to physical storage and to flush specific
> requests only?

There may be with certain drive technologies, but remember you aren't
writing to a drive. You aren't even writing to the Linux block layer.
You are writing to a VFS filesystem.

>>> Let the client send FLUSH
>>> requests instead. Same effect in the end.
>>
>> You can't control what the client sends (save for disabling stuff
>
> The server tells the client whether it supports FUA. If it doesn't and
> the client sends one then that is a protocol violation and should
> probably abort the connection.

I don't understand your point. As the server operator, you can
control whether or not you get sent FUA independently from FLUSH.
If you don't want it, don't set FUA in the config file for the disk.
It's that easy.

There are good reasons why you might want it (even if it is more
expensive than it needs to be), including the fact that, as I have
explained, some filesystems are starting to use FUA without a flush.

Note that the option isn't even on by default!

> My understanding is that a FUA request from the upper layers gets turned
> into a FLUSH automatically when the driver doesn't support FUA. So if
> the nbd-client doesn't enable FUA for the kernel then any FUA request
> from a filesystem should send a FLUSH over the socket. Right?

Sure, and you will get an even more expensive operation as a result.
See the multiple file case. Why would you want that? (Note that if
you do want it, it's available to you by setting flush, and not
FUA, in the per-disk config file).
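
As a rough illustration of the cost difference in the multiple file
case, here is a toy model in Python (not the nbd-server code, which is
C). It assumes an export split across several equally sized backing
files, that a FLUSH has to sync every backing file, and that a write
with FUA only has to sync the files it touched. The function name and
command strings are hypothetical.

```python
def files_to_sync(command, offset, length, file_size, n_files):
    """Return the indices of backing files that must be synced.

    Toy model only: assumes n_files backing files of file_size bytes
    each, laid out back to back, and length > 0 for writes.
    """
    if command == "FLUSH":
        # A flush covers everything written so far, so every file is synced.
        return list(range(n_files))
    if command == "WRITE_FUA":
        # Only the files covered by [offset, offset + length) need syncing.
        first = offset // file_size
        last = (offset + length - 1) // file_size
        return list(range(first, last + 1))
    # Plain reads and writes require no sync in this model.
    return []
```

With eight 1 MiB files, a 4 KiB FUA write at offset 0 syncs one file in
this model, whereas the FLUSH that replaces it syncs all eight.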

>>>>> c) Requests should be ACKed as soon as possible to minimize the delay
>>>>>    until a client can safely issue a FLUSH.
>>>>
>>>> That's probably true performance wise as a general point, but there is
>>>> a complexity / safety / memory use tradeoff. If you ACK every request
>>>> as soon as it comes in, you will use a lot of memory.
>>>
>>> How do you figure that? For me a write request (all others can be freed
>>> once they send their reply) always uses the same amount of memory from
>>> the time it gets read from the socket till the time it is written to
>>> disk (cache). The memory needed doesn't change whether you ACK it once it
>>> is read from the socket, when the write is issued or when the write
>>> returned.
>>
>> If you ACK a write request before you've written it somewhere, you
>> need to keep it in memory so you can write it later.
...
> That should make no difference to the client. If the kernel has 1000
> dirty pages it can legally send 1000 write requests to the nbd-server
> without waiting for a single ACK. As long as the filesystem (or whatever
> uses the nbd device) doesn't run into a barrier and needs to drain its
> queue (e.g. for fsync()) there is no limit on the number of in-flight
> requests the kernel could have in parallel. Obviously in practice there
> will be some limits on the client side regarding the amount of in-flight
> requests and filesystems usually hit a flush/FUA all too quickly. The
> maximum of in-flight data can probably be seen with a simple dd.
>
> I agree that the server should have some limits on how much in-flight
> data it will allow before it pauses to parse more requests. There should
> probably be a config option to set this limit to prevent a client from
> causing an OOM situation, say default 100MB. I don't think filesystems,
> or other normal use, will hit that limit though.

No, this is something for the server to deal with, not the client. Only
the server knows whether it is running on a 512MB Intel Atom or a 128GB
multiprocessor machine. The server needs to consider whether it should
ACK write requests before dealing with them. Sometimes (for maximum speed)
it may want to ACK them immediately. Sometimes (for simplicity - see current
code) it will want to deal with them before ACKing them. Sometimes (for
memory reasons) it will want to stop ACKing them until it has room
to buffer them. Sometimes (for maximum safety) it will not want to ACK
them until they have been dealt with (the current server in sync mode,
for instance). It is just not true to say that requests should always
be ACK'ed as soon as possible.
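
To make the trade-off concrete, here is a minimal sketch in Python (not
the nbd-server code, which is C) of how a server might pick one of the
four behaviours above for an incoming write. Every name in it
(AckPolicy, choose_ack_policy, early_ack, buffer_limit) is hypothetical
and exists only for illustration.

```python
from enum import Enum


class AckPolicy(Enum):
    IMMEDIATE = "ack on receipt, write later (maximum speed)"
    AFTER_WRITE = "ack once the write has been issued (simplicity)"
    DEFER = "hold the ack until there is room to buffer (memory)"
    AFTER_SYNC = "ack once the data has been synced (maximum safety)"


def choose_ack_policy(sync_mode, buffered_bytes, request_len,
                      buffer_limit, early_ack=False):
    """Pick an ACK policy for one write request (illustrative only)."""
    if sync_mode:
        # Safety first: never ACK before the data has been dealt with.
        return AckPolicy.AFTER_SYNC
    if not early_ack:
        # Simple case: deal with the request, then ACK it.
        return AckPolicy.AFTER_WRITE
    if buffered_bytes + request_len > buffer_limit:
        # Early ACK is wanted but there is no room to buffer the data.
        return AckPolicy.DEFER
    return AckPolicy.IMMEDIATE
```

The point of the sketch is that buffer_limit is a server-side input:
only the server can set it sensibly for the machine it runs on.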

-- 
Alex Bligh

_______________________________________________
Nbd-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nbd-general
