We have been doing regular performance runs using various workloads over
NFS (v3, v4.1), SMB3, iSCSI, and FC16 & 32 for the past few years.
Compression is enabled for all datasets and zvols in our runs. What we
have observed is that, under load, compression consumes the most CPU
cycles; after that it is a toss-up between dnode locking (a well-known
issue) and other things that might come into play depending on the protocol.
At least in our use cases, checksumming of blocks does not appear to be an
issue.
-Sanjay
On 10/14/22 10:15 AM, Garrett D'Amore wrote:
I can tell you from past experience that offloads like what you are
proposing are rarely worth it. The setup and teardown of the
mappings to allow the data transport are not necessarily cheap. You
can avoid that by having a preallocated region, but then you need to
copy the data. Fortunately, for this case you only need to copy once,
since the result will be very small compared to the data.
Then there is the complexity (additional branches, edge cases, etc.)
that has to be coded. This saps performance as well.
Add to this the fact that CPUs are always getting faster, and
advancements like extensions to the SIMD instructions mean that the
disparity between the offload and just doing the natural thing inline
gets ever smaller.
At the end of the day, it’s often the case that your “offload” is
actually a performance killer.
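A back-of-the-envelope sketch of this trade-off in Python. Every name and every number here is a made-up assumption for illustration, not a measurement: the model simply charges the offload a fixed setup cost, one copy, and the device's compute time, and compares that with doing the work inline.

```python
def inline_seconds(nbytes, cpu_bytes_per_s):
    """Time to process nbytes inline on the CPU."""
    return nbytes / cpu_bytes_per_s


def offload_seconds(nbytes, setup_s, copy_bytes_per_s, device_bytes_per_s):
    """Fixed setup cost, plus one copy to the device, plus device compute."""
    return setup_s + nbytes / copy_bytes_per_s + nbytes / device_bytes_per_s


def break_even_bytes(setup_s, cpu_bytes_per_s, copy_bytes_per_s, device_bytes_per_s):
    """Smallest size at which the offload stops losing, or None if it never wins."""
    # inline == offload  =>  n * (1/cpu - 1/copy - 1/device) == setup
    gain_per_byte = (1.0 / cpu_bytes_per_s
                     - 1.0 / copy_bytes_per_s
                     - 1.0 / device_bytes_per_s)
    if gain_per_byte <= 0:
        return None  # the copy alone costs more than the inline work
    return setup_s / gain_per_byte


# Hypothetical figures: 10 us setup, 16 GB/s copy, 100 GB/s device.
SETUP, COPY, DEVICE = 10e-6, 16e9, 100e9

cheap = break_even_bytes(SETUP, 20e9, COPY, DEVICE)       # cheap checksum, 20 GB/s inline
expensive = break_even_bytes(SETUP, 0.1e9, COPY, DEVICE)  # heavy compression, 0.1 GB/s inline
```

With these assumed figures the cheap operation never breaks even, because the CPU finishes before the copy would, while the expensive one pays off at roughly a kilobyte.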
The exceptions to this are when the work is truly expensive. For
example, running (in the old days) RSA on an offload engine makes a
lot of sense. (I’m not sure it does for elliptic curve crypto
though.) Running 3DES (again if you wanted to do that, which you
should not) used to make sense. AES used to, but with AES-NI not
anymore. I suspect that for SHA2 it's a toss-up. Fletcher probably
does not make sense. If you want to compress, LZJB does not make
sense, but GZIP (especially at higher levels) would, if you had such a
device.
Algorithms are always getting better (newer ones are more
optimized for actual CPUs, etc.) and CPUs are always improving. The
GPU is probably best reserved for the truly expensive operations for which
it was designed: complex transforms for 3D rendering, expensive
hashing (although I wish that wasn’t a thing), long-running scientific
analysis, machine learning, etc.
As an I/O accelerator, not so much.
On Oct 14, 2022, 7:52 AM -0700, Thijs Cramer <thijs.cra...@gmail.com>,
wrote:
I've been searching the GitHub Repository and the Mailing list, but
couldn't find any discussion about this.
I know it's probably silly, but I would like to understand the workings.
Let's say one could offload the Checksumming process to a dedicated
GPU. This might save some amount of CPU, *but* might increase
latency incredibly.
To my understanding ZFS uses the Fletcher4 Checksum Algorithm by
default, and this requires a pass of the data in-memory as it
calculates the checksum. If we skip this step, and instead send the
data to the GPU, that would also require a pass of the data (no gains
there).
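For reference, that single pass can be sketched in Python. This follows the textbook Fletcher-4 recurrence (four running 64-bit sums over 32-bit words); the function name and the little-endian word order are assumptions for illustration, and this is not the OpenZFS code.

```python
import struct

MASK64 = (1 << 64) - 1  # accumulators wrap at 64 bits


def fletcher4(data: bytes):
    """Return the four Fletcher-4 accumulators for data (length must be a multiple of 4)."""
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4 bytes")
    a = b = c = d = 0
    # One pass: every 32-bit word is read exactly once.
    for (word,) in struct.iter_unpack("<I", data):
        a = (a + word) & MASK64
        b = (b + a) & MASK64
        c = (c + b) & MASK64
        d = (d + c) & MASK64
    return a, b, c, d


example = fletcher4(struct.pack("<4I", 1, 2, 3, 4))
```

The work per word is four additions, so the pass is dominated by reading the data rather than by arithmetic.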
The actual calculation does not seem that hard for a CPU: there are
SIMD instructions that suit specific checksums, and after a quick
pass over the code, it seems they are already used (if
available).
I think the only time that a GPU could calculate checksums 'faster',
is with a form of readahead.
If you pre-read a lot of data, dumped it into the GPU's
internal memory, and had the GPU calculate the checksum of each entire
block in parallel, it might be able to do it faster than a CPU.
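One reason a parallel pass is plausible at all: the Fletcher-4 update is linear, so partial checksums computed independently over chunks can be merged afterwards. A minimal Python sketch, assuming the usual four-accumulator recurrence over 32-bit little-endian words; `fletcher4` and `combine` are hypothetical names, and nothing here is taken from the OpenZFS source or actually runs on a GPU.

```python
import struct

MASK64 = (1 << 64) - 1


def fletcher4(data: bytes):
    """Sequential reference: four 64-bit running sums over 32-bit words."""
    a = b = c = d = 0
    for (word,) in struct.iter_unpack("<I", data):
        a = (a + word) & MASK64
        b = (b + a) & MASK64
        c = (c + b) & MASK64
        d = (d + c) & MASK64
    return a, b, c, d


def combine(first, second, n2):
    """Merge the checksum of a leading chunk with that of the following chunk.

    n2 is the number of 32-bit words in the following chunk. Feeding n2 more
    words into state (a1, b1, c1, d1) propagates the old sums forward by
    binomial-style weights; the second chunk's own sums are then added on top.
    """
    a1, b1, c1, d1 = first
    a2, b2, c2, d2 = second
    t2 = n2 * (n2 + 1) // 2             # 1 + 2 + ... + n2
    t3 = n2 * (n2 + 1) * (n2 + 2) // 6  # sum of the first n2 triangular numbers
    a = a1 + a2
    b = b1 + n2 * a1 + b2
    c = c1 + n2 * b1 + t2 * a1 + c2
    d = d1 + n2 * c1 + t2 * b1 + t3 * a1 + d2
    return tuple(x & MASK64 for x in (a, b, c, d))


block = bytes(range(256)) * 64           # 16 KiB of sample data
head, tail = block[:4096], block[4096:]  # each chunk could be summed by a different worker
merged = combine(fletcher4(head), fletcher4(tail), len(tail) // 4)
```

Here `merged` equals `fletcher4(block)`. Whether such a split beats a single SIMD pass on the CPU still depends on the cost of getting the data to the device.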
Has anyone considered the idea?
- Thijs
*openzfs <https://openzfs.topicbox.com/latest>* / openzfs-developer /
see discussions <https://openzfs.topicbox.com/groups/developer> +
participants <https://openzfs.topicbox.com/groups/developer/members> +
delivery options
<https://openzfs.topicbox.com/groups/developer/subscription> Permalink
<https://openzfs.topicbox.com/groups/developer/T2be6db01da63a639-M522b09520eb8e026499c20e8>