behlendorf requested changes on this pull request.
> + * vdevs. In either case, we try every combination. This ensures that if
+ * a mirror has small silent errors on all of its children, we can still
+ * reconstruct the correct data, as long as those errors are at
+ * sufficiently-separated offsets (specifically, separated by the largest
+ * block size - default of 128KB, but up to 16MB).
+ */
+static void
+vdev_indirect_reconstruct_io_done(zio_t *zio)
+{
+ indirect_vsd_t *iv = zio->io_vsd;
+
+ for (;;) {
+ /* copy data from splits to main zio */
+ int ret;
+ for (indirect_split_t *is = list_head(&iv->iv_splits);
+ is != NULL; is = list_next(&iv->iv_splits, is)) {
When testing with `zloop.sh` I observed that when there are a large number of
segment copies and the damaged copy is one of the "high bits" then this
function can't be expected to check everything. This manifested itself as a
`zdb` hang in my case and a few suspended pools due to stalling the IO pipeline.
I'm proposing
https://github.com/behlendorf/zfs/commit/2ad01af1b4a50117779aab7e47c7006a7f2451e6
as a fix . It slightly reworks the logic in
`vdev_indirect_reconstruct_io_done` to adopt a random sampling apprach to
reconstruction when there are a large number of segments, otherwise we use the
existing logic and check everything. We could further optimize this if we
decide it's needed.
--
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/openzfs/openzfs/pull/591#pullrequestreview-107483336
------------------------------------------
openzfs: openzfs-developer
Permalink:
https://openzfs.topicbox.com/groups/developer/discussions/T497ad70ebb1401b5-M78532ea4d3a8c8e84252d238
Delivery options: https://openzfs.topicbox.com/groups