behlendorf requested changes on this pull request.


> + * vdevs.  In either case, we try every combination.  This ensures that if
+ * a mirror has small silent errors on all of its children, we can still
+ * reconstruct the correct data, as long as those errors are at
+ * sufficiently-separated offsets (specifically, separated by the largest
+ * block size - default of 128KB, but up to 16MB).
+ */
+static void
+vdev_indirect_reconstruct_io_done(zio_t *zio)
+{
+       indirect_vsd_t *iv = zio->io_vsd;
+
+       for (;;) {
+               /* copy data from splits to main zio */
+               int ret;
+               for (indirect_split_t *is = list_head(&iv->iv_splits);
+                   is != NULL; is = list_next(&iv->iv_splits, is)) {

When testing with `zloop.sh` I observed that when there are a large number of 
segment copies and the damaged copy is one of the "high bits" then this 
function can't be expected to check everything.  This manifested itself as a 
`zdb` hang in my case and a few suspended pools due to stalling the IO pipeline.

I'm proposing 
https://github.com/behlendorf/zfs/commit/2ad01af1b4a50117779aab7e47c7006a7f2451e6
 as a fix . It slightly reworks the logic in 
`vdev_indirect_reconstruct_io_done` to adopt a random sampling apprach to 
reconstruction when there are a large number of segments, otherwise we use the 
existing logic and check everything.  We could further optimize this if we 
decide it's needed.


-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/openzfs/openzfs/pull/591#pullrequestreview-107483336
------------------------------------------
openzfs: openzfs-developer
Permalink: 
https://openzfs.topicbox.com/groups/developer/discussions/T497ad70ebb1401b5-M78532ea4d3a8c8e84252d238
Delivery options: https://openzfs.topicbox.com/groups

Reply via email to