I am looking at a rare but nasty case of corruption in which a block pointer has one of its ditto DVAs out of range. This causes vdev_lookup_top() to return NULL instead of a vdev pointer. Going through the callers of vdev_lookup_top(), I noticed that some of them handle a NULL vdev properly, while others do not. In particular, during an import, ditto blocks go through a mirror vdev code path that has no NULL handling and would panic the system.
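For illustration only, the failure mode can be modelled with a toy lookup. This is a minimal sketch and not the ZFS code: the `toy_*` types and functions are hypothetical stand-ins, and the choice of ENXIO as the error is an assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <errno.h>

/* Toy stand-ins; these are NOT the real ZFS structures. */
typedef struct toy_vdev {
	uint64_t id;
} toy_vdev_t;

typedef struct toy_root {
	toy_vdev_t **children;	/* top-level vdevs, indexed by id */
	uint64_t nchildren;	/* number of valid entries */
} toy_root_t;

/* Models the behaviour described above: an out-of-range id yields NULL. */
toy_vdev_t *
toy_lookup_top(const toy_root_t *root, uint64_t vdev_id)
{
	if (vdev_id >= root->nchildren)
		return (NULL);
	return (root->children[vdev_id]);
}

/*
 * A caller that tolerates NULL: it reports an error (ENXIO is an
 * assumed choice) instead of dereferencing the pointer.
 */
int
toy_dva_check(const toy_root_t *root, uint64_t vdev_id)
{
	toy_vdev_t *vd = toy_lookup_top(root, vdev_id);

	if (vd == NULL)
		return (ENXIO);
	return (0);
}
```

A caller that skipped the NULL check and went straight to dereferencing `vd` would correspond to the panic case described above.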
My question, though, is not about the mirror (for which I have a fix that I will upstream eventually), but about the metaslab allocator. In particular, I am looking at metaslab_allocate() and metaslab_allocate_dva(). The latter function handles a NULL vdev properly in the case of a gang block (since a hint may be stale and the device may be gone). However, that very same function does not handle a NULL vdev in the case of a regular block pointer with ditto blocks. The dilemma I am facing is that I can either fall back to the rotor (i.e. mimic the gang block behavior) or return an error immediately. In the latter case the caller, metaslab_allocate(), would handle the error properly. I am inclined to go with the second option, but I would greatly appreciate insight on this from someone who is familiar with the internal logic and the theory behind the metaslab allocator.

Thank you very much,
Ilya.
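To make the two options concrete, here is a toy sketch of the decision point. It is not the metaslab code: `toy_class_t`, `toy_policy_t` and `toy_pick_start()` are hypothetical names, the rotor is reduced to a plain index, and ENXIO is an assumed error value.

```c
#include <stddef.h>
#include <stdint.h>
#include <errno.h>

/* The two candidate behaviours for a hint that resolves to no vdev. */
typedef enum {
	HINT_FALLBACK_ROTOR,	/* option 1: mimic the gang block path */
	HINT_FAIL_FAST		/* option 2: return an error immediately */
} toy_policy_t;

/* Toy allocation class; NOT the real metaslab class structure. */
typedef struct toy_class {
	uint64_t nvdevs;	/* number of valid top-level vdevs */
	uint64_t rotor;		/* where round-robin allocation would start */
} toy_class_t;

/*
 * Pick the vdev at which an allocation would start, given a hinted id.
 * Returns 0 and sets *start on success; returns an errno and leaves
 * *start untouched on failure.
 */
int
toy_pick_start(const toy_class_t *mc, uint64_t hint_id,
    toy_policy_t policy, uint64_t *start)
{
	if (hint_id < mc->nvdevs) {
		*start = hint_id;	/* the hint is usable */
		return (0);
	}
	if (policy == HINT_FALLBACK_ROTOR) {
		*start = mc->rotor;	/* ignore the bad hint */
		return (0);
	}
	return (ENXIO);			/* let the caller deal with it */
}
```

With the rotor fallback the bad hint is silently absorbed and the allocation proceeds; with the fail-fast policy the out-of-range id is surfaced to the caller.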
