On Sep 25, 2014, at 9:36 AM, Venci Vatashki <[email protected]> wrote:

> I expect that with 2 out of 3 disks I should be able to access the pool as if 
> it were plain RAID5.
> There is also something called rewind; it should discard some transactions and 
> lose the last writes, but the pool should not be in a FAULTED state. I'm 
> currently trying to rewind by hand and need suggestions for doing it.
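> For reference, the kind of rewind I mean looks like this from the command 
> line (a sketch only; the -F/-X/-T import options come from later OpenZFS 
> code and may not all be present in this zfs-fuse build):
> 
>     zpool export tank           # release the pool if it is held open
>     zpool import -F tank        # discard the last few txgs and retry
>     zpool import -FX tank       # "extreme" rewind, searches further back
>     zpool import -T <txg> tank  # roll back to an explicit txg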

Unfortunately, your expectation is faulty. You are expecting a system designed 
to survive a single failure to survive two failures. In your case, these are 
temporal failures: since the removed device was never updated with the changes 
that took place after it was removed, you cannot expect the state of the pool 
(or the data) to remain consistent through the second failure, "remove 
another one." This is a double failure.

OTOH, if you had waited to "remove another one" until after the data was 
consistent on the first disk, then the pool would have survived two single 
failures.
 -- richard

> 
> On Thu, Sep 25, 2014 at 6:30 PM, Richard Elling 
> <[email protected]> wrote:
> 
> On Sep 25, 2014, at 5:15 AM, Venci Vatashki <[email protected]> wrote:
> 
>> Hello,
>> I'm debugging a FAULTED raidz1 on zfs-fuse. This is a dev pool, created from 
>> loop devices and files. The way I broke it: first degrade the pool by 
>> removing a device, then do some transactions (e.g., copy a file), then stop 
>> zfs, reattach the removed device, and remove another one.
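>> For the record, the setup was roughly the following (an illustrative sketch; 
>> file names, sizes, and loop numbers are made up):
>> 
>>     # three small backing files on loop devices
>>     for i in 1 2 3; do
>>         dd if=/dev/zero of=/tmp/d$i.img bs=1M count=128
>>         losetup /dev/loop$i /tmp/d$i.img
>>     done
>>     zpool create tank raidz1 /dev/loop1 /dev/loop2 /dev/loop3
>> 
>>     losetup -d /dev/loop2    # "remove" one device
>>     cp somefile /tank/       # transactions while degraded
>>     # stop zfs-fuse, re-attach loop2, detach loop3, restart zfs-fuse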
> 
> What did you expect to happen?
>  -- richard
> 
>> After restart this is what I get:
>>   pool: tank
>>  state: FAULTED
>> status: One or more devices could not be used because the label is missing
>>         or invalid.  There are insufficient replicas for the pool to continue
>>         functioning.
>> action: Destroy and re-create the pool from
>>         a backup source.
>>    see: http://www.sun.com/msg/ZFS-8000-5E
>>  scrub: none requested
>> config:
>> 
>>         NAME        STATE     READ WRITE CKSUM
>>         tank        FAULTED      0     0     0  corrupted data
>>           raidz1-0  ONLINE       0     0     0
>>             loop1   ONLINE       0     0     0
>>             loop2   UNAVAIL      0     0     0  corrupted data
>>             loop3   ONLINE       0     0     0
>> 
>> The question is: since raidz1-0 is in the ONLINE state, shouldn't I be able 
>> to restore the data? There should be enough copies in the raid for the pool 
>> to continue functioning.
>> I'm interested in opening the pool in some emergency mode so I can see what 
>> is recoverable.
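>> For example, a forced read-only import, if that is the right emergency mode 
>> (a sketch; the readonly import property postdates much of the zfs-fuse code 
>> base and may not be supported here):
>> 
>>     zpool import -f -o readonly=on tank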
>> The code that fails is the zap_lookup() in dsl_pool_open():
>>      /* read the root dataset's object number from the pool directory ZAP
>>         in the MOS */
>>      err = zap_lookup(dp->dp_meta_objset, DMU_POOL_DIRECTORY_OBJECT,
>>          DMU_POOL_ROOT_DATASET, sizeof (uint64_t), 1,
>>          &dp->dp_root_dir_obj);
>> It returns EIO (5).
>> Thanks for any ideas. Currently I'm thinking of locating an older uberblock 
>> and trying the import with it; I expect that to behave like rolling back to 
>> a snapshot. Am I right?
>> Another option would be to play with the txg parameter of dsl_pool_open(), 
>> as sketched below.
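>> Something like this, perhaps (a sketch; zdb -ul should work here, but the 
>> zpool import -T <txg> rewind option is from later OpenZFS and may not exist 
>> in zfs-fuse):
>> 
>>     # dump all four labels of a surviving device, including the ring of
>>     # uberblocks; note a txg a little older than the failing one
>>     zdb -ul /dev/loop1
>> 
>>     # then try an import rolled back to that txg
>>     zpool import -T <older_txg> tank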
>> 
>> 
>> Stack trace:
>> 
>> dbuf_hold_impl(dn = 0x7ffff7e8c9f0, level = 0 \000, blkid = 0, fail_sparse = 0, tag = 0x510a30 <__func__.12754>, dbp = 0x7fffef1a07b0)
>> dbuf_hold(dn = 0x7ffff7e8c9f0, blkid = 0, tag = 0x510a30 <__func__.12754>)
>> dnode_hold_impl(os = 0x7ffff7f81c40, object = 1, flag = 1, tag = 0x50d8ba <__func__.13943>, dnp = 0x7fffef1a0950)
>> dnode_hold(os = 0x7ffff7f81c40, object = 1, tag = 0x50d8ba <__func__.13943>, dnp = 0x7fffef1a0950)
>> dmu_buf_hold(os = 0x7ffff7f81c40, object = 1, offset = 0, tag = 0x0, dbp = 0x7fffef1a0a10)
>> zap_lockdir(os = 0x7ffff7f81c40, obj = 1, tx = 0x0, lti = 0, fatreader = B_TRUE, adding = B_FALSE, zapp = 0x7fffef1a0ab0)
>> zap_lookup_norm(os = 0x7ffff7f81c40, zapobj = 1, name = 0x514393 "root_dataset", integer_size = 8, num_integers = 1, buf = 0x7ffff7f83968, mt = MT_EXACT, realname = 0x0, rn_len = 0, ncp = 0x0)
>> zap_lookup(os = 0x7ffff7f81c40, zapobj = 1, name = 0x514393 "root_dataset", integer_size = 8, num_integers = 1, buf = 0x7ffff7f83968)
>> dsl_pool_open(spa = 0x7ffff7f9a000, txg = 128, dpp = 0x7ffff7f9a210)
>> spa_load_impl(spa = 0x7ffff7f9a000, pool_guid = 9993829304951742789, config = 0x7ffff7fc0f40, state = SPA_LOAD_OPEN, type = SPA_IMPORT_EXISTING, mosconfig = B_FALSE, ereport = 0x7fffef1a0ca8)
>> spa_load(spa = 0x7ffff7f9a000, state = SPA_LOAD_OPEN, type = SPA_IMPORT_EXISTING, mosconfig = B_FALSE)
>> spa_load_best(spa = 0x7ffff7f9a000, state = SPA_LOAD_OPEN, mosconfig = 0, max_request = 18446744073709551615, rewind_flags = 1)
>> spa_open_common(pool = 0x7ffff7ea7000 "tank", spapp = 0x7fffef1a0dd0, tag = 0x51a5a6 <__func__.14627>, nvpolicy = 0x0, config = 0x7fffef1a0e00)
>> spa_get_stats(name = 0x7ffff7ea7000 "tank", config = 0x7fffef1a0e00, altroot = 0x7ffff7ea8000 "", buflen = 8192)
>> zfs_ioc_pool_stats(zc = 0x7ffff7ea7000)
>> zfsdev_ioctl(dev = 0, cmd = 23045, arg = 140733420915696, flag = 0, cr = 0x7fffef1a0e90, rvalp = 0x0)
>> handle_connection(sock = 8)
>> zfsfuse_ioctl_queue_worker_thread(init = 0x79ddc0 <ioctl_queue>)
>> start_thread(arg = 0x7fffef1a1700)
>> clone()
> 
> 
> 
> 
> -- 
> Venci Vatashki

_______________________________________________
developer mailing list
[email protected]
http://lists.open-zfs.org/mailman/listinfo/developer
