Thank you all for your valuable information.
We survived, and about 1 million files survived with us. At first I wanted to get
a recovery professional under our support contract, but it was not possible to
get the right person at the right time.
So we had to do it on our own, roughly following the procedure Adrian
mentioned. It still felt risky and we needed some luck; I never want to do
this again.

For your information,
dd_rescue showed that about 4 MB near the very end of the disk had bad sectors.
It took about 20 hours to run over the 1 TB SATA disk; we ran it on an OSS
whose load was relatively small.
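
For reference, the clone step was essentially the following (the device names
are placeholders, not our actual ones; unlike plain dd, dd_rescue keeps going
past unreadable sectors instead of aborting):

  # source = failing disk, target = fresh disk -- verify both device
  # names before running; a swapped pair destroys the data
  dd_rescue -l rescue.log /dev/sdX /dev/sdY
  # the log and exit summary show how much could not be read
  # (~4 MB of bad sectors near the end of the disk, in our case)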

After inserting the fresh disk into the original OSS in question (oss07), we
found that mdadm with "-A --force" could assemble the array, with some errors.
Its state was "active, degraded, Not Started", and we had to use the following
to start and resync it:
echo "clean" > /sys/block/md12/md/array_state
I didn't know of any other way to start it.
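
In case it helps anyone in the same situation, the whole sequence was roughly
(md12 as in our setup; the member device names below are placeholders):

  # force-assemble the degraded array from the surviving members
  # plus the freshly cloned disk
  mdadm -A --force /dev/md12 /dev/sd[a-f]1
  # the array comes up "active, degraded, Not Started";
  # marking it clean lets it start and begin the resync
  echo "clean" > /sys/block/md12/md/array_state
  # watch the resync progress
  cat /proc/mdstat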

On the first try we failed and two disks were marked faulty, maybe because at
that time (we had a periodic maintenance window) we rebooted the paired OSS
node (oss08) to patch the Lustre kernel (1.8.5) with the raid5 one-line fix
Kevin mentioned earlier.
For the next try, I installed the raid5-patched Lustre kernel on oss07 as
well, power-cycled the JBOD (J4400) and oss07, and then the resync completed
without any errors; running e2fsck afterwards found only 2 stale inodes.
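
The final e2fsck check was along these lines (the device name is a
placeholder; note that for an ldiskfs OST you want the Lustre-patched
e2fsprogs, not the stock distribution one):

  # dry run first: report problems without touching the disk
  e2fsck -fn /dev/md12
  # then the real repair pass once the output looks sane
  e2fsck -fp /dev/md12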

Thank you also for the detailed explanation of why we need periodic scrubbing.
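
For others reading along, periodic md scrubbing combined with the
dev.raid.speed_limit_max throttling Mark mentions below comes down to
something like this (md12 and the limit value are just examples):

  # cap resync/check bandwidth so users barely notice (KB/s per disk)
  sysctl -w dev.raid.speed_limit_max=5000
  # kick off a full read-and-verify pass of the array
  echo "check" > /sys/block/md12/md/sync_action
  # afterwards, mismatch_cnt shows how many inconsistencies were found
  cat /sys/block/md12/md/mismatch_cnt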

Taeyoung Hong
Senior Researcher
Supercomputing Center, KISTI 

On May 8, 2012, at 4:24 AM, Mark Hahn wrote:

>> I'd also recommend to start periodic scrubbing: We do this once per month
>> with low priority (~5MBPS) with little impact to the users.
> 
> yes.  and if you think a rebuild might overstress marginal disks,
> throttling via the dev.raid.speed_limit_max sysctl can help.
> _______________________________________________
> Lustre-discuss mailing list
> [email protected]
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
