Re: [ceph-users] Power outages!!! help!

Ronny Aasen Wed, 30 Aug 2017 07:19:11 -0700

On 30.08.2017 15:32, Steve Taylor wrote:

I'm not familiar with dd_rescue, but I've just been reading about it. I'm not seeing any features that would be beneficial in this scenario that aren't also available in dd. What specific features give it "really a far better chance of restoring a copy of your disk" than dd? I'm always interested in learning about new recovery tools.

I see I wrote dd_rescue from old habit, but the package one should use on Debian is gddrescue, also called GNU ddrescue.

This page has some details on the differences between dd and the ddrescue variants.

http://www.toad.com/gnu/sysadmin/index.html#ddrescue
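
For reference, a minimal sketch of a two-pass GNU ddrescue run, wrapped in a hypothetical helper called rescue_disk. The helper name and its three arguments (source device, image file, map file) are assumptions for illustration only; -n, -d and -r are GNU ddrescue options. The image and map file are assumed to live on a different, healthy disk with enough free space.

```shell
# Hypothetical helper: rescue_disk SOURCE IMAGE MAPFILE
rescue_disk() {
    src=$1
    img=$2
    map=$3
    # Pass 1: grab everything that reads cleanly and skip the slow
    # scraping phase (-n), so the failing disk is stressed as little as possible.
    ddrescue -n "$src" "$img" "$map" || return 1
    # Pass 2: go back to the bad areas recorded in the map file, using
    # direct disc access (-d) and up to 3 retry passes (-r3).
    ddrescue -d -r3 "$src" "$img" "$map"
}
```

It would be called along the lines of `rescue_disk /dev/sdb /mnt/spare/sdb.img /mnt/spare/sdb.map` as root, where both paths are placeholders. Since progress is recorded in the map file, an interrupted run can be resumed by repeating the same command.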

kind regards
Ronny Aasen

------------------------------------------------------------------------

        
*Steve Taylor* | Senior Software Engineer | *StorageCraft Technology Corporation* <https://storagecraft.com>
380 Data Drive Suite 300 | Draper | Utah | 84020
*Office:* 801.871.2799 |

------------------------------------------------------------------------
If you are not the intended recipient of this message or received it erroneously, please notify the sender and delete it, together with any attachments, and be advised that any dissemination or copying of this message is prohibited.
------------------------------------------------------------------------

On Tue, 2017-08-29 at 21:49 +0200, Willem Jan Withagen wrote:
On 29-8-2017 19:12, Steve Taylor wrote:
Hong,
Probably your best chance at recovering any data without special, expensive, forensic procedures is to perform a dd from /dev/sdb to somewhere else large enough to hold a full disk image and attempt to repair that. You'll want to use 'conv=noerror' with your dd command since your disk is failing. Then you could either re-attach the OSD from the new source or attempt to retrieve objects from the filestore on it.
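
A minimal sketch of the dd step described above, wrapped in a hypothetical helper called image_disk. The helper name, its arguments and the 4096-byte block size are assumptions, not something from the original message. The sketch also adds sync next to noerror, which is a common pairing rather than part of the advice above: with sync, a block that cannot be read is written out as zeros, so the data after it stays at the right offset in the image.

```shell
# Hypothetical helper: image_disk SOURCE IMAGE
image_disk() {
    src=$1
    img=$2
    # noerror: keep going after a read error instead of aborting
    # sync:    pad short or failed reads with zeros up to the block size
    # A small block size limits how much data is zeroed around each bad sector.
    dd if="$src" of="$img" bs=4096 conv=noerror,sync
}
```

For the disk in this thread that would be something like `image_disk /dev/sdb /mnt/spare/sdb.img`, with the target path being a placeholder on a healthy disk large enough for the full image.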
Like somebody else already pointed out:
In problem cases like this disk, use dd_rescue.
It has really a far better chance of restoring a copy of your disk.

--WjW
I have actually done this before by creating an RBD that matches the disk size, performing the dd, running xfs_repair, and eventually adding it back to the cluster as an OSD. RBDs as OSDs is certainly a temporary arrangement for repair only, but I'm happy to report that it worked flawlessly in my case. I was able to weight the OSD to 0, offload all of its data, then remove it for a full recovery, at which point I just deleted the RBD. The possibilities afforded by Ceph inception are endless. ☺

On Mon, 2017-08-28 at 23:17 +0100, Tomasz Kusmierz wrote:
Rule of thumb with batteries is:
- the closer to the “proper temperature” you run them, the more life you get out of them
- the more overpowered the battery is for your application, the longer it will survive.

Get yourself an LSI 94** controller and use it as an HBA and you will be fine. But get MORE DRIVES !!!!! …
On 28 Aug 2017, at 23:10, hjcho616 <hjcho...@yahoo.com <mailto:hjcho...@yahoo.com>> wrote:

Thank you Tomasz and Ronny. I'll have to order some HDDs soon and try these out. The car battery idea is nice! I may try that.. =) Do they last longer? Ones that fit the UPS original battery spec didn't last very long... part of the reason why I gave up on them.. =P My wife probably won't like the idea of a car battery hanging out though, ha!

The OSD1 (one with mostly ok OSDs, except that smart failure) motherboard doesn't have any additional SATA connectors available. Would it be safe to add another OSD host?

Regards,
Hong

On Monday, August 28, 2017 4:43 PM, Tomasz Kusmierz <tom.kusmierz@gmail.com> wrote:

Sorry for being brutal … anyway
1. Get the battery for the UPS (a car battery will do as well; I’ve modded a UPS in the past with a truck battery and it was working like a charm :D)
2. Get spare drives and put those in, because your cluster CAN NOT get out of error due to lack of space
3. Follow the advice of Ronny Aasen on how to recover data from hard drives
4. Get cooling to the drives or you will lose more!
On 28 Aug 2017, at 22:39, hjcho616 <hjcho...@yahoo.com <mailto:hjcho...@yahoo.com>> wrote:

Tomasz,

Those machines are behind a surge protector. Doesn't appear to be a good one! I do have a UPS... but it is my fault... no battery. Power was pretty reliable for a while... and the UPS was just beeping every chance it had, disrupting some sleep.. =P So running on surge protector only. I am running this in a home environment. So far, HDD failures have been very rare for this environment. =) It just doesn't get loaded as much! I am not sure what to expect; seeing that "unfound" and just a feeling of possibility of maybe getting the OSD back made me excited about it. =) Thanks for letting me know what should be the priority. I just lack experience and knowledge in this. =) Please do continue to guide me through this.

Thank you for the decode of those SMART messages! I do agree that it looks like it is on its way out. I would like to know how to get a good portion of it back if possible. =)

I think I just set the size and min_size to 1.
# ceph osd lspools
0 data,1 metadata,2 rbd,
# ceph osd pool set rbd size 1
set pool 2 size to 1
# ceph osd pool set rbd min_size 1
set pool 2 min_size to 1

Seems to be doing some backfilling work.
# ceph health
HEALTH_ERR 22 pgs are stuck inactive for more than 300 seconds; 2 pgs backfill_toofull; 74 pgs backfill_wait; 3 pgs backfilling; 108 pgs degraded; 6 pgs down; 6 pgs inconsistent; 6 pgs peering; 7 pgs recovery_wait; 16 pgs stale; 108 pgs stuck degraded; 6 pgs stuck inactive; 16 pgs stuck stale; 130 pgs stuck unclean; 101 pgs stuck undersized; 101 pgs undersized; 1 requests are blocked > 32 sec; recovery 1790657/4502340 objects degraded (39.772%); recovery 641906/4502340 objects misplaced (14.257%); recovery 147/2251990 unfound (0.007%); 50 scrub errors; mds cluster is degraded; no legacy OSD present but 'sortbitwise' flag is not set

Regards,
Hong

On Monday, August 28, 2017 4:18 PM, Tomasz Kusmierz <tom.kusmierz@gmail.com> wrote:

So to decode a few things about your disk:

  1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       37
37 read errors and only one sector marked as pending - fun disk :/

181 Program_Fail_Cnt_Total  0x0022   099   099   000    Old_age   Always       -       35325174
So the firmware has quite a few bugs, that’s nice

191 G-Sense_Error_Rate      0x0022   100   100   000    Old_age   Always       -       2855
The disk was thrown around while operational, even more nice.

194 Temperature_Celsius     0x0002   047   041   000    Old_age   Always       -       53 (Min/Max 15/59)
If your disk passes 50 you should not consider using it; high temperatures demagnetise the platter layer and you will see more errors in the very near future.

197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       1
As mentioned before :)

200 Multi_Zone_Error_Rate   0x002a   100   100   000    Old_age   Always       -       4222
Your heads keep missing tracks … bent? I don’t even know how to comment here.

Generally a fun drive you’ve got there … rescue as much as you can and throw it away !!!
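
The checks applied by hand in the SMART decode above can be sketched as a small awk filter. The function name smart_flags is hypothetical, and the field positions assume the classic ATA attribute table printed by `smartctl -A` (ID, name, flag, value, worst, threshold, type, updated, when-failed, raw value). The 50 degree limit and the pending-sector check are taken from the message above; the comparison of the normalized value against its threshold is the usual SMART failure condition.

```shell
# Hypothetical filter: reads SMART attribute rows on stdin and prints warnings.
smart_flags() {
    awk '
        # Attribute rows start with a numeric ID and have at least 10 fields.
        $1 ~ /^[0-9]+$/ && NF >= 10 {
            id = $1 + 0; name = $2
            value = $4 + 0; thresh = $6 + 0; raw = $10 + 0
            # Normalized value at or below a non-zero threshold.
            if (thresh > 0 && value <= thresh)
                print name ": normalized value " value " is at or below threshold " thresh
            # Any sector waiting for reallocation (attribute 197).
            if (id == 197 && raw > 0)
                print name ": " raw " pending sector(s)"
            # Temperature above the 50 C rule of thumb from this thread (attribute 194).
            if (id == 194 && raw > 50)
                print name ": " raw " C is above 50 C"
        }'
}
```

It would be fed with something like `smartctl -A /dev/sdb | smart_flags`. For the rows quoted in this thread it flags the temperature and the pending sector, while the other raw counters still need to be judged by eye as was done above.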
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com>
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
