Re: [ceph-users] Power outages!!! help!

Tomasz Kusmierz Mon, 28 Aug 2017 15:17:40 -0700

Rule of thumb with batteries:
- the closer to the proper temperature you run them, the more life you get out of them
- the more oversized the battery is for your application, the longer it will survive.


Get yourself an LSI 94** controller and use it as an HBA and you will be fine. 
But get MORE DRIVES!!!
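
To put numbers on the space problem: the `ceph health` summary quoted further down already reports `backfill_toofull` PGs plus degraded, misplaced and unfound objects. Below is a minimal Python sketch for pulling those counters out of such a summary line. The helper names (`recovery_counters`, `pg_count`) are made up for illustration, and the regular expressions only assume the message layout seen in this thread, which may differ in other Ceph releases.

```python
import re

# Excerpt of the `ceph health` summary quoted later in this thread.
HEALTH = (
    "HEALTH_ERR 22 pgs are stuck inactive for more than 300 seconds; "
    "2 pgs backfill_toofull; 74 pgs backfill_wait; 3 pgs backfilling; "
    "recovery 1790657/4502340 objects degraded (39.772%); "
    "recovery 641906/4502340 objects misplaced (14.257%); "
    "recovery 147/2251990 unfound (0.007%)"
)


def recovery_counters(summary):
    """Return {state: (count, total)} for every 'recovery N/M ... state' item."""
    pattern = re.compile(r"recovery (\d+)/(\d+) (?:objects )?(\w+)")
    return {state: (int(n), int(total))
            for n, total, state in pattern.findall(summary)}


def pg_count(summary, state):
    """Return the number of PGs reported in exactly this state, or 0 if absent."""
    match = re.search(r"(\d+) pgs " + re.escape(state) + r"(?:;|$)", summary)
    return int(match.group(1)) if match else 0


if __name__ == "__main__":
    for state, (count, total) in recovery_counters(HEALTH).items():
        print("%-10s %8d / %8d  (%.3f%%)" % (state, count, total, 100.0 * count / total))
    print("PGs blocked by full OSDs:", pg_count(HEALTH, "backfill_toofull"))
```

If the `backfill_toofull` count stays above zero, backfill is being refused because the target OSDs are too full, which is why the extra drives come first.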
> On 28 Aug 2017, at 23:10, hjcho616 <hjcho...@yahoo.com> wrote:
> 
> Thank you Tomasz and Ronny.  I'll have to order some HDDs soon and try these 
> out.  The car battery idea is nice!  I may try that.. =)  Do they last longer?  
> The ones that fit the UPS's original battery spec didn't last very long... part 
> of the reason why I gave up on them.. =P  My wife probably won't like the idea 
> of a car battery hanging out though ha!
> 
> The OSD1 (the one with mostly OK OSDs, except for that SMART failure) 
> motherboard doesn't have any additional SATA connectors available.  Would it be 
> safe to add another OSD host?
> 
> Regards,
> Hong
> 
> 
> 
> On Monday, August 28, 2017 4:43 PM, Tomasz Kusmierz <tom.kusmi...@gmail.com> 
> wrote:
> 
> 
> Sorry for being brutal … anyway:
> 1. Get a battery for the UPS (a car battery will do as well; I've modded a UPS 
> in the past with a truck battery and it worked like a charm :D).
> 2. Get spare drives and put those in, because your cluster CAN NOT get out of 
> the error state due to lack of space.
> 3. Follow the advice of Ronny Aasen on how to recover data from the hard drives.
> 4. Get cooling to the drives or you will lose more!
> 
> 
>> On 28 Aug 2017, at 22:39, hjcho616 <hjcho...@yahoo.com> wrote:
>> 
>> Tomasz,
>> 
>> Those machines are behind a surge protector.  Doesn't appear to be a good 
>> one!  I do have a UPS... but it is my fault... no battery.  Power was pretty 
>> reliable for a while... and UPS was just beeping every chance it had, 
>> disrupting some sleep.. =P  So I am running on the surge protector only.  I am 
>> running this in a home environment.  So far, HDD failures have been very rare 
>> for this environment. =)  It just doesn't get loaded as much!  I am not sure 
>> what to expect, seeing that "unfound" and just a feeling of possibility of 
>> maybe getting OSD back made me excited about it. =) Thanks for letting me 
>> know what should be the priority.  I just lack experience and knowledge in 
>> this. =) Please do continue to guide me through this. 
>> 
>> Thank you for decoding those SMART messages!  I do agree that it looks like 
>> it is on its way out.  I would like to know how to get a good portion of it 
>> back if possible. =)
>> 
>> I think I just set the size and min_size to 1.
>> # ceph osd lspools
>> 0 data,1 metadata,2 rbd,
>> # ceph osd pool set rbd size 1
>> set pool 2 size to 1
>> # ceph osd pool set rbd min_size 1
>> set pool 2 min_size to 1
>> 
>> Seems to be doing some backfilling work.
>> 
>> # ceph health
>> HEALTH_ERR 22 pgs are stuck inactive for more than 300 seconds; 2 pgs 
>> backfill_toofull; 74 pgs backfill_wait; 3 pgs backfilling; 108 pgs degraded; 
>> 6 pgs down; 6 pgs inconsistent; 6 pgs peering; 7 pgs recovery_wait; 16 pgs 
>> stale; 108 pgs stuck degraded; 6 pgs stuck inactive; 16 pgs stuck stale; 130 
>> pgs stuck unclean; 101 pgs stuck undersized; 101 pgs undersized; 1 requests 
>> are blocked > 32 sec; recovery 1790657/4502340 objects degraded (39.772%); 
>> recovery 641906/4502340 objects misplaced (14.257%); recovery 147/2251990 
>> unfound (0.007%); 50 scrub errors; mds cluster is degraded; no legacy OSD 
>> present but 'sortbitwise' flag is not set
>> 
>> 
>> 
>> Regards,
>> Hong
>> 
>> 
>> On Monday, August 28, 2017 4:18 PM, Tomasz Kusmierz <tom.kusmi...@gmail.com> 
>> wrote:
>> 
>> 
>> So, to decode a few things about your disk:
>> 
>>   1 Raw_Read_Error_Rate    0x002f  100  100  051    Pre-fail  Always      -  37
>> 37 read errors and only one sector marked as pending - fun disk :/ 
>> 
>> 181 Program_Fail_Cnt_Total  0x0022  099  099  000    Old_age  Always      -  35325174
>> So the firmware has quite a few bugs, that's nice.
>> 
>> 191 G-Sense_Error_Rate      0x0022  100  100  000    Old_age  Always      -  2855
>> The disk was thrown around while operational, even nicer.
>> 
>> 194 Temperature_Celsius    0x0002  047  041  000    Old_age  Always      -  53 (Min/Max 15/59)
>> If your disk passes 50 °C you should not consider using it; high temperatures 
>> demagnetise the platter layer and you will see more errors in the very near 
>> future.
>> 
>> 197 Current_Pending_Sector  0x0032  100  100  000    Old_age  Always      -  1
>> as mentioned before :)
>> 
>> 200 Multi_Zone_Error_Rate  0x002a  100  100  000    Old_age  Always      -  4222
>> Your heads keep missing tracks … bent? I don't even know how to comment 
>> here.
>> 
>> 
>> Generally a fun drive you've got there … rescue as much as you can and throw 
>> it away!!!
>> 
>> 
> 
> 
>
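
The SMART walk-through quoted above can be turned into a quick check. This is only a rough Python sketch: the rows are the ones pasted in this thread, the 50 °C limit is the rule of thumb from the thread rather than a vendor specification, and `parse_rows` / `find_warnings` are hypothetical helper names. It also assumes every raw value starts with an integer, which is true for these rows but not for every drive.

```python
# Attribute rows copied from the SMART output quoted in this thread. Columns:
# ID, name, flag, value, worst, thresh, type, updated, when_failed, raw value.
SMART_ROWS = """\
  1 Raw_Read_Error_Rate     0x002f  100  100  051  Pre-fail  Always  -  37
181 Program_Fail_Cnt_Total  0x0022  099  099  000  Old_age   Always  -  35325174
191 G-Sense_Error_Rate      0x0022  100  100  000  Old_age   Always  -  2855
194 Temperature_Celsius     0x0002  047  041  000  Old_age   Always  -  53 (Min/Max 15/59)
197 Current_Pending_Sector  0x0032  100  100  000  Old_age   Always  -  1
200 Multi_Zone_Error_Rate   0x002a  100  100  000  Old_age   Always  -  4222
"""

MAX_TEMP_C = 50  # rule of thumb from the thread, not a vendor specification


def parse_rows(text):
    """Map attribute name -> leading integer of its raw value."""
    rows = {}
    for line in text.splitlines():
        # Split into at most 10 fields so the raw value keeps its spaces.
        fields = line.split(None, 9)
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # skip headers and blank lines
        rows[fields[1]] = int(fields[9].split()[0])
    return rows


def find_warnings(rows):
    """Return human-readable warnings for the conditions discussed above."""
    found = []
    if rows.get("Temperature_Celsius", 0) > MAX_TEMP_C:
        found.append("temperature above %d C" % MAX_TEMP_C)
    if rows.get("Current_Pending_Sector", 0) > 0:
        found.append("pending sectors present")
    return found


if __name__ == "__main__":
    for warning in find_warnings(parse_rows(SMART_ROWS)):
        print("WARNING:", warning)
```

Both checks trip on the rows from this drive, which matches the advice to rescue the data and retire the disk.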

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
