Just to circle back to this:

Drives: Seagate ST8000NM0065
Controller: LSI 3108 RAID-on-Chip

At the time, there was no BBU on the RoC controller. Each OSD drive was configured as a single RAID0 VD.
What I believe to be the snake that bit us was the Seagate drives’ on-board caching. We use storcli to manage the controller and drives, and the pdcache value for /cx/vx was set to default, which in this case is on. All of the VDs now have the pdcache value set to off. At the time, the controller’s write-cache setting was also set to write-back; it has since been set to write-through until BBUs are installed.

Below is an example of our current settings in use post power-event:

> $ sudo /opt/MegaRAID/storcli/storcli64 /c0/v0 show all
> Controller = 0
> Status = Success
> Description = None
>
>
> /c0/v0 :
> ======
>
> --------------------------------------------------------------
> DG/VD TYPE  State Access Consist Cache Cac sCC     Size Name
> --------------------------------------------------------------
> 0/0   RAID0 Optl  RW     Yes     RWTD  -   ON  7.276 TB ceph1
> --------------------------------------------------------------
>
> Cac=CacheCade|Rec=Recovery|OfLn=OffLine|Pdgd=Partially Degraded|dgrd=Degraded
> Optl=Optimal|RO=Read Only|RW=Read Write|HD=Hidden|TRANS=TransportReady|B=Blocked|
> Consist=ConsistentR=Read Ahead Always|NR=No Read Ahead|WB=WriteBack|
> AWB=Always WriteBack|WT=WriteThrough|C=Cached IO|D=Direct IO|sCC=Scheduled
> Check Consistency
>
>
> PDs for VD 0 :
> ============
>
> -----------------------------------------------------------------------
> EID:Slt DID State DG     Size Intf Med SED PI SeSz Model        Sp
> -----------------------------------------------------------------------
> 252:0     9 Onln   0 7.276 TB SAS  HDD N   N  4 KB ST8000NM0065 U
> -----------------------------------------------------------------------
>
> EID-Enclosure Device ID|Slt-Slot No.|DID-Device ID|DG-DriveGroup
> DHS-Dedicated Hot Spare|UGood-Unconfigured Good|GHS-Global Hotspare
> UBad-Unconfigured Bad|Onln-Online|Offln-Offline|Intf-Interface
> Med-Media Type|SED-Self Encryptive Drive|PI-Protection Info
> SeSz-Sector Size|Sp-Spun|U-Up|D-Down|T-Transition|F-Foreign
> UGUnsp-Unsupported|UGShld-UnConfigured shielded|HSPShld-Hotspare shielded
> CFShld-Configured shielded|Cpybck-CopyBack|CBShld-Copyback Shielded
>
>
> VD0 Properties :
> ==============
> Strip Size = 256 KB
> Number of Blocks = 1953374208
> VD has Emulated PD = No
> Span Depth = 1
> Number of Drives Per Span = 1
> Write Cache(initial setting) = WriteThrough
> Disk Cache Policy = Disabled
> Encryption = None
> Data Protection = Disabled
> Active Operations = None
> Exposed to OS = Yes
> Creation Date = 17-06-2016
> Creation Time = 02:49:02 PM
> Emulation type = default
> Cachebypass size = Cachebypass-64k
> Cachebypass Mode = Cachebypass Intelligent
> Is LD Ready for OS Requests = Yes
> SCSI NAA Id = 600304801bb4c0001ef6ca5ea0fcb283

Hopefully this is a much safer configuration, and it can help anyone else before they incur any destructive issues. The only less-than-great part of this configuration is the hit to write I/O, due to less-than-optimal write scheduling compared to cached writes. I hope to enable write-back at the controller level after BBU installation.

Thanks,

Reed

> On Sep 1, 2016, at 6:21 AM, Cloud List <[email protected]> wrote:
>
> On Thu, Sep 1, 2016 at 3:50 PM, Nick Fisk <[email protected]> wrote:
> > > On 31 August 2016 at 23:21, Reed Dier <[email protected]> wrote:
> > >
> > > Multiple XFS corruptions, multiple leveldb issues. Looked to be result of
> > > write cache settings which have been adjusted now.
> >
> > Reed, I realise that you are probably very busy attempting recovery at the
> > moment, but when things calm down, I think it would be very beneficial to the
> > list if you could expand on what settings caused this to happen. It might
> > just stop this happening to someone else in the future.
>
> Agree with Nick, when things settle down and (hopefully) all the data is
> recovered, appreciate if Reed can share what kind of write cache settings
> can cause this problem and what adjustment was made to prevent this kind of
> problem from happening.
>
> Thank you.
>
> -ip-
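
For anyone who wants to audit their own VDs against the settings described above, here is a rough sketch rather than a tested procedure. The function names `vd_cache_is_safe` and `apply_safe_cache` are made up for illustration, and the property labels it matches ("Write Cache(initial setting)" and "Disk Cache Policy") are copied from the `show all` output above, so they may differ on other storcli versions. The `set pdcache=off` and `set wrcache=wt` calls reflect the two changes described above; please check them against the documentation for your storcli release before running anything.

```shell
#!/bin/sh
# Path as used earlier in this message; override STORCLI if yours differs.
STORCLI=${STORCLI:-/opt/MegaRAID/storcli/storcli64}

# Hypothetical helper: reads "storcli ... show all" output on stdin and
# returns 0 only if the VD reports write-through AND disk cache disabled.
vd_cache_is_safe() {
    awk -F' = ' '
        /^[ \t]*Write Cache\(initial setting\)/ { wc = $2; sub(/[ \t\r]+$/, "", wc) }
        /^[ \t]*Disk Cache Policy/              { dc = $2; sub(/[ \t\r]+$/, "", dc) }
        END { exit !(wc == "WriteThrough" && dc == "Disabled") }
    '
}

# Hypothetical helper: applies both settings to every VD on controller $1.
# Defined only; nothing here runs it automatically.
apply_safe_cache() {
    sudo "$STORCLI" "/c$1/vall" set pdcache=off   # drive on-board cache off
    sudo "$STORCLI" "/c$1/vall" set wrcache=wt    # controller write-through
}
```

Usage would look something like `sudo /opt/MegaRAID/storcli/storcli64 /c0/v0 show all | vd_cache_is_safe && echo "v0 ok"`, with `apply_safe_cache 0` run by hand only after confirming the controller index.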
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
