Hi!

>And for the record, _ALL_ the drives I tested are faster on Intel SAS than on 
>LSI (2308) and
>often faster on a regular SATA AHCI than on their "high throughput" HBAs.

But most Intel HBAs are LSI based. They are the same chips with slightly 
different firmware, I think.
We use the RS2MB044, RS2BL080 and RS25CB080. The first two are LSI 2108 based, 
and the latter is LSI 2208 based. So basically the same problems can happen 
with Intel's HBAs as well.

Megov Igor
CIO, Yuterra


________________________________
From: ceph-users <ceph-users-boun...@lists.ceph.com> on behalf of Jan Schermer 
<j...@schermer.cz>
Sent: 7 September 2015 12:07
To: Richard Bade
Cc: ceph-us...@ceph.com
Subject: Re: [ceph-users] XFS and nobarriers on Intel SSD

Are you absolutely sure there's nothing in dmesg before this? There seems to be 
something missing. Is this from dmesg or a different log? There should be 
something before that. Usually when a drive drops out there is an I/O error 
(itself caused by a timed-out SCSI command), and then the error recovery kicks 
in and emits messages like these. But this message by itself just should not be 
there. Or is that with the debugging already enabled? In that case it's a red 
herring, not _the_ problem.
Synchronize cache is a completely ordinary command - you absolutely _want_ it 
in there. The only case where you could avoid it is if you trust both the 
capacitors on the drives _and_ the OS to order the requests correctly (a bold 
assumption IMO) by disabling barriers (btw, that will not work for a journal on 
a block device).
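For reference, a quick way to check whether a drive's volatile write cache is 
even enabled (a sketch - hdparm must be installed, and sdX is a placeholder 
for your actual device):

```
# Report whether the drive's volatile write cache is on (1) or off (0).
hdparm -W /dev/sdX
```

If the cache is off, synchronize cache should be close to a no-op for the 
drive anyway.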

How often does this happen? You could try recording the events with "btrace" so 
you know what the block device is really doing from the kernel block device 
perspective.
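A minimal way to capture such a trace (assuming blktrace is installed and sdf 
is the affected device - substitute your own):

```
# Stream block-layer events for the device live; look for flush (F)
# requests and long gaps between dispatch (D) and completion (C).
btrace /dev/sdf

# Or record to files for later analysis:
blktrace -d /dev/sdf -o trace
blkparse -i trace
```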
In any case, this command should be harmless and is expected to occur quite 
often, so LSI telling you "don't do it" is like Ford telling me "your brakes 
are broken so don't use them when driving".

I'm getting really angry at LSI. We have problems with them as well and their 
support is just completely useless. And for the record, _ALL_ the drives I 
tested are faster on Intel SAS than on LSI (2308), and often faster on a 
regular SATA AHCI than on their "high throughput" HBAs.
The drivers have barely documented parameters and if you google a bit you'll 
find many people having problems with them (not limited to linux).

I'll definitely avoid LSI HBAs in the future if I can.

Feel free to mail me off-list, I'm very interested in your issue because I have 
the same combination (LSI + Intels) in my cluster right now. Seem to work fine 
though.

Jan


On 05 Sep 2015, at 01:04, Richard Bade 
<hitr...@gmail.com<mailto:hitr...@gmail.com>> wrote:

Hi Jan,
Thanks for your response.
How exactly do you know this is the cause? This is usually just an effect of 
something going wrong and part of error recovery process.
Preceding this event should be the real error/root cause...
We have been working with LSI/Avago to resolve this. We get a bunch of these 
type log events:

2015-09-04T14:58:59.169677+12:00 <server_name> ceph-osd: - ceph-osd:  
2015-09-04 14:58:59.168444 7fbc5ec71700  0 log [WRN] : slow request 30.894936 
seconds old, received at 2015-09-04 14:58:28.272976: 
osd_op(client.42319583.0:1185218039 rbd_data.1d8a5a92eb141f2.00000000000056a0 
[read 3579392~8192] 4.f9f016cb ack+read e66603) v4 currently no flag points 
reached

Followed by the task abort I mentioned:
 sd 11:0:4:0: attempting task abort! scmd(ffff8804c07d0480)
 sd 11:0:4:0: [sdf] CDB:
 Write(10): 2a 00 24 6f 01 a8 00 00 08 00
 scsi target11:0:4: handle(0x000d), sas_address(0x4433221104000000), phy(4)
 scsi target11:0:4: enclosure_logical_id(0x5003048000000000), slot(4)
 sd 11:0:4:0: task abort: SUCCESS scmd(ffff8804c07d0480)
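For anyone reading the log above, the CDB bytes the driver prints can be 
decoded by hand. A small sketch (opcode names and field layout per the SCSI 
block command spec; nothing here is specific to LSI):

```python
def decode_cdb(hex_str):
    """Decode a 10-byte SCSI CDB as logged by the mpt2sas/mpt3sas driver."""
    b = bytes.fromhex(hex_str.replace(" ", ""))
    opcodes = {0x28: "READ(10)", 0x2A: "WRITE(10)",
               0x35: "SYNCHRONIZE CACHE(10)"}
    op = opcodes.get(b[0], "opcode 0x%02x" % b[0])
    lba = int.from_bytes(b[2:6], "big")      # bytes 2-5: logical block address
    nblocks = int.from_bytes(b[7:9], "big")  # bytes 7-8: transfer length
    return op, lba, nblocks

# The CDB from the task abort above: a plain 8-block (4 KiB) write.
print(decode_cdb("2a 00 24 6f 01 a8 00 00 08 00"))
# ('WRITE(10)', 611254696, 8)
```

So the command that got aborted here was an ordinary small write, not the 
synchronize cache itself.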

LSI had us enable debugging on our card and send them many logs and debugging 
data. Their response was:
Please do not send in the Synchronize Cache command (35h). That's the one 
keeping the drive from responding to read/write commands quickly enough.
A Synchronize Cache command instructs the ATA device to flush the cache 
contents to the medium, and while the disk is in the process of doing that, 
it's probably causing the read/write commands to take longer to complete.
LSI/Avago believe this to be the root cause of the IO delay based on the 
debugging info.

and from what I've seen it is not necessary with fast drives (such as S3700).
While I agree with you that it should not be necessary, since the S3700s 
should be very fast, our current experience does not show this to be the case.

Just a little more about our setup: we're using Ceph Firefly (0.80.10) on 
Ubuntu 14.04. We see the same thing on every S3700/S3710 across four hosts. We 
do not see it on the spinning disks in the same cluster, which sit in a 
different pool on similar hardware.

If you know of any other reason this may be happening, we would appreciate any 
pointers. Otherwise we will need to continue investigating the possibility of 
setting nobarriers.

Regards,
Richard

On 5 September 2015 at 09:32, Jan Schermer 
<j...@schermer.cz<mailto:j...@schermer.cz>> wrote:

We are seeing some significant I/O delays on the disks causing a "SCSI Task 
Abort" from the OS. This seems to be triggered by the drive receiving a 
"Synchronize cache command".


How exactly do you know this is the cause? This is usually just an effect of 
something going wrong and part of error recovery process.
Preceding this event should be the real error/root cause...

It is _supposedly_ safe to disable barriers in this scenario, but IMO the 
assumptions behind that are deeply flawed, and from what I've seen it is not 
necessary with fast drives (such as S3700).

Take a look in the mailing list archives, I elaborated on this quite a bit in 
the past, including my experience with Kingston drives + XFS + LSI (and the 
effect is present even on Intels, but because they are much faster it shouldn't 
cause any real problems).

Jan


On 04 Sep 2015, at 21:55, Richard Bade 
<hitr...@gmail.com<mailto:hitr...@gmail.com>> wrote:


Hi Everyone,

We have a Ceph pool that is entirely made up of Intel S3700/S3710 enterprise 
SSD's.

We are seeing some significant I/O delays on the disks causing a "SCSI Task 
Abort" from the OS. This seems to be triggered by the drive receiving a 
"Synchronize cache command".

My current thinking is that setting nobarriers in XFS will stop the drive 
receiving a sync command and therefore stop the I/O delay associated with it.

In the XFS FAQ it looks like the recommendation is that if you have a Battery 
Backed raid controller you should set nobarriers for performance reasons.

Our LSI card doesn't have battery-backed cache as it's configured in HBA mode 
(IT) rather than RAID mode (IR). Our Intel S37xx SSDs do have a 
capacitor-backed cache though.

So is it recommended that barriers are turned off as the drive has a safe cache 
(I am confident that the cache will write out to disk on power failure)?
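For reference, this is what that would look like in /etc/fstab (the device 
path and mount point are hypothetical - substitute your OSD's; note that 
nobarrier applies to the whole filesystem, there is no per-drive control):

```
# Hypothetical OSD data mount with XFS write barriers disabled.
# Only safe if every device under this filesystem has power-loss protection.
/dev/sdf1  /var/lib/ceph/osd/ceph-0  xfs  noatime,nobarrier  0  0
```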

Has anyone else encountered this issue?

Any info or suggestions about this would be appreciated.

Regards,

Richard

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com<mailto:ceph-users@lists.ceph.com>
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


