Re: dmesg flooded with "Very big device. Trying to use READ CAPACITY(16)"

2018-03-10 Thread Menion
This would be a cosmetic fix.
The underlying problem is that there is no logic that lets the attached
device use the optimal READ CAPACITY command based on its storage size.
I can understand that it is safer to try read_capacity_10 first, given
the jungle of USB-attached storage devices, but once you realize that
read_capacity_16 works and that the storage size requires it, I think
the SCSI layer should go for it.

2018-03-10 11:29 GMT+01:00 Christoph Hellwig :
> On Tue, Mar 06, 2018 at 09:40:56AM +0100, Menion wrote:
>> Hi all,
>> Operating high-capacity HDDs (such as 8TB) with complex filesystems like
>> BTRFS in RAID mode ends up flooding dmesg with this log, due to too
>> many capacity checks (opaque to the filesystem itself).
>> The logs come from here:
>>
>> https://elixir.bootlin.com/linux/latest/source/drivers/scsi/sd.c#L2508
>>
>> The general guideline is that KERN_NOTICE (which is the default log
>> level for dmesg in most distributions) should report information of
>> interest to any user.
>> I think that this information is not really of user interest, but
>> rather of DEBUG interest.
>> So my suggestion is to lower this log to KERN_DEBUG.
>> Do you agree?
>
> That warning and log level is correct, but maybe we can add a flag
> so that we only print the warning once per device?


Re: dmesg flooded with "Very big device. Trying to use READ CAPACITY(16)"

2018-03-10 Thread Christoph Hellwig
On Tue, Mar 06, 2018 at 09:40:56AM +0100, Menion wrote:
> Hi all,
> Operating high-capacity HDDs (such as 8TB) with complex filesystems like
> BTRFS in RAID mode ends up flooding dmesg with this log, due to too
> many capacity checks (opaque to the filesystem itself).
> The logs come from here:
> 
> https://elixir.bootlin.com/linux/latest/source/drivers/scsi/sd.c#L2508
> 
> The general guideline is that KERN_NOTICE (which is the default log
> level for dmesg in most distributions) should report information of
> interest to any user.
> I think that this information is not really of user interest, but
> rather of DEBUG interest.
> So my suggestion is to lower this log to KERN_DEBUG.
> Do you agree?

That warning and log level is correct, but maybe we can add a flag
so that we only print the warning once per device?
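
A minimal sketch of the "print once per device" idea, written as plain user-space C rather than as a kernel patch. The struct and the flag name rc16_notice_printed are hypothetical and only illustrate the shape of such a flag; they are not taken from sd.c.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for per-disk state; not the kernel's struct scsi_disk. */
struct disk_state {
    const char *name;
    bool rc16_notice_printed;   /* assumed flag name, for illustration only */
};

/*
 * Print the notice only the first time it is requested for a given disk.
 * Returns 1 if the notice was printed, 0 if it was suppressed.
 */
static int notice_very_big_device(struct disk_state *d)
{
    if (d->rc16_notice_printed)
        return 0;
    d->rc16_notice_printed = true;
    printf("[%s] Very big device. Trying to use READ CAPACITY(16).\n", d->name);
    return 1;
}
```

With a flag like this the log level can stay at KERN_NOTICE while repeated revalidations of the same disk stay quiet.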


Re: dmesg flooded with "Very big device. Trying to use READ CAPACITY(16)"

2018-03-09 Thread Menion
>> static int sd_try_rc16_first(struct scsi_device *sdp)
>> {
>> if (sdp->host->max_cmd_len < 16)
>> return 0;
>
>
> option
>
>> if (sdp->try_rc_10_first)
>> return 0;
>
>
> option
>
>> if (sdp->scsi_level > SCSI_SPC_2)
>> return 1;
>> if (scsi_device_protection(sdp))
>> return 1;
>> return 0;
>
>
> option
>
>> }
>
>
> just picking one arbitrary option and not being entirely sure that's the
> code path but you mentioned USB to SATA bridge, it might be related to:
>

Steffen, since the reason it goes for read_capacity_10 is that the
upper layer asked for try_rc_10_first = 1, would it be OK if, once we
realize that the reported capacity of the attached HDDs is greater
than what READ CAPACITY(10) can report, the SCSI layer clears this
flag, so that the following requests go for read_capacity_16?
Something like:

        if (sd_try_rc16_first(sdp)) {
                sector_size = read_capacity_16(sdkp, sdp, buffer);
                if (sector_size == -EOVERFLOW)
                        goto got_data;
                if (sector_size == -ENODEV)
                        return;
                if (sector_size < 0)
                        sector_size = read_capacity_10(sdkp, sdp, buffer);
                if (sector_size < 0)
                        return;
        } else {
                sector_size = read_capacity_10(sdkp, sdp, buffer);
                if (sector_size == -EOVERFLOW)
                        goto got_data;
                if (sector_size < 0)
                        return;
                if ((sizeof(sdkp->capacity) > 4) &&
                    (sdkp->capacity > 0xffffffffULL)) {
                        int old_sector_size = sector_size;
                        sd_printk(KERN_NOTICE, sdkp, "Very big device. "
                                  "Trying to use READ CAPACITY(16).\n");
                        sector_size = read_capacity_16(sdkp, sdp, buffer);
                        if (sector_size < 0) {
                                sd_printk(KERN_NOTICE, sdkp,
                                          "Using 0xffffffff as device size\n");
                                sdkp->capacity = 1 + (sector_t) 0xffffffff;
                                sector_size = old_sector_size;
                                goto got_data;
                        }
                        /*
                         * The attached device needs read_capacity_16 and
                         * read_capacity_16 works, so go for it on the
                         * next capacity checks.
                         */
+                       sdp->try_rc_10_first = 0;
                }
        }
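
To make the effect of the proposed line easier to see, here is a small user-space model of the flow above. It is only a sketch under simplifying assumptions: struct model_dev, model_rc10() and model_read_capacity() are hypothetical names, error codes are ignored, and the block count in the usage note is just an example value for an 8TB drive with 512-byte sectors.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one disk; field names echo the thread, not sd.c. */
struct model_dev {
    uint64_t blocks;        /* true number of logical blocks, assumed > 0 */
    bool rc16_works;        /* device answers READ CAPACITY(16) */
    bool try_rc_10_first;   /* hint set by the transport driver */
    unsigned notices;       /* "Very big device" notices logged so far */
    uint64_t capacity;      /* result of the last capacity read */
};

/* READ CAPACITY(10) carries a 32-bit last LBA, so large devices saturate. */
static uint64_t model_rc10(const struct model_dev *d)
{
    uint64_t last = d->blocks - 1;
    return (last > 0xffffffffULL ? 0xffffffffULL : last) + 1;
}

static void model_read_capacity(struct model_dev *d)
{
    if (!d->try_rc_10_first) {
        /* RC16 first, falling back to RC10 if RC16 is not supported. */
        d->capacity = d->rc16_works ? d->blocks : model_rc10(d);
        return;
    }

    d->capacity = model_rc10(d);
    if (d->capacity > 0xffffffffULL) {
        d->notices++;                  /* the message seen in dmesg */
        if (!d->rc16_works)
            return;                    /* keep the saturated 32-bit size */
        d->capacity = d->blocks;
        d->try_rc_10_first = false;    /* the proposed change */
    }
}
```

In this model a device with 15628053168 blocks logs the notice on the first call only; later calls take the RC16 branch directly because the flag has been cleared.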


Re: dmesg flooded with "Very big device. Trying to use READ CAPACITY(16)"

2018-03-08 Thread Martin K. Petersen

Menion,

> So, assuming that there is no disconnection ad USB level (and it is
> not since I don't get any log of it), the question is: how can trigger
> a probe or call the sd_revalidate_disk?  Can it be the filesystem?

revalidate is a function of either device discovery following a
controller reset or rereading the drive partition table.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: dmesg flooded with "Very big device. Trying to use READ CAPACITY(16)"

2018-03-08 Thread Menion
Neither; there are no dbg packages for the kernel PPA builds in Ubuntu :(

2018-03-08 12:10 GMT+01:00 Steffen Maier :
>
> On 03/08/2018 12:07 PM, Menion wrote:
>>
>> Unfortunately the Ubuntu kernel is not configured for ftrace or
>> kprobe, and I am operating this server so I am not sure if I will
>> eventually find the time and the risk to install a self-compiled
>> kernel
>
>
> systemtap?
>


Re: dmesg flooded with "Very big device. Trying to use READ CAPACITY(16)"

2018-03-08 Thread Steffen Maier


On 03/08/2018 12:07 PM, Menion wrote:
> Unfortunately the Ubuntu kernel is not configured for ftrace or
> kprobe, and I am operating this server so I am not sure if I will
> eventually find the time and the risk to install a self-compiled
> kernel

systemtap?



Re: dmesg flooded with "Very big device. Trying to use READ CAPACITY(16)"

2018-03-08 Thread Menion
Unfortunately the Ubuntu kernel is not configured for ftrace or
kprobes, and since I am operating this server I am not sure I will
find the time, or be willing to take the risk, to install a
self-compiled kernel.

2018-03-08 11:53 GMT+01:00 Steffen Maier :
>
> On 03/08/2018 11:34 AM, Menion wrote:
>>
>> I did some more test
>> This log is specific to the function sd_read_capacity
>>  From what I can see, it seems that it is called only when probing
>> newly attached devices
>> A quick look in the code I see that it is called by  sd_revalidate_disk
>> This function is registered by fops for the scsi device or called
>> directly by sd_probe (via sd_probe_async)
>> So, assuming that there is no disconnection ad USB level (and it is
>> not since I don't get any log of it), the question is: how can trigger
>> a probe or call the sd_revalidate_disk?
>> Can it be the filesystem?
>
>
> echo 1 > /sys/class/scsi_device/.../device/rescan
> ?
>
> That's what I meant with "sdev _rescan_" in my previous mail.
>
> Not sure what call paths lead to sd_revalidate_disk().
>
>> 2018-03-08 11:10 GMT+01:00 Menion :
>>>
>>> Anyhow, I checked something that I should have checked since the
>>> beginning.
>>> I have stopped smartd and I still get this log, so it is something
>>> else doing it, but does anyone have an idea how understand what
>>> subsystem is calling again and again the read_capacity_10?
>
>
> ftrace: kernel function trace
> [https://lwn.net/Articles/365835/, https://lwn.net/Articles/366796/]
> or dynamically attach a kprobe
> [https://www.kernel.org/doc/Documentation/trace/kprobetrace.txt]
> to see which process calls this (indirectly)
>
>>> 2018-03-08 10:16 GMT+01:00 Menion :

 I have tried it, but it does not work:

 [   39.230095] sd 0:0:0:0: [sda] Very big device. Trying to use READ
 CAPACITY(16).
>
>
 [  348.134002] sd 0:0:0:0: [sda] Very big device. Trying to use READ
 CAPACITY(16).
>
>
 [  657.963478] sd 0:0:0:0: [sda] Very big device. Trying to use READ
 CAPACITY(16).
>
>
 2018-03-07 18:14 GMT+01:00 Douglas Gilbert :
>
> On 2018-03-07 09:02 AM, Menion wrote:
>>
>> 2018-03-07 14:51 GMT+01:00 Steffen Maier :
>>>
>>> On 03/07/2018 09:24 AM, Menion wrote:
>
>
>>> but from then on, you only get it roughly once every 300 seconds,
>>> i.e. 5
>>> minutes
>>>
>>> that's where I suspect user space as trigger, unless there is a
>>> kernel
>>> feature I'm not aware of doing such sdev rescans
>>>
>>> preventing this would be a workaround
>
>
>> Is it possible that it is smartd? It is the only daemon that could do
>> some low level access to the device (bypassing the filesystem)
>
>
>https://github.com/mirror/smartmontools
>
> To check it is the revision (svn rev >= 4718) you need for this fix,
> look
> at the top of the ChangeLog file and look for today's date (20180307).
>
>
> Currently smartmontools only has a quirks database (and it is large)
> for ATA devices, not real or pseudo SCSI device, nor NVMe devices
> (yet).
> Hopefully this fix will be sufficient.
>
> If it does not work, please send me the details.
>
>
   /*
    * Many devices do not respond properly to READ_CAPACITY_16.
    * Tell the SCSI layer to try READ_CAPACITY_10 first.
    * However some USB 3.0 drive enclosures return capacity
    * modulo 2TB. Those must use READ_CAPACITY_16
    */
   if (!(us->fflags & US_FL_NEEDS_CAP16))
           sdev->try_rc_10_first = 1;
>>>
>>>
>>> if that's the cause, maybe an entry in
>>> drivers/usb/storage/unusual_devs.h
>>> would help, but that's really just guessing as I'm not familiar with
>>> USB
>>
>>
>> It seems that the bridge does have an entry in unusual_devs.h:
>>
>> /* Reported by Michael Büsch  */
>> UNUSUAL_DEV( 0x152d, 0x0567, 0x0114, 0x0116,
>> "JMicron",
>> "USB to ATA/ATAPI Bridge",
>> USB_SC_DEVICE, USB_PR_DEVICE, NULL,
>> US_FL_BROKEN_FUA ),
>>
>> VID:PID is 0x152d 0x0567, not sure what are the other two numbers, so
>> I went back and used another enclosure with same USB to SATA bridge.
>> The strange thing is that this other enclosure goes in UAS mode while
>> the one for which I am reporting the issue goes in usb-storage mode
>> because it gets somehow the quirks 0x5000
>> Unfortunately I cannot move these 5 HDDs in the other enclosure. So do
>> you think that it shall be reported to linux-usb maybe?
>
>
> --
> Mit freundlichen Grüßen / Kind regards
> Steffen Maier
>
> Linux on z Systems Development

Re: dmesg flooded with "Very big device. Trying to use READ CAPACITY(16)"

2018-03-08 Thread Steffen Maier


On 03/08/2018 11:34 AM, Menion wrote:
> I did some more test
> This log is specific to the function sd_read_capacity
> From what I can see, it seems that it is called only when probing
> newly attached devices
> A quick look in the code I see that it is called by sd_revalidate_disk
> This function is registered by fops for the scsi device or called
> directly by sd_probe (via sd_probe_async)
> So, assuming that there is no disconnection at USB level (and it is
> not since I don't get any log of it), the question is: how can trigger
> a probe or call the sd_revalidate_disk?
> Can it be the filesystem?

echo 1 > /sys/class/scsi_device/.../device/rescan
?

That's what I meant with "sdev _rescan_" in my previous mail.

Not sure what call paths lead to sd_revalidate_disk().

> 2018-03-08 11:10 GMT+01:00 Menion :
>> Anyhow, I checked something that I should have checked since the beginning.
>> I have stopped smartd and I still get this log, so it is something
>> else doing it, but does anyone have an idea how understand what
>> subsystem is calling again and again the read_capacity_10?

ftrace: kernel function trace
[https://lwn.net/Articles/365835/, https://lwn.net/Articles/366796/]
or dynamically attach a kprobe
[https://www.kernel.org/doc/Documentation/trace/kprobetrace.txt]
to see which process calls this (indirectly)


2018-03-08 10:16 GMT+01:00 Menion :

I have tried it, but it does not work:

[   39.230095] sd 0:0:0:0: [sda] Very big device. Trying to use READ
CAPACITY(16).



[  348.134002] sd 0:0:0:0: [sda] Very big device. Trying to use READ
CAPACITY(16).



[  657.963478] sd 0:0:0:0: [sda] Very big device. Trying to use READ
CAPACITY(16).



2018-03-07 18:14 GMT+01:00 Douglas Gilbert :

On 2018-03-07 09:02 AM, Menion wrote:

2018-03-07 14:51 GMT+01:00 Steffen Maier :

On 03/07/2018 09:24 AM, Menion wrote:



but from then on, you only get it roughly once every 300 seconds, i.e. 5
minutes

that's where I suspect user space as trigger, unless there is a kernel
feature I'm not aware of doing such sdev rescans

preventing this would be a workaround



Is it possible that it is smartd? It is the only daemon that could do
some low level access to the device (bypassing the filesystem)



   https://github.com/mirror/smartmontools

To check it is the revision (svn rev >= 4718) you need for this fix, look
at the top of the ChangeLog file and look for today's date (20180307).



Currently smartmontools only has a quirks database (and it is large)
for ATA devices, not real or pseudo SCSI device, nor NVMe devices (yet).
Hopefully this fix will be sufficient.

If it does not work, please send me the details.



  /*
   * Many devices do not respond properly to READ_CAPACITY_16.
   * Tell the SCSI layer to try READ_CAPACITY_10 first.
   * However some USB 3.0 drive enclosures return capacity
   * modulo 2TB. Those must use READ_CAPACITY_16
   */
  if (!(us->fflags & US_FL_NEEDS_CAP16))
          sdev->try_rc_10_first = 1;


if that's the cause, maybe an entry in drivers/usb/storage/unusual_devs.h
would help, but that's really just guessing as I'm not familiar with USB


It seems that the bridge does have an entry in unusual_devs.h:

/* Reported by Michael Büsch  */
UNUSUAL_DEV( 0x152d, 0x0567, 0x0114, 0x0116,
"JMicron",
"USB to ATA/ATAPI Bridge",
USB_SC_DEVICE, USB_PR_DEVICE, NULL,
US_FL_BROKEN_FUA ),

VID:PID is 0x152d 0x0567, not sure what are the other two numbers, so
I went back and used another enclosure with same USB to SATA bridge.
The strange thing is that this other enclosure goes in UAS mode while
the one for which I am reporting the issue goes in usb-storage mode
because it gets somehow the quirks 0x5000
Unfortunately I cannot move these 5 HDDs in the other enclosure. So do
you think that it shall be reported to linux-usb maybe?


--
Mit freundlichen Grüßen / Kind regards
Steffen Maier

Linux on z Systems Development

IBM Deutschland Research & Development GmbH
Vorsitzende des Aufsichtsrats: Martina Koederitz
Geschaeftsfuehrung: Dirk Wittkopp
Sitz der Gesellschaft: Boeblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294



Re: dmesg flooded with "Very big device. Trying to use READ CAPACITY(16)"

2018-03-08 Thread Menion
Anyhow, I checked something that I should have checked from the beginning.
I have stopped smartd and I still get this log, so something else is
triggering it. Does anyone have an idea how to find out which
subsystem is calling read_capacity_10 again and again?

2018-03-08 10:16 GMT+01:00 Menion :
> Hi
> I have tried it, but it does not work:
>
> [   39.230095] sd 0:0:0:0: [sda] Very big device. Trying to use READ
> CAPACITY(16).
> [   39.338032] sd 0:0:0:1: [sdb] Very big device. Trying to use READ
> CAPACITY(16).
> [   39.618268] sd 0:0:0:2: [sdc] Very big device. Trying to use READ
> CAPACITY(16).
> [   39.762801] sd 0:0:0:3: [sdd] Very big device. Trying to use READ
> CAPACITY(16).
> [   39.901059] sd 0:0:0:4: [sde] Very big device. Trying to use READ
> CAPACITY(16).
> [  348.134002] sd 0:0:0:0: [sda] Very big device. Trying to use READ
> CAPACITY(16).
> [  348.231327] sd 0:0:0:1: [sdb] Very big device. Trying to use READ
> CAPACITY(16).
> [  348.353151] sd 0:0:0:2: [sdc] Very big device. Trying to use READ
> CAPACITY(16).
> [  348.549558] sd 0:0:0:3: [sdd] Very big device. Trying to use READ
> CAPACITY(16).
> [  348.722858] sd 0:0:0:4: [sde] Very big device. Trying to use READ
> CAPACITY(16).
> [  657.963478] sd 0:0:0:0: [sda] Very big device. Trying to use READ
> CAPACITY(16).
> [  658.090253] sd 0:0:0:1: [sdb] Very big device. Trying to use READ
> CAPACITY(16).
> [  658.291130] sd 0:0:0:2: [sdc] Very big device. Trying to use READ
> CAPACITY(16).
> [  658.524039] sd 0:0:0:3: [sdd] Very big device. Trying to use READ
> CAPACITY(16).
> [  658.840440] sd 0:0:0:4: [sde] Very big device. Trying to use READ
> CAPACITY(16).
>
> smartd details (before I had the ver 6.5 of 2016)
>
> menion@Menionubuntu:/lib/firmware/brcm$ smartd --version
> smartd 6.7 (build date Mar  8 2018)
> [x86_64-linux-4.15.5-041505-generic] (local build)
> Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org
>
> smartd comes with ABSOLUTELY NO WARRANTY. This is free
> software, and you are welcome to redistribute it under
> the terms of the GNU General Public License; either
> version 2, or (at your option) any later version.
> See http://www.gnu.org for further details.
>
> smartmontools release 6.7 dated 2017-11-05 at 15:20:58 UTC
> smartmontools SVN rev is unknown
> smartmontools build host: x86_64-pc-linux-gnu
> smartmontools build with: C++98, GCC 5.4.0 20160609
> smartmontools configure arguments: '--prefix=/usr'
> '--build=x86_64-linux-gnu' '--host=x86_64-linux-gnu'
> '--sysconfdir=/etc' '--mandir=/usr/share/man'
> '--with-initscriptdir=no' '--docdir=/usr/share/doc/smartmontools'
> '--with-savestates=/var/lib/smartmontools/smartd.'
> '--with-attributelog=/var/lib/smartmontools/attrlog.'
> '--with-exampledir=/usr/share/doc/smartmontools/examples/'
> '--with-drivedbdir=/var/lib/smartmontools/drivedb'
> '--with-systemdsystemunitdir=/lib/systemd/system'
> '--with-smartdscriptdir=/usr/share/smartmontools'
> '--with-smartdplugindir=/etc/smartmontools/smartd_warning.d'
> '--with-systemdenvfile=/etc/default/smartmontools' '--with-selinux'
> 'build_alias=x86_64-linux-gnu' 'host_alias=x86_64-linux-gnu'
> 'CXXFLAGS=-g -O2 -fPIC -fstack-protector-strong -Wformat
> -Werror=format-security -fsigned-char -Wall -O2'
> 'LDFLAGS=-Wl,-Bsymbolic-functions -fPIC -Wl,-z,relro -Wl,-z,now'
> 'CPPFLAGS=-Wdate-time -D_FORTIFY_SOURCE=2' 'CFLAGS=-g -O2 -fPIC
> -fstack-protector-strong -Wformat -Werror=format-security
> -fsigned-char -Wall -O2'
>
> to make sure that I picked and compiled the correct code I have
> checked the new set_rcap16_first method and it is here:
>
> menion@Menionubuntu:~/smartmontools$ cat scsicmds.cpp |grep set_rcap
> device->set_rcap16_first();
> menion@Menionubuntu:~/smartmontools$
>
> I have dome doubts about your code:
>
> if (avoid_rcap16) {
> res = scsiReadCapacity10(device, &last_lba, &lb_size);
> if (res) {
> if (scsi_debugmode)
> pout("%s: READ CAPACITY(10) failed, res=%d\n", __func__, res);
> try_16 = true;
> } else { /* rcap10 succeeded */
> if (0xffffffff == last_lba) {
> /* so number of blocks needs > 32 bits to represent */
> try_16 = true;
> device->set_rcap16_first();
> } else {
> ret_val = last_lba + 1;
> if (srrp) {
> memset(srrp, 0, sizeof(*srrp));
> srrp->num_lblocks = ret_val;
> srrp->lb_size = lb_size;
> }
> }
> }
> }
>
> from the scsi kernel code I see that also read_capacity_10 return
> capacity in 64bit variable
>
> sector_size = read_capacity_10(sdkp, sdp, buffer);
> if (sector_size == -EOVERFLOW)
>   goto got_data;
> if (sector_size < 0)
>   return;
> if ((sizeof(sdkp->capacity) > 4) &&
>     (sdkp->capacity > 0xffffffffULL)) {
>
> so is this if statement correct?
>
> } else { /* rcap10 succeeded */
> if (0xffffffff == last_lba) {
>
> I means, the last_lba 

Re: dmesg flooded with "Very big device. Trying to use READ CAPACITY(16)"

2018-03-08 Thread Menion
Hi
I have tried it, but it does not work:

[   39.230095] sd 0:0:0:0: [sda] Very big device. Trying to use READ CAPACITY(16).
[   39.338032] sd 0:0:0:1: [sdb] Very big device. Trying to use READ CAPACITY(16).
[   39.618268] sd 0:0:0:2: [sdc] Very big device. Trying to use READ CAPACITY(16).
[   39.762801] sd 0:0:0:3: [sdd] Very big device. Trying to use READ CAPACITY(16).
[   39.901059] sd 0:0:0:4: [sde] Very big device. Trying to use READ CAPACITY(16).
[  348.134002] sd 0:0:0:0: [sda] Very big device. Trying to use READ CAPACITY(16).
[  348.231327] sd 0:0:0:1: [sdb] Very big device. Trying to use READ CAPACITY(16).
[  348.353151] sd 0:0:0:2: [sdc] Very big device. Trying to use READ CAPACITY(16).
[  348.549558] sd 0:0:0:3: [sdd] Very big device. Trying to use READ CAPACITY(16).
[  348.722858] sd 0:0:0:4: [sde] Very big device. Trying to use READ CAPACITY(16).
[  657.963478] sd 0:0:0:0: [sda] Very big device. Trying to use READ CAPACITY(16).
[  658.090253] sd 0:0:0:1: [sdb] Very big device. Trying to use READ CAPACITY(16).
[  658.291130] sd 0:0:0:2: [sdc] Very big device. Trying to use READ CAPACITY(16).
[  658.524039] sd 0:0:0:3: [sdd] Very big device. Trying to use READ CAPACITY(16).
[  658.840440] sd 0:0:0:4: [sde] Very big device. Trying to use READ CAPACITY(16).

smartd details (before I had the ver 6.5 of 2016)

menion@Menionubuntu:/lib/firmware/brcm$ smartd --version
smartd 6.7 (build date Mar  8 2018)
[x86_64-linux-4.15.5-041505-generic] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

smartd comes with ABSOLUTELY NO WARRANTY. This is free
software, and you are welcome to redistribute it under
the terms of the GNU General Public License; either
version 2, or (at your option) any later version.
See http://www.gnu.org for further details.

smartmontools release 6.7 dated 2017-11-05 at 15:20:58 UTC
smartmontools SVN rev is unknown
smartmontools build host: x86_64-pc-linux-gnu
smartmontools build with: C++98, GCC 5.4.0 20160609
smartmontools configure arguments: '--prefix=/usr'
'--build=x86_64-linux-gnu' '--host=x86_64-linux-gnu'
'--sysconfdir=/etc' '--mandir=/usr/share/man'
'--with-initscriptdir=no' '--docdir=/usr/share/doc/smartmontools'
'--with-savestates=/var/lib/smartmontools/smartd.'
'--with-attributelog=/var/lib/smartmontools/attrlog.'
'--with-exampledir=/usr/share/doc/smartmontools/examples/'
'--with-drivedbdir=/var/lib/smartmontools/drivedb'
'--with-systemdsystemunitdir=/lib/systemd/system'
'--with-smartdscriptdir=/usr/share/smartmontools'
'--with-smartdplugindir=/etc/smartmontools/smartd_warning.d'
'--with-systemdenvfile=/etc/default/smartmontools' '--with-selinux'
'build_alias=x86_64-linux-gnu' 'host_alias=x86_64-linux-gnu'
'CXXFLAGS=-g -O2 -fPIC -fstack-protector-strong -Wformat
-Werror=format-security -fsigned-char -Wall -O2'
'LDFLAGS=-Wl,-Bsymbolic-functions -fPIC -Wl,-z,relro -Wl,-z,now'
'CPPFLAGS=-Wdate-time -D_FORTIFY_SOURCE=2' 'CFLAGS=-g -O2 -fPIC
-fstack-protector-strong -Wformat -Werror=format-security
-fsigned-char -Wall -O2'

to make sure that I picked and compiled the correct code I have
checked the new set_rcap16_first method and it is here:

menion@Menionubuntu:~/smartmontools$ cat scsicmds.cpp |grep set_rcap
device->set_rcap16_first();
menion@Menionubuntu:~/smartmontools$

I have some doubts about your code:

if (avoid_rcap16) {
    res = scsiReadCapacity10(device, &last_lba, &lb_size);
    if (res) {
        if (scsi_debugmode)
            pout("%s: READ CAPACITY(10) failed, res=%d\n", __func__, res);
        try_16 = true;
    } else { /* rcap10 succeeded */
        if (0xffffffff == last_lba) {
            /* so number of blocks needs > 32 bits to represent */
            try_16 = true;
            device->set_rcap16_first();
        } else {
            ret_val = last_lba + 1;
            if (srrp) {
                memset(srrp, 0, sizeof(*srrp));
                srrp->num_lblocks = ret_val;
                srrp->lb_size = lb_size;
            }
        }
    }
}

From the SCSI kernel code I see that read_capacity_10 also returns the
capacity in a 64-bit variable:

sector_size = read_capacity_10(sdkp, sdp, buffer);
if (sector_size == -EOVERFLOW)
  goto got_data;
if (sector_size < 0)
  return;
if ((sizeof(sdkp->capacity) > 4) &&
    (sdkp->capacity > 0xffffffffULL)) {

So is this if statement correct?

} else { /* rcap10 succeeded */
if (0xffffffff == last_lba) {

I mean, last_lba is read from the response to READ CAPACITY(10) in
scsiReadCapacity10 via this conversion:

if (last_lbap)
    *last_lbap = get_unaligned_be32(resp + 0);

So are you sure that if the device returns more than about 4TB the
value of this variable will be 0xffffffffUL?
Yes, I could run some tests myself, but today I have very little time to
experiment, so I am reporting it to you; I am sure that you will quickly
sort out whether this is the problem or not.
Bye
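
For reference, a small sketch of how the READ CAPACITY(10) response is decoded and what the comparison against 0xffffffff relies on. As far as I understand the SBC specification, a compliant device whose last LBA does not fit in 32 bits returns 0xffffffff in that field; a misbehaving bridge (the "capacity modulo 2TB" case mentioned elsewhere in the thread) would not. The helper names below are hypothetical and only mimic get_unaligned_be32().

```c
#include <stdint.h>

/* Big-endian 32-bit load, same idea as get_unaligned_be32(). */
static uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/*
 * Returns 1 when the first four bytes of a READ CAPACITY(10) response
 * hold the saturated value, meaning READ CAPACITY(16) is needed.
 */
static int rcap10_needs_rcap16(const uint8_t resp[8])
{
    return load_be32(resp) == 0xffffffffu;
}

/* Largest capacity in bytes that READ CAPACITY(10) can describe exactly. */
static uint64_t rcap10_max_bytes(uint32_t block_size)
{
    /* Last usable LBA is 0xfffffffe, so at most 0xffffffff blocks. */
    return (uint64_t)0xffffffffu * block_size;
}
```

With 512-byte logical blocks the ceiling works out to just under 2 TiB, so an 8TB drive is well past it.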


2018-03-07 18:14 GMT+01:00 Douglas Gilbert :
> On 2018-03-07 

Re: dmesg flooded with "Very big device. Trying to use READ CAPACITY(16)"

2018-03-07 Thread Douglas Gilbert

On 2018-03-07 09:02 AM, Menion wrote:

2018-03-07 14:51 GMT+01:00 Steffen Maier :


On 03/07/2018 09:24 AM, Menion wrote:



...

but from then on, you only get it roughly once every 300 seconds, i.e. 5
minutes

that's where I suspect user space as trigger, unless there is a kernel
feature I'm not aware of doing such sdev rescans

preventing this would be a workaround



Is it possible that it is smartd? It is the only daemon that could do
some low level access to the device (bypassing the filesystem)


If you wait about 5 hours from the time of this post then go to the
smartmontools mirror at:
  https://github.com/mirror/smartmontools

To check it is the revision (svn rev >= 4718) you need for this fix, look
at the top of the ChangeLog file and look for today's date (20180307).
Assuming it is there, clone it then try to build smartmontools
( './autogen.sh ; ./configure ; make install') and try the new smartd ***.
You should get one warning per 8 TB device (for each run of smartd) and no
more.

Currently smartmontools only has a quirks database (and it is large)
for ATA devices, not real or pseudo SCSI device, nor NVMe devices (yet).
Hopefully this fix will be sufficient.

If it does not work, please send me the details.

Doug Gilbert


*** without a --prefix=/usr/sbin or similar option to ./configure I think
that smartd will be placed in /usr/local/sbin which may be different
from where your distro places it. Your PATH will determine which one
is used.





 /*
  * Many devices do not respond properly to READ_CAPACITY_16.
  * Tell the SCSI layer to try READ_CAPACITY_10 first.
  * However some USB 3.0 drive enclosures return capacity
  * modulo 2TB. Those must use READ_CAPACITY_16
  */
 if (!(us->fflags & US_FL_NEEDS_CAP16))
         sdev->try_rc_10_first = 1;



if that's the cause, maybe an entry in drivers/usb/storage/unusual_devs.h
would help, but that's really just guessing as I'm not familiar with USB



It seems that the bridge does have an entry in unusual_devs.h:

/* Reported by Michael Büsch  */
UNUSUAL_DEV( 0x152d, 0x0567, 0x0114, 0x0116,
"JMicron",
"USB to ATA/ATAPI Bridge",
USB_SC_DEVICE, USB_PR_DEVICE, NULL,
US_FL_BROKEN_FUA ),

VID:PID is 0x152d 0x0567, not sure what are the other two numbers, so
I went back and used another enclosure with same USB to SATA bridge.
The strange thing is that this other enclosure goes in UAS mode while
the one for which I am reporting the issue goes in usb-storage mode
because it gets somehow the quirks 0x5000
Unfortunately I cannot move these 5 HDDs in the other enclosure. So do
you think that it shall be reported to linux-usb maybe?





Re: dmesg flooded with "Very big device. Trying to use READ CAPACITY(16)"

2018-03-07 Thread Menion
2018-03-07 14:51 GMT+01:00 Steffen Maier :
>
> On 03/07/2018 09:24 AM, Menion wrote:
>>
> ...
>
> but from then on, you only get it roughly once every 300 seconds, i.e. 5
> minutes
>
> that's where I suspect user space as trigger, unless there is a kernel
> feature I'm not aware of doing such sdev rescans
>
> preventing this would be a workaround
>

Is it possible that it is smartd? It is the only daemon that could do
some low level access to the device (bypassing the filesystem)

>
>> /*
>>  * Many devices do not respond properly to READ_CAPACITY_16.
>>  * Tell the SCSI layer to try READ_CAPACITY_10 first.
>>  * However some USB 3.0 drive enclosures return capacity
>>  * modulo 2TB. Those must use READ_CAPACITY_16
>>  */
>> if (!(us->fflags & US_FL_NEEDS_CAP16))
>>         sdev->try_rc_10_first = 1;
>
>
> if that's the cause, maybe an entry in drivers/usb/storage/unusual_devs.h
> would help, but that's really just guessing as I'm not familiar with USB
>

It seems that the bridge does have an entry in unusual_devs.h:

/* Reported by Michael Büsch  */
UNUSUAL_DEV( 0x152d, 0x0567, 0x0114, 0x0116,
"JMicron",
"USB to ATA/ATAPI Bridge",
USB_SC_DEVICE, USB_PR_DEVICE, NULL,
US_FL_BROKEN_FUA ),

VID:PID is 0x152d:0x0567; I am not sure what the other two numbers are, so
I went back and tried another enclosure with the same USB-to-SATA bridge.
The strange thing is that this other enclosure comes up in UAS mode, while
the one for which I am reporting the issue comes up in usb-storage mode
because it somehow gets the quirks 0x5000.
Unfortunately I cannot move these 5 HDDs to the other enclosure. So do
you think that it should be reported to linux-usb, maybe?
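
On the "other two numbers": as far as I know, the third and fourth arguments of UNUSUAL_DEV() are the lowest and highest bcdDevice (device revision) values the entry applies to, so the entry above would cover revisions 0x0114 through 0x0116 only. A minimal sketch of that matching rule, with a hypothetical struct and function name that are not the kernel's own:

```c
#include <stdint.h>

/* Hypothetical mirror of the first four UNUSUAL_DEV() arguments. */
struct unusual_entry {
    uint16_t id_vendor;
    uint16_t id_product;
    uint16_t bcd_min;   /* lowest bcdDevice the entry applies to */
    uint16_t bcd_max;   /* highest bcdDevice the entry applies to */
};

/* Returns 1 if the device falls inside the entry's ID and revision range. */
static int entry_matches(const struct unusual_entry *e,
                         uint16_t vid, uint16_t pid, uint16_t bcd_device)
{
    return e->id_vendor == vid && e->id_product == pid &&
           bcd_device >= e->bcd_min && bcd_device <= e->bcd_max;
}
```

Comparing the bcdDevice values that lsusb -v reports for the two enclosures would show whether only one of them falls inside that range.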


Re: dmesg flooded with "Very big device. Trying to use READ CAPACITY(16)"

2018-03-07 Thread Steffen Maier


On 03/07/2018 09:24 AM, Menion wrote:
> By flooded I mean that it continuously fills the dmesg log with no
> interruption, check attached a log that I have just taken from my
> server
> Some more details on my setup. I have these 5 HDD, WD RED 8TB in an
> Orico 5 bay enclosure, running JMS567 USBtoSATA bridge and an internal
> SATA multiplexer
> This is connected to the USB 3.0 host port of my server, it is an Intel Atom
>
>> 2018-03-07 3:45 GMT+01:00 Martin K. Petersen :
>>
>> Also, what kind of controller are these disks attached to? The reason
>> you see these messages is that to the kernel it looks like a legacy disk
>> device that predates capacities in the TB range. The warnings are logged
>> because we're surprised to be going down this path based on what the
>> device has previously told us.


Of course Martin's statement regarding the occurrence holds true.

It does not look like continuous flooding, but rather like a 
repetition at a not particularly high frequency. Do you have some user 
space process periodically performing SCSI target or SCSI device rescans?

Each repetition is per drive, i.e. a chunk of 5 messages in your case.


[4.929517] sd 0:0:0:0: [sda] Very big device. Trying to use READ 
CAPACITY(16).


first occurrence after initial probing


[4.933893] sd 0:0:0:0: [sda] Very big device. Trying to use READ 
CAPACITY(16).
[4.946474] sd 0:0:0:0: [sda] Very big device. Trying to use READ 
CAPACITY(16).


looks like we go through the code path more than once during initial probing


[   99.057592] sd 0:0:0:0: [sda] Very big device. Trying to use READ 
CAPACITY(16).



[  409.335119] sd 0:0:0:0: [sda] Very big device. Trying to use READ 
CAPACITY(16).



[  719.760106] sd 0:0:0:0: [sda] Very big device. Trying to use READ 
CAPACITY(16).



[ 1018.089562] sd 0:0:0:0: [sda] Very big device. Trying to use READ 
CAPACITY(16).



[ 1328.086120] sd 0:0:0:0: [sda] Very big device. Trying to use READ 
CAPACITY(16).


...

but from then on, you only get it roughly once every 300 seconds, i.e. 5 
minutes
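
The "roughly 300 seconds" figure can be checked from the timestamps quoted above; a tiny sketch (the function name is made up for illustration):

```c
#include <stddef.h>

/*
 * Mean gap in seconds between consecutive timestamps.
 * Returns 0 when fewer than two samples are given.
 */
static double mean_interval(const double *ts, size_t n)
{
    if (n < 2)
        return 0.0;
    return (ts[n - 1] - ts[0]) / (double)(n - 1);
}
```

Fed with the five sda timestamps from 99.057592 to 1328.086120, this gives a mean gap of a little over 300 seconds, which fits a periodic user-space poll better than filesystem activity.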


that's where I suspect user space as trigger, unless there is a kernel 
feature I'm not aware of doing such sdev rescans


preventing this would be a workaround

assuming the Linux check is correct, the proper fix might be that the 
device should present itself according to standards such that Linux 
silently uses READ CAPACITY(16) in the first place



static int sd_try_rc16_first(struct scsi_device *sdp)
{
if (sdp->host->max_cmd_len < 16)
return 0;


option


if (sdp->try_rc_10_first)
return 0;


option


if (sdp->scsi_level > SCSI_SPC_2)
return 1;
if (scsi_device_protection(sdp))
return 1;
return 0;


option


}


just picking one arbitrary option and not being entirely sure that's the 
code path but you mentioned USB to SATA bridge, it might be related to:



*** drivers/usb/storage/scsiglue.c:
slave_configure[239]   sdev->try_rc_10_first = 1;



/*
 * Many devices do not respond properly to READ_CAPACITY_16.
 * Tell the SCSI layer to try READ_CAPACITY_10 first.
 * However some USB 3.0 drive enclosures return capacity
 * modulo 2TB. Those must use READ_CAPACITY_16
 */
if (!(us->fflags & US_FL_NEEDS_CAP16))
sdev->try_rc_10_first = 1;


if that's the cause, maybe an entry in 
drivers/usb/storage/unusual_devs.h would help, but that's really just 
guessing as I'm not familiar with USB


--
Mit freundlichen Grüßen / Kind regards
Steffen Maier

Linux on z Systems Development




Re: dmesg flooded with "Very big device. Trying to use READ CAPACITY(16)"

2018-03-07 Thread Menion
Hello Martin
Thanks for your answer.
By flooded I mean that it continuously fills the dmesg log without
interruption; see the attached log, which I have just taken from my
server.
Some more details on my setup: I have 5 HDDs, WD Red 8TB, in an
Orico 5-bay enclosure with a JMS567 USB-to-SATA bridge and an internal
SATA multiplexer.
This is connected to the USB 3.0 host port of my server, which is an Intel Atom.
In total the array is 5x8TB, configured in BTRFS RAID5 mode.
I have already reported this issue to the linux-btrfs mailing list; the
feedback was that the filesystem itself has nothing to do with this
capacity check and that I should report it here.
Bye


2018-03-07 3:45 GMT+01:00 Martin K. Petersen :
>
> Menion,
>
>> Operating big capacity HDD such 8TB with complex filesystems like
>> BTRFS in RAID mode endup in dmesg get flooded by this log, due too
>> many capacity checks (opaque to the filesystem itself)
>
> What's your definition of flooded? How many do you see?
>
> Also, what kind of controller are these disks attached to? The reason
> you see these messages is that to the kernel it looks like a legacy disk
> device that predates capacities in the TB range. The warnings are logged
> because we're surprised to be going down this path based on what the
> device has previously told us.
>
> --
> Martin K. Petersen  Oracle Linux Engineering

menion@Menionubuntu:~$ dmesg
[0.00] Linux version 4.15.5-041505-generic (kernel@gloin) (gcc version 7.2.0 (Ubuntu 7.2.0-8ubuntu3.2)) #201802221031 SMP Thu Feb 22 15:32:28 UTC 2018
[0.00] Command line: BOOT_IMAGE=/@/boot/vmlinuz-4.15.5-041505-generic root=UUID=6db4baf7-fda8-41ac-a6ad-1ca7b083430f ro rootflags=subvol=@
[0.00] KERNEL supported cpus:
[0.00]   Intel GenuineIntel
[0.00]   AMD AuthenticAMD
[0.00]   Centaur CentaurHauls
[0.00] x86/fpu: x87 FPU will use FXSAVE
[0.00] e820: BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0008efff] usable
[0.00] BIOS-e820: [mem 0x0008f000-0x0008] ACPI NVS
[0.00] BIOS-e820: [mem 0x0009-0x0009dfff] usable
[0.00] BIOS-e820: [mem 0x0009e000-0x0009] reserved
[0.00] BIOS-e820: [mem 0x0010-0x1fff] usable
[0.00] BIOS-e820: [mem 0x2000-0x201f] reserved
[0.00] BIOS-e820: [mem 0x2020-0x7b631fff] usable
[0.00] BIOS-e820: [mem 0x7b632000-0x7b661fff] reserved
[0.00] BIOS-e820: [mem 0x7b662000-0x7b685fff] usable
[0.00] BIOS-e820: [mem 0x7b686000-0x7b76bfff] ACPI NVS
[0.00] BIOS-e820: [mem 0x7b76c000-0x7ba20fff] reserved
[0.00] BIOS-e820: [mem 0x7ba21000-0x7ba71fff] type 20
[0.00] BIOS-e820: [mem 0x7ba72000-0x7ba76fff] usable
[0.00] BIOS-e820: [mem 0x7ba77000-0x7ba77fff] reserved
[0.00] BIOS-e820: [mem 0x7ba78000-0x7ba7afff] usable
[0.00] BIOS-e820: [mem 0x7ba7b000-0x7ba7bfff] reserved
[0.00] BIOS-e820: [mem 0x7ba7c000-0x7bff] usable
[0.00] BIOS-e820: [mem 0xe000-0xe3ff] reserved
[0.00] BIOS-e820: [mem 0xfea0-0xfeaf] reserved
[0.00] BIOS-e820: [mem 0xfec0-0xfec00fff] reserved
[0.00] BIOS-e820: [mem 0xfed01000-0xfed01fff] reserved
[0.00] BIOS-e820: [mem 0xfed03000-0xfed03fff] reserved
[0.00] BIOS-e820: [mem 0xfed06000-0xfed06fff] reserved
[0.00] BIOS-e820: [mem 0xfed08000-0xfed09fff] reserved
[0.00] BIOS-e820: [mem 0xfed1c000-0xfed1cfff] reserved
[0.00] BIOS-e820: [mem 0xfed8-0xfedb] reserved
[0.00] BIOS-e820: [mem 0xfee0-0xfee00fff] reserved
[0.00] BIOS-e820: [mem 0xffc0-0x] reserved
[0.00] NX (Execute Disable) protection: active
[0.00] efi: EFI v2.40 by American Megatrends
[0.00] efi:  ESRT=0x7b66  ACPI=0x7b6ca000  ACPI 2.0=0x7b6ca000  SMBIOS=0x7ba1f198
[0.00] random: fast init done
[0.00] SMBIOS 2.8 present.
[0.00] DMI: AZW Z83 II/Cherry Trail CR, BIOS YB1007 08/17/2017
[0.00] e820: update [mem 0x-0x0fff] usable ==> reserved
[0.00] e820: remove [mem 0x000a-0x000f] usable
[0.00] e820: last_pfn = 0x7c000 max_arch_pfn = 

Re: dmesg flooded with "Very big device. Trying to use READ CAPACITY(16)"

2018-03-06 Thread Martin K. Petersen

Menion,

> Operating big capacity HDD such 8TB with complex filesystems like
> BTRFS in RAID mode endup in dmesg get flooded by this log, due too
> many capacity checks (opaque to the filesystem itself)

What's your definition of flooded? How many do you see?

Also, what kind of controller are these disks attached to? The reason
you see these messages is that to the kernel it looks like a legacy disk
device that predates capacities in the TB range. The warnings are logged
because we're surprised to be going down this path based on what the
device has previously told us.

-- 
Martin K. Petersen  Oracle Linux Engineering


dmesg flooded with "Very big device. Trying to use READ CAPACITY(16)"

2018-03-06 Thread Menion
Hi all
Operating big capacity HDDs, such as 8TB ones, with complex filesystems
like BTRFS in RAID mode ends up flooding dmesg with this log message,
due to too many capacity checks (opaque to the filesystem itself).
The logs come from here:

https://elixir.bootlin.com/linux/latest/source/drivers/scsi/sd.c#L2508

The general guideline says that KERN_NOTICE (which is the default log
level for dmesg in most distributions) should report information of
interest to any user.
I think that this information is not really of user interest, but
rather of DEBUG interest.
So my suggestion is to lower this message to KERN_DEBUG.
Do you agree?
Bye
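
For reference, a rough sketch of what moving a message between log levels changes. The numeric severities below follow the kernel's KERN_* ordering (0 is most severe, 7 is debug), but the MODEL_* names and console_prints() are hypothetical helpers for illustration. The sketch models only the console filter, where a message is shown if its level is numerically below console_loglevel; the thresholds used in the note below are example values, not a statement about any particular distribution.

```c
#include <assert.h>
#include <stdbool.h>

/* Severities in KERN_* order; lower numbers are more severe. */
enum model_loglevel {
	MODEL_KERN_EMERG = 0,
	MODEL_KERN_ALERT,
	MODEL_KERN_CRIT,
	MODEL_KERN_ERR,
	MODEL_KERN_WARNING,
	MODEL_KERN_NOTICE,
	MODEL_KERN_INFO,
	MODEL_KERN_DEBUG
};

/* The console shows a message only if its level is below the threshold. */
static bool console_prints(int msg_level, int console_loglevel)
{
	assert(msg_level >= MODEL_KERN_EMERG && msg_level <= MODEL_KERN_DEBUG);
	return msg_level < console_loglevel;
}
```

With an example threshold of 7, a KERN_NOTICE message passes the filter and a KERN_DEBUG message does not; with a threshold of 4, neither does. As far as I know, the kernel ring buffer that dmesg reads still stores messages of every level, so lowering the level would mainly affect console output and level-filtered views rather than the buffer itself.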