Re: [EXTERNAL] Re: [gentoo-user] multipath.conf : learning how to use

2019-08-19 Thread Stefan G. Weichinger
On 8/16/19 7:42 PM, Laurence Perkins wrote:

> Note that, in my experience at least, binary distros also tend to
> break hard if you leave them without updates for a year or more and
> then try to bring them current.  And they're usually harder to fix
> when they do. So make sure that whatever you put in is kept up to
> date.
> 
> The stable branch of a good binary distro has a big advantage
> there, especially with a snapshottable filesystem, because the
> updates can be handed off to a trained monkey, leaving the experts
> free for the important stuff.  But it's not a magic bullet.  If you
> don't write good documentation and keep the system up to date
> you'll end up in the exact same position a few years down the road
> no matter what distro you use.

I agree ... there'll be unattended updates running automatically, and I am
monitoring and using that server on a regular basis anyway.
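
(On Debian that would typically mean the unattended-upgrades package; a
minimal sketch, assuming that package rather than anything stated here:)

***
# apt-get install unattended-upgrades
# then enable periodic runs in /etc/apt/apt.conf.d/20auto-upgrades:
APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Unattended-Upgrade "1";
***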

The goal now is to get to a well-known state with lower maintenance
complexity.

I am currently booking my tickets there ;-)




Re: [EXTERNAL] Re: [gentoo-user] multipath.conf : learning how to use

2019-08-16 Thread Laurence Perkins


On Wed, 2019-08-14 at 20:20 +0200, J. Roeleveld wrote:
> [quoted text trimmed; the full message of 2019-08-14 follows below]

Re: [gentoo-user] multipath.conf : learning how to use

2019-08-14 Thread J. Roeleveld
On woensdag 14 augustus 2019 14:17:23 CEST Stefan G. Weichinger wrote:
> Am 14.08.19 um 13:20 schrieb J. Roeleveld:

> > See next item, make sure you do NOT mount both at the same time.
> 
> I understand and agree ;-)

good :)

> >> # /usr/bin/sg_vpd --page=di /dev/sdb
> >> 
> >> Device Identification VPD page:
> >>   Addressed logical unit:
> >> designator type: NAA,  code set: Binary
> >> 
> >>   0x600605b00d0ce810217ccffe19f851e8
> > 
> > Yes, this one is different.
> > 
> > I checked the above ID and it looks like it is already correctly
> > configured. Is "multipathd" actually running?
> 
> no!

Then "multipath -l" will not show anything either. When you have a chance for 
downtime (and that disk can be umounted) you could try the following:
1) stop all services requiring that "disk" to be mounted
2) umount that "disk"
3) start the "multipath" service
4) run "multipath -ll" to see if there is any output

If yes, you can access the "disk" via the newly added entry under
"/dev/mapper/".
If you modify "/etc/fstab" for this at that point, ensure multipath is started
BEFORE the OS tries to mount it during boot.

The other option (and the only option if "multipath -ll" still doesn't show
anything) is to stop the "multipath" service and leave it all as-is.
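
A minimal sketch of steps 1-4 as shell commands (the samba service and the
mount point are my assumptions; adjust to the actual system):

***
# 1) stop all services requiring the "disk" (samba is just an example)
/etc/init.d/samba stop

# 2) unmount the "disk" (mount point is hypothetical)
umount /mnt/storage

# 3) start the "multipath" service
/etc/init.d/multipath start

# 4) see if any paths show up
multipath -ll
***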

> > If it were running correctly, you would mount "/dev/mapper/" instead
> > of "/dev/sdc" or "/dev/sdd".
> > 
> >> In the first week of september I travel there and I have the job to
> >> reinstall that server using Debian Linux (yes, gentoo-users, I am
> >> getting OT here ;-)).
> > 
> > For something that doesn't get updated/managed often, Gentoo might not be
> > the best choice, I agree.
> > I would prefer Centos for this one though, as there is far more info on
> > multipath from Redhat.
> 
> I will consider this ...

The choice is yours. I just haven't found much info about multipath for other 
distributions. (And I could still use a decent document/guide describing all 
the different options)

> As I understand things here:
> 
> the former admin *tried to* set up multipath and somehow got stuck.

My guess: multipath wasn't enabled before the boot process would try to mount
it. The following needs to be done (and finished) in sequence for it to work:

1) The OS needs to detect the disks (/dev/sdc + /dev/sdd). This requires
modules to be loaded and the Fibre Channel disks to be detected.

2) multipathd needs to be running and to have correctly identified the
Fibre Channel disk and the paths.

3) The OS needs to mount the Fibre Channel disk using the "/dev/mapper/..."
entry created by multipath.
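
As a sketch of step 3: the fstab entry would point at the mapper device
instead of /dev/sdc or /dev/sdd (the alias comes from the multipath.conf
quoted further down; filesystem type and mount point are my assumptions):

***
# /etc/fstab -- mount the multipathed LUN via its mapper name
/dev/mapper/MSA2040_SAMBA_storage   /mnt/storage   xfs   defaults   0 0
***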

I run ZFS on top of the multipath entries, which makes it all a bit "simpler",
as the HBA module is built-in and the "zfs" services depend on "multipath".
All the mounting is done by the zfs services.

> That's why it isn't running and not used at all. He somehow mentioned
> this in an email back then when he was still working there.
> 
> So currently it seems to me that the storage is attached via "single
> path" (is that the term here?) only. "directly" = no redundancy

Exactly, and using non-guaranteed drive letters. (I know for a fact that they
can change, as I've had disks move to different letters during subsequent
boots. I do have 12 disks getting 4 entries each, which means 48 entries ;)
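
(For what it's worth, udev's stable symlinks are one way around the shuffling
letters; a generic illustration, not something from this thread:)

***
# names under /dev/disk/by-id stay stable across reboots even when
# the sdX letters move around
ls -l /dev/disk/by-id/
# e.g. wwn-0x600c0ff0001e91b2c1bae2560100 -> ../../sdc
***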

> That means using the lpfc kernel module to run the Fibre Channel
> adapters ... which failed to come up / sync with a more recent gentoo
> kernel, as initially mentioned.

Are these modules not included in the main kernel?
And maybe they require firmware which sometimes has to match specific
module/kernel versions.

> (right now: 4.1.15-gentoo-r1 ... )

Old, but if it works, don't fix it. (Just don't expose it to the internet)

> I'm considering sending a Debian OS on an SSD there and letting the
> (low-expertise) guy there boot from it (or a stick). Which in fact is
> risky as he doesn't know anything about Linux.

I wouldn't take that risk on a production server.

> Or I simply wait for my on-site-appointment and start testing when I am
> there.

Safest option.

> Maybe I am lucky and the debian lpfc stuff works from the start. And
> then I could test multipath as well.

You could test quickly with the Gentoo install still present, as described
above. The config should be the same regardless.

> I assume that maybe the adapters need a firmware update or so.

When I added a 2nd HBA to my server, I ended up patching the firmware on both 
to ensure they were identical.

> The current gentoo installation was done with the "hardened" profile, not
> touched for years, no docs ... so it somehow seems way too much hassle
> to get it up to date again.

I migrated a few "hardened" profile installations to non-hardened, but it
required preparing binary packages on a VM and reinstalling the whole lot with
a lot of effort. (Empty /var/lib/portage/world, run emerge --depclean, do
@system with --empty and then re-populate /var/lib/portage/world and let that
be

Re: [gentoo-user] multipath.conf : learning how to use

2019-08-14 Thread Stefan G. Weichinger
Am 14.08.19 um 13:20 schrieb J. Roeleveld:

> If there is no documentation, it is a mess by definition.

yes :-)

>>>> I see two devices sdc and sdd that should come from the SAN.
>>>
>>> Interesting, are these supposed to be the same?
>>
>> No, I don't think so. But maybe you're right. No sdd in fstab or in the
>> mounts at all and ...
> 
> See next item, make sure you do NOT mount both at the same time.

I understand and agree ;-)

>> # /usr/bin/sg_vpd --page=di /dev/sdb
>> Device Identification VPD page:
>>   Addressed logical unit:
>> designator type: NAA,  code set: Binary
>>   0x600605b00d0ce810217ccffe19f851e8
> 
> Yes, this one is different.
> 
> I checked the above ID and it looks like it is already correctly configured.
> Is "multipathd" actually running?

no!

> If it were running correctly, you would mount "/dev/mapper/" instead of
> "/dev/sdc" or "/dev/sdd".
> 
>> In the first week of september I travel there and I have the job to
>> reinstall that server using Debian Linux (yes, gentoo-users, I am
>> getting OT here ;-)).
> 
> For something that doesn't get updated/managed often, Gentoo might not be the 
> best choice, I agree.
> I would prefer Centos for this one though, as there is far more info on 
> multipath from Redhat.

I will consider this ...

As I understand things here:

the former admin *tried to* set up multipath and somehow got stuck.
That's why it isn't running and not used at all. He somehow mentioned
this in an email back then when he was still working there.

So currently it seems to me that the storage is attached via "single
path" (is that the term here?) only. "directly" = no redundancy.

That means using the lpfc kernel module to run the Fibre Channel adapters
... which failed to come up / sync with a more recent gentoo kernel, as
initially mentioned.

(right now: 4.1.15-gentoo-r1 ... )

I'm considering sending a Debian OS on an SSD there and letting the
(low-expertise) guy there boot from it (or a stick). Which in fact is risky
as he doesn't know anything about Linux.

Or I simply wait for my on-site-appointment and start testing when I am
there.

Maybe I am lucky and the Debian lpfc stuff works from the start. And
then I could test multipath as well.

I assume that maybe the adapters need a firmware update or so.

-

The current gentoo installation was done with the "hardened" profile, not
touched for years, no docs ... so it somehow seems way too much hassle
to get it up to date again. Additionally there are no experts on site, so it
should be low maintenance anyway.



Re: [gentoo-user] multipath.conf : learning how to use

2019-08-14 Thread J. Roeleveld
On woensdag 14 augustus 2019 10:14:31 CEST Stefan G. Weichinger wrote:
> Am 14.08.19 um 08:36 schrieb J. Roeleveld:
> > Stefan,
> > 
> > On maandag 29 juli 2019 21:28:50 CEST Stefan G. Weichinger wrote:
> >> At a customer I have to check through an older gentoo server.
> >> 
> >> The former admin is not available anymore and among other things I have
> >> to check how the SAN storage is attached.
> > 
> > If you ever encounter that admin, make sure you hide the body :)
> 
> I only learn day by day what a mess all that is ...

If there is no documentation, it is a mess by definition.

> >> I see two devices sdc and sdd that should come from the SAN.
> > 
> > Interesting, are these supposed to be the same?
> 
> No, I don't think so. But maybe you're right. No sdd in fstab or in the
> mounts at all and ...

See next item, make sure you do NOT mount both at the same time.

> > what do you get back from:
> > 
> > # /usr/bin/sg_vpd --page=di /dev/sdc
> > # /usr/bin/sg_vpd --page=di /dev/sdd
> > (As suggested in the multipath.conf file you listed above)
> 
> these 2 look similar to me:
> 
> samba ~ # /usr/bin/sg_vpd --page=di /dev/sdc
> Device Identification VPD page:
>   Addressed logical unit:
> designator type: NAA,  code set: Binary
>   0x600c0ff0001e91b2c1bae2560100

> samba ~ # /usr/bin/sg_vpd --page=di /dev/sdd
> Device Identification VPD page:
>   Addressed logical unit:
> designator type: NAA,  code set: Binary
>   0x600c0ff0001e91b2c1bae2560100

> 
> > If "sdc" and "sdd" are the same disk, the "Adressed logical unit" id
> > should be the same for both.
> 
> So we have 0x600c0ff0001e91b2c1bae2560100 twice, right?

Yes, IOW, the same device/LUN/whatever.
Do NOT mount both at the same time, EVER.
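
A quick way to double-check that, as a sketch (device names from this thread):

***
# if both lines show the same 0x6... designator, sdc and sdd are one LUN
for dev in /dev/sdc /dev/sdd; do
  echo -n "$dev: "
  /usr/bin/sg_vpd --page=di $dev | grep -m1 '0x6'
done
***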

> There's a /dev/sdb as well, commented in fstab with "9750-8i Raid6
> (4x3TB + 2x4TB)" but that seems another "branch" of devices:
> 
> # /usr/bin/sg_vpd --page=di /dev/sdb
> Device Identification VPD page:
>   Addressed logical unit:
> designator type: NAA,  code set: Binary
>   0x600605b00d0ce810217ccffe19f851e8

Yes, this one is different.

I checked the above ID and it looks like it is already correctly configured.
Is "multipathd" actually running?

If it were running correctly, you would mount "/dev/mapper/" instead of
"/dev/sdc" or "/dev/sdd".

> In the first week of september I travel there and I have the job to
> reinstall that server using Debian Linux (yes, gentoo-users, I am
> getting OT here ;-)).

For something that doesn't get updated/managed often, Gentoo might not be the 
best choice, I agree.
I would prefer Centos for this one though, as there is far more info on 
multipath from Redhat.





Re: [gentoo-user] multipath.conf : learning how to use

2019-08-14 Thread Stefan G. Weichinger
Am 14.08.19 um 08:36 schrieb J. Roeleveld:
> Stefan,
> 
> 
> On maandag 29 juli 2019 21:28:50 CEST Stefan G. Weichinger wrote:
>> At a customer I have to check through an older gentoo server.
>>
>> The former admin is not available anymore and among other things I have
>> to check how the SAN storage is attached.
> 
> If you ever encounter that admin, make sure you hide the body :)

I only learn day by day what a mess all that is ...

>> I see two devices sdc and sdd that should come from the SAN.
> 
> Interesting, are these supposed to be the same? 

No, I don't think so. But maybe you're right. No sdd in fstab or in the
mounts at all and ...

> what do you get back from:
> 
> # /usr/bin/sg_vpd --page=di /dev/sdc
> # /usr/bin/sg_vpd --page=di /dev/sdd
> (As suggested in the multipath.conf file you listed above)

these 2 look similar to me:

samba ~ # /usr/bin/sg_vpd --page=di /dev/sdc
Device Identification VPD page:
  Addressed logical unit:
designator type: NAA,  code set: Binary
  0x600c0ff0001e91b2c1bae2560100
designator type: vendor specific [0x0],  code set: Binary
  vendor specific:
 00     11 32 36 37 32 31 36 00  00 c0 ff 1e 91 b2 00 00    .267216.........
 10     c0 a8 64 12 00 c0 ff 1e  8d 38 00 00 c0 a8 64 11    ..d......8....d.
  Target port:
designator type: Relative target port,  code set: Binary
  Relative target port: 0x3
designator type: Target port group,  code set: Binary
  Target port group: 0x0
designator type: NAA,  code set: Binary
 transport: Fibre Channel Protocol for SCSI (FCP-4)
  0x227000c0ff267216
  Target device that contains addressed lu:
designator type: NAA,  code set: Binary
 transport: Fibre Channel Protocol for SCSI (FCP-4)
  0x208000c0ff267216
samba ~ # /usr/bin/sg_vpd --page=di /dev/sdd
Device Identification VPD page:
  Addressed logical unit:
designator type: NAA,  code set: Binary
  0x600c0ff0001e91b2c1bae2560100
designator type: vendor specific [0x0],  code set: Binary
  vendor specific:
 00     11 32 36 37 32 31 36 00  00 c0 ff 1e 8d 38 00 00    .267216......8..
 10     c0 a8 64 11 00 c0 ff 1e  91 b2 00 00 c0 a8 64 12    ..d...........d.
  Target port:
designator type: Relative target port,  code set: Binary
  Relative target port: 0x7
designator type: Target port group,  code set: Binary
  Target port group: 0x1
designator type: NAA,  code set: Binary
 transport: Fibre Channel Protocol for SCSI (FCP-4)
  0x267000c0ff267216
  Target device that contains addressed lu:
designator type: NAA,  code set: Binary
 transport: Fibre Channel Protocol for SCSI (FCP-4)
  0x208000c0ff267216

> If "sdc" and "sdd" are the same disk, the "Adressed logical unit" id should 
> be 
> the same for both.

So we have 0x600c0ff0001e91b2c1bae2560100 twice, right?

There's a /dev/sdb as well, commented in fstab with "9750-8i Raid6
(4x3TB + 2x4TB)" but that seems another "branch" of devices:

# /usr/bin/sg_vpd --page=di /dev/sdb
Device Identification VPD page:
  Addressed logical unit:
designator type: NAA,  code set: Binary
  0x600605b00d0ce810217ccffe19f851e8

-

In the first week of september I travel there and I have the job to
reinstall that server using Debian Linux (yes, gentoo-users, I am
getting OT here ;-)).




Re: [gentoo-user] multipath.conf : learning how to use

2019-08-14 Thread J. Roeleveld
Stefan,


On maandag 29 juli 2019 21:28:50 CEST Stefan G. Weichinger wrote:
> At a customer I have to check through an older gentoo server.
> 
> The former admin is not available anymore and among other things I have
> to check how the SAN storage is attached.

If you ever encounter that admin, make sure you hide the body :)

> As I have to plan a new installation with minimal downtime I'd like to
> understand that multipath-stuff before any changes ;-)

I use multipath, but only using internal HBAs connected to internal 
backplanes.

> The system runs:
> 
> sys-fs/multipath-tools-0.5.0-r1

I use "sys-fs/multipath-tools-0.7.9"

> and has a multipath.conf:

Same here

> (rm-ed comments)
> 
> defaults {
> #  udev_dir                /dev
>   polling_interval        15
> #  selector                "round-robin 0"
>   path_grouping_policy    group_by_prio
>   failback                5
>   path_checker            tur
> #  prio_callout            "/sbin/mpath_prio_tpc /dev/%n"
>   rr_min_io               100
>   rr_weight               uniform
>   no_path_retry           queue
>   user_friendly_names     yes
> 
> }
> blacklist {
>   devnode cciss
>   devnode fd
>   devnode hd
>   devnode md
>   devnode sr
>   devnode scd
>   devnode st
>   devnode ram
>   devnode raw
>   devnode loop
>   devnode sda
>   devnode sdb
> }
> 
> multipaths {
>   multipath {
> wwid  3600c0ff0001e91b2c1bae2560100
> ## To find your wwid, please use /usr/bin/sg_vpd --page=di /dev/DEVICE.
> ## The address will be a 0x6. Remove the 0x and replace it with 3.
> alias MSA2040_SAMBA_storage
>   }
> }

This looks like a default one. Mine is far simpler:
***
defaults {
path_grouping_policy    multibus
path_selector           "queue-length 0"
rr_min_io_rq            100
}
***
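
(Sketch of the wwid rule mentioned in the conf's comment above: take the NAA
id from sg_vpd, drop the "0x", prefix a "3"; example value from this thread:)

***
echo 0x600c0ff0001e91b2c1bae2560100 | sed 's/^0x/3/'
# -> 3600c0ff0001e91b2c1bae2560100, matching the wwid in the conf
***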

Do you have any files in "/etc/multipath"? I have 2:
"bindings" (which only contains comments)
"wwids" (which, aside from comments, shows the IDs of the hard drives)

Both of these files mention they are automatically maintained.
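
For reference, the wwids file just lists one wwid per line wrapped in slashes;
a sketch of an entry (id taken from the "multipath -ll" output below):

***
# /etc/multipath/wwids (automatically maintained)
/35000cca25d8ec910/
***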

I don't "hide" devices from multipath and let it figure it out by itself

> "multipath -l" and "-ll" show nothing.

Then multipath is NOT working. I get the following (only showing the first 2
devices):

***
35000cca25d8ec910 dm-4 HGST,HUS726040ALS210
size=3.6T features='0' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=1 status=active
  |- 0:0:20:0 sdt  65:48  active ready running
  |- 0:0:7:0  sdh  8:112  active ready running
  |- 1:0:7:0  sdaf 65:240 active ready running
  `- 1:0:20:0 sdar 66:176 active ready running
35000cca25d8b5e78 dm-7 HGST,HUS726040ALS210
size=3.6T features='0' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=1 status=active
  |- 0:0:21:0 sdu  65:64  active ready running
  |- 0:0:8:0  sdi  8:128  active ready running
  |- 1:0:8:0  sdag 66:0   active ready running
  `- 1:0:21:0 sdas 66:192 active ready running
***

As per the above, every physical disk is seen 4 times by the system.
I have 2 HBAs connected to backplanes, and as these are SAS drives, every disk
is connected twice to the backplanes.
In other words, I have 4 different paths to get to every single disk.

> dmesg:
> 
> # dmesg | grep multi
> [1.144947] md: multipath personality registered for level -4
> [1.145679] device-mapper: multipath: version 1.9.0 loaded
> [1.145857] device-mapper: multipath round-robin: version 1.0.0 loaded
> [21827451.284100] device-mapper: table: 253:0: multipath: unknown path
> selector type
> [21827451.285432] device-mapper: table: 253:0: multipath: unknown path
> selector type
> [21827496.130239] device-mapper: table: 253:0: multipath: unknown path
> selector type
> [21827496.131379] device-mapper: table: 253:0: multipath: unknown path
> selector type
> [21827497.576482] device-mapper: table: 253:0: multipath: unknown path
> selector type
> 
> -
> 
> I see two devices sdc and sdd that should come from the SAN.

Interesting, are these supposed to be the same? 
what do you get back from:

# /usr/bin/sg_vpd --page=di /dev/sdc
# /usr/bin/sg_vpd --page=di /dev/sdd
(As suggested in the multipath.conf file you listed above)

On my system I get the following for "sdt" and "sdh" (first disk listed in
the above multipath output):
***
# /usr/bin/sg_vpd --page=di /dev/sdt
Device Identification VPD page:
  Addressed logical unit:
designator type: NAA,  code set: Binary
  0x5000cca25d8ec910
  Target port:
designator type: NAA,  code set: Binary
 transport: Serial Attached SCSI Protocol (SPL-4)
  0x5000cca25d8ec911
designator type: Relative target port,  code set: Binary
 transport: Serial Attached SCSI Protocol (SPL-4)
  Relative target port: 0x1
  Target device that contains addressed lu:
designator type: NAA,  code set: Binary
 transport: Serial Attached SCSI Protocol (SPL-4)
  0x5000cca25d8ec913
designator type: SCSI name string,  code set: UTF-8
  SCSI name string:
  naa.5000CCA25D8EC913
# /usr/bin/sg_vpd --page=di /dev/sdh
Device