Public bug reported:

I have a system containing two identical NVMe devices. When booting a
trusty PXE image with kernel 4.4.0-38-generic, both devices are detected
and available:

# nvme id-ctrl /dev/nvme0
NVME Identify Controller:
vid     : 0x8086
ssvid   : 0x8086
sn      : BTHH82250N1X1P0E    
mn      : INTEL SSDPEKKF010T8L
fr      : L08P    
...

# nvme id-ctrl /dev/nvme1
NVME Identify Controller:
vid     : 0x8086
ssvid   : 0x8086
sn      : BTHH82250N261P0E    
mn      : INTEL SSDPEKKF010T8L
fr      : L08P    
...


# dmesg | grep nvme
[    5.106516]  nvme0n1: p1 p2 p3 p4
[    5.106615]  nvme1n1: p1 p2
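One way to compare the two controllers side by side is to parse the
"field : value" lines that nvme id-ctrl prints. The sketch below assumes
an nvme-cli version that also prints a subnqn field (the output above is
elided with "...", so this is an assumption), and the sample() helper
builds hypothetical output from the fields shown above:

```python
def parse_id_ctrl(text: str) -> dict[str, str]:
    """Parse `nvme id-ctrl` style "field : value" lines into a dict."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as NQNs keep theirs.
        key, sep, value = line.partition(":")
        key, value = key.strip(), value.strip()
        # Skip the banner line, the "..." elision and blank lines.
        if not sep or not key or not value:
            continue
        fields[key] = value
    return fields


def differing_fields(a: dict[str, str], b: dict[str, str]) -> set[str]:
    """Return the names of fields present in both whose values differ."""
    return {k for k in a.keys() & b.keys() if a[k] != b[k]}


NQN = "nqn.2017-12.org.nvmeexpress:uuid:11111111-2222-3333-4444-555555555555"


def sample(sn: str) -> str:
    # Hypothetical output: fields from this report plus an assumed subnqn line.
    return (
        "NVME Identify Controller:\n"
        "vid     : 0x8086\n"
        f"sn      : {sn}    \n"
        "mn      : INTEL SSDPEKKF010T8L\n"
        f"subnqn  : {NQN}\n"
        "...\n"
    )


nvme0 = parse_id_ctrl(sample("BTHH82250N1X1P0E"))
nvme1 = parse_id_ctrl(sample("BTHH82250N261P0E"))
print(differing_fields(nvme0, nvme1))  # {'sn'}: only the serials differ
```

With real output, the serial numbers differ while the subnqn values are
identical, which is exactly the condition the newer driver rejects.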


After booting a bionic PXE image based on 4.15.0-38-generic, only the
first NVMe device is enabled; the second is detected but disabled because
both devices report the same NQN:

nvme nvme1: ignoring ctrl due to duplicate subnqn (nqn.2017-12.org.nvmeexpress:uuid:11111111-2222-3333-4444-555555555555).
nvme nvme1: Removing after probe failure status: -22
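The -22 in the second message is -EINVAL. A simplified model of the check,
assuming (from a reading of the 4.15 driver, not verified here) that a
controller whose subsystem NQN is already registered is rejected unless
its Identify Controller CMIC field has bit 1 set ("the NVM subsystem may
contain two or more controllers"). The subsystems registry and
attach_controller() are hypothetical names, and cmic=0 for these drives
is also an assumption:

```python
import errno

# Hypothetical registry: subsystem NQN -> controllers attached to it.
subsystems: dict[str, list[str]] = {}

# CMIC bit 1: the NVM subsystem may contain two or more controllers.
CMIC_MULTI_CTRL = 1 << 1


def attach_controller(name: str, subnqn: str, cmic: int) -> int:
    """Return 0 on success or a negative errno, like a probe status."""
    if subnqn in subsystems and not (cmic & CMIC_MULTI_CTRL):
        # A second controller claims an existing subsystem, but the device
        # says its subsystem has only one controller: treat it as invalid.
        print(f"nvme {name}: ignoring ctrl due to duplicate subnqn ({subnqn}).")
        return -errno.EINVAL
    subsystems.setdefault(subnqn, []).append(name)
    return 0


NQN = "nqn.2017-12.org.nvmeexpress:uuid:11111111-2222-3333-4444-555555555555"
status0 = attach_controller("nvme0", NQN, cmic=0)  # assumed CMIC value
status1 = attach_controller("nvme1", NQN, cmic=0)  # same NQN, rejected
print(status0, status1)  # 0 -22 on Linux
```

Under this model, two single-controller devices can only coexist if
their NQNs differ, which is why a firmware-supplied placeholder NQN
shared by both drives is fatal for the second one.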


The NQN string comes from the device firmware rather than being generated
by Linux, but there does not seem to be an operation in nvme-cli to
change it. (It is also questionable whether the firmware value is correct
according to section 7.9 of
https://nvmexpress.org/wp-content/uploads/NVM-Express-1_3a-20171024_ratified.pdf.
My reading of the specification is that a UUID-based NQN should start
with nqn.2014-08.org.nvmexpress:uuid: followed by a random UUID, and I
assume a different random UUID per device. The firmware value instead
starts with nqn.2017-12.org.nvmeexpress:uuid: and its UUID,
11111111-2222-3333-4444-555555555555, is clearly not random.)
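A rough checker for the UUID-based NQN format described above, as a
sketch. It only covers the "uuid:" form from section 7.9 (the spec also
allows a date/domain-based form, which this does not validate), and the
version-4 test encodes only the assumption that the UUID should be
random. The function name uuid_nqn_problems() is hypothetical:

```python
import uuid

# Prefix for UUID-based NQNs, per my reading of NVMe 1.3a section 7.9.
UUID_NQN_PREFIX = "nqn.2014-08.org.nvmexpress:uuid:"


def uuid_nqn_problems(nqn: str) -> list[str]:
    """Return reasons why `nqn` does not look like a random UUID-based NQN."""
    problems = []
    if not nqn.startswith(UUID_NQN_PREFIX):
        problems.append("prefix is not " + UUID_NQN_PREFIX)
    # Look at whatever follows the last ":uuid:" marker, if there is one.
    _, sep, tail = nqn.rpartition(":uuid:")
    if not sep:
        problems.append("no ':uuid:' component")
        return problems
    try:
        value = uuid.UUID(tail)
    except ValueError:
        problems.append("suffix is not a UUID")
        return problems
    # Random UUIDs are RFC 4122 version 4; UUIDs that are not RFC 4122
    # variant at all (like the firmware value) report version None.
    if value.version != 4:
        problems.append("UUID is not a random (version 4) UUID")
    return problems


FIRMWARE_NQN = "nqn.2017-12.org.nvmeexpress:uuid:11111111-2222-3333-4444-555555555555"
for problem in uuid_nqn_problems(FIRMWARE_NQN):
    print(problem)  # reports the prefix and the non-random UUID
```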

The Windows 10 installation provided on the system did not have any
problems operating with both devices.

Looking at the kernel nvme driver history, it appears that 4.4 did not
check or validate the NQN at all; now that the driver does, the
duplicate NQN causes this problem.

Our typical installation is a zpool mirror across two devices, so this
is preventing us from moving from trusty to bionic.

This is a report of a similar issue:
https://ask.fedoraproject.org/en/question/128422/one-of-two-identical-m2-nvme-drives-disabling-due-to-same-nqn/

It may be worth noting that if an NVMe device does not provide an NQN,
one seems to be generated from the device serial number, so a system
with two Samsung MZVLB256HAHQ devices works fine.
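A sketch of why that fallback avoids the collision. The exact string the
kernel builds is an assumption here (the format below is illustrative,
not copied from the driver); the point is only that when no usable NQN
is reported, the serial (sn) and model (mn) become part of it. The
function name effective_subnqn() is hypothetical, and the "no firmware
NQN" case for the Intel drives is hypothetical too:

```python
def effective_subnqn(fw_subnqn: str, vid: int, ssvid: int, sn: str, mn: str) -> str:
    """Return the NQN a controller would end up with under this sketch."""
    if fw_subnqn.startswith("nqn."):
        # A firmware-supplied NQN is used verbatim, even if it is not unique.
        return fw_subnqn
    # Illustrative fallback: Identify Controller pads SN to 20 bytes and
    # MN to 40 bytes with spaces, so the fields are padded the same way.
    return f"nqn.2014-08.org.nvmexpress:{vid:04x}{ssvid:04x}{sn:<20}{mn:<40}"


FW_NQN = "nqn.2017-12.org.nvmeexpress:uuid:11111111-2222-3333-4444-555555555555"
MN = "INTEL SSDPEKKF010T8L"
serials = ["BTHH82250N1X1P0E", "BTHH82250N261P0E"]

# As shipped: both controllers end up with the same NQN.
with_fw = {effective_subnqn(FW_NQN, 0x8086, 0x8086, sn, MN) for sn in serials}
# Hypothetically, without a firmware NQN: the serials keep them distinct.
without_fw = {effective_subnqn("", 0x8086, 0x8086, sn, MN) for sn in serials}
print(len(with_fw), len(without_fw))  # 1 2
```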

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: Incomplete


** Tags: xenial

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1803692

Title:
  bionic 4.15 nvme regression from trusty 4.4 with two identical devices

Status in linux package in Ubuntu:
  Incomplete
