Can you try booting with "nvme_core.default_ps_max_latency_us=1500"? This
will disable PS4, the deepest power state, which causes a lot of trouble.
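To illustrate why 1500 keeps PS3 but drops PS4, here is a minimal Python
sketch of a simplified model of the nvme driver's APST state selection. It
assumes, as I understand the 4.x kernels, that a non-operational state is
only used when its exit latency does not exceed
nvme_core.default_ps_max_latency_us, and that a limit of 0 turns APST off.
The latencies are taken from the smartctl power-state table in the report
below; the names PowerState, SM961_STATES and apst_eligible_states are made
up for this sketch and are not kernel identifiers.

```python
# Simplified model (an assumption, not the kernel source) of which
# non-operational power states APST may use for a given latency limit.
from typing import List, NamedTuple


class PowerState(NamedTuple):
    index: int
    non_operational: bool  # "-" in smartctl's "Op" column
    entry_lat_us: int
    exit_lat_us: int


# Power state table reported by smartctl for this SM961 (MZVLW256HEHP).
SM961_STATES = [
    PowerState(0, False, 0, 0),
    PowerState(1, False, 0, 0),
    PowerState(2, False, 0, 0),
    PowerState(3, True, 210, 1500),
    PowerState(4, True, 2200, 6000),
]


def apst_eligible_states(states: List[PowerState], max_latency_us: int) -> List[int]:
    """Return indices of non-operational states allowed under the limit."""
    if max_latency_us == 0:
        return []  # assumed: a limit of 0 disables APST entirely
    return [
        s.index
        for s in states
        if s.non_operational and s.exit_lat_us <= max_latency_us
    ]


if __name__ == "__main__":
    # The suggested value plus the values tried in the bug report.
    for limit in (0, 250, 1500, 5500, 6000, 11000):
        print(limit, apst_eligible_states(SM961_STATES, limit))
```

Under this model, 0 and 250 leave no state eligible, 1500 and 5500 allow
only PS3, and 6000 and 11000 allow both PS3 and PS4.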

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1737934

Title:
  Samsung SM961 NVMe SSD randomly unmounts/loses connection/unavailable

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  Seems related to these bugs:
  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1678184
  https://bugs.launchpad.net/ubuntu/+source/linux-signed/+bug/1682704
  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1705748

  Problem:
  At seemingly random times, my computer (a brand-new Lenovo ThinkPad T470)
  appears to lose access to the Samsung SM961 256GB SSD inside it. When this
  happens, the whole OS freezes, and when I try to power down I see a black
  terminal-like screen that prints the following errors:

  EXT4-fs error (device nvme0n1p2): ext4_find_entry:1431: inode #7471275
  (or #741278): comm gmain (or systemd-journal or ...): reading
  directory iblock 0

  This error seems to be repeated endlessly, though I've only let it go
  for a few minutes. No other errors are printed.

  This is the only drive in the machine.
  I don't know whether this also happens in Windows, since I removed Windows
  and installed Ubuntu immediately after updating the BIOS.

  Info:
  Distro: Ubuntu MATE 17.10

  sudo uname -r
  4.13.0-19-generic

  sudo nvme get-feature -f 0x0c -H /dev/nvme0 (with nvme_core.default_ps_max_latency_us=250)
  get-feature:0xc (Autonomous Power State Transition), Current value:0x000001
        Autonomous Power State Transition Enable (APSTE): Enabled
        Auto PST Entries        .................
        Entry[ 0]   
        .................
        Idle Time Prior to Transition (ITPT): 0 ms
        Idle Transition Power State   (ITPS): 0
        .................
        Entry[ 1] through Entry[31]: identical to Entry[ 0] (ITPT: 0 ms, ITPS: 0)
        .................
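
  The get-feature dump above can be read with a small decoder. This Python
  sketch assumes the NVMe APST layout as I read the spec: bit 0 of the
  feature value is APSTE, and each 64-bit table entry holds ITPS in bits 7:3
  and ITPT (in milliseconds) in bits 31:8. The function names are
  hypothetical. An entry of zero means no idle transition is programmed for
  that state, so even though APSTE reads Enabled, no autonomous transitions
  are configured, which would be consistent with the 250 us limit excluding
  both PS3 (1500 us exit latency) and PS4 (6000 us).

```python
# Decode feature 0x0C (APST) values as printed by "nvme get-feature -H".
# Assumed layout (NVMe spec): feature bit 0 = APSTE; each table entry has
# ITPS in bits 7:3 and ITPT (milliseconds) in bits 31:8.
from typing import Tuple


def apst_enabled(feature_value: int) -> bool:
    """True if the APSTE bit is set in the feature's current value."""
    return bool(feature_value & 0x1)


def decode_apst_entry(entry: int) -> Tuple[int, int]:
    """Return (idle transition power state, idle time prior to transition in ms)."""
    itps = (entry >> 3) & 0x1F
    itpt_ms = (entry >> 8) & 0xFFFFFF
    return itps, itpt_ms


if __name__ == "__main__":
    print(apst_enabled(0x000001))  # "Current value:0x000001" from the dump
    print(decode_apst_entry(0))    # every entry in the dump is zero
```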

  sudo smartctl -a /dev/nvme0
  smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.13.0-19-generic] (local build)
  Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

  === START OF INFORMATION SECTION ===
  Model Number:                       SAMSUNG MZVLW256HEHP-000L7
  Serial Number:                      S35ENX0JA13385
  Firmware Version:                   4L7QCXB7
  PCI Vendor/Subsystem ID:            0x144d
  IEEE OUI Identifier:                0x002538
  Total NVM Capacity:                 256.060.514.304 [256 GB]
  Unallocated NVM Capacity:           0
  Controller ID:                      2
  Number of Namespaces:               1
  Namespace 1 Size/Capacity:          256.060.514.304 [256 GB]
  Namespace 1 Utilization:            17.834.708.992 [17,8 GB]
  Namespace 1 Formatted LBA Size:     512
  Local Time is:                      Wed Dec 13 10:52:39 2017 CET
  Firmware Updates (0x16):            3 Slots, no Reset required
  Optional Admin Commands (0x0017):   Security Format Frmw_DL *Other*
  Optional NVM Commands (0x001f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat
  Warning  Comp. Temp. Threshold:     69 Celsius
  Critical Comp. Temp. Threshold:     72 Celsius

  Supported Power States
  St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
   0 +     7.60W       -        -    0  0  0  0        0       0
   1 +     6.00W       -        -    1  1  1  1        0       0
   2 +     5.10W       -        -    2  2  2  2        0       0
   3 -   0.0400W       -        -    3  3  3  3      210    1500
   4 -   0.0050W       -        -    4  4  4  4     2200    6000

  Supported LBA Sizes (NSID 0x1)
  Id Fmt  Data  Metadt  Rel_Perf
   0 +     512       0         0

  === START OF SMART DATA SECTION ===
  SMART overall-health self-assessment test result: PASSED

  SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
  Critical Warning:                   0x00
  Temperature:                        35 Celsius
  Available Spare:                    100%
  Available Spare Threshold:          10%
  Percentage Used:                    0%
  Data Units Read:                    151.517 [77,5 GB]
  Data Units Written:                 160.733 [82,2 GB]
  Host Read Commands:                 1.874.938
  Host Write Commands:                1.650.810
  Controller Busy Time:               10
  Power Cycles:                       96
  Power On Hours:                     14
  Unsafe Shutdowns:                   78
  Media and Data Integrity Errors:    0
  Error Information Log Entries:      38
  Warning  Comp. Temperature Time:    0
  Critical Comp. Temperature Time:    0
  Temperature Sensor 1:               35 Celsius
  Temperature Sensor 2:               61 Celsius

  Error Information (NVMe Log 0x01, max 64 entries)
  Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
    0         38     0  0x0018  0x4004  0x02c            0     0     -
    1         37     0  0x0017  0x4004  0x02c            0     0     -
    2         36     0  0x0018  0x4004  0x02c            0     0     -
    3         35     0  0x0017  0x4004  0x02c            0     0     -
    4         34     0  0x0018  0x4004  0x02c            0     0     -
    5         33     0  0x0017  0x4004  0x02c            0     0     -
    6         32     0  0x0018  0x4004  0x02c            0     0     -
    7         31     0  0x0017  0x4004  0x02c            0     0     -
    8         30     0  0x0018  0x4004  0x02c            0     0     -
    9         29     0  0x0017  0x4004  0x02c            0     0     -
   10         28     0  0x0018  0x4004  0x02c            0     0     -
   11         27     0  0x0017  0x4004  0x02c            0     0     -
   12         26     0  0x0018  0x4004  0x02c            0     0     -
   13         25     0  0x0017  0x4004  0x02c            0     0     -
   14         24     0  0x0018  0x4004  0x02c            0     0     -
   15         23     0  0x0017  0x4004  0x02c            0     0     -
  ... (22 entries not shown)
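
  Every entry shown is on submission queue 0 (the admin queue) with status
  0x4004. The Python sketch below decodes that 16-bit status value, assuming
  the NVMe error-log layout as I read the spec: bit 0 is the phase tag, bits
  8:1 the Status Code, bits 11:9 the Status Code Type, bit 14 More and bit 15
  Do Not Retry. The names decode_status and GENERIC_CODES are hypothetical,
  and the code table is deliberately partial. 0x4004 comes out as a generic
  "Invalid Field in Command" with More set, which points to admin commands the
  controller rejected rather than to media errors; the SMART data above also
  reports 0 Media and Data Integrity Errors.

```python
# Decode the "Status" column of the NVMe error log (log page 0x01).
from typing import NamedTuple


class NvmeStatus(NamedTuple):
    sct: int    # Status Code Type (0 = generic command status)
    sc: int     # Status Code
    more: bool  # More information available in a log page
    dnr: bool   # Do Not Retry


# A few generic (SCT 0) status codes; not an exhaustive table.
GENERIC_CODES = {
    0x00: "Successful Completion",
    0x01: "Invalid Command Opcode",
    0x02: "Invalid Field in Command",
}


def decode_status(field: int) -> NvmeStatus:
    """Split the 16-bit status field; bit 0 (phase tag) is ignored."""
    return NvmeStatus(
        sct=(field >> 9) & 0x7,
        sc=(field >> 1) & 0xFF,
        more=bool(field & (1 << 14)),
        dnr=bool(field & (1 << 15)),
    )


if __name__ == "__main__":
    st = decode_status(0x4004)
    name = GENERIC_CODES.get(st.sc, "other") if st.sct == 0 else "non-generic"
    print(st, name)
```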

  
  lspci -nn
  00:00.0 Host bridge [0600]: Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers [8086:5904] (rev 02)
  00:02.0 VGA compatible controller [0300]: Intel Corporation HD Graphics 620 [8086:5916] (rev 02)
  00:14.0 USB controller [0c03]: Intel Corporation Sunrise Point-LP USB 3.0 xHCI Controller [8086:9d2f] (rev 21)
  00:14.2 Signal processing controller [1180]: Intel Corporation Sunrise Point-LP Thermal subsystem [8086:9d31] (rev 21)
  00:16.0 Communication controller [0780]: Intel Corporation Sunrise Point-LP CSME HECI #1 [8086:9d3a] (rev 21)
  00:1c.0 PCI bridge [0604]: Intel Corporation Sunrise Point-LP PCI Express Root Port [8086:9d10] (rev f1)
  00:1c.6 PCI bridge [0604]: Intel Corporation Sunrise Point-LP PCI Express Root Port #7 [8086:9d16] (rev f1)
  00:1d.0 PCI bridge [0604]: Intel Corporation Sunrise Point-LP PCI Express Root Port #9 [8086:9d18] (rev f1)
  00:1d.2 PCI bridge [0604]: Intel Corporation Device [8086:9d1a] (rev f1)
  00:1f.0 ISA bridge [0601]: Intel Corporation Sunrise Point-LP LPC Controller [8086:9d58] (rev 21)
  00:1f.2 Memory controller [0580]: Intel Corporation Sunrise Point-LP PMC [8086:9d21] (rev 21)
  00:1f.3 Audio device [0403]: Intel Corporation Device [8086:9d71] (rev 21)
  00:1f.4 SMBus [0c05]: Intel Corporation Sunrise Point-LP SMBus [8086:9d23] (rev 21)
  00:1f.6 Ethernet controller [0200]: Intel Corporation Ethernet Connection (4) I219-V [8086:15d8] (rev 21)
  04:00.0 Network controller [0280]: Intel Corporation Wireless 8265 / 8275 [8086:24fd] (rev 78)
  3e:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller SM961/PM961 [144d:a804]

  sudo nvme list
  Node             SN                   Model                                    Namespace Usage                      Format           FW Rev
  ---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
  /dev/nvme0n1     S35ENX0JA13385       SAMSUNG MZVLW256HEHP-000L7               1          17,83  GB / 256,06  GB    512   B +  0 B   4L7QCXB7

  I looked for kernel errors with "dmesg | grep -i nvme" and
  "dmesg | grep -i EXT4-fs", but nothing useful shows up (only that the
  drive was mounted).

  What I have tried:
  From the bug reports mentioned above, it seemed that my problem should
  already be fixed, since I'm on kernel 4.13.
  Since I still have the problem, I expected to be able to work around it
  temporarily by setting
  GRUB_CMDLINE_LINUX_DEFAULT="nvme_core.default_ps_max_latency_us=0"
  and running sudo update-grub.
  This doesn't help, however: I still randomly lose the connection to the
  drive. Sometimes this happens minutes after booting, sometimes hours after,
  but I haven't been able to run stably for a full day.
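
  When trying different values, it helps to change only the nvme_core option
  and keep the rest of the line (for example "quiet splash"), rather than
  replacing the whole GRUB_CMDLINE_LINUX_DEFAULT value. The Python sketch
  below is a hypothetical helper (set_latency and active_latency are
  illustrative names, not part of any tool): it rewrites the text of
  /etc/default/grub in memory and reads the value from a kernel command line
  such as the contents of /proc/cmdline. Writing the file back as root,
  running sudo update-grub and rebooting are still manual steps.

```python
# Hypothetical helpers for switching nvme_core.default_ps_max_latency_us.
import re
from typing import Optional

PARAM = "nvme_core.default_ps_max_latency_us"


def set_latency(grub_text: str, value_us: int) -> str:
    """Set PARAM in the GRUB_CMDLINE_LINUX_DEFAULT line, keeping other options.

    Returns the text unchanged if no such line exists.
    """
    def fix_line(match):
        # Drop any existing PARAM=... option, then append the new value.
        opts = [o for o in match.group(2).split() if not o.startswith(PARAM + "=")]
        opts.append(f"{PARAM}={value_us}")
        return f'{match.group(1)}"{" ".join(opts)}"'

    pattern = re.compile(r'^(GRUB_CMDLINE_LINUX_DEFAULT=)"([^"]*)"', re.MULTILINE)
    return pattern.sub(fix_line, grub_text, count=1)


def active_latency(cmdline: str) -> Optional[int]:
    """Return PARAM's value from a kernel command line, or None if absent."""
    for opt in cmdline.split():
        if opt.startswith(PARAM + "="):
            return int(opt.split("=", 1)[1])
    return None
```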
  I have tried every latency value I found in various bug reports online:
  0, 250, 5500, 6000 and 11000. The system seems most stable with 250 and
  least stable with 0, where the error occurs within seconds or minutes of
  booting.

  I'm at my wits' end here. I could just put a normal SATA SSD in there, but
  then this brand-new one would go to waste. Any help would be greatly
  appreciated. If more info is needed, I'll do my best to provide it.

  ProblemType: Bug
  DistroRelease: Ubuntu 17.10
  Package: linux-image-4.13.0-19-generic 4.13.0-19.22
  ProcVersionSignature: Ubuntu 4.13.0-19.22-generic 4.13.13
  Uname: Linux 4.13.0-19-generic x86_64
  ApportVersion: 2.20.7-0ubuntu3.5
  Architecture: amd64
  AudioDevicesInUse:
   USER        PID ACCESS COMMAND
   /dev/snd/controlC0:  musilitar   1334 F.... pulseaudio
  CurrentDesktop: MATE
  Date: Wed Dec 13 11:11:54 2017
  InstallationDate: Installed on 2017-12-07 (5 days ago)
  InstallationMedia: Ubuntu-MATE 17.10 "Artful Aardvark" - Release amd64 (20171018)
  MachineType: LENOVO 20HD0001MB
  ProcFB: 0 inteldrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.13.0-19-generic.efi.signed root=UUID=99902074-7315-4905-8a7d-97b65a09bb74 ro quiet splash nvme_core.default_ps_max_latency_us=250 vt.handoff=7
  RelatedPackageVersions:
   linux-restricted-modules-4.13.0-19-generic N/A
   linux-backports-modules-4.13.0-19-generic  N/A
   linux-firmware                             1.169.1
  SourcePackage: linux
  UpgradeStatus: No upgrade log present (probably fresh install)
  dmi.bios.date: 11/10/2017
  dmi.bios.vendor: LENOVO
  dmi.bios.version: N1QET68W (1.43 )
  dmi.board.asset.tag: Not Available
  dmi.board.name: 20HD0001MB
  dmi.board.vendor: LENOVO
  dmi.board.version: SDK0J40697 WIN
  dmi.chassis.asset.tag: No Asset Information
  dmi.chassis.type: 10
  dmi.chassis.vendor: LENOVO
  dmi.chassis.version: None
  dmi.modalias: dmi:bvnLENOVO:bvrN1QET68W(1.43):bd11/10/2017:svnLENOVO:pn20HD0001MB:pvrThinkPadT470:rvnLENOVO:rn20HD0001MB:rvrSDK0J40697WIN:cvnLENOVO:ct10:cvrNone:
  dmi.product.family: ThinkPad T470
  dmi.product.name: 20HD0001MB
  dmi.product.version: ThinkPad T470
  dmi.sys.vendor: LENOVO

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1737934/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
