Dear All, 
 
One of our OmniOS ZFS servers crashes with a kernel panic whenever users
try to open certain Excel files via SMB.
 
This happens only with certain Excel files (old "xls" format, written by 
a Fortran program). Other Excel files on the same filesystem can be 
opened without any problem. 
 
I know this is a very special problem, and it can be worked around by
copying the file to the local Windows computer and opening it there. But
nevertheless: no error - I mean REALLY NO ERROR - when opening a file
should lead to a kernel panic! There must be some bug in the ZFS code.
 
The crashes were only reproducible with Excel 2019. Excel 2016 does not
crash the server; it just hangs and leaves Excel background processes
behind when you close the start screen. LibreOffice opens the file without
problems. If the file is opened in LibreOffice and saved in the new Excel
format, it can be opened from the ZFS server without a problem. So this is
a really special case, but nevertheless a kernel panic just from opening a
file...
 
It also happens on a second system to which I copied the file, so it
should not be a problem with the file's on-disk structure. The first
system runs OmniOS r151048o, the other r151048m. I don't want to test it
on our newest system, which runs r151048t, because it is the home
directory server for all staff members.
 
I suspect two possible causes. The first is the way the file is locked
(the newer Excel version probably does something different from the old
one). The second is that, because the .xls file (written directly by a
Fortran program) is probably malformed, there are read accesses to
regions of the file that don't exist.
 
The fact that the old Excel version just hangs when opening the file makes
the first cause more likely. The second cause should also affect other
incomplete files, so the panic would probably have shown up more often,
and not only on our system with this special kind of file...
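
To check the second cause without Excel, a small script could read the
file at offsets at and beyond its end. This is only a sketch: the function
name probe_reads and the chosen offsets are my own assumptions, not what
Excel actually does, and I have not confirmed that such reads trigger the
panic. If tried at all, it should be pointed at a copy of the file on a
test server, not on a production one.

```python
import os
import sys


def probe_reads(path, length=4096):
    """Read `length` bytes at a few offsets around and beyond end of file.

    Returns a list of (offset, bytes_returned) pairs. Reads that start at
    or after end of file are expected to return 0 bytes.
    """
    size = os.path.getsize(path)
    # Hypothetical offsets: start, last byte, exactly EOF, and two past EOF.
    offsets = [0, max(size - 1, 0), size, size + length, 2 * size + length]
    results = []
    # buffering=0 so every read() goes to the file system unbuffered.
    with open(path, "rb", buffering=0) as f:
        for offset in offsets:
            f.seek(offset)
            data = f.read(length)
            results.append((offset, len(data)))
    return results


if __name__ == "__main__":
    # Usage: python probe_reads.py <path to a copy of the file on the share>
    if len(sys.argv) > 1:
        for offset, count in probe_reads(sys.argv[1]):
            print(f"offset {offset}: {count} bytes")
```

If these reads come back empty and the server stays up, that would speak
against the second cause, at least for plain reads past the end of file.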
 
Here is the output of fmadm faulty for the errors:
 
--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Apr. 04 12:51:59 1efff73a-91c8-4112-91b3-edb2366d8167 SUNOS-8000-KL  Major
 
Host        : itsm-zfs-serv4 
Platform    : SSG-5049P-E1CTR36L    Chassis_id  : A290024X1B04425 
Product_sn  : 
 
Fault class : defect.sunos.kernel.panic 
Affects     : sw:///:path=/var/crash//.1efff73a-91c8-4112-91b3-edb2366d8167 
                  faulted but still in service 
Problem in  : sw:///:path=/var/crash//.1efff73a-91c8-4112-91b3-edb2366d8167 
                  faulted but still in service 
 
Description : The system has rebooted after a kernel panic. Refer to 
              http://illumos.org/msg/SUNOS-8000-KL for more information. 
 
Response    : The failed system image was dumped to the dump device.  If
              savecore is enabled (see dumpadm(8)) a copy of the dump will be
              written to the savecore directory /var/crash/.
 
Impact      : There may be some performance impact while the panic is copied to
              the savecore directory.  Disk space usage by panics can be
              substantial.
 
Action      : If savecore is not enabled then please take steps to preserve the
              crash image.
              Use 'fmdump -Vp -u 1efff73a-91c8-4112-91b3-edb2366d8167' to view
              more panic detail.  Please refer to the knowledge article for
              additional information.
 
The output of fmdump -Vp -u 1efff73a-91c8-4112-91b3-edb2366d8167: 
 
TIME                            UUID                                 SUNW-MSG-ID
Apr. 04 2024 12:51:59.483238000 1efff73a-91c8-4112-91b3-edb2366d8167 SUNOS-8000-KL

  TIME                  CLASS                                         ENA
  Apr. 04 12:51:59.4484 ireport.os.sunos.panic.dump_available         0x0000000000000000
  Apr. 04 12:49:27.2688 ireport.os.sunos.panic.dump_pending_on_device 0x0000000000000000
 
nvlist version: 0 
    version = 0x0 
    class = list.suspect 
    uuid = 1efff73a-91c8-4112-91b3-edb2366d8167 
    code = SUNOS-8000-KL 
    diag-time = 1712227919 451418 
    de = fmd:///module/software-diagnosis 
    fault-list-sz = 0x1 
    fault-list = (array of embedded nvlists) 
    (start fault-list[0]) 
    nvlist version: 0 
        version = 0x0 
        class = defect.sunos.kernel.panic 
        certainty = 0x64 
        asru = sw:///:path=/var/crash//.1efff73a-91c8-4112-91b3-edb2366d8167
        resource = sw:///:path=/var/crash//.1efff73a-91c8-4112-91b3-edb2366d8167
        savecore-succcess = 1 
        dump-dir = /var/crash/
        dump-files = vmdump.14 
        os-instance-uuid = 1efff73a-91c8-4112-91b3-edb2366d8167 
        panicstr = BAD TRAP: type=e (#pf Page fault) rp=fffffe00f85c9700 addr=4 occurred in module "zfs" due to a NULL pointer dereference
        panicstack = unix:die+c0 () | unix:trap+999 () | unix:cmntrap+e9 () | zfs:dmu_xuio_cnt+10 () | zfs:zfs_retzcbuf+16 () | genunix:vhead_retzcbuf+b9 () | genunix:fop_retzcbuf+71 () | smbsrv:smb_vop_retzcbuf+20 () | smbsrv:smb_fsop_retzcbuf+20 () | smbsrv:smb_xuio_free+70 () | smbsrv:smb2_read+55c () | smbsrv:smb2sr_work+38f () | smbsrv:smb2_tq_work+7a () | genunix:taskq_d_thread+1ac () | unix:thread_start+b () |
        crashtime = 1712226957 
        panic-time = Thu Apr  4 12:35:57 2024 CEST 
    (end fault-list[0]) 
 
    fault-status = 0x1 
    severity = Major 
    __ttl = 0x1 
    __tod = 0x660e864f 0x1ccda070 
 
The call stack is the same on both machines. 
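
For anyone who wants to compare such stacks between machines without
reading them by eye, the panicstack value can be split into frames. The
following is a minimal sketch, assuming the "module:symbol+offset ()"
frame format shown in the fmdump output above; the names parse_panicstack
and first_frame_outside are made up for this example.

```python
import re

# One frame looks like "zfs:dmu_xuio_cnt+10 ()": module, symbol, hex offset.
FRAME_RE = re.compile(
    r"^(?P<module>[^:]+):(?P<symbol>[^+]+)\+(?P<offset>[0-9a-fA-F]+) \(\)$"
)


def parse_panicstack(panicstack):
    """Split a panicstack string into (module, symbol, offset) tuples."""
    frames = []
    for raw in panicstack.split("|"):
        text = " ".join(raw.split())  # collapse mail line wraps and spaces
        if not text:
            continue  # the value ends with a trailing "|"
        match = FRAME_RE.match(text)
        if match is None:
            raise ValueError(f"unexpected frame format: {text!r}")
        frames.append(
            (match["module"], match["symbol"], int(match["offset"], 16))
        )
    return frames


def first_frame_outside(frames, skipped_modules=("unix",)):
    """Return the first frame that is not in one of the skipped modules."""
    for frame in frames:
        if frame[0] not in skipped_modules:
            return frame
    return None


PANICSTACK = """unix:die+c0 () | unix:trap+999 () |
unix:cmntrap+e9 () | zfs:dmu_xuio_cnt+10 () | zfs:zfs_retzcbuf+16 () |
genunix:vhead_retzcbuf+b9 () | genunix:fop_retzcbuf+71 () |
smbsrv:smb_vop_retzcbuf+20 () | smbsrv:smb_fsop_retzcbuf+20 () |
smbsrv:smb_xuio_free+70 () | smbsrv:smb2_read+55c () |
smbsrv:smb2sr_work+38f () | smbsrv:smb2_tq_work+7a () |
genunix:taskq_d_thread+1ac () | unix:thread_start+b () | """

if __name__ == "__main__":
    frames = parse_panicstack(PANICSTACK)
    module, symbol, offset = first_frame_outside(frames)
    print(f"faulting frame: {module}:{symbol}+{offset:x}")
```

Two stacks can then be compared with a plain == on the parsed lists. The
trap handling frames in unix are skipped to get at the frame where the
fault occurred, here zfs:dmu_xuio_cnt.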
 
Regards 
 
Michael
------------------------------------------
illumos: illumos-discuss
Permalink: 
https://illumos.topicbox.com/groups/discuss/Tb762378066635820-M2691ee616d0a53322c91acfa