Dear All,
one of our OmniOS ZFS servers crashes with a kernel panic whenever users
try to open certain Excel files via SMB.
This happens only with certain Excel files (old "xls" format, written by
a Fortran program). Other Excel files on the same filesystem can be
opened without any problem.
I know this is a very special problem and it can be worked around by
copying the file to the local Windows computer and opening it there. But
nevertheless: no error - I mean REALLY NO ERROR - that occurs while
opening a file should lead to a kernel panic! There must be some bug in
the ZFS code.
The crash is only reproducible with Excel 2019, not with Excel 2016.
Excel 2016 merely hangs and leaves Excel background processes behind
when you close the start screen. LibreOffice opens the file without
problems. If the file is opened in LibreOffice and saved in the new
Excel format, it can be opened from the ZFS server without a problem. So
this really is a special case, but nevertheless: a kernel panic just
from opening a file...
It also happens on a second system to which I copied the file, so it
should not be a problem with the on-disk structure of the file. The
first system runs OmniOS r151048o, the other r151048m. I don't want to
test it on our newest system, which runs r151048t, because it is the
home directory server for all staff members.
I suspect two possible causes. The first is the way the file is locked
(the newer Excel version probably does this differently from the old
one). The second is that, because the .xls file was written directly by
a Fortran program and probably has a slightly wrong format, Excel issues
reads to regions of the file that don't exist.
The fact that the old Excel version just hangs when opening the file
makes the first cause more likely. If the second were the cause, it
should also be triggered by other incomplete files, and the panic would
probably have been seen more often than only on our system with this
special kind of file...
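
A minimal sketch for probing the second cause from a client, assuming
the share is reachable as an ordinary path there. It requests bytes
beyond the end of a file and reports what comes back. The helper name
read_past_eof and the temporary stand-in file are hypothetical, and
whether the client really forwards such a read to the server (rather
than answering it from its own knowledge of the file size) is an
assumption that would need checking with a packet trace.

```python
import os
import tempfile


def read_past_eof(path, overshoot=4096, length=4096):
    """Ask for `length` bytes starting `overshoot` bytes beyond EOF.

    Returns (file_size, data). On a healthy filesystem the read simply
    returns an empty bytes object.
    """
    size = os.path.getsize(path)
    # buffering=0 so the read is passed to the OS instead of being
    # satisfied from a Python-level buffer.
    with open(path, "rb", buffering=0) as fh:
        fh.seek(size + overshoot)
        data = fh.read(length)
    return size, data


if __name__ == "__main__":
    # Stand-in file for illustration only; to exercise the server,
    # replace tmp.name with a path on a TEST share, never production.
    with tempfile.NamedTemporaryFile(suffix=".xls", delete=False) as tmp:
        tmp.write(b"\x00" * 1000)
    try:
        size, data = read_past_eof(tmp.name)
        print(f"file size {size}, bytes returned past EOF: {len(data)}")
    finally:
        os.unlink(tmp.name)
```

If a plain read past EOF like this does not bring the test server down,
that would be one more hint pointing at the first cause.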
Here is the output of fmadm faulty for the errors:
--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Apr. 04 12:51:59 1efff73a-91c8-4112-91b3-edb2366d8167 SUNOS-8000-KL  Major

Host        : itsm-zfs-serv4
Platform    : SSG-5049P-E1CTR36L  Chassis_id  : A290024X1B04425
Product_sn  :

Fault class : defect.sunos.kernel.panic
Affects     : sw:///:path=/var/crash//.1efff73a-91c8-4112-91b3-edb2366d8167
              faulted but still in service
Problem in  : sw:///:path=/var/crash//.1efff73a-91c8-4112-91b3-edb2366d8167
              faulted but still in service

Description : The system has rebooted after a kernel panic. Refer to
              http://illumos.org/msg/SUNOS-8000-KL for more information.

Response    : The failed system image was dumped to the dump device. If
              savecore is enabled (see dumpadm(8)) a copy of the dump will be
              written to the savecore directory /var/crash/.

Impact      : There may be some performance impact while the panic is copied to
              the savecore directory. Disk space usage by panics can be
              substantial.

Action      : If savecore is not enabled then please take steps to preserve the
              crash image.
              Use 'fmdump -Vp -u 1efff73a-91c8-4112-91b3-edb2366d8167' to view
              more panic detail. Please refer to the knowledge article for
              additional information.

The output of fmdump -Vp -u 1efff73a-91c8-4112-91b3-edb2366d8167:
TIME                            UUID                                 SUNW-MSG-ID
Apr. 04 2024 12:51:59.483238000 1efff73a-91c8-4112-91b3-edb2366d8167 SUNOS-8000-KL

  TIME                  CLASS                                          ENA
  Apr. 04 12:51:59.4484 ireport.os.sunos.panic.dump_available          0x0000000000000000
  Apr. 04 12:49:27.2688 ireport.os.sunos.panic.dump_pending_on_device  0x0000000000000000

nvlist version: 0
        version = 0x0
        class = list.suspect
        uuid = 1efff73a-91c8-4112-91b3-edb2366d8167
        code = SUNOS-8000-KL
        diag-time = 1712227919 451418
        de = fmd:///module/software-diagnosis
        fault-list-sz = 0x1
        fault-list = (array of embedded nvlists)
        (start fault-list[0])
        nvlist version: 0
                version = 0x0
                class = defect.sunos.kernel.panic
                certainty = 0x64
                asru = sw:///:path=/var/crash//.1efff73a-91c8-4112-91b3-edb2366d8167
                resource = sw:///:path=/var/crash//.1efff73a-91c8-4112-91b3-edb2366d8167
                savecore-succcess = 1
                dump-dir = /var/crash/
                dump-files = vmdump.14
                os-instance-uuid = 1efff73a-91c8-4112-91b3-edb2366d8167
                panicstr = BAD TRAP: type=e (#pf Page fault)
                    rp=fffffe00f85c9700 addr=4 occurred in module "zfs" due
                    to a NULL pointer dereference
                panicstack = unix:die+c0 () | unix:trap+999 () |
                    unix:cmntrap+e9 () | zfs:dmu_xuio_cnt+10 () |
                    zfs:zfs_retzcbuf+16 () | genunix:vhead_retzcbuf+b9 () |
                    genunix:fop_retzcbuf+71 () |
                    smbsrv:smb_vop_retzcbuf+20 () |
                    smbsrv:smb_fsop_retzcbuf+20 () |
                    smbsrv:smb_xuio_free+70 () | smbsrv:smb2_read+55c () |
                    smbsrv:smb2sr_work+38f () | smbsrv:smb2_tq_work+7a () |
                    genunix:taskq_d_thread+1ac () | unix:thread_start+b () |
                crashtime = 1712226957
                panic-time = Thu Apr 4 12:35:57 2024 CEST
        (end fault-list[0])

        fault-status = 0x1
        severity = Major
        __ttl = 0x1
        __tod = 0x660e864f 0x1ccda070

The call stack is the same on both machines.
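
For anyone who wants to compare panic stacks from several machines
mechanically rather than by eye, here is a small sketch that splits a
panicstack string from fmdump into (module, function, offset) frames and
compares two stacks by module and function only. The helper names
parse_panicstack and same_call_path are hypothetical, and the frame
format is assumed to be exactly "module:function+hexoffset ()" as in the
output above.

```python
import re

# panicstack value copied from the fmdump output above, rejoined.
PANICSTACK = (
    "unix:die+c0 () | unix:trap+999 () | unix:cmntrap+e9 () | "
    "zfs:dmu_xuio_cnt+10 () | zfs:zfs_retzcbuf+16 () | "
    "genunix:vhead_retzcbuf+b9 () | genunix:fop_retzcbuf+71 () | "
    "smbsrv:smb_vop_retzcbuf+20 () | smbsrv:smb_fsop_retzcbuf+20 () | "
    "smbsrv:smb_xuio_free+70 () | smbsrv:smb2_read+55c () | "
    "smbsrv:smb2sr_work+38f () | smbsrv:smb2_tq_work+7a () | "
    "genunix:taskq_d_thread+1ac () | unix:thread_start+b () | "
)

# One frame looks like "zfs:dmu_xuio_cnt+10 ()" (offset is hexadecimal).
FRAME_RE = re.compile(
    r"^(?P<module>[^:]+):(?P<func>[^+]+)\+(?P<offset>[0-9a-fA-F]+) \(\)$"
)


def parse_panicstack(text):
    """Return a list of (module, function, offset) tuples, innermost first."""
    frames = []
    for part in text.split("|"):
        part = part.strip()
        if not part:
            continue  # trailing "|" leaves an empty piece
        match = FRAME_RE.match(part)
        if match is None:
            raise ValueError(f"unrecognised frame: {part!r}")
        frames.append(
            (match["module"], match["func"], int(match["offset"], 16))
        )
    return frames


def same_call_path(stack_a, stack_b):
    """Compare two stacks by module and function, ignoring offsets."""
    def strip(stack):
        return [(mod, func) for mod, func, _ in parse_panicstack(stack)]
    return strip(stack_a) == strip(stack_b)


if __name__ == "__main__":
    for module, func, offset in parse_panicstack(PANICSTACK):
        print(f"{module:8s} {func}+{offset:#x}")
```

Offsets are ignored in the comparison because the two machines run
different builds (r151048o and r151048m), so the same call path could
show slightly different offsets.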
Regards
Michael
------------------------------------------
illumos: illumos-discuss
Permalink:
https://illumos.topicbox.com/groups/discuss/Tb762378066635820-M2691ee616d0a53322c91acfa