Hi

After our upgrade to 1.6.7.2 (from 1.4.12) we started to get
"ldiskfs_get_inode_block: bad inode number: 1" errors. This causes the ldiskfs
filesystem to remount read-only, which makes the whole Lustre filesystem read-only.

We took the filesystems offline and ran fsck on them. Some minor errors were found
and fixed, but the error persists. Any clue?
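For reference, the offline check was along these lines (a sketch only: the
device path /dev/vg_mds/spehome is taken from the log below, the e2fsck is
assumed to be the Lustre-patched e2fsprogs version, and the guard/echo are
just for illustration):

```shell
# Sketch of the offline check. MDT_DEV is the MDT's backing device
# (/dev/vg_mds/spehome per the log below); adjust for your setup.
MDT_DEV=${MDT_DEV:-/dev/vg_mds/spehome}
RAN_FSCK=0

# Safety guard: only run if the block device actually exists.
if [ -b "$MDT_DEV" ]; then
  # Make sure Lustre is stopped and the device is unmounted first.
  umount "$MDT_DEV" 2>/dev/null || true
  # -f forces a full check even if the fs looks clean;
  # -p ("preen") fixes only what is safe to fix automatically.
  e2fsck -f -p "$MDT_DEV"
  RAN_FSCK=1
else
  echo "device $MDT_DEV not present; skipping check"
fi
```

(We used the e2fsprogs that ships with Lustre, since stock e2fsck does not
know about all ldiskfs features.)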


Regards
Martin Budsjö

Example of the error:

Nov  4 03:06:32 hlc305 kernel: LDISKFS-fs error (device dm-28): ldiskfs_get_inode_block: bad inode number: 1
Nov  4 03:06:32 hlc305 kernel: Remounting filesystem read-only
Nov  4 03:06:32 hlc305 kernel: LDISKFS-fs error (device dm-28): ldiskfs_get_inode_block: bad inode number: 1
Nov  4 03:06:32 hlc305 kernel: LustreError: 1316:0:(fsfilt-ldiskfs.c:280:fsfilt_ldiskfs_start()) error starting handle for op 4 (35 credits): rc -30
Nov  4 03:06:32 hlc305 kernel: LustreError: 1316:0:(fsfilt-ldiskfs.c:280:fsfilt_ldiskfs_start()) Skipped 52 previous similar messages
Nov  4 03:06:32 hlc305 kernel: LustreError: 1316:0:(mds_open.c:769:mds_finish_open()) mds_create_objects: rc = -30
Nov  4 03:06:32 hlc305 kernel: LustreError: 1316:0:(mds_open.c:769:mds_finish_open()) Skipped 1 previous similar message
Nov  4 03:06:32 hlc305 kernel: LustreError: 1316:0:(mds_reint.c:154:mds_finish_transno()) fsfilt_start: -30
Nov  4 03:06:32 hlc305 kernel: LustreError: 1316:0:(mds_reint.c:154:mds_finish_transno()) Skipped 52 previous similar messages
Nov  4 03:06:32 hlc305 kernel: LDISKFS-fs error (device dm-28): ldiskfs_get_inode_block: bad inode number: 1
Nov  4 03:06:32 hlc305 kernel: LDISKFS-fs error (device dm-28): ldiskfs_get_inode_block: bad inode number: 1
Nov  4 03:06:32 hlc305 kernel: LustreError: 1189:0:(mds_open.c:769:mds_finish_open()) mds_create_objects: rc = -30
Nov  4 03:07:14 hlc305 kernel: Lustre: Failing over spehome-MDT0000
Nov  4 03:07:14 hlc305 kernel: Lustre: Skipped 12 previous similar messages
Nov  4 03:07:14 hlc305 kernel: Lustre: *** setting obd spehome-MDT0000 device 'dm-28' read-only ***
Nov  4 03:07:14 hlc305 kernel: Turning device dm-28 (0xfd0001c) read-only
Nov  4 03:07:15 hlc305 kernel: LustreError: 1243:0:(handler.c:1601:mds_handle()) operation 101 on unconnected MDS from 12345-10.0.0...@tcp
Nov  4 03:07:15 hlc305 kernel: LustreError: 1243:0:(handler.c:1601:mds_handle()) Skipped 74 previous similar messages
Nov  4 03:07:15 hlc305 kernel: LustreError: 1243:0:(ldlm_lib.c:1643:target_send_reply_msg()) @@@ processing error (-107) r...@00000100cabd0800 x30168375/t0 o101-><?>@<?>: 0/0 lens 440/0 e 0 to 0 dl 1257300535 ref 1 fl Interpret:/0/0 rc -107/0
Nov  4 03:07:15 hlc305 kernel: LustreError: 1243:0:(ldlm_lib.c:1643:target_send_reply_msg()) Skipped 74 previous similar messages
Nov  4 03:07:15 hlc305 kernel: LustreError: 137-5: UUID 'mds5_UUID' is not available for connect (stopping)
Nov  4 03:07:15 hlc305 kernel: LustreError: 17131:0:(llog_obd.c:380:llog_obd_origin_cleanup()) failure destroying log during cleanup: -30
Nov  4 03:07:15 hlc305 kernel: LustreError: 17131:0:(llog_obd.c:380:llog_obd_origin_cleanup()) Skipped 6 previous similar messages
Nov  4 03:07:15 hlc305 kernel: LustreError: 17131:0:(fsfilt-ldiskfs.c:1236:fsfilt_ldiskfs_write_record()) can't start transaction for 34 blocks (8192 bytes)
Nov  4 03:07:15 hlc305 kernel: LustreError: 17131:0:(fsfilt-ldiskfs.c:1236:fsfilt_ldiskfs_write_record()) Skipped 6 previous similar messages
Nov  4 03:07:15 hlc305 kernel: LustreError: 17131:0:(llog_lvfs.c:116:llog_lvfs_write_blob()) error writing log record: rc -30
Nov  4 03:07:15 hlc305 kernel: LustreError: 17131:0:(llog_lvfs.c:116:llog_lvfs_write_blob()) Skipped 6 previous similar messages
Nov  4 03:07:15 hlc305 kernel: LustreError: 17131:0:(llog.c:135:llog_cancel_rec()) Failure re-writing header -30
Nov  4 03:07:15 hlc305 kernel: LustreError: 17131:0:(llog.c:135:llog_cancel_rec()) Skipped 6 previous similar messages
Nov  4 03:07:15 hlc305 kernel: LustreError: 17131:0:(handler.c:1963:mds_update_server_data()) error writing MDS server data: rc = -30
Nov  4 03:07:15 hlc305 kernel: Lustre: spehome-MDT0000: shutting down for failover; client state will be preserved.
Nov  4 03:07:15 hlc305 kernel: Lustre: MDT spehome-MDT0000 has stopped.
Nov  4 03:07:15 hlc305 kernel: VFS: Busy inodes after unmount. Self-destruct in 5 seconds.  Have a nice day...
Nov  4 03:07:15 hlc305 kernel: Removing read-only on unknown block (0xfd0001c)
Nov  4 03:07:15 hlc305 kernel: Lustre: server umount spehome-MDT0000 complete
Nov  4 03:07:16 hlc305 kernel: kjournald starting.  Commit interval 5 seconds
Nov  4 03:07:16 hlc305 kernel: LDISKFS-fs warning: mounting fs with errors, running e2fsck is recommended
Nov  4 03:07:16 hlc305 kernel: LDISKFS FS on dm-28, internal journal
Nov  4 03:07:16 hlc305 kernel: LDISKFS-fs: recovery complete.
Nov  4 03:07:16 hlc305 kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
Nov  4 03:07:16 hlc305 kernel: kjournald starting.  Commit interval 5 seconds
Nov  4 03:07:16 hlc305 kernel: LDISKFS-fs warning: mounting fs with errors, running e2fsck is recommended
Nov  4 03:07:16 hlc305 kernel: LDISKFS FS on dm-28, internal journal
Nov  4 03:07:16 hlc305 kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
Nov  4 03:07:16 hlc305 kernel: Lustre: Enabling user_xattr
Nov  4 03:07:16 hlc305 kernel: Lustre: 17137:0:(mds_fs.c:511:mds_init_server_data()) RECOVERY: service spehome-MDT0000, 166 recoverable clients, last_transno 7646800754
Nov  4 03:07:16 hlc305 kernel: Lustre: 636:0:(mds_lov.c:1075:mds_notify()) MDS spehome-MDT0000: in recovery, not resetting orphans on ost1spehome_UUID
Nov  4 03:07:16 hlc305 kernel: Lustre: MDT spehome-MDT0000 now serving mds5_UUID (spehome-MDT0000/2af526ca-85d4-996b-4773-4898812d6a31), but will be in recovery for at least 5:00, or until 166 clients reconnect. During this time new clients will not be allowed to connect. Recovery progress can be monitored by watching /proc/fs/lustre/mds/spehome-MDT0000/recovery_status.
Nov  4 03:07:16 hlc305 kernel: Lustre: Server spehome-MDT0000 on device /dev/vg_mds/spehome has started
Nov  4 03:07:16 hlc305 kernel: Lustre: spehome-MDT0000: temporarily refusing client connection from 10.0.0...@tcp
Nov  4 03:07:16 hlc305 kernel: Lustre: 1247:0:(ldlm_lib.c:1240:check_and_start_recovery_timer()) spehome-MDT0000: starting recovery timer
Nov  4 03:07:16 hlc305 kernel: Lustre: 1249:0:(ldlm_lib.c:1591:target_queue_last_replay_reply()) spehome-MDT0000: 165 recoverable clients remain
Nov  4 03:07:16 hlc305 kernel: Lustre: 1249:0:(ldlm_lib.c:1591:target_queue_last_replay_reply()) Skipped 7 previous similar messages
Nov  4 03:07:16 hlc305 kernel: Lustre: 1284:0:(mds_open.c:841:mds_open_by_fid()) Orphan d5839c:772a0a47 found and opened in PENDING directory
Nov  4 03:07:16 hlc305 kernel: Lustre: 1284:0:(mds_open.c:841:mds_open_by_fid()) Skipped 1 previous similar message


_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss