Re: [Ocfs2-devel] [PATCH] make ocfs2 do not log ENOENT error
On Fri, 24 Jan 2014 14:11:09 +0800 xiaowei...@oracle.com wrote: From: Xiaowei.Hu xiaowei...@oracle.com suppress log message like this: (open_delete,8328,0):ocfs2_unlink:951 ERROR: status = -2 Orabug:17445485 --- fs/ocfs2/namei.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/ocfs2/namei.c b/fs/ocfs2/namei.c index 0ba1cf0..ca6c5ba 100644 --- a/fs/ocfs2/namei.c +++ b/fs/ocfs2/namei.c @@ -947,7 +947,7 @@ leave: ocfs2_free_dir_lookup_result(orphan_insert); ocfs2_free_dir_lookup_result(lookup); - if (status (status != -ENOTEMPTY)) + if (status (status != -ENOTEMPTY) (status != -ENOENT)) mlog_errno(status); return status; Please send a signed-off-by: for this patch. ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com https://oss.oracle.com/mailman/listinfo/ocfs2-devel
[Ocfs2-devel] stalled ocfs2 patches
Guys, I'm developing an increasingly large backlog of ocfs2 patches which I am unprepared to send upstream. Some I feel need review which is beyond my understanding and some were reviewed, with an unclear outcome. Can we please take another look at all of these and try to get this all cleared away? I'll send all the patches now. My notes are as follows: ocfs2-fix-ocfs2_sync_file-if-filesystem-is-readonly.patch Needs acks ocfs2-free-allocated-clusters-if-error-occurs-after-ocfs2_claim_clusters.patch Needs acks ocfs2-o2net-incorrect-to-terminate-accepting-connections-loop-upon-rejecting-an-invalid-one.patch Needs acks ocfs2-fix-a-tiny-race-when-running-dirop_fileop_racer.patch Needs acks ocfs2-code-cleanup-remove-unused-functions.patch Might have been nacked? ocfs2-o2net-o2net_listen_data_ready-should-do-nothing-if-socket-state-is-not-tcp_listen.patch Needs acks ocfs2-check-existence-of-old-dentry-in-ocfs2_link.patch Needs acks ocfs2-should-call-ocfs2_journal_access_di-before-ocfs2_delete_entry-in-ocfs2_orphan_del.patch Needs more review. Sunil might have nacked this, but it was foggy. ocfs2-llseek-requires-ocfs2-inode-lock-for-the-file-in-seek_end.patch Sunil was worried about performance, Joel had a question, Mark wanted an update, Joel acked it. ocfs2-fix-issue-that-ocfs2_setattr-does-not-deal-with-new_i_size==i_size.patch To be updated in response to comment from Jeff Liu ocfs2-update-inode-size-after-zeronig-the-hole.patch Needs acks ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com https://oss.oracle.com/mailman/listinfo/ocfs2-devel
[Ocfs2-devel] [patch 01/11] ocfs2: fix ocfs2_sync_file() if filesystem is readonly
From: Younger Liu younger.li...@gmail.com Subject: ocfs2: fix ocfs2_sync_file() if filesystem is readonly If filesystem is readonly, there is no need to flush drive's caches or force any uncommitted transactions. Signed-off-by: Younger Liu younger.li...@gmail.com Cc: Joel Becker jl...@evilplan.org Cc: Mark Fasheh mfas...@suse.com Signed-off-by: Andrew Morton a...@linux-foundation.org --- fs/ocfs2/file.c |3 +++ 1 file changed, 3 insertions(+) diff -puN fs/ocfs2/file.c~ocfs2-fix-ocfs2_sync_file-if-filesystem-is-readonly fs/ocfs2/file.c --- a/fs/ocfs2/file.c~ocfs2-fix-ocfs2_sync_file-if-filesystem-is-readonly +++ a/fs/ocfs2/file.c @@ -185,6 +185,9 @@ static int ocfs2_sync_file(struct file * file-f_path.dentry-d_name.name, (unsigned long long)datasync); + if (ocfs2_is_hard_readonly(osb) || ocfs2_is_soft_readonly(osb)) + return 0; + err = filemap_write_and_wait_range(inode-i_mapping, start, end); if (err) return err; _ ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com https://oss.oracle.com/mailman/listinfo/ocfs2-devel
[Ocfs2-devel] [patch 03/11] ocfs2/o2net: incorrect to terminate accepting connections loop upon rejecting an invalid one
From: Tariq Saeed tariq.x.sa...@oracle.com Subject: ocfs2/o2net: incorrect to terminate accepting connections loop upon rejecting an invalid one When o2net-accept-one() rejects an illegal connection, it terminates the loop picking up the remaining queued connections. This fix will continue accepting connections till the queue is emtpy. Addresses Orabug 17489469. Signed-off-by: Tariq Saseed tariq.x.sa...@oracle.com Cc: Mark Fasheh mfas...@suse.com Cc: Joel Becker jl...@evilplan.org Signed-off-by: Andrew Morton a...@linux-foundation.org --- fs/ocfs2/cluster/tcp.c | 13 +++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff -puN fs/ocfs2/cluster/tcp.c~ocfs2-o2net-incorrect-to-terminate-accepting-connections-loop-upon-rejecting-an-invalid-one fs/ocfs2/cluster/tcp.c --- a/fs/ocfs2/cluster/tcp.c~ocfs2-o2net-incorrect-to-terminate-accepting-connections-loop-upon-rejecting-an-invalid-one +++ a/fs/ocfs2/cluster/tcp.c @@ -1826,7 +1826,7 @@ int o2net_register_hb_callbacks(void) /* */ -static int o2net_accept_one(struct socket *sock) +static int o2net_accept_one(struct socket *sock, int *more) { int ret, slen; struct sockaddr_in sin; @@ -1837,6 +1837,7 @@ static int o2net_accept_one(struct socke struct o2net_node *nn; BUG_ON(sock == NULL); + *more = 0; ret = sock_create_lite(sock-sk-sk_family, sock-sk-sk_type, sock-sk-sk_protocol, new_sock); if (ret) @@ -1848,6 +1849,7 @@ static int o2net_accept_one(struct socke if (ret 0) goto out; + *more = 1; new_sock-sk-sk_allocation = GFP_ATOMIC; ret = o2net_set_nodelay(new_sock); @@ -1949,8 +1951,15 @@ out: static void o2net_accept_many(struct work_struct *work) { struct socket *sock = o2net_listen_sock; - while (o2net_accept_one(sock) == 0) + int more; + int err; + + for (;;) { + err = o2net_accept_one(sock, more); + if (!more) + break; cond_resched(); + } } static void o2net_listen_data_ready(struct sock *sk, int bytes) _ ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com https://oss.oracle.com/mailman/listinfo/ocfs2-devel
[Ocfs2-devel] [patch 08/11] ocfs2: should call ocfs2_journal_access_di() before ocfs2_delete_entry() in ocfs2_orphan_del()
From: Younger Liu younger@huawei.com Subject: ocfs2: should call ocfs2_journal_access_di() before ocfs2_delete_entry() in ocfs2_orphan_del() While deleting a file into orphan dir in ocfs2_orphan_del(), it calls ocfs2_delete_entry() before ocfs2_journal_access_di(). If ocfs2_delete_entry() succeeded and ocfs2_journal_access_di() failed, there would be a inconsistency: the file is deleted from orphan dir, but orphan dir dinode is not updated. So we need to call ocfs2_journal_access_di() before ocfs2_orphan_del(). Signed-off-by: Younger Liu younger@huawei.com Reviewed-by: Jensen shencanq...@huawei.com Cc: Jie Liu jeff@oracle.com Cc: Joel Becker jl...@evilplan.org Cc: Mark Fasheh mfas...@suse.com Signed-off-by: Andrew Morton a...@linux-foundation.org --- fs/ocfs2/namei.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff -puN fs/ocfs2/namei.c~ocfs2-should-call-ocfs2_journal_access_di-before-ocfs2_delete_entry-in-ocfs2_orphan_del fs/ocfs2/namei.c --- a/fs/ocfs2/namei.c~ocfs2-should-call-ocfs2_journal_access_di-before-ocfs2_delete_entry-in-ocfs2_orphan_del +++ a/fs/ocfs2/namei.c @@ -2207,17 +2207,17 @@ int ocfs2_orphan_del(struct ocfs2_super goto leave; } - /* remove it from the orphan directory */ - status = ocfs2_delete_entry(handle, orphan_dir_inode, lookup); + status = ocfs2_journal_access_di(handle, +INODE_CACHE(orphan_dir_inode), +orphan_dir_bh, +OCFS2_JOURNAL_ACCESS_WRITE); if (status 0) { mlog_errno(status); goto leave; } - status = ocfs2_journal_access_di(handle, -INODE_CACHE(orphan_dir_inode), -orphan_dir_bh, -OCFS2_JOURNAL_ACCESS_WRITE); + /* remove it from the orphan directory */ + status = ocfs2_delete_entry(handle, orphan_dir_inode, lookup); if (status 0) { mlog_errno(status); goto leave; _ ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com https://oss.oracle.com/mailman/listinfo/ocfs2-devel
[Ocfs2-devel] [patch 04/11] ocfs2: fix a tiny race when running dirop_fileop_racer
From: Yiwen Jiang jiangyi...@huawei.com Subject: ocfs2: fix a tiny race when running dirop_fileop_racer When running dirop_fileop_racer we found a dead lock case. 2 nodes, say Node A and Node B, mount the same ocfs2 volume. Create /race/16/1 in the filesystem, and let the inode number of dir 16 is less than the inode number of dir race. Node ANode B mv /race/16/1 /race/ right after Node A has got the EX mode of /race/16/, and tries to get EX mode of /race ls /race/16/ In this case, Node A has got the EX mode of /race/16/, and wants to get EX mode of /race/. Node B has got the PR mode of /race/, and wants to get the PR mode of /race/16/. Since EX and PR are mutually exclusive, dead lock happens. This patch fixes this case by locking in ancestor order before trying inode number order. Signed-off-by: Yiwen Jiang jiangyi...@huawei.com Signed-off-by: Joseph Qi joseph...@huawei.com Cc: Joel Becker jl...@evilplan.org Cc: Mark Fasheh mfas...@suse.com Signed-off-by: Andrew Morton a...@linux-foundation.org --- fs/ocfs2/namei.c | 97 - 1 file changed, 95 insertions(+), 2 deletions(-) diff -puN fs/ocfs2/namei.c~ocfs2-fix-a-tiny-race-when-running-dirop_fileop_racer fs/ocfs2/namei.c --- a/fs/ocfs2/namei.c~ocfs2-fix-a-tiny-race-when-running-dirop_fileop_racer +++ a/fs/ocfs2/namei.c @@ -954,6 +954,65 @@ leave: return status; } +static int ocfs2_check_if_ancestor(struct ocfs2_super *osb, + u64 src_inode_no, u64 dest_inode_no) +{ + int ret = 0, i = 0; + u64 parent_inode_no = 0; + u64 child_inode_no = src_inode_no; + struct inode *child_inode; + +#define MAX_LOOKUP_TIMES 32 + while (1) { + child_inode = ocfs2_iget(osb, child_inode_no, 0, 0); + if (IS_ERR(child_inode)) { + ret = PTR_ERR(child_inode); + break; + } + + ret = ocfs2_inode_lock(child_inode, NULL, 0); + if (ret 0) { + iput(child_inode); + if (ret != -ENOENT) + mlog_errno(ret); + break; + } + + ret = ocfs2_lookup_ino_from_name(child_inode, .., 2, + parent_inode_no); + ocfs2_inode_unlock(child_inode, 0); + iput(child_inode); + if (ret 0) { + ret = -ENOENT; + break; + } + + if (parent_inode_no == dest_inode_no) { + ret = 1; + break; + } + + if (parent_inode_no == osb-root_inode-i_ino) { + ret = 0; + break; + } + + child_inode_no = parent_inode_no; + + if (++i = MAX_LOOKUP_TIMES) { + mlog(ML_NOTICE, max lookup times reached, filesystem + may have nested directories, + src inode: %llu, dest inode: %llu.\n, + (unsigned long long)src_inode_no, + (unsigned long long)dest_inode_no); + ret = 0; + break; + } + } + + return ret; +} + /* * The only place this should be used is rename! * if they have the same id, then the 1st one is the only one locked. @@ -965,6 +1024,7 @@ static int ocfs2_double_lock(struct ocfs struct inode *inode2) { int status; + int inode1_is_ancestor, inode2_is_ancestor; struct ocfs2_inode_info *oi1 = OCFS2_I(inode1); struct ocfs2_inode_info *oi2 = OCFS2_I(inode2); struct buffer_head **tmpbh; @@ -978,9 +1038,26 @@ static int ocfs2_double_lock(struct ocfs if (*bh2) *bh2 = NULL; - /* we always want to lock the one with the lower lockid first. */ + /* we always want to lock the one with the lower lockid first. +* and if they are nested, we lock ancestor first */ if (oi1-ip_blkno != oi2-ip_blkno) { - if (oi1-ip_blkno oi2-ip_blkno) { + inode1_is_ancestor = ocfs2_check_if_ancestor(osb, oi2-ip_blkno, + oi1-ip_blkno); + if (inode1_is_ancestor 0) { + status = inode1_is_ancestor; + goto bail; + } + + inode2_is_ancestor = ocfs2_check_if_ancestor(osb, oi1-ip_blkno, + oi2-ip_blkno); + if (inode2_is_ancestor 0) { + status = inode2_is_ancestor; + goto bail; +
[Ocfs2-devel] [patch 10/11] ocfs2: fix issue that ocfs2_setattr() does not deal with new_i_size==i_size
From: Younger Liu younger@huawei.com Subject: ocfs2: fix issue that ocfs2_setattr() does not deal with new_i_size==i_size The issue scenario is as following: - Create a small file and fallocate a large disk space for a file with FALLOC_FL_KEEP_SIZE option. - ftruncate the file back to the original size again. but the disk free space is not changed back. This is a real bug that be fixed in this patch. In order to solve the issue above, we modified ocfs2_setattr(), if attr-ia_size != i_size_read(inode), It calls ocfs2_truncate_file(), and truncate disk space to attr-ia_size. Signed-off-by: Younger Liu younger@huawei.com Reviewed-by: Jie Liu jeff@oracle.com Tested-by: Jie Liu jeff@oracle.com Cc: Joel Becker jl...@evilplan.org Cc: Mark Fasheh mfas...@suse.com Cc: Sunil Mushran sunil.mush...@gmail.com Reviewed-by: Jensen shencanq...@huawei.com Signed-off-by: Andrew Morton a...@linux-foundation.org --- fs/ocfs2/alloc.c |2 +- fs/ocfs2/file.c |9 ++--- 2 files changed, 3 insertions(+), 8 deletions(-) diff -puN fs/ocfs2/alloc.c~ocfs2-fix-issue-that-ocfs2_setattr-does-not-deal-with-new_i_size==i_size fs/ocfs2/alloc.c --- a/fs/ocfs2/alloc.c~ocfs2-fix-issue-that-ocfs2_setattr-does-not-deal-with-new_i_size==i_size +++ a/fs/ocfs2/alloc.c @@ -7158,7 +7158,7 @@ int ocfs2_truncate_inline(struct inode * if (end i_size_read(inode)) end = i_size_read(inode); - BUG_ON(start = end); + BUG_ON(start end); if (!(OCFS2_I(inode)-ip_dyn_features OCFS2_INLINE_DATA_FL) || !(le16_to_cpu(di-i_dyn_features) OCFS2_INLINE_DATA_FL) || diff -puN fs/ocfs2/file.c~ocfs2-fix-issue-that-ocfs2_setattr-does-not-deal-with-new_i_size==i_size fs/ocfs2/file.c --- a/fs/ocfs2/file.c~ocfs2-fix-issue-that-ocfs2_setattr-does-not-deal-with-new_i_size==i_size +++ a/fs/ocfs2/file.c @@ -477,11 +477,6 @@ static int ocfs2_truncate_file(struct in goto bail; } - /* lets handle the simple truncate cases before doing any more -* cluster locking. */ - if (new_i_size == le64_to_cpu(fe-i_size)) - goto bail; - down_write(OCFS2_I(inode)-ip_alloc_sem); ocfs2_resv_discard(osb-osb_la_resmap, @@ -1148,14 +1143,14 @@ int ocfs2_setattr(struct dentry *dentry, goto bail_unlock_rw; } - if (size_change attr-ia_size != i_size_read(inode)) { + if (size_change) { status = inode_newsize_ok(inode, attr-ia_size); if (status) goto bail_unlock; inode_dio_wait(inode); - if (i_size_read(inode) attr-ia_size) { + if (i_size_read(inode) = attr-ia_size) { if (ocfs2_should_order_data(inode)) { status = ocfs2_begin_ordered_truncate(inode, attr-ia_size); _ ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com https://oss.oracle.com/mailman/listinfo/ocfs2-devel
[Ocfs2-devel] [patch 07/11] ocfs2: check existence of old dentry in ocfs2_link()
From: Xue jiufei xuejiu...@huawei.com Subject: ocfs2: check existence of old dentry in ocfs2_link() System call linkat first calls user_path_at(), check the existence of old dentry, and then calls vfs_link()-ocfs2_link() to do the actual work. There may exist a race when Node A create a hard link for file while node B rm it. Node A Node B user_path_at() -ocfs2_lookup(), find old dentry exist rm file, add inode say inodeA to orphan_dir call ocfs2_link(),create a hard link for inodeA. rm the link, add inodeA to orphan_dir again When orphan_scan work start, it calls ocfs2_queue_orphans() to do the main work. It first tranverses entrys in orphan_dir, linking all inodes in this orphan_dir to a list look like this: inodeA-inodeB-...-inodeA When tranvering this list, it will fall into loop, calling iput() again and again. And finally trigger BUG_ON(inode-i_state I_CLEAR). Signed-off-by: joyce xuejiu...@huawei.com Cc: Joel Becker jl...@evilplan.org Cc: Mark Fasheh mfas...@suse.com Signed-off-by: Andrew Morton a...@linux-foundation.org --- fs/ocfs2/namei.c | 13 + 1 file changed, 13 insertions(+) diff -puN fs/ocfs2/namei.c~ocfs2-check-existence-of-old-dentry-in-ocfs2_link fs/ocfs2/namei.c --- a/fs/ocfs2/namei.c~ocfs2-check-existence-of-old-dentry-in-ocfs2_link +++ a/fs/ocfs2/namei.c @@ -644,6 +644,7 @@ static int ocfs2_link(struct dentry *old struct ocfs2_super *osb = OCFS2_SB(dir-i_sb); struct ocfs2_dir_lookup_result lookup = { NULL, }; sigset_t oldset; + u64 old_de_ino; trace_ocfs2_link((unsigned long long)OCFS2_I(inode)-ip_blkno, old_dentry-d_name.len, old_dentry-d_name.name, @@ -665,6 +666,18 @@ static int ocfs2_link(struct dentry *old err = -ENOENT; goto out; } + + err = ocfs2_lookup_ino_from_name(dir, old_dentry-d_name.name, + old_dentry-d_name.len, old_de_ino); + if (err) { + err = -ENOENT; + goto out; + } + + if (old_de_ino != OCFS2_I(inode)-ip_blkno) { + err = -ENOENT; + goto out; + } err = ocfs2_check_dir_for_entry(dir, dentry-d_name.name, dentry-d_name.len); _ ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com https://oss.oracle.com/mailman/listinfo/ocfs2-devel
[Ocfs2-devel] [patch 11/11] ocfs2: update inode size after zeroing the hole
From: Junxiao Bi junxiao...@oracle.com Subject: ocfs2: update inode size after zeroing the hole fs-writeback will release the dirty pages without page lock whose offset are over inode size, the release happens at block_write_full_page_endio(). If not update, dirty pages in file holes may be released before flushed to the disk, then file holes will contain some non-zero data, this will cause sparse file md5sum error. To reproduce the bug, find a big sparse file with many holes, like vm image file, its actual size should be bigger than available mem size to make writeback work more frequently, tar it with -S option, then keep untar it and check its md5sum again and again until you get a wrong md5sum. Signed-off-by: Junxiao Bi junxiao...@oracle.com Cc: Younger Liu younger@huawei.com Cc: Mark Fasheh mfas...@suse.com Cc: Joel Becker jl...@evilplan.org Signed-off-by: Andrew Morton a...@linux-foundation.org --- fs/ocfs2/file.c | 40 1 file changed, 32 insertions(+), 8 deletions(-) diff -puN fs/ocfs2/file.c~ocfs2-update-inode-size-after-zeronig-the-hole fs/ocfs2/file.c --- a/fs/ocfs2/file.c~ocfs2-update-inode-size-after-zeronig-the-hole +++ a/fs/ocfs2/file.c @@ -716,7 +716,8 @@ leave: * While a write will already be ordering the data, a truncate will not. * Thus, we need to explicitly order the zeroed pages. */ -static handle_t *ocfs2_zero_start_ordered_transaction(struct inode *inode) +static handle_t *ocfs2_zero_start_ordered_transaction(struct inode *inode, + struct buffer_head *di_bh) { struct ocfs2_super *osb = OCFS2_SB(inode-i_sb); handle_t *handle = NULL; @@ -733,7 +734,14 @@ static handle_t *ocfs2_zero_start_ordere } ret = ocfs2_jbd2_file_inode(handle, inode); - if (ret 0) + if (ret 0) { + mlog_errno(ret); + goto out; + } + + ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), di_bh, + OCFS2_JOURNAL_ACCESS_WRITE); + if (ret) mlog_errno(ret); out: @@ -749,7 +757,7 @@ out: * to be too fragile to do exactly what we need without us having to * worry about recursive locking in -write_begin() and -write_end(). */ static int ocfs2_write_zero_page(struct inode *inode, u64 abs_from, -u64 abs_to) +u64 abs_to, struct buffer_head *di_bh) { struct address_space *mapping = inode-i_mapping; struct page *page; @@ -757,6 +765,7 @@ static int ocfs2_write_zero_page(struct handle_t *handle = NULL; int ret = 0; unsigned zero_from, zero_to, block_start, block_end; + struct ocfs2_dinode *di = (struct ocfs2_dinode *)di_bh-b_data; BUG_ON(abs_from = abs_to); BUG_ON(abs_to (((u64)index + 1) PAGE_CACHE_SHIFT)); @@ -799,7 +808,8 @@ static int ocfs2_write_zero_page(struct } if (!handle) { - handle = ocfs2_zero_start_ordered_transaction(inode); + handle = ocfs2_zero_start_ordered_transaction(inode, + di_bh); if (IS_ERR(handle)) { ret = PTR_ERR(handle); handle = NULL; @@ -816,8 +826,22 @@ static int ocfs2_write_zero_page(struct ret = 0; } - if (handle) + if (handle) { + /* +* fs-writeback will release the dirty pages without page lock +* whose offset are over inode size, the release happens at +* block_write_full_page_endio(). +*/ + i_size_write(inode, abs_to); + inode-i_blocks = ocfs2_inode_sector_count(inode); + di-i_size = cpu_to_le64((u64)i_size_read(inode)); + inode-i_mtime = inode-i_ctime = CURRENT_TIME; + di-i_mtime = di-i_ctime = cpu_to_le64(inode-i_mtime.tv_sec); + di-i_ctime_nsec = cpu_to_le32(inode-i_mtime.tv_nsec); + di-i_mtime_nsec = di-i_ctime_nsec; + ocfs2_journal_dirty(handle, di_bh); ocfs2_commit_trans(OCFS2_SB(inode-i_sb), handle); + } out_unlock: unlock_page(page); @@ -913,7 +937,7 @@ out: * has made sure that the entire range needs zeroing. */ static int ocfs2_zero_extend_range(struct inode *inode, u64 range_start, - u64 range_end) + u64 range_end, struct buffer_head *di_bh) { int rc = 0; u64 next_pos; @@ -929,7 +953,7 @@ static int ocfs2_zero_extend_range(struc next_pos = (zero_pos PAGE_CACHE_MASK) + PAGE_CACHE_SIZE; if (next_pos range_end) next_pos = range_end; - rc =
[Ocfs2-devel] [patch 06/11] ocfs2/o2net: o2net_listen_data_ready should do nothing if socket state is not TCP_LISTEN
From: Tariq Saeed tariq.x.sa...@oracle.com Subject: ocfs2/o2net: o2net_listen_data_ready should do nothing if socket state is not TCP_LISTEN Orabug: 17330860 When accepting an incomming connection o2net_accept_one clones a child data socket from the parent listening socket. It then proceeds to setup the child with callback o2net_data_ready() and sk_user_data to NULL. If data arrives in this window, o2net_listen_data_ready will be called with some non-deterministic value in sk_user_data (not inherited). We panic when we page fault on sk_user_data -- in parent it is sock_def_readable(). The fix is to recognize that this is a data socket being set up by looking at the socket state and do nothing. Signed-off-by: Tariq Saseed tariq.x.sa...@oracle.com Signed-off-by: Srinivas Eeda srinivas.e...@oracle.com Cc: Joel Becker jl...@evilplan.org Cc: Mark Fasheh mfas...@suse.com Signed-off-by: Andrew Morton a...@linux-foundation.org --- fs/ocfs2/cluster/tcp.c | 22 +- 1 file changed, 17 insertions(+), 5 deletions(-) diff -puN fs/ocfs2/cluster/tcp.c~ocfs2-o2net-o2net_listen_data_ready-should-do-nothing-if-socket-state-is-not-tcp_listen fs/ocfs2/cluster/tcp.c --- a/fs/ocfs2/cluster/tcp.c~ocfs2-o2net-o2net_listen_data_ready-should-do-nothing-if-socket-state-is-not-tcp_listen +++ a/fs/ocfs2/cluster/tcp.c @@ -1973,18 +1973,30 @@ static void o2net_listen_data_ready(stru goto out; } - /* -sk_data_ready is also called for a newly established child socket -* before it has been accepted and the acceptor has set up their -* data_ready.. we only want to queue listen work for our listening -* socket */ + /* This callback may called twice when a new connection +* is being established as a child socket inherits everything +* from a parent LISTEN socket, including the data_ready cb of +* the parent. This leads to a hazard. In o2net_accept_one() +* we are still initializing the child socket but have not +* changed the inherited data_ready callback yet when +* data starts arriving. +* We avoid this hazard by checking the state. +* For the listening socket, the state will be TCP_LISTEN; for the new +* socket, will be TCP_ESTABLISHED. Also, in this case, +* sk-sk_user_data is not a valid function pointer. +*/ + if (sk-sk_state == TCP_LISTEN) { mlog(ML_TCP, bytes: %d\n, bytes); queue_work(o2net_wq, o2net_listen_work); + } else { + ready = NULL; } out: read_unlock(sk-sk_callback_lock); - ready(sk, bytes); + if (ready != NULL) + ready(sk, bytes); } static int o2net_open_listening_sock(__be32 addr, __be16 port) _ ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com https://oss.oracle.com/mailman/listinfo/ocfs2-devel
[Ocfs2-devel] [patch 09/11] ocfs2: llseek requires ocfs2 inode lock for the file in SEEK_END
From: Jensen shencanq...@huawei.com Subject: ocfs2: llseek requires ocfs2 inode lock for the file in SEEK_END llseek requires ocfs2 inode lock for updating the file size in SEEK_END. because the file size maybe update on another node. This bug can be reproduce the following scenario: at first, we dd a test fileA, the file size is 10k. on NodeA: - 1) open the test fileA, lseek the end of file. and print the position. 2) close the test fileA on NodeB: 1) open the test fileA, append the 5k data to test FileA. 2) lseek the end of file. and print the position. 3) close file. At first we run the test program1 on NodeA , the result is 10k. And then run the test program2 on NodeB, the result is 15k. At last, we run the test program1 on NodeA again, the result is 10k. After applying this patch the three step result is 15k. Signed-off-by: Jensen shencanq...@huawei.com Cc: Jie Liu jeff@oracle.com Acked-by: Joel Becker jl...@evilplan.org Cc: Mark Fasheh mfas...@suse.com Cc: Sunil Mushran sunil.mush...@gmail.com Signed-off-by: Andrew Morton a...@linux-foundation.org --- fs/ocfs2/file.c | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff -puN fs/ocfs2/file.c~ocfs2-llseek-requires-ocfs2-inode-lock-for-the-file-in-seek_end fs/ocfs2/file.c --- a/fs/ocfs2/file.c~ocfs2-llseek-requires-ocfs2-inode-lock-for-the-file-in-seek_end +++ a/fs/ocfs2/file.c @@ -2626,7 +2626,16 @@ static loff_t ocfs2_file_llseek(struct f case SEEK_SET: break; case SEEK_END: - offset += inode-i_size; + /* SEEK_END requires the OCFS2 inode lock for the file +* because it references the file's size. +*/ + ret = ocfs2_inode_lock(inode, NULL, 0); + if (ret 0) { + mlog_errno(ret); + goto out; + } + offset += i_size_read(inode); + ocfs2_inode_unlock(inode, 0); break; case SEEK_CUR: if (offset == 0) { _ ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com https://oss.oracle.com/mailman/listinfo/ocfs2-devel
Re: [Ocfs2-devel] [patch 03/11] ocfs2/o2net: incorrect to terminate accepting connections loop upon rejecting an invalid one
On Fri, Jan 24, 2014 at 12:47:02PM -0800, a...@linux-foundation.org wrote: From: Tariq Saeed tariq.x.sa...@oracle.com Subject: ocfs2/o2net: incorrect to terminate accepting connections loop upon rejecting an invalid one When o2net-accept-one() rejects an illegal connection, it terminates the loop picking up the remaining queued connections. This fix will continue accepting connections till the queue is emtpy. Addresses Orabug 17489469. Thanks for sending this, review comments below. diff -puN fs/ocfs2/cluster/tcp.c~ocfs2-o2net-incorrect-to-terminate-accepting-connections-loop-upon-rejecting-an-invalid-one fs/ocfs2/cluster/tcp.c --- a/fs/ocfs2/cluster/tcp.c~ocfs2-o2net-incorrect-to-terminate-accepting-connections-loop-upon-rejecting-an-invalid-one +++ a/fs/ocfs2/cluster/tcp.c @@ -1826,7 +1826,7 @@ int o2net_register_hb_callbacks(void) /* */ -static int o2net_accept_one(struct socket *sock) +static int o2net_accept_one(struct socket *sock, int *more) { int ret, slen; struct sockaddr_in sin; @@ -1837,6 +1837,7 @@ static int o2net_accept_one(struct socke struct o2net_node *nn; BUG_ON(sock == NULL); + *more = 0; ret = sock_create_lite(sock-sk-sk_family, sock-sk-sk_type, sock-sk-sk_protocol, new_sock); if (ret) @@ -1848,6 +1849,7 @@ static int o2net_accept_one(struct socke if (ret 0) goto out; + *more = 1; new_sock-sk-sk_allocation = GFP_ATOMIC; ret = o2net_set_nodelay(new_sock); @@ -1949,8 +1951,15 @@ out: static void o2net_accept_many(struct work_struct *work) { struct socket *sock = o2net_listen_sock; - while (o2net_accept_one(sock) == 0) + int more; + int err; + + for (;;) { + err = o2net_accept_one(sock, more); + if (!more) + break; We're throwing out 'err' here and trusting the variable 'more'. However, err could be set and more would be 0 regardless of whether there actually are more connections to be had. This makes more sense given when 'more' is set: if (err) break; /* only trust the value of 'more' when err == 0 */ if (more) break; Thanks, --Mark -- Mark Fasheh ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com https://oss.oracle.com/mailman/listinfo/ocfs2-devel
Re: [Ocfs2-devel] [patch 05/11] ocfs2: code cleanup: remove unused functions
On Fri, Jan 24, 2014 at 12:47:04PM -0800, a...@linux-foundation.org wrote: From: Goldwyn Rodrigues rgold...@suse.de Subject: ocfs2: code cleanup: remove unused functions These functions are either coded in individual files as static or not used at all. Remove them. NAK - some of this may be used by userspace (ocfs2_fs.h is shared with ocfs2-tools). The other reason for my NAK is that some of these either complete a set of functions or provide information on how to handle structures: ocfs2_gd_is_discontig or ocfs2_system_inode_is_global are good examples and it's all in ocfs2_fs.h which is where I'd personally go look to see how the fs structures are stored. --Mark -- Mark Fasheh ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com https://oss.oracle.com/mailman/listinfo/ocfs2-devel
Re: [Ocfs2-devel] [patch 01/11] ocfs2: fix ocfs2_sync_file() if filesystem is readonly
On 01/24/2014 02:46 PM, a...@linux-foundation.org wrote: From: Younger Liu younger.li...@gmail.com Subject: ocfs2: fix ocfs2_sync_file() if filesystem is readonly If filesystem is readonly, there is no need to flush drive's caches or force any uncommitted transactions. An ocfs2 filesystem can be set to read-only because of an error, in which case, you should return -EROFS. Nak. -- Goldwyn ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com https://oss.oracle.com/mailman/listinfo/ocfs2-devel
Re: [Ocfs2-devel] [patch 01/11] ocfs2: fix ocfs2_sync_file() if filesystem is readonly
On Fri, Jan 24, 2014 at 12:46:59PM -0800, a...@linux-foundation.org wrote: From: Younger Liu younger.li...@gmail.com Subject: ocfs2: fix ocfs2_sync_file() if filesystem is readonly If filesystem is readonly, there is no need to flush drive's caches or force any uncommitted transactions. Signed-off-by: Younger Liu younger.li...@gmail.com Cc: Joel Becker jl...@evilplan.org Cc: Mark Fasheh mfas...@suse.com Signed-off-by: Andrew Morton a...@linux-foundation.org Looks good, thanks for this. Signed-off-by: Mark Fasheh mfas...@suse.de --Mark -- Mark Fasheh ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com https://oss.oracle.com/mailman/listinfo/ocfs2-devel
Re: [Ocfs2-devel] [patch 03/11] ocfs2/o2net: incorrect to terminate accepting connections loop upon rejecting an invalid one
On 01/24/2014 01:55 PM, Mark Fasheh wrote: On Fri, Jan 24, 2014 at 12:47:02PM -0800, a...@linux-foundation.org wrote: From: Tariq Saeed tariq.x.sa...@oracle.com Subject: ocfs2/o2net: incorrect to terminate accepting connections loop upon rejecting an invalid one When o2net-accept-one() rejects an illegal connection, it terminates the loop picking up the remaining queued connections. This fix will continue accepting connections till the queue is emtpy. Addresses Orabug 17489469. Thanks for sending this, review comments below. diff -puN fs/ocfs2/cluster/tcp.c~ocfs2-o2net-incorrect-to-terminate-accepting-connections-loop-upon-rejecting-an-invalid-one fs/ocfs2/cluster/tcp.c --- a/fs/ocfs2/cluster/tcp.c~ocfs2-o2net-incorrect-to-terminate-accepting-connections-loop-upon-rejecting-an-invalid-one +++ a/fs/ocfs2/cluster/tcp.c @@ -1826,7 +1826,7 @@ int o2net_register_hb_callbacks(void) /* */ -static int o2net_accept_one(struct socket *sock) +static int o2net_accept_one(struct socket *sock, int *more) { int ret, slen; struct sockaddr_in sin; @@ -1837,6 +1837,7 @@ static int o2net_accept_one(struct socke struct o2net_node *nn; BUG_ON(sock == NULL); +*more = 0; ret = sock_create_lite(sock-sk-sk_family, sock-sk-sk_type, sock-sk-sk_protocol, new_sock); if (ret) @@ -1848,6 +1849,7 @@ static int o2net_accept_one(struct socke if (ret 0) goto out; +*more = 1; new_sock-sk-sk_allocation = GFP_ATOMIC; ret = o2net_set_nodelay(new_sock); @@ -1949,8 +1951,15 @@ out: static void o2net_accept_many(struct work_struct *work) { struct socket *sock = o2net_listen_sock; -while (o2net_accept_one(sock) == 0) +int more; +int err; + +for (;;) { +err = o2net_accept_one(sock, more); +if (!more) +break; We're throwing out 'err' here and trusting the variable 'more'. However, err could be set and more would be 0 regardless of whether there actually are more connections to be had. This makes more sense given when 'more' is set: Hi Mark, thanks for reviewing this :). Please correct me if I am wrong. The idea was if accept itself fails exit the loop, otherwise continue scanning all queued connections. For eg: if a connection request from unknown node followed by known node happens at the same time, we can ignore the ret value of the first connection request and still continue accepting the next connection. ret value appears to be of not much importance other than just for logging or debugging purpose. if (err) break; /* only trust the value of 'more' when err == 0 */ if (more) break; Thanks, --Mark -- Mark Fasheh ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com https://oss.oracle.com/mailman/listinfo/ocfs2-devel ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com https://oss.oracle.com/mailman/listinfo/ocfs2-devel
Re: [Ocfs2-devel] [patch 01/11] ocfs2: fix ocfs2_sync_file() if filesystem is readonly
On Fri, Jan 24, 2014 at 04:02:09PM -0600, Goldwyn Rodrigues wrote: On 01/24/2014 02:46 PM, a...@linux-foundation.org wrote: From: Younger Liu younger.li...@gmail.com Subject: ocfs2: fix ocfs2_sync_file() if filesystem is readonly If filesystem is readonly, there is no need to flush drive's caches or force any uncommitted transactions. An ocfs2 filesystem can be set to read-only because of an error, in which case, you should return -EROFS. Nak. Goldwyn's right actually - disregard my sign off for the last one. Basically the patch does this: if (we're in some readonly state) return 0; What we want, at the top of ocfs2_sync_file() is a return of -EROFS. This will satisfy Goldwyn's requirement that we bubble -EROFS up the stack but at the same time avoiding the extra work of trying to sync on a RO fs. So the new version of the patch would be: if (we're in some readonly state) return -EROFS; Thanks, --Mark -- Mark Fasheh ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com https://oss.oracle.com/mailman/listinfo/ocfs2-devel
Re: [Ocfs2-devel] [patch 03/11] ocfs2/o2net: incorrect to terminate accepting connections loop upon rejecting an invalid one
Thanks for the comments. To understand the consequences of ignoring the err, we need to look at what is going on. We get a softIRQ when a connection packet (tcp SYN). It is critical to note that we may not get a softIRQ_for every connection s_ince connection packets can arrive back-to-back (as happened in this bug). So, one softIRQ could be delivered for 1 pending accept. _This is the KEY point. _ If we terminate the loop calling o2net_accept_one() upon seeing an error, what happens to the rest of the connections in the queue. If no new connection arrives for hours, no new softIRQ will be delivered, and the connections will just sit in the queue. Thanks, -Tariq On 1/24/2014 1:55 PM, Mark Fasheh wrote: On Fri, Jan 24, 2014 at 12:47:02PM -0800, a...@linux-foundation.org wrote: From: Tariq Saeed tariq.x.sa...@oracle.com Subject: ocfs2/o2net: incorrect to terminate accepting connections loop upon rejecting an invalid one When o2net-accept-one() rejects an illegal connection, it terminates the loop picking up the remaining queued connections. This fix will continue accepting connections till the queue is emtpy. Addresses Orabug 17489469. Thanks for sending this, review comments below. diff -puN fs/ocfs2/cluster/tcp.c~ocfs2-o2net-incorrect-to-terminate-accepting-connections-loop-upon-rejecting-an-invalid-one fs/ocfs2/cluster/tcp.c --- a/fs/ocfs2/cluster/tcp.c~ocfs2-o2net-incorrect-to-terminate-accepting-connections-loop-upon-rejecting-an-invalid-one +++ a/fs/ocfs2/cluster/tcp.c @@ -1826,7 +1826,7 @@ int o2net_register_hb_callbacks(void) /* */ -static int o2net_accept_one(struct socket *sock) +static int o2net_accept_one(struct socket *sock, int *more) { int ret, slen; struct sockaddr_in sin; @@ -1837,6 +1837,7 @@ static int o2net_accept_one(struct socke struct o2net_node *nn; BUG_ON(sock == NULL); + *more = 0; ret = sock_create_lite(sock-sk-sk_family, sock-sk-sk_type, sock-sk-sk_protocol, new_sock); if (ret) @@ -1848,6 +1849,7 @@ static int o2net_accept_one(struct socke if (ret 0) goto out; + *more = 1; new_sock-sk-sk_allocation = GFP_ATOMIC; ret = o2net_set_nodelay(new_sock); @@ -1949,8 +1951,15 @@ out: static void o2net_accept_many(struct work_struct *work) { struct socket *sock = o2net_listen_sock; - while (o2net_accept_one(sock) == 0) + int more; + int err; + + for (;;) { + err = o2net_accept_one(sock, more); + if (!more) + break; We're throwing out 'err' here and trusting the variable 'more'. However, err could be set and more would be 0 regardless of whether there actually are more connections to be had. This makes more sense given when 'more' is set: if (err) break; /* only trust the value of 'more' when err == 0 */ if (more) break; Thanks, --Mark -- Mark Fasheh ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com https://oss.oracle.com/mailman/listinfo/ocfs2-devel
Re: [Ocfs2-devel] [patch 01/11] ocfs2: fix ocfs2_sync_file() if filesystem is readonly
On Fri, 24 Jan 2014 14:21:19 -0800 Mark Fasheh mfas...@suse.de wrote: On Fri, Jan 24, 2014 at 04:02:09PM -0600, Goldwyn Rodrigues wrote: On 01/24/2014 02:46 PM, a...@linux-foundation.org wrote: From: Younger Liu younger.li...@gmail.com Subject: ocfs2: fix ocfs2_sync_file() if filesystem is readonly If filesystem is readonly, there is no need to flush drive's caches or force any uncommitted transactions. An ocfs2 filesystem can be set to read-only because of an error, in which case, you should return -EROFS. Nak. Goldwyn's right actually - disregard my sign off for the last one. Basically the patch does this: if (we're in some readonly state) return 0; What we want, at the top of ocfs2_sync_file() is a return of -EROFS. This will satisfy Goldwyn's requirement that we bubble -EROFS up the stack but at the same time avoiding the extra work of trying to sync on a RO fs. So the new version of the patch would be: if (we're in some readonly state) return -EROFS; So it's this? --- a/fs/ocfs2/file.c~ocfs2-fix-ocfs2_sync_file-if-filesystem-is-readonly +++ a/fs/ocfs2/file.c @@ -185,6 +185,9 @@ static int ocfs2_sync_file(struct file * file-f_path.dentry-d_name.name, (unsigned long long)datasync); + if (ocfs2_is_hard_readonly(osb) || ocfs2_is_soft_readonly(osb)) + return -EROFS; + err = filemap_write_and_wait_range(inode-i_mapping, start, end); if (err) return err; _ ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com https://oss.oracle.com/mailman/listinfo/ocfs2-devel