Re: [Ocfs2-devel] ocfs2-tools repository and documentation
Hi Goldwyn,

Thanks for your answer. I will have a look at your code.

I just want to be sure that the source code I am looking at is the latest, to avoid fixing things that may already have been fixed elsewhere. The same is true for the documentation and package repositories. If I could have links to the latest documentation and repository, no matter how old or unmaintained, that would be helpful as well.

Do you know of any other distro, besides SUSE, maintaining a stack of patches?

Cheers,
Germano

On 14/12/14 07:24, Goldwyn Rodrigues wrote:
> Yes, if you don't count the last 2 patches, it is closer to 3 years now ;)
>
> Since the patches were not being updated, I started maintaining an
> alternate repository where I am collecting fixes for the bugs reported
> at SUSE:
> https://github.com/goldwynr/ocfs2-tools
>
> Branch suse-fixes has the fixes found by SUSE over the upstream branch.
> Branch nocontrold has the patches for the feature of doing away with
> ocfs2_controld to work with the latest corosync/pacemaker stack. The
> kernel patches are already in the kernel, but the ones in the tools
> need some review.
>
> I had a mail conversation with Srini and he has promised to update the
> upstream branch soon.
>
> Regards,

_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel
[Ocfs2-devel] ocfs patches for review
Another round of jolly patch reviewing, please. A number of these patches have been stalled for quite a long time.

I have the following notes:

o2dlm-fix-null-pointer-dereference-in-o2dlm_blocking_ast_wrapper.patch:
  - Joseph Qi had issues

ocfs2-free-inode-when-i_count-becomes-zero.patch:
  - Mark is finding a better way of doing this

ocfs2-call-ocfs2_journal_access_di-before-ocfs2_journal_dirty-in-ocfs2_write_end_nolock.patch:
  - Mark requested changes
[Ocfs2-devel] [patch 06/15] ocfs2: reflink: fix slow unlink for refcounted file
From: Junxiao Bi <junxiao...@oracle.com>
Subject: ocfs2: reflink: fix slow unlink for refcounted file

When running the multiple-node reflink stress test from the ocfs2 test suite on a 4-node cluster, every unlink() of a refcounted file takes about 700s.

The slow unlink is caused by contention on the refcount tree lock, since all nodes are unlinking files that use the same refcount tree. When the file being unlinked has many extents (over 1600 in our test), most of the extents have the refcounted flag set. ocfs2_commit_truncate() executes the following call trace for every extent. This means it needs to get and release the refcount tree lock about 1600 times. When several nodes do this at the same time, performance is very low.

ocfs2_remove_btree_range()
-- ocfs2_lock_refcount_tree()
---- ocfs2_refcount_lock()
------ __ocfs2_cluster_lock()

ocfs2_refcount_lock() is costly. Moving it to ocfs2_commit_truncate(), so that the lock/unlock is done only once, improves performance a lot.

Signed-off-by: Junxiao Bi <junxiao...@oracle.com>
Cc: Wengang <wen.gang.w...@oracle.com>
Cc: Mark Fasheh <mfas...@suse.com>
Cc: Joel Becker <jl...@evilplan.org>
Signed-off-by: Andrew Morton <a...@linux-foundation.org>
---

 fs/ocfs2/alloc.c |   28 +---
 fs/ocfs2/alloc.h |    2 +-
 fs/ocfs2/dir.c   |    2 +-
 fs/ocfs2/file.c  |    2 +-
 4 files changed, 24 insertions(+), 10 deletions(-)

diff -puN fs/ocfs2/alloc.c~ocfs2-reflink-fix-slow-unlink-for-refcounted-file fs/ocfs2/alloc.c
--- a/fs/ocfs2/alloc.c~ocfs2-reflink-fix-slow-unlink-for-refcounted-file
+++ a/fs/ocfs2/alloc.c
@@ -5662,7 +5662,7 @@ int ocfs2_remove_btree_range(struct inod
 			     struct ocfs2_extent_tree *et,
 			     u32 cpos, u32 phys_cpos, u32 len, int flags,
 			     struct ocfs2_cached_dealloc_ctxt *dealloc,
-			     u64 refcount_loc)
+			     u64 refcount_loc, bool refcount_tree_locked)
 {
 	int ret, credits = 0, extra_blocks = 0;
 	u64 phys_blkno = ocfs2_clusters_to_blocks(inode->i_sb, phys_cpos);
@@ -5676,11 +5676,13 @@ int ocfs2_remove_btree_range(struct inod
 		BUG_ON(!(OCFS2_I(inode)->ip_dyn_features &
 			 OCFS2_HAS_REFCOUNT_FL));
 
-		ret = ocfs2_lock_refcount_tree(osb, refcount_loc, 1,
-					       &ref_tree, NULL);
-		if (ret) {
-			mlog_errno(ret);
-			goto bail;
+		if (!refcount_tree_locked) {
+			ret = ocfs2_lock_refcount_tree(osb, refcount_loc, 1,
+						       &ref_tree, NULL);
+			if (ret) {
+				mlog_errno(ret);
+				goto bail;
+			}
 		}
 
 		ret = ocfs2_prepare_refcount_change_for_del(inode,
@@ -7021,6 +7023,7 @@ int ocfs2_commit_truncate(struct ocfs2_s
 	u64 refcount_loc = le64_to_cpu(di->i_refcount_loc);
 	struct ocfs2_extent_tree et;
 	struct ocfs2_cached_dealloc_ctxt dealloc;
+	struct ocfs2_refcount_tree *ref_tree = NULL;
 
 	ocfs2_init_dinode_extent_tree(&et, INODE_CACHE(inode), di_bh);
 	ocfs2_init_dealloc_ctxt(&dealloc);
@@ -7130,9 +7133,18 @@ start:
 
 	phys_cpos = ocfs2_blocks_to_clusters(inode->i_sb, blkno);
 
+	if ((flags & OCFS2_EXT_REFCOUNTED) && trunc_len && !ref_tree) {
+		status = ocfs2_lock_refcount_tree(osb, refcount_loc, 1,
+				&ref_tree, NULL);
+		if (status) {
+			mlog_errno(status);
+			goto bail;
+		}
+	}
+
 	status = ocfs2_remove_btree_range(inode, &et, trunc_cpos,
 					  phys_cpos, trunc_len, flags, &dealloc,
-					  refcount_loc);
+					  refcount_loc, true);
 	if (status < 0) {
 		mlog_errno(status);
 		goto bail;
@@ -7147,6 +7159,8 @@ start:
 	goto start;
 
 bail:
+	if (ref_tree)
+		ocfs2_unlock_refcount_tree(osb, ref_tree, 1);
 
 	ocfs2_schedule_truncate_log_flush(osb, 1);
 
diff -puN fs/ocfs2/alloc.h~ocfs2-reflink-fix-slow-unlink-for-refcounted-file fs/ocfs2/alloc.h
--- a/fs/ocfs2/alloc.h~ocfs2-reflink-fix-slow-unlink-for-refcounted-file
+++ a/fs/ocfs2/alloc.h
@@ -142,7 +142,7 @@ int ocfs2_remove_btree_range(struct inod
 			     struct ocfs2_extent_tree *et,
 			     u32 cpos, u32 phys_cpos, u32 len, int flags,
 			     struct ocfs2_cached_dealloc_ctxt *dealloc,
-			     u64 refcount_loc);
+			     u64 refcount_loc, bool refcount_tree_locked);
 
 int ocfs2_num_free_extents(struct ocfs2_super *osb,
 			   struct ocfs2_extent_tree *et);
diff -puN
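The effect of hoisting the refcount tree lock out of the per-extent path can be illustrated with a small user-space model. This is only a sketch: struct tree_lock and both truncate_* helpers are hypothetical stand-ins, not ocfs2 code, and the model counts lock acquisitions instead of taking a real cluster lock.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the refcount tree cluster lock. */
struct tree_lock {
	int held;                   /* 1 while the lock is held */
	unsigned long acquisitions; /* how many times it was taken */
};

static void tree_lock_take(struct tree_lock *l)
{
	assert(!l->held); /* the model never takes the lock recursively */
	l->held = 1;
	l->acquisitions++;
}

static void tree_lock_drop(struct tree_lock *l)
{
	assert(l->held);
	l->held = 0;
}

/* Old scheme: every refcounted extent takes and drops the lock. */
unsigned long truncate_lock_per_extent(struct tree_lock *l,
				       const int *refcounted, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (refcounted[i]) {
			tree_lock_take(l);
			/* ... remove the extent ... */
			tree_lock_drop(l);
		}
	}
	return l->acquisitions;
}

/* New scheme: take the lock at the first refcounted extent, drop it once. */
unsigned long truncate_lock_once(struct tree_lock *l,
				 const int *refcounted, size_t n)
{
	int locked = 0;

	for (size_t i = 0; i < n; i++) {
		if (refcounted[i] && !locked) {
			tree_lock_take(l);
			locked = 1;
		}
		/* ... remove the extent, telling the callee the lock is held ... */
	}
	if (locked)
		tree_lock_drop(l);
	return l->acquisitions;
}
```

With 1600 refcounted extents, the first helper reports 1600 acquisitions and the second reports one, which is the reduction the commit message describes.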
[Ocfs2-devel] [patch 01/15] o2dlm: fix NULL pointer dereference in o2dlm_blocking_ast_wrapper
From: Srinivas Eeda <srinivas.e...@oracle.com>
Subject: o2dlm: fix NULL pointer dereference in o2dlm_blocking_ast_wrapper

A tiny race between a BAST and an unlock message causes the NULL dereference.

A node sends an unlock request to the master and receives a response. Before processing the response it receives a BAST from the master. Since the two messages are processed by different threads, this creates a race. While the BAST is being processed, the lock can get freed by the unlock code.

This patch makes the BAST return immediately if the lock is found but an unlock is pending. The code should handle this race. We also have to fix the master node to skip sending a BAST after receiving an unlock message.

Below is the crash stack:

BUG: unable to handle kernel NULL pointer dereference at 0048
IP: [a015e023] o2dlm_blocking_ast_wrapper+0xd/0x16
[a034e3db] dlm_do_local_bast+0x8e/0x97 [ocfs2_dlm]
[a034f366] dlm_proxy_ast_handler+0x838/0x87e [ocfs2_dlm]
[a0308abe] o2net_process_message+0x395/0x5b8 [ocfs2_nodemanager]
[a030aac8] o2net_rx_until_empty+0x762/0x90d [ocfs2_nodemanager]
[81071802] worker_thread+0x14d/0x1ed

[a...@linux-foundation.org: coding-style fixes]
Signed-off-by: Srinivas Eeda <srinivas.e...@oracle.com>
Cc: Mark Fasheh <mfas...@suse.com>
Cc: Joel Becker <jl...@evilplan.org>
Signed-off-by: Andrew Morton <a...@linux-foundation.org>
---

 fs/ocfs2/dlm/dlmast.c |    6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff -puN fs/ocfs2/dlm/dlmast.c~o2dlm-fix-null-pointer-dereference-in-o2dlm_blocking_ast_wrapper fs/ocfs2/dlm/dlmast.c
--- a/fs/ocfs2/dlm/dlmast.c~o2dlm-fix-null-pointer-dereference-in-o2dlm_blocking_ast_wrapper
+++ a/fs/ocfs2/dlm/dlmast.c
@@ -385,8 +385,12 @@ int dlm_proxy_ast_handler(struct o2net_m
 	head = &res->granted;
 
 	list_for_each_entry(lock, head, list) {
-		if (lock->ml.cookie == cookie)
+		/* if lock is found but unlock is pending ignore the bast */
+		if (lock->ml.cookie == cookie) {
+			if (lock->unlock_pending)
+				break;
 			goto do_ast;
+		}
 	}
 
 	mlog(0, "Got %sast for unknown lock! cookie=%u:%llu, name=%.*s, "
_
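The lookup rule introduced by the hunk above (a matching lock with an unlock pending is treated as if no lock were found) can be sketched in plain C. The struct demo_lock type, its singly linked list, and find_bast_target() are hypothetical simplifications; the kernel code walks a list_head and jumps to do_ast instead of returning a pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified lock entry on a granted queue. */
struct demo_lock {
	uint64_t cookie;
	int unlock_pending;     /* set once an unlock request is in flight */
	struct demo_lock *next;
};

/*
 * Return the lock that should receive the BAST, or NULL if the BAST must
 * be ignored (unknown cookie, or the lock is about to be freed by unlock).
 */
struct demo_lock *find_bast_target(struct demo_lock *granted, uint64_t cookie)
{
	for (struct demo_lock *l = granted; l != NULL; l = l->next) {
		if (l->cookie != cookie)
			continue;
		if (l->unlock_pending)
			return NULL; /* found, but unlock pending: ignore */
		return l;
	}
	return NULL;
}

/* Small usage example: three locks, the middle one is being unlocked. */
void find_bast_target_demo(void)
{
	struct demo_lock c = { 3, 0, NULL };
	struct demo_lock b = { 2, 1, &c };
	struct demo_lock a = { 1, 0, &b };

	assert(find_bast_target(&a, 1) == &a);
	assert(find_bast_target(&a, 2) == NULL);
}
```

Returning early for the pending case keeps the caller from touching a lock that the unlock path may free at any moment.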
[Ocfs2-devel] [patch 07/15] ocfs2: fix journal commit deadlock
From: Junxiao Bi <junxiao...@oracle.com>
Subject: ocfs2: fix journal commit deadlock

For buffered writes, the page lock is taken in write_begin and released in write_end. In ocfs2_write_end_nolock(), before the page is unlocked in ocfs2_free_write_ctxt(), ocfs2_run_deallocs() is called, and this asks for the read lock of journal->j_trans_barrier. Holding the page lock while asking for journal->j_trans_barrier breaks the locking order.

This causes a deadlock with the journal commit threads: ocfs2cmt gets the write lock of journal->j_trans_barrier first, then wakes up kjournald2 to do the commit work, and finally waits until it is done. To commit the journal, kjournald2 needs to flush data first, and for that it needs the page lock of the cached page.

Since some ocfs2 cluster locks are held by the write process, this deadlock may hang the whole cluster.

Unlocking the pages before ocfs2_run_deallocs() fixes the locking order. The unlock is also placed before ocfs2_commit_trans() so that the page lock is released before j_trans_barrier, which preserves the unlocking order.

Signed-off-by: Junxiao Bi <junxiao...@oracle.com>
Reviewed-by: Wengang Wang <wen.gang.w...@oracle.com>
Cc: <sta...@vger.kernel.org>
Cc: Mark Fasheh <mfas...@suse.com>
Cc: Joel Becker <jl...@evilplan.org>
Signed-off-by: Andrew Morton <a...@linux-foundation.org>
---

 fs/ocfs2/aops.c |   16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff -puN fs/ocfs2/aops.c~ocfs2-fix-journal-commit-deadlock fs/ocfs2/aops.c
--- a/fs/ocfs2/aops.c~ocfs2-fix-journal-commit-deadlock
+++ a/fs/ocfs2/aops.c
@@ -894,7 +894,7 @@ void ocfs2_unlock_and_free_pages(struct
 	}
 }
 
-static void ocfs2_free_write_ctxt(struct ocfs2_write_ctxt *wc)
+static void ocfs2_unlock_pages(struct ocfs2_write_ctxt *wc)
 {
 	int i;
 
@@ -915,7 +915,11 @@ static void ocfs2_free_write_ctxt(struct
 		page_cache_release(wc->w_target_page);
 	}
 	ocfs2_unlock_and_free_pages(wc->w_pages, wc->w_num_pages);
+}
 
+static void ocfs2_free_write_ctxt(struct ocfs2_write_ctxt *wc)
+{
+	ocfs2_unlock_pages(wc);
 	brelse(wc->w_di_bh);
 	kfree(wc);
 }
@@ -2041,11 +2045,19 @@ out_write_size:
 	ocfs2_journal_dirty(handle, wc->w_di_bh);
 
 out:
+	/* unlock pages before dealloc since it needs acquiring j_trans_barrier
+	 * lock, or it will cause a deadlock since journal commit threads holds
+	 * this lock and will ask for the page lock when flushing the data.
+	 * put it here to preserve the unlock order.
+	 */
+	ocfs2_unlock_pages(wc);
+
 	ocfs2_commit_trans(osb, handle);
 
 	ocfs2_run_deallocs(osb, &wc->w_dealloc);
 
-	ocfs2_free_write_ctxt(wc);
+	brelse(wc->w_di_bh);
+	kfree(wc);
 
 	return copied;
 }
_
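The ordering argument in this patch can be checked with a tiny lock-order tracker. This is a sketch under stated assumptions: the two rank bits, struct order_tracker, and both write_end_* functions are hypothetical, and the tracker only records out-of-order acquisitions rather than blocking. The rank order (j_trans_barrier before page lock) is the one implied by the commit thread described above.

```c
#include <assert.h>

/* Lower bit = must be acquired first. Hypothetical ranks for the model. */
#define RANK_TRANS_BARRIER (1u << 0)
#define RANK_PAGE_LOCK     (1u << 1)

struct order_tracker {
	unsigned held;  /* bitmask of currently held locks */
	int violations; /* acquisitions made while a later-ranked lock was held */
};

static void ot_acquire(struct order_tracker *t, unsigned rank_bit)
{
	/* Any held lock with a higher rank than rank_bit is an inversion. */
	if (t->held & ~((rank_bit << 1) - 1))
		t->violations++;
	t->held |= rank_bit;
}

static void ot_release(struct order_tracker *t, unsigned rank_bit)
{
	assert(t->held & rank_bit);
	t->held &= ~rank_bit;
}

/* Before the patch: deallocs run while the page lock is still held. */
int write_end_old(void)
{
	struct order_tracker t = { 0, 0 };

	ot_acquire(&t, RANK_PAGE_LOCK);     /* taken in write_begin */
	ot_acquire(&t, RANK_TRANS_BARRIER); /* ocfs2_run_deallocs() */
	ot_release(&t, RANK_TRANS_BARRIER);
	ot_release(&t, RANK_PAGE_LOCK);     /* ocfs2_free_write_ctxt() */
	return t.violations;
}

/* After the patch: pages are unlocked before the barrier is requested. */
int write_end_new(void)
{
	struct order_tracker t = { 0, 0 };

	ot_acquire(&t, RANK_PAGE_LOCK);     /* taken in write_begin */
	ot_release(&t, RANK_PAGE_LOCK);     /* ocfs2_unlock_pages() */
	ot_acquire(&t, RANK_TRANS_BARRIER); /* ocfs2_run_deallocs() */
	ot_release(&t, RANK_TRANS_BARRIER);
	return t.violations;
}
```

In this model the old sequence records one inversion and the new sequence records none.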
[Ocfs2-devel] [patch 13/15] ocfs2: do not fallback to buffer I/O write if appending
From: Weiwei Wang <wangww...@huawei.com>
Subject: ocfs2: do not fallback to buffer I/O write if appending

Now we can do direct I/O and no longer fall back to buffered I/O in the case of an append O_DIRECT write.

Signed-off-by: Weiwei Wang <wangww...@huawei.com>
Signed-off-by: Joseph Qi <joseph...@huawei.com>
Cc: Joel Becker <jl...@evilplan.org>
Cc: Mark Fasheh <mfas...@suse.com>
Signed-off-by: Andrew Morton <a...@linux-foundation.org>
---

 fs/ocfs2/file.c |    5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff -puN fs/ocfs2/file.c~ocfs2-do-not-fallback-to-buffer-i-o-write-if-appending fs/ocfs2/file.c
--- a/fs/ocfs2/file.c~ocfs2-do-not-fallback-to-buffer-i-o-write-if-appending
+++ a/fs/ocfs2/file.c
@@ -2116,6 +2116,9 @@ static int ocfs2_prepare_inode_for_write
 	struct dentry *dentry = file->f_path.dentry;
 	struct inode *inode = dentry->d_inode;
 	loff_t saved_pos = 0, end;
+	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
+	int full_coherency = !(osb->s_mount_opt &
+		OCFS2_MOUNT_COHERENCY_BUFFERED);
 
 	/*
 	 * We start with a read level meta lock and only jump to an ex
@@ -2204,7 +2207,7 @@ static int ocfs2_prepare_inode_for_write
 		 * one node could wind up truncating another
 		 * nodes writes.
 		 */
-		if (end > i_size_read(inode)) {
+		if (end > i_size_read(inode) && !full_coherency) {
 			*direct_io = 0;
 			break;
 		}
_
[Ocfs2-devel] [patch 10/15] ocfs2: add orphan recovery types in ocfs2_recover_orphans
From: Joseph Qi <joseph...@huawei.com>
Subject: ocfs2: add orphan recovery types in ocfs2_recover_orphans

Define two orphan recovery types, which indicate whether the file needs to be truncated or not.

Originally, only deleted inodes were added to the orphan dir. We now also use the orphan dir to temporarily store the file during an append O_DIRECT write, to ensure that block allocation and the inode size update remain consistent, as if done in the same handle, if the append O_DIRECT write fails. So now the orphan dir may contain files that are not truly deleted.

Signed-off-by: Weiwei Wang <wangww...@huawei.com>
Cc: Joel Becker <jl...@evilplan.org>
Cc: Mark Fasheh <mfas...@suse.com>
Signed-off-by: Andrew Morton <a...@linux-foundation.org>
---

 fs/ocfs2/journal.c |  113 +++
 fs/ocfs2/ocfs2.h   |   15 +
 2 files changed, 108 insertions(+), 20 deletions(-)

diff -puN fs/ocfs2/journal.c~ocfs2-add-orphan-recovery-types-in-ocfs2_recover_orphans fs/ocfs2/journal.c
--- a/fs/ocfs2/journal.c~ocfs2-add-orphan-recovery-types-in-ocfs2_recover_orphans
+++ a/fs/ocfs2/journal.c
@@ -50,6 +50,8 @@
 #include "sysfile.h"
 #include "uptodate.h"
 #include "quota.h"
+#include "file.h"
+#include "namei.h"
 
 #include "buffer_head_io.h"
 #include "ocfs2_trace.h"
@@ -69,13 +71,15 @@ static int ocfs2_journal_toggle_dirty(st
 static int ocfs2_trylock_journal(struct ocfs2_super *osb,
 				 int slot_num);
 static int ocfs2_recover_orphans(struct ocfs2_super *osb,
-				 int slot);
+				 int slot,
+				 enum ocfs2_orphan_reco_type orphan_reco_type);
 static int ocfs2_commit_thread(void *arg);
 static void ocfs2_queue_recovery_completion(struct ocfs2_journal *journal,
 					    int slot_num,
 					    struct ocfs2_dinode *la_dinode,
 					    struct ocfs2_dinode *tl_dinode,
-					    struct ocfs2_quota_recovery *qrec);
+					    struct ocfs2_quota_recovery *qrec,
+					    enum ocfs2_orphan_reco_type orphan_reco_type);
 
 static inline int ocfs2_wait_on_mount(struct ocfs2_super *osb)
 {
@@ -149,7 +153,8 @@ int ocfs2_compute_replay_slots(struct oc
 	return 0;
 }
 
-void ocfs2_queue_replay_slots(struct ocfs2_super *osb)
+void ocfs2_queue_replay_slots(struct ocfs2_super *osb,
+		enum ocfs2_orphan_reco_type orphan_reco_type)
 {
 	struct ocfs2_replay_map *replay_map = osb->replay_map;
 	int i;
@@ -163,7 +168,8 @@ void ocfs2_queue_replay_slots(struct ocf
 	for (i = 0; i < replay_map->rm_slots; i++)
 		if (replay_map->rm_replay_slots[i])
 			ocfs2_queue_recovery_completion(osb->journal, i, NULL,
-							NULL, NULL);
+							NULL, NULL,
+							orphan_reco_type);
 	replay_map->rm_state = REPLAY_DONE;
 }
 
@@ -1174,6 +1180,7 @@ struct ocfs2_la_recovery_item {
 	struct ocfs2_dinode	*lri_la_dinode;
 	struct ocfs2_dinode	*lri_tl_dinode;
 	struct ocfs2_quota_recovery *lri_qrec;
+	enum ocfs2_orphan_reco_type  lri_orphan_reco_type;
 };
 
 /* Does the second half of the recovery process. By this point, the
@@ -1195,6 +1202,7 @@ void ocfs2_complete_recovery(struct work
 	struct ocfs2_dinode *la_dinode, *tl_dinode;
 	struct ocfs2_la_recovery_item *item, *n;
 	struct ocfs2_quota_recovery *qrec;
+	enum ocfs2_orphan_reco_type orphan_reco_type;
 	LIST_HEAD(tmp_la_list);
 
 	trace_ocfs2_complete_recovery(
@@ -1212,6 +1220,7 @@ void ocfs2_complete_recovery(struct work
 		la_dinode = item->lri_la_dinode;
 		tl_dinode = item->lri_tl_dinode;
 		qrec = item->lri_qrec;
+		orphan_reco_type = item->lri_orphan_reco_type;
 
 		trace_ocfs2_complete_recovery_slot(item->lri_slot,
 			la_dinode ? le64_to_cpu(la_dinode->i_blkno) : 0,
@@ -1236,7 +1245,8 @@ void ocfs2_complete_recovery(struct work
 			kfree(tl_dinode);
 		}
 
-		ret = ocfs2_recover_orphans(osb, item->lri_slot);
+		ret = ocfs2_recover_orphans(osb, item->lri_slot,
+				orphan_reco_type);
 		if (ret < 0)
 			mlog_errno(ret);
 
@@ -1261,7 +1271,8 @@ static void ocfs2_queue_recovery_complet
 					    int slot_num,
 					    struct ocfs2_dinode *la_dinode,
 					    struct ocfs2_dinode *tl_dinode,
-					    struct ocfs2_quota_recovery *qrec)
+					    struct ocfs2_quota_recovery *qrec,
+					    enum
[Ocfs2-devel] [patch 15/15] ocfs2: fix leftover orphan entry caused by append O_DIRECT write crash
From: Joseph Qi <joseph...@huawei.com>
Subject: ocfs2: fix leftover orphan entry caused by append O_DIRECT write crash

If one node has crashed with an orphan entry left over, another node which does an append O_DIRECT write to the same file will override the i_dio_orphaned_slot. The old entry will then never be cleaned up. If this case happens, we let the writer wait for orphan recovery first.

Signed-off-by: Joseph Qi <joseph...@huawei.com>
Cc: Weiwei Wang <wangww...@huawei.com>
Cc: Joel Becker <jl...@evilplan.org>
Cc: Mark Fasheh <mfas...@suse.com>
Signed-off-by: Andrew Morton <a...@linux-foundation.org>
---

 fs/ocfs2/inode.h   |    2 ++
 fs/ocfs2/journal.c |    2 ++
 fs/ocfs2/namei.c   |   37 +++--
 fs/ocfs2/super.c   |    2 ++
 4 files changed, 41 insertions(+), 2 deletions(-)

diff -puN fs/ocfs2/inode.h~ocfs2-fix-leftover-orphan-entry-caused-by-append-o_direct-write-crash fs/ocfs2/inode.h
--- a/fs/ocfs2/inode.h~ocfs2-fix-leftover-orphan-entry-caused-by-append-o_direct-write-crash
+++ a/fs/ocfs2/inode.h
@@ -81,6 +81,8 @@ struct ocfs2_inode_info
 	tid_t i_sync_tid;
 	tid_t i_datasync_tid;
 
+	wait_queue_head_t append_dio_wq;
+
 	struct dquot *i_dquot[MAXQUOTAS];
 };
 
diff -puN fs/ocfs2/journal.c~ocfs2-fix-leftover-orphan-entry-caused-by-append-o_direct-write-crash fs/ocfs2/journal.c
--- a/fs/ocfs2/journal.c~ocfs2-fix-leftover-orphan-entry-caused-by-append-o_direct-write-crash
+++ a/fs/ocfs2/journal.c
@@ -2210,6 +2210,8 @@ static int ocfs2_recover_orphans(struct
 			ret = ocfs2_del_inode_from_orphan(osb, inode, 0, 0);
 			if (ret)
 				mlog_errno(ret);
+
+			wake_up(&OCFS2_I(inode)->append_dio_wq);
 		} /* else if ORPHAN_NO_NEED_TRUNCATE, do nothing */
 
 next:
diff -puN fs/ocfs2/namei.c~ocfs2-fix-leftover-orphan-entry-caused-by-append-o_direct-write-crash fs/ocfs2/namei.c
--- a/fs/ocfs2/namei.c~ocfs2-fix-leftover-orphan-entry-caused-by-append-o_direct-write-crash
+++ a/fs/ocfs2/namei.c
@@ -2654,6 +2654,26 @@ leave:
 	return status;
 }
 
+static int ocfs2_dio_orphan_recovered(struct inode *inode)
+{
+	int ret;
+	struct buffer_head *di_bh = NULL;
+	struct ocfs2_dinode *di = NULL;
+
+	ret = ocfs2_inode_lock(inode, &di_bh, 1);
+	if (ret < 0) {
+		mlog_errno(ret);
+		return 0;
+	}
+
+	di = (struct ocfs2_dinode *) di_bh->b_data;
+	ret = !(di->i_flags & cpu_to_le32(OCFS2_DIO_ORPHANED_FL));
+	ocfs2_inode_unlock(inode, 1);
+	brelse(di_bh);
+
+	return ret;
+}
+
 int ocfs2_add_inode_to_orphan(struct ocfs2_super *osb,
 		struct inode *inode)
 {
@@ -2666,12 +2686,26 @@ int ocfs2_add_inode_to_orphan(struct ocf
 	struct ocfs2_dinode *di = NULL;
 	bool orphaned = false;
 
+restart:
 	status = ocfs2_inode_lock(inode, &di_bh, 1);
 	if (status < 0) {
 		mlog_errno(status);
 		goto bail;
 	}
 
+	di = (struct ocfs2_dinode *) di_bh->b_data;
+	/*
+	 * Another append dio crashed?
+	 * If so, wait for recovery first.
+	 */
+	if (unlikely(di->i_flags & cpu_to_le32(OCFS2_DIO_ORPHANED_FL))) {
+		ocfs2_inode_unlock(inode, 1);
+		brelse(di_bh);
+		wait_event_interruptible(OCFS2_I(inode)->append_dio_wq,
+				ocfs2_dio_orphan_recovered(inode));
+		goto restart;
+	}
+
 	status = ocfs2_dio_prepare_orphan_dir(osb, &orphan_dir_inode,
 			OCFS2_I(inode)->ip_blkno,
 			orphan_name,
@@ -2684,8 +2718,7 @@ int ocfs2_add_inode_to_orphan(struct ocf
 			"orphan dir %llu.\n", OCFS2_I(inode)->ip_blkno,
 			OCFS2_I(orphan_dir_inode)->ip_blkno);
 
-	di = (struct ocfs2_dinode *) di_bh->b_data;
-	if (!(di->i_flags & le32_to_cpu(OCFS2_ORPHANED_FL))) {
+	if (!(di->i_flags & cpu_to_le32(OCFS2_ORPHANED_FL))) {
 		mlog_errno(status);
 		goto bail_unlock_orphan;
 	}
diff -puN fs/ocfs2/super.c~ocfs2-fix-leftover-orphan-entry-caused-by-append-o_direct-write-crash fs/ocfs2/super.c
--- a/fs/ocfs2/super.c~ocfs2-fix-leftover-orphan-entry-caused-by-append-o_direct-write-crash
+++ a/fs/ocfs2/super.c
@@ -1768,6 +1768,8 @@ static void ocfs2_inode_init_once(void *
 	ocfs2_lock_res_init_once(&oi->ip_inode_lockres);
 	ocfs2_lock_res_init_once(&oi->ip_open_lockres);
 
+	init_waitqueue_head(&oi->append_dio_wq);
+
 	ocfs2_metadata_cache_init(INODE_CACHE(&oi->vfs_inode),
 				  &ocfs2_inode_caching_ops);
 
_
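The flag tests in this patch compare the on-disk, little-endian i_flags field against a constant converted with cpu_to_le32(). A portable user-space sketch of that pattern follows. Everything named demo_* or DEMO_* is hypothetical, including the flag bit, which is an arbitrary choice for illustration and not the real OCFS2_DIO_ORPHANED_FL value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag bit; not the real on-disk value. */
#define DEMO_DIO_ORPHANED_FL (1u << 14)

/* Return a value whose in-memory bytes are the little-endian form of v. */
static uint32_t demo_cpu_to_le32(uint32_t v)
{
	uint8_t b[4] = {
		(uint8_t)(v & 0xff),
		(uint8_t)((v >> 8) & 0xff),
		(uint8_t)((v >> 16) & 0xff),
		(uint8_t)((v >> 24) & 0xff),
	};
	uint32_t out;

	memcpy(&out, b, sizeof(out));
	return out;
}

/* Hypothetical on-disk inode: i_flags is stored little-endian. */
struct demo_dinode {
	uint32_t i_flags;
};

/* Store CPU-order flags in on-disk byte order. */
void demo_set_flags(struct demo_dinode *di, uint32_t cpu_flags)
{
	di->i_flags = demo_cpu_to_le32(cpu_flags);
}

/* Test the flag by converting the constant, not the on-disk field. */
int demo_dio_orphaned(const struct demo_dinode *di)
{
	return (di->i_flags & demo_cpu_to_le32(DEMO_DIO_ORPHANED_FL)) != 0;
}

/* Small usage example. */
void demo_flag_roundtrip(void)
{
	struct demo_dinode di;

	demo_set_flags(&di, DEMO_DIO_ORPHANED_FL);
	assert(demo_dio_orphaned(&di));
}
```

Converting the constant once keeps the comparison in on-disk byte order on both little-endian and big-endian hosts.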
[Ocfs2-devel] [patch 02/15] ocfs2: free inode when i_count becomes zero
From: Xue jiufei <xuejiu...@huawei.com>
Subject: ocfs2: free inode when i_count becomes zero

Disk inode deletion may be heavily delayed when one node unlinks a file after the same dentry has been freed on another node (say N1) because of memory shrinking, while the inode is left in memory. This inode can only be freed when N1 does the orphan scan work.

However, N1 may skip the orphan scan several times because other nodes may do the work earlier. In our tests, it may take 1 hour on a 4-node cluster, which causes a bad user experience. So we think the inode should be freed when i_count becomes zero to avoid such circumstances.

[a...@linux-foundation.org: coding-style fixes]
Signed-off-by: joyce.xue <xuejiu...@huawei.com>
Cc: Mark Fasheh <mfas...@suse.com>
Cc: Joel Becker <jl...@evilplan.org>
Signed-off-by: Andrew Morton <a...@linux-foundation.org>
---

 fs/ocfs2/inode.c |   10 +-
 1 file changed, 1 insertion(+), 9 deletions(-)

diff -puN fs/ocfs2/inode.c~ocfs2-free-inode-when-i_count-becomes-zero fs/ocfs2/inode.c
--- a/fs/ocfs2/inode.c~ocfs2-free-inode-when-i_count-becomes-zero
+++ a/fs/ocfs2/inode.c
@@ -1191,17 +1191,9 @@ void ocfs2_evict_inode(struct inode *ino
 int ocfs2_drop_inode(struct inode *inode)
 {
 	struct ocfs2_inode_info *oi = OCFS2_I(inode);
-	int res;
-
 	trace_ocfs2_drop_inode((unsigned long long)oi->ip_blkno,
 				inode->i_nlink, oi->ip_flags);
-
-	if (oi->ip_flags & OCFS2_INODE_MAYBE_ORPHANED)
-		res = 1;
-	else
-		res = generic_drop_inode(inode);
-
-	return res;
+	return 1;
 }
 
 /*
_
[Ocfs2-devel] [patch 08/15] ocfs2: eliminate the static flag of some functions
From: Weiwei Wang <wangww...@huawei.com>
Subject: ocfs2: eliminate the static flag of some functions

Currently, in the case of an append O_DIRECT write (block not allocated yet), ocfs2 falls back to buffered I/O. This has some disadvantages. Firstly, it is not the expected behavior. Secondly, it consumes a huge amount of page cache, e.g. in a mass backup scenario. Thirdly, modern filesystems such as ext4 support this feature.

In this patch set, the direct I/O write no longer falls back to a buffered I/O write, because block allocation is now enabled in direct I/O.

This patch (of 7):

Eliminate the static flag of some functions which will be used in append O_DIRECT write.

Signed-off-by: Weiwei Wang <wangww...@huawei.com>
Signed-off-by: Joseph Qi <joseph...@huawei.com>
Cc: Joel Becker <jl...@evilplan.org>
Cc: Mark Fasheh <mfas...@suse.com>
Signed-off-by: Andrew Morton <a...@linux-foundation.org>
---

 fs/ocfs2/file.c |   11 +--
 fs/ocfs2/file.h |    9 +
 2 files changed, 18 insertions(+), 2 deletions(-)

diff -puN fs/ocfs2/file.c~ocfs2-eliminate-the-static-flag-of-some-functions fs/ocfs2/file.c
--- a/fs/ocfs2/file.c~ocfs2-eliminate-the-static-flag-of-some-functions
+++ a/fs/ocfs2/file.c
@@ -295,7 +295,7 @@ out:
 	return ret;
 }
 
-static int ocfs2_set_inode_size(handle_t *handle,
+int ocfs2_set_inode_size(handle_t *handle,
 				struct inode *inode,
 				struct buffer_head *fe_bh,
 				u64 new_i_size)
@@ -441,7 +441,7 @@ out:
 	return status;
 }
 
-static int ocfs2_truncate_file(struct inode *inode,
+int ocfs2_truncate_file(struct inode *inode,
 			       struct buffer_head *di_bh,
 			       u64 new_i_size)
 {
@@ -709,6 +709,13 @@ leave:
 	return status;
 }
 
+int ocfs2_extend_allocation(struct inode *inode, u32 logical_start,
+		u32 clusters_to_add, int mark_unwritten)
+{
+	return __ocfs2_extend_allocation(inode, logical_start,
+			clusters_to_add, mark_unwritten);
+}
+
 /*
  * While a write will already be ordering the data, a truncate will not.
  * Thus, we need to explicitly order the zeroed pages.
diff -puN fs/ocfs2/file.h~ocfs2-eliminate-the-static-flag-of-some-functions fs/ocfs2/file.h
--- a/fs/ocfs2/file.h~ocfs2-eliminate-the-static-flag-of-some-functions
+++ a/fs/ocfs2/file.h
@@ -51,13 +51,22 @@ int ocfs2_add_inode_data(struct ocfs2_su
 			 struct ocfs2_alloc_context *data_ac,
 			 struct ocfs2_alloc_context *meta_ac,
 			 enum ocfs2_alloc_restarted *reason_ret);
+int ocfs2_set_inode_size(handle_t *handle,
+		struct inode *inode,
+		struct buffer_head *fe_bh,
+		u64 new_i_size);
 int ocfs2_simple_size_update(struct inode *inode,
 			     struct buffer_head *di_bh,
 			     u64 new_i_size);
+int ocfs2_truncate_file(struct inode *inode,
+		struct buffer_head *di_bh,
+		u64 new_i_size);
 int ocfs2_extend_no_holes(struct inode *inode, struct buffer_head *di_bh,
 			  u64 new_i_size, u64 zero_to);
 int ocfs2_zero_extend(struct inode *inode, struct buffer_head *di_bh,
 		      loff_t zero_to);
+int ocfs2_extend_allocation(struct inode *inode, u32 logical_start,
+		u32 clusters_to_add, int mark_unwritten);
 int ocfs2_setattr(struct dentry *dentry, struct iattr *attr);
 int ocfs2_getattr(struct vfsmount *mnt, struct dentry *dentry,
 		  struct kstat *stat);
_
[Ocfs2-devel] [patch 05/15] ocfs2/dlm: fix race between dispatched_work and dlm_lockres_grab_inflight_worker
From: Joseph Qi <joseph...@huawei.com>
Subject: ocfs2/dlm: fix race between dispatched_work and dlm_lockres_grab_inflight_worker

ac4fef4d23ed (ocfs2/dlm: do not purge lockres that is queued for assert master) may have the following possible race case:

dlm_dispatch_assert_master            dlm_wq
queue_work(dlm->quedlm_worker,
	&dlm->dispatched_work);
                                      dispatch work,
                                      dlm_lockres_drop_inflight_worker
                                      *BUG_ON(res->inflight_assert_workers == 0)*
dlm_lockres_grab_inflight_worker
inflight_assert_workers++

So ensure that inflight_assert_workers is increased first.

Signed-off-by: Joseph Qi <joseph...@huawei.com>
Signed-off-by: Xue jiufei <xuejiu...@huawei.com>
Cc: Joel Becker <jl...@evilplan.org>
Cc: Mark Fasheh <mfas...@suse.com>
Signed-off-by: Andrew Morton <a...@linux-foundation.org>
---

 fs/ocfs2/dlm/dlmmaster.c |   12 +++-
 1 file changed, 3 insertions(+), 9 deletions(-)

diff -puN fs/ocfs2/dlm/dlmmaster.c~ocfs2-dlm-fix-race-between-dispatched_work-and-dlm_lockres_grab_inflight_worker fs/ocfs2/dlm/dlmmaster.c
--- a/fs/ocfs2/dlm/dlmmaster.c~ocfs2-dlm-fix-race-between-dispatched_work-and-dlm_lockres_grab_inflight_worker
+++ a/fs/ocfs2/dlm/dlmmaster.c
@@ -685,14 +685,6 @@ void __dlm_lockres_grab_inflight_worker(
 			res->inflight_assert_workers);
 }
 
-static void dlm_lockres_grab_inflight_worker(struct dlm_ctxt *dlm,
-		struct dlm_lock_resource *res)
-{
-	spin_lock(&res->spinlock);
-	__dlm_lockres_grab_inflight_worker(dlm, res);
-	spin_unlock(&res->spinlock);
-}
-
 static void __dlm_lockres_drop_inflight_worker(struct dlm_ctxt *dlm,
 		struct dlm_lock_resource *res)
 {
@@ -1636,6 +1628,7 @@ send_response:
 		}
 		mlog(0, "%u is the owner of %.*s, cleaning everyone else\n",
 			     dlm->node_num, res->lockname.len, res->lockname.name);
+		spin_lock(&res->spinlock);
 		ret = dlm_dispatch_assert_master(dlm, res, 0, request->node_idx,
 						 DLM_ASSERT_MASTER_MLE_CLEANUP);
 		if (ret < 0) {
@@ -1643,7 +1636,8 @@ send_response:
 			response = DLM_MASTER_RESP_ERROR;
 			dlm_lockres_put(res);
 		} else
-			dlm_lockres_grab_inflight_worker(dlm, res);
+			__dlm_lockres_grab_inflight_worker(dlm, res);
+		spin_unlock(&res->spinlock);
 	} else {
 		if (res)
 			dlm_lockres_put(res);
_
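The race described in this patch comes down to whether the in-flight counter is incremented before the worker can decrement it. The single-threaded sketch below models the worst-case interleaving, in which the queued work runs immediately. The demo_* names are hypothetical; the actual patch gets the safe ordering by holding res->spinlock across both the dispatch and the grab, which this model does not reproduce.

```c
#include <assert.h>

/* Hypothetical, simplified lock resource. */
struct demo_res {
	int inflight; /* models inflight_assert_workers */
	int bug_hit;  /* set where the kernel would hit BUG_ON() */
};

/* The worker side: drop one in-flight reference. */
static void demo_worker_drop(struct demo_res *r)
{
	if (r->inflight == 0) {
		r->bug_hit = 1; /* BUG_ON(inflight_assert_workers == 0) */
		return;
	}
	r->inflight--;
}

/* Racy order: the work is queued (and here runs at once) before the grab. */
int demo_dispatch_then_grab(void)
{
	struct demo_res r = { 0, 0 };

	demo_worker_drop(&r); /* worker ran first */
	r.inflight++;         /* grab arrives too late */
	return r.bug_hit;
}

/* Safe order: the counter is raised before the worker can observe it. */
int demo_grab_then_dispatch(void)
{
	struct demo_res r = { 0, 0 };

	r.inflight++;
	demo_worker_drop(&r);
	assert(r.inflight == 0);
	return r.bug_hit;
}
```

Only the first ordering trips the check that stands in for the BUG_ON.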
[Ocfs2-devel] [patch 03/15] ocfs2: call ocfs2_journal_access_di() before ocfs2_journal_dirty() in ocfs2_write_end_nolock()
From: yangwenfang <vicky.yangwenf...@huawei.com>
Subject: ocfs2: call ocfs2_journal_access_di() before ocfs2_journal_dirty() in ocfs2_write_end_nolock()

After we call ocfs2_journal_access_di() in ocfs2_write_begin(), jbd2_journal_restart() may also be called. In that function, transaction A's t_updates is decremented and a new transaction B is obtained. If jbd2_journal_commit_transaction() happens to be committing transaction A, then once t_updates==0 it goes on to complete the commit and unfile the buffer.

So when jbd2_journal_dirty_metadata() is called, the handle points to the new transaction B and the buffer head's journal head is already freed (jh->b_transaction == NULL, jh->b_next_transaction == NULL). It returns EINVAL, which triggers the BUG_ON(status).

thread 1:                               jbd2:
ocfs2_write_begin                       jbd2_journal_commit_transaction
ocfs2_write_begin_nolock
ocfs2_start_trans
  jbd2__journal_start(t_updates+1,
                      transaction A)
ocfs2_journal_access_di
ocfs2_write_cluster_by_desc
ocfs2_mark_extent_written
ocfs2_change_extent_flag
ocfs2_split_extent
ocfs2_extend_rotate_transaction
  jbd2_journal_restart
  (t_updates-1,transaction B)           t_updates==0
                                        __jbd2_journal_refile_buffer
ocfs2_write_end
ocfs2_write_end_nolock
ocfs2_journal_dirty
  jbd2_journal_dirty_metadata(bug)
ocfs2_commit_trans

In ext4, I found that jbd2_journal_get_write_access() is called by ext4_write_end():

ext4_write_begin
  ext4_journal_start
    __ext4_journal_start_sb
      ext4_journal_check_start
      jbd2__journal_start

ext4_write_end
  ext4_mark_inode_dirty
    ext4_reserve_inode_write
      ext4_journal_get_write_access
        jbd2_journal_get_write_access
    ext4_mark_iloc_dirty
      ext4_do_update_inode
        ext4_handle_dirty_metadata
          jbd2_journal_dirty_metadata

So I think we should put ocfs2_journal_access_di() before ocfs2_journal_dirty() in ocfs2_write_end(). It works well after my modification.

Signed-off-by: vicky <vicky.yangwenf...@huawei.com>
Cc: Mark Fasheh <mfas...@suse.com>
Cc: Joel Becker <jl...@evilplan.org>
Signed-off-by: Andrew Morton <a...@linux-foundation.org>
---

 fs/ocfs2/aops.c |   21 ++---
 1 file changed, 10 insertions(+), 11 deletions(-)

diff -puN fs/ocfs2/aops.c~ocfs2-call-ocfs2_journal_access_di-before-ocfs2_journal_dirty-in-ocfs2_write_end_nolock fs/ocfs2/aops.c
--- a/fs/ocfs2/aops.c~ocfs2-call-ocfs2_journal_access_di-before-ocfs2_journal_dirty-in-ocfs2_write_end_nolock
+++ a/fs/ocfs2/aops.c
@@ -1818,16 +1818,6 @@ try_again:
 		if (ret)
 			goto out_commit;
 	}
-	/*
-	 * We don't want this to fail in ocfs2_write_end(), so do it
-	 * here.
-	 */
-	ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), wc->w_di_bh,
-				      OCFS2_JOURNAL_ACCESS_WRITE);
-	if (ret) {
-		mlog_errno(ret);
-		goto out_quota;
-	}
 
 	/*
 	 * Fill our page array first. That way we've grabbed enough so
@@ -1978,7 +1968,7 @@ int ocfs2_write_end_nolock(struct addres
 			   loff_t pos, unsigned len, unsigned copied,
 			   struct page *page, void *fsdata)
 {
-	int i;
+	int i, ret;
 	unsigned from, to, start = pos & (PAGE_CACHE_SIZE - 1);
 	struct inode *inode = mapping->host;
 	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
@@ -2028,6 +2018,14 @@ int ocfs2_write_end_nolock(struct addres
 		}
 	}
 
+	ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), wc->w_di_bh,
+				      OCFS2_JOURNAL_ACCESS_WRITE);
+	if (ret) {
+		copied = ret;
+		mlog_errno(ret);
+		goto out;
+	}
+
 out_write_size:
 	pos += copied;
 	if (pos > i_size_read(inode)) {
@@ -2042,6 +2040,7 @@ out_write_size:
 	ocfs2_update_inode_fsync_trans(handle, inode, 1);
 	ocfs2_journal_dirty(handle, wc->w_di_bh);
 
+out:
 	ocfs2_commit_trans(osb, handle);
 
 	ocfs2_run_deallocs(osb, &wc->w_dealloc);
_
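The failure mode in this patch can be modelled with transaction IDs: a buffer is attached to the transaction that was current when write access was requested, and dirtying it from a handle that has since moved to another transaction fails. The sketch below is a deliberately simplified model with hypothetical demo_* names. It assumes the worst case from the commit message, where the old transaction commits and unfiles the buffer as soon as the handle restarts; it is not the jbd2 API.

```c
#include <assert.h>
#include <errno.h>

struct demo_handle { int tid; }; /* transaction the handle belongs to */
struct demo_buf    { int tid; }; /* 0 = not attached to any transaction */

/* Model of "get write access": attach the buffer to the handle's transaction. */
int demo_journal_access(const struct demo_handle *h, struct demo_buf *b)
{
	assert(h->tid > 0);
	b->tid = h->tid;
	return 0;
}

/* Model of a handle restart, worst case: old transaction commits at once. */
void demo_journal_restart(struct demo_handle *h, struct demo_buf *b)
{
	if (b->tid == h->tid)
		b->tid = 0; /* buffer unfiled by the commit */
	h->tid++;           /* handle now points at a new transaction */
}

/* Model of "dirty metadata": only valid for a buffer in the same transaction. */
int demo_journal_dirty(const struct demo_handle *h, const struct demo_buf *b)
{
	return b->tid == h->tid ? 0 : -EINVAL;
}

/* Old order: access in write_begin, restart in between, dirty in write_end. */
int demo_write_old_order(void)
{
	struct demo_handle h = { 1 };
	struct demo_buf b = { 0 };

	demo_journal_access(&h, &b);
	demo_journal_restart(&h, &b);
	return demo_journal_dirty(&h, &b);
}

/* New order: access is requested right before dirty, after any restart. */
int demo_write_new_order(void)
{
	struct demo_handle h = { 1 };
	struct demo_buf b = { 0 };

	demo_journal_restart(&h, &b);
	demo_journal_access(&h, &b);
	return demo_journal_dirty(&h, &b);
}
```

In this model the old order returns -EINVAL and the new order returns 0, matching the behavior the commit message reports.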
[Ocfs2-devel] [patch 11/15] ocfs2: implement ocfs2_direct_IO_write
From: Weiwei Wang <wangww...@huawei.com>
Subject: ocfs2: implement ocfs2_direct_IO_write

Implement ocfs2_direct_IO_write. Add the inode to orphan dir first, and
then delete it once append O_DIRECT finished. This is to make sure block
allocation and inode size are consistent.

[joseph...@huawei.com: fix brelse warning if ocfs2_direct_IO_get_blocks failed]
Signed-off-by: Weiwei Wang <wangww...@huawei.com>
Signed-off-by: Joseph Qi <joseph...@huawei.com>
Signed-off-by: Joseph Qi <joseph...@huawei.com>
Cc: Joel Becker <jl...@evilplan.org>
Cc: Mark Fasheh <mfas...@suse.com>
Cc: alex chen <alex.c...@huawei.com>
Signed-off-by: Andrew Morton <a...@linux-foundation.org>
---

 fs/ocfs2/aops.c |  192 +-
 1 file changed, 189 insertions(+), 3 deletions(-)

diff -puN fs/ocfs2/aops.c~ocfs2-implement-ocfs2_direct_io_write fs/ocfs2/aops.c
--- a/fs/ocfs2/aops.c~ocfs2-implement-ocfs2_direct_io_write
+++ a/fs/ocfs2/aops.c
@@ -28,6 +28,7 @@
 #include <linux/pipe_fs_i.h>
 #include <linux/mpage.h>
 #include <linux/quotaops.h>
+#include <linux/blkdev.h>
 
 #include <cluster/masklog.h>
 
@@ -47,6 +48,9 @@
 #include "ocfs2_trace.h"
 
 #include "buffer_head_io.h"
+#include "dir.h"
+#include "namei.h"
+#include "sysfile.h"
 
 static int ocfs2_symlink_get_block(struct inode *inode, sector_t iblock,
 				   struct buffer_head *bh_result, int create)
@@ -597,6 +601,180 @@ static int ocfs2_releasepage(struct page
 	return try_to_free_buffers(page);
 }
 
+static int ocfs2_is_overwrite(struct ocfs2_super *osb,
+		struct inode *inode , loff_t offset)
+{
+	int ret = 0;
+	u32 v_cpos = 0;
+	u32 p_cpos = 0;
+	unsigned int num_clusters = 0;
+	unsigned int ext_flags = 0;
+
+	v_cpos = ocfs2_bytes_to_clusters(osb->sb, offset);
+	ret = ocfs2_get_clusters(inode, v_cpos, &p_cpos,
+			&num_clusters, &ext_flags);
+	if (ret < 0) {
+		mlog_errno(ret);
+		return ret;
+	}
+
+	if (p_cpos && !(ext_flags & OCFS2_EXT_UNWRITTEN))
+		return 1;
+
+	return 0;
+}
+
+static ssize_t ocfs2_direct_IO_write(struct kiocb *iocb,
+		struct iov_iter *iter,
+		loff_t offset)
+{
+	ssize_t ret = 0;
+	ssize_t written = 0;
+	bool orphaned = false;
+	int is_overwrite = 0;
+	struct file *file = iocb->ki_filp;
+	struct inode *inode = file_inode(file)->i_mapping->host;
+	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
+	struct buffer_head *di_bh = NULL;
+	size_t count = iter->count;
+	journal_t *journal = osb->journal->j_journal;
+	u32 zero_len;
+	int cluster_align;
+	loff_t final_size = offset + count;
+	int append_write = offset >= i_size_read(inode) ? 1 : 0;
+	unsigned int num_clusters = 0;
+	unsigned int ext_flags = 0;
+
+	{
+		u64 o = offset;
+
+		zero_len = do_div(o, 1 << osb->s_clustersize_bits);
+		cluster_align = !!zero_len;
+	}
+
+	/*
+	 * when final_size > inode->i_size, inode->i_size will be
+	 * updated after direct write, so add the inode to orphan
+	 * dir first.
+	 */
+	if (final_size > i_size_read(inode)) {
+		ret = ocfs2_add_inode_to_orphan(osb, inode);
+		if (ret < 0)
+			goto out;
+		orphaned = true;
+	}
+
+	if (append_write) {
+		ret = ocfs2_inode_lock(inode, &di_bh, 1);
+		if (ret < 0) {
+			mlog_errno(ret);
+			goto clean_orphan;
+		}
+
+		if (ocfs2_sparse_alloc(OCFS2_SB(inode->i_sb)))
+			ret = ocfs2_zero_extend(inode, di_bh, offset);
+		else
+			ret = ocfs2_extend_no_holes(inode, di_bh, offset, offset);
+		if (ret < 0) {
+			mlog_errno(ret);
+			ocfs2_inode_unlock(inode, 1);
+			brelse(di_bh);
+			goto clean_orphan;
+		}
+
+		is_overwrite = ocfs2_is_overwrite(osb, inode, offset);
+		if (is_overwrite < 0) {
+			mlog_errno(is_overwrite);
+			ocfs2_inode_unlock(inode, 1);
+			brelse(di_bh);
+			goto clean_orphan;
+		}
+
+		ocfs2_inode_unlock(inode, 1);
+		brelse(di_bh);
+		di_bh = NULL;
+	}
+
+	written = __blockdev_direct_IO(WRITE, iocb, inode, inode->i_sb->s_bdev,
+			iter, offset,
+			ocfs2_direct_IO_get_blocks,
+			ocfs2_dio_end_io, NULL, 0);
+	if (unlikely(written < 0)) {
+		loff_t i_size = i_size_read(inode);
+
+		if (offset + count > i_size) {
+			ret = ocfs2_inode_lock(inode, &di_bh, 1);
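The checks that the patch performs before issuing the direct write can be sketched in user space as follows. This is only an illustration of the arithmetic, not kernel code: `struct dio_plan` and `plan_direct_write()` are hypothetical names, and the cluster size is assumed to be a power of two given by `clustersize_bits`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of the checks made before the direct write. */
struct dio_plan {
	bool needs_orphan;  /* the write would grow i_size */
	bool append_write;  /* the write starts at or beyond i_size */
	uint32_t zero_len;  /* byte offset of the write within its cluster */
};

/*
 * Assumption: the cluster size is 1 << clustersize_bits, so the
 * remainder that do_div() yields in the patch equals a simple mask.
 */
static struct dio_plan plan_direct_write(uint64_t offset, uint64_t count,
					 uint64_t i_size,
					 unsigned int clustersize_bits)
{
	struct dio_plan plan;
	uint64_t cluster_size;

	assert(clustersize_bits < 32);
	cluster_size = (uint64_t)1 << clustersize_bits;

	/* final_size > i_size: the inode goes to the orphan dir first. */
	plan.needs_orphan = offset + count > i_size;
	/* offset >= i_size: the range up to offset is extended or zeroed. */
	plan.append_write = offset >= i_size;
	plan.zero_len = (uint32_t)(offset & (cluster_size - 1));
	return plan;
}
```

For a file of 8192 bytes with 4 KiB clusters, a 4096-byte write at offset 8192 is an append that needs the orphan entry, while a 512-byte write at offset 4608 of a 1 MiB file needs neither and starts 512 bytes into its cluster.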
[Ocfs2-devel] [patch 12/15] ocfs2: allocate blocks in ocfs2_direct_IO_get_blocks
From: Weiwei Wang <wangww...@huawei.com>
Subject: ocfs2: allocate blocks in ocfs2_direct_IO_get_blocks

Allow blocks allocation in ocfs2_direct_IO_get_blocks.

Signed-off-by: Weiwei Wang <wangww...@huawei.com>
Signed-off-by: Joseph Qi <joseph...@huawei.com>
Cc: Joel Becker <jl...@evilplan.org>
Cc: Mark Fasheh <mfas...@suse.com>
Signed-off-by: Andrew Morton <a...@linux-foundation.org>
---

 fs/ocfs2/aops.c |   45 ++---
 1 file changed, 42 insertions(+), 3 deletions(-)

diff -puN fs/ocfs2/aops.c~ocfs2-allocate-blocks-in-ocfs2_direct_io_get_blocks fs/ocfs2/aops.c
--- a/fs/ocfs2/aops.c~ocfs2-allocate-blocks-in-ocfs2_direct_io_get_blocks
+++ a/fs/ocfs2/aops.c
@@ -510,18 +510,21 @@ bail:
  *
  * called like this: dio->get_blocks(dio->inode, fs_startblk,
  * fs_count, map_bh, dio->rw == WRITE);
- *
- * Note that we never bother to allocate blocks here, and thus ignore the
- * create argument.
  */
 static int ocfs2_direct_IO_get_blocks(struct inode *inode, sector_t iblock,
 				     struct buffer_head *bh_result, int create)
 {
 	int ret;
+	u32 cpos = 0;
+	int alloc_locked = 0;
 	u64 p_blkno, inode_blocks, contig_blocks;
 	unsigned int ext_flags;
 	unsigned char blocksize_bits = inode->i_sb->s_blocksize_bits;
 	unsigned long max_blocks = bh_result->b_size >> inode->i_blkbits;
+	unsigned long len = bh_result->b_size;
+	unsigned int clusters_to_alloc = 0;
+
+	cpos = ocfs2_blocks_to_clusters(inode->i_sb, iblock);
 
 	/* This function won't even be called if the request isn't all
 	 * nicely aligned and of the right size, so there's no need
@@ -543,6 +546,40 @@ static int ocfs2_direct_IO_get_blocks(st
 	/* We should already CoW the refcounted extent in case of create. */
 	BUG_ON(create && (ext_flags & OCFS2_EXT_REFCOUNTED));
 
+	/* allocate blocks if no p_blkno is found, and create == 1 */
+	if (!p_blkno && create) {
+		ret = ocfs2_inode_lock(inode, NULL, 1);
+		if (ret < 0) {
+			mlog_errno(ret);
+			goto bail;
+		}
+
+		alloc_locked = 1;
+
+		/* fill hole, allocate blocks can't be larger than the size
+		 * of the hole */
+		clusters_to_alloc = ocfs2_clusters_for_bytes(inode->i_sb, len);
+		if (clusters_to_alloc > contig_blocks)
+			clusters_to_alloc = contig_blocks;
+
+		/* allocate extent and insert them into the extent tree */
+		ret = ocfs2_extend_allocation(inode, cpos,
+				clusters_to_alloc, 0);
+		if (ret < 0) {
+			mlog_errno(ret);
+			goto bail;
+		}
+
+		ret = ocfs2_extent_map_get_blocks(inode, iblock, &p_blkno,
+				&contig_blocks, &ext_flags);
+		if (ret < 0) {
+			mlog(ML_ERROR, "get_blocks() failed iblock=%llu\n",
+					(unsigned long long)iblock);
+			ret = -EIO;
+			goto bail;
+		}
+	}
+
 	/*
 	 * get_more_blocks() expects us to describe a hole by clearing
 	 * the mapped bit on bh_result().
@@ -560,6 +597,8 @@ static int ocfs2_direct_IO_get_blocks(st
 		contig_blocks = max_blocks;
 	bh_result->b_size = contig_blocks << blocksize_bits;
 bail:
+	if (alloc_locked)
+		ocfs2_inode_unlock(inode, 1);
 	return ret;
 }
 
_

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel
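The "allocate no more than the hole" step in the patch above comes down to rounding the request up to whole clusters and clamping it. A minimal user-space sketch of that arithmetic follows. `clusters_for_bytes()` and `clusters_to_fill()` are hypothetical stand-ins, not the kernel helpers, and the sketch assumes the hole size is already expressed in clusters.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round a byte count up to whole clusters. This mirrors the idea behind
 * ocfs2_clusters_for_bytes() but is not the kernel helper itself.
 */
static uint32_t clusters_for_bytes(uint64_t bytes,
				   unsigned int clustersize_bits)
{
	uint64_t cluster_size;

	assert(clustersize_bits < 32);
	cluster_size = (uint64_t)1 << clustersize_bits;
	return (uint32_t)((bytes + cluster_size - 1) >> clustersize_bits);
}

/*
 * Number of clusters to allocate for a request of len bytes that lands
 * in a hole of hole_clusters clusters: never more than the hole.
 */
static uint32_t clusters_to_fill(uint64_t len, uint32_t hole_clusters,
				 unsigned int clustersize_bits)
{
	uint32_t want = clusters_for_bytes(len, clustersize_bits);

	return want > hole_clusters ? hole_clusters : want;
}
```

With 4 KiB clusters, a 40 KiB request into a 3-cluster hole allocates 3 clusters, and a 4 KiB request into an 8-cluster hole allocates 1.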
[Ocfs2-devel] [patch 14/15] ocfs2: do not fallback to buffer I/O write if fill holes
From: Weiwei Wang <wangww...@huawei.com>
Subject: ocfs2: do not fallback to buffer I/O write if fill holes

Now append O_DIRECT write to a hole will try direct io first, then
fallback to buffered IO if fails.

Signed-off-by: Weiwei Wang <wangww...@huawei.com>
Signed-off-by: Joseph Qi <joseph...@huawei.com>
Cc: Joel Becker <jl...@evilplan.org>
Cc: Mark Fasheh <mfas...@suse.com>
Signed-off-by: Andrew Morton <a...@linux-foundation.org>
---

 fs/ocfs2/file.c |   92 --
 1 file changed, 41 insertions(+), 51 deletions(-)

diff -puN fs/ocfs2/file.c~ocfs2-do-not-fallback-to-buffer-i-o-write-if-fill-holes fs/ocfs2/file.c
--- a/fs/ocfs2/file.c~ocfs2-do-not-fallback-to-buffer-i-o-write-if-fill-holes
+++ a/fs/ocfs2/file.c
@@ -1359,44 +1359,6 @@ out:
 	return ret;
 }
 
-/*
- * Will look for holes and unwritten extents in the range starting at
- * pos for count bytes (inclusive).
- */
-static int ocfs2_check_range_for_holes(struct inode *inode, loff_t pos,
-				       size_t count)
-{
-	int ret = 0;
-	unsigned int extent_flags;
-	u32 cpos, clusters, extent_len, phys_cpos;
-	struct super_block *sb = inode->i_sb;
-
-	cpos = pos >> OCFS2_SB(sb)->s_clustersize_bits;
-	clusters = ocfs2_clusters_for_bytes(sb, pos + count) - cpos;
-
-	while (clusters) {
-		ret = ocfs2_get_clusters(inode, cpos, &phys_cpos, &extent_len,
-					 &extent_flags);
-		if (ret < 0) {
-			mlog_errno(ret);
-			goto out;
-		}
-
-		if (phys_cpos == 0 || (extent_flags & OCFS2_EXT_UNWRITTEN)) {
-			ret = 1;
-			break;
-		}
-
-		if (extent_len > clusters)
-			extent_len = clusters;
-
-		clusters -= extent_len;
-		cpos += extent_len;
-	}
-out:
-	return ret;
-}
-
 static int ocfs2_write_remove_suid(struct inode *inode)
 {
 	int ret;
@@ -2212,18 +2174,6 @@ static int ocfs2_prepare_inode_for_write
 			break;
 		}
 
-		/*
-		 * We don't fill holes during direct io, so
-		 * check for them here. If any are found, the
-		 * caller will have to retake some cluster
-		 * locks and initiate the io as buffered.
-		 */
-		ret = ocfs2_check_range_for_holes(inode, saved_pos, count);
-		if (ret == 1) {
-			*direct_io = 0;
-			ret = 0;
-		} else if (ret < 0)
-			mlog_errno(ret);
 		break;
 	}
 
@@ -2253,6 +2203,7 @@ static ssize_t ocfs2_file_write_iter(str
 	u32 old_clusters;
 	struct file *file = iocb->ki_filp;
 	struct inode *inode = file_inode(file);
+	struct address_space *mapping = file->f_mapping;
 	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
 	int full_coherency = !(osb->s_mount_opt &
 			       OCFS2_MOUNT_COHERENCY_BUFFERED);
@@ -2367,11 +2318,50 @@ relock:
 	iov_iter_truncate(from, count);
 
 	if (direct_io) {
+		loff_t endbyte;
+		ssize_t written_buffered;
 		written = generic_file_direct_write(iocb, from, *ppos);
-		if (written < 0) {
+		if (written < 0 || written == count) {
 			ret = written;
 			goto out_dio;
 		}
+
+		/*
+		 * direct-io write to a hole: fall through to buffered I/O
+		 * for completing the rest of the request.
+		 */
+		count -= written;
+		written_buffered = generic_perform_write(file, from, *ppos);
+		/*
+		 * If generic_file_buffered_write() returned a synchronous error
+		 * then we want to return the number of bytes which were
+		 * direct-written, or the error code if that was zero. Note
+		 * that this differs from normal direct-io semantics, which
+		 * will return -EFOO even if some bytes were written.
+		 */
+		if (written_buffered < 0) {
+			ret = written_buffered;
+			goto out;
+		}
+
+		/* We need to ensure that the page cache pages are written to
+		 * disk and invalidated to preserve the expected O_DIRECT
+		 * semantics.
+		 */
+		endbyte = *ppos + written_buffered - written - 1;
+		ret = filemap_write_and_wait_range(file->f_mapping, *ppos,
+				endbyte);
+		if (ret == 0) {
+			written = written_buffered;
+			invalidate_mapping_pages(mapping,
+					*ppos
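The control flow described by this patch (try direct I/O first, then finish any remainder through the buffered path) can be modelled outside the kernel roughly as below. This is a simplified sketch, not the ocfs2 code: `write_with_fallback()` and the example back ends are hypothetical, positions are plain byte offsets, and on a buffered error the sketch follows the behaviour stated in the patch's comment (return the bytes already direct-written, or the error if there were none).

```c
#include <assert.h>
#include <stddef.h>

/* A write back end returns the bytes written, or a negative error. */
typedef long (*write_fn)(size_t pos, size_t count);

static long write_with_fallback(write_fn direct, write_fn buffered,
				size_t pos, size_t count)
{
	long written, rest;

	assert(direct && buffered);
	written = direct(pos, count);

	/* Error, or the direct path handled the whole request. */
	if (written < 0 || (size_t)written == count)
		return written;

	/* Short direct write: complete the rest through the buffered path. */
	rest = buffered(pos + (size_t)written, count - (size_t)written);
	if (rest < 0)
		return written > 0 ? written : rest;
	return written + rest;
}

/* Example back ends, for illustration only. */
static long direct_stops_at_hole(size_t pos, size_t count)
{
	const size_t hole_start = 4096; /* assumed start of a hole */
	size_t room;

	if (pos >= hole_start)
		return 0;
	room = hole_start - pos;
	return (long)(count < room ? count : room);
}

static long buffered_all(size_t pos, size_t count)
{
	(void)pos;
	return (long)count;
}

static long always_fails(size_t pos, size_t count)
{
	(void)pos;
	(void)count;
	return -5; /* stands in for an -EIO style error */
}
```

An 8192-byte request that the direct path can only satisfy up to byte 4096 is completed by the buffered path, so the caller still sees 8192 bytes written.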
Re: [Ocfs2-devel] [patch 02/15] ocfs2: free inode when i_count becomes zero
Hi, Andrew,

This patch may lead to data loss, so please remove it from the mm tree.

Here is the situation: when i_count becomes zero but there are still dirty pages in i_mapping, the dirty pages would be freed without flushing the data. To avoid this problem, we would have to flush the dirty pages before dropping the inode, but I don't think it is a good idea to flush pages in ocfs2_drop_inode(). So for now there is no better way to solve this problem.

Thanks,
Xuejiufei

On 2014/12/16 6:50, a...@linux-foundation.org wrote:
> From: Xue jiufei <xuejiu...@huawei.com>
> Subject: ocfs2: free inode when i_count becomes zero
>
> Disk inode deletion may be heavily delayed when one node unlinks a file
> after the same dentry is freed on another node (say N1) because of memory
> shrink but the inode is left in memory. This inode can only be freed while
> N1 is doing the orphan scan work.
>
> However, N1 may skip the orphan scan several times because other nodes may
> do the work earlier. In our tests, it may take 1 hour on a 4-node cluster,
> which causes a bad user experience. So we think the inode should be freed
> when i_count becomes zero to avoid such circumstances.
>
> [a...@linux-foundation.org: coding-style fixes]
> Signed-off-by: joyce.xue <xuejiu...@huawei.com>
> Cc: Mark Fasheh <mfas...@suse.com>
> Cc: Joel Becker <jl...@evilplan.org>
> Signed-off-by: Andrew Morton <a...@linux-foundation.org>
> ---
>
>  fs/ocfs2/inode.c |   10 +-
>  1 file changed, 1 insertion(+), 9 deletions(-)
>
> diff -puN fs/ocfs2/inode.c~ocfs2-free-inode-when-i_count-becomes-zero fs/ocfs2/inode.c
> --- a/fs/ocfs2/inode.c~ocfs2-free-inode-when-i_count-becomes-zero
> +++ a/fs/ocfs2/inode.c
> @@ -1191,17 +1191,9 @@ void ocfs2_evict_inode(struct inode *ino
>  int ocfs2_drop_inode(struct inode *inode)
>  {
>  	struct ocfs2_inode_info *oi = OCFS2_I(inode);
> -	int res;
>  
>  	trace_ocfs2_drop_inode((unsigned long long)oi->ip_blkno,
>  				inode->i_nlink, oi->ip_flags);
> -
> -	if (oi->ip_flags & OCFS2_INODE_MAYBE_ORPHANED)
> -		res = 1;
> -	else
> -		res = generic_drop_inode(inode);
> -
> -	return res;
> +	return 1;
>  }
>  
>  /*
> _
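To make the disagreement concrete, the drop decision before and after the quoted patch can be modelled with a toy inode. This is a user-space sketch under stated assumptions: `struct toy_inode`, the flag value and the helper names are hypothetical, and the "generic" rule is assumed to be "drop when the link count is zero or the inode is unhashed". `drop_loses_data()` simply encodes the concern raised in the reply above.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_INODE_MAYBE_ORPHANED 0x1u /* hypothetical flag value */

struct toy_inode {
	unsigned int nlink;       /* link count */
	bool unhashed;            /* no longer in the inode hash */
	unsigned int flags;       /* TOY_INODE_* flags */
	unsigned int dirty_pages; /* pages not yet written back */
};

/* Before the patch: drop at once only if the inode may be orphaned,
 * otherwise fall back to the (assumed) generic rule. */
static bool drop_before_patch(const struct toy_inode *inode)
{
	assert(inode);
	if (inode->flags & TOY_INODE_MAYBE_ORPHANED)
		return true;
	return inode->nlink == 0 || inode->unhashed;
}

/* After the patch: always drop once the last reference goes away. */
static bool drop_after_patch(const struct toy_inode *inode)
{
	assert(inode);
	return true;
}

/* The concern from the reply: dropping while pages are still dirty
 * discards data that was never flushed. */
static bool drop_loses_data(const struct toy_inode *inode, bool drop)
{
	assert(inode);
	return drop && inode->dirty_pages > 0;
}
```

For a linked, hashed inode with three dirty pages, the old rule keeps the inode cached, while the patched rule drops it and, in this model, loses the unflushed pages.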
Re: [Ocfs2-devel] ocfs2-tools repository and documentation
Hi Goldwyn, Germano,

I have updated the ocfs2-tools git repo with the latest fixes :)

Germano, the following documentation is available for OCFS2:

http://docs.oracle.com/cd/E37670_01/E37355/html/ol_ocfs2.html
http://www.oracle.com/us/technologies/linux/025995.htm
https://oss.oracle.com/projects/ocfs2/dist/documentation/v1.8/ocfs2-1_8_2-manpages.pdf
https://oss.oracle.com/projects/ocfs2/documentation/

On 12/15/2014 10:47 AM, Germano Percossi wrote:
> Hi Goldwyn,
>
> Thanks for your answer. I will have a look at your code. I just want to be sure that the source code I am looking at is the latest one to avoid fixing things that maybe have been already fixed elsewhere. Same is true for documentation and package repositories.
>
> If I could have links to the latest, no matter how old and not maintained, documentation and repository, it would be helpful, as well.
>
> Do you know of any other distro, besides Suse, maintaining a stack of patches?
>
> Cheers,
> Germano
>
> On 14/12/14 07:24, Goldwyn Rodrigues wrote:
>> Yes, if you don't count the last 2 patches, it is more close to 3 years now ;)
>>
>> Since the patches were not being updated. I started maintaining an alternate repository where I am putting all the bugs reported at SUSE: https://github.com/goldwynr/ocfs2-tools
>>
>> Branch suse-fixes has the fixes found by SUSE over the upstream branch. Branch nocontrold has the patches for the feature of doing away with ocfs2_controld to work with the latest corosync/pacemaker stack. Patches for the kernel are already in the kernel but the ones in the tools need some review.
>>
>> I had a mail conversation with Srini and he has promised to update the upstream branch soon.
>>
>> Regards,