Re: [Ocfs2-devel] ocfs2-tools repository and documentation

2014-12-15 Thread Germano Percossi
Hi Goldwyn,

Thanks for your answer. I will have a look at your code.
I just want to be sure that the source code I am looking at
is the latest, to avoid fixing things that may already have been
fixed elsewhere.

The same is true for documentation and package repositories.
If I could have links to the latest documentation and repository,
no matter how old or unmaintained, that would be helpful
as well.

Do you know of any other distro, besides SUSE, maintaining a stack
of patches?

Cheers,
Germano

On 14/12/14 07:24, Goldwyn Rodrigues wrote:

> Yes, if you don't count the last 2 patches, it is closer to 3 years
> now ;)
>
> Since the patches were not being updated, I started maintaining an
> alternate repository where I am putting fixes for all the bugs reported
> at SUSE:
>
> https://github.com/goldwynr/ocfs2-tools
>
> Branch suse-fixes has the fixes found by SUSE over the upstream branch.
>
> Branch nocontrold has the patches for the feature of doing away with
> ocfs2_controld to work with the latest corosync/pacemaker stack. The
> kernel patches are already in the kernel, but the ones in the tools need
> some review.
>
> I had a mail conversation with Srini and he has promised to update the
> upstream branch soon.
>
> Regards,



___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel


[Ocfs2-devel] ocfs patches for review

2014-12-15 Thread Andrew Morton

Another round of jolly patch reviewing, please.  A number of these
patches have been stalled for quite a long time.

I have the following notes:

o2dlm-fix-null-pointer-dereference-in-o2dlm_blocking_ast_wrapper.patch:
   - Joseph Qi had issues

ocfs2-free-inode-when-i_count-becomes-zero.patch:
   - Mark is finding a better way of doing this

ocfs2-call-ocfs2_journal_access_di-before-ocfs2_journal_dirty-in-ocfs2_write_end_nolock.patch:
   - Mark requested changes





[Ocfs2-devel] [patch 06/15] ocfs2: reflink: fix slow unlink for refcounted file

2014-12-15 Thread akpm
From: Junxiao Bi junxiao...@oracle.com
Subject: ocfs2: reflink: fix slow unlink for refcounted file

When running the multiple-node reflink stress test from the ocfs2 test
suite on a 4-node cluster, every unlink() of a refcounted file takes about
700s.

The slow unlink is caused by contention on the refcount tree lock, since
all nodes are unlinking files that share the same refcount tree.  When the
file being unlinked has many extents (over 1600 in our test), most of the
extents have the refcounted flag set.  ocfs2_commit_truncate() then
executes the following call trace for every extent, which means it
acquires and releases the refcount tree lock about 1600 times.  When
several nodes do this at the same time, performance is very poor.

ocfs2_remove_btree_range()
-- ocfs2_lock_refcount_tree()
---- ocfs2_refcount_lock()
------ __ocfs2_cluster_lock()

ocfs2_refcount_lock() is costly; moving it to ocfs2_commit_truncate() so
that the lock/unlock happens only once improves performance considerably.
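
The effect of hoisting the lock can be sketched in plain userspace C. This
is only an illustration under assumptions: struct tree_lock and the
truncate_* helpers are hypothetical stand-ins, not ocfs2 code, and the
"lock" merely counts acquisitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the cluster-wide refcount tree lock. */
struct tree_lock {
	bool held;
	unsigned long acquisitions;	/* how many times the lock was taken */
};

static void tree_lock_acquire(struct tree_lock *l)
{
	assert(!l->held);
	l->held = true;
	l->acquisitions++;
}

static void tree_lock_release(struct tree_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* Remove one extent; take the lock here unless the caller already holds it. */
static void remove_extent(struct tree_lock *l, bool tree_locked)
{
	if (!tree_locked)
		tree_lock_acquire(l);
	/* ... the refcount update for this extent would happen here ... */
	if (!tree_locked)
		tree_lock_release(l);
}

/* Old behaviour: one lock round trip per extent. */
static unsigned long truncate_lock_per_extent(struct tree_lock *l, size_t extents)
{
	for (size_t i = 0; i < extents; i++)
		remove_extent(l, false);
	return l->acquisitions;
}

/* Patched behaviour: lock lazily on the first extent, unlock once at the end. */
static unsigned long truncate_lock_once(struct tree_lock *l, size_t extents)
{
	for (size_t i = 0; i < extents; i++) {
		if (!l->held)
			tree_lock_acquire(l);
		remove_extent(l, true);
	}
	if (l->held)
		tree_lock_release(l);
	return l->acquisitions;
}
```

With 1600 extents the first helper takes the lock 1600 times and the second
takes it once, which is the difference the patch is after.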

Signed-off-by: Junxiao Bi junxiao...@oracle.com
Cc: Wengang wen.gang.w...@oracle.com
Cc: Mark Fasheh mfas...@suse.com
Cc: Joel Becker jl...@evilplan.org
Signed-off-by: Andrew Morton a...@linux-foundation.org
---

 fs/ocfs2/alloc.c |   28 +---
 fs/ocfs2/alloc.h |2 +-
 fs/ocfs2/dir.c   |2 +-
 fs/ocfs2/file.c  |2 +-
 4 files changed, 24 insertions(+), 10 deletions(-)

diff -puN fs/ocfs2/alloc.c~ocfs2-reflink-fix-slow-unlink-for-refcounted-file 
fs/ocfs2/alloc.c
--- a/fs/ocfs2/alloc.c~ocfs2-reflink-fix-slow-unlink-for-refcounted-file
+++ a/fs/ocfs2/alloc.c
@@ -5662,7 +5662,7 @@ int ocfs2_remove_btree_range(struct inod
 struct ocfs2_extent_tree *et,
 u32 cpos, u32 phys_cpos, u32 len, int flags,
 struct ocfs2_cached_dealloc_ctxt *dealloc,
-u64 refcount_loc)
+u64 refcount_loc, bool refcount_tree_locked)
 {
 	int ret, credits = 0, extra_blocks = 0;
 	u64 phys_blkno = ocfs2_clusters_to_blocks(inode->i_sb, phys_cpos);
@@ -5676,11 +5676,13 @@ int ocfs2_remove_btree_range(struct inod
 		BUG_ON(!(OCFS2_I(inode)->ip_dyn_features &
 			 OCFS2_HAS_REFCOUNT_FL));
 
-		ret = ocfs2_lock_refcount_tree(osb, refcount_loc, 1,
-					       &ref_tree, NULL);
-		if (ret) {
-			mlog_errno(ret);
-			goto bail;
+		if (!refcount_tree_locked) {
+			ret = ocfs2_lock_refcount_tree(osb, refcount_loc, 1,
+						       &ref_tree, NULL);
+			if (ret) {
+				mlog_errno(ret);
+				goto bail;
+			}
 		}
 
 		ret = ocfs2_prepare_refcount_change_for_del(inode,
@@ -7021,6 +7023,7 @@ int ocfs2_commit_truncate(struct ocfs2_s
 	u64 refcount_loc = le64_to_cpu(di->i_refcount_loc);
 	struct ocfs2_extent_tree et;
 	struct ocfs2_cached_dealloc_ctxt dealloc;
+	struct ocfs2_refcount_tree *ref_tree = NULL;
 
 	ocfs2_init_dinode_extent_tree(&et, INODE_CACHE(inode), di_bh);
 	ocfs2_init_dealloc_ctxt(&dealloc);
@@ -7130,9 +7133,18 @@ start:
 
 	phys_cpos = ocfs2_blocks_to_clusters(inode->i_sb, blkno);
 
+	if ((flags & OCFS2_EXT_REFCOUNTED) && trunc_len && !ref_tree) {
+		status = ocfs2_lock_refcount_tree(osb, refcount_loc, 1,
+				&ref_tree, NULL);
+		if (status) {
+			mlog_errno(status);
+			goto bail;
+		}
+	}
+
 	status = ocfs2_remove_btree_range(inode, &et, trunc_cpos,
 					  phys_cpos, trunc_len, flags, &dealloc,
-					  refcount_loc);
+					  refcount_loc, true);
 	if (status < 0) {
 		mlog_errno(status);
 		goto bail;
@@ -7147,6 +7159,8 @@ start:
goto start;
 
 bail:
+   if (ref_tree)
+   ocfs2_unlock_refcount_tree(osb, ref_tree, 1);
 
ocfs2_schedule_truncate_log_flush(osb, 1);
 
diff -puN fs/ocfs2/alloc.h~ocfs2-reflink-fix-slow-unlink-for-refcounted-file 
fs/ocfs2/alloc.h
--- a/fs/ocfs2/alloc.h~ocfs2-reflink-fix-slow-unlink-for-refcounted-file
+++ a/fs/ocfs2/alloc.h
@@ -142,7 +142,7 @@ int ocfs2_remove_btree_range(struct inod
 struct ocfs2_extent_tree *et,
 u32 cpos, u32 phys_cpos, u32 len, int flags,
 struct ocfs2_cached_dealloc_ctxt *dealloc,
-u64 refcount_loc);
+u64 refcount_loc, bool refcount_tree_locked);
 
 int ocfs2_num_free_extents(struct ocfs2_super *osb,
   struct ocfs2_extent_tree *et);
diff -puN 

[Ocfs2-devel] [patch 01/15] o2dlm: fix NULL pointer dereference in o2dlm_blocking_ast_wrapper

2014-12-15 Thread akpm
From: Srinivas Eeda srinivas.e...@oracle.com
Subject: o2dlm: fix NULL pointer dereference in o2dlm_blocking_ast_wrapper

A tiny race between a BAST and an unlock message causes the NULL
dereference.

A node sends an unlock request to the master and receives a response.
Before processing the response it receives a BAST from the master.  Since
the two messages are processed by different threads, this creates a race:
while the BAST is being processed, the lock can be freed by the unlock
code.

This patch makes the BAST handler return immediately if the lock is found
but an unlock is pending.  The code should handle this race.  We also have
to fix the master node to skip sending a BAST after receiving an unlock
message.
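
The lookup rule described above can be sketched outside the kernel as
follows. struct demo_lock and bast_target() are hypothetical names for this
illustration; the real handler walks a list of struct dlm_lock entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified lock record; not the real struct dlm_lock. */
struct demo_lock {
	unsigned long long cookie;
	bool unlock_pending;
};

/*
 * Return the lock a BAST should be delivered to, or NULL when the BAST must
 * be ignored: either the cookie is unknown, or the lock is found but an
 * unlock is already in flight and may free it at any moment.
 */
static struct demo_lock *bast_target(struct demo_lock *granted, size_t n,
				     unsigned long long cookie)
{
	assert(granted != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (granted[i].cookie != cookie)
			continue;
		if (granted[i].unlock_pending)
			return NULL;	/* unlock wins the race, drop the BAST */
		return &granted[i];
	}
	return NULL;
}
```

Returning NULL for the "unlock pending" case keeps the caller from touching
a lock that the unlock path is about to free.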

Below is the crash stack

BUG: unable to handle kernel NULL pointer dereference at 0048
IP: [a015e023] o2dlm_blocking_ast_wrapper+0xd/0x16
[a034e3db] dlm_do_local_bast+0x8e/0x97 [ocfs2_dlm]
[a034f366] dlm_proxy_ast_handler+0x838/0x87e [ocfs2_dlm]
[a0308abe] o2net_process_message+0x395/0x5b8 [ocfs2_nodemanager]
[a030aac8] o2net_rx_until_empty+0x762/0x90d [ocfs2_nodemanager]
[81071802] worker_thread+0x14d/0x1ed

[a...@linux-foundation.org: coding-style fixes]
Signed-off-by: Srinivas Eeda srinivas.e...@oracle.com
Cc: Mark Fasheh mfas...@suse.com
Cc: Joel Becker jl...@evilplan.org
Signed-off-by: Andrew Morton a...@linux-foundation.org
---

 fs/ocfs2/dlm/dlmast.c |6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff -puN 
fs/ocfs2/dlm/dlmast.c~o2dlm-fix-null-pointer-dereference-in-o2dlm_blocking_ast_wrapper
 fs/ocfs2/dlm/dlmast.c
--- 
a/fs/ocfs2/dlm/dlmast.c~o2dlm-fix-null-pointer-dereference-in-o2dlm_blocking_ast_wrapper
+++ a/fs/ocfs2/dlm/dlmast.c
@@ -385,8 +385,12 @@ int dlm_proxy_ast_handler(struct o2net_m
 	head = &res->granted;
 
 	list_for_each_entry(lock, head, list) {
-		if (lock->ml.cookie == cookie)
+		/* if lock is found but unlock is pending ignore the bast */
+		if (lock->ml.cookie == cookie) {
+			if (lock->unlock_pending)
+				break;
 			goto do_ast;
+		}
 	}
 
 	mlog(0, "Got %sast for unknown lock! cookie=%u:%llu, name=%.*s, "
_



[Ocfs2-devel] [patch 07/15] ocfs2: fix journal commit deadlock

2014-12-15 Thread akpm
From: Junxiao Bi junxiao...@oracle.com
Subject: ocfs2: fix journal commit deadlock

For a buffered write, the page lock is taken in write_begin and released
in write_end.  In ocfs2_write_end_nolock(), before the page is unlocked in
ocfs2_free_write_ctxt(), ocfs2_run_deallocs() is called, which asks for
the read lock of journal->j_trans_barrier.  Holding the page lock while
asking for journal->j_trans_barrier breaks the locking order.

This causes a deadlock with the journal commit threads: ocfs2cmt takes the
write lock of journal->j_trans_barrier first, then wakes up kjournald2 to
do the commit work, and finally waits until it is done.  To commit the
journal, kjournald2 needs to flush data first, for which it needs the page
cache page lock.

Since some ocfs2 cluster locks are held by the writing process, this
deadlock may hang the whole cluster.

Unlocking the pages before ocfs2_run_deallocs() fixes the locking order.
The unlock is also placed before ocfs2_commit_trans() so that the page
lock is released before j_trans_barrier, preserving the unlock order.
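
The ordering rule can be illustrated with a tiny rank checker. This is a
sketch under assumptions: the ranks, struct held_locks and the write_end_*
helpers are hypothetical, and the model only encodes "j_trans_barrier
before page lock" as described above.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical lock ranks: a lock may only be taken while every lock
 * already held has a lower rank, so the barrier comes before the page lock.
 */
enum lock_rank { RANK_TRANS_BARRIER = 1, RANK_PAGE = 2 };

struct held_locks {
	unsigned int mask;	/* bit N set means rank N is currently held */
};

/* Returns false if taking 'rank' now would violate the ordering. */
static bool lock_take(struct held_locks *h, enum lock_rank rank)
{
	if (h->mask >> rank)	/* a lock of equal or higher rank is held */
		return false;
	h->mask |= 1u << rank;
	return true;
}

static void lock_drop(struct held_locks *h, enum lock_rank rank)
{
	assert(h->mask & (1u << rank));
	h->mask &= ~(1u << rank);
}

/* Old write_end tail: deallocs run while the page is still locked. */
static bool write_end_old(struct held_locks *h)
{
	bool ok;

	if (!lock_take(h, RANK_PAGE))		/* taken in write_begin */
		return false;
	ok = lock_take(h, RANK_TRANS_BARRIER);	/* ocfs2_run_deallocs() */
	if (ok)
		lock_drop(h, RANK_TRANS_BARRIER);
	lock_drop(h, RANK_PAGE);
	return ok;
}

/* Patched tail: pages are unlocked before the barrier is requested. */
static bool write_end_new(struct held_locks *h)
{
	bool ok;

	if (!lock_take(h, RANK_PAGE))
		return false;
	lock_drop(h, RANK_PAGE);		/* ocfs2_unlock_pages() */
	ok = lock_take(h, RANK_TRANS_BARRIER);
	if (ok)
		lock_drop(h, RANK_TRANS_BARRIER);
	return ok;
}
```

In this model the old sequence is rejected by the checker while the patched
sequence passes, mirroring the inversion the commit threads can hit.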

Signed-off-by: Junxiao Bi junxiao...@oracle.com
Reviewed-by: Wengang Wang wen.gang.w...@oracle.com
Cc: sta...@vger.kernel.org
Cc: Mark Fasheh mfas...@suse.com
Cc: Joel Becker jl...@evilplan.org
Signed-off-by: Andrew Morton a...@linux-foundation.org
---

 fs/ocfs2/aops.c |   16 ++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff -puN fs/ocfs2/aops.c~ocfs2-fix-journal-commit-deadlock fs/ocfs2/aops.c
--- a/fs/ocfs2/aops.c~ocfs2-fix-journal-commit-deadlock
+++ a/fs/ocfs2/aops.c
@@ -894,7 +894,7 @@ void ocfs2_unlock_and_free_pages(struct
}
 }
 
-static void ocfs2_free_write_ctxt(struct ocfs2_write_ctxt *wc)
+static void ocfs2_unlock_pages(struct ocfs2_write_ctxt *wc)
 {
int i;
 
@@ -915,7 +915,11 @@ static void ocfs2_free_write_ctxt(struct
 		page_cache_release(wc->w_target_page);
 	}
 	ocfs2_unlock_and_free_pages(wc->w_pages, wc->w_num_pages);
+}
 
+static void ocfs2_free_write_ctxt(struct ocfs2_write_ctxt *wc)
+{
+	ocfs2_unlock_pages(wc);
 	brelse(wc->w_di_bh);
 	kfree(wc);
 }
@@ -2041,11 +2045,19 @@ out_write_size:
 	ocfs2_journal_dirty(handle, wc->w_di_bh);
 
 out:
+	/* unlock pages before dealloc since it needs acquiring j_trans_barrier
+	 * lock, or it will cause a deadlock since journal commit threads holds
+	 * this lock and will ask for the page lock when flushing the data.
+	 * put it here to preserve the unlock order.
+	 */
+	ocfs2_unlock_pages(wc);
+
 	ocfs2_commit_trans(osb, handle);
 
 	ocfs2_run_deallocs(osb, &wc->w_dealloc);
 
-	ocfs2_free_write_ctxt(wc);
+	brelse(wc->w_di_bh);
+	kfree(wc);
 
 	return copied;
 }
_



[Ocfs2-devel] [patch 13/15] ocfs2: do not fallback to buffer I/O write if appending

2014-12-15 Thread akpm
From: Weiwei Wang wangww...@huawei.com
Subject: ocfs2: do not fallback to buffer I/O write if appending

Now we can do direct I/O and no longer fall back to buffered I/O in the
case of an append O_DIRECT write.
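
The changed condition can be condensed into a small predicate. This is a
hypothetical userspace sketch: keep_direct_io() does not exist in ocfs2, and
the other fallback checks in ocfs2_prepare_inode_for_write() are left out.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Decide whether a write ending at 'end' may stay on the direct I/O path.
 * 'full_coherency' mirrors the variable in the patch (true when the
 * buffered-coherency mount option is not set).
 */
static bool keep_direct_io(long long end, long long i_size, bool full_coherency)
{
	assert(end >= 0 && i_size >= 0);

	/* Appending write without full coherency: fall back to buffered I/O. */
	if (end > i_size && !full_coherency)
		return false;
	return true;
}
```

A write that stays within the current inode size is unaffected; only the
appending case now depends on the coherency mode.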

Signed-off-by: Weiwei Wang wangww...@huawei.com
Signed-off-by: Joseph Qi joseph...@huawei.com
Cc: Joel Becker jl...@evilplan.org
Cc: Mark Fasheh mfas...@suse.com
Signed-off-by: Andrew Morton a...@linux-foundation.org
---

 fs/ocfs2/file.c |5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff -puN 
fs/ocfs2/file.c~ocfs2-do-not-fallback-to-buffer-i-o-write-if-appending 
fs/ocfs2/file.c
--- a/fs/ocfs2/file.c~ocfs2-do-not-fallback-to-buffer-i-o-write-if-appending
+++ a/fs/ocfs2/file.c
@@ -2116,6 +2116,9 @@ static int ocfs2_prepare_inode_for_write
 	struct dentry *dentry = file->f_path.dentry;
 	struct inode *inode = dentry->d_inode;
 	loff_t saved_pos = 0, end;
+	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
+	int full_coherency = !(osb->s_mount_opt &
+			OCFS2_MOUNT_COHERENCY_BUFFERED);
 
/*
 * We start with a read level meta lock and only jump to an ex
@@ -2204,7 +2207,7 @@ static int ocfs2_prepare_inode_for_write
 * one node could wind up truncating another
 * nodes writes.
 */
-		if (end > i_size_read(inode)) {
+		if (end > i_size_read(inode) && !full_coherency) {
*direct_io = 0;
break;
}
_



[Ocfs2-devel] [patch 10/15] ocfs2: add orphan recovery types in ocfs2_recover_orphans

2014-12-15 Thread akpm
From: Joseph Qi joseph...@huawei.com
Subject: ocfs2: add orphan recovery types in ocfs2_recover_orphans

Define two orphan recovery types, which indicate whether the file needs to
be truncated or not.

Originally, only deleted inodes were added to the orphan dir.  We now also
use the orphan dir to temporarily store a file during an append O_DIRECT
write, to ensure that block allocation and the inode size update stay
consistent if the append O_DIRECT write fails.  So there may now be files
in the orphan dir that are not truly deleted.
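
One plausible reading of the two recovery types, as a userspace sketch.
Assumptions: ORPHAN_NO_NEED_TRUNCATE is the name used in this series, while
ORPHAN_NEED_TRUNCATE, struct orphan_entry and recover_orphan() are
hypothetical names chosen for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The two recovery types; the second name is an assumption. */
enum orphan_reco_type {
	ORPHAN_NO_NEED_TRUNCATE = 0,
	ORPHAN_NEED_TRUNCATE,
};

/* Hypothetical orphan dir entry. */
struct orphan_entry {
	bool dio_orphaned;	/* parked by an append O_DIRECT write */
	bool truncated;		/* set when recovery truncates the file */
	bool removed;		/* set when recovery removes the entry */
};

/* Returns true if the entry was cleaned up by this recovery pass. */
static bool recover_orphan(struct orphan_entry *e, enum orphan_reco_type type)
{
	assert(e != NULL);

	if (!e->dio_orphaned) {
		e->removed = true;	/* truly deleted file */
		return true;
	}
	if (type == ORPHAN_NEED_TRUNCATE) {
		e->truncated = true;	/* trim blocks past the inode size */
		e->removed = true;
		return true;
	}
	return false;	/* ORPHAN_NO_NEED_TRUNCATE: leave the entry alone */
}
```

The point of the type argument is that a file parked by an in-progress
append write must not be truncated by a recovery pass that was not
triggered by a crash.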

Signed-off-by: Weiwei Wang wangww...@huawei.com
Cc: Joel Becker jl...@evilplan.org
Cc: Mark Fasheh mfas...@suse.com
Signed-off-by: Andrew Morton a...@linux-foundation.org
---

 fs/ocfs2/journal.c |  113 +++
 fs/ocfs2/ocfs2.h   |   15 +
 2 files changed, 108 insertions(+), 20 deletions(-)

diff -puN 
fs/ocfs2/journal.c~ocfs2-add-orphan-recovery-types-in-ocfs2_recover_orphans 
fs/ocfs2/journal.c
--- 
a/fs/ocfs2/journal.c~ocfs2-add-orphan-recovery-types-in-ocfs2_recover_orphans
+++ a/fs/ocfs2/journal.c
@@ -50,6 +50,8 @@
 #include "sysfile.h"
 #include "uptodate.h"
 #include "quota.h"
+#include "file.h"
+#include "namei.h"
 
 #include "buffer_head_io.h"
 #include "ocfs2_trace.h"
@@ -69,13 +71,15 @@ static int ocfs2_journal_toggle_dirty(st
 static int ocfs2_trylock_journal(struct ocfs2_super *osb,
 int slot_num);
 static int ocfs2_recover_orphans(struct ocfs2_super *osb,
-int slot);
+int slot,
+enum ocfs2_orphan_reco_type orphan_reco_type);
 static int ocfs2_commit_thread(void *arg);
 static void ocfs2_queue_recovery_completion(struct ocfs2_journal *journal,
int slot_num,
struct ocfs2_dinode *la_dinode,
struct ocfs2_dinode *tl_dinode,
-   struct ocfs2_quota_recovery *qrec);
+   struct ocfs2_quota_recovery *qrec,
+					    enum ocfs2_orphan_reco_type orphan_reco_type);
 
 static inline int ocfs2_wait_on_mount(struct ocfs2_super *osb)
 {
@@ -149,7 +153,8 @@ int ocfs2_compute_replay_slots(struct oc
return 0;
 }
 
-void ocfs2_queue_replay_slots(struct ocfs2_super *osb)
+void ocfs2_queue_replay_slots(struct ocfs2_super *osb,
+   enum ocfs2_orphan_reco_type orphan_reco_type)
 {
 	struct ocfs2_replay_map *replay_map = osb->replay_map;
 	int i;
@@ -163,7 +168,8 @@ void ocfs2_queue_replay_slots(struct ocf
 	for (i = 0; i < replay_map->rm_slots; i++)
 		if (replay_map->rm_replay_slots[i])
 			ocfs2_queue_recovery_completion(osb->journal, i, NULL,
-							NULL, NULL);
+							NULL, NULL,
+							orphan_reco_type);
 	replay_map->rm_state = REPLAY_DONE;
 }
 
@@ -1174,6 +1180,7 @@ struct ocfs2_la_recovery_item {
struct ocfs2_dinode *lri_la_dinode;
struct ocfs2_dinode *lri_tl_dinode;
struct ocfs2_quota_recovery *lri_qrec;
+   enum ocfs2_orphan_reco_type  lri_orphan_reco_type;
 };
 
 /* Does the second half of the recovery process. By this point, the
@@ -1195,6 +1202,7 @@ void ocfs2_complete_recovery(struct work
struct ocfs2_dinode *la_dinode, *tl_dinode;
struct ocfs2_la_recovery_item *item, *n;
struct ocfs2_quota_recovery *qrec;
+   enum ocfs2_orphan_reco_type orphan_reco_type;
LIST_HEAD(tmp_la_list);
 
trace_ocfs2_complete_recovery(
@@ -1212,6 +1220,7 @@ void ocfs2_complete_recovery(struct work
 		la_dinode = item->lri_la_dinode;
 		tl_dinode = item->lri_tl_dinode;
 		qrec = item->lri_qrec;
+		orphan_reco_type = item->lri_orphan_reco_type;
 
 		trace_ocfs2_complete_recovery_slot(item->lri_slot,
 			la_dinode ? le64_to_cpu(la_dinode->i_blkno) : 0,
@@ -1236,7 +1245,8 @@ void ocfs2_complete_recovery(struct work
 			kfree(tl_dinode);
 		}
 
-		ret = ocfs2_recover_orphans(osb, item->lri_slot);
+		ret = ocfs2_recover_orphans(osb, item->lri_slot,
+				orphan_reco_type);
 		if (ret < 0)
 			mlog_errno(ret);
 
@@ -1261,7 +1271,8 @@ static void ocfs2_queue_recovery_complet
int slot_num,
struct ocfs2_dinode *la_dinode,
struct ocfs2_dinode *tl_dinode,
-   struct ocfs2_quota_recovery *qrec)
+   struct ocfs2_quota_recovery *qrec,
+   enum 

[Ocfs2-devel] [patch 15/15] ocfs2: fix leftover orphan entry caused by append O_DIRECT write crash

2014-12-15 Thread akpm
From: Joseph Qi joseph...@huawei.com
Subject: ocfs2: fix leftover orphan entry caused by append O_DIRECT write crash

If one node crashes and leaves an orphan entry behind, another node that
does an append O_DIRECT write to the same file will overwrite
i_dio_orphaned_slot, and the old entry will then never be cleaned up.  If
this happens, we let the writer wait for orphan recovery first.
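
The wait-then-retry behaviour can be sketched single-threaded as follows.
Everything here is hypothetical (struct demo_inode, run_recovery(),
add_to_orphan()); the kernel sleeps on a wait queue, which this sketch
replaces with a direct call to the recovery stand-in.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical inode state for this sketch. */
struct demo_inode {
	bool dio_orphaned;	/* orphan entry currently recorded */
	int orphan_slot;	/* slot recorded for that entry */
	int recoveries;		/* recovery passes run so far */
};

/* Stand-in for orphan recovery finishing on some node. */
static void run_recovery(struct demo_inode *in)
{
	in->dio_orphaned = false;
	in->orphan_slot = -1;
	in->recoveries++;
}

/*
 * Add the inode to 'slot' for an append O_DIRECT write. If a stale entry is
 * present, wait for recovery and retry instead of overwriting the slot.
 */
static int add_to_orphan(struct demo_inode *in, int slot)
{
	assert(slot >= 0);
restart:
	if (in->dio_orphaned) {
		run_recovery(in);	/* the kernel would sleep here */
		goto restart;
	}
	in->dio_orphaned = true;
	in->orphan_slot = slot;
	return 0;
}
```

The stale slot is only replaced after a recovery pass has cleaned it, so no
orphan entry is left behind without an owner.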

Signed-off-by: Joseph Qi joseph...@huawei.com
Cc: Weiwei Wang wangww...@huawei.com
Cc: Joel Becker jl...@evilplan.org
Cc: Mark Fasheh mfas...@suse.com
Signed-off-by: Andrew Morton a...@linux-foundation.org
---

 fs/ocfs2/inode.h   |2 ++
 fs/ocfs2/journal.c |2 ++
 fs/ocfs2/namei.c   |   37 +++--
 fs/ocfs2/super.c   |2 ++
 4 files changed, 41 insertions(+), 2 deletions(-)

diff -puN 
fs/ocfs2/inode.h~ocfs2-fix-leftover-orphan-entry-caused-by-append-o_direct-write-crash
 fs/ocfs2/inode.h
--- 
a/fs/ocfs2/inode.h~ocfs2-fix-leftover-orphan-entry-caused-by-append-o_direct-write-crash
+++ a/fs/ocfs2/inode.h
@@ -81,6 +81,8 @@ struct ocfs2_inode_info
tid_t i_sync_tid;
tid_t i_datasync_tid;
 
+   wait_queue_head_t append_dio_wq;
+
struct dquot *i_dquot[MAXQUOTAS];
 };
 
diff -puN 
fs/ocfs2/journal.c~ocfs2-fix-leftover-orphan-entry-caused-by-append-o_direct-write-crash
 fs/ocfs2/journal.c
--- 
a/fs/ocfs2/journal.c~ocfs2-fix-leftover-orphan-entry-caused-by-append-o_direct-write-crash
+++ a/fs/ocfs2/journal.c
@@ -2210,6 +2210,8 @@ static int ocfs2_recover_orphans(struct
ret = ocfs2_del_inode_from_orphan(osb, inode, 0, 0);
if (ret)
mlog_errno(ret);
+
+			wake_up(&OCFS2_I(inode)->append_dio_wq);
} /* else if ORPHAN_NO_NEED_TRUNCATE, do nothing */
 
 next:
diff -puN 
fs/ocfs2/namei.c~ocfs2-fix-leftover-orphan-entry-caused-by-append-o_direct-write-crash
 fs/ocfs2/namei.c
--- 
a/fs/ocfs2/namei.c~ocfs2-fix-leftover-orphan-entry-caused-by-append-o_direct-write-crash
+++ a/fs/ocfs2/namei.c
@@ -2654,6 +2654,26 @@ leave:
return status;
 }
 
+static int ocfs2_dio_orphan_recovered(struct inode *inode)
+{
+	int ret;
+	struct buffer_head *di_bh = NULL;
+	struct ocfs2_dinode *di = NULL;
+
+	ret = ocfs2_inode_lock(inode, &di_bh, 1);
+	if (ret < 0) {
+		mlog_errno(ret);
+		return 0;
+	}
+
+	di = (struct ocfs2_dinode *) di_bh->b_data;
+	ret = !(di->i_flags & cpu_to_le32(OCFS2_DIO_ORPHANED_FL));
+	ocfs2_inode_unlock(inode, 1);
+	brelse(di_bh);
+
+	return ret;
+}
+
 int ocfs2_add_inode_to_orphan(struct ocfs2_super *osb,
struct inode *inode)
 {
@@ -2666,12 +2686,26 @@ int ocfs2_add_inode_to_orphan(struct ocf
struct ocfs2_dinode *di = NULL;
bool orphaned = false;
 
+restart:
 	status = ocfs2_inode_lock(inode, &di_bh, 1);
 	if (status < 0) {
 		mlog_errno(status);
 		goto bail;
 	}
 
+	di = (struct ocfs2_dinode *) di_bh->b_data;
+	/*
+	 * Another append dio crashed?
+	 * If so, wait for recovery first.
+	 */
+	if (unlikely(di->i_flags & cpu_to_le32(OCFS2_DIO_ORPHANED_FL))) {
+		ocfs2_inode_unlock(inode, 1);
+		brelse(di_bh);
+		wait_event_interruptible(OCFS2_I(inode)->append_dio_wq,
+				ocfs2_dio_orphan_recovered(inode));
+		goto restart;
+	}
+
 	status = ocfs2_dio_prepare_orphan_dir(osb, &orphan_dir_inode,
 			OCFS2_I(inode)->ip_blkno,
 			orphan_name,
@@ -2684,8 +2718,7 @@ int ocfs2_add_inode_to_orphan(struct ocf
 				"orphan dir %llu.\n",
 				OCFS2_I(inode)->ip_blkno,
 				OCFS2_I(orphan_dir_inode)->ip_blkno);
-		di = (struct ocfs2_dinode *) di_bh->b_data;
-		if (!(di->i_flags & le32_to_cpu(OCFS2_ORPHANED_FL))) {
+		if (!(di->i_flags & cpu_to_le32(OCFS2_ORPHANED_FL))) {
 			mlog_errno(status);
 			goto bail_unlock_orphan;
 		}
diff -puN 
fs/ocfs2/super.c~ocfs2-fix-leftover-orphan-entry-caused-by-append-o_direct-write-crash
 fs/ocfs2/super.c
--- 
a/fs/ocfs2/super.c~ocfs2-fix-leftover-orphan-entry-caused-by-append-o_direct-write-crash
+++ a/fs/ocfs2/super.c
@@ -1768,6 +1768,8 @@ static void ocfs2_inode_init_once(void *
 	ocfs2_lock_res_init_once(&oi->ip_inode_lockres);
 	ocfs2_lock_res_init_once(&oi->ip_open_lockres);
 
+	init_waitqueue_head(&oi->append_dio_wq);
+
 	ocfs2_metadata_cache_init(INODE_CACHE(&oi->vfs_inode),
 				  &ocfs2_inode_caching_ops);
 
_



[Ocfs2-devel] [patch 02/15] ocfs2: free inode when i_count becomes zero

2014-12-15 Thread akpm
From: Xue jiufei xuejiu...@huawei.com
Subject: ocfs2: free inode when i_count becomes zero

Disk inode deletion may be heavily delayed when one node unlinks a file
after the same dentry has been freed on another node (say N1) because of
memory shrinking, while the inode is left in memory.  This inode can only
be freed when N1 does the orphan scan work.

However, N1 may skip the orphan scan several times because other nodes may
do the work earlier.  In our tests, it may take 1 hour on a 4-node
cluster, which causes a bad user experience.  So we think the inode should
be freed when i_count becomes zero to avoid such circumstances.

[a...@linux-foundation.org: coding-style fixes]
Signed-off-by: joyce.xue xuejiu...@huawei.com
Cc: Mark Fasheh mfas...@suse.com
Cc: Joel Becker jl...@evilplan.org
Signed-off-by: Andrew Morton a...@linux-foundation.org
---

 fs/ocfs2/inode.c |   10 +-
 1 file changed, 1 insertion(+), 9 deletions(-)

diff -puN fs/ocfs2/inode.c~ocfs2-free-inode-when-i_count-becomes-zero 
fs/ocfs2/inode.c
--- a/fs/ocfs2/inode.c~ocfs2-free-inode-when-i_count-becomes-zero
+++ a/fs/ocfs2/inode.c
@@ -1191,17 +1191,9 @@ void ocfs2_evict_inode(struct inode *ino
 int ocfs2_drop_inode(struct inode *inode)
 {
 	struct ocfs2_inode_info *oi = OCFS2_I(inode);
-	int res;
-
 	trace_ocfs2_drop_inode((unsigned long long)oi->ip_blkno,
 				inode->i_nlink, oi->ip_flags);
-
-	if (oi->ip_flags & OCFS2_INODE_MAYBE_ORPHANED)
-		res = 1;
-	else
-		res = generic_drop_inode(inode);
-
-	return res;
+	return 1;
 }
 
 /*
_



[Ocfs2-devel] [patch 08/15] ocfs2: eliminate the static flag of some functions

2014-12-15 Thread akpm
From: Weiwei Wang wangww...@huawei.com
Subject: ocfs2: eliminate the static flag of some functions

Currently, in the case of an append O_DIRECT write (block not allocated
yet), ocfs2 falls back to buffered I/O.  This has some disadvantages.
Firstly, it is not the expected behavior.

Secondly, it consumes a huge amount of page cache, e.g. in a mass backup
scenario.  Thirdly, modern filesystems such as ext4 support this feature.

In this patch set, the direct I/O write no longer falls back to a buffered
I/O write, because block allocation is now enabled in direct I/O.


This patch (of 7):

Remove the static qualifier from some functions which will be used in
append O_DIRECT write.

Signed-off-by: Weiwei Wang wangww...@huawei.com
Signed-off-by: Joseph Qi joseph...@huawei.com
Cc: Joel Becker jl...@evilplan.org
Cc: Mark Fasheh mfas...@suse.com
Signed-off-by: Andrew Morton a...@linux-foundation.org
---

 fs/ocfs2/file.c |   11 +--
 fs/ocfs2/file.h |9 +
 2 files changed, 18 insertions(+), 2 deletions(-)

diff -puN fs/ocfs2/file.c~ocfs2-eliminate-the-static-flag-of-some-functions 
fs/ocfs2/file.c
--- a/fs/ocfs2/file.c~ocfs2-eliminate-the-static-flag-of-some-functions
+++ a/fs/ocfs2/file.c
@@ -295,7 +295,7 @@ out:
return ret;
 }
 
-static int ocfs2_set_inode_size(handle_t *handle,
+int ocfs2_set_inode_size(handle_t *handle,
struct inode *inode,
struct buffer_head *fe_bh,
u64 new_i_size)
@@ -441,7 +441,7 @@ out:
return status;
 }
 
-static int ocfs2_truncate_file(struct inode *inode,
+int ocfs2_truncate_file(struct inode *inode,
   struct buffer_head *di_bh,
   u64 new_i_size)
 {
@@ -709,6 +709,13 @@ leave:
return status;
 }
 
+int ocfs2_extend_allocation(struct inode *inode, u32 logical_start,
+   u32 clusters_to_add, int mark_unwritten)
+{
+   return __ocfs2_extend_allocation(inode, logical_start,
+   clusters_to_add, mark_unwritten);
+}
+
 /*
  * While a write will already be ordering the data, a truncate will not.
  * Thus, we need to explicitly order the zeroed pages.
diff -puN fs/ocfs2/file.h~ocfs2-eliminate-the-static-flag-of-some-functions 
fs/ocfs2/file.h
--- a/fs/ocfs2/file.h~ocfs2-eliminate-the-static-flag-of-some-functions
+++ a/fs/ocfs2/file.h
@@ -51,13 +51,22 @@ int ocfs2_add_inode_data(struct ocfs2_su
 struct ocfs2_alloc_context *data_ac,
 struct ocfs2_alloc_context *meta_ac,
 enum ocfs2_alloc_restarted *reason_ret);
+int ocfs2_set_inode_size(handle_t *handle,
+   struct inode *inode,
+   struct buffer_head *fe_bh,
+   u64 new_i_size);
 int ocfs2_simple_size_update(struct inode *inode,
 struct buffer_head *di_bh,
 u64 new_i_size);
+int ocfs2_truncate_file(struct inode *inode,
+   struct buffer_head *di_bh,
+   u64 new_i_size);
 int ocfs2_extend_no_holes(struct inode *inode, struct buffer_head *di_bh,
  u64 new_i_size, u64 zero_to);
 int ocfs2_zero_extend(struct inode *inode, struct buffer_head *di_bh,
  loff_t zero_to);
+int ocfs2_extend_allocation(struct inode *inode, u32 logical_start,
+   u32 clusters_to_add, int mark_unwritten);
 int ocfs2_setattr(struct dentry *dentry, struct iattr *attr);
 int ocfs2_getattr(struct vfsmount *mnt, struct dentry *dentry,
  struct kstat *stat);
_



[Ocfs2-devel] [patch 05/15] ocfs2/dlm: fix race between dispatched_work and dlm_lockres_grab_inflight_worker

2014-12-15 Thread akpm
From: Joseph Qi joseph...@huawei.com
Subject: ocfs2/dlm: fix race between dispatched_work and 
dlm_lockres_grab_inflight_worker

ac4fef4d23ed ("ocfs2/dlm: do not purge lockres that is queued for assert
master") may have the following possible race case:

dlm_dispatch_assert_master           dlm_wq

queue_work(dlm->quedlm_worker,
        &dlm->dispatched_work);
                                     dispatch work,
                                     dlm_lockres_drop_inflight_worker
                                     *BUG_ON(res->inflight_assert_workers == 0)*
dlm_lockres_grab_inflight_worker
inflight_assert_workers++

So ensure that inflight_assert_workers is incremented first.
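
The race and the fix can be modelled without threads. This sketch rests on
assumptions: struct demo_res and the helpers are hypothetical, a boolean
stands in for res->spinlock, and the worker is assumed to need that lock
before it drops its reference.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock resource; only the fields this sketch needs. */
struct demo_res {
	bool locked;			/* stands in for res->spinlock */
	int inflight_assert_workers;
	bool work_queued;
};

/*
 * Worker side. Returns 1 if it ran, 0 if it has to wait for the lock,
 * -1 if it would underflow the counter (the BUG_ON in the race above).
 */
static int worker_try_drop(struct demo_res *res)
{
	if (!res->work_queued || res->locked)
		return 0;
	if (res->inflight_assert_workers == 0)
		return -1;
	res->inflight_assert_workers--;
	res->work_queued = false;
	return 1;
}

/* Unpatched order: queue, the worker may run here, then grab. */
static int dispatch_unlocked(struct demo_res *res)
{
	int r;

	res->work_queued = true;
	r = worker_try_drop(res);	/* worker wins the race */
	res->inflight_assert_workers++;
	return r;
}

/* Patched order: queue and grab under the lock; the worker runs afterwards. */
static int dispatch_locked(struct demo_res *res)
{
	int r;

	res->locked = true;
	res->work_queued = true;
	r = worker_try_drop(res);	/* blocked by the lock */
	assert(r == 0);
	res->inflight_assert_workers++;
	res->locked = false;
	return worker_try_drop(res);
}
```

Holding the lock across both the queueing and the increment is what keeps
the worker from observing a zero counter.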

Signed-off-by: Joseph Qi joseph...@huawei.com
Signed-off-by: Xue jiufei xuejiu...@huawei.com
Cc: Joel Becker jl...@evilplan.org
Cc: Mark Fasheh mfas...@suse.com
Signed-off-by: Andrew Morton a...@linux-foundation.org
---

 fs/ocfs2/dlm/dlmmaster.c |   12 +++-
 1 file changed, 3 insertions(+), 9 deletions(-)

diff -puN 
fs/ocfs2/dlm/dlmmaster.c~ocfs2-dlm-fix-race-between-dispatched_work-and-dlm_lockres_grab_inflight_worker
 fs/ocfs2/dlm/dlmmaster.c
--- 
a/fs/ocfs2/dlm/dlmmaster.c~ocfs2-dlm-fix-race-between-dispatched_work-and-dlm_lockres_grab_inflight_worker
+++ a/fs/ocfs2/dlm/dlmmaster.c
@@ -685,14 +685,6 @@ void __dlm_lockres_grab_inflight_worker(
 	     res->inflight_assert_workers);
 }
 
-static void dlm_lockres_grab_inflight_worker(struct dlm_ctxt *dlm,
-		struct dlm_lock_resource *res)
-{
-	spin_lock(&res->spinlock);
-	__dlm_lockres_grab_inflight_worker(dlm, res);
-	spin_unlock(&res->spinlock);
-}
-
 static void __dlm_lockres_drop_inflight_worker(struct dlm_ctxt *dlm,
struct dlm_lock_resource *res)
 {
@@ -1636,6 +1628,7 @@ send_response:
}
 		mlog(0, "%u is the owner of %.*s, cleaning everyone else\n",
 		     dlm->node_num, res->lockname.len, res->lockname.name);
+		spin_lock(&res->spinlock);
 		ret = dlm_dispatch_assert_master(dlm, res, 0, request->node_idx,
 						 DLM_ASSERT_MASTER_MLE_CLEANUP);
 		if (ret < 0) {
@@ -1643,7 +1636,8 @@ send_response:
 			response = DLM_MASTER_RESP_ERROR;
 			dlm_lockres_put(res);
 		} else
-			dlm_lockres_grab_inflight_worker(dlm, res);
+			__dlm_lockres_grab_inflight_worker(dlm, res);
+		spin_unlock(&res->spinlock);
} else {
if (res)
dlm_lockres_put(res);
_



[Ocfs2-devel] [patch 03/15] ocfs2: call ocfs2_journal_access_di() before ocfs2_journal_dirty() in ocfs2_write_end_nolock()

2014-12-15 Thread akpm
From: yangwenfang vicky.yangwenf...@huawei.com
Subject: ocfs2: call ocfs2_journal_access_di() before ocfs2_journal_dirty() in 
ocfs2_write_end_nolock()

After we call ocfs2_journal_access_di() in ocfs2_write_begin(),
jbd2_journal_restart() may also be called.  In that function transaction
A's t_updates is decremented and the handle obtains a new transaction B.
If jbd2_journal_commit_transaction() happens to be committing transaction
A, then once t_updates == 0 it goes on to complete the commit and unfile
the buffer.

So when jbd2_journal_dirty_metadata() is called, the handle points to the
new transaction B and the buffer head's journal head has already been
freed (jh->b_transaction == NULL, jh->b_next_transaction == NULL), so it
returns EINVAL, which triggers the BUG_ON(status).

thread 1:                                   jbd2:
ocfs2_write_begin                           jbd2_journal_commit_transaction
ocfs2_write_begin_nolock
  ocfs2_start_trans
    jbd2__journal_start(t_updates+1,
                        transaction A)
  ocfs2_journal_access_di
  ocfs2_write_cluster_by_desc
    ocfs2_mark_extent_written
      ocfs2_change_extent_flag
        ocfs2_split_extent
          ocfs2_extend_rotate_transaction
            jbd2_journal_restart
            (t_updates-1,transaction B)     t_updates==0
                                            __jbd2_journal_refile_buffer

ocfs2_write_end
ocfs2_write_end_nolock
  ocfs2_journal_dirty
    jbd2_journal_dirty_metadata(bug)
  ocfs2_commit_trans

In ext4, I found that jbd2_journal_get_write_access() is called by
ext4_write_end:

ext4_write_begin
ext4_journal_start
__ext4_journal_start_sb
ext4_journal_check_start
jbd2__journal_start

ext4_write_end
ext4_mark_inode_dirty
ext4_reserve_inode_write
ext4_journal_get_write_access
jbd2_journal_get_write_access
ext4_mark_iloc_dirty
ext4_do_update_inode
ext4_handle_dirty_metadata
jbd2_journal_dirty_metadata


So I think we should put ocfs2_journal_access_di() before
ocfs2_journal_dirty() in ocfs2_write_end().  It works well after my
modification.
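
Why access and dirty need to happen in the same transaction can be shown
with a toy journal. This is a deliberately simplified model, not jbd2: the
toy_* names are hypothetical, and toy_restart() assumes the worst case in
which the old transaction commits as soon as the handle leaves it.

```c
#include <assert.h>
#include <stddef.h>

/* A handle is attached to one transaction id at a time. */
struct toy_handle { int tid; };

/* A buffer remembers the transaction it got write access in (0 = none). */
struct toy_buffer { int access_tid; };

static void toy_access(const struct toy_handle *h, struct toy_buffer *b)
{
	b->access_tid = h->tid;
}

/* Restart moves the handle to a new transaction; the old one commits. */
static void toy_restart(struct toy_handle *h, struct toy_buffer *b)
{
	if (b->access_tid == h->tid)
		b->access_tid = 0;	/* buffer unfiled by the commit */
	h->tid++;
}

/* Dirtying succeeds only if access was obtained in the current transaction. */
static int toy_dirty(const struct toy_handle *h, const struct toy_buffer *b)
{
	assert(h != NULL && b != NULL);
	return b->access_tid == h->tid ? 0 : -22;	/* -EINVAL */
}
```

Calling toy_access() right before toy_dirty(), after any restart, is the
toy equivalent of moving ocfs2_journal_access_di() into the write_end path.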

Signed-off-by: vicky vicky.yangwenf...@huawei.com
Cc: Mark Fasheh mfas...@suse.com
Cc: Joel Becker jl...@evilplan.org
Signed-off-by: Andrew Morton a...@linux-foundation.org
---

 fs/ocfs2/aops.c |   21 ++---
 1 file changed, 10 insertions(+), 11 deletions(-)

diff -puN 
fs/ocfs2/aops.c~ocfs2-call-ocfs2_journal_access_di-before-ocfs2_journal_dirty-in-ocfs2_write_end_nolock
 fs/ocfs2/aops.c
--- 
a/fs/ocfs2/aops.c~ocfs2-call-ocfs2_journal_access_di-before-ocfs2_journal_dirty-in-ocfs2_write_end_nolock
+++ a/fs/ocfs2/aops.c
@@ -1818,16 +1818,6 @@ try_again:
if (ret)
goto out_commit;
}
-   /*
-* We don't want this to fail in ocfs2_write_end(), so do it
-* here.
-*/
-	ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), wc->w_di_bh,
-				      OCFS2_JOURNAL_ACCESS_WRITE);
-   if (ret) {
-   mlog_errno(ret);
-   goto out_quota;
-   }
 
/*
 * Fill our page array first. That way we've grabbed enough so
@@ -1978,7 +1968,7 @@ int ocfs2_write_end_nolock(struct addres
   loff_t pos, unsigned len, unsigned copied,
   struct page *page, void *fsdata)
 {
-   int i;
+   int i, ret;
 	unsigned from, to, start = pos & (PAGE_CACHE_SIZE - 1);
 	struct inode *inode = mapping->host;
 	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
@@ -2028,6 +2018,14 @@ int ocfs2_write_end_nolock(struct addres
 		}
 	}
 
+	ret = ocfs2_journal_access_di(handle, INODE_CACHE(inode), wc->w_di_bh,
+				      OCFS2_JOURNAL_ACCESS_WRITE);
+	if (ret) {
+		copied = ret;
+		mlog_errno(ret);
+		goto out;
+	}
+
 out_write_size:
 	pos += copied;
 	if (pos > i_size_read(inode)) {
@@ -2042,6 +2040,7 @@ out_write_size:
 	ocfs2_update_inode_fsync_trans(handle, inode, 1);
 	ocfs2_journal_dirty(handle, wc->w_di_bh);
 
+out:
 	ocfs2_commit_trans(osb, handle);
 
 	ocfs2_run_deallocs(osb, &wc->w_dealloc);
_



[Ocfs2-devel] [patch 11/15] ocfs2: implement ocfs2_direct_IO_write

2014-12-15 Thread akpm
From: Weiwei Wang wangww...@huawei.com
Subject: ocfs2: implement ocfs2_direct_IO_write

Implement ocfs2_direct_IO_write.  Add the inode to the orphan dir first, and
then delete it once the appending O_DIRECT write has finished.

This is to make sure block allocation and inode size are consistent.

[joseph...@huawei.com: fix brelse warning if ocfs2_direct_IO_get_blocks failed]
Signed-off-by: Weiwei Wang wangww...@huawei.com
Signed-off-by: Joseph Qi joseph...@huawei.com
Signed-off-by: Joseph Qi joseph...@huawei.com
Cc: Joel Becker jl...@evilplan.org
Cc: Mark Fasheh mfas...@suse.com
Cc: alex chen alex.c...@huawei.com
Signed-off-by: Andrew Morton a...@linux-foundation.org
---

 fs/ocfs2/aops.c |  192 +-
 1 file changed, 189 insertions(+), 3 deletions(-)

diff -puN fs/ocfs2/aops.c~ocfs2-implement-ocfs2_direct_io_write fs/ocfs2/aops.c
--- a/fs/ocfs2/aops.c~ocfs2-implement-ocfs2_direct_io_write
+++ a/fs/ocfs2/aops.c
@@ -28,6 +28,7 @@
 #include <linux/pipe_fs_i.h>
 #include <linux/mpage.h>
 #include <linux/quotaops.h>
+#include <linux/blkdev.h>
 
 #include <cluster/masklog.h>
 
@@ -47,6 +48,9 @@
 #include "ocfs2_trace.h"
 
 #include "buffer_head_io.h"
+#include "dir.h"
+#include "namei.h"
+#include "sysfile.h"
 
 static int ocfs2_symlink_get_block(struct inode *inode, sector_t iblock,
   struct buffer_head *bh_result, int create)
@@ -597,6 +601,180 @@ static int ocfs2_releasepage(struct page
return try_to_free_buffers(page);
 }
 
+static int ocfs2_is_overwrite(struct ocfs2_super *osb,
+   struct inode *inode , loff_t offset)
+{
+   int ret = 0;
+   u32 v_cpos = 0;
+   u32 p_cpos = 0;
+   unsigned int num_clusters = 0;
+   unsigned int ext_flags = 0;
+
+   v_cpos = ocfs2_bytes_to_clusters(osb->sb, offset);
+   ret = ocfs2_get_clusters(inode, v_cpos, &p_cpos,
+   &num_clusters, &ext_flags);
+   if (ret < 0) {
+   mlog_errno(ret);
+   return ret;
+   }
+
+   if (p_cpos && !(ext_flags & OCFS2_EXT_UNWRITTEN))
+   return 1;
+
+   return 0;
+}
+
+static ssize_t ocfs2_direct_IO_write(struct kiocb *iocb,
+   struct iov_iter *iter,
+   loff_t offset)
+{
+   ssize_t ret = 0;
+   ssize_t written = 0;
+   bool orphaned = false;
+   int is_overwrite = 0;
+   struct file *file = iocb->ki_filp;
+   struct inode *inode = file_inode(file)->i_mapping->host;
+   struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
+   struct buffer_head *di_bh = NULL;
+   size_t count = iter->count;
+   journal_t *journal = osb->journal->j_journal;
+   u32 zero_len;
+   int cluster_align;
+   loff_t final_size = offset + count;
+   int append_write = offset >= i_size_read(inode) ? 1 : 0;
+   unsigned int num_clusters = 0;
+   unsigned int ext_flags = 0;
+
+   {
+   u64 o = offset;
+
+   zero_len = do_div(o, 1 << osb->s_clustersize_bits);
+   cluster_align = !!zero_len;
+   }
+
+   /*
+* when final_size > inode->i_size, inode->i_size will be
+* updated after direct write, so add the inode to orphan
+* dir first.
+*/
+   if (final_size > i_size_read(inode)) {
+   ret = ocfs2_add_inode_to_orphan(osb, inode);
+   if (ret < 0)
+   goto out;
+   orphaned = true;
+   }
+
+   if (append_write) {
+   ret = ocfs2_inode_lock(inode, &di_bh, 1);
+   if (ret < 0) {
+   mlog_errno(ret);
+   goto clean_orphan;
+   }
+
+   if (ocfs2_sparse_alloc(OCFS2_SB(inode->i_sb)))
+   ret = ocfs2_zero_extend(inode, di_bh, offset);
+   else
+   ret = ocfs2_extend_no_holes(inode, di_bh, offset, offset);
+   if (ret < 0) {
+   mlog_errno(ret);
+   ocfs2_inode_unlock(inode, 1);
+   brelse(di_bh);
+   goto clean_orphan;
+   }
+
+   is_overwrite = ocfs2_is_overwrite(osb, inode, offset);
+   if (is_overwrite < 0) {
+   mlog_errno(is_overwrite);
+   ocfs2_inode_unlock(inode, 1);
+   brelse(di_bh);
+   goto clean_orphan;
+   }
+
+   ocfs2_inode_unlock(inode, 1);
+   brelse(di_bh);
+   di_bh = NULL;
+   }
+
+   written = __blockdev_direct_IO(WRITE, iocb, inode, inode->i_sb->s_bdev,
+   iter, offset,
+   ocfs2_direct_IO_get_blocks,
+   ocfs2_dio_end_io, NULL, 0);
+   if (unlikely(written < 0)) {
+   loff_t i_size = i_size_read(inode);
+
+   if (offset + count > i_size) {
+   ret = ocfs2_inode_lock(inode, &di_bh, 1);
+   
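
The checks at the top of ocfs2_direct_IO_write (cluster alignment of the offset, whether the write appends, and whether it grows the file and so needs the orphan entry) can be summarised in a small userspace sketch. The struct and function names below are hypothetical and exist only for this illustration. Since the cluster size is a power of two, the remainder that the patch takes with do_div() is computed here with a mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of the decisions made before the direct write. */
struct dio_write_plan {
	uint32_t zero_len;	/* bytes between the cluster start and offset */
	bool cluster_unaligned;	/* offset is not on a cluster boundary */
	bool append_write;	/* write starts at or beyond the current i_size */
	bool needs_orphan;	/* write ends beyond i_size, so i_size will grow */
};

static struct dio_write_plan plan_dio_write(uint64_t offset, uint64_t count,
					    uint64_t i_size,
					    unsigned int clustersize_bits)
{
	struct dio_write_plan plan;
	uint64_t cluster_size;

	assert(clustersize_bits < 32);
	cluster_size = UINT64_C(1) << clustersize_bits;

	/* Remainder of offset / cluster_size; valid for power-of-two sizes. */
	plan.zero_len = (uint32_t)(offset & (cluster_size - 1));
	plan.cluster_unaligned = plan.zero_len != 0;
	plan.append_write = offset >= i_size;
	plan.needs_orphan = offset + count > i_size;
	return plan;
}
```

For example, with 1 MiB clusters, an 8 KiB write at offset 1 MiB + 4 KiB into a 1 MiB file gives zero_len = 4096, and the write is both an append and a size-extending write.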

[Ocfs2-devel] [patch 12/15] ocfs2: allocate blocks in ocfs2_direct_IO_get_blocks

2014-12-15 Thread akpm
From: Weiwei Wang wangww...@huawei.com
Subject: ocfs2: allocate blocks in ocfs2_direct_IO_get_blocks

Allow block allocation in ocfs2_direct_IO_get_blocks.

Signed-off-by: Weiwei Wang wangww...@huawei.com
Signed-off-by: Joseph Qi joseph...@huawei.com
Cc: Joel Becker jl...@evilplan.org
Cc: Mark Fasheh mfas...@suse.com
Signed-off-by: Andrew Morton a...@linux-foundation.org
---

 fs/ocfs2/aops.c |   45 ++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 42 insertions(+), 3 deletions(-)

diff -puN fs/ocfs2/aops.c~ocfs2-allocate-blocks-in-ocfs2_direct_io_get_blocks fs/ocfs2/aops.c
--- a/fs/ocfs2/aops.c~ocfs2-allocate-blocks-in-ocfs2_direct_io_get_blocks
+++ a/fs/ocfs2/aops.c
@@ -510,18 +510,21 @@ bail:
  *
  * called like this: dio->get_blocks(dio->inode, fs_startblk,
  * fs_count, map_bh, dio->rw == WRITE);
- *
- * Note that we never bother to allocate blocks here, and thus ignore the
- * create argument.
  */
 static int ocfs2_direct_IO_get_blocks(struct inode *inode, sector_t iblock,
 struct buffer_head *bh_result, int create)
 {
int ret;
+   u32 cpos = 0;
+   int alloc_locked = 0;
u64 p_blkno, inode_blocks, contig_blocks;
unsigned int ext_flags;
unsigned char blocksize_bits = inode->i_sb->s_blocksize_bits;
unsigned long max_blocks = bh_result->b_size >> inode->i_blkbits;
+   unsigned long len = bh_result->b_size;
+   unsigned int clusters_to_alloc = 0;
+
+   cpos = ocfs2_blocks_to_clusters(inode->i_sb, iblock);
 
/* This function won't even be called if the request isn't all
 * nicely aligned and of the right size, so there's no need
@@ -543,6 +546,40 @@ static int ocfs2_direct_IO_get_blocks(st
/* We should already CoW the refcounted extent in case of create. */
BUG_ON(create && (ext_flags & OCFS2_EXT_REFCOUNTED));
 
+   /* allocate blocks if no p_blkno is found, and create == 1 */
+   if (!p_blkno && create) {
+   ret = ocfs2_inode_lock(inode, NULL, 1);
+   if (ret < 0) {
+   mlog_errno(ret);
+   goto bail;
+   }
+
+   alloc_locked = 1;
+
+   /* fill hole, allocate blocks can't be larger than the size
+* of the hole */
+   clusters_to_alloc = ocfs2_clusters_for_bytes(inode->i_sb, len);
+   if (clusters_to_alloc > contig_blocks)
+   clusters_to_alloc = contig_blocks;
+
+   /* allocate extent and insert them into the extent tree */
+   ret = ocfs2_extend_allocation(inode, cpos,
+   clusters_to_alloc, 0);
+   if (ret < 0) {
+   mlog_errno(ret);
+   goto bail;
+   }
+
+   ret = ocfs2_extent_map_get_blocks(inode, iblock, &p_blkno,
+   &contig_blocks, &ext_flags);
+   if (ret < 0) {
+   mlog(ML_ERROR, "get_blocks() failed iblock=%llu\n",
+   (unsigned long long)iblock);
+   ret = -EIO;
+   goto bail;
+   }
+   }
+
/*
 * get_more_blocks() expects us to describe a hole by clearing
 * the mapped bit on bh_result().
@@ -560,6 +597,8 @@ static int ocfs2_direct_IO_get_blocks(st
contig_blocks = max_blocks;
bh_result->b_size = contig_blocks << blocksize_bits;
 bail:
+   if (alloc_locked)
+   ocfs2_inode_unlock(inode, 1);
return ret;
 }
 
_
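
The sizing step in this patch (round the request up to whole clusters, then never allocate more than the hole can hold) can be sketched in userspace as follows. The helper names are hypothetical, and the sketch expresses the hole length in clusters for simplicity, whereas the patch clamps against `contig_blocks`.

```c
#include <assert.h>
#include <stdint.h>

/* Round a byte count up to whole clusters (assumed userspace analogue). */
static uint32_t clusters_for_bytes(uint64_t bytes, unsigned int clustersize_bits)
{
	assert(clustersize_bits < 32);
	return (uint32_t)((bytes + (UINT64_C(1) << clustersize_bits) - 1)
			  >> clustersize_bits);
}

/* Clamp the allocation so that it is never larger than the hole. */
static uint32_t clusters_to_alloc(uint64_t request_bytes, uint32_t hole_len,
				  unsigned int clustersize_bits)
{
	uint32_t want = clusters_for_bytes(request_bytes, clustersize_bits);

	return want > hole_len ? hole_len : want;
}
```

With 4 KiB clusters, a 5000-byte request needs two clusters, while a 1 MiB request into a three-cluster hole is clamped to three.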



[Ocfs2-devel] [patch 14/15] ocfs2: do not fallback to buffer I/O write if fill holes

2014-12-15 Thread akpm
From: Weiwei Wang wangww...@huawei.com
Subject: ocfs2: do not fallback to buffer I/O write if fill holes

An appending O_DIRECT write to a hole now tries direct I/O first and falls
back to buffered I/O only if that fails.

Signed-off-by: Weiwei Wang wangww...@huawei.com
Signed-off-by: Joseph Qi joseph...@huawei.com
Cc: Joel Becker jl...@evilplan.org
Cc: Mark Fasheh mfas...@suse.com
Signed-off-by: Andrew Morton a...@linux-foundation.org
---

 fs/ocfs2/file.c |   92 --
 1 file changed, 41 insertions(+), 51 deletions(-)

diff -puN fs/ocfs2/file.c~ocfs2-do-not-fallback-to-buffer-i-o-write-if-fill-holes fs/ocfs2/file.c
--- a/fs/ocfs2/file.c~ocfs2-do-not-fallback-to-buffer-i-o-write-if-fill-holes
+++ a/fs/ocfs2/file.c
@@ -1359,44 +1359,6 @@ out:
return ret;
 }
 
-/*
- * Will look for holes and unwritten extents in the range starting at
- * pos for count bytes (inclusive).
- */
-static int ocfs2_check_range_for_holes(struct inode *inode, loff_t pos,
-  size_t count)
-{
-   int ret = 0;
-   unsigned int extent_flags;
-   u32 cpos, clusters, extent_len, phys_cpos;
-   struct super_block *sb = inode->i_sb;
-
-   cpos = pos >> OCFS2_SB(sb)->s_clustersize_bits;
-   clusters = ocfs2_clusters_for_bytes(sb, pos + count) - cpos;
-
-   while (clusters) {
-   ret = ocfs2_get_clusters(inode, cpos, &phys_cpos, &extent_len,
-&extent_flags);
-   if (ret < 0) {
-   mlog_errno(ret);
-   goto out;
-   }
-
-   if (phys_cpos == 0 || (extent_flags & OCFS2_EXT_UNWRITTEN)) {
-   ret = 1;
-   break;
-   }
-
-   if (extent_len > clusters)
-   extent_len = clusters;
-
-   clusters -= extent_len;
-   cpos += extent_len;
-   }
-out:
-   return ret;
-}
-
 static int ocfs2_write_remove_suid(struct inode *inode)
 {
int ret;
@@ -2212,18 +2174,6 @@ static int ocfs2_prepare_inode_for_write
break;
}
 
-   /*
-* We don't fill holes during direct io, so
-* check for them here. If any are found, the
-* caller will have to retake some cluster
-* locks and initiate the io as buffered.
-*/
-   ret = ocfs2_check_range_for_holes(inode, saved_pos, count);
-   if (ret == 1) {
-   *direct_io = 0;
-   ret = 0;
-   } else if (ret < 0)
-   mlog_errno(ret);
break;
}
 
@@ -2253,6 +2203,7 @@ static ssize_t ocfs2_file_write_iter(str
u32 old_clusters;
struct file *file = iocb->ki_filp;
struct inode *inode = file_inode(file);
+   struct address_space *mapping = file->f_mapping;
struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
int full_coherency = !(osb->s_mount_opt &
   OCFS2_MOUNT_COHERENCY_BUFFERED);
@@ -2367,11 +2318,50 @@ relock:
 
iov_iter_truncate(from, count);
if (direct_io) {
+   loff_t endbyte;
+   ssize_t written_buffered;
written = generic_file_direct_write(iocb, from, *ppos);
-   if (written < 0) {
+   if (written < 0 || written == count) {
ret = written;
goto out_dio;
}
+
+   /*
+* direct-io write to a hole: fall through to buffered I/O
+* for completing the rest of the request.
+*/
+   count -= written;
+   written_buffered = generic_perform_write(file, from, *ppos);
+   /*
+* If generic_file_buffered_write() returned a synchronous error
+* then we want to return the number of bytes which were
+* direct-written, or the error code if that was zero. Note
+* that this differs from normal direct-io semantics, which
+* will return -EFOO even if some bytes were written.
+*/
+   if (written_buffered < 0) {
+   ret = written_buffered;
+   goto out;
+   }
+
+   /* We need to ensure that the page cache pages are written to
+* disk and invalidated to preserve the expected O_DIRECT
+* semantics.
+*/
+   endbyte = *ppos + written_buffered - written - 1;
+   ret = filemap_write_and_wait_range(file->f_mapping, *ppos,
+   endbyte);
+   if (ret == 0) {
+   written = written_buffered;
+   invalidate_mapping_pages(mapping,
+   *ppos  
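
The return-value policy described in the comments of this hunk (try direct I/O, hand any remainder to the buffered path, and if the buffered path fails report the direct-written byte count unless it was zero) can be sketched in userspace as below. The `write_fn` callback type and the `demo_*` stubs are hypothetical. Passing `pos + written` to the buffered writer is an assumption of this sketch, and the flush and page-cache invalidation that the patch performs afterwards are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical writer callback: bytes written, or a negative errno. */
typedef ssize_t (*write_fn)(size_t pos, size_t count);

static ssize_t write_direct_then_buffered(write_fn direct, write_fn buffered,
					  size_t pos, size_t count)
{
	ssize_t written, written_buffered;

	assert(direct && buffered);

	written = direct(pos, count);
	/* Hard error or complete write: nothing left for the buffered path. */
	if (written < 0 || (size_t)written == count)
		return written;

	/* Short direct write: let the buffered path finish the request. */
	written_buffered = buffered(pos + (size_t)written,
				    count - (size_t)written);
	if (written_buffered < 0)
		return written ? written : written_buffered;

	return written + written_buffered;
}

/* Demo stubs, purely illustrative. */
static ssize_t demo_direct_short(size_t pos, size_t count)
{
	(void)pos;
	return (ssize_t)(count / 2);	/* pretend a hole stopped us halfway */
}

static ssize_t demo_direct_fail(size_t pos, size_t count)
{
	(void)pos;
	(void)count;
	return -EIO;
}

static ssize_t demo_buffered_ok(size_t pos, size_t count)
{
	(void)pos;
	return (ssize_t)count;
}

static ssize_t demo_buffered_fail(size_t pos, size_t count)
{
	(void)pos;
	(void)count;
	return -ENOSPC;
}
```

A short direct write followed by a failed buffered write therefore still reports the bytes that reached the disk directly, which differs from plain direct-I/O semantics as the patch comment notes.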

Re: [Ocfs2-devel] [patch 02/15] ocfs2: free inode when i_count becomes zero

2014-12-15 Thread Xue jiufei
Hi, Andrew,
This patch may lead to data loss, so please remove it from the mm tree.
Here is the situation:
When i_count becomes zero while dirty pages still exist in i_mapping,
those dirty pages are freed without their data being flushed.

To avoid this problem, we should flush the dirty pages before dropping
the inode, but I don't think it is a good idea to flush pages in
ocfs2_drop_inode().

So, for now, there is no better way to solve this problem.

Thanks,
Xuejiufei

On 2014/12/16 6:50, a...@linux-foundation.org wrote:
 From: Xue jiufei xuejiu...@huawei.com
 Subject: ocfs2: free inode when i_count becomes zero
 
 Disk inode deletion may be heavily delayed when one node unlinks a file
 after the same dentry has been freed on another node (say N1) because of
 memory shrinking, while the inode is left in memory.  This inode can only
 be freed when N1 does the orphan scan work.
 
 However, N1 may skip the orphan scan several times because other nodes may
 do the work earlier.  In our tests, this can take 1 hour on a 4-node cluster,
 which causes a bad user experience.  So we think the inode should be
 freed when i_count becomes zero to avoid such circumstances.
 
 [a...@linux-foundation.org: coding-style fixes]
 Signed-off-by: joyce.xue xuejiu...@huawei.com
 Cc: Mark Fasheh mfas...@suse.com
 Cc: Joel Becker jl...@evilplan.org
 Signed-off-by: Andrew Morton a...@linux-foundation.org
 ---
 
  fs/ocfs2/inode.c |   10 +---------
  1 file changed, 1 insertion(+), 9 deletions(-)
 
 diff -puN fs/ocfs2/inode.c~ocfs2-free-inode-when-i_count-becomes-zero fs/ocfs2/inode.c
 --- a/fs/ocfs2/inode.c~ocfs2-free-inode-when-i_count-becomes-zero
 +++ a/fs/ocfs2/inode.c
 @@ -1191,17 +1191,9 @@ void ocfs2_evict_inode(struct inode *ino
  int ocfs2_drop_inode(struct inode *inode)
  {
   struct ocfs2_inode_info *oi = OCFS2_I(inode);
 - int res;
 -
   trace_ocfs2_drop_inode((unsigned long long)oi->ip_blkno,
   inode->i_nlink, oi->ip_flags);
 -
 - if (oi->ip_flags & OCFS2_INODE_MAYBE_ORPHANED)
 - res = 1;
 - else
 - res = generic_drop_inode(inode);
 -
 - return res;
 + return 1;
  }
  
  /*
 _
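
The hazard described in the reply above (an inode dropped as soon as its reference count reaches zero, while it still has dirty pages) can be modelled with a toy userspace sketch. `toy_inode`, `toy_flush()` and `toy_put_and_evict()` are hypothetical names used only here, and the model ignores everything about the real page cache except the dirty-page count.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical in-memory inode; the fields are illustrative only. */
struct toy_inode {
	int i_count;		/* reference count */
	int dirty_pages;	/* pages modified in memory, not yet on disk */
	int pages_on_disk;	/* pages that have reached stable storage */
};

/* Write every dirty page back. */
static void toy_flush(struct toy_inode *inode)
{
	inode->pages_on_disk += inode->dirty_pages;
	inode->dirty_pages = 0;
}

/*
 * Drop one reference and evict on the last one. Returns the number of
 * dirty pages discarded without being written, i.e. the lost data.
 */
static int toy_put_and_evict(struct toy_inode *inode, bool flush_first)
{
	int lost;

	assert(inode->i_count > 0);
	if (--inode->i_count > 0)
		return 0;	/* still referenced, nothing is evicted */

	if (flush_first)
		toy_flush(inode);

	lost = inode->dirty_pages;
	inode->dirty_pages = 0;	/* eviction throws the cached pages away */
	return lost;
}
```

Evicting without a prior flush loses every dirty page, while flushing first loses none; the open question in the thread is where such a flush could safely be done.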


Re: [Ocfs2-devel] ocfs2-tools repository and documentation

2014-12-15 Thread Srinivas Eeda
Hi Goldwyn, Germano

I have updated ocfs2-tools git repo with latest fixes :)

Germano,

The following documentation is available for OCFS2:

http://docs.oracle.com/cd/E37670_01/E37355/html/ol_ocfs2.html
http://www.oracle.com/us/technologies/linux/025995.htm
https://oss.oracle.com/projects/ocfs2/dist/documentation/v1.8/ocfs2-1_8_2-manpages.pdf
https://oss.oracle.com/projects/ocfs2/documentation/


On 12/15/2014 10:47 AM, Germano Percossi wrote:
 Hi Goldwyn,

 Thanks for your answer. I will have a look at your code.
 I just want to be sure that the source code I am looking at
 is the latest one to avoid fixing things that maybe have been already
 fixed elsewhere.

 Same is true for documentation and package repositories.
 If I could have links to the latest, no matter how old and not
 maintained, documentation and repository, it would be helpful,
 as well.

 Do you know of any other distro, besides Suse, maintaining a stack
 of patches?

 Cheers,
 Germano

 On 14/12/14 07:24, Goldwyn Rodrigues wrote:
 Yes, if you don't count the last 2 patches, it is more close to 3 years
 now ;)

 Since the patches were not being updated. I started maintaining an
 alternate repository where I am putting all the bugs reported at SUSE:

 https://github.com/goldwynr/ocfs2-tools

 Branch suse-fixes has the fixes found by SUSE over the upstream branch.

 Branch nocontrold has the patches for the feature of doing away with
 ocfs2_controld to work with the latest corosync/pacemaker stack. Patches
 for the kernel are already in the kernel but the ones in the tools need
 some review.

 I had a mail conversation with Srini and he has promised to update the
 upstream branch soon.

 Regards,


 ___
 Ocfs2-devel mailing list
 Ocfs2-devel@oss.oracle.com
 https://oss.oracle.com/mailman/listinfo/ocfs2-devel

