[Ocfs2-devel] [PATCH] ocfs2: Don't duplicate page passes i_size during CoW.

2010-07-12 Thread Tao Ma
During CoW, actually all the pages after i_size contains
garbage data, so don't read and duplicate them.

Signed-off-by: Tao Ma tao...@oracle.com
---
 fs/ocfs2/refcounttree.c |   10 +-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/fs/ocfs2/refcounttree.c b/fs/ocfs2/refcounttree.c
index 1cf9cda..e082623 100644
--- a/fs/ocfs2/refcounttree.c
+++ b/fs/ocfs2/refcounttree.c
@@ -2921,7 +2921,7 @@ static int ocfs2_duplicate_clusters_by_page(handle_t 
*handle,
struct super_block *sb = ocfs2_metadata_cache_get_super(ci);
u64 new_block = ocfs2_clusters_to_blocks(sb, new_cluster);
struct page *page;
-   pgoff_t page_index;
+   pgoff_t page_index, last_page;
unsigned int from, to;
loff_t offset, end, map_end;
struct address_space *mapping = context-inode-i_mapping;
@@ -2932,12 +2932,20 @@ static int ocfs2_duplicate_clusters_by_page(handle_t 
*handle,
offset = ((loff_t)cpos)  OCFS2_SB(sb)-s_clustersize_bits;
end = offset + (new_len  OCFS2_SB(sb)-s_clustersize_bits);
 
+   last_page = i_size_read(context-inode)  PAGE_CACHE_SHIFT;
while (offset  end) {
page_index = offset  PAGE_CACHE_SHIFT;
map_end = ((loff_t)page_index + 1)  PAGE_CACHE_SHIFT;
if (map_end  end)
map_end = end;
 
+   /*
+* If this page is beyond the page contains i_size,
+* stop duplicating it.
+*/
+   if (page_index  last_page)
+   break;
+
/* from, to is the offset within the page. */
from = offset  (PAGE_CACHE_SIZE - 1);
to = PAGE_CACHE_SIZE;
-- 
1.7.1.571.gba4d01


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH] ocfs2: Don't duplicate page passes i_size during CoW.

2010-07-12 Thread Joel Becker
On Mon, Jul 12, 2010 at 03:19:48PM +0800, Tao Ma wrote:
 During CoW, actually all the pages after i_size contains
 garbage data, so don't read and duplicate them.
 
 Signed-off-by: Tao Ma tao...@oracle.com
 ---
  fs/ocfs2/refcounttree.c |   10 +-
  1 files changed, 9 insertions(+), 1 deletions(-)
 
 diff --git a/fs/ocfs2/refcounttree.c b/fs/ocfs2/refcounttree.c
 index 1cf9cda..e082623 100644
 --- a/fs/ocfs2/refcounttree.c
 +++ b/fs/ocfs2/refcounttree.c
 @@ -2921,7 +2921,7 @@ static int ocfs2_duplicate_clusters_by_page(handle_t 
 *handle,
   struct super_block *sb = ocfs2_metadata_cache_get_super(ci);
   u64 new_block = ocfs2_clusters_to_blocks(sb, new_cluster);
   struct page *page;
 - pgoff_t page_index;
 + pgoff_t page_index, last_page;
   unsigned int from, to;
   loff_t offset, end, map_end;
   struct address_space *mapping = context-inode-i_mapping;
 @@ -2932,12 +2932,20 @@ static int ocfs2_duplicate_clusters_by_page(handle_t 
 *handle,
   offset = ((loff_t)cpos)  OCFS2_SB(sb)-s_clustersize_bits;
   end = offset + (new_len  OCFS2_SB(sb)-s_clustersize_bits);
  
 + last_page = i_size_read(context-inode)  PAGE_CACHE_SHIFT;
   while (offset  end) {

Why trigger on both index and byte offset?  Why not just adjust
end?

if (end  i_size_read(context-inode))
end = i_size_read(inode);

Then you don't need the last_page variable at all, because
you're guaranteed never to go past i_size.

Joel

-- 

Three o'clock is always too late or too early for anything you
 want to do.
- Jean-Paul Sartre

Joel Becker
Consulting Software Developer
Oracle
E-mail: joel.bec...@oracle.com
Phone: (650) 506-8127

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [patch] ocfs2: tighten up strlen() checking

2010-07-12 Thread Wengang Wang
Hi Dan,

I think O2NM_MAX_NAME_LEN is the max valid length of the domain name.
Regarding your patch, it changed to be that a domain name with length
O2NM_MAX_NAME_LEN (NULL character not included) is not permitted.

Though that check seems useless for current calls, we'd better keep it.

Checking the structure,

 99 struct ocfs2_cluster_connection {
100 char cc_name[GROUP_NAME_MAX];
101 int cc_namelen;

cc_name is not a NULL tailed string. the cc_namelen specifies the length of it.
There does is misuse of cc_name, such as


   7832  fs/ocfs2/stack_user.c user_cluster_connect
 rc = dlm_new_lockspace(conn-cc_name, strlen(conn-cc_name),

   5308  fs/ocfs2/stack_o2cb.c o2cb_cluster_connect
 dlm = dlm_register_domain(conn-cc_name, dlm_key, fs_version);

Also, the uuid shouldn't be treated as NULL tailed string.
142 struct ocfs2_control_message_down {
143 chartag[OCFS2_CONTROL_MESSAGE_OP_LEN];
144 charspace1;
145 charuuid[OCFS2_TEXT_UUID_LEN];
146 charspace2;

thus,
the calling of 
fs/ocfs2/stack_user.c:474:  ocfs2_control_send_down(msg-uuid, nodenum);
   -ocfs2_connection_find(uuid)
-size_t len = strlen(name);

is suspectable.

Could you please make patch for that instead?

regards,
wengang.

On 10-07-10 16:33, Dan Carpenter wrote:
 This function is only called from one place and it's like this:
   dlm_register_domain(conn-cc_name, dlm_key, fs_version);
 
 The conn-cc_name is 64 characters long.  If strlen(conn-cc_name)
 were equal to O2NM_MAX_NAME_LEN (64) that would be a bug because
 strlen() doesn't count the NULL character.
 
 In fact, if you look how O2NM_MAX_NAME_LEN is used, it mostly describes
 64 character buffers.  The only exception is nd_name from struct
 o2nm_node.
 
 Anyway I looked into it and in this case the domain string comes from
 osb-uuid_str in ocfs2_setup_osb_uuid().  That's 32 characters and NULL
 which easily fits into O2NM_MAX_NAME_LEN.  This patch doesn't change how
 the code works, but I think it makes the code a little cleaner.
 
 Signed-off-by: Dan Carpenter erro...@gmail.com
 ---
 Or we could get rid of check entirely.
 
 diff --git a/fs/ocfs2/dlm/dlmdomain.c b/fs/ocfs2/dlm/dlmdomain.c
 index 6b5a492..084b051 100644
 --- a/fs/ocfs2/dlm/dlmdomain.c
 +++ b/fs/ocfs2/dlm/dlmdomain.c
 @@ -1671,7 +1671,7 @@ struct dlm_ctxt * dlm_register_domain(const char 
 *domain,
   struct dlm_ctxt *dlm = NULL;
   struct dlm_ctxt *new_ctxt = NULL;
  
 - if (strlen(domain)  O2NM_MAX_NAME_LEN) {
 + if (strlen(domain) = O2NM_MAX_NAME_LEN) {
   ret = -ENAMETOOLONG;
   mlog(ML_ERROR, domain name length too long\n);
   goto leave;

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [patch] ocfs2: tighten up strlen() checking

2010-07-12 Thread Wengang Wang
Hi Dan,

On 10-07-12 15:39, Dan Carpenter wrote:
 On Mon, Jul 12, 2010 at 07:30:06PM +0800, Wengang Wang wrote:
  Also, the uuid shouldn't be treated as NULL tailed string.
  142 struct ocfs2_control_message_down {
  143 chartag[OCFS2_CONTROL_MESSAGE_OP_LEN];
  144 charspace1;
  145 charuuid[OCFS2_TEXT_UUID_LEN];
  146 charspace2;
 
 The space1 and space2 characters are NULL terminators:
 
 From ocfs2_control_do_down_msg():
 msg-space1 = msg-space2 = msg-newline = '\0';
 
 I would have thought it had to be a packed struct, but it works
 because there are only chars in that struct.  So that code is fine.

Yes, it is working fine.
I would rather think that structure is misleading than smart :-D.
I prefer this:
struct ocfs2_control_message_down {
chartag[OCFS2_CONTROL_MESSAGE_OP_LEN + 1];
#define space1 tag[OCFS2_CONTROL_MESSAGE_OP_LEN]
charuuid[OCFS2_TEXT_UUID_LEN + 1];
#define space2 uuid[OCFS2_TEXT_UUID_LEN]


regards,
wengang.

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [patch] ocfs2: tighten up strlen() checking

2010-07-12 Thread Sunil Mushran
Acked-by: Sunil Mushran sunil.mush...@oracle.com

On 07/10/2010 07:33 AM, Dan Carpenter wrote:
 This function is only called from one place and it's like this:
   dlm_register_domain(conn-cc_name, dlm_key,fs_version);

 The conn-cc_name is 64 characters long.  If strlen(conn-cc_name)
 were equal to O2NM_MAX_NAME_LEN (64) that would be a bug because
 strlen() doesn't count the NULL character.

 In fact, if you look how O2NM_MAX_NAME_LEN is used, it mostly describes
 64 character buffers.  The only exception is nd_name from struct
 o2nm_node.

 Anyway I looked into it and in this case the domain string comes from
 osb-uuid_str in ocfs2_setup_osb_uuid().  That's 32 characters and NULL
 which easily fits into O2NM_MAX_NAME_LEN.  This patch doesn't change how
 the code works, but I think it makes the code a little cleaner.

 Signed-off-by: Dan Carpentererro...@gmail.com
 ---
 Or we could get rid of check entirely.

 diff --git a/fs/ocfs2/dlm/dlmdomain.c b/fs/ocfs2/dlm/dlmdomain.c
 index 6b5a492..084b051 100644
 --- a/fs/ocfs2/dlm/dlmdomain.c
 +++ b/fs/ocfs2/dlm/dlmdomain.c
 @@ -1671,7 +1671,7 @@ struct dlm_ctxt * dlm_register_domain(const char 
 *domain,
   struct dlm_ctxt *dlm = NULL;
   struct dlm_ctxt *new_ctxt = NULL;

 - if (strlen(domain)  O2NM_MAX_NAME_LEN) {
 + if (strlen(domain)= O2NM_MAX_NAME_LEN) {
   ret = -ENAMETOOLONG;
   mlog(ML_ERROR, domain name length too long\n);
   goto leave;



___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [patch] ocfs2: tighten up strlen() checking

2010-07-12 Thread Sunil Mushran
So o2dlm expects a null terminated domain name. The original
patch is good as it adds the check in o2dlm only.

For userspace, we allow non-null terminating group (domain) name.
That remains unchanged.

On 07/12/2010 04:30 AM, Wengang Wang wrote:
 Hi Dan,

 I think O2NM_MAX_NAME_LEN is the max valid length of the domain name.
 Regarding your patch, it changed to be that a domain name with length
 O2NM_MAX_NAME_LEN (NULL character not included) is not permitted.

 Though that check seems useless for current calls, we'd better keep it.

 Checking the structure,

   99 struct ocfs2_cluster_connection {
 100 char cc_name[GROUP_NAME_MAX];
 101 int cc_namelen;

 cc_name is not a NULL tailed string. the cc_namelen specifies the length of 
 it.
 There does is misuse of cc_name, such as


 7832  fs/ocfs2/stack_user.cuser_cluster_connect
   rc = dlm_new_lockspace(conn-cc_name, strlen(conn-cc_name),

 5308  fs/ocfs2/stack_o2cb.co2cb_cluster_connect
   dlm = dlm_register_domain(conn-cc_name, dlm_key,fs_version);

 Also, the uuid shouldn't be treated as NULL tailed string.
 142 struct ocfs2_control_message_down {
 143 chartag[OCFS2_CONTROL_MESSAGE_OP_LEN];
 144 charspace1;
 145 charuuid[OCFS2_TEXT_UUID_LEN];
 146 charspace2;

 thus,
 the calling of
 fs/ocfs2/stack_user.c:474:  ocfs2_control_send_down(msg-uuid, nodenum);
 -ocfs2_connection_find(uuid)
   -size_t len = strlen(name);

 is suspectable.

 Could you please make patch for that instead?

 regards,
 wengang.

 On 10-07-10 16:33, Dan Carpenter wrote:

 This function is only called from one place and it's like this:
  dlm_register_domain(conn-cc_name, dlm_key,fs_version);

 The conn-cc_name is 64 characters long.  If strlen(conn-cc_name)
 were equal to O2NM_MAX_NAME_LEN (64) that would be a bug because
 strlen() doesn't count the NULL character.

 In fact, if you look how O2NM_MAX_NAME_LEN is used, it mostly describes
 64 character buffers.  The only exception is nd_name from struct
 o2nm_node.

 Anyway I looked into it and in this case the domain string comes from
 osb-uuid_str in ocfs2_setup_osb_uuid().  That's 32 characters and NULL
 which easily fits into O2NM_MAX_NAME_LEN.  This patch doesn't change how
 the code works, but I think it makes the code a little cleaner.

 Signed-off-by: Dan Carpentererro...@gmail.com
 ---
 Or we could get rid of check entirely.

 diff --git a/fs/ocfs2/dlm/dlmdomain.c b/fs/ocfs2/dlm/dlmdomain.c
 index 6b5a492..084b051 100644
 --- a/fs/ocfs2/dlm/dlmdomain.c
 +++ b/fs/ocfs2/dlm/dlmdomain.c
 @@ -1671,7 +1671,7 @@ struct dlm_ctxt * dlm_register_domain(const char 
 *domain,
  struct dlm_ctxt *dlm = NULL;
  struct dlm_ctxt *new_ctxt = NULL;

 -if (strlen(domain)  O2NM_MAX_NAME_LEN) {
 +if (strlen(domain)= O2NM_MAX_NAME_LEN) {
  ret = -ENAMETOOLONG;
  mlog(ML_ERROR, domain name length too long\n);
  goto leave;
  


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH] ocfs2: Remove the redundant cpu_to_le64.

2010-07-12 Thread Joel Becker
On Thu, Jul 08, 2010 at 11:11:11AM +0800, Tao Ma wrote:
 In ocfs2_block_group_alloc, we set c_blkno by bg-bg_blkno.
 But actually bg-bg_blkno is already changed to little endian
 in ocfs2_block_group_fill. So remove the extra cpu_to_le64.
 
 Reported-by: Marcos Matsunaga marcos.matsun...@oracle.com
 Signed-off-by: Tao Ma tao...@oracle.com

This patch is now in the fixes branch of ocfs2.git.

Joel

-- 

f/8 and be there.

Joel Becker
Consulting Software Developer
Oracle
E-mail: joel.bec...@oracle.com
Phone: (650) 506-8127

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH] ocfs2/dlm: don't access beyond bitmap size

2010-07-12 Thread Joel Becker
On Wed, Jun 30, 2010 at 08:23:30PM +0800, Wengang Wang wrote:
 dlm-recovery_map is defined as
   unsigned long recovery_map[BITS_TO_LONGS(O2NM_MAX_NODES)];
 
 We should treat O2NM_MAX_NODES as the bit map size in bits.
 This patches fixes a bit operation that takes O2NM_MAX_NODES + 1 as bitmap 
 size.
 
 Signed-off-by: Wengang Wang wen.gang.w...@oracle.com

This patch is now in the 'fixes' branch of ocfs2.git.

Joel

-- 

 The zen have a saying:
 When you learn how to listen, ANYONE can be your teacher.

Joel Becker
Consulting Software Developer
Oracle
E-mail: joel.bec...@oracle.com
Phone: (650) 506-8127

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH 1/2 v2] ocfs2: make xattr extension work with new local alloc reservation.

2010-07-12 Thread Joel Becker
On Fri, Jul 09, 2010 at 02:53:11PM +0800, Tao Ma wrote:
 The old ocfs2_xattr_extent_allocation is too optimistic about
 the clusters we can get. So actually if the file system is
 too fragmented, ocfs2_add_clusters_in_btree will return us
 with EGAIN and we need to allocate clusters once again.
 
 So this patch change it to a while loop so that we can allocate
 clusters until we reach clusters_to_add.
 
 Signed-off-by: Tao Ma tao...@oracle.com

This patch is now in the fixes branch of ocfs2.git.

Joel

-- 

Born under a bad sign.
 I been down since I began to crawl.
 If it wasn't for bad luck,
 I wouldn't have no luck at all.

Joel Becker
Consulting Software Developer
Oracle
E-mail: joel.bec...@oracle.com
Phone: (650) 506-8127

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH 2/2 v2] ocfs2: Make xattr reflink work with new local alloc reservation.

2010-07-12 Thread Joel Becker
On Fri, Jul 09, 2010 at 02:53:12PM +0800, Tao Ma wrote:
 The new reservation code in local alloc has add the limitation
 that the caller should handle the case that the local alloc
 doesn't give use enough contiguous clusters. It make the old
 xattr reflink code broken.
 
 So this patch udpate the xattr reflink code so that it can
 handle the case that local alloc give us one cluster at a time.
 
 Signed-off-by: Tao Ma tao...@oracle.com

This patch, with the gcc warning fixed, is now in the fixes
branch of ocfs2.git.

Joel

-- 

Life's Little Instruction Book #109

Know how to drive a stick shift.

Joel Becker
Consulting Software Developer
Oracle
E-mail: joel.bec...@oracle.com
Phone: (650) 506-8127

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [patch] ocfs2: tighten up strlen() checking

2010-07-12 Thread Joel Becker
On Sat, Jul 10, 2010 at 04:33:36PM +0200, Dan Carpenter wrote:
 This function is only called from one place and it's like this:
   dlm_register_domain(conn-cc_name, dlm_key, fs_version);
 
 The conn-cc_name is 64 characters long.  If strlen(conn-cc_name)
 were equal to O2NM_MAX_NAME_LEN (64) that would be a bug because
 strlen() doesn't count the NULL character.
 
 In fact, if you look how O2NM_MAX_NAME_LEN is used, it mostly describes
 64 character buffers.  The only exception is nd_name from struct
 o2nm_node.
 
 Anyway I looked into it and in this case the domain string comes from
 osb-uuid_str in ocfs2_setup_osb_uuid().  That's 32 characters and NULL
 which easily fits into O2NM_MAX_NAME_LEN.  This patch doesn't change how
 the code works, but I think it makes the code a little cleaner.
 
 Signed-off-by: Dan Carpenter erro...@gmail.com

This patch is now in the fixes branch of ocfs2.git.

Joel

-- 

Viro's Razor:
Any race condition, no matter how unlikely, will occur just
often enough to bite you.

Joel Becker
Consulting Software Developer
Oracle
E-mail: joel.bec...@oracle.com
Phone: (650) 506-8127

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH V3 3/8] Cleancache: core ops functions and configuration

2010-07-12 Thread Nitin Gupta
On 07/07/2010 02:21 AM, Konrad Rzeszutek Wilk wrote:
 On Tue, Jun 22, 2010 at 09:26:28AM -0700, Dave Hansen wrote:
 On Mon, 2010-06-21 at 16:19 -0700, Dan Magenheimer wrote:
 --- linux-2.6.35-rc2/include/linux/cleancache.h 1969-12-31 
 17:00:00.0 -0700
 +++ linux-2.6.35-rc2-cleancache/include/linux/cleancache.h  2010-06-21 
 14:45:18.0 -0600
 @@ -0,0 +1,88 @@
 +#ifndef _LINUX_CLEANCACHE_H
 +#define _LINUX_CLEANCACHE_H
 +
 +#include linux/fs.h
 +#include linux/mm.h
 +
 +struct cleancache_ops {
 +   int (*init_fs)(size_t);
 +   int (*init_shared_fs)(char *uuid, size_t);
 +   int (*get_page)(int, ino_t, pgoff_t, struct page *);
 +   void (*put_page)(int, ino_t, pgoff_t, struct page *);
 +   void (*flush_page)(int, ino_t, pgoff_t);
 +   void (*flush_inode)(int, ino_t);
 +   void (*flush_fs)(int);
 +};
 + 

 How would someone go about testing this code?  Is there an example
 cleancache implementation?
 
 Dan,
 
 Can you reference with a link or a git branch the patches that utilize
 this?
 
 And also mention that in the 0/X patch so that folks can reference your
 cleancache implementation?
 


FYI.

I am working on 'zcache' which uses cleancache_ops to provide page cache
compression support. I will be posting it to LKML before end of next week.

Thanks,
Nitin

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH] Cleancache: shim to Xen Transcendent Memory

2010-07-12 Thread Konrad Rzeszutek Wilk
On Thu, Jul 08, 2010 at 09:42:08AM -0700, Dan Magenheimer wrote:
 Signed-off-by: Dan Magenheimer dan.magenhei...@oracle.com

One nitpick:
..
 +
 +int tmem_enabled;
 +
 +static int __init enable_tmem(char *s)
 +{
 + tmem_enabled = 1;
 + return 1;
 +}
 +
 +__setup(tmem, enable_tmem);

Perhaps 'tmem_setup' is more appropiate as it might be that this
function in the future would be only used to disable tmem, not actually
enable it?

Otherwise, it has been reviewed by me.

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-devel


[Ocfs2-devel] [PATCH 27/36] fs/ocfs2: Remove unnecessary casts of private_data

2010-07-12 Thread Joe Perches
Signed-off-by: Joe Perches j...@perches.com
---
 fs/ocfs2/dlm/dlmdebug.c |6 +++---
 fs/ocfs2/dlmfs/dlmfs.c  |3 +--
 fs/ocfs2/dlmglue.c  |4 ++--
 3 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/fs/ocfs2/dlm/dlmdebug.c b/fs/ocfs2/dlm/dlmdebug.c
index 0cd24cf..5efdd37 100644
--- a/fs/ocfs2/dlm/dlmdebug.c
+++ b/fs/ocfs2/dlm/dlmdebug.c
@@ -419,7 +419,7 @@ static loff_t debug_buffer_llseek(struct file *file, loff_t 
off, int whence)
 
 static int debug_buffer_release(struct inode *inode, struct file *file)
 {
-   struct debug_buffer *db = (struct debug_buffer *)file-private_data;
+   struct debug_buffer *db = file-private_data;
 
if (db)
kfree(db-buf);
@@ -715,7 +715,7 @@ static int debug_lockres_open(struct inode *inode, struct 
file *file)
goto bail;
}
 
-   seq = (struct seq_file *) file-private_data;
+   seq = file-private_data;
seq-private = dl;
 
dlm_grab(dlm);
@@ -731,7 +731,7 @@ bail:
 
 static int debug_lockres_release(struct inode *inode, struct file *file)
 {
-   struct seq_file *seq = (struct seq_file *)file-private_data;
+   struct seq_file *seq = file-private_data;
struct debug_lockres *dl = (struct debug_lockres *)seq-private;
 
if (dl-dl_res)
diff --git a/fs/ocfs2/dlmfs/dlmfs.c b/fs/ocfs2/dlmfs/dlmfs.c
index a43ebb1..c2903b8 100644
--- a/fs/ocfs2/dlmfs/dlmfs.c
+++ b/fs/ocfs2/dlmfs/dlmfs.c
@@ -182,8 +182,7 @@ static int dlmfs_file_release(struct inode *inode,
 {
int level, status;
struct dlmfs_inode_private *ip = DLMFS_I(inode);
-   struct dlmfs_filp_private *fp =
-   (struct dlmfs_filp_private *) file-private_data;
+   struct dlmfs_filp_private *fp = file-private_data;
 
if (S_ISDIR(inode-i_mode))
BUG();
diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c
index 39eb16a..5e02a89 100644
--- a/fs/ocfs2/dlmglue.c
+++ b/fs/ocfs2/dlmglue.c
@@ -2966,7 +2966,7 @@ static const struct seq_operations ocfs2_dlm_seq_ops = {
 
 static int ocfs2_dlm_debug_release(struct inode *inode, struct file *file)
 {
-   struct seq_file *seq = (struct seq_file *) file-private_data;
+   struct seq_file *seq = file-private_data;
struct ocfs2_dlm_seq_priv *priv = seq-private;
struct ocfs2_lock_res *res = priv-p_iter_res;
 
@@ -3000,7 +3000,7 @@ static int ocfs2_dlm_debug_open(struct inode *inode, 
struct file *file)
goto out;
}
 
-   seq = (struct seq_file *) file-private_data;
+   seq = file-private_data;
seq-private = priv;
 
ocfs2_add_lockres_tracking(priv-p_iter_res,
-- 
1.7.1.337.g6068.dirty


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH V3 3/8] Cleancache: core ops functions and configuration

2010-07-12 Thread Konrad Rzeszutek Wilk
  Can you reference with a link or a git branch the patches that utilize
  this?
  
  And also mention that in the 0/X patch so that folks can reference your
  cleancache implementation?
  
 
 
 FYI.
 
 I am working on 'zcache' which uses cleancache_ops to provide page cache
 compression support. I will be posting it to LKML before end of next week.

Yes! That too, please. Thanks for pointing this out.

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [patch] ocfs2: tighten up strlen() checking

2010-07-12 Thread Dan Carpenter
On Mon, Jul 12, 2010 at 07:30:06PM +0800, Wengang Wang wrote:
 Also, the uuid shouldn't be treated as NULL tailed string.
 142 struct ocfs2_control_message_down {
 143 chartag[OCFS2_CONTROL_MESSAGE_OP_LEN];
 144 charspace1;
 145 charuuid[OCFS2_TEXT_UUID_LEN];
 146 charspace2;

The space1 and space2 characters are NULL terminators:

From ocfs2_control_do_down_msg():
msg-space1 = msg-space2 = msg-newline = '\0';

I would have thought it had to be a packed struct, but it works
because there are only chars in that struct.  So that code is fine.

regards,
dan carpenter

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-devel


[Ocfs2-devel] [patch] ocfs2: tighten up strlen() checking

2010-07-12 Thread Dan Carpenter
This function is only called from one place and it's like this:
dlm_register_domain(conn-cc_name, dlm_key, fs_version);

The conn-cc_name is 64 characters long.  If strlen(conn-cc_name)
were equal to O2NM_MAX_NAME_LEN (64) that would be a bug because
strlen() doesn't count the NULL character.

In fact, if you look how O2NM_MAX_NAME_LEN is used, it mostly describes
64 character buffers.  The only exception is nd_name from struct
o2nm_node.

Anyway I looked into it and in this case the domain string comes from
osb-uuid_str in ocfs2_setup_osb_uuid().  That's 32 characters and NULL
which easily fits into O2NM_MAX_NAME_LEN.  This patch doesn't change how
the code works, but I think it makes the code a little cleaner.

Signed-off-by: Dan Carpenter erro...@gmail.com
---
Or we could get rid of check entirely.

diff --git a/fs/ocfs2/dlm/dlmdomain.c b/fs/ocfs2/dlm/dlmdomain.c
index 6b5a492..084b051 100644
--- a/fs/ocfs2/dlm/dlmdomain.c
+++ b/fs/ocfs2/dlm/dlmdomain.c
@@ -1671,7 +1671,7 @@ struct dlm_ctxt * dlm_register_domain(const char *domain,
struct dlm_ctxt *dlm = NULL;
struct dlm_ctxt *new_ctxt = NULL;
 
-   if (strlen(domain)  O2NM_MAX_NAME_LEN) {
+   if (strlen(domain) = O2NM_MAX_NAME_LEN) {
ret = -ENAMETOOLONG;
mlog(ML_ERROR, domain name length too long\n);
goto leave;

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH V3 3/8] Cleancache: core ops functions and configuration

2010-07-12 Thread Konrad Rzeszutek Wilk
On Tue, Jun 22, 2010 at 09:26:28AM -0700, Dave Hansen wrote:
 On Mon, 2010-06-21 at 16:19 -0700, Dan Magenheimer wrote:
  --- linux-2.6.35-rc2/include/linux/cleancache.h 1969-12-31 
  17:00:00.0 -0700
  +++ linux-2.6.35-rc2-cleancache/include/linux/cleancache.h  2010-06-21 
  14:45:18.0 -0600
  @@ -0,0 +1,88 @@
  +#ifndef _LINUX_CLEANCACHE_H
  +#define _LINUX_CLEANCACHE_H
  +
  +#include linux/fs.h
  +#include linux/mm.h
  +
  +struct cleancache_ops {
  +   int (*init_fs)(size_t);
  +   int (*init_shared_fs)(char *uuid, size_t);
  +   int (*get_page)(int, ino_t, pgoff_t, struct page *);
  +   void (*put_page)(int, ino_t, pgoff_t, struct page *);
  +   void (*flush_page)(int, ino_t, pgoff_t);
  +   void (*flush_inode)(int, ino_t);
  +   void (*flush_fs)(int);
  +};
  + 
 
 How would someone go about testing this code?  Is there an example
 cleancache implementation?

Dan,

Can you reference with a link or a git branch the patches that utilize
this?

And also mention that in the 0/X patch so that folks can reference your
cleancache implementation?

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH V3 0/8] Cleancache: overview

2010-07-12 Thread Konrad Rzeszutek Wilk
On Mon, Jun 21, 2010 at 04:18:09PM -0700, Dan Magenheimer wrote:
 [PATCH V3 0/8] Cleancache: overview

Dan,

Two comments:
 - Mention where one can get the implementor of the cleancache API.
   Either a link to where the patches reside or a git branch.
   If you need pointers on branch names:
   http://lkml.org/lkml/2010/6/7/269

-  Point out the presentation you did on this. It has an excellent
   overview of how this API works, and most importantly: a) images
   and b). performance numbers.

Otherwise, please consider all of these patches to have
Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com

tag.

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-devel


[Ocfs2-devel] [PATCH] Cleancache: shim to Xen Transcendent Memory

2010-07-12 Thread Dan Magenheimer
[PATCH] Cleancache: shim to Xen Transcendent Memory

This companion patch to the cleancache V3 patchset (see
http://lkml.org/lkml/2010/6/21/411) provides one consumer
for the proposed cleancache internal kernel API.  This
user is Xen Transcendent Memory (tmem), supported in Xen 4.0+.
(A second user, zcache, will be posted by Nitin Gupta soon.)

This patch affects Xen directories only; while broader review
is welcome, the patch is provided primarily to answer concerns
expressed about the apparent non-existence of consumers for
the proposed cleancache patch.

Xen tmem provides hypervisor RAM as an ephemeral page-oriented
pseudo-RAM store for cleancache pages, shared cleancache pages,
and frontswap pages.  Tmem provides enterprise-quality concurrency,
full save/restore and live migration support, compression
and deduplication.

A presentation showing up to 8% faster performance and up to 52%
reduction in sectors read on a kernel compile workload, despite
aggressive in-kernel page reclamation (self-ballooning) can be
found at:

http://oss.oracle.com/projects/tmem/dist/documentation/presentations/TranscendentMemoryXenSummit2010.pdf

Signed-off-by: Dan Magenheimer dan.magenhei...@oracle.com

Diffstat:
 arch/x86/include/asm/xen/hypercall.h |7 
 drivers/xen/Makefile |2 
 drivers/xen/tmem.c   |  238 +
 include/xen/interface/xen.h  |   20 +
 4 files changed, 266 insertions(+), 1 deletion(-)

diff -Napur linux-2.6.35-rc4/arch/x86/include/asm/xen/hypercall.h 
linux-2.6.35-rc4-cleancache_tmem/arch/x86/include/asm/xen/hypercall.h
--- linux-2.6.35-rc4/arch/x86/include/asm/xen/hypercall.h   2010-07-04 
21:22:50.0 -0600
+++ linux-2.6.35-rc4-cleancache_tmem/arch/x86/include/asm/xen/hypercall.h   
2010-07-07 15:37:46.0 -0600
@@ -417,6 +417,13 @@ HYPERVISOR_nmi_op(unsigned long op, unsi
return _hypercall2(int, nmi_op, op, arg);
 }
 
+static inline int
+HYPERVISOR_tmem_op(
+   struct tmem_op *op)
+{
+   return _hypercall1(int, tmem_op, op);
+}
+
 static inline void
 MULTI_fpu_taskswitch(struct multicall_entry *mcl, int set)
 {
diff -Napur linux-2.6.35-rc4/drivers/xen/Makefile 
linux-2.6.35-rc4-cleancache_tmem/drivers/xen/Makefile
--- linux-2.6.35-rc4/drivers/xen/Makefile   2010-07-04 21:22:50.0 
-0600
+++ linux-2.6.35-rc4-cleancache_tmem/drivers/xen/Makefile   2010-07-07 
09:38:21.0 -0600
@@ -1,5 +1,6 @@
 obj-y  += grant-table.o features.o events.o manage.o
 obj-y  += xenbus/
+obj-y  += tmem.o
 
 nostackp := $(call cc-option, -fno-stack-protector)
 CFLAGS_features.o  := $(nostackp)
@@ -9,4 +10,4 @@ obj-$(CONFIG_XEN_XENCOMM)  += xencomm.o
 obj-$(CONFIG_XEN_BALLOON)  += balloon.o
 obj-$(CONFIG_XEN_DEV_EVTCHN)   += evtchn.o
 obj-$(CONFIG_XENFS)+= xenfs/
-obj-$(CONFIG_XEN_SYS_HYPERVISOR)   += sys-hypervisor.o
\ No newline at end of file
+obj-$(CONFIG_XEN_SYS_HYPERVISOR)   += sys-hypervisor.o
diff -Napur linux-2.6.35-rc4/drivers/xen/tmem.c 
linux-2.6.35-rc4-cleancache_tmem/drivers/xen/tmem.c
--- linux-2.6.35-rc4/drivers/xen/tmem.c 1969-12-31 17:00:00.0 -0700
+++ linux-2.6.35-rc4-cleancache_tmem/drivers/xen/tmem.c 2010-07-07 
16:24:08.0 -0600
@@ -0,0 +1,238 @@
+/*
+ * Xen implementation for transcendent memory (tmem)
+ *
+ * Copyright (C) 2009-2010 Oracle Corp.  All rights reserved.
+ * Author: Dan Magenheimer
+ */
+
+#include linux/kernel.h
+#include linux/types.h
+#include linux/init.h
+#include linux/pagemap.h
+#include linux/cleancache.h
+
+#include xen/xen.h
+#include xen/interface/xen.h
+#include asm/xen/hypercall.h
+#include asm/xen/page.h
+#include asm/xen/hypervisor.h
+
+#define TMEM_CONTROL   0
+#define TMEM_NEW_POOL  1
+#define TMEM_DESTROY_POOL  2
+#define TMEM_NEW_PAGE  3
+#define TMEM_PUT_PAGE  4
+#define TMEM_GET_PAGE  5
+#define TMEM_FLUSH_PAGE6
+#define TMEM_FLUSH_OBJECT  7
+#define TMEM_READ  8
+#define TMEM_WRITE 9
+#define TMEM_XCHG 10
+
+/* Bits for HYPERVISOR_tmem_op(TMEM_NEW_POOL) */
+#define TMEM_POOL_PERSIST  1
+#define TMEM_POOL_SHARED   2
+#define TMEM_POOL_PAGESIZE_SHIFT   4
+
+
+struct tmem_pool_uuid {
+   u64 uuid_lo;
+   u64 uuid_hi;
+};
+
+#define TMEM_POOL_PRIVATE_UUID { 0, 0 }
+
+/* flags for tmem_ops.new_pool */
+#define TMEM_POOL_PERSIST  1
+#define TMEM_POOL_SHARED   2
+
+/* xen tmem foundation ops/hypercalls */
+
+static inline int xen_tmem_op(u32 tmem_cmd, u32 tmem_pool, u64 object,
+   u32 index, unsigned long gmfn, u32 tmem_offset, u32 pfn_offset, u32 len)
+{
+   struct tmem_op op;
+   int rc = 0;
+
+   op.cmd = tmem_cmd;
+   op.pool_id = tmem_pool;
+   op.u.gen.object = object;
+   op.u.gen.index = index;
+   op.u.gen.tmem_offset = tmem_offset;
+   op.u.gen.pfn_offset = 

Re: [Ocfs2-devel] [PATCH 1/1] ocfs2 fix o2dlm dlm run purgelist(bug 9094491) - rev3

2010-07-12 Thread Srinivas Eeda
Joel Becker wrote:
 On Tue, Jun 22, 2010 at 10:48:28PM -0700, Srinivas Eeda wrote:
   
 +if (!__dlm_lockres_unused) {
 +mlog(ML_ERROR, found lockres %s:%.*s: in use after deref\n,
 + dlm-name, res-lockname.len, res-lockname.name);
 +__dlm_print_one_lock_resource(res);
 +BUG();
 +}
 

 /build/jlbec/linux-2.6/working/fs/ocfs2/dlm/dlmthread.c: In function 
 ‘dlm_purge_lockres’:
 /build/jlbec/linux-2.6/working/fs/ocfs2/dlm/dlmthread.c:203: warning: the 
 address of ‘__dlm_lockres_unused’ will always evaluate as ‘true’

   Was this even tested?  I'm leaving this patch out of 'fixes'
 until corrected and tested.
   
Sorry, I had the typo while making the review changes. I ran the usual 
tests but that  didn't catch this problem. I should have payed more 
attention to the build log.

I made the change and tested it, will send you the modified patch
 Joel

   


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-devel

[Ocfs2-devel] [PATCH 1/1] ocfs2 fix o2dlm dlm run purgelist(bug 9094491) - rev3

2010-07-12 Thread Srinivas Eeda
There are two problems in dlm_run_purgelist

1. If a lockres is found to be in use, dlm_run_purgelist keeps trying to purge
the same lockres instead of trying the next lockres.

2. When a lockres is found unused, dlm_run_purgelist releases lockres spinlock
before setting DLM_LOCK_RES_DROPPING_REF and calls dlm_purge_lockres.
spinlock is reacquired but in this window lockres can get reused. This leads
to BUG.

This patch modifies dlm_run_purgelist to skip lockres if it's in use and purge
 next lockres. It also sets DLM_LOCK_RES_DROPPING_REF before releasing the
lockres spinlock protecting it from getting reused.

Signed-off-by: Srinivas Eeda srinivas.e...@oracle.com
Signed-off-by: Sunil Mushransunil.mush...@oracle.com
---
 fs/ocfs2/dlm/dlmthread.c |   79 +++--
 1 files changed, 33 insertions(+), 46 deletions(-)

diff --git a/fs/ocfs2/dlm/dlmthread.c b/fs/ocfs2/dlm/dlmthread.c
index 11a6d1f..76ab4aa 100644
--- a/fs/ocfs2/dlm/dlmthread.c
+++ b/fs/ocfs2/dlm/dlmthread.c
@@ -152,45 +152,25 @@ void dlm_lockres_calc_usage(struct dlm_ctxt *dlm,
spin_unlock(dlm-spinlock);
 }
 
-static int dlm_purge_lockres(struct dlm_ctxt *dlm,
+static void dlm_purge_lockres(struct dlm_ctxt *dlm,
 struct dlm_lock_resource *res)
 {
int master;
int ret = 0;
 
-   spin_lock(res-spinlock);
-   if (!__dlm_lockres_unused(res)) {
-   mlog(0, %s:%.*s: tried to purge but not unused\n,
-dlm-name, res-lockname.len, res-lockname.name);
-   __dlm_print_one_lock_resource(res);
-   spin_unlock(res-spinlock);
-   BUG();
-   }
-
-   if (res-state  DLM_LOCK_RES_MIGRATING) {
-   mlog(0, %s:%.*s: Delay dropref as this lockres is 
-being remastered\n, dlm-name, res-lockname.len,
-res-lockname.name);
-   /* Re-add the lockres to the end of the purge list */
-   if (!list_empty(res-purge)) {
-   list_del_init(res-purge);
-   list_add_tail(res-purge, dlm-purge_list);
-   }
-   spin_unlock(res-spinlock);
-   return 0;
-   }
+   assert_spin_locked(dlm-spinlock);
+   assert_spin_locked(res-spinlock);
 
master = (res-owner == dlm-node_num);
 
-   if (!master)
-   res-state |= DLM_LOCK_RES_DROPPING_REF;
-   spin_unlock(res-spinlock);
 
mlog(0, purging lockres %.*s, master = %d\n, res-lockname.len,
 res-lockname.name, master);
 
if (!master) {
+   res-state |= DLM_LOCK_RES_DROPPING_REF;
/* drop spinlock...  retake below */
+   spin_unlock(res-spinlock);
spin_unlock(dlm-spinlock);
 
spin_lock(res-spinlock);
@@ -208,31 +188,35 @@ static int dlm_purge_lockres(struct dlm_ctxt *dlm,
mlog(0, %s:%.*s: dlm_deref_lockres returned %d\n,
 dlm-name, res-lockname.len, res-lockname.name, ret);
spin_lock(dlm-spinlock);
+   spin_lock(res-spinlock);
}
 
-   spin_lock(res-spinlock);
if (!list_empty(res-purge)) {
mlog(0, removing lockres %.*s:%p from purgelist, 
 master = %d\n, res-lockname.len, res-lockname.name,
 res, master);
list_del_init(res-purge);
-   spin_unlock(res-spinlock);
dlm_lockres_put(res);
dlm-purge_count--;
-   } else
-   spin_unlock(res-spinlock);
+   }
+
+   if (!__dlm_lockres_unused(res)) {
+   mlog(ML_ERROR, found lockres %s:%.*s: in use after deref\n,
+dlm-name, res-lockname.len, res-lockname.name);
+   __dlm_print_one_lock_resource(res);
+   BUG();
+   }
 
__dlm_unhash_lockres(res);
 
/* lockres is not in the hash now.  drop the flag and wake up
 * any processes waiting in dlm_get_lock_resource. */
if (!master) {
-   spin_lock(res-spinlock);
res-state = ~DLM_LOCK_RES_DROPPING_REF;
spin_unlock(res-spinlock);
wake_up(res-wq);
-   }
-   return 0;
+   } else
+   spin_unlock(res-spinlock);
 }
 
 static void dlm_run_purge_list(struct dlm_ctxt *dlm,
@@ -251,17 +235,7 @@ static void dlm_run_purge_list(struct dlm_ctxt *dlm,
lockres = list_entry(dlm-purge_list.next,
 struct dlm_lock_resource, purge);
 
-   /* Status of the lockres *might* change so double
-* check. If the lockres is unused, holding the dlm
-* spinlock will prevent people from getting and more
-* refs on it -- there's no need to keep the lockres
-* spinlock. */
spin_lock(lockres-spinlock);
-   

Re: [Ocfs2-devel] [PATCH 27/36] fs/ocfs2: Remove unnecessary casts of private_data

2010-07-12 Thread Joel Becker
On Mon, Jul 12, 2010 at 01:50:19PM -0700, Joe Perches wrote:
 Signed-off-by: Joe Perches j...@perches.com
 ---
  fs/ocfs2/dlm/dlmdebug.c |6 +++---
  fs/ocfs2/dlmfs/dlmfs.c  |3 +--
  fs/ocfs2/dlmglue.c  |4 ++--
  3 files changed, 6 insertions(+), 7 deletions(-)

Acked-by: Joel Becker joel.bec...@oracle.com

Btw, how hilarious is the comment for private_data?

Joel

-- 

A good programming language should have features that make the
kind of people who use the phrase software engineering shake
their heads disapprovingly.
- Paul Graham

Joel Becker
Consulting Software Developer
Oracle
E-mail: joel.bec...@oracle.com
Phone: (650) 506-8127

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH 1/1] ocfs2 fix o2dlm dlm run purgelist(bug 9094491) - rev3

2010-07-12 Thread Joel Becker
On Mon, Jul 12, 2010 at 03:16:28PM -0700, Srinivas Eeda wrote:
 There are two problems in dlm_run_purgelist
 
 1. If a lockres is found to be in use, dlm_run_purgelist keeps trying to purge
 the same lockres instead of trying the next lockres.
 
 2. When a lockres is found unused, dlm_run_purgelist releases lockres spinlock
 before setting DLM_LOCK_RES_DROPPING_REF and calls dlm_purge_lockres.
 spinlock is reacquired but in this window lockres can get reused. This leads
 to BUG.
 
 This patch modifies dlm_run_purgelist to skip lockres if it's in use and purge
  next lockres. It also sets DLM_LOCK_RES_DROPPING_REF before releasing the
 lockres spinlock protecting it from getting reused.
 
 Signed-off-by: Srinivas Eeda srinivas.e...@oracle.com
 Signed-off-by: Sunil Mushransunil.mush...@oracle.com

Has this been tested with a testcase that would fail before the
change?
Also, don't add SoB lines for kernel patches.  You can add
Acked-by for Sunil, but SoB only works for the chain that goes upstream.

Joel

-- 

War doesn't determine who's right; war determines who's left.

Joel Becker
Consulting Software Developer
Oracle
E-mail: joel.bec...@oracle.com
Phone: (650) 506-8127

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH 0/3] ocfs2: Tail zeroing fixes.

2010-07-12 Thread Joel Becker
On Wed, Jul 07, 2010 at 04:16:04AM -0700, Joel Becker wrote:
   This is version 3 of the ocfs2 tail zeroing fixes.  This version
 has some major changes.  Tao correctly pointed out that we can have
 multiple extents past i_size due to unwritten extents.  I've reworked
 the zeroing code to walk them all.  Since I had to do that, and I had to
 handle refcounted extents, I end up fixing a refcount bug with
 non-sparse extentds.
   There are now three patches.  The first changes our zeroing
 code to go page-by-page at the high level.  The second actually changes
 the zeroing code.  The final patch, limiting zeroing to the end of a
 write, is unchanged from v2.

Version 4 of these fixes are now in the fixes and linux-next
branches of ocfs2.git.  I'm just appending the diff to this file rather
than resending all the patches.  They've passed the first round of heavy
testing from our testers, and we're going to keep pounding on them as we
get towards 2.6.35.
Linus, I'm going to let all of the ocfs2 fixes stew in
linux-next for a few days before I send the pull request.  Figure it by
the end of the week.

Joel

diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index 9b3381a..356e976 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -1107,6 +1107,7 @@ static int ocfs2_grab_pages_for_write(struct 
address_space *mapping,
int ret = 0, i;
unsigned long start, target_index, end_index, index;
struct inode *inode = mapping-host;
+   loff_t last_byte;
 
target_index = user_pos  PAGE_CACHE_SHIFT;
 
@@ -1120,8 +1121,14 @@ static int ocfs2_grab_pages_for_write(struct 
address_space *mapping,
if (new) {
wc-w_num_pages = ocfs2_pages_per_cluster(inode-i_sb);
start = ocfs2_align_clusters_to_page_index(inode-i_sb, cpos);
-   /* This is the index *past* the write */
-   end_index = ((user_pos + user_len - 1)  PAGE_CACHE_SHIFT) + 1;
+   /*
+* We need the index *past* the last page we could possibly
+* touch.  This is the page past the end of the write or
+* i_size, whichever is greater.
+*/
+   last_byte = max(user_pos + user_len, i_size_read(inode));
+   BUG_ON(last_byte  1);
+   end_index = ((last_byte - 1)  PAGE_CACHE_SHIFT) + 1;
if ((start + wc-w_num_pages)  end_index)
wc-w_num_pages = end_index - start;
} else {
@@ -1619,6 +1626,18 @@ static int ocfs2_expand_nonsparse_inode(struct inode 
*inode,
return ret;
 }
 
+static int ocfs2_zero_tail(struct inode *inode, struct buffer_head *di_bh,
+  loff_t pos)
+{
+   int ret = 0;
+
+   BUG_ON(!ocfs2_sparse_alloc(OCFS2_SB(inode-i_sb)));
+   if (pos  i_size_read(inode))
+   ret = ocfs2_zero_extend(inode, di_bh, pos);
+
+   return ret;
+}
+
 int ocfs2_write_begin_nolock(struct address_space *mapping,
 loff_t pos, unsigned len, unsigned flags,
 struct page **pagep, void **fsdata,
@@ -1655,7 +1674,7 @@ int ocfs2_write_begin_nolock(struct address_space 
*mapping,
}
 
if (ocfs2_sparse_alloc(osb))
-   ret = ocfs2_zero_extend(inode, di_bh, pos);
+   ret = ocfs2_zero_tail(inode, di_bh, pos);
else
ret = ocfs2_expand_nonsparse_inode(inode, di_bh, pos, len,
   wc);
diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index 1fdc45a..ac15911 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -734,7 +734,7 @@ static handle_t 
*ocfs2_zero_start_ordered_transaction(struct inode *inode)
handle_t *handle = NULL;
int ret = 0;
 
-   if (ocfs2_should_order_data(inode))
+   if (!ocfs2_should_order_data(inode))
goto out;
 
handle = ocfs2_start_trans(osb, OCFS2_INODE_UPDATE_CREDITS);
@@ -771,7 +771,7 @@ static int ocfs2_write_zero_page(struct inode *inode, u64 
abs_from,
unsigned zero_from, zero_to, block_start, block_end;
 
BUG_ON(abs_from = abs_to);
-   BUG_ON(abs_to  ((index + 1)  PAGE_CACHE_SHIFT));
+   BUG_ON(abs_to  (((u64)index + 1)  PAGE_CACHE_SHIFT));
BUG_ON(abs_from  (inode-i_blkbits - 1));
 
page = grab_cache_page(mapping, index);
@@ -793,8 +793,7 @@ static int ocfs2_write_zero_page(struct inode *inode, u64 
abs_from,
 index, zero_from, zero_to);
 
/* We know that zero_from is block aligned */
-   for (block_start = zero_from;
-(block_start  PAGE_CACHE_SIZE)  (block_start  zero_to);
+   for (block_start = zero_from; block_start  zero_to;
 block_start = block_end) {
block_end = block_start + (1  inode-i_blkbits);
 
@@ -966,6 +965,9 @@ int ocfs2_zero_extend(struct inode *inode, struct 
buffer_head *di_bh,
struct super_block *sb = inode-i_sb;

Re: [Ocfs2-devel] [PATCH 2/2] OCFS2: Allow huge ( 16 TiB) volumes to mount

2010-07-12 Thread Andreas Dilger
On 2010-07-11, at 11:04, Patrick J. LoPresti wrote:
 +/* Check to make sure entire volume is addressable on this system.
 +   Requires osb_clusters_at_boot to be valid and for the journal to
 +   have been initialized by ocfs2_journal_init(). */
 +static int ocfs2_check_addressable(struct ocfs2_super *osb)
 +{
 + /* Absolute addressability check (borrowed from ext4/super.c) */
 + if ((max_block 
 +  (sector_t)(~0LL)  (osb-sb-s_blocksize_bits - 9)) ||
 + (max_block  (pgoff_t)(~0LL)  (PAGE_CACHE_SHIFT -
 +  osb-sb-s_blocksize_bits))) {
 + mlog(ML_ERROR, Volume too large 
 +  to mount safely on this system);
 + status = -EFBIG;
 + goto out;
 + }

This hunk of code is actually in several filesystems.  It wouldn't be a bad 
idea to make it a library function that can be called by the filesystem to 
check the kernel page cache and block layer can handle these large filesystems.

Cheers, Andreas






___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH 2/2] OCFS2: Allow huge ( 16 TiB) volumes to mount

2010-07-12 Thread Patrick J. LoPresti
On Mon, Jul 12, 2010 at 5:21 PM, Andreas Dilger adil...@dilger.ca wrote:
 On 2010-07-11, at 11:04, Patrick J. LoPresti wrote:
 
 +     /* Absolute addressability check (borrowed from ext4/super.c) */
 +     if ((max_block 
 +          (sector_t)(~0LL)  (osb-sb-s_blocksize_bits - 9)) ||
 +         (max_block  (pgoff_t)(~0LL)  (PAGE_CACHE_SHIFT -
 +                                          osb-sb-s_blocksize_bits))) {
 +             mlog(ML_ERROR, Volume too large 
 +                  to mount safely on this system);
 +             status = -EFBIG;
 +             goto out;
 +     }

 This hunk of code is actually in several filesystems.  It wouldn't be a bad 
 idea to make it a library function that can be called by the filesystem to 
 check the kernel page cache and block layer can handle these large 
 filesystems.

True, but some of them do it differently (e.g. see the #if switch in
xfs_sb_validate_fsb_count).  Tracking down all variants and changing
them is a much larger task than my simple patch.

Are you suggesting I need to do this before my patch is accepted at
all?  Or is this a refactoring that can happen later?

 - Pat

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH 2/2] OCFS2: Allow huge ( 16 TiB) volumes to mount

2010-07-12 Thread Patrick J. LoPresti
On Mon, Jul 12, 2010 at 6:25 PM, Dave Chinner da...@fromorbit.com wrote:

 The XFS code is different to the above because there is still a 16TB
 size limit on 32 bit systemsi (i.e. page cache address limits). IOWs,
 you can't just remove the above 16TB check unless you (i.e. OCFS2)
 handle 16TB block devices on 32 bit systems correctly...

If you look at my patch, you will see that is precisely what it does.
As the comments indicate, it uses the exact same check as ext4, which
will correctly refuse to mount huge volumes on 32-bit systems.

The XFS test appears to be the same thing written a little
differently.  Andreas is suggesting that somebody should factor out
this check into a common library routine.  That sounds like a fine
idea, but it also sounds orthogonal to the (simple and useful) patch I
am attempting to submit.

 - Pat

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-devel


[Ocfs2-devel] [PATCH v2] ocfs2: Don't duplicate page passes i_size during CoW.

2010-07-12 Thread Tao Ma
Hi Joel,
 @@ -2932,12 +2932,20 @@ static int ocfs2_duplicate_clusters_by_page(handle_t 
 *handle,
  offset = ((loff_t)cpos)  OCFS2_SB(sb)-s_clustersize_bits;
  end = offset + (new_len  OCFS2_SB(sb)-s_clustersize_bits);

 +last_page = i_size_read(context-inode)  PAGE_CACHE_SHIFT;
  while (offset  end) {
 
   Why trigger on both index and byte offset?  Why not just adjust
 end?
OK, here is the udpated one.

Regards,
Tao

From b30377da2dff2b5823c6a4a66153aa035334a632 Mon Sep 17 00:00:00 2001
From: Tao Ma tao...@oracle.com
Date: Tue, 13 Jul 2010 09:53:28 +0800
Subject: [PATCH v2] ocfs2: Don't duplicate page passes i_size during CoW.

During CoW, actually all the pages after i_size contains
garbage data, so don't read and duplicate them.

Signed-off-by: Tao Ma tao...@oracle.com
---
 fs/ocfs2/refcounttree.c |7 +++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/fs/ocfs2/refcounttree.c b/fs/ocfs2/refcounttree.c
index 1cf9cda..e4e85e6 100644
--- a/fs/ocfs2/refcounttree.c
+++ b/fs/ocfs2/refcounttree.c
@@ -2931,6 +2931,13 @@ static int ocfs2_duplicate_clusters_by_page(handle_t 
*handle,
 
offset = ((loff_t)cpos)  OCFS2_SB(sb)-s_clustersize_bits;
end = offset + (new_len  OCFS2_SB(sb)-s_clustersize_bits);
+   /*
+* We only duplicate pages until we reach i_size.
+* So trim 'end' to the boundary of that page.
+*/
+   if (end  i_size_read(context-inode))
+   end = ((i_size_read(context-inode) + PAGE_CACHE_SIZE - 1) 
+PAGE_CACHE_SHIFT)  PAGE_CACHE_SHIFT;
 
while (offset  end) {
page_index = offset  PAGE_CACHE_SHIFT;
-- 
1.7.1.571.gba4d01


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH 2/2] OCFS2: Allow huge ( 16 TiB) volumes to mount

2010-07-12 Thread Andreas Dilger
On 2010-07-12, at 19:08, Patrick J. LoPresti wrote:
 On Mon, Jul 12, 2010 at 5:21 PM, Andreas Dilger adil...@dilger.ca wrote:
 On 2010-07-11, at 11:04, Patrick J. LoPresti wrote:
 
 + /* Absolute addressability check (borrowed from ext4/super.c) */
 + if ((max_block 
 +  (sector_t)(~0LL)  (osb-sb-s_blocksize_bits - 9)) ||
 + (max_block  (pgoff_t)(~0LL)  (PAGE_CACHE_SHIFT -
 +  osb-sb-s_blocksize_bits))) {
 + mlog(ML_ERROR, Volume too large 
 +  to mount safely on this system);
 + status = -EFBIG;
 + goto out;
 + }
 
 This hunk of code is actually in several filesystems.  It wouldn't be a bad 
 idea to make it a library function that can be called by the filesystem to 
 check the kernel page cache and block layer can handle these large 
 filesystems.
 
 True, but some of them do it differently (e.g. see the #if switch in
 xfs_sb_validate_fsb_count).  Tracking down all variants and changing
 them is a much larger task than my simple patch.
 
 Are you suggesting I need to do this before my patch is accepted at
 all?  Or is this a refactoring that can happen later?

I'm just suggesting it should be done at some point.  I thought it would be 
better to do it first, rather than add yet another copy of this code.  That 
said, I hate to block useful fixes because of cleanup (and I have no control 
over OCFS2 anyway :-).  However, I've found that once the fix is in people 
usually forget (or become too busy) to do the cleanup and it just lingers on 
unseen.

Cheers, Andreas






___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-devel


Re: [Ocfs2-devel] [PATCH 2/2] OCFS2: Allow huge ( 16 TiB) volumes to mount

2010-07-12 Thread Patrick J. LoPresti
On Mon, Jul 12, 2010 at 9:46 PM, Andreas Dilger adil...@dilger.ca wrote:
 On 2010-07-12, at 19:08, Patrick J. LoPresti wrote:

 Are you suggesting I need to do this before my patch is accepted at
 all?  Or is this a refactoring that can happen later?

 I'm just suggesting it should be done at some point.  I thought it would be 
 better to do it first, rather than add yet another copy of this code.  That 
 said, I hate to block useful fixes because of cleanup (and I have no control 
 over OCFS2 anyway :-).  However, I've found that once the fix is in people 
 usually forget (or become too busy) to do the cleanup and it just lingers on 
 unseen.

I hear you.

I do not object to factoring out the basic addressability test and
using it in my patch, leaving it for others -- like yourself :-) -- to
modify other file systems to invoke it.

Does that sound like a reasonable compromise?  If so, where should the
function live and what should it be called, do you think?

 - Pat

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-devel