[Ocfs2-devel] [PATCH] ocfs2: Don't duplicate page passes i_size during CoW.
During CoW, actually all the pages after i_size contains garbage data, so don't read and duplicate them. Signed-off-by: Tao Ma tao...@oracle.com --- fs/ocfs2/refcounttree.c | 10 +- 1 files changed, 9 insertions(+), 1 deletions(-) diff --git a/fs/ocfs2/refcounttree.c b/fs/ocfs2/refcounttree.c index 1cf9cda..e082623 100644 --- a/fs/ocfs2/refcounttree.c +++ b/fs/ocfs2/refcounttree.c @@ -2921,7 +2921,7 @@ static int ocfs2_duplicate_clusters_by_page(handle_t *handle, struct super_block *sb = ocfs2_metadata_cache_get_super(ci); u64 new_block = ocfs2_clusters_to_blocks(sb, new_cluster); struct page *page; - pgoff_t page_index; + pgoff_t page_index, last_page; unsigned int from, to; loff_t offset, end, map_end; struct address_space *mapping = context-inode-i_mapping; @@ -2932,12 +2932,20 @@ static int ocfs2_duplicate_clusters_by_page(handle_t *handle, offset = ((loff_t)cpos) OCFS2_SB(sb)-s_clustersize_bits; end = offset + (new_len OCFS2_SB(sb)-s_clustersize_bits); + last_page = i_size_read(context-inode) PAGE_CACHE_SHIFT; while (offset end) { page_index = offset PAGE_CACHE_SHIFT; map_end = ((loff_t)page_index + 1) PAGE_CACHE_SHIFT; if (map_end end) map_end = end; + /* +* If this page is beyond the page contains i_size, +* stop duplicating it. +*/ + if (page_index last_page) + break; + /* from, to is the offset within the page. */ from = offset (PAGE_CACHE_SIZE - 1); to = PAGE_CACHE_SIZE; -- 1.7.1.571.gba4d01 ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-devel
Re: [Ocfs2-devel] [PATCH] ocfs2: Don't duplicate page passes i_size during CoW.
On Mon, Jul 12, 2010 at 03:19:48PM +0800, Tao Ma wrote: During CoW, actually all the pages after i_size contains garbage data, so don't read and duplicate them. Signed-off-by: Tao Ma tao...@oracle.com --- fs/ocfs2/refcounttree.c | 10 +- 1 files changed, 9 insertions(+), 1 deletions(-) diff --git a/fs/ocfs2/refcounttree.c b/fs/ocfs2/refcounttree.c index 1cf9cda..e082623 100644 --- a/fs/ocfs2/refcounttree.c +++ b/fs/ocfs2/refcounttree.c @@ -2921,7 +2921,7 @@ static int ocfs2_duplicate_clusters_by_page(handle_t *handle, struct super_block *sb = ocfs2_metadata_cache_get_super(ci); u64 new_block = ocfs2_clusters_to_blocks(sb, new_cluster); struct page *page; - pgoff_t page_index; + pgoff_t page_index, last_page; unsigned int from, to; loff_t offset, end, map_end; struct address_space *mapping = context-inode-i_mapping; @@ -2932,12 +2932,20 @@ static int ocfs2_duplicate_clusters_by_page(handle_t *handle, offset = ((loff_t)cpos) OCFS2_SB(sb)-s_clustersize_bits; end = offset + (new_len OCFS2_SB(sb)-s_clustersize_bits); + last_page = i_size_read(context-inode) PAGE_CACHE_SHIFT; while (offset end) { Why trigger on both index and byte offset? Why not just adjust end? if (end i_size_read(context-inode)) end = i_size_read(inode); Then you don't need the last_page variable at all, because you're guaranteed never to go past i_size. Joel -- Three o'clock is always too late or too early for anything you want to do. - Jean-Paul Sartre Joel Becker Consulting Software Developer Oracle E-mail: joel.bec...@oracle.com Phone: (650) 506-8127 ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-devel
Re: [Ocfs2-devel] [patch] ocfs2: tighten up strlen() checking
Hi Dan, I think O2NM_MAX_NAME_LEN is the max valid length of the domain name. Regarding your patch, it changed to be that a domain name with length O2NM_MAX_NAME_LEN (NULL character not included) is not permitted. Though that check seems useless for current calls, we'd better keep it. Checking the structure, 99 struct ocfs2_cluster_connection { 100 char cc_name[GROUP_NAME_MAX]; 101 int cc_namelen; cc_name is not a NULL tailed string. the cc_namelen specifies the length of it. There does is misuse of cc_name, such as 7832 fs/ocfs2/stack_user.c user_cluster_connect rc = dlm_new_lockspace(conn-cc_name, strlen(conn-cc_name), 5308 fs/ocfs2/stack_o2cb.c o2cb_cluster_connect dlm = dlm_register_domain(conn-cc_name, dlm_key, fs_version); Also, the uuid shouldn't be treated as NULL tailed string. 142 struct ocfs2_control_message_down { 143 chartag[OCFS2_CONTROL_MESSAGE_OP_LEN]; 144 charspace1; 145 charuuid[OCFS2_TEXT_UUID_LEN]; 146 charspace2; thus, the calling of fs/ocfs2/stack_user.c:474: ocfs2_control_send_down(msg-uuid, nodenum); -ocfs2_connection_find(uuid) -size_t len = strlen(name); is suspectable. Could you please make patch for that instead? regards, wengang. On 10-07-10 16:33, Dan Carpenter wrote: This function is only called from one place and it's like this: dlm_register_domain(conn-cc_name, dlm_key, fs_version); The conn-cc_name is 64 characters long. If strlen(conn-cc_name) were equal to O2NM_MAX_NAME_LEN (64) that would be a bug because strlen() doesn't count the NULL character. In fact, if you look how O2NM_MAX_NAME_LEN is used, it mostly describes 64 character buffers. The only exception is nd_name from struct o2nm_node. Anyway I looked into it and in this case the domain string comes from osb-uuid_str in ocfs2_setup_osb_uuid(). That's 32 characters and NULL which easily fits into O2NM_MAX_NAME_LEN. This patch doesn't change how the code works, but I think it makes the code a little cleaner. Signed-off-by: Dan Carpenter erro...@gmail.com --- Or we could get rid of check entirely. diff --git a/fs/ocfs2/dlm/dlmdomain.c b/fs/ocfs2/dlm/dlmdomain.c index 6b5a492..084b051 100644 --- a/fs/ocfs2/dlm/dlmdomain.c +++ b/fs/ocfs2/dlm/dlmdomain.c @@ -1671,7 +1671,7 @@ struct dlm_ctxt * dlm_register_domain(const char *domain, struct dlm_ctxt *dlm = NULL; struct dlm_ctxt *new_ctxt = NULL; - if (strlen(domain) O2NM_MAX_NAME_LEN) { + if (strlen(domain) = O2NM_MAX_NAME_LEN) { ret = -ENAMETOOLONG; mlog(ML_ERROR, domain name length too long\n); goto leave; ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-devel
Re: [Ocfs2-devel] [patch] ocfs2: tighten up strlen() checking
Hi Dan, On 10-07-12 15:39, Dan Carpenter wrote: On Mon, Jul 12, 2010 at 07:30:06PM +0800, Wengang Wang wrote: Also, the uuid shouldn't be treated as NULL tailed string. 142 struct ocfs2_control_message_down { 143 chartag[OCFS2_CONTROL_MESSAGE_OP_LEN]; 144 charspace1; 145 charuuid[OCFS2_TEXT_UUID_LEN]; 146 charspace2; The space1 and space2 characters are NULL terminators: From ocfs2_control_do_down_msg(): msg-space1 = msg-space2 = msg-newline = '\0'; I would have thought it had to be a packed struct, but it works because there are only chars in that struct. So that code is fine. Yes, it is working fine. I would rather think that structure is misleading than smart :-D. I prefer this: struct ocfs2_control_message_down { chartag[OCFS2_CONTROL_MESSAGE_OP_LEN + 1]; #define space1 tag[OCFS2_CONTROL_MESSAGE_OP_LEN] charuuid[OCFS2_TEXT_UUID_LEN + 1]; #define space2 uuid[OCFS2_TEXT_UUID_LEN] regards, wengang. ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-devel
Re: [Ocfs2-devel] [patch] ocfs2: tighten up strlen() checking
Acked-by: Sunil Mushran sunil.mush...@oracle.com On 07/10/2010 07:33 AM, Dan Carpenter wrote: This function is only called from one place and it's like this: dlm_register_domain(conn-cc_name, dlm_key,fs_version); The conn-cc_name is 64 characters long. If strlen(conn-cc_name) were equal to O2NM_MAX_NAME_LEN (64) that would be a bug because strlen() doesn't count the NULL character. In fact, if you look how O2NM_MAX_NAME_LEN is used, it mostly describes 64 character buffers. The only exception is nd_name from struct o2nm_node. Anyway I looked into it and in this case the domain string comes from osb-uuid_str in ocfs2_setup_osb_uuid(). That's 32 characters and NULL which easily fits into O2NM_MAX_NAME_LEN. This patch doesn't change how the code works, but I think it makes the code a little cleaner. Signed-off-by: Dan Carpentererro...@gmail.com --- Or we could get rid of check entirely. diff --git a/fs/ocfs2/dlm/dlmdomain.c b/fs/ocfs2/dlm/dlmdomain.c index 6b5a492..084b051 100644 --- a/fs/ocfs2/dlm/dlmdomain.c +++ b/fs/ocfs2/dlm/dlmdomain.c @@ -1671,7 +1671,7 @@ struct dlm_ctxt * dlm_register_domain(const char *domain, struct dlm_ctxt *dlm = NULL; struct dlm_ctxt *new_ctxt = NULL; - if (strlen(domain) O2NM_MAX_NAME_LEN) { + if (strlen(domain)= O2NM_MAX_NAME_LEN) { ret = -ENAMETOOLONG; mlog(ML_ERROR, domain name length too long\n); goto leave; ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-devel
Re: [Ocfs2-devel] [patch] ocfs2: tighten up strlen() checking
So o2dlm expects a null terminated domain name. The original patch is good as it adds the check in o2dlm only. For userspace, we allow non-null terminating group (domain) name. That remains unchanged. On 07/12/2010 04:30 AM, Wengang Wang wrote: Hi Dan, I think O2NM_MAX_NAME_LEN is the max valid length of the domain name. Regarding your patch, it changed to be that a domain name with length O2NM_MAX_NAME_LEN (NULL character not included) is not permitted. Though that check seems useless for current calls, we'd better keep it. Checking the structure, 99 struct ocfs2_cluster_connection { 100 char cc_name[GROUP_NAME_MAX]; 101 int cc_namelen; cc_name is not a NULL tailed string. the cc_namelen specifies the length of it. There does is misuse of cc_name, such as 7832 fs/ocfs2/stack_user.cuser_cluster_connect rc = dlm_new_lockspace(conn-cc_name, strlen(conn-cc_name), 5308 fs/ocfs2/stack_o2cb.co2cb_cluster_connect dlm = dlm_register_domain(conn-cc_name, dlm_key,fs_version); Also, the uuid shouldn't be treated as NULL tailed string. 142 struct ocfs2_control_message_down { 143 chartag[OCFS2_CONTROL_MESSAGE_OP_LEN]; 144 charspace1; 145 charuuid[OCFS2_TEXT_UUID_LEN]; 146 charspace2; thus, the calling of fs/ocfs2/stack_user.c:474: ocfs2_control_send_down(msg-uuid, nodenum); -ocfs2_connection_find(uuid) -size_t len = strlen(name); is suspectable. Could you please make patch for that instead? regards, wengang. On 10-07-10 16:33, Dan Carpenter wrote: This function is only called from one place and it's like this: dlm_register_domain(conn-cc_name, dlm_key,fs_version); The conn-cc_name is 64 characters long. If strlen(conn-cc_name) were equal to O2NM_MAX_NAME_LEN (64) that would be a bug because strlen() doesn't count the NULL character. In fact, if you look how O2NM_MAX_NAME_LEN is used, it mostly describes 64 character buffers. The only exception is nd_name from struct o2nm_node. Anyway I looked into it and in this case the domain string comes from osb-uuid_str in ocfs2_setup_osb_uuid(). That's 32 characters and NULL which easily fits into O2NM_MAX_NAME_LEN. This patch doesn't change how the code works, but I think it makes the code a little cleaner. Signed-off-by: Dan Carpentererro...@gmail.com --- Or we could get rid of check entirely. diff --git a/fs/ocfs2/dlm/dlmdomain.c b/fs/ocfs2/dlm/dlmdomain.c index 6b5a492..084b051 100644 --- a/fs/ocfs2/dlm/dlmdomain.c +++ b/fs/ocfs2/dlm/dlmdomain.c @@ -1671,7 +1671,7 @@ struct dlm_ctxt * dlm_register_domain(const char *domain, struct dlm_ctxt *dlm = NULL; struct dlm_ctxt *new_ctxt = NULL; -if (strlen(domain) O2NM_MAX_NAME_LEN) { +if (strlen(domain)= O2NM_MAX_NAME_LEN) { ret = -ENAMETOOLONG; mlog(ML_ERROR, domain name length too long\n); goto leave; ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-devel
Re: [Ocfs2-devel] [PATCH] ocfs2: Remove the redundant cpu_to_le64.
On Thu, Jul 08, 2010 at 11:11:11AM +0800, Tao Ma wrote: In ocfs2_block_group_alloc, we set c_blkno by bg-bg_blkno. But actually bg-bg_blkno is already changed to little endian in ocfs2_block_group_fill. So remove the extra cpu_to_le64. Reported-by: Marcos Matsunaga marcos.matsun...@oracle.com Signed-off-by: Tao Ma tao...@oracle.com This patch is now in the fixes branch of ocfs2.git. Joel -- f/8 and be there. Joel Becker Consulting Software Developer Oracle E-mail: joel.bec...@oracle.com Phone: (650) 506-8127 ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-devel
Re: [Ocfs2-devel] [PATCH] ocfs2/dlm: don't access beyond bitmap size
On Wed, Jun 30, 2010 at 08:23:30PM +0800, Wengang Wang wrote: dlm-recovery_map is defined as unsigned long recovery_map[BITS_TO_LONGS(O2NM_MAX_NODES)]; We should treat O2NM_MAX_NODES as the bit map size in bits. This patches fixes a bit operation that takes O2NM_MAX_NODES + 1 as bitmap size. Signed-off-by: Wengang Wang wen.gang.w...@oracle.com This patch is now in the 'fixes' branch of ocfs2.git. Joel -- The zen have a saying: When you learn how to listen, ANYONE can be your teacher. Joel Becker Consulting Software Developer Oracle E-mail: joel.bec...@oracle.com Phone: (650) 506-8127 ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-devel
Re: [Ocfs2-devel] [PATCH 1/2 v2] ocfs2: make xattr extension work with new local alloc reservation.
On Fri, Jul 09, 2010 at 02:53:11PM +0800, Tao Ma wrote: The old ocfs2_xattr_extent_allocation is too optimistic about the clusters we can get. So actually if the file system is too fragmented, ocfs2_add_clusters_in_btree will return us with EGAIN and we need to allocate clusters once again. So this patch change it to a while loop so that we can allocate clusters until we reach clusters_to_add. Signed-off-by: Tao Ma tao...@oracle.com This patch is now in the fixes branch of ocfs2.git. Joel -- Born under a bad sign. I been down since I began to crawl. If it wasn't for bad luck, I wouldn't have no luck at all. Joel Becker Consulting Software Developer Oracle E-mail: joel.bec...@oracle.com Phone: (650) 506-8127 ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-devel
Re: [Ocfs2-devel] [PATCH 2/2 v2] ocfs2: Make xattr reflink work with new local alloc reservation.
On Fri, Jul 09, 2010 at 02:53:12PM +0800, Tao Ma wrote: The new reservation code in local alloc has add the limitation that the caller should handle the case that the local alloc doesn't give use enough contiguous clusters. It make the old xattr reflink code broken. So this patch udpate the xattr reflink code so that it can handle the case that local alloc give us one cluster at a time. Signed-off-by: Tao Ma tao...@oracle.com This patch, with the gcc warning fixed, is now in the fixes branch of ocfs2.git. Joel -- Life's Little Instruction Book #109 Know how to drive a stick shift. Joel Becker Consulting Software Developer Oracle E-mail: joel.bec...@oracle.com Phone: (650) 506-8127 ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-devel
Re: [Ocfs2-devel] [patch] ocfs2: tighten up strlen() checking
On Sat, Jul 10, 2010 at 04:33:36PM +0200, Dan Carpenter wrote: This function is only called from one place and it's like this: dlm_register_domain(conn-cc_name, dlm_key, fs_version); The conn-cc_name is 64 characters long. If strlen(conn-cc_name) were equal to O2NM_MAX_NAME_LEN (64) that would be a bug because strlen() doesn't count the NULL character. In fact, if you look how O2NM_MAX_NAME_LEN is used, it mostly describes 64 character buffers. The only exception is nd_name from struct o2nm_node. Anyway I looked into it and in this case the domain string comes from osb-uuid_str in ocfs2_setup_osb_uuid(). That's 32 characters and NULL which easily fits into O2NM_MAX_NAME_LEN. This patch doesn't change how the code works, but I think it makes the code a little cleaner. Signed-off-by: Dan Carpenter erro...@gmail.com This patch is now in the fixes branch of ocfs2.git. Joel -- Viro's Razor: Any race condition, no matter how unlikely, will occur just often enough to bite you. Joel Becker Consulting Software Developer Oracle E-mail: joel.bec...@oracle.com Phone: (650) 506-8127 ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-devel
Re: [Ocfs2-devel] [PATCH V3 3/8] Cleancache: core ops functions and configuration
On 07/07/2010 02:21 AM, Konrad Rzeszutek Wilk wrote: On Tue, Jun 22, 2010 at 09:26:28AM -0700, Dave Hansen wrote: On Mon, 2010-06-21 at 16:19 -0700, Dan Magenheimer wrote: --- linux-2.6.35-rc2/include/linux/cleancache.h 1969-12-31 17:00:00.0 -0700 +++ linux-2.6.35-rc2-cleancache/include/linux/cleancache.h 2010-06-21 14:45:18.0 -0600 @@ -0,0 +1,88 @@ +#ifndef _LINUX_CLEANCACHE_H +#define _LINUX_CLEANCACHE_H + +#include linux/fs.h +#include linux/mm.h + +struct cleancache_ops { + int (*init_fs)(size_t); + int (*init_shared_fs)(char *uuid, size_t); + int (*get_page)(int, ino_t, pgoff_t, struct page *); + void (*put_page)(int, ino_t, pgoff_t, struct page *); + void (*flush_page)(int, ino_t, pgoff_t); + void (*flush_inode)(int, ino_t); + void (*flush_fs)(int); +}; + How would someone go about testing this code? Is there an example cleancache implementation? Dan, Can you reference with a link or a git branch the patches that utilize this? And also mention that in the 0/X patch so that folks can reference your cleancache implementation? FYI. I am working on 'zcache' which uses cleancache_ops to provide page cache compression support. I will be posting it to LKML before end of next week. Thanks, Nitin ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-devel
Re: [Ocfs2-devel] [PATCH] Cleancache: shim to Xen Transcendent Memory
On Thu, Jul 08, 2010 at 09:42:08AM -0700, Dan Magenheimer wrote: Signed-off-by: Dan Magenheimer dan.magenhei...@oracle.com One nitpick: .. + +int tmem_enabled; + +static int __init enable_tmem(char *s) +{ + tmem_enabled = 1; + return 1; +} + +__setup(tmem, enable_tmem); Perhaps 'tmem_setup' is more appropiate as it might be that this function in the future would be only used to disable tmem, not actually enable it? Otherwise, it has been reviewed by me. ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-devel
[Ocfs2-devel] [PATCH 27/36] fs/ocfs2: Remove unnecessary casts of private_data
Signed-off-by: Joe Perches j...@perches.com --- fs/ocfs2/dlm/dlmdebug.c |6 +++--- fs/ocfs2/dlmfs/dlmfs.c |3 +-- fs/ocfs2/dlmglue.c |4 ++-- 3 files changed, 6 insertions(+), 7 deletions(-) diff --git a/fs/ocfs2/dlm/dlmdebug.c b/fs/ocfs2/dlm/dlmdebug.c index 0cd24cf..5efdd37 100644 --- a/fs/ocfs2/dlm/dlmdebug.c +++ b/fs/ocfs2/dlm/dlmdebug.c @@ -419,7 +419,7 @@ static loff_t debug_buffer_llseek(struct file *file, loff_t off, int whence) static int debug_buffer_release(struct inode *inode, struct file *file) { - struct debug_buffer *db = (struct debug_buffer *)file-private_data; + struct debug_buffer *db = file-private_data; if (db) kfree(db-buf); @@ -715,7 +715,7 @@ static int debug_lockres_open(struct inode *inode, struct file *file) goto bail; } - seq = (struct seq_file *) file-private_data; + seq = file-private_data; seq-private = dl; dlm_grab(dlm); @@ -731,7 +731,7 @@ bail: static int debug_lockres_release(struct inode *inode, struct file *file) { - struct seq_file *seq = (struct seq_file *)file-private_data; + struct seq_file *seq = file-private_data; struct debug_lockres *dl = (struct debug_lockres *)seq-private; if (dl-dl_res) diff --git a/fs/ocfs2/dlmfs/dlmfs.c b/fs/ocfs2/dlmfs/dlmfs.c index a43ebb1..c2903b8 100644 --- a/fs/ocfs2/dlmfs/dlmfs.c +++ b/fs/ocfs2/dlmfs/dlmfs.c @@ -182,8 +182,7 @@ static int dlmfs_file_release(struct inode *inode, { int level, status; struct dlmfs_inode_private *ip = DLMFS_I(inode); - struct dlmfs_filp_private *fp = - (struct dlmfs_filp_private *) file-private_data; + struct dlmfs_filp_private *fp = file-private_data; if (S_ISDIR(inode-i_mode)) BUG(); diff --git a/fs/ocfs2/dlmglue.c b/fs/ocfs2/dlmglue.c index 39eb16a..5e02a89 100644 --- a/fs/ocfs2/dlmglue.c +++ b/fs/ocfs2/dlmglue.c @@ -2966,7 +2966,7 @@ static const struct seq_operations ocfs2_dlm_seq_ops = { static int ocfs2_dlm_debug_release(struct inode *inode, struct file *file) { - struct seq_file *seq = (struct seq_file *) file-private_data; + struct seq_file *seq = file-private_data; struct ocfs2_dlm_seq_priv *priv = seq-private; struct ocfs2_lock_res *res = priv-p_iter_res; @@ -3000,7 +3000,7 @@ static int ocfs2_dlm_debug_open(struct inode *inode, struct file *file) goto out; } - seq = (struct seq_file *) file-private_data; + seq = file-private_data; seq-private = priv; ocfs2_add_lockres_tracking(priv-p_iter_res, -- 1.7.1.337.g6068.dirty ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-devel
Re: [Ocfs2-devel] [PATCH V3 3/8] Cleancache: core ops functions and configuration
Can you reference with a link or a git branch the patches that utilize this? And also mention that in the 0/X patch so that folks can reference your cleancache implementation? FYI. I am working on 'zcache' which uses cleancache_ops to provide page cache compression support. I will be posting it to LKML before end of next week. Yes! That too, please. Thanks for pointing this out. ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-devel
Re: [Ocfs2-devel] [patch] ocfs2: tighten up strlen() checking
On Mon, Jul 12, 2010 at 07:30:06PM +0800, Wengang Wang wrote: Also, the uuid shouldn't be treated as NULL tailed string. 142 struct ocfs2_control_message_down { 143 chartag[OCFS2_CONTROL_MESSAGE_OP_LEN]; 144 charspace1; 145 charuuid[OCFS2_TEXT_UUID_LEN]; 146 charspace2; The space1 and space2 characters are NULL terminators: From ocfs2_control_do_down_msg(): msg-space1 = msg-space2 = msg-newline = '\0'; I would have thought it had to be a packed struct, but it works because there are only chars in that struct. So that code is fine. regards, dan carpenter ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-devel
[Ocfs2-devel] [patch] ocfs2: tighten up strlen() checking
This function is only called from one place and it's like this: dlm_register_domain(conn-cc_name, dlm_key, fs_version); The conn-cc_name is 64 characters long. If strlen(conn-cc_name) were equal to O2NM_MAX_NAME_LEN (64) that would be a bug because strlen() doesn't count the NULL character. In fact, if you look how O2NM_MAX_NAME_LEN is used, it mostly describes 64 character buffers. The only exception is nd_name from struct o2nm_node. Anyway I looked into it and in this case the domain string comes from osb-uuid_str in ocfs2_setup_osb_uuid(). That's 32 characters and NULL which easily fits into O2NM_MAX_NAME_LEN. This patch doesn't change how the code works, but I think it makes the code a little cleaner. Signed-off-by: Dan Carpenter erro...@gmail.com --- Or we could get rid of check entirely. diff --git a/fs/ocfs2/dlm/dlmdomain.c b/fs/ocfs2/dlm/dlmdomain.c index 6b5a492..084b051 100644 --- a/fs/ocfs2/dlm/dlmdomain.c +++ b/fs/ocfs2/dlm/dlmdomain.c @@ -1671,7 +1671,7 @@ struct dlm_ctxt * dlm_register_domain(const char *domain, struct dlm_ctxt *dlm = NULL; struct dlm_ctxt *new_ctxt = NULL; - if (strlen(domain) O2NM_MAX_NAME_LEN) { + if (strlen(domain) = O2NM_MAX_NAME_LEN) { ret = -ENAMETOOLONG; mlog(ML_ERROR, domain name length too long\n); goto leave; ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-devel
Re: [Ocfs2-devel] [PATCH V3 3/8] Cleancache: core ops functions and configuration
On Tue, Jun 22, 2010 at 09:26:28AM -0700, Dave Hansen wrote: On Mon, 2010-06-21 at 16:19 -0700, Dan Magenheimer wrote: --- linux-2.6.35-rc2/include/linux/cleancache.h 1969-12-31 17:00:00.0 -0700 +++ linux-2.6.35-rc2-cleancache/include/linux/cleancache.h 2010-06-21 14:45:18.0 -0600 @@ -0,0 +1,88 @@ +#ifndef _LINUX_CLEANCACHE_H +#define _LINUX_CLEANCACHE_H + +#include linux/fs.h +#include linux/mm.h + +struct cleancache_ops { + int (*init_fs)(size_t); + int (*init_shared_fs)(char *uuid, size_t); + int (*get_page)(int, ino_t, pgoff_t, struct page *); + void (*put_page)(int, ino_t, pgoff_t, struct page *); + void (*flush_page)(int, ino_t, pgoff_t); + void (*flush_inode)(int, ino_t); + void (*flush_fs)(int); +}; + How would someone go about testing this code? Is there an example cleancache implementation? Dan, Can you reference with a link or a git branch the patches that utilize this? And also mention that in the 0/X patch so that folks can reference your cleancache implementation? ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-devel
Re: [Ocfs2-devel] [PATCH V3 0/8] Cleancache: overview
On Mon, Jun 21, 2010 at 04:18:09PM -0700, Dan Magenheimer wrote: [PATCH V3 0/8] Cleancache: overview Dan, Two comments: - Mention where one can get the implementor of the cleancache API. Either a link to where the patches reside or a git branch. If you need pointers on branch names: http://lkml.org/lkml/2010/6/7/269 - Point out the presentation you did on this. It has an excellent overview of how this API works, and most importantly: a) images and b). performance numbers. Otherwise, please consider all of these patches to have Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com tag. ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-devel
[Ocfs2-devel] [PATCH] Cleancache: shim to Xen Transcendent Memory
[PATCH] Cleancache: shim to Xen Transcendent Memory This companion patch to the cleancache V3 patchset (see http://lkml.org/lkml/2010/6/21/411) provides one consumer for the proposed cleancache internal kernel API. This user is Xen Transcendent Memory (tmem), supported in Xen 4.0+. (A second user, zcache, will be posted by Nitin Gupta soon.) This patch affects Xen directories only; while broader review is welcome, the patch is provided primarily to answer concerns expressed about the apparent non-existence of consumers for the proposed cleancache patch. Xen tmem provides hypervisor RAM as an ephemeral page-oriented pseudo-RAM store for cleancache pages, shared cleancache pages, and frontswap pages. Tmem provides enterprise-quality concurrency, full save/restore and live migration support, compression and deduplication. A presentation showing up to 8% faster performance and up to 52% reduction in sectors read on a kernel compile workload, despite aggressive in-kernel page reclamation (self-ballooning) can be found at: http://oss.oracle.com/projects/tmem/dist/documentation/presentations/TranscendentMemoryXenSummit2010.pdf Signed-off-by: Dan Magenheimer dan.magenhei...@oracle.com Diffstat: arch/x86/include/asm/xen/hypercall.h |7 drivers/xen/Makefile |2 drivers/xen/tmem.c | 238 + include/xen/interface/xen.h | 20 + 4 files changed, 266 insertions(+), 1 deletion(-) diff -Napur linux-2.6.35-rc4/arch/x86/include/asm/xen/hypercall.h linux-2.6.35-rc4-cleancache_tmem/arch/x86/include/asm/xen/hypercall.h --- linux-2.6.35-rc4/arch/x86/include/asm/xen/hypercall.h 2010-07-04 21:22:50.0 -0600 +++ linux-2.6.35-rc4-cleancache_tmem/arch/x86/include/asm/xen/hypercall.h 2010-07-07 15:37:46.0 -0600 @@ -417,6 +417,13 @@ HYPERVISOR_nmi_op(unsigned long op, unsi return _hypercall2(int, nmi_op, op, arg); } +static inline int +HYPERVISOR_tmem_op( + struct tmem_op *op) +{ + return _hypercall1(int, tmem_op, op); +} + static inline void MULTI_fpu_taskswitch(struct multicall_entry *mcl, int set) { diff -Napur linux-2.6.35-rc4/drivers/xen/Makefile linux-2.6.35-rc4-cleancache_tmem/drivers/xen/Makefile --- linux-2.6.35-rc4/drivers/xen/Makefile 2010-07-04 21:22:50.0 -0600 +++ linux-2.6.35-rc4-cleancache_tmem/drivers/xen/Makefile 2010-07-07 09:38:21.0 -0600 @@ -1,5 +1,6 @@ obj-y += grant-table.o features.o events.o manage.o obj-y += xenbus/ +obj-y += tmem.o nostackp := $(call cc-option, -fno-stack-protector) CFLAGS_features.o := $(nostackp) @@ -9,4 +10,4 @@ obj-$(CONFIG_XEN_XENCOMM) += xencomm.o obj-$(CONFIG_XEN_BALLOON) += balloon.o obj-$(CONFIG_XEN_DEV_EVTCHN) += evtchn.o obj-$(CONFIG_XENFS)+= xenfs/ -obj-$(CONFIG_XEN_SYS_HYPERVISOR) += sys-hypervisor.o \ No newline at end of file +obj-$(CONFIG_XEN_SYS_HYPERVISOR) += sys-hypervisor.o diff -Napur linux-2.6.35-rc4/drivers/xen/tmem.c linux-2.6.35-rc4-cleancache_tmem/drivers/xen/tmem.c --- linux-2.6.35-rc4/drivers/xen/tmem.c 1969-12-31 17:00:00.0 -0700 +++ linux-2.6.35-rc4-cleancache_tmem/drivers/xen/tmem.c 2010-07-07 16:24:08.0 -0600 @@ -0,0 +1,238 @@ +/* + * Xen implementation for transcendent memory (tmem) + * + * Copyright (C) 2009-2010 Oracle Corp. All rights reserved. + * Author: Dan Magenheimer + */ + +#include linux/kernel.h +#include linux/types.h +#include linux/init.h +#include linux/pagemap.h +#include linux/cleancache.h + +#include xen/xen.h +#include xen/interface/xen.h +#include asm/xen/hypercall.h +#include asm/xen/page.h +#include asm/xen/hypervisor.h + +#define TMEM_CONTROL 0 +#define TMEM_NEW_POOL 1 +#define TMEM_DESTROY_POOL 2 +#define TMEM_NEW_PAGE 3 +#define TMEM_PUT_PAGE 4 +#define TMEM_GET_PAGE 5 +#define TMEM_FLUSH_PAGE6 +#define TMEM_FLUSH_OBJECT 7 +#define TMEM_READ 8 +#define TMEM_WRITE 9 +#define TMEM_XCHG 10 + +/* Bits for HYPERVISOR_tmem_op(TMEM_NEW_POOL) */ +#define TMEM_POOL_PERSIST 1 +#define TMEM_POOL_SHARED 2 +#define TMEM_POOL_PAGESIZE_SHIFT 4 + + +struct tmem_pool_uuid { + u64 uuid_lo; + u64 uuid_hi; +}; + +#define TMEM_POOL_PRIVATE_UUID { 0, 0 } + +/* flags for tmem_ops.new_pool */ +#define TMEM_POOL_PERSIST 1 +#define TMEM_POOL_SHARED 2 + +/* xen tmem foundation ops/hypercalls */ + +static inline int xen_tmem_op(u32 tmem_cmd, u32 tmem_pool, u64 object, + u32 index, unsigned long gmfn, u32 tmem_offset, u32 pfn_offset, u32 len) +{ + struct tmem_op op; + int rc = 0; + + op.cmd = tmem_cmd; + op.pool_id = tmem_pool; + op.u.gen.object = object; + op.u.gen.index = index; + op.u.gen.tmem_offset = tmem_offset; + op.u.gen.pfn_offset =
Re: [Ocfs2-devel] [PATCH 1/1] ocfs2 fix o2dlm dlm run purgelist(bug 9094491) - rev3
Joel Becker wrote: On Tue, Jun 22, 2010 at 10:48:28PM -0700, Srinivas Eeda wrote: +if (!__dlm_lockres_unused) { +mlog(ML_ERROR, found lockres %s:%.*s: in use after deref\n, + dlm-name, res-lockname.len, res-lockname.name); +__dlm_print_one_lock_resource(res); +BUG(); +} /build/jlbec/linux-2.6/working/fs/ocfs2/dlm/dlmthread.c: In function ‘dlm_purge_lockres’: /build/jlbec/linux-2.6/working/fs/ocfs2/dlm/dlmthread.c:203: warning: the address of ‘__dlm_lockres_unused’ will always evaluate as ‘true’ Was this even tested? I'm leaving this patch out of 'fixes' until corrected and tested. Sorry, I had the typo while making the review changes. I ran the usual tests but that didn't catch this problem. I should have payed more attention to the build log. I made the change and tested it, will send you the modified patch Joel ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-devel
[Ocfs2-devel] [PATCH 1/1] ocfs2 fix o2dlm dlm run purgelist(bug 9094491) - rev3
There are two problems in dlm_run_purgelist 1. If a lockres is found to be in use, dlm_run_purgelist keeps trying to purge the same lockres instead of trying the next lockres. 2. When a lockres is found unused, dlm_run_purgelist releases lockres spinlock before setting DLM_LOCK_RES_DROPPING_REF and calls dlm_purge_lockres. spinlock is reacquired but in this window lockres can get reused. This leads to BUG. This patch modifies dlm_run_purgelist to skip lockres if it's in use and purge next lockres. It also sets DLM_LOCK_RES_DROPPING_REF before releasing the lockres spinlock protecting it from getting reused. Signed-off-by: Srinivas Eeda srinivas.e...@oracle.com Signed-off-by: Sunil Mushransunil.mush...@oracle.com --- fs/ocfs2/dlm/dlmthread.c | 79 +++-- 1 files changed, 33 insertions(+), 46 deletions(-) diff --git a/fs/ocfs2/dlm/dlmthread.c b/fs/ocfs2/dlm/dlmthread.c index 11a6d1f..76ab4aa 100644 --- a/fs/ocfs2/dlm/dlmthread.c +++ b/fs/ocfs2/dlm/dlmthread.c @@ -152,45 +152,25 @@ void dlm_lockres_calc_usage(struct dlm_ctxt *dlm, spin_unlock(dlm-spinlock); } -static int dlm_purge_lockres(struct dlm_ctxt *dlm, +static void dlm_purge_lockres(struct dlm_ctxt *dlm, struct dlm_lock_resource *res) { int master; int ret = 0; - spin_lock(res-spinlock); - if (!__dlm_lockres_unused(res)) { - mlog(0, %s:%.*s: tried to purge but not unused\n, -dlm-name, res-lockname.len, res-lockname.name); - __dlm_print_one_lock_resource(res); - spin_unlock(res-spinlock); - BUG(); - } - - if (res-state DLM_LOCK_RES_MIGRATING) { - mlog(0, %s:%.*s: Delay dropref as this lockres is -being remastered\n, dlm-name, res-lockname.len, -res-lockname.name); - /* Re-add the lockres to the end of the purge list */ - if (!list_empty(res-purge)) { - list_del_init(res-purge); - list_add_tail(res-purge, dlm-purge_list); - } - spin_unlock(res-spinlock); - return 0; - } + assert_spin_locked(dlm-spinlock); + assert_spin_locked(res-spinlock); master = (res-owner == dlm-node_num); - if (!master) - res-state |= DLM_LOCK_RES_DROPPING_REF; - spin_unlock(res-spinlock); mlog(0, purging lockres %.*s, master = %d\n, res-lockname.len, res-lockname.name, master); if (!master) { + res-state |= DLM_LOCK_RES_DROPPING_REF; /* drop spinlock... retake below */ + spin_unlock(res-spinlock); spin_unlock(dlm-spinlock); spin_lock(res-spinlock); @@ -208,31 +188,35 @@ static int dlm_purge_lockres(struct dlm_ctxt *dlm, mlog(0, %s:%.*s: dlm_deref_lockres returned %d\n, dlm-name, res-lockname.len, res-lockname.name, ret); spin_lock(dlm-spinlock); + spin_lock(res-spinlock); } - spin_lock(res-spinlock); if (!list_empty(res-purge)) { mlog(0, removing lockres %.*s:%p from purgelist, master = %d\n, res-lockname.len, res-lockname.name, res, master); list_del_init(res-purge); - spin_unlock(res-spinlock); dlm_lockres_put(res); dlm-purge_count--; - } else - spin_unlock(res-spinlock); + } + + if (!__dlm_lockres_unused(res)) { + mlog(ML_ERROR, found lockres %s:%.*s: in use after deref\n, +dlm-name, res-lockname.len, res-lockname.name); + __dlm_print_one_lock_resource(res); + BUG(); + } __dlm_unhash_lockres(res); /* lockres is not in the hash now. drop the flag and wake up * any processes waiting in dlm_get_lock_resource. */ if (!master) { - spin_lock(res-spinlock); res-state = ~DLM_LOCK_RES_DROPPING_REF; spin_unlock(res-spinlock); wake_up(res-wq); - } - return 0; + } else + spin_unlock(res-spinlock); } static void dlm_run_purge_list(struct dlm_ctxt *dlm, @@ -251,17 +235,7 @@ static void dlm_run_purge_list(struct dlm_ctxt *dlm, lockres = list_entry(dlm-purge_list.next, struct dlm_lock_resource, purge); - /* Status of the lockres *might* change so double -* check. If the lockres is unused, holding the dlm -* spinlock will prevent people from getting and more -* refs on it -- there's no need to keep the lockres -* spinlock. */ spin_lock(lockres-spinlock); -
Re: [Ocfs2-devel] [PATCH 27/36] fs/ocfs2: Remove unnecessary casts of private_data
On Mon, Jul 12, 2010 at 01:50:19PM -0700, Joe Perches wrote: Signed-off-by: Joe Perches j...@perches.com --- fs/ocfs2/dlm/dlmdebug.c |6 +++--- fs/ocfs2/dlmfs/dlmfs.c |3 +-- fs/ocfs2/dlmglue.c |4 ++-- 3 files changed, 6 insertions(+), 7 deletions(-) Acked-by: Joel Becker joel.bec...@oracle.com Btw, how hilarious is the comment for private_data? Joel -- A good programming language should have features that make the kind of people who use the phrase software engineering shake their heads disapprovingly. - Paul Graham Joel Becker Consulting Software Developer Oracle E-mail: joel.bec...@oracle.com Phone: (650) 506-8127 ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-devel
Re: [Ocfs2-devel] [PATCH 1/1] ocfs2 fix o2dlm dlm run purgelist(bug 9094491) - rev3
On Mon, Jul 12, 2010 at 03:16:28PM -0700, Srinivas Eeda wrote: There are two problems in dlm_run_purgelist 1. If a lockres is found to be in use, dlm_run_purgelist keeps trying to purge the same lockres instead of trying the next lockres. 2. When a lockres is found unused, dlm_run_purgelist releases lockres spinlock before setting DLM_LOCK_RES_DROPPING_REF and calls dlm_purge_lockres. spinlock is reacquired but in this window lockres can get reused. This leads to BUG. This patch modifies dlm_run_purgelist to skip lockres if it's in use and purge next lockres. It also sets DLM_LOCK_RES_DROPPING_REF before releasing the lockres spinlock protecting it from getting reused. Signed-off-by: Srinivas Eeda srinivas.e...@oracle.com Signed-off-by: Sunil Mushransunil.mush...@oracle.com Has this been tested with a testcase that would fail before the change? Also, don't add SoB lines for kernel patches. You can add Acked-by for Sunil, but SoB only works for the chain that goes upstream. Joel -- War doesn't determine who's right; war determines who's left. Joel Becker Consulting Software Developer Oracle E-mail: joel.bec...@oracle.com Phone: (650) 506-8127 ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-devel
Re: [Ocfs2-devel] [PATCH 0/3] ocfs2: Tail zeroing fixes.
On Wed, Jul 07, 2010 at 04:16:04AM -0700, Joel Becker wrote: This is version 3 of the ocfs2 tail zeroing fixes. This version has some major changes. Tao correctly pointed out that we can have multiple extents past i_size due to unwritten extents. I've reworked the zeroing code to walk them all. Since I had to do that, and I had to handle refcounted extents, I end up fixing a refcount bug with non-sparse extentds. There are now three patches. The first changes our zeroing code to go page-by-page at the high level. The second actually changes the zeroing code. The final patch, limiting zeroing to the end of a write, is unchanged from v2. Version 4 of these fixes are now in the fixes and linux-next branches of ocfs2.git. I'm just appending the diff to this file rather than resending all the patches. They've passed the first round of heavy testing from our testers, and we're going to keep pounding on them as we get towards 2.6.35. Linus, I'm going to let all of the ocfs2 fixes stew in linux-next for a few days before I send the pull request. Figure it by the end of the week. Joel diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c index 9b3381a..356e976 100644 --- a/fs/ocfs2/aops.c +++ b/fs/ocfs2/aops.c @@ -1107,6 +1107,7 @@ static int ocfs2_grab_pages_for_write(struct address_space *mapping, int ret = 0, i; unsigned long start, target_index, end_index, index; struct inode *inode = mapping-host; + loff_t last_byte; target_index = user_pos PAGE_CACHE_SHIFT; @@ -1120,8 +1121,14 @@ static int ocfs2_grab_pages_for_write(struct address_space *mapping, if (new) { wc-w_num_pages = ocfs2_pages_per_cluster(inode-i_sb); start = ocfs2_align_clusters_to_page_index(inode-i_sb, cpos); - /* This is the index *past* the write */ - end_index = ((user_pos + user_len - 1) PAGE_CACHE_SHIFT) + 1; + /* +* We need the index *past* the last page we could possibly +* touch. This is the page past the end of the write or +* i_size, whichever is greater. +*/ + last_byte = max(user_pos + user_len, i_size_read(inode)); + BUG_ON(last_byte 1); + end_index = ((last_byte - 1) PAGE_CACHE_SHIFT) + 1; if ((start + wc-w_num_pages) end_index) wc-w_num_pages = end_index - start; } else { @@ -1619,6 +1626,18 @@ static int ocfs2_expand_nonsparse_inode(struct inode *inode, return ret; } +static int ocfs2_zero_tail(struct inode *inode, struct buffer_head *di_bh, + loff_t pos) +{ + int ret = 0; + + BUG_ON(!ocfs2_sparse_alloc(OCFS2_SB(inode-i_sb))); + if (pos i_size_read(inode)) + ret = ocfs2_zero_extend(inode, di_bh, pos); + + return ret; +} + int ocfs2_write_begin_nolock(struct address_space *mapping, loff_t pos, unsigned len, unsigned flags, struct page **pagep, void **fsdata, @@ -1655,7 +1674,7 @@ int ocfs2_write_begin_nolock(struct address_space *mapping, } if (ocfs2_sparse_alloc(osb)) - ret = ocfs2_zero_extend(inode, di_bh, pos); + ret = ocfs2_zero_tail(inode, di_bh, pos); else ret = ocfs2_expand_nonsparse_inode(inode, di_bh, pos, len, wc); diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c index 1fdc45a..ac15911 100644 --- a/fs/ocfs2/file.c +++ b/fs/ocfs2/file.c @@ -734,7 +734,7 @@ static handle_t *ocfs2_zero_start_ordered_transaction(struct inode *inode) handle_t *handle = NULL; int ret = 0; - if (ocfs2_should_order_data(inode)) + if (!ocfs2_should_order_data(inode)) goto out; handle = ocfs2_start_trans(osb, OCFS2_INODE_UPDATE_CREDITS); @@ -771,7 +771,7 @@ static int ocfs2_write_zero_page(struct inode *inode, u64 abs_from, unsigned zero_from, zero_to, block_start, block_end; BUG_ON(abs_from = abs_to); - BUG_ON(abs_to ((index + 1) PAGE_CACHE_SHIFT)); + BUG_ON(abs_to (((u64)index + 1) PAGE_CACHE_SHIFT)); BUG_ON(abs_from (inode-i_blkbits - 1)); page = grab_cache_page(mapping, index); @@ -793,8 +793,7 @@ static int ocfs2_write_zero_page(struct inode *inode, u64 abs_from, index, zero_from, zero_to); /* We know that zero_from is block aligned */ - for (block_start = zero_from; -(block_start PAGE_CACHE_SIZE) (block_start zero_to); + for (block_start = zero_from; block_start zero_to; block_start = block_end) { block_end = block_start + (1 inode-i_blkbits); @@ -966,6 +965,9 @@ int ocfs2_zero_extend(struct inode *inode, struct buffer_head *di_bh, struct super_block *sb = inode-i_sb;
Re: [Ocfs2-devel] [PATCH 2/2] OCFS2: Allow huge ( 16 TiB) volumes to mount
On 2010-07-11, at 11:04, Patrick J. LoPresti wrote: +/* Check to make sure entire volume is addressable on this system. + Requires osb_clusters_at_boot to be valid and for the journal to + have been initialized by ocfs2_journal_init(). */ +static int ocfs2_check_addressable(struct ocfs2_super *osb) +{ + /* Absolute addressability check (borrowed from ext4/super.c) */ + if ((max_block + (sector_t)(~0LL) (osb-sb-s_blocksize_bits - 9)) || + (max_block (pgoff_t)(~0LL) (PAGE_CACHE_SHIFT - + osb-sb-s_blocksize_bits))) { + mlog(ML_ERROR, Volume too large + to mount safely on this system); + status = -EFBIG; + goto out; + } This hunk of code is actually in several filesystems. It wouldn't be a bad idea to make it a library function that can be called by the filesystem to check the kernel page cache and block layer can handle these large filesystems. Cheers, Andreas ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-devel
Re: [Ocfs2-devel] [PATCH 2/2] OCFS2: Allow huge ( 16 TiB) volumes to mount
On Mon, Jul 12, 2010 at 5:21 PM, Andreas Dilger adil...@dilger.ca wrote: On 2010-07-11, at 11:04, Patrick J. LoPresti wrote: + /* Absolute addressability check (borrowed from ext4/super.c) */ + if ((max_block + (sector_t)(~0LL) (osb-sb-s_blocksize_bits - 9)) || + (max_block (pgoff_t)(~0LL) (PAGE_CACHE_SHIFT - + osb-sb-s_blocksize_bits))) { + mlog(ML_ERROR, Volume too large + to mount safely on this system); + status = -EFBIG; + goto out; + } This hunk of code is actually in several filesystems. It wouldn't be a bad idea to make it a library function that can be called by the filesystem to check the kernel page cache and block layer can handle these large filesystems. True, but some of them do it differently (e.g. see the #if switch in xfs_sb_validate_fsb_count). Tracking down all variants and changing them is a much larger task than my simple patch. Are you suggesting I need to do this before my patch is accepted at all? Or is this a refactoring that can happen later? - Pat ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-devel
Re: [Ocfs2-devel] [PATCH 2/2] OCFS2: Allow huge ( 16 TiB) volumes to mount
On Mon, Jul 12, 2010 at 6:25 PM, Dave Chinner da...@fromorbit.com wrote: The XFS code is different to the above because there is still a 16TB size limit on 32 bit systemsi (i.e. page cache address limits). IOWs, you can't just remove the above 16TB check unless you (i.e. OCFS2) handle 16TB block devices on 32 bit systems correctly... If you look at my patch, you will see that is precisely what it does. As the comments indicate, it uses the exact same check as ext4, which will correctly refuse to mount huge volumes on 32-bit systems. The XFS test appears to be the same thing written a little differently. Andreas is suggesting that somebody should factor out this check into a common library routine. That sounds like a fine idea, but it also sounds orthogonal to the (simple and useful) patch I am attempting to submit. - Pat ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-devel
[Ocfs2-devel] [PATCH v2] ocfs2: Don't duplicate page passes i_size during CoW.
Hi Joel, @@ -2932,12 +2932,20 @@ static int ocfs2_duplicate_clusters_by_page(handle_t *handle, offset = ((loff_t)cpos) OCFS2_SB(sb)-s_clustersize_bits; end = offset + (new_len OCFS2_SB(sb)-s_clustersize_bits); +last_page = i_size_read(context-inode) PAGE_CACHE_SHIFT; while (offset end) { Why trigger on both index and byte offset? Why not just adjust end? OK, here is the udpated one. Regards, Tao From b30377da2dff2b5823c6a4a66153aa035334a632 Mon Sep 17 00:00:00 2001 From: Tao Ma tao...@oracle.com Date: Tue, 13 Jul 2010 09:53:28 +0800 Subject: [PATCH v2] ocfs2: Don't duplicate page passes i_size during CoW. During CoW, actually all the pages after i_size contains garbage data, so don't read and duplicate them. Signed-off-by: Tao Ma tao...@oracle.com --- fs/ocfs2/refcounttree.c |7 +++ 1 files changed, 7 insertions(+), 0 deletions(-) diff --git a/fs/ocfs2/refcounttree.c b/fs/ocfs2/refcounttree.c index 1cf9cda..e4e85e6 100644 --- a/fs/ocfs2/refcounttree.c +++ b/fs/ocfs2/refcounttree.c @@ -2931,6 +2931,13 @@ static int ocfs2_duplicate_clusters_by_page(handle_t *handle, offset = ((loff_t)cpos) OCFS2_SB(sb)-s_clustersize_bits; end = offset + (new_len OCFS2_SB(sb)-s_clustersize_bits); + /* +* We only duplicate pages until we reach i_size. +* So trim 'end' to the boundary of that page. +*/ + if (end i_size_read(context-inode)) + end = ((i_size_read(context-inode) + PAGE_CACHE_SIZE - 1) +PAGE_CACHE_SHIFT) PAGE_CACHE_SHIFT; while (offset end) { page_index = offset PAGE_CACHE_SHIFT; -- 1.7.1.571.gba4d01 ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-devel
Re: [Ocfs2-devel] [PATCH 2/2] OCFS2: Allow huge ( 16 TiB) volumes to mount
On 2010-07-12, at 19:08, Patrick J. LoPresti wrote: On Mon, Jul 12, 2010 at 5:21 PM, Andreas Dilger adil...@dilger.ca wrote: On 2010-07-11, at 11:04, Patrick J. LoPresti wrote: + /* Absolute addressability check (borrowed from ext4/super.c) */ + if ((max_block + (sector_t)(~0LL) (osb-sb-s_blocksize_bits - 9)) || + (max_block (pgoff_t)(~0LL) (PAGE_CACHE_SHIFT - + osb-sb-s_blocksize_bits))) { + mlog(ML_ERROR, Volume too large + to mount safely on this system); + status = -EFBIG; + goto out; + } This hunk of code is actually in several filesystems. It wouldn't be a bad idea to make it a library function that can be called by the filesystem to check the kernel page cache and block layer can handle these large filesystems. True, but some of them do it differently (e.g. see the #if switch in xfs_sb_validate_fsb_count). Tracking down all variants and changing them is a much larger task than my simple patch. Are you suggesting I need to do this before my patch is accepted at all? Or is this a refactoring that can happen later? I'm just suggesting it should be done at some point. I thought it would be better to do it first, rather than add yet another copy of this code. That said, I hate to block useful fixes because of cleanup (and I have no control over OCFS2 anyway :-). However, I've found that once the fix is in people usually forget (or become too busy) to do the cleanup and it just lingers on unseen. Cheers, Andreas ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-devel
Re: [Ocfs2-devel] [PATCH 2/2] OCFS2: Allow huge ( 16 TiB) volumes to mount
On Mon, Jul 12, 2010 at 9:46 PM, Andreas Dilger adil...@dilger.ca wrote: On 2010-07-12, at 19:08, Patrick J. LoPresti wrote: Are you suggesting I need to do this before my patch is accepted at all? Or is this a refactoring that can happen later? I'm just suggesting it should be done at some point. I thought it would be better to do it first, rather than add yet another copy of this code. That said, I hate to block useful fixes because of cleanup (and I have no control over OCFS2 anyway :-). However, I've found that once the fix is in people usually forget (or become too busy) to do the cleanup and it just lingers on unseen. I hear you. I do not object to factoring out the basic addressability test and using it in my patch, leaving it for others -- like yourself :-) -- to modify other file systems to invoke it. Does that sound like a reasonable compromise? If so, where should the function live and what should it be called, do you think? - Pat ___ Ocfs2-devel mailing list Ocfs2-devel@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-devel