Re: [patch] VFS: extend /proc/mounts

2008-01-17 Thread Miklos Szeredi
  The alternative (and completely safe) solution is to add another file
  to proc.  Me no likey.
 
 Since we need a saner layout, I would strongly suggest exactly that.

I don't think there's all that much wrong with the current layout,
except the two dummy zeroes at the end.  Or does something else need
fixing in there?

  major:minor -- is the major minor number of the device hosting the 
  filesystem
 
 Bad description.  Value of st_dev for files on that filesystem, please -
 there might be no such thing as the device hosting the filesystem _and_
 the value here may bloody well be unrelated to device actually holding
 all data (for things like ext2meta, etc.).

Right.

  1) The mount is a shared mount.
  2) It is a peer mount of the mount with id 20
  3) It is also a slave mount of the master mount with the id 19
  4) The filesystem on the device with major/minor number 98:0 and subdirectory
  mnt/1/abc makes up the root directory of this mount.
  5) And finally the mount with id 16 is its parent.
 
 I'd suggest doing a new file that would *not* try to imitate /etc/mtab.
 Another thing is, how much of propagation information do we want to
 be exposed and what do we intend to do with it?

I think the scheme devised by Ram is basically right.  It shows the
relationships (slave, peer) and the ID of a master/peer mount.

What I changed is to always show a canonical peer, because I think
that is more useful in establishing relationships between mounts.  Is
this info sensitive?  I can't see why it would be.

  Note that the entire
 propagation tree is out of the question - it spans many namespaces and
 contains potentially sensitive information.  So we won't see all nodes.

With multiple namespaces, of course you are only allowed to see a part
of the tree, but you could have xterms for all of them and put
together the big picture from the pieces.

 What do we want to *do* with the information about propagation?

Just feedback about the state of the thing.  It's very annoying that
after setting up propagation, it's impossible to check the result.

Miklos
-
To unsubscribe from this list: send the line unsubscribe linux-fsdevel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch] VFS: extend /proc/mounts

2008-01-17 Thread Karel Zak
On Thu, Jan 17, 2008 at 09:36:11AM +0100, Miklos Szeredi wrote:
  I'd suggest doing a new file that would *not* try to imitate /etc/mtab.
  Another thing is, how much of propagation information do we want to
  be exposed and what do we intend to do with it?
 
 I think the scheme devised by Ram is basically right.  It shows the
 relationships (slave, peer) and the ID of a master/peer mount.

 Yes. It also shows the full relationship between source and
 destination for bind mounts.  Right now /proc/mounts is useless for this:

  # mount --bind /mnt/test /mnt/test2

  # cat /proc/mounts | grep test
  /dev/root /mnt/test2 ext3 rw,noatime,data=ordered 0 0


  What do we want to *do* with the information about propagation?
 
 Just feedback about the state of the thing.  It's very annoying that
 after setting up propagation, it's impossible to check the result.

 Exactly.

Karel

-- 
 Karel Zak  [EMAIL PROTECTED]


Re: [patch] VFS: extend /proc/mounts

2008-01-17 Thread Chuck Lever

On Jan 17, 2008, at 3:55 AM, Miklos Szeredi wrote:
Hey, I just found /proc/X/mountstats.  How does this fit into the
big picture?


It seems to show some counters for NFS mounts; no other filesystem
uses it.  The format looks rather less nice than /proc/X/mounts (why do
we need long English sentences under /proc?)



I introduced /proc/self/mountstats because we need a way for
non-block-device-based file systems to report I/O statistics.  Everything
else I tried was rejected, and apparently what we ended up with was
reviewed by only a handful of people, so no one else likes it or uses
it.


It can go away for all I care, as long as we retain some flexible  
mechanism for non-block-based file systems to report I/O stats.  As  
far as I am aware, there are only two user utilities that understand  
and parse this data, and I maintain both.


--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com


Re: [Patch] document ext3 requirements (was Re: [RFD] Incremental fsck)

2008-01-17 Thread Szabolcs Szakacsits

On Tue, 15 Jan 2008, Daniel Phillips wrote:

 Along with this effort, could you let me know if the world actually
 cares about online fsck?  Now we know how to do it I think, but is it
 worth the effort.

Most users seem to care deeply about having things just work. Here is why 
ntfs-3g also took the online fsck path some time ago.

NTFS support had a very bad reputation on Linux, so the new code was 
written with rigid sanity checks and extensive automated regression 
testing. One of the consequences is that we're detecting way too many 
inconsistencies left behind by the Windows and other NTFS drivers, by 
hardware faults, and by device drivers.

To better utilize the non-existent developer resources, the obvious move 
was to suggest the already existing Windows fsck (chkdsk) in such cases. 
Simple and safe, or so people like us, who never used Windows, would think. 

However, years of experience show that, depending on several factors, 
chkdsk may or may not start, may or may not report the real problems, 
may report bogus issues, may run for a long time or just forever, and it 
may even remove completely valid files. So one could consider a 
suggestion to run chkdsk an invitation to play Russian roulette.

Thankfully NTFS has some level of metadata redundancy with signatures and 
weak checksums, which make it possible to correct some common and obvious 
corruptions on the fly.

Similarly to ZFS, Windows Server 2008 also has self-healing NTFS:
http://technet2.microsoft.com/windowsserver2008/en/library/6f883d0d-3668-4e15-b7ad-4df0f6e6805d1033.mspx?mfr=true

Szaka

--
NTFS-3G:  http://ntfs-3g.org


Re: [Btrfs-devel] [ANNOUNCE] Btrfs v0.10 (online growing/shrinking, ext3 conversion, and more)

2008-01-17 Thread Chris mason
On Tuesday 15 January 2008, Chris Mason wrote:
 Hello everyone,

 Btrfs v0.10 is now available for download from:

 http://oss.oracle.com/projects/btrfs/

Well, it turns out this release had a few small problems:

* data=ordered deadlock on older kernels (including 2.6.23)
* Compile problems when ACLs were not enabled in the kernel

So, I've put v0.11 out there.  It fixes those two problems and will also 
compile on older (2.6.18) enterprise kernels.

v0.11 does not have any disk format changes.

-chris




Re: [Btrfs-devel] [ANNOUNCE] Btrfs v0.10 (online growing/shrinking, ext3 conversion, and more)

2008-01-17 Thread Daniel Phillips
On Jan 17, 2008 1:25 PM, Chris mason [EMAIL PROTECTED] wrote:
 So, I've put v0.11 out there.  It fixes those two problems and will also
 compile on older (2.6.18) enterprise kernels.

 v0.11 does not have any disk format changes.

Hi Chris,

First, massive congratulations for bringing this to fruition in such a
short time.

Now back to the regular carping: why even support older kernels?

Regards,

Daniel


Re: [Btrfs-devel] [ANNOUNCE] Btrfs v0.10 (online growing/shrinking, ext3 conversion, and more)

2008-01-17 Thread Chris mason
On Thursday 17 January 2008, Daniel Phillips wrote:
 On Jan 17, 2008 1:25 PM, Chris mason [EMAIL PROTECTED] wrote:
  So, I've put v0.11 out there.  It fixes those two problems and will also
  compile on older (2.6.18) enterprise kernels.
 
  v0.11 does not have any disk format changes.

 Hi Chris,

 First, massive congratulations for bringing this to fruition in such a
 short time.

 Now back to the regular carping: why even support older kernels?

The general answer is the backports are small and easy.  I don't test them 
heavily, and I don't go out of my way to make things work. 

But, they do make it easier for people to try out, and to figure out how to use 
all these new features to solve problems.  Small changes that enable more 
testers are always welcome.

In general, the core parts of the kernel that btrfs uses haven't had many 
interface changes since 2.6.18, so this isn't a huge deal.

-chris


[PATCH 1/4] [CIFS] Provides DFS shrinkable submounts functionality

2008-01-17 Thread Q (Igor Mammedov)
Christoph, thanks for your review.  Here is DFS patch 1/4, which I rewrote
taking your comments into account.

The patch still depends on patch 1/3, which is yet to be fixed.


Signed-off-by: Igor Mammedov [EMAIL PROTECTED]
---
 fs/cifs/Makefile   |2 +-
 fs/cifs/cifs_dfs_ref.c |  376 
 fs/cifs/cifs_dfs_ref.h |   28 
 fs/cifs/cifsfs.c   |3 +
 4 files changed, 408 insertions(+), 1 deletions(-)
 create mode 100644 fs/cifs/cifs_dfs_ref.c
 create mode 100644 fs/cifs/cifs_dfs_ref.h

diff --git a/fs/cifs/Makefile b/fs/cifs/Makefile
index 09898b8..6ba43fb 100644
--- a/fs/cifs/Makefile
+++ b/fs/cifs/Makefile
@@ -10,4 +10,4 @@ cifs-y := cifsfs.o cifssmb.o cifs_debug.o connect.o dir.o file.o inode.o \
 
 cifs-$(CONFIG_CIFS_UPCALL) += cifs_spnego.o
 
-cifs-$(CONFIG_CIFS_DFS_UPCALL) += dns_resolve.o
+cifs-$(CONFIG_CIFS_DFS_UPCALL) += dns_resolve.o cifs_dfs_ref.o
diff --git a/fs/cifs/cifs_dfs_ref.c b/fs/cifs/cifs_dfs_ref.c
new file mode 100644
index 000..740d99d
--- /dev/null
+++ b/fs/cifs/cifs_dfs_ref.c
@@ -0,0 +1,376 @@
+/*
+ *   Contains the CIFS DFS referral mounting routines used for handling
+ *   traversal via DFS junction point
+ *
+ *   Copyright (c) 2007 Igor Mammedov
+ *   Author(s): Igor Mammedov ([EMAIL PROTECTED])
+ *
+ *   This program is free software; you can redistribute it and/or
+ *   modify it under the terms of the GNU General Public License
+ *   as published by the Free Software Foundation; either version
+ *   2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/dcache.h>
+#include <linux/mount.h>
+#include <linux/namei.h>
+#include <linux/vfs.h>
+#include <linux/fs.h>
+#include "cifsglob.h"
+#include "cifsproto.h"
+#include "cifsfs.h"
+#include "dns_resolve.h"
+#include "cifs_dfs_ref.h"
+#include "cifs_debug.h"
+
+LIST_HEAD(cifs_dfs_automount_list);
+
+/*
+ * DFS functions
+ */
+
+void dfs_shrink_umount_helper(struct vfsmount *vfsmnt)
+{
+   mark_mounts_for_expiry(&cifs_dfs_automount_list);
+   mark_mounts_for_expiry(&cifs_dfs_automount_list);
+   shrink_submounts(vfsmnt, &cifs_dfs_automount_list);
+}
+
+/**
+ * cifs_get_share_name -   extracts share name from UNC
+ * @node_name: pointer to UNC string
+ *
+ * Extracts sharename from full UNC.
+ * i.e. strips from UNC trailing path that is not part of share
+ * name and fixes up a missing '\' at the beginning of the DFS node referral
+ * if necessary.
+ * Returns pointer to share name on success or NULL on error.
+ * Caller is responsible for freeing returned string.
+ */
+static char *cifs_get_share_name(const char *node_name)
+{
+   int len;
+   char *UNC;
+   char *pSep;
+
+   len = strlen(node_name);
+   UNC = kmalloc(len+2 /*for term null and additional \ if it's missed */,
+GFP_KERNEL);
+   if (!UNC)
+   return NULL;
+
+   /* get share name and server name */
+   if (node_name[1] != '\\') {
+   UNC[0] = '\\';
+   strncpy(UNC+1, node_name, len);
+   len++;
+   UNC[len] = 0;
+   } else {
+   strncpy(UNC, node_name, len);
+   UNC[len] = 0;
+   }
+
+   /* find server name end */
+   pSep = memchr(UNC+2, '\\', len-2);
+   if (!pSep) {
+   cERROR(1, ("%s: no server name end in node name: %s",
+   __FUNCTION__, node_name));
+   kfree(UNC);
+   return NULL;
+   }
+
+   /* find sharename end */
+   pSep++;
+   pSep = memchr(UNC+(pSep-UNC), '\\', len-(pSep-UNC));
+   if (!pSep) {
+   cERROR(1, ("%s:2 can't find share name in node name: %s",
+   __FUNCTION__, node_name));
+   kfree(UNC);
+   return NULL;
+   }
+   /* trim path up to sharename end;
+    * now we have share name in UNC */
+   *pSep = 0;
+
+   return UNC;
+}
+
+
+/**
+ * compose_mount_options   -   creates mount options for referral
+ * @sb_mountdata:  parent/root DFS mount options (template)
+ * @ref_unc:   referral server UNC
+ * @devname:   pointer for saving device name
+ *
+ * creates mount options for submount based on template options sb_mountdata,
+ * replacing the unc,ip,prefixpath options with ones we've got from ref_unc.
+ *
+ * Returns: pointer to new mount options or ERR_PTR.
+ * Caller is responsible for freeing the returned value if it is not an error.
+ */
+char *compose_mount_options(const char *sb_mountdata, const char *ref_unc,
+   char **devname)
+{
+   int rc;
+   char *mountdata;
+   int md_len;
+   char *tkn_e;
+   char *srvIP = NULL;
+   char sep = ',';
+   int off, noff;
+
+   if (sb_mountdata == NULL)
+   return ERR_PTR(-EINVAL);
+
+   *devname = cifs_get_share_name(ref_unc);
+   rc = dns_resolve_server_name_to_ip(*devname, &srvIP);
+   if (rc != 0) {
+   

Re: [Patch] document ext3 requirements (was Re: [RFD] Incremental fsck)

2008-01-17 Thread Pavel Machek
On Tue 2008-01-15 20:36:16, Chris Mason wrote:
 On Tue, 15 Jan 2008 20:24:27 -0500
 Daniel Phillips [EMAIL PROTECTED] wrote:
 
  On Jan 15, 2008 7:15 PM, Alan Cox [EMAIL PROTECTED] wrote:
Writeback cache on disk in iteself is not bad, it only gets bad
if the disk is not engineered to save all its dirty cache on
power loss, using the disk motor as a generator or alternatively
a small battery. It would be awfully nice to know which brands
fail here, if any, because writeback cache is a big performance
booster.
  
   AFAIK no drive saves the cache. The worst case cache flush for
   drives is several seconds with no retries and a couple of minutes
   if something really bad happens.
  
   This is why the kernel has some knowledge of barriers and uses them
   to issue flushes when needed.
  
  Indeed, you are right, which is supported by actual measurements:
  
  http://sr5tech.com/write_back_cache_experiments.htm
  
  Sorry for implying that anybody has engineered a drive that can do
  such a nice thing with writeback cache.
  
  The disk motor as a generator tale may not be purely folklore.  When
  an IDE drive is not in writeback mode, something special needs to be done
  to ensure the last write to media is not a scribble.
  
  A small UPS can make writeback mode actually reliable, provided the
  system is smart enough to take the drives out of writeback mode when
  the line power is off.
 
 We've had mount -o barrier=1 for ext3 for a while now, it makes
 writeback caching safe.  XFS has this on by default, as does reiserfs.

Maybe ext3 should do barriers by default? Having ext3 in "let's corrupt
data by default" mode seems like a bad idea.
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [Patch] document ext3 requirements (was Re: [RFD] Incremental fsck)

2008-01-17 Thread Daniel Phillips
On Jan 17, 2008 7:29 AM, Szabolcs Szakacsits [EMAIL PROTECTED] wrote:
 Similarly to ZFS, Windows Server 2008 also has self-healing NTFS:

I guess that is enough votes to justify going ahead and trying an
implementation of the reverse mapping ideas I posted.  But of course
more votes for this is better.  If online incremental fsck is
something people want, then please speak up here and that will very
definitely help make it happen.

On the walk-before-run principle, it would initially just be
filesystem checking, not repair.  But even this would help, by setting
per-group checked flags that offline fsck could use to do a much
quicker repair pass.  And it will let you know when a volume needs to
be taken offline without having to build in planned downtime just in
case, which already eats a bunch of nines.

Regards,

Daniel


Re: [RFC] Parallelize IO for e2fsck

2008-01-17 Thread David Chinner
On Wed, Jan 16, 2008 at 01:30:43PM -0800, Valerie Henson wrote:
 Hi y'all,
 
 This is a request for comments on the rewrite of the e2fsck IO
 parallelization patches I sent out a few months ago.  The mechanism is
 totally different.  Previously IO was parallelized by issuing IOs from
 multiple threads; now a single thread issues fadvise(WILLNEED) and
 then uses read() to complete the IO.

Interesting.

We ultimately rejected a similar patch to xfs_repair (pre-populating
the kernel block device cache) mainly because of performance issues
in low-memory conditions, and because it doesn't really enable you to
do anything particularly smart to optimise I/O patterns for larger,
high-performance RAID arrays.

The low memory problems were particularly bad; the readahead
thrashing caused a slowdown of 2-3x compared to the baseline and
often it was due to the repair process requiring all of memory
to cache stuff it would need later. IIRC, multi-terabyte ext3
filesystems have similar memory usage problems to XFS, so there's
a good chance that this patch will see the same sorts of issues.

 Single disk performance doesn't change, but elapsed time drops by
 about 50% on a big RAID-5 box.  Passes 1 and 2 are parallelized.  Pass
 5 is left as an exercise for the reader.

Promising results, though

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


Re: [RFC] Parallelize IO for e2fsck

2008-01-17 Thread Valerie Henson
On Jan 17, 2008 5:15 PM, David Chinner [EMAIL PROTECTED] wrote:
 On Wed, Jan 16, 2008 at 01:30:43PM -0800, Valerie Henson wrote:
  Hi y'all,
 
  This is a request for comments on the rewrite of the e2fsck IO
  parallelization patches I sent out a few months ago.  The mechanism is
  totally different.  Previously IO was parallelized by issuing IOs from
  multiple threads; now a single thread issues fadvise(WILLNEED) and
  then uses read() to complete the IO.

 Interesting.

 We ultimately rejected a similar patch to xfs_repair (pre-populating
 the kernel block device cache) mainly because of performance issues
 in low-memory conditions, and because it doesn't really enable you to
 do anything particularly smart to optimise I/O patterns for larger,
 high-performance RAID arrays.

 The low memory problems were particularly bad; the readahead
 thrashing caused a slowdown of 2-3x compared to the baseline and
 often it was due to the repair process requiring all of memory
 to cache stuff it would need later. IIRC, multi-terabyte ext3
 filesystems have similar memory usage problems to XFS, so there's
 a good chance that this patch will see the same sorts of issues.

That was one of my first concerns - how to avoid overflowing memory?
Whenever I screw it up on e2fsck, it does go, oh, 2 times slower due
to the minor detail of every single block being read from disk twice.
:)

I have a partial solution that sort of blindly manages the buffer
cache.  First, the user passes e2fsck a parameter saying how much
memory is available as buffer cache.  The readahead thread reads
things in and immediately throws them away so they are only in buffer
cache (no double-caching).  Then readahead and e2fsck work together so
that readahead only reads in new blocks when the main thread is done
with earlier blocks.  The already-used blocks get kicked out of buffer
cache to make room for the new ones.

What would be nice is to take into account the current total memory
usage of the whole fsck process and factor that in.  I don't think it
would be hard to add to the existing cache management framework.
Thoughts?

 Promising results, though

Thanks!  It's solving a rather simpler problem than XFS check/repair. :)

-VAL


[patch 4/6] xip: support non-struct page backed memory

2008-01-17 Thread npiggin
Convert XIP to support non-struct page backed memory, using VM_MIXEDMAP
for the user mappings.

This requires the get_xip_page API to be changed to an address-based one.
Improve the API layering a little bit too, while we're here.

(The kaddr->pfn conversion may not be quite right for all architectures or XIP
memory mappings, and cacheflushing may need to be added for some archs).

This scheme has been tested and works for Jared's work-in-progress filesystem,
with s390's xip, and with the new brd driver.  It is required in order to have
XIP filesystems on memory that isn't backed by struct page.

Signed-off-by: Nick Piggin [EMAIL PROTECTED]
Cc: Jared Hulbert [EMAIL PROTECTED]
Cc: Carsten Otte [EMAIL PROTECTED]
Cc: Martin Schwidefsky [EMAIL PROTECTED]
Cc: Heiko Carstens [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Cc: linux-fsdevel@vger.kernel.org
---
 fs/ext2/inode.c|2 
 fs/ext2/xip.c  |   36 -
 fs/ext2/xip.h  |8 +-
 fs/open.c  |2 
 include/linux/fs.h |3 
 mm/fadvise.c   |2 
 mm/filemap_xip.c   |  191 ++---
 mm/madvise.c   |2 
 8 files changed, 122 insertions(+), 124 deletions(-)

Index: linux-2.6/fs/ext2/inode.c
===
--- linux-2.6.orig/fs/ext2/inode.c
+++ linux-2.6/fs/ext2/inode.c
@@ -800,7 +800,7 @@ const struct address_space_operations ex
 
 const struct address_space_operations ext2_aops_xip = {
.bmap   = ext2_bmap,
-   .get_xip_page   = ext2_get_xip_page,
+   .get_xip_address= ext2_get_xip_address,
 };
 
 const struct address_space_operations ext2_nobh_aops = {
Index: linux-2.6/fs/ext2/xip.c
===
--- linux-2.6.orig/fs/ext2/xip.c
+++ linux-2.6/fs/ext2/xip.c
@@ -15,24 +15,25 @@
 #include "xip.h"
 
 static inline int
-__inode_direct_access(struct inode *inode, sector_t sector,
- unsigned long *data)
+__inode_direct_access(struct inode *inode, sector_t block, unsigned long *data)
 {
+   sector_t sector;
	BUG_ON(!inode->i_sb->s_bdev->bd_disk->fops->direct_access);
+
+   sector = block * (PAGE_SIZE / 512); /* ext2 block to bdev sector */
	return inode->i_sb->s_bdev->bd_disk->fops
-   ->direct_access(inode->i_sb->s_bdev,sector,data);
+   ->direct_access(inode->i_sb->s_bdev, sector, data);
 }
 
 static inline int
-__ext2_get_sector(struct inode *inode, sector_t offset, int create,
+__ext2_get_block(struct inode *inode, pgoff_t pgoff, int create,
   sector_t *result)
 {
struct buffer_head tmp;
int rc;
 
	memset(&tmp, 0, sizeof(struct buffer_head));
-   rc = ext2_get_block(inode, offset/ (PAGE_SIZE/512), &tmp,
-   create);
+   rc = ext2_get_block(inode, pgoff, &tmp, create);
*result = tmp.b_blocknr;
 
/* did we get a sparse block (hole in the file)? */
@@ -45,13 +46,12 @@ __ext2_get_sector(struct inode *inode, s
 }
 
 int
-ext2_clear_xip_target(struct inode *inode, int block)
+ext2_clear_xip_target(struct inode *inode, sector_t block)
 {
-   sector_t sector = block * (PAGE_SIZE/512);
unsigned long data;
int rc;
 
-   rc = __inode_direct_access(inode, sector, &data);
+   rc = __inode_direct_access(inode, block, &data);
if (!rc)
clear_page((void*)data);
return rc;
@@ -69,24 +69,24 @@ void ext2_xip_verify_sb(struct super_blo
}
 }
 
-struct page *
-ext2_get_xip_page(struct address_space *mapping, sector_t offset,
-  int create)
+void *
+ext2_get_xip_address(struct address_space *mapping, pgoff_t pgoff, int create)
 {
int rc;
unsigned long data;
-   sector_t sector;
+   sector_t block;
 
/* first, retrieve the sector number */
-   rc = __ext2_get_sector(mapping->host, offset, create, &sector);
+   rc = __ext2_get_block(mapping->host, pgoff, create, &block);
if (rc)
goto error;
 
/* retrieve address of the target data */
-   rc = __inode_direct_access
-   (mapping->host, sector * (PAGE_SIZE/512), &data);
-   if (!rc)
-   return virt_to_page(data);
+   rc = __inode_direct_access(mapping->host, block, &data);
+   if (rc)
+   goto error;
+
+   return (void *)data;
 
  error:
return ERR_PTR(rc);
Index: linux-2.6/fs/ext2/xip.h
===
--- linux-2.6.orig/fs/ext2/xip.h
+++ linux-2.6/fs/ext2/xip.h
@@ -7,19 +7,19 @@
 
 #ifdef CONFIG_EXT2_FS_XIP
 extern void ext2_xip_verify_sb (struct super_block *);
-extern int ext2_clear_xip_target (struct inode *, int);
+extern int ext2_clear_xip_target (struct inode *, sector_t);
 
 static inline int ext2_use_xip (struct super_block *sb)
 {
struct ext2_sb_info *sbi = EXT2_SB(sb);
	return (sbi->s_mount_opt