Re: [Lustre-discuss] Future of lustre 1.8.3+

Dardo D Kleiner - CONTRACTOR Wed, 19 May 2010 18:07:03 -0700

Kevin, et al -

I'm both personally and professionally encouraged to hear ClusterStor stand
up and publicly state its intent to support Lustre on SLES.  The million-dollar
question concerns *server* support, in particular for the 2.x
series.  As a token of my interest, as well as a testament to my limited
ability to maintain this in the long run, I submit the attached patches to
the recent 1.10.0.40 beta release.  They enable building the server on the
current SLES11 kernel (2.6.27.45-0.1-default), and I have confirmed that the
result runs.  One caveat is that quota support does not compile, and fixing
it appears to be a more difficult job than I can probably manage.  I most
certainly didn't run a full regression suite, but a straightforward
single-stream read/write appears to work fine.


Up to .38 it was mostly monkey work, but .40 introduced additional patches
to the RHEL ext4 implementation, which has diverged more substantially from
the one in current SLES11.  Perhaps SLES11SP1 will converge better...
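
For anyone skimming the attached ext4_data_in_dirent patch, here is a rough
user-space sketch of the record-length rule it introduces: a directory entry
may carry extension data after the file name, flagged in the high bits of
file_type.  The struct and function names (sketch_dirent, sketch_fill,
dirent_data_len, dirent_rec_len) are made up for illustration and are not
kernel APIs; only the arithmetic is modelled on the patch, and it assumes, as
the pointer arithmetic in the patch implies, that each extension's length
byte counts itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative constants mirroring the names used in the attached patch. */
#define FT_MASK      0x0f   /* low 4 bits of file_type: the file type itself */
#define DIRENT_LUFID 0x10   /* high-bit flag: extra data follows the name */
#define DIR_ROUND    3      /* records are padded to 4-byte boundaries */

/* Hypothetical user-space stand-in for struct ext4_dir_entry_2. */
struct sketch_dirent {
        uint32_t inode;
        uint16_t rec_len;
        uint8_t  name_len;
        uint8_t  file_type;
        unsigned char name[255 + 1 + 64]; /* name, NUL, then extension data */
};

/* Fill in an entry with a name and, optionally, one extension payload. */
static void sketch_fill(struct sketch_dirent *de, const char *name,
                        const unsigned char *payload, uint8_t payload_len)
{
        size_t n = strlen(name);

        assert(n <= 255 && payload_len <= 62);
        memset(de, 0, sizeof(*de));
        de->name_len = (uint8_t)n;
        de->file_type = 1;                 /* regular file */
        memcpy(de->name, name, n);         /* the NUL is already there */
        if (payload != NULL) {
                unsigned char *p = de->name + n + 1;

                de->file_type |= DIRENT_LUFID;
                p[0] = (unsigned char)(payload_len + 1); /* counts itself */
                memcpy(p + 1, payload, payload_len);
        }
}

/* Bytes used after the name: the NUL plus every flagged extension. */
static int dirent_data_len(const struct sketch_dirent *de)
{
        const unsigned char *len = de->name + de->name_len + 1; /* skip NUL */
        uint8_t flags = (uint8_t)((de->file_type & ~FT_MASK) >> 4);
        int dlen = 0;

        while (flags) {
                if (flags & 1) {
                        assert(*len > 0);  /* the length byte counts itself */
                        dlen += *len + (dlen == 0); /* first one adds the NUL */
                        len += *len;       /* step to the next extension */
                }
                flags >>= 1;
        }
        return dlen;
}

/* Minimal record size: 8-byte header + name + extra data, rounded up to 4. */
static int dirent_rec_len(const struct sketch_dirent *de)
{
        return (de->name_len + dirent_data_len(de) + 8 + DIR_ROUND) & ~DIR_ROUND;
}
```

With a 3-byte name and no extension this gives a 12-byte record; adding a
16-byte payload (17 with its length byte, 18 with the NUL) gives 32 bytes.
That is why the patch passes the whole entry to EXT4_DIR_REC_LEN() instead of
just name_len.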

There's perhaps a Bugzilla report where this is better posted, and tomorrow
I'll look around a bit more for that, but I felt like getting it out there
asap.  This has been a topic of much interest in my community and I'm
starting to feel a bit alone in my desire to keep SLE across the board in
my environment.  I've invested quite a bit of time and effort there and
though many are fine with black box appliances, in our research environment
I prefer to have more transparency.

Sincerely - Dardo

On 5/19/2010 11:21 AM, Kevin Canady wrote:

Quick "Public Service Announcement"

ClusterStor is and will be providing support services for SLES on both 1.8x and 
2.x releases.  If anyone would like to receive additional information please 
contact me at [email protected]  or 415.505.7701

Best regards,
Kevin

P. Kevin Canady
Vice President,
ClusterStor Inc.
415.505.7701
[email protected]

On May 19, 2010, at 8:01 AM, Andreas Dilger wrote:

I've used a SLES kernel on an FC install for a long time on my home
system. With newer distros there are also fewer changes to the base
kernel, so there shouldn't be as much trouble to use e.g. the SLES 11
SP1 kernel (2.6.32) when it is released.

Cheers, Andreas

On 2010-05-19, at 6:01, Heiko Schröter <[email protected]> wrote:

On Wednesday, 19 May 2010, at 10:33:04, you wrote:

On 2010-05-19, at 01:40, Heiko Schröter wrote:

we would like to know which way lustre is heading.

From the s/w repository we see that only redhat and suse distros
seem to be supported.


Is it the official policy of lustre development to stick to
(only) these two distros?


On the client side, we will support the main distros that our
customers are using, namely RHEL/OEL/CentOS 5.x (and 6.x after
release), and SLES 10/11.  We make a best-effort attempt to have
the client work with all client kernels, but since our resources
are limited we cannot test kernels other than the supported ones.
I don't see any huge demand for e.g. an officially-supported Ubuntu
client kernel, but there has long been an unofficial Debian lustre
package.

On the server side, we will continue to support RHEL5.x and
SLES10/11 for the Lustre 1.8 release, and RHEL 5.x (6.x is being
worked on) for the Lustre 2.x release.  Since maintaining kernel
patches for other kernels is a lot of work, we do not attempt to
provide patches for other than official kernels.  However, there
have in the past been ports of the kernel patches to other kernels
by external contributors (e.g. FC11, FC12, etc) and this will
hopefully continue in the future.


The server side is the more critical part as we are using
gentoo+lustre running a vanilla kernel 2.6.22.19 with the lustre patches
version 1.6.6.
As far as we are concerned it would be nice to have the patches for
the "vanilla-kernels" in 1.8.3+. This would be just fine.

On the other hand, if maintenance is the key problem on your side,
what would be a major argument against using a patched sles/rhel kernel on
a lustre server not running the sles/rhel distro?
I know a lot of things can happen, but do these rhel/sles patches
break some key features of the kernel that would only work under
that specific distro?
I've positively tested a lustre client with a sles-patched kernel on
a gentoo distro, but I'm a bit nervous about testing it on our live
lustre server system.

If not, then the sun src patches are still missing in the lustre
AND e2fsprogs branches.


I'm not sure what you mean.  The e2fsprogs patches have always been
in a separate repository from the core Lustre code, and all of the
Lustre/ldiskfs kernel patches are in the Git repository.


I know, but the patches for 1.41.10 are missing in that repo, i.e.
the equivalent of "e2fsprogs-1.41.6.sun1-patches.tgz".

Thanks very much for your help.
Regards
Heiko

_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss



--- lustre/configure    2010-04-11 22:05:30.000000000 +0000
+++ lustre.sles11/configure     2010-05-12 16:14:46.618989759 +0000
@@ -16368,7 +16368,6 @@
 else
        case $LINUXRELEASE in
        # ext4 was in 2.6.22-2.6.26 but not stable enough to use
-       2.6.2[0-9]*) enable_ext4='no' ;;
        *)  if test -f $LINUX/fs/ext4/ext4.h ; then
 enable_ext4='yes'
 else
--- lustre/lustre/kernel_patches/series/2.6-sles11.series     2010-01-11 03:29:50.000000000 +0000
+++ lustre.sles11/lustre/kernel_patches/series/2.6-sles11.series      2010-05-11 17:39:50.018989337 +0000
@@ -1,10 +1,10 @@
+lustre_version.patch
 vfs_races-2.6.22-vanilla.patch
 iopen-misc-2.6.22-vanilla.patch
 export_symbols-2.6.22-vanilla.patch 
 dev_read_only-2.6.27-vanilla.patch 
 export-2.6.27-vanilla.patch 
 sd_iostats-2.6.27-vanilla.patch
-blkdev_tunables-2.6-sles11.patch
 md-mmp-unplug-dev-sles11.patch
 quota-support-64-bit-quota-format.patch
 jbd2-jcberr-2.6-sles11.patch
--- lustre/lustre/kernel_patches/patches/dev_read_only-2.6.27-vanilla.patch     2010-03-08 19:43:41.281514629 +0000
+++ lustre.sles11/lustre/kernel_patches/patches/dev_read_only-2.6.27-vanilla.patch  2010-03-08 19:45:12.750071753 +0000
@@ -28,7 +28,7 @@
 +                              bio->bi_idx, bio->bi_size,
 +                              atomic_read(&bio->bi_cnt), bio->bi_private);
 +                       set_bit(BIO_RDONLY, &bio->bi_flags);
-+                       bio_endio(bio, bio->bi_size, 0);
++                       bio_endio(bio, 0);
 +                       clear_bit(BIO_RDONLY, &bio->bi_flags);
 +                       break;
 +               }
--- lustre/lustre/ptlrpc/gss/gss_krb5_mech.c  2010-01-26 19:31:27.000000000 +0000
+++ lustre.sles11/lustre/ptlrpc/gss/gss_krb5_mech.c   2010-01-27 13:44:36.951151277 +0000
@@ -533,9 +533,7 @@
 static
 void buf_to_sg(struct scatterlist *sg, void *ptr, int len)
 {
-        sg->page = virt_to_page(ptr);
-        sg->offset = offset_in_page(ptr);
-        sg->length = len;
+       sg_set_page(sg, virt_to_page(ptr), len, offset_in_page(ptr));
 }
 
 static
@@ -612,9 +610,7 @@
         for (i = 0; i < iovcnt; i++) {
                 if (iovs[i].kiov_len == 0)
                         continue;
-                sg[0].page = iovs[i].kiov_page;
-                sg[0].offset = iovs[i].kiov_offset;
-                sg[0].length = iovs[i].kiov_len;
+               sg_set_page(&sg[0], iovs[i].kiov_page, iovs[i].kiov_len, iovs[i].kiov_offset);
                 ll_crypto_hash_update(&desc, sg, iovs[i].kiov_len);
         }
 
@@ -651,9 +647,7 @@
         for (i = 0; i < iovcnt; i++) {
                 if (iovs[i].kiov_len == 0)
                         continue;
-                sg[0].page = iovs[i].kiov_page;
-                sg[0].offset = iovs[i].kiov_offset;
-                sg[0].length = iovs[i].kiov_len;
+               sg_set_page(&sg[0], iovs[i].kiov_page, iovs[i].kiov_len, iovs[i].kiov_offset);
                 crypto_hmac_update(tfm, sg, 1);
         }
 
@@ -696,9 +690,7 @@
         for (i = 0; i < iovcnt; i++) {
                 if (iovs[i].kiov_len == 0)
                         continue;
-                sg[0].page = iovs[i].kiov_page;
-                sg[0].offset = iovs[i].kiov_offset;
-                sg[0].length = iovs[i].kiov_len;
+               sg_set_page(&sg[0], iovs[i].kiov_page, iovs[i].kiov_len, iovs[i].kiov_offset);
                 ll_crypto_hash_update(&desc, sg, iovs[i].kiov_len);
         }
 
@@ -1023,17 +1015,14 @@
 
         /* encrypt clear pages */
         for (i = 0; i < desc->bd_iov_count; i++) {
-                src.page = desc->bd_iov[i].kiov_page;
-                src.offset = desc->bd_iov[i].kiov_offset;
-                src.length = (desc->bd_iov[i].kiov_len + blocksize - 1) &
-                             (~(blocksize - 1));
+               sg_set_page(&src, desc->bd_iov[i].kiov_page,
+                       (desc->bd_iov[i].kiov_len + blocksize - 1) & (~(blocksize - 1)),
+                       desc->bd_iov[i].kiov_offset);
 
                 if (adj_nob)
                         nob += src.length;
 
-                dst.page = desc->bd_enc_iov[i].kiov_page;
-                dst.offset = src.offset;
-                dst.length = src.length;
+               sg_set_page(&dst, desc->bd_enc_iov[i].kiov_page, src.length, src.offset);
 
                 desc->bd_enc_iov[i].kiov_offset = dst.offset;
                 desc->bd_enc_iov[i].kiov_len = dst.length;
@@ -1150,13 +1139,13 @@
                 if (desc->bd_enc_iov[i].kiov_len == 0)
                         continue;
 
-                src.page = desc->bd_enc_iov[i].kiov_page;
-                src.offset = desc->bd_enc_iov[i].kiov_offset;
-                src.length = desc->bd_enc_iov[i].kiov_len;
+               sg_set_page(&src, desc->bd_enc_iov[i].kiov_page,
+                       desc->bd_enc_iov[i].kiov_len,
+                       desc->bd_enc_iov[i].kiov_offset);
 
                 dst = src;
                 if (desc->bd_iov[i].kiov_len % blocksize == 0)
-                        dst.page = desc->bd_iov[i].kiov_page;
+                       sg_set_page(&dst, desc->bd_iov[i].kiov_page, dst.length, dst.offset);
 
                 rc = ll_crypto_blkcipher_decrypt_iv(&ciph_desc, &dst, &src,
                                                     src.length);
--- lustre/lustre/mds/handler.c       2010-01-26 19:31:26.000000000 +0000
+++ lustre.sles11/lustre/mds/handler.c        2010-01-27 13:45:14.215224004 +0000
@@ -368,7 +368,7 @@
         mds_init_ctxt(obd, mnt);
 
         push_ctxt(&saved, &obd->obd_lvfs_ctxt, NULL);
-        dentry = simple_mkdir(current->fs->pwd, mnt, "OBJECTS", 0777, 1);
+        dentry = simple_mkdir(current->fs->pwd.dentry, mnt, "OBJECTS", 0777, 1);
         if (IS_ERR(dentry)) {
                 rc = PTR_ERR(dentry);
                 CERROR("cannot create OBJECTS directory: rc = %d\n", rc);
@@ -376,7 +376,7 @@
         }
         mds->mds_objects_dir = dentry;
 
-        dentry = lookup_one_len("__iopen__", current->fs->pwd,
+        dentry = lookup_one_len("__iopen__", current->fs->pwd.dentry,
                                 strlen("__iopen__"));
         if (IS_ERR(dentry)) {
                 rc = PTR_ERR(dentry);
@@ -449,7 +449,7 @@
         }
 
         dput(mds->mds_fid_de);
-        LL_DQUOT_OFF(obd->u.obt.obt_sb);
+        LL_DQUOT_OFF(obd->u.obt.obt_sb, 0);
         shrink_dcache_sb(mds->mds_obt.obt_sb);
         fsfilt_put_ops(obd->obd_fsops);
 
--- lustre/lustre/mgs/mgs_fs.c        2010-01-26 19:31:26.000000000 +0000
+++ lustre.sles11/lustre/mgs/mgs_fs.c 2010-01-27 13:45:30.367418797 +0000
@@ -197,7 +197,7 @@
         push_ctxt(&saved, &obd->obd_lvfs_ctxt, NULL);
 
         /* Setup the configs dir */
-        dentry = simple_mkdir(current->fs->pwd, mnt, MOUNT_CONFIGS_DIR, 0777, 1);
+        dentry = simple_mkdir(current->fs->pwd.dentry, mnt, MOUNT_CONFIGS_DIR, 0777, 1);
         if (IS_ERR(dentry)) {
                 rc = PTR_ERR(dentry);
                 CERROR("cannot create %s directory: rc = %d\n",
@@ -208,7 +208,7 @@
 
         /* Need the iopen dir for fid2dentry, required by
            LLOG_ORIGIN_HANDLE_READ_HEADER */
-        dentry = lookup_one_len("__iopen__", current->fs->pwd,
+        dentry = lookup_one_len("__iopen__", current->fs->pwd.dentry,
                                 strlen("__iopen__"));
         if (IS_ERR(dentry)) {
                 rc = PTR_ERR(dentry);
--- lustre/lustre/obdfilter/filter.c  2010-01-26 19:31:27.000000000 +0000
+++ lustre.sles11/lustre/obdfilter/filter.c   2010-01-27 13:46:25.351172449 +0000
@@ -1227,7 +1227,7 @@
         loff_t off = 0;
         ENTRY;
 
-        O_dentry = simple_mkdir(current->fs->pwd, filter->fo_vfsmnt,
+        O_dentry = simple_mkdir(current->fs->pwd.dentry, filter->fo_vfsmnt,
                                 "O", 0700, 1);
         CDEBUG(D_INODE, "got/created O: %p\n", O_dentry);
         if (IS_ERR(O_dentry)) {
@@ -1591,7 +1591,7 @@
         if (/*!dentry->d_inode ||*/dentry->d_parent->d_inode != dir)
                 GOTO(out, rc = -ENOENT);
 
-        rc = ll_permission(dir, MAY_WRITE | MAY_EXEC, NULL);
+        rc = ll_permission(dir, MAY_WRITE | MAY_EXEC);
         if (rc)
                 GOTO(out, rc);
 
@@ -1991,7 +1991,7 @@
         __u8 *uuid_ptr;
         char *str, *label;
         char ns_name[48];
-        request_queue_t *q;
+        struct request_queue *q;
         int rc, i;
         ENTRY;
 
@@ -2635,7 +2635,7 @@
 
         filter_post(obd);
 
-        LL_DQUOT_OFF(obd->u.obt.obt_sb);
+        LL_DQUOT_OFF(obd->u.obt.obt_sb, 0);
         shrink_dcache_sb(obd->u.obt.obt_sb);
 
         server_put_mount(obd->obd_name, filter->fo_vfsmnt);
--- lustre/lustre/obdfilter/filter_io_26.c    2010-01-26 19:31:27.000000000 +0000
+++ lustre.sles11/lustre/obdfilter/filter_io_26.c     2010-01-27 13:49:27.463243240 +0000
@@ -127,7 +127,7 @@
                 cfs_waitq_signal(&iobuf->dr_wait);
 }
 
-static int dio_complete_routine(struct bio *bio, unsigned int done, int error)
+static void dio_complete_routine(struct bio *bio, int error)
 {
         struct filter_iobuf *iobuf = bio->bi_private;
         struct bio_vec *bvl;
@@ -139,19 +139,16 @@
 
                 CWARN("Write to readonly device %s (%#x) bi_flags: %lx, "
                       "bi_vcnt: %d, bi_idx: %d, bi->size: %d, bi_cnt: %d, "
-                      "bi_private: %p, done: %u, error: %d\n",
+                      "bi_private: %p, error: %d\n",
                       bdev->bd_disk ? bdev->bd_disk->disk_name : "",
                       bdev->bd_dev, bio->bi_flags, bio->bi_vcnt, bio->bi_idx,
                       bio->bi_size, atomic_read(&bio->bi_cnt), bio->bi_private,
-                      done, error);
+                      error);
         }
 
         /* CAVEAT EMPTOR: possibly in IRQ context
          * DO NOT record procfs stats here!!! */
 
-        if (bio->bi_size)                       /* Not complete */
-                return 1;
-
         if (unlikely(iobuf == NULL)) {
                 CERROR("***** bio->bi_private is NULL!  This should never "
                        "happen.  Normally, I would crash here, but instead I "
@@ -167,7 +164,7 @@
                        bio->bi_rw, bio->bi_vcnt, bio->bi_idx, bio->bi_size,
                        bio->bi_end_io, cfs_atomic_read(&bio->bi_cnt),
                        bio->bi_private);
-                return 0;
+                return;
         }
 
         /* the check is outside of the cycle for performance reason -bzzz */
@@ -198,7 +195,6 @@
          * deadlocking the OST.  The bios are now released as soon as complete
          * so the pool cannot be exhausted while IOs are competing. bug 10076 */
         bio_put(bio);
-        return 0;
 }
 
 static int can_be_merged(struct bio *bio, sector_t sector)
@@ -369,19 +365,17 @@
                                 continue;       /* added this frag OK */
 
                         if (bio != NULL) {
-                                request_queue_t *q =
+                                struct request_queue *q =
                                         bdev_get_queue(bio->bi_bdev);
 
                                 /* Dang! I have to fragment this I/O */
                                 CDEBUG(D_INODE, "bio++ sz %d vcnt %d(%d) "
-                                       "sectors %d(%d) psg %d(%d) hsg %d(%d)\n",
+                                       "sectors %d(%d) psg %d(%d)\n",
                                        bio->bi_size,
                                        bio->bi_vcnt, bio->bi_max_vecs,
                                        bio->bi_size >> 9, q->max_sectors,
                                        bio_phys_segments(q, bio),
-                                       q->max_phys_segments,
-                                       bio_hw_segments(q, bio),
-                                       q->max_hw_segments);
+                                       q->max_phys_segments);
 
                                 record_start_io(iobuf, rw, bio->bi_size, exp);
                                 rc = fsfilt_send_bio(rw, obd, inode, bio);
--- lustre/lustre/include/linux/lustre_compat25.h     2010-01-26 19:31:24.000000000 +0000
+++ lustre.sles11/lustre/include/linux/lustre_compat25.h      2010-01-27 13:49:52.439182961 +0000
@@ -178,7 +178,7 @@
 
 #define LTIME_S(time)                   (time.tv_sec)
 #define ll_path_lookup                  path_lookup
-#define ll_permission(inode,mask,nd)    permission(inode,mask,nd)
+#define ll_permission(inode,mask)       inode_permission(inode,mask)
 
 #define ll_pgcache_lock(mapping)          cfs_spin_lock(&mapping->page_lock)
 #define ll_pgcache_unlock(mapping)        cfs_spin_unlock(&mapping->page_lock)
--- lustre/lustre/osd/osd_handler.c   2010-01-26 19:31:27.000000000 +0000
+++ lustre.sles11/lustre/osd/osd_handler.c    2010-01-27 13:50:58.239292486 +0000
@@ -2653,7 +2655,7 @@
                                       (char *)key, strlen((char *)key));
 
         cfs_down_write(&obj->oo_ext_idx_sem);
-        bh = ll_ldiskfs_find_entry(dir, dentry, &de);
+        bh = ldiskfs_find_entry(dir, &dentry->d_name, &de);
         if (bh) {
                 struct osd_thread_info *oti = osd_oti_get(env);
                 struct timespec *ctime = &oti->oti_time;
@@ -2955,7 +2957,7 @@
                                       (char *)key, strlen((char *)key));
 
         cfs_down_read(&obj->oo_ext_idx_sem);
-        bh = ll_ldiskfs_find_entry(dir, dentry, &de);
+        bh = ldiskfs_find_entry(dir, &dentry->d_name, &de);
         if (bh) {
                 ino = le32_to_cpu(de->inode);
                 rc = osd_get_fid_from_dentry(de, rec);
--- lustre/lustre/mdd/mdd_internal.h  2010-01-26 19:31:26.000000000 +0000
+++ lustre.sles11/lustre/mdd/mdd_internal.h   2010-01-27 13:51:10.063156866 +0000
@@ -41,7 +41,7 @@
 #ifndef _MDD_INTERNAL_H
 #define _MDD_INTERNAL_H
 
-#include <asm/semaphore.h>
+#include <linux/semaphore.h>
 
 #include <lustre_acl.h>
 #include <lustre_eacl.h>
--- lustre/lustre/ptlrpc/gss/gss_keyring.c    2010-01-26 19:31:27.000000000 +0000
+++ lustre.sles11/lustre/ptlrpc/gss/gss_keyring.c     2010-01-27 13:51:21.463200196 +0000
@@ -50,7 +50,7 @@
 #include <linux/fs.h>
 #include <linux/random.h>
 #include <linux/crypto.h>
-#include <linux/key.h>
+#include <linux/key-type.h>
 #include <linux/keyctl.h>
 #include <linux/mutex.h>
 #include <asm/atomic.h>
--- lustre/lustre/utils/gss/context_mit.c     2010-01-12 16:21:35.000000000 +0000
+++ lustre.sles11/lustre/utils/gss/context_mit.c      2010-01-27 13:53:05.023162106 +0000
@@ -182,7 +182,11 @@
 extern void krb5int_enc_arcfour;
 extern void krb5int_enc_aes128;
 extern void krb5int_enc_aes256;
-extern int krb5_derive_key();
+struct krb5_enc_provider;
+extern krb5_error_code krb5_derive_key(const struct krb5_enc_provider *enc,
+                                       const krb5_keyblock *inkey,
+                                       krb5_keyblock *outkey,
+                                       const krb5_data *in_constant);
 
 /*
  * XXX Hack alert! XXX Do NOT submit upstream!
--- lustre/lustre/utils/gss/context_lucid.c   2010-01-12 16:21:35.000000000 +0000
+++ lustre.sles11/lustre/utils/gss/context_lucid.c    2010-01-27 13:53:36.187151804 +0000
@@ -208,7 +208,11 @@
 extern void krb5int_enc_des3;
 extern void krb5int_enc_aes128;
 extern void krb5int_enc_aes256;
-extern int krb5_derive_key();
+struct krb5_enc_provider;
+extern krb5_error_code krb5_derive_key(const struct krb5_enc_provider *enc,
+                                       const krb5_keyblock *inkey,
+                                       krb5_keyblock *outkey,
+                                       const krb5_data *in_constant);
 
 static void
 key_lucid_to_krb5(const gss_krb5_lucid_key_t *lin, krb5_keyblock *kout)
--- lustre/ldiskfs/kernel_patches/patches/ext4_data_in_dirent-sles11.patch      1970-01-01 00:00:00.000000000 +0000
+++ lustre.sles11/ldiskfs/kernel_patches/patches/ext4_data_in_dirent-sles11.patch       2010-05-12 16:04:39.370990262 +0000
@@ -0,0 +1,492 @@
+Files a/fs/ext4/.namei.c.swp and b/fs/ext4/.namei.c.swp differ
+diff -u -r -N a/fs/ext4/dir.c b/fs/ext4/dir.c
+--- a/fs/ext4/dir.c    2010-05-12 15:43:57.931156137 +0000
++++ b/fs/ext4/dir.c    2010-05-12 15:46:03.146989458 +0000
+@@ -53,11 +53,18 @@
+ 
+ static unsigned char get_dtype(struct super_block *sb, int filetype)
+ {
++      int fl_index = filetype & EXT4_FT_MASK;
++
+       if (!EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_FILETYPE) ||
+-          (filetype >= EXT4_FT_MAX))
++          (fl_index >= EXT4_FT_MAX))
+               return DT_UNKNOWN;
+ 
+-      return (ext4_filetype_table[filetype]);
++      if (!EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_DIRDATA))
++              return (ext4_filetype_table[fl_index]);
++
++      return (ext4_filetype_table[fl_index]) |
++              (filetype & EXT4_DIRENT_LUFID);
++
+ }
+ 
+ 
+@@ -69,11 +76,11 @@
+       const char *error_msg = NULL;
+       const int rlen = ext4_rec_len_from_disk(de->rec_len);
+ 
+-      if (rlen < EXT4_DIR_REC_LEN(1))
++      if (rlen < __EXT4_DIR_REC_LEN(1))
+               error_msg = "rec_len is smaller than minimal";
+       else if (rlen % 4 != 0)
+               error_msg = "rec_len % 4 != 0";
+-      else if (rlen < EXT4_DIR_REC_LEN(de->name_len))
++      else if (rlen < EXT4_DIR_REC_LEN(de))
+               error_msg = "rec_len is too small for name_len";
+       else if (((char *) de - bh->b_data) + rlen > dir->i_sb->s_blocksize)
+               error_msg = "directory entry across blocks";
+@@ -179,7 +186,7 @@
+                                * failure will be detected in the
+                                * dirent test below. */
+                               if (ext4_rec_len_from_disk(de->rec_len)
+-                                              < EXT4_DIR_REC_LEN(1))
++                                              < __EXT4_DIR_REC_LEN(1))
+                                       break;
+                               i += ext4_rec_len_from_disk(de->rec_len);
+                       }
+@@ -339,12 +346,17 @@
+       struct fname *fname, *new_fn;
+       struct dir_private_info *info;
+       int len;
++      int extra_data = 1;
+ 
+       info = (struct dir_private_info *) dir_file->private_data;
+       p = &info->root.rb_node;
+ 
+       /* Create and allocate the fname structure */
+-      len = sizeof(struct fname) + dirent->name_len + 1;
++      if (dirent->file_type & EXT4_DIRENT_LUFID)
++              extra_data = ext4_get_dirent_data_len(dirent);
++
++      len = sizeof(struct fname) + dirent->name_len + extra_data;
++
+       new_fn = kzalloc(len, GFP_KERNEL);
+       if (!new_fn)
+               return -ENOMEM;
+@@ -353,7 +365,7 @@
+       new_fn->inode = le32_to_cpu(dirent->inode);
+       new_fn->name_len = dirent->name_len;
+       new_fn->file_type = dirent->file_type;
+-      memcpy(new_fn->name, dirent->name, dirent->name_len);
++      memcpy(new_fn->name, dirent->name, dirent->name_len + extra_data);
+       new_fn->name[dirent->name_len] = 0;
+ 
+       while (*p) {
+diff -u -r -N a/fs/ext4/ext4.h b/fs/ext4/ext4.h
+--- a/fs/ext4/ext4.h   2010-05-12 15:44:18.670989821 +0000
++++ b/fs/ext4/ext4.h   2010-05-12 15:48:25.203156936 +0000
+@@ -778,6 +778,7 @@
+ #define EXT4_FEATURE_INCOMPAT_64BIT           0x0080
+ #define EXT4_FEATURE_INCOMPAT_MMP               0x0100
+ #define EXT4_FEATURE_INCOMPAT_FLEX_BG         0x0200
++#define EXT4_FEATURE_INCOMPAT_DIRDATA         0x1000
+ 
+ #define EXT4_FEATURE_COMPAT_SUPP      EXT2_FEATURE_COMPAT_EXT_ATTR
+ #define EXT4_FEATURE_INCOMPAT_SUPP    (EXT4_FEATURE_INCOMPAT_FILETYPE| \
+@@ -786,7 +787,9 @@
+                                        EXT4_FEATURE_INCOMPAT_EXTENTS| \
+                                        EXT4_FEATURE_INCOMPAT_64BIT| \
+                                        EXT4_FEATURE_INCOMPAT_FLEX_BG| \
+-                                       EXT4_FEATURE_INCOMPAT_MMP)
++                                       EXT4_FEATURE_INCOMPAT_MMP| \
++                                       EXT4_FEATURE_INCOMPAT_DIRDATA)
++
+ #define EXT4_FEATURE_RO_COMPAT_SUPP   (EXT4_FEATURE_RO_COMPAT_SPARSE_SUPER| \
+                                        EXT4_FEATURE_RO_COMPAT_LARGE_FILE| \
+                                        EXT4_FEATURE_RO_COMPAT_GDT_CSUM| \
+@@ -856,6 +859,43 @@
+ #define EXT4_FT_SYMLINK               7
+ 
+ #define EXT4_FT_MAX           8
++#define EXT4_FT_MASK          0xf
++
++#if EXT4_FT_MAX > EXT4_FT_MASK
++#error "conflicting EXT4_FT_MAX and EXT4_FT_MASK"
++#endif
++
++/*
++ * d_type has 4 unused bits, so it can hold four types data. these different
++ * type of data (e.g. lustre data, high 32 bits of 64-bit inode number) can be
++ * stored, in flag order, after file-name in ext4 dirent.
++*/
++/*
++ * this flag is added to d_type if ext4 dirent has extra data after
++ * filename. this data length is variable and length is stored in first byte
++ * of data. data start after filename NUL byte.
++ * This is used by Lustre FS.
++  */
++#define EXT4_DIRENT_LUFID             0x10
++
++#define EXT4_LUFID_MAGIC    0xAD200907UL
++struct ext4_dentry_param {
++      __u32  edp_magic;       /* EXT4_LUFID_MAGIC */
++      char   edp_len;         /* size of edp_data in bytes */
++      char   edp_data[0];     /* packed array of data */
++} __attribute__((packed));
++
++static inline unsigned char *ext4_dentry_get_data(struct super_block *sb,
++              struct ext4_dentry_param* p)
++
++{
++      if (!EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_DIRDATA))
++              return NULL;
++      if (p && p->edp_magic == EXT4_LUFID_MAGIC)
++              return &p->edp_len;
++      else
++              return NULL;
++}
+ 
+ /*
+  * EXT4_DIR_PAD defines the directory entries boundaries
+@@ -864,8 +904,11 @@
+  */
+ #define EXT4_DIR_PAD                  4
+ #define EXT4_DIR_ROUND                        (EXT4_DIR_PAD - 1)
+-#define EXT4_DIR_REC_LEN(name_len)    (((name_len) + 8 + EXT4_DIR_ROUND) & \
++#define __EXT4_DIR_REC_LEN(name_len)  (((name_len) + 8 + EXT4_DIR_ROUND) & \
+                                        ~EXT4_DIR_ROUND)
++#define EXT4_DIR_REC_LEN(de)          (__EXT4_DIR_REC_LEN(de->name_len +\
++                                      ext4_get_dirent_data_len(de)))
++
+ #define EXT4_MAX_REC_LEN              ((1<<16)-1)
+ 
+ static inline unsigned ext4_rec_len_from_disk(__le16 dlen)
+@@ -1182,7 +1225,7 @@
+                                           const struct qstr *d_name,
+                                           struct ext4_dir_entry_2 ** res_dir);
+ extern int ext4_add_dot_dotdot(handle_t *handle, struct inode *dir,
+-                                struct inode *inode);
++                        struct inode *inode, const void *, const void *);
+ extern int ext4_orphan_add(handle_t *, struct inode *);
+ extern int ext4_orphan_del(handle_t *, struct inode *);
+ extern int ext4_htree_fill_tree(struct file *dir_file, __u32 start_hash,
+@@ -1447,6 +1490,29 @@
+       set_bit(BH_BITMAP_UPTODATE, &(bh)->b_state);
+ }
+ 
++/*
++ * Compute the total directory entry data length.
++ * This includes the filename and an implicit NUL terminator (always present),
++ * and optional extensions.  Each extension has a bit set in the high 4 bits of
++ * de->file_type, and the extension length is the first byte in each entry.
++ */
++
++static inline int ext4_get_dirent_data_len(struct ext4_dir_entry_2 *de)
++{
++      char *len = de->name + de->name_len + 1 /* NUL terminator */;
++      int dlen = 0;
++      __u8 extra_data_flags = (de->file_type & ~EXT4_FT_MASK) >> 4;
++
++      while (extra_data_flags) {
++              if (extra_data_flags & 1) {
++                      dlen += *len + (dlen == 0);
++                      len += *len;
++              }
++              extra_data_flags >>= 1;
++      }
++      return dlen;
++}
++  
+ #endif        /* __KERNEL__ */
+ 
+ #endif        /* _EXT4_H */
+diff -u -r -N a/fs/ext4/namei.c b/fs/ext4/namei.c
+--- a/fs/ext4/namei.c  2010-05-12 15:44:18.682989925 +0000
++++ b/fs/ext4/namei.c  2010-05-12 16:04:13.322999721 +0000
+@@ -175,7 +175,8 @@
+ static unsigned dx_get_limit(struct dx_entry *entries);
+ static void dx_set_count(struct dx_entry *entries, unsigned value);
+ static void dx_set_limit(struct dx_entry *entries, unsigned value);
+-static unsigned dx_root_limit(struct inode *dir, unsigned infosize);
++static inline unsigned dx_root_limit(__u32 blocksize,
++              struct ext4_dir_entry_2 *dot_de, unsigned infosize);
+ static unsigned dx_node_limit(struct inode *dir);
+ static struct dx_frame *dx_probe(const struct qstr *d_name,
+                                struct inode *dir,
+@@ -218,11 +219,12 @@
+  */
+ struct dx_root_info * dx_get_dx_info(struct ext4_dir_entry_2 *de)
+ {
+-       /* get dotdot first */
+-       de = (struct ext4_dir_entry_2 *)((char *)de + EXT4_DIR_REC_LEN(1));
++      BUG_ON(de->name_len != 1);
++      /* get dotdot first */
++      de = (struct ext4_dir_entry_2 *)((char *)de + EXT4_DIR_REC_LEN(de));
+ 
+-       /* dx root info is after dotdot entry */
+-       de = (struct ext4_dir_entry_2 *)((char *)de + EXT4_DIR_REC_LEN(2));
++      /* dx root info is after dotdot entry */
++      de = (struct ext4_dir_entry_2 *)((char *)de + EXT4_DIR_REC_LEN(de));
+ 
+        return (struct dx_root_info *) de;
+ }
+@@ -267,16 +269,23 @@
+       ((struct dx_countlimit *) entries)->limit = cpu_to_le16(value);
+ }
+ 
+-static inline unsigned dx_root_limit(struct inode *dir, unsigned infosize)
++static inline unsigned dx_root_limit(__u32 blocksize,
++              struct ext4_dir_entry_2 *dot_de, unsigned infosize)
+ {
+-      unsigned entry_space = dir->i_sb->s_blocksize - EXT4_DIR_REC_LEN(1) -
+-              EXT4_DIR_REC_LEN(2) - infosize;
++      struct ext4_dir_entry_2 *dotdot_de;
++      unsigned entry_space;
++
++      BUG_ON(dot_de->name_len != 1);
++      dotdot_de = ext4_next_entry(dot_de);
++      entry_space = blocksize - EXT4_DIR_REC_LEN(dot_de) -
++                       EXT4_DIR_REC_LEN(dotdot_de) - infosize;
++
+       return entry_space / sizeof(struct dx_entry);
+ }
+ 
+ static inline unsigned dx_node_limit(struct inode *dir)
+ {
+-      unsigned entry_space = dir->i_sb->s_blocksize - EXT4_DIR_REC_LEN(0);
++      unsigned entry_space = dir->i_sb->s_blocksize - __EXT4_DIR_REC_LEN(0);
+       return entry_space / sizeof(struct dx_entry);
+ }
+ 
+@@ -323,7 +332,7 @@
+                               printk(":%x.%u ", h.hash,
+                                      ((char *) de - base));
+                       }
+-                      space += EXT4_DIR_REC_LEN(de->name_len);
++                      space += EXT4_DIR_REC_LEN(de);
+                       names++;
+               }
+               de = ext4_next_entry(de);
+@@ -428,7 +437,8 @@
+ 
+       entries = (struct dx_entry *) (((char *)info) + info->info_length);
+ 
+-      if (dx_get_limit(entries) != dx_root_limit(dir,
++      if (dx_get_limit(entries) != dx_root_limit(dir->i_sb->s_blocksize,
+                                                 (struct ext4_dir_entry_2*)bh->b_data,
+                                                  info->info_length)) {
+               ext4_warning(dir->i_sb, __func__,
+                            "dx entry: limit != root limit");
+@@ -618,7 +628,7 @@
+       de = (struct ext4_dir_entry_2 *) bh->b_data;
+       top = (struct ext4_dir_entry_2 *) ((char *) de +
+                                          dir->i_sb->s_blocksize -
+-                                         EXT4_DIR_REC_LEN(0));
++                                         __EXT4_DIR_REC_LEN(0));
+       for (; de < top; de = ext4_next_entry(de)) {
+               if (!ext4_check_dir_entry("htree_dirblock_to_tree", dir, de, bh,
+                                       (block<<EXT4_BLOCK_SIZE_BITS(dir->i_sb))
+@@ -1030,7 +1040,7 @@
+                       goto errout;
+               de = (struct ext4_dir_entry_2 *) bh->b_data;
+               top = (struct ext4_dir_entry_2 *) ((char *) de + sb->s_blocksize -
+-                                     EXT4_DIR_REC_LEN(0));
++                                      __EXT4_DIR_REC_LEN(0));
+               for (; de < top; de = ext4_next_entry(de)) {
+                       int off = (block << EXT4_BLOCK_SIZE_BITS(sb))
+                                 + ((char *) de - bh->b_data);
+@@ -1196,7 +1206,7 @@
+ 
+       while (count--) {
+               struct ext4_dir_entry_2 *de = (struct ext4_dir_entry_2 *) (from + map->offs);
+-              rec_len = EXT4_DIR_REC_LEN(de->name_len);
++              rec_len = EXT4_DIR_REC_LEN(de);
+               memcpy (to, de, rec_len);
+               ((struct ext4_dir_entry_2 *) to)->rec_len =
+                               ext4_rec_len_to_disk(rec_len);
+@@ -1220,7 +1230,7 @@
+       while ((char*)de < base + size) {
+               next = ext4_next_entry(de);
+               if (de->inode && de->name_len) {
+-                      rec_len = EXT4_DIR_REC_LEN(de->name_len);
++                      rec_len = EXT4_DIR_REC_LEN(de);
+                       if (de > to)
+                               memmove(to, de, rec_len);
+                       to->rec_len = ext4_rec_len_to_disk(rec_len);
+@@ -1347,10 +1357,16 @@
+       int             namelen = dentry->d_name.len;
+       unsigned int    offset = 0;
+       unsigned short  reclen;
+-      int             nlen, rlen, err;
++      int             nlen, rlen, err, dlen = 0;
++      unsigned char   *data;
+       char            *top;
+ 
+-      reclen = EXT4_DIR_REC_LEN(namelen);
++      data = ext4_dentry_get_data(inode->i_sb, (struct ext4_dentry_param *)
++                                              dentry->d_fsdata);
++      if (data)
++              dlen = (*data) + 1;
++
++      reclen = __EXT4_DIR_REC_LEN(namelen + dlen);
+       if (!de) {
+               de = (struct ext4_dir_entry_2 *)bh->b_data;
+               top = bh->b_data + dir->i_sb->s_blocksize - reclen;
+@@ -1360,7 +1376,7 @@
+                               return -EIO;
+                       if (ext4_match(namelen, name, de))
+                               return -EEXIST;
+-                      nlen = EXT4_DIR_REC_LEN(de->name_len);
++                      nlen = EXT4_DIR_REC_LEN(de);
+                       rlen = ext4_rec_len_from_disk(de->rec_len);
+                       if ((de->inode? rlen - nlen: rlen) >= reclen)
+                               break;
+@@ -1378,7 +1394,7 @@
+       }
+ 
+       /* By now the buffer is marked for journaling */
+-      nlen = EXT4_DIR_REC_LEN(de->name_len);
++      nlen = EXT4_DIR_REC_LEN(de);
+       rlen = ext4_rec_len_from_disk(de->rec_len);
+       if (de->inode) {
+               struct ext4_dir_entry_2 *de1 = (struct ext4_dir_entry_2 *)((char *)de + nlen);
+@@ -1394,6 +1410,12 @@
+               de->inode = 0;
+       de->name_len = namelen;
+       memcpy(de->name, name, namelen);
++      if (data) {
++              de->name[namelen] = 0;
++              memcpy(&de->name[namelen + 1], data, *(char *) data);
++              de->file_type |= EXT4_DIRENT_LUFID;
++      }
++
+       /*
+        * XXX shouldn't update any times until successful
+        * completion of syscall, but too many callers depend
+@@ -1489,7 +1511,8 @@
+ 
+       dx_set_block(entries, 1);
+       dx_set_count(entries, 1);
+-      dx_set_limit(entries, dx_root_limit(dir, sizeof(*dx_info)));
++      dx_set_limit(entries, dx_root_limit(dir->i_sb->s_blocksize,
++                                       dot_de, sizeof(*dx_info)));
+ 
+       /* Initialize as for dx_probe */
+       hinfo.hash_version = dx_info->hash_version;
+@@ -1520,6 +1543,8 @@
+       struct buffer_head * dir_block;
+       struct ext4_dir_entry_2 * de;
+       int len, journal = 0, err = 0;
++      int dlen = 0;
++      char *data;
+ 
+       if (IS_ERR(handle))
+               return PTR_ERR(handle);
+@@ -1535,19 +1560,24 @@
+       /* the first item must be "." */
+       assert(de->name_len == 1 && de->name[0] == '.');
+       len = le16_to_cpu(de->rec_len);
+-      assert(len >= EXT4_DIR_REC_LEN(1));
+-      if (len > EXT4_DIR_REC_LEN(1)) {
++      assert(len >= __EXT4_DIR_REC_LEN(1));
++      if (len > __EXT4_DIR_REC_LEN(1)) {
+               BUFFER_TRACE(dir_block, "get_write_access");
+               err = ext4_journal_get_write_access(handle, dir_block);
+               if (err)
+                       goto out_journal;
+ 
+               journal = 1;
+-              de->rec_len = cpu_to_le16(EXT4_DIR_REC_LEN(1));
++              de->rec_len = cpu_to_le16(EXT4_DIR_REC_LEN(de));
+       }
+ 
+-      len -= EXT4_DIR_REC_LEN(1);
+-      assert(len == 0 || len >= EXT4_DIR_REC_LEN(2));
++      len -= EXT4_DIR_REC_LEN(de);
++      data = ext4_dentry_get_data(dir->i_sb,
++                      (struct ext4_dentry_param *) dentry->d_fsdata);
++      if (data)
++              dlen = *data + 1;
++      assert(len == 0 || len >= __EXT4_DIR_REC_LEN(2 + dlen));
++
+       de = (struct ext4_dir_entry_2 *)
+                       ((char *) de + le16_to_cpu(de->rec_len));
+       if (!journal) {
+@@ -1561,10 +1591,15 @@
+       if (len > 0)
+               de->rec_len = cpu_to_le16(len);
+       else
+-              assert(le16_to_cpu(de->rec_len) >= EXT4_DIR_REC_LEN(2));
++              assert(le16_to_cpu(de->rec_len) >= __EXT4_DIR_REC_LEN(2));
+       de->name_len = 2;
+       strcpy (de->name, "..");
+       ext4_set_de_type(dir->i_sb, de, S_IFDIR);
++      if (data) {
++              de->name[2] = 0;
++              memcpy(&de->name[2 + 1], data, dlen);
++              de->file_type |= EXT4_DIRENT_LUFID;
++      }
+ 
+ out_journal:
+       if (journal) {
+@@ -1980,11 +2015,12 @@
+ /* Initialize @inode as a subdirectory of @dir, and add the
+  * "." and ".." entries into the first directory block. */
+ int ext4_add_dot_dotdot(handle_t *handle, struct inode * dir,
+-                      struct inode *inode)
++                       struct inode *inode,
++                       const void *data1, const void *data2)
+ {
+       struct buffer_head * dir_block;
+       struct ext4_dir_entry_2 * de;
+-      int err = 0;
++      int err = 0, dot_reclen;
+ 
+       if (IS_ERR(handle))
+               return PTR_ERR(handle);
+@@ -2007,16 +2043,33 @@
+       de = (struct ext4_dir_entry_2 *) dir_block->b_data;
+       de->inode = cpu_to_le32(inode->i_ino);
+       de->name_len = 1;
+-      de->rec_len = ext4_rec_len_to_disk(EXT4_DIR_REC_LEN(de->name_len));
++      /* get packed fid data*/
++      data1 = ext4_dentry_get_data(dir->i_sb,
++                              (struct ext4_dentry_param *) data1);
++      if (data1) {
++              de->name[1] = 0;
++              memcpy(&de->name[2], data1, *(char *) data1);
++              de->file_type |= EXT4_DIRENT_LUFID;
++      }
++      de->rec_len = cpu_to_le16(EXT4_DIR_REC_LEN(de));
++      dot_reclen = le16_to_cpu(de->rec_len);
++ 
+       strcpy (de->name, ".");
+       ext4_set_de_type(dir->i_sb, de, S_IFDIR);
+       de = ext4_next_entry(de);
+       de->inode = cpu_to_le32(dir->i_ino);
+       de->rec_len = ext4_rec_len_to_disk(inode->i_sb->s_blocksize -
+-                                            EXT4_DIR_REC_LEN(1));
++                                            dot_reclen);
+       de->name_len = 2;
+       strcpy(de->name, "..");
+       ext4_set_de_type(dir->i_sb, de, S_IFDIR);
++      data2 = ext4_dentry_get_data(dir->i_sb,
++                      (struct ext4_dentry_param *) data2);
++      if (data2) {
++              de->name[2] = 0;
++              memcpy(&de->name[3], data2, *(char *) data2);
++              de->file_type |= EXT4_DIRENT_LUFID;
++      }
+       inode->i_nlink = 2;
+       BUFFER_TRACE(dir_block, "call ext4_journal_dirty_metadata");
+       ext4_journal_dirty_metadata(handle, dir_block);
+@@ -2053,7 +2106,7 @@
+       if (IS_ERR(inode))
+               goto out_stop;
+ 
+-      err = ext4_add_dot_dotdot(handle, dir, inode);
++      err = ext4_add_dot_dotdot(handle, dir, inode, NULL, NULL);
+       if (err)
+               goto out_stop;
+ 
+@@ -2087,7 +2140,7 @@
+       int err = 0;
+ 
+       sb = inode->i_sb;
+-      if (inode->i_size < EXT4_DIR_REC_LEN(1) + EXT4_DIR_REC_LEN(2) ||
++      if (inode->i_size < __EXT4_DIR_REC_LEN(1) + __EXT4_DIR_REC_LEN(2) ||
+           !(bh = ext4_bread(NULL, inode, 0, 0, &err))) {
+               if (err)
+                       ext4_error(inode->i_sb, __func__,
--- lustre/ldiskfs/kernel_patches/patches/ext4-kill-dx_root-sles11.patch        1970-01-01 00:00:00.000000000 +0000
+++ lustre.sles11/ldiskfs/kernel_patches/patches/ext4-kill-dx_root-sles11.patch 2010-05-12 15:37:21.171156157 +0000
@@ -0,0 +1,245 @@
+Removes the static definition of struct dx_root so that the "." and ".."
+dirents can carry extra data. This patch does not change any functionality,
+but it is required by the ext4_data_in_dirent patch.
+ 
+Index: b/fs/ext4/namei.c
+===================================================================
+--- a/fs/ext4/namei.c
++++ b/fs/ext4/namei.c
+@@ -121,22 +121,13 @@ struct dx_entry
+  * hash version mod 4 should never be 0.  Sincerely, the paranoia department.
+  */
+ 
+-struct dx_root
++struct dx_root_info
+ {
+-      struct fake_dirent dot;
+-      char dot_name[4];
+-      struct fake_dirent dotdot;
+-      char dotdot_name[4];
+-      struct dx_root_info
+-      {
+-              __le32 reserved_zero;
+-              u8 hash_version;
+-              u8 info_length; /* 8 */
+-              u8 indirect_levels;
+-              u8 unused_flags;
+-      }
+-      info;
+-      struct dx_entry entries[0];
++      __le32 reserved_zero;
++      u8 hash_version;
++      u8 info_length; /* 8 */
++      u8 indirect_levels;
++      u8 unused_flags;
+ };
+ 
+ struct dx_node
+@@ -225,6 +216,16 @@ ext4_next_entry(struct ext4_dir_entry_2 
+  * Future: use high four bits of block for coalesce-on-delete flags
+  * Mask them off for now.
+  */
++struct dx_root_info * dx_get_dx_info(struct ext4_dir_entry_2 *de)
++{
++       /* get dotdot first */
++       de = (struct ext4_dir_entry_2 *)((char *)de + EXT4_DIR_REC_LEN(1));
++
++       /* dx root info is after dotdot entry */
++       de = (struct ext4_dir_entry_2 *)((char *)de + EXT4_DIR_REC_LEN(2));
++
++       return (struct dx_root_info *) de;
++}
+ 
+ static inline ext4_lblk_t dx_get_block(struct dx_entry *entry)
+ {
+@@ -378,7 +379,7 @@ dx_probe(struct dentry *dentry, struct i
+ {
+       unsigned count, indirect;
+       struct dx_entry *at, *entries, *p, *q, *m;
+-      struct dx_root *root;
++      struct dx_root_info * info;
+       struct buffer_head *bh;
+       struct dx_frame *frame = frame_in;
+       u32 hash;
+@@ -388,18 +389,19 @@ dx_probe(struct dentry *dentry, struct i
+               dir = dentry->d_parent->d_inode;
+       if (!(bh = ext4_bread (NULL,dir, 0, 0, err)))
+               goto fail;
+-      root = (struct dx_root *) bh->b_data;
+-      if (root->info.hash_version != DX_HASH_TEA &&
+-          root->info.hash_version != DX_HASH_HALF_MD4 &&
+-          root->info.hash_version != DX_HASH_LEGACY) {
++
++      info = dx_get_dx_info((struct ext4_dir_entry_2*)bh->b_data);
++      if (info->hash_version != DX_HASH_TEA &&
++          info->hash_version != DX_HASH_HALF_MD4 &&
++          info->hash_version != DX_HASH_LEGACY) {
+               ext4_warning(dir->i_sb, __func__,
+                            "Unrecognised inode hash code %d for directory "
+-                           "#%lu", root->info.hash_version, dir->i_ino);
++                           "#%lu", info->hash_version, dir->i_ino);
+               brelse(bh);
+               *err = ERR_BAD_DX_DIR;
+               goto fail;
+       }
+-      hinfo->hash_version = root->info.hash_version;
++      hinfo->hash_version = info->hash_version;
+       if (hinfo->hash_version <= DX_HASH_TEA)
+               hinfo->hash_version += EXT4_SB(dir->i_sb)->s_hash_unsigned;
+       hinfo->seed = EXT4_SB(dir->i_sb)->s_hash_seed;
+@@ -398,29 +399,28 @@ dx_probe(struct dentry *dentry, struct i
+               ext4fs_dirhash(dentry->d_name.name, dentry->d_name.len, hinfo);
+       hash = hinfo->hash;
+ 
+-      if (root->info.unused_flags & 1) {
++      if (info->unused_flags & 1) {
+               ext4_warning(dir->i_sb, __func__,
+                            "Unimplemented inode hash flags: %#06x",
+-                           root->info.unused_flags);
++                           info->unused_flags);
+               brelse(bh);
+               *err = ERR_BAD_DX_DIR;
+               goto fail;
+       }
+ 
+-      if ((indirect = root->info.indirect_levels) > 1) {
++      if ((indirect = info->indirect_levels) > 1) {
+               ext4_warning(dir->i_sb, __func__,
+                            "Unimplemented inode hash depth: %#06x",
+-                           root->info.indirect_levels);
++                           info->indirect_levels);
+               brelse(bh);
+               *err = ERR_BAD_DX_DIR;
+               goto fail;
+       }
+ 
+-      entries = (struct dx_entry *) (((char *)&root->info) +
+-                                     root->info.info_length);
++      entries = (struct dx_entry *) (((char *)info) + info->info_length);
+ 
+       if (dx_get_limit(entries) != dx_root_limit(dir,
+-                                                 root->info.info_length)) {
++                                                 info->info_length)) {
+               ext4_warning(dir->i_sb, __func__,
+                            "dx entry: limit != root limit");
+               brelse(bh);
+@@ -509,10 +510,12 @@ fail:
+ 
+ static void dx_release (struct dx_frame *frames)
+ {
++      struct dx_root_info *info;
+       if (frames[0].bh == NULL)
+               return;
+ 
+-      if (((struct dx_root *) frames[0].bh->b_data)->info.indirect_levels)
++      info = dx_get_dx_info((struct ext4_dir_entry_2*)frames[0].bh->b_data);
++      if (info->indirect_levels)
+               brelse(frames[1].bh);
+       brelse(frames[0].bh);
+ }
+@@ -1430,17 +1433,16 @@ static int make_indexed_dir(handle_t *ha
+       const char      *name = dentry->d_name.name;
+       int             namelen = dentry->d_name.len;
+       struct buffer_head *bh2;
+-      struct dx_root  *root;
+       struct dx_frame frames[2], *frame;
+       struct dx_entry *entries;
+-      struct ext4_dir_entry_2 *de, *de2;
++      struct ext4_dir_entry_2 *de, *de2, *dot_de, *dotdot_de;
+       char            *data1, *top;
+       unsigned        len;
+       int             retval;
+       unsigned        blocksize;
+       struct dx_hash_info hinfo;
+       ext4_lblk_t  block;
+-      struct fake_dirent *fde;
++      struct dx_root_info *dx_info;
+ 
+       blocksize =  dir->i_sb->s_blocksize;
+       dxtrace(printk("Creating index\n"));
+@@ -1450,7 +1452,6 @@ static int make_indexed_dir(handle_t *ha
+               brelse(bh);
+               return retval;
+       }
+-      root = (struct dx_root *) bh->b_data;
+ 
+       bh2 = ext4_append (handle, dir, &block, &retval);
+       if (!(bh2)) {
+@@ -1454,11 +1455,20 @@
+       EXT4_I(dir)->i_flags |= EXT4_INDEX_FL;
+       data1 = bh2->b_data;
+ 
++      dot_de = (struct ext4_dir_entry_2 *) bh->b_data;
++      dotdot_de = ext4_next_entry(dot_de);
++
+       /* The 0th block becomes the root, move the dirents out */
+-      fde = &root->dotdot;
+-      de = (struct ext4_dir_entry_2 *)((char *)fde +
+-              ext4_rec_len_from_disk(fde->rec_len));
+-      len = ((char *) root) + blocksize - (char *) de;
++      de = (struct ext4_dir_entry_2 *)((char *)dotdot_de +
++              ext4_rec_len_from_disk(dotdot_de->rec_len));
++      if ((char *) de >= (((char *) dot_de) + blocksize)) {
++              ext4_error(dir->i_sb, __func__,
++                      "invalid rec_len for '..' in inode %lu",
++                      dir->i_ino);
++              brelse(bh);
++              return -EIO;
++      }
++      len = ((char *) dot_de) + blocksize - (char *) de;
+       memcpy (data1, de, len);
+       de = (struct ext4_dir_entry_2 *) data1;
+       top = data1 + len;
+@@ -1472,18 +1475,23 @@ static int make_indexed_dir(handle_t *ha
+               de = de2;
+       de->rec_len = ext4_rec_len_to_disk(data1 + blocksize - (char *) de);
+       /* Initialize the root; the dot dirents already exist */
+-      de = (struct ext4_dir_entry_2 *) (&root->dotdot);
+-      de->rec_len = ext4_rec_len_to_disk(blocksize - EXT4_DIR_REC_LEN(2));
+-      memset (&root->info, 0, sizeof(root->info));
+-      root->info.info_length = sizeof(root->info);
+-      root->info.hash_version = EXT4_SB(dir->i_sb)->s_def_hash_version;
+-      entries = root->entries;
+-      dx_set_block(entries, 1);
+-      dx_set_count(entries, 1);
+-      dx_set_limit(entries, dx_root_limit(dir, sizeof(root->info)));
++      dotdot_de->rec_len = ext4_rec_len_to_disk(blocksize -
++                      le16_to_cpu(dot_de->rec_len));
++
++      /* initialize hashing info */
++      dx_info = dx_get_dx_info(dot_de);
++      memset (dx_info, 0, sizeof(*dx_info));
++      dx_info->info_length = sizeof(*dx_info);
++      dx_info->hash_version = EXT4_SB(dir->i_sb)->s_def_hash_version;
++
++      entries = (void *)dx_info + sizeof(*dx_info);
++
++      dx_set_block(entries, 1);
++      dx_set_count(entries, 1);
++      dx_set_limit(entries, dx_root_limit(dir, sizeof(*dx_info)));
+ 
+       /* Initialize as for dx_probe */
+-      hinfo.hash_version = root->info.hash_version;
++      hinfo.hash_version = dx_info->hash_version;
+       if (hinfo.hash_version <= DX_HASH_TEA)
+               hinfo.hash_version += EXT4_SB(dir->i_sb)->s_hash_unsigned;
+       hinfo.seed = EXT4_SB(dir->i_sb)->s_hash_seed;
+@@ -1724,6 +1733,7 @@ static int ext4_dx_add_entry(handle_t *h
+                               goto journal_error;
+                       brelse (bh2);
+               } else {
++                      struct dx_root_info * info;
+                       dxtrace(printk(KERN_DEBUG
+                                      "Creating second level index...\n"));
+                       memcpy((char *) entries2, (char *) entries,
+@@ -1732,7 +1742,9 @@ static int ext4_dx_add_entry(handle_t *h
+                       /* Set up root */
+                       dx_set_count(entries, 1);
+                       dx_set_block(entries + 0, newblock);
+-                      ((struct dx_root *) frames[0].bh->b_data)->info.indirect_levels = 1;
++                      info = dx_get_dx_info((struct ext4_dir_entry_2*)
++                                      frames[0].bh->b_data);
++                      info->indirect_levels = 1;
+ 
+                       /* Add new access path frame */
+                       frame = frames + 1;
--- lustre/ldiskfs/kernel_patches/series/ldiskfs-2.6-sles11.series      2010-03-01 02:06:58.000000000 +0000
+++ lustre.sles11/ldiskfs/kernel_patches/series/ldiskfs-2.6-sles11.series       2010-05-12 15:45:12.323156131 +0000
@@ -34,3 +34,5 @@
 ext4-hash-indexed-dir-dotdot-update.patch
 ext4-disable-write-bar-by-default.patch
 ext4-mballoc-pa_free-mismatch.patch
+ext4-kill-dx_root-sles11.patch
+ext4_data_in_dirent-sles11.patch
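
For anyone reading the two patches without the ext4 headers at hand, here is a
rough userspace sketch of the arithmetic behind the change: once "." and ".."
may carry extra data, the offset of dx_root_info has to be computed from the
entries themselves rather than from the fixed EXT4_DIR_REC_LEN(1) +
EXT4_DIR_REC_LEN(2). Everything named sketch_* or SKETCH_* is made up for
illustration. The flag value 0x10 and the layout "name, NUL, then a blob whose
first byte is its own length" are assumptions inferred from the hunks above;
the real per-entry EXT4_DIR_REC_LEN(de) macro is defined outside this excerpt
and may differ.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative names only; none of these exist in the kernel or the patch. */
#define SKETCH_DIRENT_HDR   8u    /* inode(4) + rec_len(2) + name_len(1) + file_type(1) */
#define SKETCH_DIRENT_LUFID 0x10u /* ASSUMED value of EXT4_DIRENT_LUFID */

/* Classic ext4 rounding: header plus payload, rounded up to a multiple of 4. */
static unsigned sketch_rec_len(unsigned payload_len)
{
    return (payload_len + SKETCH_DIRENT_HDR + 3u) & ~3u;
}

/* Bytes stored after the name: one NUL plus a blob whose first byte is the
 * blob length (ASSUMED layout, inferred from the memcpy calls in the patch). */
static unsigned sketch_data_len(const unsigned char *de)
{
    unsigned name_len = de[6];

    if (!(de[7] & SKETCH_DIRENT_LUFID))
        return 0;
    return 1u + de[SKETCH_DIRENT_HDR + name_len + 1u];
}

/* Minimal space an entry needs, counting any trailing data. */
static unsigned sketch_dirent_rec_len(const unsigned char *de)
{
    return sketch_rec_len(de[6] + sketch_data_len(de));
}

/* Write name (and optional blob) into a raw entry; returns its minimal length.
 * The inode and rec_len fields are left untouched to keep the sketch short. */
static unsigned sketch_put_dirent(unsigned char *de, const char *name,
                                  const unsigned char *data)
{
    unsigned name_len = (unsigned)strlen(name);

    de[6] = (unsigned char)name_len;
    de[7] = 2; /* EXT4_FT_DIR */
    memcpy(de + SKETCH_DIRENT_HDR, name, name_len);
    if (data) {
        de[SKETCH_DIRENT_HDR + name_len] = 0;
        memcpy(de + SKETCH_DIRENT_HDR + name_len + 1u, data, data[0]);
        de[7] |= SKETCH_DIRENT_LUFID;
    }
    return sketch_dirent_rec_len(de);
}

/* Offset of dx_root_info: step over "." and then over "..". */
static unsigned sketch_dx_info_offset(const unsigned char *block)
{
    unsigned off;

    assert(block[6] == 1); /* first entry must be "." */
    off = sketch_dirent_rec_len(block);
    off += sketch_dirent_rec_len(block + off);
    return off;
}
```

With plain "." and ".." entries the offset comes out as 12 + 12 = 24, the same
as the fixed calculation in ext4-kill-dx_root. If ".." carried a 17-byte blob
(an arbitrary size picked for the example), its entry would need 28 bytes and
the info block would start at offset 40.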

_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
