Andreas,

1. The files I created for the benchmark are all 0 bytes in size. I
created them with this script:
> for i in `seq 1 50`; do for j in `seq 1 10000`; do \
>     touch `/usr/bin/keygen | head -c 8`; done; done
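(For anyone without /usr/bin/keygen: a hedged stand-in using /dev/urandom
instead, scaled down to 500 files for illustration; this is not the exact
script that ran in the benchmark.)

```shell
#!/bin/sh
# Stand-in for `/usr/bin/keygen | head -c 8`: an 8-character random
# hex file name drawn from /dev/urandom.  The loop is scaled down to
# 50 * 10 = 500 files; the benchmark itself used 50 * 10000 = 500000.
randname() {
    head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n' | head -c 8
}

dir=$(mktemp -d)
cd "$dir"
for i in $(seq 1 50); do
    for j in $(seq 1 10); do
        touch "$(randname)"
    done
done
ls | wc -l
```

The final `ls | wc -l` should report 500 unless two random names collide,
which is vanishingly unlikely in an 8-hex-digit namespace.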

2. Using a magic inode will not introduce compatibility issues. An fsck
that does not understand the magic inode can simply ignore or remove it.
This can only happen when fsck is run, and the filesystem code can
rebuild the magic inode if it is not found (this will take some time to
read the inode table at mount).

Best regards.

Coly


P.S. Here are the results of my benchmark:
I created 500000 zero-byte files in a directory named "sub", and
recorded times for:
1. copy sub to another dir named "ordered1" on another harddisk.
2. copy dir "ordered1" to "ordered2" on another harddisk.
3. reboot the system and repeat 2 (changing the target to "ordered3").
4. remove ordered3, ordered2, ordered1.
5. remove sub.
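Spelled out, the sequence looks roughly like this (a scaled-down sketch:
mktemp directories stand in for the two harddisks, 100 placeholder files
for the 500000, and in the real run each cp/rm was wrapped in `time` and
the box was rebooted before step 3):

```shell
#!/bin/sh
# Scaled-down sketch of benchmark steps 1-5.  The mktemp dirs stand in
# for the two separate harddisks used in the real run.
set -e
disk1=$(mktemp -d)   # holds the original "sub"
disk2=$(mktemp -d)   # stand-in for the other harddisk

mkdir "$disk1/sub"
for i in $(seq 1 100); do touch "$disk1/sub/f$i"; done

cp -r "$disk1/sub" "$disk2/ordered1"         # step 1
cp -r "$disk2/ordered1" "$disk2/ordered2"    # step 2
# step 3: reboot here, then repeat step 2 with target "ordered3":
cp -r "$disk2/ordered2" "$disk2/ordered3"
rm -rf "$disk2/ordered3" "$disk2/ordered2" "$disk2/ordered1"   # step 4
rm -rf "$disk1/sub"                          # step 5
```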

From the benchmark, I found that hash-ordered inode allocation does not
improve performance much under data=journal and data=ordered.



I created 500000 new files in a dir called "sub" with this script:
for i in `seq 1 50`; do for j in `seq 1 10000`; do \
    touch `/usr/bin/keygen | head -c 8`; done; done

==== data=writeback ====
copy sub to another dir named "ordered1":
real    7m17.616s
user    0m1.456s
sys     0m27.586s

copy dir "ordered1" to "ordered2":
real    0m45.231s
user    0m1.340s
sys     0m21.233s

reboot
copy dir "ordered2" to "ordered3":
real    1m8.764s
user    0m1.568s
sys     0m26.050s

remove ordered3 by rm -rf ordered3:
real    0m9.200s
user    0m0.168s
sys     0m8.893s

remove ordered2 by rm -rf ordered2:
real    0m12.225s
user    0m0.128s
sys     0m8.857s

remove ordered1 by rm -rf ordered1:
real    0m37.493s
user    0m0.076s
sys     0m11.089s

remove original dir "sub":
real    9m49.902s
user    0m0.220s
sys     0m14.377s

==== data=journal ====
copy sub to another dir named "ordered1":
real    6m54.151s
user    0m1.696s
sys     0m22.705s

copy dir "ordered1" to "ordered2":
real    7m7.696s
user    0m1.416s
sys     0m23.541s

reboot
copy dir "ordered2" to "ordered3":
real    10m46.649s
user    0m1.792s
sys     0m28.778s

remove ordered1 by rm -rf ordered1:
real    12m54.271s
user    0m0.192s
sys     0m15.353s

remove ordered2 by rm -rf ordered2:
real    13m37.035s
user    0m0.260s
sys     0m15.009s

remove ordered3 by rm -rf ordered3:
real    7m43.703s
user    0m0.216s
sys     0m12.117s

remove sub by rm -rf sub:
real    10m41.150s
user    0m0.188s
sys     0m13.781s

==== data=ordered ====
copy sub to another dir named "ordered1":
real    7m57.016s
user    0m1.632s
sys     0m25.558s

copy dir "ordered1" to "ordered2":
real    7m46.037s
user    0m1.604s
sys     0m24.902s

reboot
copy dir "ordered2" to "ordered3":
real    8m21.952s
user    0m1.720s
sys     0m28.290s

remove ordered1 by rm -rf ordered1:
real    10m12.652s
user    0m0.272s
sys     0m15.049s

remove ordered2 by rm -rf ordered2:
real    9m21.770s
user    0m0.220s
sys     0m15.025s

remove ordered3 by rm -rf ordered3:
real    6m32.278s
user    0m0.176s
sys     0m12.093s

remove sub by rm -rf sub:
real    10m17.966s
user    0m0.236s
sys     0m14.453s




On Tue, 2007-03-20 at 03:51 -0600, Andreas Dilger wrote:
> On Mar 20, 2007  17:22 +0800, coly wrote:
> > 1, I did benchmark on large number of file copy and remove. The method
> > is what you did and told me before (create many file in a dir, copy this
> > dir, remove the new and original dirs).
> >    * In data=journal and data=ordered, not much performance improve will
> > be gained from inode reservation. For every inode modification will be
> > submitted into journal at once, no chance to merge multiple inode
> > modification in one inode table into 1 journal submitting.
> 
> That shouldn't be true.  Whether operation is data=journal or data=writeback
> the filesystem metadata (i.e. inode table, directory) will always be in the
> journal.  Unless operation is always sync'd then it should still be possible
> to merge many filesystem operations into a single journal transaction (so
> that they can share the changes to the same blocks).
> 
> Now, whether the implementation matches the theory is a different question.
> It would be interesting to figure out why your test results are not showing
> the same performance between data=ordered and data=writeback.  How large
> are the files being unlinked?  Maybe if they are large the truncate time is
> long enough that the journal transaction is being committed?  Maybe with
> data=journal there is so much going into the journal that it also forces a
> commit because the journal is full?
> 
> > 2, In order to management reserved inode table for each directories,
> > especially when files number of a directory exceeded the current
> > reserved limitation, a list is needed to manage the reserved inode
> > tables. I want to use some inode on disk as pointer. I think only by
> > this way, we can avoid to change ext4 on disk meta data format.
> >    For some inodes used as pointers of list, I can assign MAGIC numbers
> > for them, identify them from normal inodes. But fsck and mkfs should be
> > modified to understand these MAGIC numbers.
> >   With helps for these pointers (inode with special MAGIC number), inode
> > reservation can be implemented more easy.
> 
> If you are making a magic inode, and it needs e2fsck and mke2fs support,
> then this by nature is a change to the filesystem format (though possibly
> one that allows an easy upgrade from existing filesystems).  If we need
> to change the on-disk format then there are a number of other changes we
> could make, including having "inode in directory" format, which will avoid
> this problem entirely because readdir and inode order are always the same.
> 
> I would suggest emailing to the linux-ext4 list with details of findings
> (performance, tests that have been run) so that everyone can read and
> comment on it.
> 
> Cheers, Andreas
> --
> Andreas Dilger
> Principal Software Engineer
> Cluster File Systems, Inc.
> 
