On Mon, 2005-10-24 at 01:06 +0100, Peter Grandi wrote:
> Thanks for the replies to my questions of some time ago, and now
> some more questions.
>
> I have become interested in the subject of preallocation, which
> I suspect is particularly important for a filesystem with
> extents and a buddy system free list.
>
> Having thought about it, the use of a buddy system for file
> space allocation seems a particularly good idea, because the one
> or two big problems of the buddy system for memory allocation are
> not relevant for files.
>
> But it can be less optimal if files are written piecemeal.
>
> What I want to know is how a common case like extracting a
> number of files from a 'tar' archive works, in particular
> in terms of preallocation and how many extents result.
>
> So some questions about JFS as it is now:
>
> * How to see how many extents of which size have been allocated
> to an inode from the command line?
You can use jfs_debugfs to see the xtree of an inode.
> * How to list the free list from the command line?
There is no free list. The block map is a binary-buddy tree, where the
leaves contain bitmaps. Again, jfs_debugfs can be used to look at the
block map. Use the dmap subcommand. Hopefully, it's not too hard to
navigate around.
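To make the binary-buddy idea concrete, here is a toy sketch of buddy allocation over a leaf bitmap. It is purely illustrative (the class name, structure, and sizes are invented for the example and do not match jfs's actual dmap layout): allocations are power-of-two runs at naturally aligned offsets, which is what keeps large free runs intact.

```python
# Toy binary-buddy allocator over a bitmap. Illustrative only; not
# jfs's actual dmap code or on-disk layout.

class BuddyMap:
    def __init__(self, nblocks):
        assert nblocks and (nblocks & (nblocks - 1)) == 0  # power of two
        self.nblocks = nblocks
        self.free = [True] * nblocks  # leaf bitmap: True = free

    def _run_free(self, start, size):
        return all(self.free[start:start + size])

    def alloc(self, size):
        """Allocate a naturally aligned, power-of-two run of blocks."""
        assert size and (size & (size - 1)) == 0
        # Only buddy-aligned starting offsets are considered.
        for start in range(0, self.nblocks, size):
            if self._run_free(start, size):
                for i in range(start, start + size):
                    self.free[i] = False
                return start
        return None  # no free run of this order

    def release(self, start, size):
        for i in range(start, start + size):
            self.free[i] = True

bm = BuddyMap(16)
a = bm.alloc(4)      # first free aligned run of 4: blocks 0-3
b = bm.alloc(2)      # blocks 4-5
bm.release(a, 4)     # 0-3 free again, but 4-5 still in use
c = bm.alloc(8)      # 0-7 is broken by 4-5, so this lands at 8-15
```

The last allocation shows the buddy property: an 8-block request can only use an aligned 8-block run, so the fragment at blocks 4-5 forces it into the upper half of the map.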
> * Suppose a filesystem is empty, and 'tar' extracts to it an
> 8MiB file, writing 32KiB blocks. How many extents will it
> span? After it has been written, what will the free list
> look like?
I just tried creating a file with 'dd bs=32768 count=256' on a newly
formatted jfs volume, and it went into one extent.
> * How to preallocate space? As in, allocate to a file much more
> space than its size. For example, when writing to a file it
> may be known in advance that it will take 8MiB ('cp', 'tar',
> 'wget', ...), so create the file with a size of 0 but 8MiB
> allocated.
There's no code to do that now, but we could create an extent and mark
it ABNR (allocated but not recorded). This is a holdover from OS/2,
where the default behavior was to have dense files, rather than sparse
ones.
Ideally, generic_file_write would use the get_blocks file system
callback to allocate all of the blocks needed for a write at one time.
Currently, space is allocated one page at a time. Of course, jfs will
always try to allocate the next consecutive block and append it to an
existing extent if possible, so if the free space is not too fragmented,
we generally end up with large extents.
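That append-to-the-last-extent behavior can be sketched with a toy model (the function and data structures here are invented for illustration, not jfs code): each one-page allocation first tries the block directly following the file's last extent, and only starts a new extent when that block is taken.

```python
# Toy model of page-at-a-time allocation with extent coalescing.
# Illustrative only; not jfs's actual allocator.

def allocate(extents, free_blocks, want):
    """Allocate `want` blocks one at a time, preferring the block that
    directly follows the file's last extent."""
    for _ in range(want):
        if extents:
            start, length = extents[-1]
            nxt = start + length
            if nxt in free_blocks:          # grow the existing extent
                free_blocks.remove(nxt)
                extents[-1] = (start, length + 1)
                continue
        blk = min(free_blocks)              # otherwise start a new extent
        free_blocks.remove(blk)
        extents.append((blk, 1))
    return extents

# Unfragmented free space: 8 single-block allocations -> one extent.
ext = allocate([], set(range(100)), 8)
# Fragmented free space: the same file ends up in three extents.
ext2 = allocate([], {0, 1, 5, 6, 7, 20}, 6)
```

With contiguous free space the file collapses into a single `(0, 8)` extent; with the fragmented free set it becomes three extents, which is exactly the free-space-fragmentation effect described above.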
> * Would 'ftruncate'(2) (or the mythical 'posix_fallocate'(2))
> with an argument greater than the size of the file do a
> preallocation as per X/Open? It does not seem so now as in
> 'jfs_truncate_nolock()' the test looks like '(newsize >
> length)'.
Correct, no space will be allocated.
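For what it's worth, posix_fallocate(3) has since become a real, widely available call on Linux, and it makes exactly the guarantee discussed here: the range is actually reserved, unlike ftruncate, which only changes the size and leaves a hole. A quick demonstration via Python's os.posix_fallocate wrapper (the behavior of st_blocks assumes a filesystem with real allocation, which is the normal case):

```python
# posix_fallocate reserves backing store, not just a logical size.
import os
import tempfile

f = tempfile.TemporaryFile()
os.posix_fallocate(f.fileno(), 0, 8 * 1024 * 1024)  # reserve 8 MiB
st = os.fstat(f.fileno())
size = st.st_size               # extended to offset + len = 8 MiB
allocated = st.st_blocks * 512  # bytes actually backed on disk
f.close()
```

After the call, st_size is 8 MiB and st_blocks shows the space genuinely allocated, whereas an ftruncate to the same length would leave st_blocks near zero.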
> http://WWW.OpenGroup.org/onlinepubs/009695399/functions/ftruncate.html
>
> http://WWW.OpenGroup.org/onlinepubs/009695399/functions/posix_fallocate.html
>
> * How hard would it be to add global and/or per-filesystem
> (permanent or at mount time) tunables to set:
>
> - a minimum extent size? for example to ensure that no extent
> smaller than 1MiB is allocated? The purpose is to ensure
> that files tend to be contiguous.
Non-trivial, but it shouldn't be a show-stopper. jfs_fsck would have to
be taught not to flag an error if blocks are allocated beyond the file's
size. How would we want to handle low-free-space conditions? Allow
shorter allocations if no 1 MB allocations are possible? Would we need
a mechanism to free unused preallocations?
> - alternatively, a default minimum extent size? So that the
> extents are initially allocated of that size, but can be
> reduced by 'close'(2) or 'ftruncate'(2) to the actual size
> of the file. For example so that when extracting from 'tar'
> a minimum extent size of say 256KiB is used, but when the
> file is closed or truncated the last extent can get chopped
> to less than that.
A jfs volume is logically divided into a number of allocation groups
(AGs). While a file is open, jfs will try to place allocations for
other files in a separate AG. This generally works pretty well:
sequential operations, like untarring an archive, close one file before
opening the next, so groups of files end up near each other with no
wasted space between them, while a file being created concurrently by
another task is allocated somewhere else on the disk.
> - a maximum extent size? For example to ensure that no extent
> larger than 256KiB is ever allocated? The purpose is to
> minimize internal fragmentation by allocating only at the
> lower levels of the buddy system.
I'm not sure what that would buy us.
> I hope that the rationales are fairly clear; part of that is to
> short circuit when possible the somewhat hairy ''hint'' related
> logic in 'jfs_dmap.c' and that in 'jfs_open()' for example.
I don't understand the problem with the "hint". The hints are used to
attempt to allocate file data near the inode, or to append onto
existing extents when the following blocks are available.
> While these are common strategies, I suspect that preallocation
> of one form or another is better as the above may impair locality.
I think preallocation may be useful in some circumstances, e.g. when a
file is created non-sequentially, but I am concerned that leaving
preallocated, but unused, blocks between actual file data will result in
more fragmentation, or just wasted space on the disk.
> I have also noticed 'XAD_NOTRECORDED' that seems to indicate
> that preallocation is indeed being done or at least anticipated.
This dates back to the roots of jfs in OS/2 where files were dense by
default, so non-sequential writes would cause ABNR (as I mentioned
above) extents to be created where data had not yet been written. It
would be possible to add support for dense files in Linux, but it hasn't
really been asked for.
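The sparse-by-default Linux behavior is easy to see from userspace: a non-sequential write just leaves a hole, where dense-file OS/2 jfs would have inserted ABNR extents to cover the unwritten range. A small demonstration (assumes a filesystem that supports sparse files, which is the normal case on Linux):

```python
# On Linux, files are sparse by default: writing one byte at a large
# offset leaves a hole that occupies no disk blocks.
import os
import tempfile

f = tempfile.TemporaryFile()
os.pwrite(f.fileno(), b"x", 8 * 1024 * 1024)  # 1 byte at offset 8 MiB
st = os.fstat(f.fileno())
logical = st.st_size            # 8 MiB + 1
physical = st.st_blocks * 512   # far smaller: the hole costs nothing
f.close()
```

A dense-file scheme would instead have to allocate (but not record) extents for the whole 8 MiB hole, which is precisely what the ABNR flag was for.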
--
David Kleikamp
IBM Linux Technology Center
_______________________________________________
Jfs-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/jfs-discussion