On Saturday, December 30, 2000 06:28:39 PM -0800 Linus Torvalds
<[EMAIL PROTECTED]> wrote:
> There are only two real advantages to deferred writing:
>
> - not having to do get_block() at all for temp-files, as we never have to
> do the allocation if we end up removing the file.
>
>
Hi,
On Tue, 2 Jan 2001, Alexander Viro wrote:
> Depends on a filesystem. Generally you don't want asynchronous operations
> to grab semaphores shared with something else. kswapd knows to skip the locked
> pages, but that's it - if writepage() is going to block on a semaphore you
> will not know
On Tue, 2 Jan 2001, Roman Zippel wrote:
> Block allocation is not my problem right now (and even directory handling
> is not that difficult), but I will post something about this on fsdevel
> later.
> But one question is still open, I'd really like an answer for:
> Is it possible to use a
Hi,
On Mon, 1 Jan 2001, Alexander Viro wrote:
> But... But with AFFS you _have_ exclusion between block-allocation and
> truncate(). It has no sparse files, so pageout will never allocate
> anything. I.e. all allocations come from write(2). And both write(2) and
> truncate(2) hold i_sem.
>
>
Alexander Viro wrote:
> GFP_BUFFER _may_ become an issue if we move bitmaps into pagecache.
> Then we'll need a per-address_space gfp_mask. Again, patches exist
> and had been tested (not right now - I didn't port them to current
> tree yet). Bugger if I remember whether they were posted or not -
On Mon, 1 Jan 2001, Roman Zippel wrote:
> The other reason for the question is that I'm currently reworking the block
> handling in affs, especially the extended block handling, where I'm
> implementing a new extended block cache, where I would pretty much prefer
> to use a semaphore to protect it.
Hi,
On Sun, 31 Dec 2000, Alexander Viro wrote:
> Reread the original thread. GFP_BUFFER protects us from buffer cache
> allocations triggering pageouts. It has nothing to do with the deadlock scenario
> that would come from grabbing ->i_sem on pageout.
I don't want to grab i_sem. It was a very, very early
On Mon, 1 Jan 2001, Roman Zippel wrote:
> I just rechecked that, but I don't see a superblock lock here; it uses
> the kernel_lock instead. Although Al could give the definitive answer for
> this, he wrote it. :)
No superblock lock in get_block() proper. Tons of it in the dungheap called
Hi,
On Sun, 31 Dec 2000, Linus Torvalds wrote:
> 	cached_allocation = NULL;
>
> repeat:
> 	spin_lock();
> 	result = try_to_find_existing();
> 	if (!result) {
> 		if (!cached_allocation) {
>
On Sun, 31 Dec 2000, Roman Zippel wrote:
>
> On Sun, 31 Dec 2000, Linus Torvalds wrote:
>
> > Let me repeat myself one more time:
> >
> > I do not believe that "get_block()" is as big of a problem as people make
> > it out to be.
>
> The real problem is that get_block() doesn't scale and
Hi,
On Sun, 31 Dec 2000, Linus Torvalds wrote:
> Let me repeat myself one more time:
>
> I do not believe that "get_block()" is as big of a problem as people make
> it out to be.
The real problem is that get_block() doesn't scale and it's very hard to
do. A recursive per inode-semaphore
On Sun, 31 Dec 2000, Daniel Phillips wrote:
> Linus Torvalds wrote:
> > I do not believe that "get_block()" is as big of a problem as people make
> > it out to be.
>
> I didn't mention get_block - disk accesses obviously far outweigh
> filesystem cpu/cache usage in overall impact. The
Linus Torvalds wrote:
> I do not believe that "get_block()" is as big of a problem as people make
> it out to be.
I didn't mention get_block - disk accesses obviously far outweigh
filesystem cpu/cache usage in overall impact. The question is, what
happens to disk access patterns when we do
On Sun, 31 Dec 2000, Daniel Phillips wrote:
>
> It's not that hard or inefficient to return the ENOSPC from the usual
> point. For example, make a gross overestimate of the space needed for
> the write, compare to a cached filesystem free space value less the
> amount deferred so far, and
Linus Torvalds wrote:
> There are only two real advantages to deferred writing:
>
> - not having to do get_block() at all for temp-files, as we never have to
> do the allocation if we end up removing the file.
>
> NOTE NOTE NOTE! The overhead for trying to get ENOSPC and quota errors
> right
On Sun, 31 Dec 2000, Alexander Viro wrote:
>
> On Sun, 31 Dec 2000, Linus Torvalds wrote:
>
> > The other thing is that one of the common cases for writing is consecutive
> > writing to the end of the file. Now, you figure it out: if get_block()
> > really is a bottle-neck, why not cache the
On Sun, 31 Dec 2000, Linus Torvalds wrote:
> The other thing is that one of the common cases for writing is consecutive
> writing to the end of the file. Now, you figure it out: if get_block()
> really is a bottle-neck, why not cache the last tree lookup? You'd get a
> 99% hitrate for that
On Sun, Dec 31, 2000 at 08:33:01AM -0800, Linus Torvalds wrote:
> By doing a better job of caching stuff.
Caching can happen after we have been slow and waited for I/O synchronously
the first time (bread).
How can we optimize the first time (when the indirect blocks are out of buffer
cache)
On Sun, 31 Dec 2000, Andrea Arcangeli wrote:
>
> get_block for large files can be improved using extents, but how can we
> implement a fast get_block without restructuring the on-disk format of the
> filesystem? (in turn using another filesystem instead of ext2?)
By doing a better job of caching stuff.
On Sat, Dec 30, 2000 at 06:28:39PM -0800, Linus Torvalds wrote:
> There are only two real advantages to deferred writing:
>
> - not having to do get_block() at all for temp-files, as we never have to
> do the allocation if we end up removing the file.
>
> NOTE NOTE NOTE! The overhead for
Hi,
On Sat, 30 Dec 2000, Linus Torvalds wrote:
> In fact, in a properly designed filesystem just a bit of low-level caching
> would easily make the average "get_block()" be very fast indeed. The fact
> that right now ext2 has not been optimized for this is _not_ a reason to
> design the VFS
On Sat, Dec 30, 2000 at 08:50:52PM -0500, Alexander Viro wrote:
> And its meaning for 2/3 of filesystems would be?
It should stay in the private part of the in-core superblock of course.
> I _doubt_ it. If it is a pagecache issue it should apply to NFS. It should
> apply to ramfs. It should
On Sun, 31 Dec 2000, Roman Zippel wrote:
>
> On Sun, 31 Dec 2000, Andrea Arcangeli wrote:
>
> > > estimate than just the data blocks it should not be hard to add an
> > > extra callback to the filesystem.
> >
> > Yes, I was thinking at this callback too. Such a callback is nearly the only
Hi,
On Sun, 31 Dec 2000, Andrea Arcangeli wrote:
> > estimate than just the data blocks it should not be hard to add an
> > extra callback to the filesystem.
>
> Yes, I was thinking of this callback too. Such a callback is nearly the only
> support we need from the filesystem to provide
On Sun, 31 Dec 2000, Andrea Arcangeli wrote:
> On Sat, Dec 30, 2000 at 03:00:43PM -0700, Eric W. Biederman wrote:
> > To get ENOSPC handling 99% correct all we need to do is decrement a counter,
> > that remembers how many disk blocks are free. If we need a better
>
> Yes, we need to add
Linus Torvalds <[EMAIL PROTECTED]> writes:
> On 30 Dec 2000, Eric W. Biederman wrote:
> >
> > One other thing to think about for the VFS/MM layer is limiting the
> > total number of dirty pages in the system (to what disk pressure shows
> > the disk can handle), to keep system performance smooth when swapping.
On Sat, Dec 30, 2000 at 03:00:43PM -0700, Eric W. Biederman wrote:
> To get ENOSPC handling 99% correct all we need to do is decrement a counter,
> that remembers how many disk blocks are free. If we need a better
Yes, we need to add one field to the in-core superblock to do this accounting.
On Sat, 30 Dec 2000, Alexander Viro wrote:
> Well, see above. I'm pretty nervous about breaking the ordering of metadata
> allocation. For pageout() we don't have such ordering. For write() we
> certainly do. Notice that reserving disk space upon write() and eating it
> later is a _very_ messy job
On 30 Dec 2000, Eric W. Biederman wrote:
>
> One other thing to think about for the VFS/MM layer is limiting the
> total number of dirty pages in the system (to what disk pressure shows
> the disk can handle), to keep system performance smooth when swapping.
This is a separate issue, and I
Linus Torvalds <[EMAIL PROTECTED]> writes:
> In short, I don't see _those_ kinds of issues. I do see error reporting as
> a major issue, though. If we need to do proper low-level block allocation
> in order to get correct ENOSPC handling, then the win from doing deferred
> writes is not very
On Sat, 30 Dec 2000, Linus Torvalds wrote:
>
>
> On Sat, 30 Dec 2000, Alexander Viro wrote:
> >
> > Except that we've got file-expanding writes outside of ->i_sem. Thanks, but
> > no thanks.
>
> No, Al, the file size is still updated inside i_sem.
Then we are screwed. Look: we call write(). Twice.
Linus writes:
> In short, I don't see _those_ kinds of issues. I do see error reporting as
> a major issue, though. If we need to do proper low-level block allocation
> in order to get correct ENOSPC handling, then the win from doing deferred
> writes is not very big.
It should be relatively
On Sat, 30 Dec 2000, Alexander Viro wrote:
>
> Except that we've got file-expanding writes outside of ->i_sem. Thanks, but
> no thanks.
No, Al, the file size is still updated inside i_sem.
Yes, it will do actual block allocation outside i_sem, but that is already
true of any mmap'ed writes,
On Sat, 30 Dec 2000, Daniel Phillips wrote:
> When I saw you put in the if (PageDirty) -->writepage and related code
> over the last couple of weeks I was wondering if you realize how close
> we are to having generic deferred file writing in the VFS. I took some
> time today to code this
On Sat, 30 Dec 2000, Daniel Phillips wrote:
>
> When I saw you put in the if (PageDirty) -->writepage and related code
> over the last couple of weeks I was wondering if you realize how close
> we are to having generic deferred file writing in the VFS.
I'm very aware of it indeed.
However,
When I saw you put in the if (PageDirty) -->writepage and related code
over the last couple of weeks I was wondering if you realize how close
we are to having generic deferred file writing in the VFS. I took some
time today to code this little hack and it comes awfully close to doing
the job.