----- Original Message -----
| Hi,
|
| On Wed, 2012-07-11 at 14:07 -0400, Bob Peterson wrote:
| [snip]
| >
| > | What is the difference between rs_free and rs_blks? Shouldn't
| > | these two always be identical, since there is no point in
| > | reserving blocks which are not free.
| >
| > I guess I used a misleading variable name. The two variables have
| > two meanings and both are needed. I renamed variable rs_blks to
| > rs_len because it represents the length of the reservation.
| >
| Thats not really answering the question though... all the blocks in
| the reservation must be free, otherwise there is no point in
| reserving them. So rs_free should be identical to rs_len or whatever
| it is called. Either that or maybe I'm not understanding why there
| are two different variables?

The variables and their meanings are as follows:

1. rs_start - this is where the block reservation originally started.
   This never changes during the life of the reservation.
2. rs_len - this is the length of the reservation, in blocks.
   This never changes during the life of the reservation.
3. rs_free - this is how many of those blocks are free.
   This is decremented every time a block is claimed from the
   reservation.

So the number of blocks "used" or "claimed" from the reservation is
rs_len - rs_free.

We could likely accomplish the same thing with only two variables,
by bumping rs_start and decrementing rs_len every time a block is
claimed from the reservation, but I'm also using rs_start as a means
of keeping the reservations aligned in the bitmap. I think keeping the
reservations on u64 boundaries gives us the best performance for
function memchr_inv, which I think is optimized to use word compares
where it can.

Doing it my way also makes it easier to read the trace points: You can
see where the reservation started, what's been claimed and what's free.
It's easier to detect problems with overlapping reservations and such.

| [snip]
| > |
| > | I'm not sure that I understand this comment at all. Currently
| > | with directories we never deallocate any blocks at all until the
| > | directory is deallocated when it is unlinked. We will want to
| > | extend this to directories eventually, even if we don't do that
| > | immediately.
| >
| > I clarified the comment to make it more clear what's going on.
| > I'm talking about gaps in the _reservation_ not gaps in the blocks.
| > The current algorithm makes assumptions based on the fact that
| > block reservations don't have gaps, and the "next" free block will
| > be the successor to the last claimed. If you use reservations for
| > directories, what can happen is that two files may be created,
| > which claims two blocks in the reservation.
| > If the first file is deleted from the directory, that block
| > becomes a "hole" in the reservation, which breaks the code with
| > its current assumptions. We either have to:
| > (a) keep the current assumptions which make block claims faster, or
| > (b) Make no such assumptions and implement a bitmap-like search of
| >     the reservation that can fill holes. It wouldn't be too tough
| >     to do, especially since we already have nicely tuned functions
| >     to do it.
| > I'm just worried that it's going to hurt performance.
| >
| That is just a bug in the way we are doing the allocations. The
| allocation of new inodes should be done based on the inode's own
| reservation, and not on the reservation of its parent directory. That
| is something else on the "to fix" list, but it is complicated to do,

No, it goes beyond that. It has to do with the way the block accounting
is done for the reservations. If a big write is trying to write 7
blocks (let's say in a multi-page write, which isn't implemented yet,
but similar things happen today) and rs_free says there are 7 free
blocks in the reservation, it claims them all starting with
rs_start + (rs_len - rs_free). If there are "holes" in the reservation,
that would throw the whole thing off. It would have to do a bitmap
search of the reservation to figure out where the first available
block is. If there are 7 free blocks, but one of them is a "hole" and
six are contiguous, it has to do a bitmap search of the reservation to
find each of the 7.

On the other hand, if we allow holes and adjust the algorithm
appropriately, I think the file system will end up being more
fragmented than with the current algorithm. This is written with the
thought that files will have larger runs of data blocks and metadata
blocks can fill in the holes left behind.
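
To make the accounting above concrete, here is a rough sketch in plain
C. It is not the code from the patch: struct resv and the resv_*
helpers are made-up names for illustration, and only rs_start, rs_len
and rs_free come from the discussion. It assumes the reservation has no
holes, which is exactly the assumption a freed block in the middle
would break.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a reservation; only the three fields
 * discussed above.  Not the real GFS2 structure. */
struct resv {
	uint64_t rs_start;	/* first block of the reservation; never changes */
	uint32_t rs_len;	/* length in blocks; never changes */
	uint32_t rs_free;	/* blocks not yet claimed; decremented per claim */
};

/* Number of blocks already claimed: rs_len - rs_free. */
static uint32_t resv_claimed(const struct resv *rs)
{
	return rs->rs_len - rs->rs_free;
}

/* Next block to hand out, assuming no holes: the successor of the
 * last block claimed. */
static uint64_t resv_next(const struct resv *rs)
{
	return rs->rs_start + resv_claimed(rs);
}

/* Claim up to n contiguous blocks.  Returns the first block claimed
 * and stores the number actually claimed in *got. */
static uint64_t resv_claim(struct resv *rs, uint32_t n, uint32_t *got)
{
	uint64_t first;

	/* rs_free can never exceed the reservation length. */
	assert(rs->rs_free <= rs->rs_len);

	first = resv_next(rs);
	*got = n < rs->rs_free ? n : rs->rs_free;
	rs->rs_free -= *got;
	return first;
}
```

With rs_start = 1024 and rs_len = 16, claiming 2 blocks and then 7
hands out 1024-1025 and 1026-1032 with no searching at all. If block
1024 were freed afterwards, rs_free alone could not describe it: the
next claim would still start at 1033, so the hole could only be found
by searching the bitmap.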

The other approach that I talked about above (incrementing the starting
block and decrementing the size) would solve this problem, but the
deleted file would force a block to be "left behind" for a future
reservation to find, which would likely add to the fragmentation of the
file system. I could be wrong about that, and we could prototype it to
find out for sure. (IOW, it may not be any worse, since we're talking
about directories which bypass the reservations and do individual
searches anyway).

Regards,

Bob Peterson
Red Hat File Systems