On 2021/2/18 9:39 PM, Matthew Wilcox wrote:
On Thu, Feb 18, 2021 at 08:42:14PM +0800, Qu Wenruo wrote:
On 2021/2/18 8:15 PM, Matthew Wilcox wrote:
Yes, this is a known limitation. Some vendors have gone to the trouble
of introducing a new page_index_t. I'm not convinced this is a problem
worth solving. There are very few 32-bit systems with this much storage
on a single partition (everything should work fine if you take a 20TB
drive and partition it into two 10TB partitions).
What would happen if a user just tries to write 4K at file offset 16T
for a sparse file?
Would it be blocked by other checks before reaching the underlying fs?
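The kernel's generic write path (generic_write_checks() in mm/filemap.c) gives one rough way to reason about this question. It refuses writes that start at or past the superblock's s_maxbytes with -EFBIG, and it shortens writes that straddle that limit. The sketch below is a simplified user-space model of only that s_maxbytes clamp. It ignores RLIMIT_FSIZE and the non-O_LARGEFILE case, and it assumes the filesystem has set s_maxbytes no higher than the 32-bit MAX_LFS_FILESIZE. sketch_write_check() is a hypothetical name, not a kernel function.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical, simplified model of the write-time size check.
 * pos and *count are byte offsets/lengths (loff_t is a signed 64-bit type).
 * Returns -EFBIG if the write starts at or beyond maxbytes; otherwise
 * shortens *count so the write ends at maxbytes at the latest. */
static int sketch_write_check(int64_t pos, int64_t *count, int64_t maxbytes)
{
    if (pos >= maxbytes)
        return -EFBIG;
    if (*count > maxbytes - pos)
        *count = maxbytes - pos;
    return 0;
}
```

Under these assumptions, a 4K write at offset 16 TiB (2^44) on a 32-bit kernel would be rejected before it reaches the filesystem's own write code.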
/* Page cache limit. The filesystems should put that into their s_maxbytes
   limits, otherwise bad things can happen in VM. */
#if BITS_PER_LONG==32
#define MAX_LFS_FILESIZE ((loff_t)ULONG_MAX << PAGE_SHIFT)
#elif BITS_PER_LONG==64
#define MAX_LFS_FILESIZE ((loff_t)LLONG_MAX)
#endif
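The sketch below makes the two branches of that definition concrete. It evaluates both with fixed-width types on any host, assuming 4 KiB pages, and the function names are illustrative only. The 32-bit limit comes out one page short of 16 TiB because ULONG_MAX is 2^32 - 1 there.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* loff_t is a 64-bit long long, which is what lets the 32-bit value exceed 4 GiB. */
static_assert(sizeof(long long) == 8, "loff_t is 64-bit");

/* 32-bit build: ULONG_MAX == 2^32 - 1, so the limit is
 * (2^32 - 1) << 12 = 2^44 - 4096 bytes. */
static long long max_lfs_filesize_32(void)
{
    return (long long)UINT32_MAX << SKETCH_PAGE_SHIFT;
}

/* 64-bit build: the whole positive loff_t range. */
static long long max_lfs_filesize_64(void)
{
    return LLONG_MAX;
}
```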
This is especially true for btrfs, which has its own internal address space
(and it can be any aligned u64 value).
Even a 1T btrfs can have its metadata at an internal bytenr far larger
than 1T (although those ranges still need to be mapped inside the device).
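The following shows why a large internal bytenr matters even on a small device. It assumes the metadata page index is derived as bytenr >> PAGE_SHIFT with 4 KiB pages. Once that index is squeezed into a 32-bit unsigned long, a bytenr at 17 TiB maps to the same index as one at 1 TiB. The helpers are hypothetical and are not btrfs code.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* A 32-bit index with 4 KiB pages only spans 2^44 bytes (16 TiB). */
static_assert(32 + SKETCH_PAGE_SHIFT == 44, "32-bit index spans 16 TiB");

/* Page index as a 32-bit pgoff_t would hold it: the cast silently
 * drops the high bits of any bytenr at or above 2^44. */
static uint32_t index_32bit(uint64_t bytenr)
{
    return (uint32_t)(bytenr >> SKETCH_PAGE_SHIFT);
}

/* Same computation with a 64-bit index: no truncation. */
static uint64_t index_64bit(uint64_t bytenr)
{
    return bytenr >> SKETCH_PAGE_SHIFT;
}
```

With a 32-bit index, two unrelated metadata ranges 16 TiB apart would alias onto the same page-cache slot.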
Sounds like btrfs has a problem to fix.
You're kind of right. Btrfs uses an inode to organize all of its
metadata as a file, but that doesn't take the limit into consideration.
However, fixing it will bring tons of new problems.
We will have cases where the initial fs is within the limit, but when the
user does something like a balance, it may go beyond the limit.
And when such a problem happens, users won't be happy anyway.
And considering the reporter is already using a 32-bit system with 10T+
of storage, I doubt it's really not worth solving.
BTW, what would be the extra cost of converting page::index to u64?
I know tons of printk() calls would trigger warnings, but most 64-bit
systems should not be affected anyway.
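The printk() warnings come from a type mismatch: a format specifier has to match its argument, and "%lu" only matches a u64 where unsigned long is 64 bits wide. The usual idiom for u64, which the kernel defines as unsigned long long on all architectures, is "%llu", with a cast when the typedef behind the value might change. The user-space sketch below uses snprintf() as a stand-in for printk(). sketch_pgoff_t and format_index() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical 64-bit page index (today's pgoff_t is unsigned long). */
typedef uint64_t sketch_pgoff_t;

/* The cast below must never narrow the value. */
static_assert(sizeof(unsigned long long) >= sizeof(sketch_pgoff_t),
              "cast to unsigned long long must not truncate");

/* "%lu" would be wrong on 32-bit builds (and -Wformat would flag it);
 * casting to unsigned long long and using "%llu" is correct everywhere. */
static int format_index(char *buf, size_t len, sketch_pgoff_t index)
{
    return snprintf(buf, len, "index=%llu", (unsigned long long)index);
}
```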
No effect for 64-bit systems, other than the churn.
For 32-bit systems, it'd have some pretty horrible overhead. You don't
just have to touch the page cache, you have to convert the XArray.
It's doable (I mean, it's been done), but it's very costly for all the
32-bit systems which don't use a humongous filesystem. And we could
minimise that overhead with a typedef, but then the source code gets
harder to work with.
So does this mean 32-bit archs are already second-tier targets, at least
for the upstream Linux kernel?
Or would it be possible to make a u64 index a config option?
That way people who really want large file support can enable it, while
most other 32-bit users can just keep the existing behavior.
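The typedef approach and the opt-in option could be combined roughly as shown below. This is only a sketch, and it rests on several assumptions. SKETCH_CONFIG_LARGE_PGOFF is a hypothetical build switch, not an existing Kconfig symbol. page_index_t borrows the name the vendors mentioned earlier in the thread used, but its definition here is assumed. The code also assumes 4 KiB pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical build switch; define it to opt in to a 64-bit index. */
/* #define SKETCH_CONFIG_LARGE_PGOFF 1 */

#ifdef SKETCH_CONFIG_LARGE_PGOFF
typedef uint64_t page_index_t;      /* opt-in: 64-bit index even on 32-bit arches */
#else
typedef unsigned long page_index_t; /* default: word-sized, today's behaviour */
#endif

/* Never narrower than today's unsigned long pgoff_t. */
static_assert(sizeof(page_index_t) >= sizeof(unsigned long),
              "page_index_t must not shrink the index");

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Convert a byte offset to a page index of the configured width. */
static page_index_t offset_to_index(uint64_t pos)
{
    return (page_index_t)(pos >> SKETCH_PAGE_SHIFT);
}

/* True if the configured index type can represent the page holding pos. */
static bool index_representable(uint64_t pos)
{
    return (uint64_t)offset_to_index(pos) == (pos >> SKETCH_PAGE_SHIFT);
}
```

With the switch off, a 32-bit build keeps its word-sized index and its current cost, and offsets at or above 16 TiB are not representable. With the switch on, those offsets become representable, at the price of 64-bit index arithmetic everywhere page_index_t is used.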