Hi,
On 04/22/2014 05:16 PM, chru...@suse.cz wrote:
> Hi!
>> I observe that mmap_11-4 fails in my x86 environment with:
>>
>> Test FAILED: Modification of the partial page at the end of an object is
>> written out
> I've seen some failures lately too but haven't had time to look into
> these yet.
>
>> I did some googling and found that you raised this problem in 2012:
>> http://linux-kernel.2935.n7.nabble.com/Partialy-mapped-page-stays-in-page-cache-after-unmap-td379857.html
>>
>> I'm interested what you meant in your patch for the mmap's manual page
>> by "In some cases, this could be fixed by calling msync(2) before the
>> unmap takes place;"
>>
>> What are these "some cases"?
> That depends on whether the mmaped file is backed by a disk based
> filesystem or not.
>
> Mapped pages are stored in a cache, so if you modify content of a page
> that is beyond the mapping but inside the last page the data stays there
> till they are written back to the disk and reloaded into the cache
> (which can be forced by the msync()). For memory backed filesystems
> (tmpfs etc) msync() is no-op because there is no permanent storage to
> write the data to, so the rest of partial page is never cleared.
I have also hit this failure on RHEL7U0RC and looked into the kernel code;
here is a possible reason.
When you call msync() on an ext4 file system, ext4_bio_write_page() is called
to write the page back. Here is the source code from RHEL7.0RC:
--------------------------------------------------------------------------------------------
int ext4_bio_write_page(struct ext4_io_submit *io, struct page *page, int len,
			struct writeback_control *wbc)
{
	struct inode *inode = page->mapping->host;
	unsigned block_start, blocksize;
	struct buffer_head *bh, *head;
	int ret = 0;
	int nr_submitted = 0;

	blocksize = 1 << inode->i_blkbits;

	BUG_ON(!PageLocked(page));
	BUG_ON(PageWriteback(page));

	set_page_writeback(page);
	ClearPageError(page);
	......
	bh = head = page_buffers(page);
	do {
		block_start = bh_offset(bh);
		if (block_start >= len) {
			/*
			 * Comments copied from block_write_full_page_endio:
			 *
			 * The page straddles i_size. It must be zeroed out on
			 * each and every writepage invocation because it may
			 * be mmapped. "A file is mapped in multiples of the
			 * page size. For a file that is not a multiple of
			 * the page size, the remaining memory is zeroed when
			 * mapped, and writes to that region are not written
			 * out to the file."
			 */
			zero_user_segment(page, block_start,
					  block_start + blocksize);
			clear_buffer_dirty(bh);
			set_buffer_uptodate(bh);
			continue;
		}
		......
	} while ((bh = bh->b_this_page) != head);
--------------------------------------------------------------------------------------------
I have omitted some irrelevant code ("......").
The variable len is computed by the following code:
	loff_t size = i_size_read(inode); // file's length
	if (index == size >> PAGE_CACHE_SHIFT)
		len = size & ~PAGE_CACHE_MASK;
	else
		len = PAGE_CACHE_SIZE;
That means len is the number of bytes of valid file data in the page: a full
page for every page except the one containing the end of the file.
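To make the len computation above concrete, here is a small user-space sketch
of it. It is not kernel code: the SKETCH_* constants and the helper name
valid_len_in_page() are made up for illustration, and a 4096-byte page is
assumed.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12u                       /* assumption: 4096-byte pages */
#define SKETCH_PAGE_SIZE  (1u << SKETCH_PAGE_SHIFT)

/*
 * Hypothetical helper mirroring the len computation quoted above:
 * number of bytes of page `index` that lie inside a file of `size` bytes.
 */
static unsigned valid_len_in_page(uint64_t size, uint64_t index)
{
	/* This sketch only handles pages that start at or before EOF. */
	assert(index <= (size >> SKETCH_PAGE_SHIFT));

	if (index == (size >> SKETCH_PAGE_SHIFT))
		/* Same value as size & ~PAGE_CACHE_MASK in the kernel code. */
		return (unsigned)(size & (SKETCH_PAGE_SIZE - 1));

	return SKETCH_PAGE_SIZE;
}
```

For a 2048-byte file, valid_len_in_page(2048, 0) gives 2048, which is the len
value used in the rest of this mail.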
When the ext4 file system's block size is 1024, there are 4 struct buffer_head
attached to this page. See the "do ... while" loop in ext4_bio_write_page()
above. "block_start = bh_offset(bh);" sets block_start to 0 for the first
buffer head, 1024 for the second, 2048 for the third and 3072 for the fourth.
In the reproducer program written by Cyril, len is 2048, so the
"if (block_start >= len)" condition is satisfied in the third and fourth
iterations and
"zero_user_segment(page, block_start, block_start + blocksize);" is called.
The content beyond the end of the file is zeroed, so the reproducer succeeds.
But when the ext4 file system's block size is 4096, there is only one buffer
head attached to this page. len is still 2048, and
"while ((bh = bh->b_this_page) != head);" makes the "do ... while" loop
execute only once. In that single iteration, "block_start = bh_offset(bh);"
sets block_start to 0, so "if (block_start >= len)" is not satisfied and
zero_user_segment() is not called. The content of the page beyond the end of
the file is not zeroed, so the reproducer fails.
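The difference between the two block sizes can be modelled in user space. The
following is only a sketch of the loop's logic, not the kernel code: the page
is a plain byte array, memset() stands in for zero_user_segment(), and the
names zero_blocks_past_len() and stale_tail_bytes() are made up for this
example. A 4096-byte page is assumed.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumption: 4096-byte pages */

/*
 * Model of the per-buffer loop: a block is zeroed only when it starts at or
 * beyond `len`. Returns the number of blocks that were zeroed.
 */
static unsigned zero_blocks_past_len(unsigned char *page, unsigned blocksize,
				     unsigned len)
{
	unsigned zeroed = 0;

	assert(blocksize > 0 && SKETCH_PAGE_SIZE % blocksize == 0);

	for (unsigned block_start = 0; block_start < SKETCH_PAGE_SIZE;
	     block_start += blocksize) {
		if (block_start >= len) {
			/* stands in for zero_user_segment() */
			memset(page + block_start, 0, blocksize);
			zeroed++;
		}
	}
	return zeroed;
}

/* Count the non-zero bytes left in the tail [len, page size) of the page. */
static unsigned stale_tail_bytes(const unsigned char *page, unsigned len)
{
	unsigned n = 0;

	for (unsigned i = len; i < SKETCH_PAGE_SIZE; i++)
		n += page[i] != 0;
	return n;
}
```

With a page filled with non-zero bytes and len = 2048, a block size of 1024
zeroes two blocks and leaves no stale bytes in the tail, while a block size of
4096 zeroes nothing and leaves all 2048 tail bytes in place.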
I haven't checked the upstream kernel code yet. It seems that this is an ext4
bug. In RHEL6.5GA, block_write_full_page() is called to do work similar to
ext4_bio_write_page(); that function does not do the zeroing in units of
struct buffer_head, so the bug does not exist there.
To: Stanislav Kholmanskikh
Could you please check the ext4 file system's block size when you run the
tests, and compare the results for block sizes 1024 and 4096? Thanks.
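One possible way to check the block size from a test program is statvfs(3).
This is only a sketch: fs_block_size() is a made-up helper name, and I am
assuming that f_bsize reports the ext4 block size on Linux; the output of
"dumpe2fs -h" on the device is the more authoritative source.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/statvfs.h>

/*
 * Hypothetical helper: returns the block size that statvfs() reports for the
 * file system containing `path`, or 0 on error.
 */
static unsigned long fs_block_size(const char *path)
{
	struct statvfs sv;

	assert(path != NULL);

	if (statvfs(path, &sv) != 0)
		return 0;

	return sv.f_bsize;
}
```

Calling it on the directory where the test creates its file (for example
/tmp) should show whether the run was on a 1024 or a 4096 byte block file
system.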
Regards,
Xiaoguang Wang
>
>> I took your test program from the man thread and executed it on KVM and
>> on an Sun Ultra 45.
>>
>> Here are the results:
>>
>> 1. Debian 7 + KVM + 3.2.0-4-amd64
>> * without msync() it fails
>> * with msync() it fails
>>
>> 2. Debian 7 + Ultra + 3.2.0-4-sparc64-smp
>> * without msync() it fails
>> * with msync() it passes (!!!)
>>
>> So it looks like msync() on x86 doesn't help us. The question here is
>> whether it's a linux+x86 bug or not...
> Is the /tmp/ filesystem the same?
>
>> Could you check your test program and mmap_11-4 in your environment?
> Added to my TODO but I'm not sure if I can get to this till the end of
> the week.
>
_______________________________________________
Ltp-list mailing list
Ltp-list@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ltp-list