Hi,
On 05/06/2014 05:24 PM, Xiaoguang Wang wrote:
>>> I did some googling and found that you raised this problem in 2012:
>>> http://linux-kernel.2935.n7.nabble.com/Partialy-mapped-page-stays-in-page-cache-after-unmap-td379857.html
>>>
>>> I'm interested what you meant in your patch for the mmap's manual page
>>> by "In some cases, this could be fixed by calling msync(2) before the
>>> unmap takes place;"
>>>
>>> What are these "some cases"?
>> That depends on whether the mmapped file is backed by a disk-based
>> filesystem or not.
>>
>> Mapped pages are stored in the page cache, so if you modify the part of
>> the last page that lies beyond the mapping, the data stays there until
>> the page is written back to disk and reloaded into the cache (which can
>> be forced with msync()). For memory-backed filesystems (tmpfs etc.)
>> msync() is a no-op because there is no permanent storage to write the
>> data to, so the rest of the partial page is never cleared.
>
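For reference, the behaviour described above can be probed with a small program. This is only a minimal sketch and not Cyril's original test program: the helper name stale_tail_bytes() and the path passed to it are made up for illustration. It creates a file shorter than one page, dirties the bytes between EOF and the end of the page through a shared mapping, optionally calls msync(), unmaps, maps the file again and counts how many bytes past EOF are still non-zero.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only.
 * Returns the number of non-zero bytes found past EOF after remapping,
 * or -1 on error. The result depends on the filesystem and kernel.
 */
long stale_tail_bytes(const char *path, size_t file_len, int use_msync)
{
    long page = sysconf(_SC_PAGESIZE);
    long stale = -1;
    char *p;
    int fd;

    /* The file must end inside the first page for the probe to make sense. */
    assert(page > 0 && file_len > 0 && file_len < (size_t)page);

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)file_len) != 0)
        goto out;

    /* First mapping: scribble over the part of the page beyond EOF. */
    p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        goto out;
    memset(p + file_len, 'b', (size_t)page - file_len);
    if (use_msync)
        msync(p, (size_t)page, MS_SYNC);
    munmap(p, (size_t)page);

    /* Second mapping: count what is still visible beyond EOF. */
    p = mmap(NULL, (size_t)page, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        goto out;
    stale = 0;
    for (size_t i = file_len; i < (size_t)page; i++)
        if (p[i] != 0)
            stale++;
    munmap(p, (size_t)page);
out:
    close(fd);
    unlink(path);
    return stale;
}
```

Calling it once with use_msync set to 0 and once with 1, on a file in the filesystem under test, shows whether msync() makes a difference there. A return value of 0 means the tail was cleared.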
I'd like to summarize this discussion.
The ext4 community has confirmed that this failure is caused by an ext4 bug;
please see the discussion in this thread:
http://www.spinics.net/lists/linux-ext4/msg43560.html
The bug has been fixed by Jan Kara; please see this commit:
http://git.kernel.org/cgit/linux/kernel/git/tytso/ext4.git/commit/?h=dev&id=ce734add9a545cbe23584b20b6bb8ac3c2f53b34
I have also tested this patch and it works: mmap_11-4 now passes.
Note: please make sure you run the test on an ext4 file system; sometimes
/tmp is a tmpfs.
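To check which kind of file system a test directory lives on, something like the following could be used. It is a minimal sketch for Linux: the enum and function names are made up for illustration, and the magic numbers are the ones documented in statfs(2). Note that ext2, ext3 and ext4 all report the same magic, so this only tells "ext family" apart from tmpfs; it cannot confirm ext4 specifically.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/vfs.h>

/* Magic numbers as documented in statfs(2). */
#define DEMO_EXT_SUPER_MAGIC 0xEF53L      /* shared by ext2, ext3 and ext4 */
#define DEMO_TMPFS_MAGIC     0x01021994L

/* Hypothetical classification, for illustration only. */
enum fs_kind { FS_KIND_UNKNOWN, FS_KIND_EXT, FS_KIND_TMPFS, FS_KIND_OTHER };

/* Map a statfs f_type value to one of the kinds above. */
enum fs_kind fs_kind_from_magic(long magic)
{
    if (magic == DEMO_EXT_SUPER_MAGIC)
        return FS_KIND_EXT;
    if (magic == DEMO_TMPFS_MAGIC)
        return FS_KIND_TMPFS;
    return FS_KIND_OTHER;
}

/* Classify the file system holding `path`; FS_KIND_UNKNOWN if statfs() fails. */
enum fs_kind fs_kind_of_path(const char *path)
{
    struct statfs sb;

    assert(path != NULL);
    if (statfs(path, &sb) != 0)
        return FS_KIND_UNKNOWN;
    return fs_kind_from_magic((long)sb.f_type);
}
```

For example, fs_kind_of_path("/tmp") returning FS_KIND_TMPFS would mean the test files should be placed somewhere else.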
Regards,
Xiaoguang Wang
> I have also hit this failure in RHEL7U0RC and looked into the kernel code;
> here is the likely reason.
> When you call msync() on an ext4 file system, ext4_bio_write_page() is
> called to write the page back. Here is the source code from RHEL7.0RC:
> --------------------------------------------------------------------------------------------
> int ext4_bio_write_page(struct ext4_io_submit *io, struct page *page,
>                         int len, struct writeback_control *wbc)
> {
>     struct inode *inode = page->mapping->host;
>     unsigned block_start, blocksize;
>     struct buffer_head *bh, *head;
>     int ret = 0;
>     int nr_submitted = 0;
>
>     blocksize = 1 << inode->i_blkbits;
>
>     BUG_ON(!PageLocked(page));
>     BUG_ON(PageWriteback(page));
>
>     set_page_writeback(page);
>     ClearPageError(page);
>
>     ......
>
>     bh = head = page_buffers(page);
>     do {
>         block_start = bh_offset(bh);
>         if (block_start >= len) {
>             /*
>              * Comments copied from block_write_full_page_endio:
>              *
>              * The page straddles i_size. It must be zeroed out on
>              * each and every writepage invocation because it may
>              * be mmapped. "A file is mapped in multiples of the
>              * page size. For a file that is not a multiple of
>              * the page size, the remaining memory is zeroed when
>              * mapped, and writes to that region are not written
>              * out to the file."
>              */
>             zero_user_segment(page, block_start,
>                               block_start + blocksize);
>             clear_buffer_dirty(bh);
>             set_buffer_uptodate(bh);
>             continue;
>         }
>         ......
>     } while ((bh = bh->b_this_page) != head);
> --------------------------------------------------------------------------------------------
> I deleted some irrelevant code.
>
> The variable len is computed by the following code:
>     loff_t size = i_size_read(inode); // file's length
>     if (index == size >> PAGE_CACHE_SHIFT)
>         len = size & ~PAGE_CACHE_MASK;
>     else
>         len = PAGE_CACHE_SIZE;
>
> That means len is the number of valid file bytes within the page: the
> full page size for every page except the one containing the end of file.
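The len computation can be tried out in user space. The sketch below mirrors the quoted snippet under the assumption of 4096-byte pages; the DEMO_* macros and the function name are stand-ins invented here, not kernel identifiers.

```c
#include <assert.h>

/* Stand-ins for the kernel's PAGE_CACHE_* macros, assuming 4096-byte pages. */
#define DEMO_PAGE_SHIFT 12
#define DEMO_PAGE_SIZE  (1UL << DEMO_PAGE_SHIFT)
#define DEMO_PAGE_MASK  (~(DEMO_PAGE_SIZE - 1))

/*
 * Number of valid file bytes in page number `index` of a file that is
 * `size` bytes long. Only meaningful for pages up to and including the
 * one that contains the end of file.
 */
unsigned long valid_len_in_page(unsigned long long size, unsigned long index)
{
    assert(index <= (size >> DEMO_PAGE_SHIFT));

    if (index == (size >> DEMO_PAGE_SHIFT))
        return (unsigned long)(size & ~DEMO_PAGE_MASK); /* tail page */
    return DEMO_PAGE_SIZE;                              /* full page */
}
```

For a 2048-byte file, valid_len_in_page(2048, 0) gives 2048. For a 5000-byte file, page 0 gives 4096 and page 1 gives 904.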
>
> When the ext4 file system's block size is 1024, there are four struct
> buffer_head attached to this page (with a 4096-byte page). See the
> "do ... while" loop in ext4_bio_write_page() above.
>
> "block_start = bh_offset(bh);" sets block_start to 0 for the first
> buffer head, 1024 for the second, 2048 for the third and 3072 for the
> fourth.
>
> In the reproducer written by Cyril, len is 2048, so the
> "if (block_start >= len)" condition is satisfied in the third and fourth
> iterations and
> "zero_user_segment(page, block_start, block_start + blocksize);" is
> called. The content beyond the end of the file is zeroed, so the
> reproducer succeeds.
>
> But when the ext4 file system's block size is 4096, there is only one
> buffer head attached to this page. len is still 2048, and the
> "while ((bh = bh->b_this_page) != head);" condition makes the
> "do ... while" loop execute only once. In that single iteration
> "block_start = bh_offset(bh);" sets block_start to 0, so
> "if (block_start >= len)" is not satisfied and zero_user_segment() is
> not called. The content of the page beyond the end of the file is
> therefore not zeroed, and the reproducer fails.
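The two cases above can be replayed with a small user-space model of the loop. This is a simplified sketch, not kernel code: a plain byte array stands in for the page, memset() stands in for zero_user_segment(), a 4096-byte page is assumed, and the function name is invented for illustration.

```c
#include <assert.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096u   /* assumed page size */

/*
 * Model of the quoted loop: walk the page in blocksize steps and zero
 * only the blocks that start at or after len. Returns how many bytes
 * past len are still non-zero afterwards.
 */
unsigned stale_after_per_block_zeroing(unsigned blocksize, unsigned len)
{
    unsigned char page[DEMO_PAGE_SIZE];
    unsigned stale = 0;

    assert(blocksize > 0 && DEMO_PAGE_SIZE % blocksize == 0);
    assert(len <= DEMO_PAGE_SIZE);

    /* Pretend the whole page, including the tail, was dirtied via mmap. */
    memset(page, 0xAA, sizeof(page));

    for (unsigned block_start = 0; block_start < DEMO_PAGE_SIZE;
         block_start += blocksize) {
        if (block_start >= len)
            memset(page + block_start, 0, blocksize);
    }

    for (unsigned i = len; i < DEMO_PAGE_SIZE; i++)
        if (page[i] != 0)
            stale++;
    return stale;
}
```

With len 2048, a block size of 1024 leaves 0 stale bytes while a block size of 4096 leaves 2048, which matches the pass and fail results described above. In this model a len that is not a multiple of the block size also leaves stale bytes in the block that straddles the end of the file.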
>
> I haven't checked the upstream kernel code yet, but this seems to be an
> ext4 bug. In RHEL6.5GA, block_write_full_page() is called to do work
> similar to ext4_bio_write_page(). That function does not do the zeroing
> in units of struct buffer_head, so the bug does not exist there.
>
> To: Stanislav Kholmanskikh
> Would you please check the ext4 file system's block size on the systems
> where you ran the tests, and compare the results with block sizes 1024
> and 4096? Thanks.
>
>
> Regards,
> Xiaoguang Wang
>>> I took your test program from the man thread and executed it on KVM and
>>> on an Sun Ultra 45.
>>>
>>> Here are the results:
>>>
>>> 1. Debian 7 + KVM + 3.2.0-4-amd64
>>> * without msync() it fails
>>> * with msync() it fails
>>>
>>> 2. Debian 7 + Ultra + 3.2.0-4-sparc64-smp
>>> * without msync() it fails
>>> * with msync() it passes (!!!)
>>>
>>> So it looks like msync() on x86 doesn't help us. The question here is
>>> whether it's a Linux+x86 bug or not...
>> Is the /tmp/ filesystem the same?
>>
>>> Could you check your test program and mmap_11-4 in your environment?
>> Added to my TODO but I'm not sure if I can get to this till the end of
>> the week.
>>
>
>
>
> _______________________________________________
> Ltp-list mailing list
> Ltp-list@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/ltp-list