Hi,

On 05/06/2014 05:24 PM, Xiaoguang Wang wrote:
>>> I did some googling and found that you raised this problem in 2012:
>>> http://linux-kernel.2935.n7.nabble.com/Partialy-mapped-page-stays-in-page-cache-after-unmap-td379857.html
>>>
>>> I'm interested in what you meant in your patch for the mmap manual page
>>> by "In some cases, this could be fixed by calling msync(2) before the
>>> unmap takes place;"
>>>
>>> What are these "some cases"?
>> That depends on whether the mmapped file is backed by a disk-based
>> filesystem or not.
>>
>> Mapped pages are stored in the page cache, so if you modify the part of
>> the last page that lies beyond the mapping, the data stays there
>> until the page is written back to disk and reloaded into the cache
>> (which can be forced with msync()). For memory-backed filesystems
>> (tmpfs etc.) msync() is a no-op because there is no permanent storage to
>> write the data to, so the rest of the partial page is never cleared.
>
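
To make the scenario above concrete, here is a minimal sketch of such a
check. It is not the mmap_11-4 source and not Cyril's program; the function
name stale_bytes_after_remap() and the temporary file name are made up for
illustration. It writes into the part of the last page that lies beyond EOF,
calls msync(), unmaps, maps the file again and counts the bytes past EOF that
are still non-zero.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns the number of non-zero bytes found past EOF
 * after msync() + munmap() + a fresh mmap(), or -1 on any setup error.
 * The file is created in `dir`, so `dir` selects the filesystem under test.
 */
static long stale_bytes_after_remap(const char *dir)
{
    char path[4096];
    long page = sysconf(_SC_PAGESIZE);
    long half, i, stale = 0;
    char *map;
    int fd, n;

    assert(dir != NULL);
    n = snprintf(path, sizeof(path), "%s/partial-XXXXXX", dir);
    if (page <= 0 || n < 0 || n >= (int)sizeof(path))
        return -1;

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                       /* file lives only as long as fd */

    half = page / 2;                    /* file ends in the middle of a page */
    if (ftruncate(fd, half) != 0)
        goto fail;

    map = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        goto fail;
    /* Dirty the region that is inside the page but beyond EOF. */
    memset(map + half, 'x', (size_t)(page - half));
    msync(map, (size_t)page, MS_SYNC);
    munmap(map, (size_t)page);

    map = mmap(NULL, (size_t)page, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        goto fail;
    for (i = half; i < page; i++)
        if (map[i] != 0)
            stale++;
    munmap(map, (size_t)page);
    close(fd);
    return stale;

fail:
    close(fd);
    return -1;
}
```

A return value of 0 means the tail of the page was cleared; a positive value
means stale data survived the remap. Which one you get depends on the
filesystem and kernel behind the directory passed in.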

I'd like to summarize this discussion.
The ext4 community has confirmed that this failure is caused by an ext4 bug;
please see the discussion in this thread:
http://www.spinics.net/lists/linux-ext4/msg43560.html

The bug has been fixed by Jan Kara; please see this commit:
http://git.kernel.org/cgit/linux/kernel/git/tytso/ext4.git/commit/?h=dev&id=ce734add9a545cbe23584b20b6bb8ac3c2f53b34

I have also tested this patch and it works: mmap_11-4 now passes.
Note: please make sure you run the test on an ext4 file system, because /tmp
is sometimes on tmpfs.
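
One way to check which filesystem a test directory is on is statfs(2). The
sketch below is only an illustration: the helper names and the SKETCH_*
macros are made up, while the magic values are the ones defined in
<linux/magic.h>. Note that ext2, ext3 and ext4 share the same magic number,
so this can rule out tmpfs but cannot distinguish ext4 from ext3.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/vfs.h>    /* statfs(2), Linux-specific */

#define SKETCH_TMPFS_MAGIC 0x01021994L  /* TMPFS_MAGIC in <linux/magic.h> */
#define SKETCH_EXT_MAGIC   0xEF53L      /* shared by ext2, ext3 and ext4 */

/* Hypothetical helper: 1 if the magic belongs to the ext2/3/4 family. */
static int magic_is_ext(long magic)
{
    return magic == SKETCH_EXT_MAGIC;
}

/* Hypothetical helper: 1 if the magic is tmpfs. */
static int magic_is_tmpfs(long magic)
{
    return magic == SKETCH_TMPFS_MAGIC;
}

/*
 * Hypothetical helper: returns 1 if `dir` is on an ext2/3/4 filesystem,
 * 0 if it is on anything else (tmpfs included), -1 if statfs() fails.
 */
static int dir_is_on_ext(const char *dir)
{
    struct statfs sfs;

    assert(dir != NULL);
    if (statfs(dir, &sfs) != 0)
        return -1;
    return magic_is_ext((long)sfs.f_type);
}
```

For example, calling dir_is_on_ext("/tmp") before running the test tells you
whether the result is meaningful for ext4 at all.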

Regards,
Xiaoguang Wang

> I have also hit this failure on RHEL7U0RC and looked into the kernel code; here
> is the likely reason.
> When you call msync() on an ext4 file system, ext4_bio_write_page() is
> called to write the page back. Here is the source code in RHEL7.0RC:
> --------------------------------------------------------------------------------------------
> int ext4_bio_write_page(struct ext4_io_submit *io, struct page *page,
>                         int len, struct writeback_control *wbc)
>  {
>          struct inode *inode = page->mapping->host;
>          unsigned block_start, blocksize;
>          struct buffer_head *bh, *head;
>          int ret = 0;
>          int nr_submitted = 0;
>  
>          blocksize = 1 << inode->i_blkbits;
>  
>          BUG_ON(!PageLocked(page));
>          BUG_ON(PageWriteback(page));
>  
>          set_page_writeback(page);
>          ClearPageError(page);
>  
>          ......
>
>          bh = head = page_buffers(page);
>          do {
>                  block_start = bh_offset(bh);
>                  if (block_start >= len) {
>                          /*
>                           * Comments copied from block_write_full_page_endio:
>                           *
>                           * The page straddles i_size.  It must be zeroed out on
>                           * each and every writepage invocation because it may
>                           * be mmapped.  "A file is mapped in multiples of the
>                           * page size.  For a file that is not a multiple of
>                           * the  page size, the remaining memory is zeroed when
>                           * mapped, and writes to that region are not written
>                           * out to the file."
>                           */
>                          zero_user_segment(page, block_start,
>                                            block_start + blocksize);
>                          clear_buffer_dirty(bh);
>                          set_buffer_uptodate(bh);
>                          continue;
>                  }
>                  ......
>          } while ((bh = bh->b_this_page) != head);
> --------------------------------------------------------------------------------------------
> I deleted some irrelevant code.
>
> The variable len is computed by the following code:
> loff_t size = i_size_read(inode); // file's length
> if (index == size >> PAGE_CACHE_SHIFT)
>         len = size & ~PAGE_CACHE_MASK;
> else
>         len = PAGE_CACHE_SIZE;
>
> That means len is the number of valid file bytes within the page being written.
>
> When the ext4 file system's block size is 1024, there are 4 struct
> buffer_head instances attached to this page. See the "do ... while"
> loop in ext4_bio_write_page() above.
>
> "block_start = bh_offset(bh);" sets block_start to 0 for the first
> buffer head, 1024 for the second, 2048 for the third and 3072 for the
> fourth.
>
> In the reproducer written by Cyril, len is 2048, so the
> "if (block_start >= len)" condition is satisfied in the third and fourth
> iterations and
> "zero_user_segment(page, block_start, block_start + blocksize);" is
> called. The content beyond the end of the file is zeroed, so the
> reproducer succeeds.
>
> But when the ext4 file system's block size is 4096, there is only one
> buffer head attached to this page. len is still 2048, and
> "while ((bh = bh->b_this_page) != head);" makes the "do ... while" loop
> execute only once. In that single iteration,
> "block_start = bh_offset(bh);" sets block_start to 0, so
> "if (block_start >= len)" is not satisfied and zero_user_segment() is
> not called. The content of the page beyond the end of the file is
> therefore not zeroed, and the reproducer fails.
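
The two cases above can be replayed outside the kernel with a small
user-space sketch. This is not kernel code: stale_tail_bytes() and
SKETCH_PAGE_SIZE are made-up names, a 4096-byte page is assumed, and only the
"block_start >= len" check from the quoted excerpt is modelled (the elided
parts of the function are ignored).

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumption: 4 KiB pages */

/*
 * Hypothetical helper: walks the page in blocksize steps, the way the
 * quoted loop walks the buffer heads, and returns how many bytes between
 * len and the end of the page are NOT covered by a zeroed block.
 */
static unsigned stale_tail_bytes(unsigned len, unsigned blocksize)
{
    unsigned block_start;
    unsigned zeroed = 0;

    assert(blocksize > 0 && SKETCH_PAGE_SIZE % blocksize == 0);
    assert(len <= SKETCH_PAGE_SIZE);

    for (block_start = 0; block_start < SKETCH_PAGE_SIZE;
         block_start += blocksize) {
        /* Same condition as in the excerpt: only blocks that start at or
         * beyond len are zeroed, and they are zeroed as a whole. */
        if (block_start >= len)
            zeroed += blocksize;
    }
    /* Bytes past len that no zeroed block covers. */
    return (SKETCH_PAGE_SIZE - len) - zeroed;
}
```

With len = 2048, stale_tail_bytes(2048, 1024) returns 0 while
stale_tail_bytes(2048, 4096) returns 2048, which matches the pass/fail
pattern described above.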
>
> I haven't checked the upstream kernel code yet. It seems that this is an
> ext4 bug. In RHEL6.5GA, block_write_full_page() is called to do work
> similar to ext4_bio_write_page(); that function does not do the zeroing
> per struct buffer_head, so the bug does not exist there.
>
> To: Stanislav Kholmanskikh
> Would you please check the ext4 file system's block size when you run the
> tests, and compare the results for block sizes 1024 and 4096? Thanks.
>
>
> Regards,
> Xiaoguang Wang
>>> I took your test program from the man thread and executed it on KVM and
>>> on a Sun Ultra 45.
>>>
>>> Here are the results:
>>>
>>> 1. Debian 7 + KVM + 3.2.0-4-amd64
>>>   * without msync() it fails
>>>   * with msync() it fails
>>>
>>> 2. Debian 7 + Ultra + 3.2.0-4-sparc64-smp
>>>   * without msync() it fails
>>>   * with msync() it passes (!!!)
>>>
>>> So it looks like msync() on x86 doesn't help us. The question is whether
>>> it's a linux+x86 bug or not...
>> Is the /tmp/ filesystem the same?
>>
>>> Could you check your test program and mmap_11-4 in your environment?
>> Added to my TODO but I'm not sure if I can get to this till the end of
>> the week.
>>
>
>
>
> _______________________________________________
> Ltp-list mailing list
> Ltp-list@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/ltp-list
