date:20171223

Re: [PATCH 4.4 00/78] 4.4.108-stable review

2017-12-23 Thread Greg Kroah-Hartman

On Sat, Dec 23, 2017 at 05:30:42PM -0800, Guenter Roeck wrote:
> On 12/22/2017 12:45 AM, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 4.4.108 release.
> > There are 78 patches in this series, all will be posted as a response
> > to this one.  If anyone has any issues with these being applied, please
> > let me know.
> > 
> > Responses should be made by Sun Dec 24 08:45:30 UTC 2017.
> > Anything received after that time might be too late.
> > 
> 
> For v4.4.107-79-g0989207:
> 
> Build results:
>   total: 145 pass: 144 fail: 1
> Failed builds:
>   alpha:allmodconfig
> Qemu test results:
>   total: 118 pass: 118 fail: 0
> 
> The alpha fix is missing from 
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
> branch linux-4.4.y; it points to the same sha.

Ah, I only updated my quilt tree, sorry about that, let me go push this
out now...

greg k-h

Re: [PATCH 4.4 00/78] 4.4.108-stable review

2017-12-23 Thread Greg Kroah-Hartman

On Sat, Dec 23, 2017 at 05:30:42PM -0800, Guenter Roeck wrote:
> On 12/22/2017 12:45 AM, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 4.4.108 release.
> > There are 78 patches in this series, all will be posted as a response
> > to this one.  If anyone has any issues with these being applied, please
> > let me know.
> > 
> > Responses should be made by Sun Dec 24 08:45:30 UTC 2017.
> > Anything received after that time might be too late.
> > 
> 
> For v4.4.107-79-g0989207:
> 
> Build results:
>   total: 145 pass: 144 fail: 1
> Failed builds:
>   alpha:allmodconfig
> Qemu test results:
>   total: 118 pass: 118 fail: 0
> 
> The alpha fix is missing from 
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
> branch linux-4.4.y; it points to the same sha.

Ah, I only updated my quilt tree, sorry about that, let me go push this
out now...

greg k-h

Re: [PATCH v20 4/7] virtio-balloon: VIRTIO_BALLOON_F_SG

2017-12-23 Thread Wei Wang


On 12/24/2017 12:45 PM, Tetsuo Handa wrote:

Matthew Wilcox wrote:

+   unsigned long pfn = page_to_pfn(page);
+   int ret;
+
+   *pfn_min = min(pfn, *pfn_min);
+   *pfn_max = max(pfn, *pfn_max);
+
+   do {
+   if (xb_preload(GFP_NOWAIT | __GFP_NOWARN) < 0)
+   return -ENOMEM;
+
+   ret = xb_set_bit(>page_xb, pfn);
+   xb_preload_end();
+   } while (unlikely(ret == -EAGAIN));

OK, so you don't need a spinlock because you're under a mutex?  But you
can't allocate memory because you're in the balloon driver, and so a
GFP_KERNEL allocation might recurse into your driver?

Right. We can't (directly or indirectly) depend on __GFP_DIRECT_RECLAIM && 
!__GFP_NORETRY
allocations because the balloon driver needs to handle OOM notifier callback.


Would GFP_NOIO
do the job?  I'm a little hazy on exactly how the balloon driver works.

GFP_NOIO implies __GFP_DIRECT_RECLAIM. In the worst case, it can lockup due to
the too small to fail memory allocation rule. GFP_NOIO | __GFP_NORETRY would 
work
if there is really a guarantee that GFP_NOIO | __GFP_NORETRY never depend on
__GFP_DIRECT_RECLAIM && !__GFP_NORETRY allocations, which is too subtle for me 
to
validate. The direct reclaim dependency is too complicated to validate.
I consider that !__GFP_DIRECT_RECLAIM is the future-safe choice.


What's the problem with (or how is it better than) the "GFP_NOWAIT | 
__GFP_NOWARN" we are using here?




If you can't preload with anything better than that, I think that
xb_set_bit() should attempt an allocation with GFP_NOWAIT | __GFP_NOWARN,
and then you can skip the preload; it has no value for you.

Yes, that's why I suggest directly using kzalloc() at xb_set_bit().


It has some possibilities to remove that preload if we also do the 
bitmap allocation in the xb_set_bit():


bitmap = rcu_dereference_raw(*slot);
if (!bitmap) {
bitmap = this_cpu_xchg(ida_bitmap, NULL);
if (!bitmap) {
bitmap = kmalloc(sizeof(*bitmap), gfp);
if (!bitmap)
return -ENOMEM;
}
}

But why not just follow the radix tree implementation style that puts 
the allocation in preload, which would be invoked with a more relaxed 
gfp in other use cases?
Its usage in virtio_balloon is just a little special that we need to put 
the allocation within the balloon_lock, which doesn't give us the 
benefit of using a relaxed gfp in preload, but it doesn't prevent us 
from living with the current APIs (i.e. the preload + xb_set pattern).
On the other side, if we do it as above, we have more things that need 
to consider. For example, what if the a use case just want the radix 
tree implementation style, which means it doesn't want allocation within 
xb_set(), then would we be troubled with how to avoid the allocation 
path in that case?


So, I think it is better to stick with the convention by putting the 
allocation in preload. Breaking the convention should show obvious 
advantages, IMHO.






@@ -173,8 +292,15 @@ static unsigned fill_balloon(struct virtio_balloon *vb, 
size_t num)
  
  	while ((page = balloon_page_pop())) {

balloon_page_enqueue(>vb_dev_info, page);
+   if (use_sg) {
+   if (xb_set_page(vb, page, _min, _max) < 0) {
+   __free_page(page);
+   continue;
+   }
+   } else {
+   set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
+   }

Is this the right behaviour?

I don't think so. In the worst case, we can set no bit using xb_set_page().





   If we can't record the page in the xb,
wouldn't we rather send it across as a single page?


I think that we need to be able to fallback to !use_sg path when OOM.


I also have different thoughts:

1) For OOM, we have leak_balloon_sg_oom (oom has nothing to do with 
fill_balloon), which does not use xbitmap to record pages, thus no 
memory allocation.


2) If the memory is already under pressure, it is pointless to continue 
inflating memory to the host. We need to give thanks to the memory 
allocation failure reported by xbitmap, which gets us a chance to 
release the inflated pages that have been demonstrated to cause the 
memory pressure of the guest.



Best,
Wei

Re: [PATCH v20 4/7] virtio-balloon: VIRTIO_BALLOON_F_SG

2017-12-23 Thread Wei Wang


On 12/24/2017 12:45 PM, Tetsuo Handa wrote:

Matthew Wilcox wrote:

+   unsigned long pfn = page_to_pfn(page);
+   int ret;
+
+   *pfn_min = min(pfn, *pfn_min);
+   *pfn_max = max(pfn, *pfn_max);
+
+   do {
+   if (xb_preload(GFP_NOWAIT | __GFP_NOWARN) < 0)
+   return -ENOMEM;
+
+   ret = xb_set_bit(>page_xb, pfn);
+   xb_preload_end();
+   } while (unlikely(ret == -EAGAIN));

OK, so you don't need a spinlock because you're under a mutex?  But you
can't allocate memory because you're in the balloon driver, and so a
GFP_KERNEL allocation might recurse into your driver?

Right. We can't (directly or indirectly) depend on __GFP_DIRECT_RECLAIM && 
!__GFP_NORETRY
allocations because the balloon driver needs to handle OOM notifier callback.


Would GFP_NOIO
do the job?  I'm a little hazy on exactly how the balloon driver works.

GFP_NOIO implies __GFP_DIRECT_RECLAIM. In the worst case, it can lockup due to
the too small to fail memory allocation rule. GFP_NOIO | __GFP_NORETRY would 
work
if there is really a guarantee that GFP_NOIO | __GFP_NORETRY never depend on
__GFP_DIRECT_RECLAIM && !__GFP_NORETRY allocations, which is too subtle for me 
to
validate. The direct reclaim dependency is too complicated to validate.
I consider that !__GFP_DIRECT_RECLAIM is the future-safe choice.


What's the problem with (or how is it better than) the "GFP_NOWAIT | 
__GFP_NOWARN" we are using here?




If you can't preload with anything better than that, I think that
xb_set_bit() should attempt an allocation with GFP_NOWAIT | __GFP_NOWARN,
and then you can skip the preload; it has no value for you.

Yes, that's why I suggest directly using kzalloc() at xb_set_bit().


It has some possibilities to remove that preload if we also do the 
bitmap allocation in the xb_set_bit():


bitmap = rcu_dereference_raw(*slot);
if (!bitmap) {
bitmap = this_cpu_xchg(ida_bitmap, NULL);
if (!bitmap) {
bitmap = kmalloc(sizeof(*bitmap), gfp);
if (!bitmap)
return -ENOMEM;
}
}

But why not just follow the radix tree implementation style that puts 
the allocation in preload, which would be invoked with a more relaxed 
gfp in other use cases?
Its usage in virtio_balloon is just a little special that we need to put 
the allocation within the balloon_lock, which doesn't give us the 
benefit of using a relaxed gfp in preload, but it doesn't prevent us 
from living with the current APIs (i.e. the preload + xb_set pattern).
On the other side, if we do it as above, we have more things that need 
to consider. For example, what if the a use case just want the radix 
tree implementation style, which means it doesn't want allocation within 
xb_set(), then would we be troubled with how to avoid the allocation 
path in that case?


So, I think it is better to stick with the convention by putting the 
allocation in preload. Breaking the convention should show obvious 
advantages, IMHO.






@@ -173,8 +292,15 @@ static unsigned fill_balloon(struct virtio_balloon *vb, 
size_t num)
  
  	while ((page = balloon_page_pop())) {

balloon_page_enqueue(>vb_dev_info, page);
+   if (use_sg) {
+   if (xb_set_page(vb, page, _min, _max) < 0) {
+   __free_page(page);
+   continue;
+   }
+   } else {
+   set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
+   }

Is this the right behaviour?

I don't think so. In the worst case, we can set no bit using xb_set_page().





   If we can't record the page in the xb,
wouldn't we rather send it across as a single page?


I think that we need to be able to fallback to !use_sg path when OOM.


I also have different thoughts:

1) For OOM, we have leak_balloon_sg_oom (oom has nothing to do with 
fill_balloon), which does not use xbitmap to record pages, thus no 
memory allocation.


2) If the memory is already under pressure, it is pointless to continue 
inflating memory to the host. We need to give thanks to the memory 
allocation failure reported by xbitmap, which gets us a chance to 
release the inflated pages that have been demonstrated to cause the 
memory pressure of the guest.



Best,
Wei

Re: [PATCH v20 3/7 RESEND] xbitmap: add more operations

2017-12-23 Thread Wei Wang


On 12/23/2017 10:33 PM, Tetsuo Handa wrote:

+   bitmap = rcu_dereference_raw(*slot);
+   if (!bitmap) {
+   bitmap = this_cpu_xchg(ida_bitmap, NULL);
+   if (!bitmap)
+   return -ENOMEM;

I can't understand this. I can understand if it were

   BUG_ON(!bitmap);

because you called xb_preload().

But

/*
 * Regular test 2
 * set bit 2000, 2001, 2040
 * Next 1 in [0, 2048)  --> 2000
 * Next 1 in [2000, 2002)   --> 2000
 * Next 1 in [2002, 2041)   --> 2040
 * Next 1 in [2002, 2040)   --> none
 * Next 0 in [2000, 2048)   --> 2002
 * Next 0 in [2048, 2060)   --> 2048
 */
xb_preload(GFP_KERNEL);
assert(!xb_set_bit(, 2000));
assert(!xb_set_bit(, 2001));
assert(!xb_set_bit(, 2040));

[...]

xb_preload_end();

you are not calling xb_preload() prior to each xb_set_bit() call.
This means that, if each xb_set_bit() is not surrounded with
xb_preload()/xb_preload_end(), there is possibility of hitting
this_cpu_xchg(ida_bitmap, NULL) == NULL.

This is just a lazy test.  We "know" that the bits in the range 1024-2047
will all land in the same bitmap, so there's no need to preload for each
of them.

Testcases also serves as how to use that API.
Assuming such thing leads to incorrect usage.


If callers are aware that the bits that they going to record locate in 
the same bitmap, I think they should also perform the xb_ APIs with only 
one preload. So the test cases here have shown them a correct example. 
We can probably add some comments above to explain this.



Best,
Wei

Re: [PATCH v20 3/7 RESEND] xbitmap: add more operations

2017-12-23 Thread Wei Wang


On 12/23/2017 10:33 PM, Tetsuo Handa wrote:

+   bitmap = rcu_dereference_raw(*slot);
+   if (!bitmap) {
+   bitmap = this_cpu_xchg(ida_bitmap, NULL);
+   if (!bitmap)
+   return -ENOMEM;

I can't understand this. I can understand if it were

   BUG_ON(!bitmap);

because you called xb_preload().

But

/*
 * Regular test 2
 * set bit 2000, 2001, 2040
 * Next 1 in [0, 2048)  --> 2000
 * Next 1 in [2000, 2002)   --> 2000
 * Next 1 in [2002, 2041)   --> 2040
 * Next 1 in [2002, 2040)   --> none
 * Next 0 in [2000, 2048)   --> 2002
 * Next 0 in [2048, 2060)   --> 2048
 */
xb_preload(GFP_KERNEL);
assert(!xb_set_bit(, 2000));
assert(!xb_set_bit(, 2001));
assert(!xb_set_bit(, 2040));

[...]

xb_preload_end();

you are not calling xb_preload() prior to each xb_set_bit() call.
This means that, if each xb_set_bit() is not surrounded with
xb_preload()/xb_preload_end(), there is possibility of hitting
this_cpu_xchg(ida_bitmap, NULL) == NULL.

This is just a lazy test.  We "know" that the bits in the range 1024-2047
will all land in the same bitmap, so there's no need to preload for each
of them.

Testcases also serves as how to use that API.
Assuming such thing leads to incorrect usage.


If callers are aware that the bits that they going to record locate in 
the same bitmap, I think they should also perform the xb_ APIs with only 
one preload. So the test cases here have shown them a correct example. 
We can probably add some comments above to explain this.



Best,
Wei

Re: PROBLEM: consolidated IDT invalidation causes kexec to reboot

2017-12-23 Thread Alexandru Chirvasitu

Thank you for the swift reply!

On Sat, Dec 23, 2017 at 07:30:21PM -0800, Linus Torvalds wrote:
> On Sat, Dec 23, 2017 at 5:44 PM, Alexandru Chirvasitu
>  wrote:
> >
> > For testing purposes, I've altered machine_kexec_32.c making the
> > following toy commit. It naively undoes part of e802a51, solely to
> > confirm that's where it goes awry in my setup.
> 
> That's really funky.
> 
> The idt_invalidate() seems to do *exactly* the same thing. It uses
> "load_idt()" on an IDT with size 0, and the supplied address.
> 
> Can you disassemble your "set_idt()" code vs the "idt_invalidate()"?
>

I seem to have done some such thing just now, but please excuse some
poking around in the dark here (I've disassembled code exactly once
before: yesterday, in answering a similar request for more info in
another lkml thread..).

I'm actually not even certain the sequel is what you are asking.

I couldn't find the set_idt symbol in
arch/x86/kernel/machine_kexec_32.o. Google seemed to think this has
something to do with the 'static' marker for that function, so I made
another commit differing from the previous one only in that it removes
that marker (i.e. set_idt is now 'void' rather than 'static void').

I can now see that function with objdump. The relevant sections of
objdump -D on the two files are:

--- arch/x86/kernel/machine_kexec_32.o ---

0180 :
 180:   e8 fc ff ff ff  call   181 
 185:   55  push   %ebp
 186:   89 e5   mov%esp,%ebp
 188:   83 ec 0csub$0xc,%esp
 18b:   89 45 f8mov%eax,-0x8(%ebp)
 18e:   66 89 55 f6 mov%dx,-0xa(%ebp)
 192:   8d 45 f6lea-0xa(%ebp),%eax
 195:   65 8b 0d 14 00 00 00mov%gs:0x14,%ecx
 19c:   89 4d fcmov%ecx,-0x4(%ebp)
 19f:   31 c9   xor%ecx,%ecx
 1a1:   ff 15 20 00 00 00   call   *0x20
 1a7:   8b 45 fcmov-0x4(%ebp),%eax
 1aa:   65 33 05 14 00 00 00xor%gs:0x14,%eax
 1b1:   75 02   jne1b5 
 1b3:   c9  leave  
 1b4:   c3  ret
 1b5:   e8 fc ff ff ff  call   1b6 
 1ba:   8d b6 00 00 00 00   lea0x0(%esi),%esi

--

and

--- arch/x86/kernel/idt.o  ---

 :
   0:   e8 fc ff ff ff  call   1 
   5:   55  push   %ebp
   6:   89 e5   mov%esp,%ebp
   8:   83 ec 0csub$0xc,%esp
   b:   65 8b 15 14 00 00 00mov%gs:0x14,%edx
  12:   89 55 fcmov%edx,-0x4(%ebp)
  15:   31 d2   xor%edx,%edx
  17:   31 d2   xor%edx,%edx
  19:   89 45 f8mov%eax,-0x8(%ebp)
  1c:   8d 45 f6lea-0xa(%ebp),%eax
  1f:   66 89 55 f6 mov%dx,-0xa(%ebp)
  23:   ff 15 20 00 00 00   call   *0x20
  29:   8b 45 fcmov-0x4(%ebp),%eax
  2c:   65 33 05 14 00 00 00xor%gs:0x14,%eax
  33:   75 02   jne37 
  35:   c9  leave  
  36:   c3  ret
  37:   e8 fc ff ff ff  call   38 

---

I've also checked again that this newer compilation still gives a
well-behaved kexec.

> > Is this expected behaviour?
> 
> No. The code literally seems identical. The only difference is
> 
>  (a) where the 0 limit comes from
> 
>  (b) perhaps build flags and whether it is inlined or not due to being
> in a different file
> 
> and neither of those should matter, but maybe they do.
> 
> Which is why I'd like you to actually look at the generated code and
> see if you can see any difference..
> 
> Linus

Re: PROBLEM: consolidated IDT invalidation causes kexec to reboot

2017-12-23 Thread Alexandru Chirvasitu

Thank you for the swift reply!

On Sat, Dec 23, 2017 at 07:30:21PM -0800, Linus Torvalds wrote:
> On Sat, Dec 23, 2017 at 5:44 PM, Alexandru Chirvasitu
>  wrote:
> >
> > For testing purposes, I've altered machine_kexec_32.c making the
> > following toy commit. It naively undoes part of e802a51, solely to
> > confirm that's where it goes awry in my setup.
> 
> That's really funky.
> 
> The idt_invalidate() seems to do *exactly* the same thing. It uses
> "load_idt()" on an IDT with size 0, and the supplied address.
> 
> Can you disassemble your "set_idt()" code vs the "idt_invalidate()"?
>

I seem to have done some such thing just now, but please excuse some
poking around in the dark here (I've disassembled code exactly once
before: yesterday, in answering a similar request for more info in
another lkml thread..).

I'm actually not even certain the sequel is what you are asking.

I couldn't find the set_idt symbol in
arch/x86/kernel/machine_kexec_32.o. Google seemed to think this has
something to do with the 'static' marker for that function, so I made
another commit differing from the previous one only in that it removes
that marker (i.e. set_idt is now 'void' rather than 'static void').

I can now see that function with objdump. The relevant sections of
objdump -D on the two files are:

--- arch/x86/kernel/machine_kexec_32.o ---

0180 :
 180:   e8 fc ff ff ff  call   181 
 185:   55  push   %ebp
 186:   89 e5   mov%esp,%ebp
 188:   83 ec 0csub$0xc,%esp
 18b:   89 45 f8mov%eax,-0x8(%ebp)
 18e:   66 89 55 f6 mov%dx,-0xa(%ebp)
 192:   8d 45 f6lea-0xa(%ebp),%eax
 195:   65 8b 0d 14 00 00 00mov%gs:0x14,%ecx
 19c:   89 4d fcmov%ecx,-0x4(%ebp)
 19f:   31 c9   xor%ecx,%ecx
 1a1:   ff 15 20 00 00 00   call   *0x20
 1a7:   8b 45 fcmov-0x4(%ebp),%eax
 1aa:   65 33 05 14 00 00 00xor%gs:0x14,%eax
 1b1:   75 02   jne1b5 
 1b3:   c9  leave  
 1b4:   c3  ret
 1b5:   e8 fc ff ff ff  call   1b6 
 1ba:   8d b6 00 00 00 00   lea0x0(%esi),%esi

--

and

--- arch/x86/kernel/idt.o  ---

 :
   0:   e8 fc ff ff ff  call   1 
   5:   55  push   %ebp
   6:   89 e5   mov%esp,%ebp
   8:   83 ec 0csub$0xc,%esp
   b:   65 8b 15 14 00 00 00mov%gs:0x14,%edx
  12:   89 55 fcmov%edx,-0x4(%ebp)
  15:   31 d2   xor%edx,%edx
  17:   31 d2   xor%edx,%edx
  19:   89 45 f8mov%eax,-0x8(%ebp)
  1c:   8d 45 f6lea-0xa(%ebp),%eax
  1f:   66 89 55 f6 mov%dx,-0xa(%ebp)
  23:   ff 15 20 00 00 00   call   *0x20
  29:   8b 45 fcmov-0x4(%ebp),%eax
  2c:   65 33 05 14 00 00 00xor%gs:0x14,%eax
  33:   75 02   jne37 
  35:   c9  leave  
  36:   c3  ret
  37:   e8 fc ff ff ff  call   38 

---

I've also checked again that this newer compilation still gives a
well-behaved kexec.

> > Is this expected behaviour?
> 
> No. The code literally seems identical. The only difference is
> 
>  (a) where the 0 limit comes from
> 
>  (b) perhaps build flags and whether it is inlined or not due to being
> in a different file
> 
> and neither of those should matter, but maybe they do.
> 
> Which is why I'd like you to actually look at the generated code and
> see if you can see any difference..
> 
> Linus

Re: [PATCH 11/11] evm: Don't update hmacs in user ns mounts

2017-12-23 Thread Mimi Zohar

On Sun, 2017-12-24 at 00:12 -0500, Mimi Zohar wrote:
> Hi Serge,
> 
> On Fri, 2017-12-22 at 22:03 -0600, Serge E. Hallyn wrote:
> > On Fri, Dec 22, 2017 at 03:32:35PM +0100, Dongsu Park wrote:
> > > From: Seth Forshee 
> > > 
> > > The kernel should not calculate new hmacs for mounts done by
> > > non-root users. Update evm_calc_hmac_or_hash() to refuse to
> > > calculate new hmacs for mounts for non-init user namespaces.
> > > 
> > > Cc: linux-integr...@vger.kernel.org
> > > Cc: linux-security-mod...@vger.kernel.org
> > > Cc: linux-kernel@vger.kernel.org
> > > Cc: James Morris 
> > > Cc: Mimi Zohar 
> > 
> > Hi Mimi,
> > 
> > does this change seem sufficient to you?
> 
> I think this is the correct behavior in the context of fuse file
> systems.  This patch, the "ima: define a new policy option named
> force" patch, and an updated IMA policy should be upstreamed together.
>  The cover letter should provide the motivation for these patches.

Ah, this patch is being upstreamed with the fuse mounts patches.  I
guess Seth is planning on posting the IMA policy changes for fuse
separately.

Mimi

Re: [PATCH 11/11] evm: Don't update hmacs in user ns mounts

2017-12-23 Thread Mimi Zohar

On Sun, 2017-12-24 at 00:12 -0500, Mimi Zohar wrote:
> Hi Serge,
> 
> On Fri, 2017-12-22 at 22:03 -0600, Serge E. Hallyn wrote:
> > On Fri, Dec 22, 2017 at 03:32:35PM +0100, Dongsu Park wrote:
> > > From: Seth Forshee 
> > > 
> > > The kernel should not calculate new hmacs for mounts done by
> > > non-root users. Update evm_calc_hmac_or_hash() to refuse to
> > > calculate new hmacs for mounts for non-init user namespaces.
> > > 
> > > Cc: linux-integr...@vger.kernel.org
> > > Cc: linux-security-mod...@vger.kernel.org
> > > Cc: linux-kernel@vger.kernel.org
> > > Cc: James Morris 
> > > Cc: Mimi Zohar 
> > 
> > Hi Mimi,
> > 
> > does this change seem sufficient to you?
> 
> I think this is the correct behavior in the context of fuse file
> systems.  This patch, the "ima: define a new policy option named
> force" patch, and an updated IMA policy should be upstreamed together.
>  The cover letter should provide the motivation for these patches.

Ah, this patch is being upstreamed with the fuse mounts patches.  I
guess Seth is planning on posting the IMA policy changes for fuse
separately.

Mimi

[PATCH 11/12] staging: remove the default io_mem_pfn set

2017-12-23 Thread Tan Xiaojun

The default interface situation has been taken into the framework, so
remove the default set of each module.

Signed-off-by: Tan Xiaojun 
---
 drivers/staging/vboxvideo/vbox_ttm.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/staging/vboxvideo/vbox_ttm.c 
b/drivers/staging/vboxvideo/vbox_ttm.c
index 4eb410a..4da1723 100644
--- a/drivers/staging/vboxvideo/vbox_ttm.c
+++ b/drivers/staging/vboxvideo/vbox_ttm.c
@@ -241,7 +241,6 @@ static struct ttm_bo_driver vbox_bo_driver = {
.verify_access = vbox_bo_verify_access,
.io_mem_reserve = _ttm_io_mem_reserve,
.io_mem_free = _ttm_io_mem_free,
-   .io_mem_pfn = ttm_bo_default_io_mem_pfn,
 };
 
 int vbox_mm_init(struct vbox_private *vbox)
-- 
2.7.4

[PATCH 03/12] drm/bochs: remove the default io_mem_pfn set

2017-12-23 Thread Tan Xiaojun

The default interface situation has been taken into the framework, so
remove the default set of each module.

Signed-off-by: Tan Xiaojun 
---
 drivers/gpu/drm/bochs/bochs_mm.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/bochs/bochs_mm.c b/drivers/gpu/drm/bochs/bochs_mm.c
index c4cadb6..857755a 100644
--- a/drivers/gpu/drm/bochs/bochs_mm.c
+++ b/drivers/gpu/drm/bochs/bochs_mm.c
@@ -205,7 +205,6 @@ struct ttm_bo_driver bochs_bo_driver = {
.verify_access = bochs_bo_verify_access,
.io_mem_reserve = _ttm_io_mem_reserve,
.io_mem_free = _ttm_io_mem_free,
-   .io_mem_pfn = ttm_bo_default_io_mem_pfn,
 };
 
 int bochs_mm_init(struct bochs_device *bochs)
-- 
2.7.4

[PATCH 01/12] drm/ttm: Use ttm_bo_default_io_mem_pfn if io_mem_pfn is NULL

2017-12-23 Thread Tan Xiaojun

From: Michal Srb 

The io_mem_pfn field was added in commit 
ea642c3216cb2a60d1c0e760ae47ee85c9c16447
and is called unconditionally. However, not all drivers were updated to set it.

Use the ttm_bo_default_io_mem_pfn function if a driver did not set its own.

Signed-off-by: Michal Srb 
---
 drivers/gpu/drm/ttm/ttm_bo_vm.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index c8ebb75..e25a99b 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -234,7 +234,10 @@ static int ttm_bo_vm_fault(struct vm_fault *vmf)
if (bo->mem.bus.is_iomem) {
/* Iomem should not be marked encrypted */
cvma.vm_page_prot = pgprot_decrypted(cvma.vm_page_prot);
-   pfn = bdev->driver->io_mem_pfn(bo, page_offset);
+   if (bdev->driver->io_mem_pfn)
+   pfn = bdev->driver->io_mem_pfn(bo, page_offset);
+   else
+   pfn = ttm_bo_default_io_mem_pfn(bo, 
page_offset);
} else {
page = ttm->pages[page_offset];
if (unlikely(!page && i == 0)) {
-- 
2.7.4

[PATCH 11/12] staging: remove the default io_mem_pfn set

2017-12-23 Thread Tan Xiaojun

The default interface situation has been taken into the framework, so
remove the default set of each module.

Signed-off-by: Tan Xiaojun 
---
 drivers/staging/vboxvideo/vbox_ttm.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/staging/vboxvideo/vbox_ttm.c 
b/drivers/staging/vboxvideo/vbox_ttm.c
index 4eb410a..4da1723 100644
--- a/drivers/staging/vboxvideo/vbox_ttm.c
+++ b/drivers/staging/vboxvideo/vbox_ttm.c
@@ -241,7 +241,6 @@ static struct ttm_bo_driver vbox_bo_driver = {
.verify_access = vbox_bo_verify_access,
.io_mem_reserve = _ttm_io_mem_reserve,
.io_mem_free = _ttm_io_mem_free,
-   .io_mem_pfn = ttm_bo_default_io_mem_pfn,
 };
 
 int vbox_mm_init(struct vbox_private *vbox)
-- 
2.7.4

[PATCH 03/12] drm/bochs: remove the default io_mem_pfn set

2017-12-23 Thread Tan Xiaojun

The default interface situation has been taken into the framework, so
remove the default set of each module.

Signed-off-by: Tan Xiaojun 
---
 drivers/gpu/drm/bochs/bochs_mm.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/bochs/bochs_mm.c b/drivers/gpu/drm/bochs/bochs_mm.c
index c4cadb6..857755a 100644
--- a/drivers/gpu/drm/bochs/bochs_mm.c
+++ b/drivers/gpu/drm/bochs/bochs_mm.c
@@ -205,7 +205,6 @@ struct ttm_bo_driver bochs_bo_driver = {
.verify_access = bochs_bo_verify_access,
.io_mem_reserve = _ttm_io_mem_reserve,
.io_mem_free = _ttm_io_mem_free,
-   .io_mem_pfn = ttm_bo_default_io_mem_pfn,
 };
 
 int bochs_mm_init(struct bochs_device *bochs)
-- 
2.7.4

[PATCH 01/12] drm/ttm: Use ttm_bo_default_io_mem_pfn if io_mem_pfn is NULL

2017-12-23 Thread Tan Xiaojun

From: Michal Srb 

The io_mem_pfn field was added in commit 
ea642c3216cb2a60d1c0e760ae47ee85c9c16447
and is called unconditionally. However, not all drivers were updated to set it.

Use the ttm_bo_default_io_mem_pfn function if a driver did not set its own.

Signed-off-by: Michal Srb 
---
 drivers/gpu/drm/ttm/ttm_bo_vm.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index c8ebb75..e25a99b 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -234,7 +234,10 @@ static int ttm_bo_vm_fault(struct vm_fault *vmf)
if (bo->mem.bus.is_iomem) {
/* Iomem should not be marked encrypted */
cvma.vm_page_prot = pgprot_decrypted(cvma.vm_page_prot);
-   pfn = bdev->driver->io_mem_pfn(bo, page_offset);
+   if (bdev->driver->io_mem_pfn)
+   pfn = bdev->driver->io_mem_pfn(bo, page_offset);
+   else
+   pfn = ttm_bo_default_io_mem_pfn(bo, 
page_offset);
} else {
page = ttm->pages[page_offset];
if (unlikely(!page && i == 0)) {
-- 
2.7.4

[PATCH 12/12] drm/ttm: unexport ttm_bo_default_io_mem_pfn and make it static

2017-12-23 Thread Tan Xiaojun

No one will use this function except in ttm_bo_vm.c now. So unexport it
and make it static.

Signed-off-by: Tan Xiaojun 
---
 drivers/gpu/drm/ttm/ttm_bo_vm.c | 23 +++
 include/drm/ttm/ttm_bo_api.h| 11 ---
 2 files changed, 15 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index e25a99b..ff70fc96 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -92,6 +92,21 @@ static int ttm_bo_vm_fault_idle(struct ttm_buffer_object *bo,
return ret;
 }
 
+/**
+ * ttm_bo_default_iomem_pfn - get a pfn for a page offset
+ *
+ * @bo: the BO we need to look up the pfn for
+ * @page_offset: offset inside the BO to look up.
+ *
+ * Calculate the PFN for iomem based mappings during page fault
+ */
+static unsigned long ttm_bo_default_io_mem_pfn(struct ttm_buffer_object *bo,
+  unsigned long page_offset)
+{
+   return ((bo->mem.bus.base + bo->mem.bus.offset) >> PAGE_SHIFT)
+   + page_offset;
+}
+
 static int ttm_bo_vm_fault(struct vm_fault *vmf)
 {
struct vm_area_struct *vma = vmf->vma;
@@ -407,14 +422,6 @@ static struct ttm_buffer_object *ttm_bo_vm_lookup(struct 
ttm_bo_device *bdev,
return bo;
 }
 
-unsigned long ttm_bo_default_io_mem_pfn(struct ttm_buffer_object *bo,
-   unsigned long page_offset)
-{
-   return ((bo->mem.bus.base + bo->mem.bus.offset) >> PAGE_SHIFT)
-   + page_offset;
-}
-EXPORT_SYMBOL(ttm_bo_default_io_mem_pfn);
-
 int ttm_bo_mmap(struct file *filp, struct vm_area_struct *vma,
struct ttm_bo_device *bdev)
 {
diff --git a/include/drm/ttm/ttm_bo_api.h b/include/drm/ttm/ttm_bo_api.h
index fa07be1..0b1ce05 100644
--- a/include/drm/ttm/ttm_bo_api.h
+++ b/include/drm/ttm/ttm_bo_api.h
@@ -711,17 +711,6 @@ extern int ttm_fbdev_mmap(struct vm_area_struct *vma,
  struct ttm_buffer_object *bo);
 
 /**
- * ttm_bo_default_iomem_pfn - get a pfn for a page offset
- *
- * @bo: the BO we need to look up the pfn for
- * @page_offset: offset inside the BO to look up.
- *
- * Calculate the PFN for iomem based mappings during page fault
- */
-unsigned long ttm_bo_default_io_mem_pfn(struct ttm_buffer_object *bo,
-   unsigned long page_offset);
-
-/**
  * ttm_bo_mmap - mmap out of the ttm device address space.
  *
  * @filp:  filp as input from the mmap method.
-- 
2.7.4

[PATCH 10/12] drm/vmwgfx: remove the default io_mem_pfn set

2017-12-23 Thread Tan Xiaojun

The default interface situation has been taken into the framework, so
remove the default set of each module.

Signed-off-by: Tan Xiaojun 
---
 drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c
index c705632..828dd59 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c
@@ -859,5 +859,4 @@ struct ttm_bo_driver vmw_bo_driver = {
.fault_reserve_notify = _ttm_fault_reserve_notify,
.io_mem_reserve = _ttm_io_mem_reserve,
.io_mem_free = _ttm_io_mem_free,
-   .io_mem_pfn = ttm_bo_default_io_mem_pfn,
 };
-- 
2.7.4

[PATCH 10/12] drm/vmwgfx: remove the default io_mem_pfn set

2017-12-23 Thread Tan Xiaojun

The default interface situation has been taken into the framework, so
remove the default set of each module.

Signed-off-by: Tan Xiaojun 
---
 drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c
index c705632..828dd59 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c
@@ -859,5 +859,4 @@ struct ttm_bo_driver vmw_bo_driver = {
.fault_reserve_notify = _ttm_fault_reserve_notify,
.io_mem_reserve = _ttm_io_mem_reserve,
.io_mem_free = _ttm_io_mem_free,
-   .io_mem_pfn = ttm_bo_default_io_mem_pfn,
 };
-- 
2.7.4

[PATCH 12/12] drm/ttm: unexport ttm_bo_default_io_mem_pfn and make it static

2017-12-23 Thread Tan Xiaojun

No one will use this function except in ttm_bo_vm.c now. So unexport it
and make it static.

Signed-off-by: Tan Xiaojun 
---
 drivers/gpu/drm/ttm/ttm_bo_vm.c | 23 +++
 include/drm/ttm/ttm_bo_api.h| 11 ---
 2 files changed, 15 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index e25a99b..ff70fc96 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -92,6 +92,21 @@ static int ttm_bo_vm_fault_idle(struct ttm_buffer_object *bo,
return ret;
 }
 
+/**
+ * ttm_bo_default_iomem_pfn - get a pfn for a page offset
+ *
+ * @bo: the BO we need to look up the pfn for
+ * @page_offset: offset inside the BO to look up.
+ *
+ * Calculate the PFN for iomem based mappings during page fault
+ */
+static unsigned long ttm_bo_default_io_mem_pfn(struct ttm_buffer_object *bo,
+  unsigned long page_offset)
+{
+   return ((bo->mem.bus.base + bo->mem.bus.offset) >> PAGE_SHIFT)
+   + page_offset;
+}
+
 static int ttm_bo_vm_fault(struct vm_fault *vmf)
 {
struct vm_area_struct *vma = vmf->vma;
@@ -407,14 +422,6 @@ static struct ttm_buffer_object *ttm_bo_vm_lookup(struct 
ttm_bo_device *bdev,
return bo;
 }
 
-unsigned long ttm_bo_default_io_mem_pfn(struct ttm_buffer_object *bo,
-   unsigned long page_offset)
-{
-   return ((bo->mem.bus.base + bo->mem.bus.offset) >> PAGE_SHIFT)
-   + page_offset;
-}
-EXPORT_SYMBOL(ttm_bo_default_io_mem_pfn);
-
 int ttm_bo_mmap(struct file *filp, struct vm_area_struct *vma,
struct ttm_bo_device *bdev)
 {
diff --git a/include/drm/ttm/ttm_bo_api.h b/include/drm/ttm/ttm_bo_api.h
index fa07be1..0b1ce05 100644
--- a/include/drm/ttm/ttm_bo_api.h
+++ b/include/drm/ttm/ttm_bo_api.h
@@ -711,17 +711,6 @@ extern int ttm_fbdev_mmap(struct vm_area_struct *vma,
  struct ttm_buffer_object *bo);
 
 /**
- * ttm_bo_default_iomem_pfn - get a pfn for a page offset
- *
- * @bo: the BO we need to look up the pfn for
- * @page_offset: offset inside the BO to look up.
- *
- * Calculate the PFN for iomem based mappings during page fault
- */
-unsigned long ttm_bo_default_io_mem_pfn(struct ttm_buffer_object *bo,
-   unsigned long page_offset);
-
-/**
  * ttm_bo_mmap - mmap out of the ttm device address space.
  *
  * @filp:  filp as input from the mmap method.
-- 
2.7.4

[PATCH 08/12] drm/radeon: remove the default io_mem_pfn set

2017-12-23 Thread Tan Xiaojun

The default interface situation has been taken into the framework, so
remove the default set of each module.

Signed-off-by: Tan Xiaojun 
---
 drivers/gpu/drm/radeon/radeon_ttm.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c 
b/drivers/gpu/drm/radeon/radeon_ttm.c
index 6ada64d..8595c76 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -844,7 +844,6 @@ static struct ttm_bo_driver radeon_bo_driver = {
.fault_reserve_notify = _bo_fault_reserve_notify,
.io_mem_reserve = _ttm_io_mem_reserve,
.io_mem_free = _ttm_io_mem_free,
-   .io_mem_pfn = ttm_bo_default_io_mem_pfn,
 };
 
 int radeon_ttm_init(struct radeon_device *rdev)
-- 
2.7.4

[PATCH 08/12] drm/radeon: remove the default io_mem_pfn set

2017-12-23 Thread Tan Xiaojun

The default interface situation has been taken into the framework, so
remove the default set of each module.

Signed-off-by: Tan Xiaojun 
---
 drivers/gpu/drm/radeon/radeon_ttm.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c 
b/drivers/gpu/drm/radeon/radeon_ttm.c
index 6ada64d..8595c76 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -844,7 +844,6 @@ static struct ttm_bo_driver radeon_bo_driver = {
.fault_reserve_notify = _bo_fault_reserve_notify,
.io_mem_reserve = _ttm_io_mem_reserve,
.io_mem_free = _ttm_io_mem_free,
-   .io_mem_pfn = ttm_bo_default_io_mem_pfn,
 };
 
 int radeon_ttm_init(struct radeon_device *rdev)
-- 
2.7.4

[PATCH 06/12] drm/nouveau: remove the default io_mem_pfn set

2017-12-23 Thread Tan Xiaojun

The default interface situation has been taken into the framework, so
remove the default set of each module.

Signed-off-by: Tan Xiaojun 
---
 drivers/gpu/drm/nouveau/nouveau_bo.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c 
b/drivers/gpu/drm/nouveau/nouveau_bo.c
index 435ff86..8de82a3 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -1667,5 +1667,4 @@ struct ttm_bo_driver nouveau_bo_driver = {
.fault_reserve_notify = _ttm_fault_reserve_notify,
.io_mem_reserve = _ttm_io_mem_reserve,
.io_mem_free = _ttm_io_mem_free,
-   .io_mem_pfn = ttm_bo_default_io_mem_pfn,
 };
-- 
2.7.4

[PATCH 06/12] drm/nouveau: remove the default io_mem_pfn set

2017-12-23 Thread Tan Xiaojun

The default interface situation has been taken into the framework, so
remove the default set of each module.

Signed-off-by: Tan Xiaojun 
---
 drivers/gpu/drm/nouveau/nouveau_bo.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c 
b/drivers/gpu/drm/nouveau/nouveau_bo.c
index 435ff86..8de82a3 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -1667,5 +1667,4 @@ struct ttm_bo_driver nouveau_bo_driver = {
.fault_reserve_notify = _ttm_fault_reserve_notify,
.io_mem_reserve = _ttm_io_mem_reserve,
.io_mem_free = _ttm_io_mem_free,
-   .io_mem_pfn = ttm_bo_default_io_mem_pfn,
 };
-- 
2.7.4

[PATCH 07/12] drm/qxl: remove the default io_mem_pfn set

2017-12-23 Thread Tan Xiaojun

The default interface situation has been taken into the framework, so
remove the default set of each module.

Signed-off-by: Tan Xiaojun 
---
 drivers/gpu/drm/qxl/qxl_ttm.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/qxl/qxl_ttm.c b/drivers/gpu/drm/qxl/qxl_ttm.c
index ab48238..a86eaf9 100644
--- a/drivers/gpu/drm/qxl/qxl_ttm.c
+++ b/drivers/gpu/drm/qxl/qxl_ttm.c
@@ -393,7 +393,6 @@ static struct ttm_bo_driver qxl_bo_driver = {
.verify_access = _verify_access,
.io_mem_reserve = _ttm_io_mem_reserve,
.io_mem_free = _ttm_io_mem_free,
-   .io_mem_pfn = ttm_bo_default_io_mem_pfn,
.move_notify = _bo_move_notify,
 };
 
-- 
2.7.4

[PATCH 07/12] drm/qxl: remove the default io_mem_pfn set

2017-12-23 Thread Tan Xiaojun

The default interface situation has been taken into the framework, so
remove the default set of each module.

Signed-off-by: Tan Xiaojun 
---
 drivers/gpu/drm/qxl/qxl_ttm.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/qxl/qxl_ttm.c b/drivers/gpu/drm/qxl/qxl_ttm.c
index ab48238..a86eaf9 100644
--- a/drivers/gpu/drm/qxl/qxl_ttm.c
+++ b/drivers/gpu/drm/qxl/qxl_ttm.c
@@ -393,7 +393,6 @@ static struct ttm_bo_driver qxl_bo_driver = {
.verify_access = _verify_access,
.io_mem_reserve = _ttm_io_mem_reserve,
.io_mem_free = _ttm_io_mem_free,
-   .io_mem_pfn = ttm_bo_default_io_mem_pfn,
.move_notify = _bo_move_notify,
 };
 
-- 
2.7.4

[PATCH 00/12] drm: add check if io_mem_pfn is NULL and cleanup

2017-12-23 Thread Tan Xiaojun

I found an OOPS when I used the mainline kernel for graphical tests in Hisilicon
D05, I do not know how to solve this problem until I saw your discussion on this
issue a month ago:

https://lists.freedesktop.org/archives/dri-devel/2017-November/159046.html

And my problem can be solved perfectly by your solution.

This is important for me, I want to solve this problem as soon as possible. So
I follow the result of your discussion, make and send these patches below.

If anything is not good, please point it out, thanks.

Michal Srb (1):
  drm/ttm: Use ttm_bo_default_io_mem_pfn if io_mem_pfn is NULL

Tan Xiaojun (11):
  drm/ast: remove the default io_mem_pfn set
  drm/bochs: remove the default io_mem_pfn set
  drm/cirrus: remove the default io_mem_pfn set
  drm/mgag200: remove the default io_mem_pfn set
  drm/nouveau: remove the default io_mem_pfn set
  drm/qxl: remove the default io_mem_pfn set
  drm/radeon: remove the default io_mem_pfn set
  drm/virtio: remove the default io_mem_pfn set
  drm/vmwgfx: remove the default io_mem_pfn set
  staging: remove the default io_mem_pfn set
  drm/ttm: unexport ttm_bo_default_io_mem_pfn and make it static

 drivers/gpu/drm/ast/ast_ttm.c  |  1 -
 drivers/gpu/drm/bochs/bochs_mm.c   |  1 -
 drivers/gpu/drm/cirrus/cirrus_ttm.c|  1 -
 drivers/gpu/drm/mgag200/mgag200_ttm.c  |  1 -
 drivers/gpu/drm/nouveau/nouveau_bo.c   |  1 -
 drivers/gpu/drm/qxl/qxl_ttm.c  |  1 -
 drivers/gpu/drm/radeon/radeon_ttm.c|  1 -
 drivers/gpu/drm/ttm/ttm_bo_vm.c| 28 +++-
 drivers/gpu/drm/virtio/virtgpu_ttm.c   |  1 -
 drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c |  1 -
 drivers/staging/vboxvideo/vbox_ttm.c   |  1 -
 include/drm/ttm/ttm_bo_api.h   | 11 ---
 12 files changed, 19 insertions(+), 30 deletions(-)

-- 
2.7.4

[PATCH 00/12] drm: add check if io_mem_pfn is NULL and cleanup

2017-12-23 Thread Tan Xiaojun

I found an OOPS when I used the mainline kernel for graphical tests in Hisilicon
D05, I do not know how to solve this problem until I saw your discussion on this
issue a month ago:

https://lists.freedesktop.org/archives/dri-devel/2017-November/159046.html

And my problem can be solved perfectly by your solution.

This is important for me, I want to solve this problem as soon as possible. So
I follow the result of your discussion, make and send these patches below.

If anything is not good, please point it out, thanks.

Michal Srb (1):
  drm/ttm: Use ttm_bo_default_io_mem_pfn if io_mem_pfn is NULL

Tan Xiaojun (11):
  drm/ast: remove the default io_mem_pfn set
  drm/bochs: remove the default io_mem_pfn set
  drm/cirrus: remove the default io_mem_pfn set
  drm/mgag200: remove the default io_mem_pfn set
  drm/nouveau: remove the default io_mem_pfn set
  drm/qxl: remove the default io_mem_pfn set
  drm/radeon: remove the default io_mem_pfn set
  drm/virtio: remove the default io_mem_pfn set
  drm/vmwgfx: remove the default io_mem_pfn set
  staging: remove the default io_mem_pfn set
  drm/ttm: unexport ttm_bo_default_io_mem_pfn and make it static

 drivers/gpu/drm/ast/ast_ttm.c  |  1 -
 drivers/gpu/drm/bochs/bochs_mm.c   |  1 -
 drivers/gpu/drm/cirrus/cirrus_ttm.c|  1 -
 drivers/gpu/drm/mgag200/mgag200_ttm.c  |  1 -
 drivers/gpu/drm/nouveau/nouveau_bo.c   |  1 -
 drivers/gpu/drm/qxl/qxl_ttm.c  |  1 -
 drivers/gpu/drm/radeon/radeon_ttm.c|  1 -
 drivers/gpu/drm/ttm/ttm_bo_vm.c| 28 +++-
 drivers/gpu/drm/virtio/virtgpu_ttm.c   |  1 -
 drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c |  1 -
 drivers/staging/vboxvideo/vbox_ttm.c   |  1 -
 include/drm/ttm/ttm_bo_api.h   | 11 ---
 12 files changed, 19 insertions(+), 30 deletions(-)

-- 
2.7.4

[PATCH 05/12] drm/mgag200: remove the default io_mem_pfn set

2017-12-23 Thread Tan Xiaojun

The default interface situation has been taken into the framework, so
remove the default set of each module.

Signed-off-by: Tan Xiaojun 
---
 drivers/gpu/drm/mgag200/mgag200_ttm.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/mgag200/mgag200_ttm.c 
b/drivers/gpu/drm/mgag200/mgag200_ttm.c
index 3e7e1cd..89b550f 100644
--- a/drivers/gpu/drm/mgag200/mgag200_ttm.c
+++ b/drivers/gpu/drm/mgag200/mgag200_ttm.c
@@ -237,7 +237,6 @@ struct ttm_bo_driver mgag200_bo_driver = {
.verify_access = mgag200_bo_verify_access,
.io_mem_reserve = _ttm_io_mem_reserve,
.io_mem_free = _ttm_io_mem_free,
-   .io_mem_pfn = ttm_bo_default_io_mem_pfn,
 };
 
 int mgag200_mm_init(struct mga_device *mdev)
-- 
2.7.4

[PATCH 09/12] drm/virtio: remove the default io_mem_pfn set

2017-12-23 Thread Tan Xiaojun

The default interface situation has been taken into the framework, so
remove the default set of each module.

Signed-off-by: Tan Xiaojun 
---
 drivers/gpu/drm/virtio/virtgpu_ttm.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_ttm.c 
b/drivers/gpu/drm/virtio/virtgpu_ttm.c
index cd389c5..4a12434 100644
--- a/drivers/gpu/drm/virtio/virtgpu_ttm.c
+++ b/drivers/gpu/drm/virtio/virtgpu_ttm.c
@@ -431,7 +431,6 @@ static struct ttm_bo_driver virtio_gpu_bo_driver = {
.verify_access = _gpu_verify_access,
.io_mem_reserve = _gpu_ttm_io_mem_reserve,
.io_mem_free = _gpu_ttm_io_mem_free,
-   .io_mem_pfn = ttm_bo_default_io_mem_pfn,
.move_notify = _gpu_bo_move_notify,
.swap_notify = _gpu_bo_swap_notify,
 };
-- 
2.7.4

[PATCH 05/12] drm/mgag200: remove the default io_mem_pfn set

2017-12-23 Thread Tan Xiaojun

The default interface situation has been taken into the framework, so
remove the default set of each module.

Signed-off-by: Tan Xiaojun 
---
 drivers/gpu/drm/mgag200/mgag200_ttm.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/mgag200/mgag200_ttm.c 
b/drivers/gpu/drm/mgag200/mgag200_ttm.c
index 3e7e1cd..89b550f 100644
--- a/drivers/gpu/drm/mgag200/mgag200_ttm.c
+++ b/drivers/gpu/drm/mgag200/mgag200_ttm.c
@@ -237,7 +237,6 @@ struct ttm_bo_driver mgag200_bo_driver = {
.verify_access = mgag200_bo_verify_access,
.io_mem_reserve = _ttm_io_mem_reserve,
.io_mem_free = _ttm_io_mem_free,
-   .io_mem_pfn = ttm_bo_default_io_mem_pfn,
 };
 
 int mgag200_mm_init(struct mga_device *mdev)
-- 
2.7.4

[PATCH 09/12] drm/virtio: remove the default io_mem_pfn set

2017-12-23 Thread Tan Xiaojun

The default interface situation has been taken into the framework, so
remove the default set of each module.

Signed-off-by: Tan Xiaojun 
---
 drivers/gpu/drm/virtio/virtgpu_ttm.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_ttm.c 
b/drivers/gpu/drm/virtio/virtgpu_ttm.c
index cd389c5..4a12434 100644
--- a/drivers/gpu/drm/virtio/virtgpu_ttm.c
+++ b/drivers/gpu/drm/virtio/virtgpu_ttm.c
@@ -431,7 +431,6 @@ static struct ttm_bo_driver virtio_gpu_bo_driver = {
.verify_access = _gpu_verify_access,
.io_mem_reserve = _gpu_ttm_io_mem_reserve,
.io_mem_free = _gpu_ttm_io_mem_free,
-   .io_mem_pfn = ttm_bo_default_io_mem_pfn,
.move_notify = _gpu_bo_move_notify,
.swap_notify = _gpu_bo_swap_notify,
 };
-- 
2.7.4

[PATCH 04/12] drm/cirrus: remove the default io_mem_pfn set

2017-12-23 Thread Tan Xiaojun

The default interface situation has been taken into the framework, so
remove the default set of each module.

Signed-off-by: Tan Xiaojun 
---
 drivers/gpu/drm/cirrus/cirrus_ttm.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/cirrus/cirrus_ttm.c 
b/drivers/gpu/drm/cirrus/cirrus_ttm.c
index 1ff1838..2c652af 100644
--- a/drivers/gpu/drm/cirrus/cirrus_ttm.c
+++ b/drivers/gpu/drm/cirrus/cirrus_ttm.c
@@ -237,7 +237,6 @@ struct ttm_bo_driver cirrus_bo_driver = {
.verify_access = cirrus_bo_verify_access,
.io_mem_reserve = _ttm_io_mem_reserve,
.io_mem_free = _ttm_io_mem_free,
-   .io_mem_pfn = ttm_bo_default_io_mem_pfn,
 };
 
 int cirrus_mm_init(struct cirrus_device *cirrus)
-- 
2.7.4

[PATCH 04/12] drm/cirrus: remove the default io_mem_pfn set

2017-12-23 Thread Tan Xiaojun

The default interface situation has been taken into the framework, so
remove the default set of each module.

Signed-off-by: Tan Xiaojun 
---
 drivers/gpu/drm/cirrus/cirrus_ttm.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/cirrus/cirrus_ttm.c 
b/drivers/gpu/drm/cirrus/cirrus_ttm.c
index 1ff1838..2c652af 100644
--- a/drivers/gpu/drm/cirrus/cirrus_ttm.c
+++ b/drivers/gpu/drm/cirrus/cirrus_ttm.c
@@ -237,7 +237,6 @@ struct ttm_bo_driver cirrus_bo_driver = {
.verify_access = cirrus_bo_verify_access,
.io_mem_reserve = _ttm_io_mem_reserve,
.io_mem_free = _ttm_io_mem_free,
-   .io_mem_pfn = ttm_bo_default_io_mem_pfn,
 };
 
 int cirrus_mm_init(struct cirrus_device *cirrus)
-- 
2.7.4

[PATCH 02/12] drm/ast: remove the default io_mem_pfn set

2017-12-23 Thread Tan Xiaojun

The default interface situation has been taken into the framework, so
remove the default set of each module.

Signed-off-by: Tan Xiaojun 
---
 drivers/gpu/drm/ast/ast_ttm.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/ast/ast_ttm.c b/drivers/gpu/drm/ast/ast_ttm.c
index 696a15d..fdd521d 100644
--- a/drivers/gpu/drm/ast/ast_ttm.c
+++ b/drivers/gpu/drm/ast/ast_ttm.c
@@ -237,7 +237,6 @@ struct ttm_bo_driver ast_bo_driver = {
.verify_access = ast_bo_verify_access,
.io_mem_reserve = _ttm_io_mem_reserve,
.io_mem_free = _ttm_io_mem_free,
-   .io_mem_pfn = ttm_bo_default_io_mem_pfn,
 };
 
 int ast_mm_init(struct ast_private *ast)
-- 
2.7.4

[PATCH 02/12] drm/ast: remove the default io_mem_pfn set

2017-12-23 Thread Tan Xiaojun

The default interface situation has been taken into the framework, so
remove the default set of each module.

Signed-off-by: Tan Xiaojun 
---
 drivers/gpu/drm/ast/ast_ttm.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/ast/ast_ttm.c b/drivers/gpu/drm/ast/ast_ttm.c
index 696a15d..fdd521d 100644
--- a/drivers/gpu/drm/ast/ast_ttm.c
+++ b/drivers/gpu/drm/ast/ast_ttm.c
@@ -237,7 +237,6 @@ struct ttm_bo_driver ast_bo_driver = {
.verify_access = ast_bo_verify_access,
.io_mem_reserve = _ttm_io_mem_reserve,
.io_mem_free = _ttm_io_mem_free,
-   .io_mem_pfn = ttm_bo_default_io_mem_pfn,
 };
 
 int ast_mm_init(struct ast_private *ast)
-- 
2.7.4

[PATCH v3] ARM: sun8i: h2+: add support for Banana Pi M2 Zero board

2017-12-23 Thread Icenowy Zheng

Banana Pi M2 Zero board is a H2+-based board by Sinovoip, with a form
factor and GPIO holes similar to Raspberry Pi Zero.

It features:
- Allwinner H2+ SoC
- Single-chip (16-bit) 512MiB DDR3 DRAM
- Ampak AP6212 Wi-Fi/Bluetooth module
- MicroSD slot
- Two MicroUSB Type-B ports (one can only be used to power the board and
  the other features OTG functionality)
- Two keys, a reset and a GPIO-connected key.
- HDMI Type-C (miniHDMI) connector connected to the HDMI part of H2+.
- CSI connector to connect the camera sensor provided by Sinovoip.

Signed-off-by: Icenowy Zheng 
---
Changes in v3:
- Add comments about Vbus problem in  node.

Changes in v2:
- Use high active SD card detect on the production batch.

 arch/arm/boot/dts/Makefile |   1 +
 .../boot/dts/sun8i-h2-plus-bananapi-m2-zero.dts| 162 +
 2 files changed, 163 insertions(+)
 create mode 100644 arch/arm/boot/dts/sun8i-h2-plus-bananapi-m2-zero.dts

diff --git a/arch/arm/boot/dts/Makefile b/arch/arm/boot/dts/Makefile
index 0bb8db33704a..937a8768671f 100644
--- a/arch/arm/boot/dts/Makefile
+++ b/arch/arm/boot/dts/Makefile
@@ -962,6 +962,7 @@ dtb-$(CONFIG_MACH_SUN8I) += \
sun8i-a83t-cubietruck-plus.dtb \
sun8i-a83t-tbs-a711.dtb \
sun8i-h2-plus-orangepi-r1.dtb \
+   sun8i-h2-plus-bananapi-m2-zero.dtb \
sun8i-h2-plus-orangepi-zero.dtb \
sun8i-h3-bananapi-m2-plus.dtb \
sun8i-h3-beelink-x2.dtb \
diff --git a/arch/arm/boot/dts/sun8i-h2-plus-bananapi-m2-zero.dts 
b/arch/arm/boot/dts/sun8i-h2-plus-bananapi-m2-zero.dts
new file mode 100644
index ..5bc182ddc5f7
--- /dev/null
+++ b/arch/arm/boot/dts/sun8i-h2-plus-bananapi-m2-zero.dts
@@ -0,0 +1,162 @@
+/*
+ * Copyright (C) 2017 Icenowy Zheng 
+ *
+ * Based on sun8i-h3-bananapi-m2-plus.dts, which is:
+ *   Copyright (C) 2016 Chen-Yu Tsai 
+ *
+ * This file is dual-licensed: you can use it either under the terms
+ * of the GPL or the X11 license, at your option. Note that this dual
+ * licensing only applies to this file, and not this project as a
+ * whole.
+ *
+ *  a) This file is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License, or (at your option) any later version.
+ *
+ * This file is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * Or, alternatively,
+ *
+ *  b) Permission is hereby granted, free of charge, to any person
+ * obtaining a copy of this software and associated documentation
+ * files (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use,
+ * copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following
+ * conditions:
+ *
+ * The above copyright notice and this permission notice shall be
+ * included in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
+ * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
+ * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
+ * WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+/dts-v1/;
+#include "sun8i-h3.dtsi"
+#include "sunxi-common-regulators.dtsi"
+
+#include 
+#include 
+
+/ {
+   model = "Banana Pi BPI-M2-Zero";
+   compatible = "sinovoip,bpi-m2-zero", "allwinner,sun8i-h2-plus";
+
+   aliases {
+   serial0 = 
+   serial1 = 
+   };
+
+   chosen {
+   stdout-path = "serial0:115200n8";
+   };
+
+   leds {
+   compatible = "gpio-leds";
+   pinctrl-names = "default";
+
+   pwr_led {
+   label = "bananapi-m2-zero:red:pwr";
+   gpios = <_pio 0 10 GPIO_ACTIVE_HIGH>; /* PL10 */
+   default-state = "on";
+   };
+   };
+
+   gpio_keys {
+   compatible = "gpio-keys";
+   pinctrl-names = "default";
+
+   sw4 {
+   label = "power";
+   linux,code = ;
+   gpios = <_pio 0 3 GPIO_ACTIVE_LOW>;
+   };
+   };
+
+   wifi_pwrseq: wifi_pwrseq {
+   compatible =

[PATCH v3] ARM: sun8i: h2+: add support for Banana Pi M2 Zero board

2017-12-23 Thread Icenowy Zheng

Banana Pi M2 Zero board is a H2+-based board by Sinovoip, with a form
factor and GPIO holes similar to Raspberry Pi Zero.

It features:
- Allwinner H2+ SoC
- Single-chip (16-bit) 512MiB DDR3 DRAM
- Ampak AP6212 Wi-Fi/Bluetooth module
- MicroSD slot
- Two MicroUSB Type-B ports (one can only be used to power the board and
  the other features OTG functionality)
- Two keys, a reset and a GPIO-connected key.
- HDMI Type-C (miniHDMI) connector connected to the HDMI part of H2+.
- CSI connector to connect the camera sensor provided by Sinovoip.

Signed-off-by: Icenowy Zheng 
---
Changes in v3:
- Add comments about Vbus problem in  node.

Changes in v2:
- Use high active SD card detect on the production batch.

 arch/arm/boot/dts/Makefile |   1 +
 .../boot/dts/sun8i-h2-plus-bananapi-m2-zero.dts| 162 +
 2 files changed, 163 insertions(+)
 create mode 100644 arch/arm/boot/dts/sun8i-h2-plus-bananapi-m2-zero.dts

diff --git a/arch/arm/boot/dts/Makefile b/arch/arm/boot/dts/Makefile
index 0bb8db33704a..937a8768671f 100644
--- a/arch/arm/boot/dts/Makefile
+++ b/arch/arm/boot/dts/Makefile
@@ -962,6 +962,7 @@ dtb-$(CONFIG_MACH_SUN8I) += \
sun8i-a83t-cubietruck-plus.dtb \
sun8i-a83t-tbs-a711.dtb \
sun8i-h2-plus-orangepi-r1.dtb \
+   sun8i-h2-plus-bananapi-m2-zero.dtb \
sun8i-h2-plus-orangepi-zero.dtb \
sun8i-h3-bananapi-m2-plus.dtb \
sun8i-h3-beelink-x2.dtb \
diff --git a/arch/arm/boot/dts/sun8i-h2-plus-bananapi-m2-zero.dts 
b/arch/arm/boot/dts/sun8i-h2-plus-bananapi-m2-zero.dts
new file mode 100644
index ..5bc182ddc5f7
--- /dev/null
+++ b/arch/arm/boot/dts/sun8i-h2-plus-bananapi-m2-zero.dts
@@ -0,0 +1,162 @@
+/*
+ * Copyright (C) 2017 Icenowy Zheng 
+ *
+ * Based on sun8i-h3-bananapi-m2-plus.dts, which is:
+ *   Copyright (C) 2016 Chen-Yu Tsai 
+ *
+ * This file is dual-licensed: you can use it either under the terms
+ * of the GPL or the X11 license, at your option. Note that this dual
+ * licensing only applies to this file, and not this project as a
+ * whole.
+ *
+ *  a) This file is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License, or (at your option) any later version.
+ *
+ * This file is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * Or, alternatively,
+ *
+ *  b) Permission is hereby granted, free of charge, to any person
+ * obtaining a copy of this software and associated documentation
+ * files (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use,
+ * copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following
+ * conditions:
+ *
+ * The above copyright notice and this permission notice shall be
+ * included in all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
+ * OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
+ * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
+ * WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+/dts-v1/;
+#include "sun8i-h3.dtsi"
+#include "sunxi-common-regulators.dtsi"
+
+#include 
+#include 
+
+/ {
+   model = "Banana Pi BPI-M2-Zero";
+   compatible = "sinovoip,bpi-m2-zero", "allwinner,sun8i-h2-plus";
+
+   aliases {
+   serial0 = 
+   serial1 = 
+   };
+
+   chosen {
+   stdout-path = "serial0:115200n8";
+   };
+
+   leds {
+   compatible = "gpio-leds";
+   pinctrl-names = "default";
+
+   pwr_led {
+   label = "bananapi-m2-zero:red:pwr";
+   gpios = <_pio 0 10 GPIO_ACTIVE_HIGH>; /* PL10 */
+   default-state = "on";
+   };
+   };
+
+   gpio_keys {
+   compatible = "gpio-keys";
+   pinctrl-names = "default";
+
+   sw4 {
+   label = "power";
+   linux,code = ;
+   gpios = <_pio 0 3 GPIO_ACTIVE_LOW>;
+   };
+   };
+
+   wifi_pwrseq: wifi_pwrseq {
+   compatible = "mmc-pwrseq-simple";
+   pinctrl-names =

Re: [PATCH 11/11] evm: Don't update hmacs in user ns mounts

2017-12-23 Thread Mimi Zohar

Hi Serge,

On Fri, 2017-12-22 at 22:03 -0600, Serge E. Hallyn wrote:
> On Fri, Dec 22, 2017 at 03:32:35PM +0100, Dongsu Park wrote:
> > From: Seth Forshee 
> > 
> > The kernel should not calculate new hmacs for mounts done by
> > non-root users. Update evm_calc_hmac_or_hash() to refuse to
> > calculate new hmacs for mounts for non-init user namespaces.
> > 
> > Cc: linux-integr...@vger.kernel.org
> > Cc: linux-security-mod...@vger.kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > Cc: James Morris 
> > Cc: Mimi Zohar 
> 
> Hi Mimi,
> 
> does this change seem sufficient to you?

I think this is the correct behavior in the context of fuse file
systems.  This patch, the "ima: define a new policy option named
force" patch, and an updated IMA policy should be upstreamed together.
 The cover letter should provide the motivation for these patches.

Mimi

> 
> > Cc: "Serge E. Hallyn" 
> > Signed-off-by: Seth Forshee 
> > Signed-off-by: Dongsu Park 
> > ---
> >  security/integrity/evm/evm_crypto.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/security/integrity/evm/evm_crypto.c 
> > b/security/integrity/evm/evm_crypto.c
> > index bcd64baf..729f4545 100644
> > --- a/security/integrity/evm/evm_crypto.c
> > +++ b/security/integrity/evm/evm_crypto.c
> > @@ -190,7 +190,8 @@ static int evm_calc_hmac_or_hash(struct dentry *dentry,
> > int error;
> > int size;
> >  
> > -   if (!(inode->i_opflags & IOP_XATTR))
> > +   if (!(inode->i_opflags & IOP_XATTR) ||
> > +   inode->i_sb->s_user_ns != _user_ns)
> > return -EOPNOTSUPP;
> >  
> > desc = init_desc(type);
> > -- 
> > 2.13.6
>

Re: [PATCH 11/11] evm: Don't update hmacs in user ns mounts

2017-12-23 Thread Mimi Zohar

Hi Serge,

On Fri, 2017-12-22 at 22:03 -0600, Serge E. Hallyn wrote:
> On Fri, Dec 22, 2017 at 03:32:35PM +0100, Dongsu Park wrote:
> > From: Seth Forshee 
> > 
> > The kernel should not calculate new hmacs for mounts done by
> > non-root users. Update evm_calc_hmac_or_hash() to refuse to
> > calculate new hmacs for mounts for non-init user namespaces.
> > 
> > Cc: linux-integr...@vger.kernel.org
> > Cc: linux-security-mod...@vger.kernel.org
> > Cc: linux-kernel@vger.kernel.org
> > Cc: James Morris 
> > Cc: Mimi Zohar 
> 
> Hi Mimi,
> 
> does this change seem sufficient to you?

I think this is the correct behavior in the context of fuse file
systems.  This patch, the "ima: define a new policy option named
force" patch, and an updated IMA policy should be upstreamed together.
 The cover letter should provide the motivation for these patches.

Mimi

> 
> > Cc: "Serge E. Hallyn" 
> > Signed-off-by: Seth Forshee 
> > Signed-off-by: Dongsu Park 
> > ---
> >  security/integrity/evm/evm_crypto.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/security/integrity/evm/evm_crypto.c 
> > b/security/integrity/evm/evm_crypto.c
> > index bcd64baf..729f4545 100644
> > --- a/security/integrity/evm/evm_crypto.c
> > +++ b/security/integrity/evm/evm_crypto.c
> > @@ -190,7 +190,8 @@ static int evm_calc_hmac_or_hash(struct dentry *dentry,
> > int error;
> > int size;
> >  
> > -   if (!(inode->i_opflags & IOP_XATTR))
> > +   if (!(inode->i_opflags & IOP_XATTR) ||
> > +   inode->i_sb->s_user_ns != _user_ns)
> > return -EOPNOTSUPP;
> >  
> > desc = init_desc(type);
> > -- 
> > 2.13.6
>

Re: [PATCH v11 2/6] mailbox: qcom: Create APCS child device for clock controller

2017-12-23 Thread Bjorn Andersson

On Fri 22 Dec 20:57 PST 2017, Jassi Brar wrote:

> On Tue, Dec 5, 2017 at 9:16 PM, Georgi Djakov  
> wrote:
> > There is a clock controller functionality provided by the APCS hardware
> > block of msm8916 devices. The device-tree would represent an APCS node
> > with both mailbox and clock provider properties.
> >
> The spec might depict a 'clock' box and 'mailbox' box inside the
> bigger APCS box. However, from the code I see in this patchset, they
> are orthogonal and can & should be represented as independent DT
> nodes.

The APCS consists of a number of different hardware blocks, one of them
being the "APCS global" block, which is what this node and drivers
relate to. On 8916 this contains both the IPC register and clock
control. But it's still just one block according to the hardware
specification.

As such DT should describe the one hardware block by one node IMHO. The
fact that we in Linux split this into two different drivers is an
unrelated implementation detail.

Regards,
Bjorn

Re: [PATCH v11 2/6] mailbox: qcom: Create APCS child device for clock controller

2017-12-23 Thread Bjorn Andersson

On Fri 22 Dec 20:57 PST 2017, Jassi Brar wrote:

> On Tue, Dec 5, 2017 at 9:16 PM, Georgi Djakov  
> wrote:
> > There is a clock controller functionality provided by the APCS hardware
> > block of msm8916 devices. The device-tree would represent an APCS node
> > with both mailbox and clock provider properties.
> >
> The spec might depict a 'clock' box and 'mailbox' box inside the
> bigger APCS box. However, from the code I see in this patchset, they
> are orthogonal and can & should be represented as independent DT
> nodes.

The APCS consists of a number of different hardware blocks, one of them
being the "APCS global" block, which is what this node and drivers
relate to. On 8916 this contains both the IPC register and clock
control. But it's still just one block according to the hardware
specification.

As such DT should describe the one hardware block by one node IMHO. The
fact that we in Linux split this into two different drivers is an
unrelated implementation detail.

Regards,
Bjorn

Linux 4.15-rc5

2017-12-23 Thread Linus Torvalds

Ok, so it's not Sunday yet, but tomorrow is Christmas Eve, and while
I've been in the US for over two decades, we still celebrate Christmas
the only _right_ way - with Christmas Eve being the big day, and
Christmas Day being just for recovery.

So I'm doing the rc5 release a day early, in order to not have to do
it during the actual Christmas festivities at our house.

And it's not like I expect to see a lot of patches or pull requests
tomorrow anyway, so I guess it doesn't really much matter when I do
the rc release, the end result would look very similar even if I had
done it on my normal Sunday schedule.

This (shortened) week ended up being fairly normal for rc5, with the
exception of the ongoing merging of the x86 low-level prep for kernel
page table isolation that continues and is noticeable. In fact, about
a third of the rc5 patch is x86 updates due to that. It all looks like
very good cleanups, though, and it's been through about two hundred
iterations by now (no, seriously, Thomas has been keeping track of his
iterative updates of the PTI series, and it apparently hit 196 in the
last two months).

Outside of that, things look fairly normal: mostly drivers and some
networking updates (bpf fixes and selftests are notable). Some xfs
fixes also stand out.

The appended shortlog gives an overview of the details,

  Linus

---

Adiel Aloni (1):
  mac80211_hwsim: enable TODS BIT in null data frame

Albert Hsieh (1):
  mtd: nand: brcmnand: Zero bitflip is not an error

Alexander Kochetkov (2):
  net: arc_emac: fix arc_emac_rx() error paths
  net: arc_emac: restart stalled EMAC

Alexei Starovoitov (2):
  bpf: fix integer overflows
  bpf: do not allow root to mangle valid pointers

Alexey Khoroshilov (1):
  net: phy: xgene: disable clk on error paths

Alexey Kodanev (1):
  vxlan: restore dev->mtu setting based on lower device

Andy Lutomirski (24):
  x86/unwinder/orc: Dont bail on stack overflow
  x86/irq: Remove an old outdated comment about context tracking races
  x86/irq/64: Print the offending IP in the stack overflow warning
  x86/entry/64: Allocate and enable the SYSENTER stack
  x86/dumpstack: Add get_stack_info() support for the SYSENTER stack
  x86/entry/gdt: Put per-CPU GDT remaps in ascending order
  x86/mm/fixmap: Generalize the GDT fixmap mechanism, introduce
struct cpu_entry_area
  x86/kasan/64: Teach KASAN about the cpu_entry_area
  x86/entry: Fix assumptions that the HW TSS is at the beginning of cpu_tss
  x86/dumpstack: Handle stack overflow on all stacks
  x86/entry: Move SYSENTER_stack to the beginning of struct tss_struct
  x86/entry: Remap the TSS into the CPU entry area
  x86/entry/64: Separate cpu_current_top_of_stack from TSS.sp0
  x86/espfix/64: Stop assuming that pt_regs is on the entry stack
  x86/entry/64: Use a per-CPU trampoline stack for IDT entries
  x86/entry/64: Return to userspace from the trampoline stack
  x86/entry/64: Create a per-CPU SYSCALL entry trampoline
  x86/entry/64: Move the IST stacks into struct cpu_entry_area
  x86/entry/64: Remove the SYSENTER stack canary
  x86/entry: Clean up the SYSENTER_stack code
  x86/entry/64: Make cpu_entry_area.tss read-only
  x86/vsyscall/64: Explicitly set _PAGE_USER in the pagetable hierarchy
  x86/vsyscall/64: Warn and fail vsyscall emulation in NATIVE mode
  x86/mm/64: Improve the memory map documentation

Anju T Sudhakar (2):
  powerpc/perf/imc: Fix nest-imc cpuhotplug callback failure
  powerpc/perf: Fix kfree memory allocated for nest pmus

Bart Van Assche (1):
  scsi: core: Use blist_flags_t consistently

Ben Skeggs (6):
  drm/nouveau/bios/dp: support DP Info Table 2.0
  drm/nouveau/imem/nv50: fix refcount_t warning
  drm/nouveau/mmu/gp10b: use correct implementation
  drm/nouveau: avoid GPU page sizes > PAGE_SIZE for buffer objects
in host memory
  drm/nouveau: use alternate memory type for system-memory buffers
with kind != 0
  drm/nouveau: fix obvious memory leak

Bhawanpreet Lakha (1):
  drm/amd/display: add pipe locking before front end programing

Boris Ostrovsky (2):
  x86/entry/64/paravirt: Use paravirt-safe macro to access eflags
  xen/balloon: Mark unallocated host memory as UNUSABLE

Brendan McGrath (1):
  ipv6: icmp6: Allow icmp messages to be looped back

Brian King (1):
  tg3: Fix rx hang on MTU change with 5717/5719

Cai Li (1):
  clk: fix a panic error caused by accessing NULL pointer

Chen-Yu Tsai (1):
  clk: sunxi: sun9i-mmc: Implement reset callback for reset controls

Chris Wilson (6):
  drm/i915: Flush pending GTT writes before unbinding
  drm/i915: Drop fb reference on load_detect_pipe failure path
  drm/i915: Stop listening to request resubmission from the signaler kthread
  drm/i915/fence: Use rcu to defer freeing of irq_work
  drm/i915/lpe: Remove double-encapsulation of

Linux 4.15-rc5

2017-12-23 Thread Linus Torvalds

Ok, so it's not Sunday yet, but tomorrow is Christmas Eve, and while
I've been in the US for over two decades, we still celebrate Christmas
the only _right_ way - with Christmas Eve being the big day, and
Christmas Day being just for recovery.

So I'm doing the rc5 release a day early, in order to not have to do
it during the actual Christmas festivities at our house.

And it's not like I expect to see a lot of patches or pull requests
tomorrow anyway, so I guess it doesn't really much matter when I do
the rc release, the end result would look very similar even if I had
done it on my normal Sunday schedule.

This (shortened) week ended up being fairly normal for rc5, with the
exception of the ongoing merging of the x86 low-level prep for kernel
page table isolation that continues and is noticeable. In fact, about
a third of the rc5 patch is x86 updates due to that. It all looks like
very good cleanups, though, and it's been through about two hundred
iterations by now (no, seriously, Thomas has been keeping track of his
iterative updates of the PTI series, and it apparently hit 196 in the
last two months).

Outside of that, things look fairly normal: mostly drivers and some
networking updates (bpf fixes and selftests are notable). Some xfs
fixes also stand out.

The appended shortlog gives an overview of the details,

  Linus

---

Adiel Aloni (1):
  mac80211_hwsim: enable TODS BIT in null data frame

Albert Hsieh (1):
  mtd: nand: brcmnand: Zero bitflip is not an error

Alexander Kochetkov (2):
  net: arc_emac: fix arc_emac_rx() error paths
  net: arc_emac: restart stalled EMAC

Alexei Starovoitov (2):
  bpf: fix integer overflows
  bpf: do not allow root to mangle valid pointers

Alexey Khoroshilov (1):
  net: phy: xgene: disable clk on error paths

Alexey Kodanev (1):
  vxlan: restore dev->mtu setting based on lower device

Andy Lutomirski (24):
  x86/unwinder/orc: Dont bail on stack overflow
  x86/irq: Remove an old outdated comment about context tracking races
  x86/irq/64: Print the offending IP in the stack overflow warning
  x86/entry/64: Allocate and enable the SYSENTER stack
  x86/dumpstack: Add get_stack_info() support for the SYSENTER stack
  x86/entry/gdt: Put per-CPU GDT remaps in ascending order
  x86/mm/fixmap: Generalize the GDT fixmap mechanism, introduce
struct cpu_entry_area
  x86/kasan/64: Teach KASAN about the cpu_entry_area
  x86/entry: Fix assumptions that the HW TSS is at the beginning of cpu_tss
  x86/dumpstack: Handle stack overflow on all stacks
  x86/entry: Move SYSENTER_stack to the beginning of struct tss_struct
  x86/entry: Remap the TSS into the CPU entry area
  x86/entry/64: Separate cpu_current_top_of_stack from TSS.sp0
  x86/espfix/64: Stop assuming that pt_regs is on the entry stack
  x86/entry/64: Use a per-CPU trampoline stack for IDT entries
  x86/entry/64: Return to userspace from the trampoline stack
  x86/entry/64: Create a per-CPU SYSCALL entry trampoline
  x86/entry/64: Move the IST stacks into struct cpu_entry_area
  x86/entry/64: Remove the SYSENTER stack canary
  x86/entry: Clean up the SYSENTER_stack code
  x86/entry/64: Make cpu_entry_area.tss read-only
  x86/vsyscall/64: Explicitly set _PAGE_USER in the pagetable hierarchy
  x86/vsyscall/64: Warn and fail vsyscall emulation in NATIVE mode
  x86/mm/64: Improve the memory map documentation

Anju T Sudhakar (2):
  powerpc/perf/imc: Fix nest-imc cpuhotplug callback failure
  powerpc/perf: Fix kfree memory allocated for nest pmus

Bart Van Assche (1):
  scsi: core: Use blist_flags_t consistently

Ben Skeggs (6):
  drm/nouveau/bios/dp: support DP Info Table 2.0
  drm/nouveau/imem/nv50: fix refcount_t warning
  drm/nouveau/mmu/gp10b: use correct implementation
  drm/nouveau: avoid GPU page sizes > PAGE_SIZE for buffer objects
in host memory
  drm/nouveau: use alternate memory type for system-memory buffers
with kind != 0
  drm/nouveau: fix obvious memory leak

Bhawanpreet Lakha (1):
  drm/amd/display: add pipe locking before front end programing

Boris Ostrovsky (2):
  x86/entry/64/paravirt: Use paravirt-safe macro to access eflags
  xen/balloon: Mark unallocated host memory as UNUSABLE

Brendan McGrath (1):
  ipv6: icmp6: Allow icmp messages to be looped back

Brian King (1):
  tg3: Fix rx hang on MTU change with 5717/5719

Cai Li (1):
  clk: fix a panic error caused by accessing NULL pointer

Chen-Yu Tsai (1):
  clk: sunxi: sun9i-mmc: Implement reset callback for reset controls

Chris Wilson (6):
  drm/i915: Flush pending GTT writes before unbinding
  drm/i915: Drop fb reference on load_detect_pipe failure path
  drm/i915: Stop listening to request resubmission from the signaler kthread
  drm/i915/fence: Use rcu to defer freeing of irq_work
  drm/i915/lpe: Remove double-encapsulation of

Re: [PATCH v20 4/7] virtio-balloon: VIRTIO_BALLOON_F_SG

2017-12-23 Thread Tetsuo Handa

Matthew Wilcox wrote:
> > +   unsigned long pfn = page_to_pfn(page);
> > +   int ret;
> > +
> > +   *pfn_min = min(pfn, *pfn_min);
> > +   *pfn_max = max(pfn, *pfn_max);
> > +
> > +   do {
> > +   if (xb_preload(GFP_NOWAIT | __GFP_NOWARN) < 0)
> > +   return -ENOMEM;
> > +
> > +   ret = xb_set_bit(>page_xb, pfn);
> > +   xb_preload_end();
> > +   } while (unlikely(ret == -EAGAIN));
> 
> OK, so you don't need a spinlock because you're under a mutex?  But you
> can't allocate memory because you're in the balloon driver, and so a
> GFP_KERNEL allocation might recurse into your driver?

Right. We can't (directly or indirectly) depend on __GFP_DIRECT_RECLAIM && 
!__GFP_NORETRY
allocations because the balloon driver needs to handle OOM notifier callback.

>Would GFP_NOIO
> do the job?  I'm a little hazy on exactly how the balloon driver works.

GFP_NOIO implies __GFP_DIRECT_RECLAIM. In the worst case, it can lockup due to
the too small to fail memory allocation rule. GFP_NOIO | __GFP_NORETRY would 
work
if there is really a guarantee that GFP_NOIO | __GFP_NORETRY never depend on
__GFP_DIRECT_RECLAIM && !__GFP_NORETRY allocations, which is too subtle for me 
to
validate. The direct reclaim dependency is too complicated to validate.
I consider that !__GFP_DIRECT_RECLAIM is the future-safe choice.

> 
> If you can't preload with anything better than that, I think that
> xb_set_bit() should attempt an allocation with GFP_NOWAIT | __GFP_NOWARN,
> and then you can skip the preload; it has no value for you.

Yes, that's why I suggest directly using kzalloc() at xb_set_bit().

> 
> > @@ -173,8 +292,15 @@ static unsigned fill_balloon(struct virtio_balloon 
> > *vb, size_t num)
> >  
> > while ((page = balloon_page_pop())) {
> > balloon_page_enqueue(>vb_dev_info, page);
> > +   if (use_sg) {
> > +   if (xb_set_page(vb, page, _min, _max) < 0) {
> > +   __free_page(page);
> > +   continue;
> > +   }
> > +   } else {
> > +   set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
> > +   }
> 
> Is this the right behaviour?

I don't think so. In the worst case, we can set no bit using xb_set_page().

>   If we can't record the page in the xb,
> wouldn't we rather send it across as a single page?
> 

I think that we need to be able to fallback to !use_sg path when OOM.

Re: [PATCH v20 4/7] virtio-balloon: VIRTIO_BALLOON_F_SG

2017-12-23 Thread Tetsuo Handa

Matthew Wilcox wrote:
> > +   unsigned long pfn = page_to_pfn(page);
> > +   int ret;
> > +
> > +   *pfn_min = min(pfn, *pfn_min);
> > +   *pfn_max = max(pfn, *pfn_max);
> > +
> > +   do {
> > +   if (xb_preload(GFP_NOWAIT | __GFP_NOWARN) < 0)
> > +   return -ENOMEM;
> > +
> > +   ret = xb_set_bit(>page_xb, pfn);
> > +   xb_preload_end();
> > +   } while (unlikely(ret == -EAGAIN));
> 
> OK, so you don't need a spinlock because you're under a mutex?  But you
> can't allocate memory because you're in the balloon driver, and so a
> GFP_KERNEL allocation might recurse into your driver?

Right. We can't (directly or indirectly) depend on __GFP_DIRECT_RECLAIM && 
!__GFP_NORETRY
allocations because the balloon driver needs to handle OOM notifier callback.

>Would GFP_NOIO
> do the job?  I'm a little hazy on exactly how the balloon driver works.

GFP_NOIO implies __GFP_DIRECT_RECLAIM. In the worst case, it can lockup due to
the too small to fail memory allocation rule. GFP_NOIO | __GFP_NORETRY would 
work
if there is really a guarantee that GFP_NOIO | __GFP_NORETRY never depend on
__GFP_DIRECT_RECLAIM && !__GFP_NORETRY allocations, which is too subtle for me 
to
validate. The direct reclaim dependency is too complicated to validate.
I consider that !__GFP_DIRECT_RECLAIM is the future-safe choice.

> 
> If you can't preload with anything better than that, I think that
> xb_set_bit() should attempt an allocation with GFP_NOWAIT | __GFP_NOWARN,
> and then you can skip the preload; it has no value for you.

Yes, that's why I suggest directly using kzalloc() at xb_set_bit().

> 
> > @@ -173,8 +292,15 @@ static unsigned fill_balloon(struct virtio_balloon 
> > *vb, size_t num)
> >  
> > while ((page = balloon_page_pop())) {
> > balloon_page_enqueue(>vb_dev_info, page);
> > +   if (use_sg) {
> > +   if (xb_set_page(vb, page, _min, _max) < 0) {
> > +   __free_page(page);
> > +   continue;
> > +   }
> > +   } else {
> > +   set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
> > +   }
> 
> Is this the right behaviour?

I don't think so. In the worst case, we can set no bit using xb_set_page().

>   If we can't record the page in the xb,
> wouldn't we rather send it across as a single page?
> 

I think that we need to be able to fallback to !use_sg path when OOM.

[PATCH 1/3] mtd: spi-nor: add optional DMA-safe bounce buffer for data transfer

2017-12-23 Thread Cyrille Pitchen

This patch has two purposes:

1 - To fix the compatible issue between the MTD and SPI sub-systems

The MTD sub-system has no particular requirement about the memory areas it
uses. Especially, ubifs is well known for using vmalloc'ed buffers, which
then are not DMA-safe. There are reasons behind that, so we have to deal
with it.

On the other hand, the SPI sub-system clearly states in the kernel doc for
'struct spi-transfer' (include/linux/spi/spi.h) that both .tx_buf and
.rx_buf must point into "dma-safe memory". This requirement has not been
taken into account by the m25p80.c driver, at the border between MTD and
SPI, for a long time now. So it's high time to fix this issue.

2 - To help other SPI flash controller drivers to perform DMA transfers

Those controller drivers suffer the same issue as those behind the
m25p80.c driver in the SPI sub-system: They may be provided by the MTD
sub-system with buffers not suited to DMA operations. We want to avoid
each controller to implement its own bounce buffer so we offer them some
optional bounce buffer, allocated and managed by the spi-nor framework
itself, as an helper to add support to DMA transfers.

Then the patch adds two hardware capabilities for SPI flash controllers,
SNOR_HWCAPS_WR_BOUNCE and SNOR_HWCAPS_RD_BOUNCE.

SPI flash controller drivers are supposed to use them to request the
spi-nor framework to allocate an optional bounce buffer in some
DMA-safe memory area then to use it in some cases during (Fast) Read
and/or Page Program operations.

More precisely, the bounce buffer is used if and only if two conditions
are met:
1 - The SPI flash controller driver has declared the
SNOR_HWCAPS_RD_BOUNCE, resp. SNOR_HWCAPS_WR_BOUNCE for (Fast) Read,
resp. Page Program operations.
2 - The buffer provided by the above MTD layer is not already in a
DMA-safe area.

This policy avoid using the bounce buffer when not explicitly requested
or when not needed, hence limiting the performance penalty.

Besides, the bounce buffer is allocated once for all at the very first
time it is actually needed. This means that as long as all buffers
provided by the above MTD layer are allocated in some DMA-safe areas, the
bounce buffer itself is never allocated.

Finally, the bounce buffer size is limited to 256KiB, the currently known
maximum erase sector size. This tradeoff should still provide good
performances even if new memory parts come with even larger erase sector
sizes, limiting the memory footprint at the same time.

Signed-off-by: Cyrille Pitchen 
---
 drivers/mtd/devices/m25p80.c  |   4 +-
 drivers/mtd/spi-nor/spi-nor.c | 136 +++---
 include/linux/mtd/spi-nor.h   |   8 +++
 3 files changed, 139 insertions(+), 9 deletions(-)

diff --git a/drivers/mtd/devices/m25p80.c b/drivers/mtd/devices/m25p80.c
index a4e18f6aaa33..60878c62a654 100644
--- a/drivers/mtd/devices/m25p80.c
+++ b/drivers/mtd/devices/m25p80.c
@@ -239,7 +239,9 @@ static int m25p_probe(struct spi_device *spi)
struct spi_nor_hwcaps hwcaps = {
.mask = SNOR_HWCAPS_READ |
SNOR_HWCAPS_READ_FAST |
-   SNOR_HWCAPS_PP,
+   SNOR_HWCAPS_PP |
+   SNOR_HWCAPS_RD_BOUNCE |
+   SNOR_HWCAPS_WR_BOUNCE,
};
char *flash_name;
int ret;
diff --git a/drivers/mtd/spi-nor/spi-nor.c b/drivers/mtd/spi-nor/spi-nor.c
index 8bafd462f0ae..59f9fbd45234 100644
--- a/drivers/mtd/spi-nor/spi-nor.c
+++ b/drivers/mtd/spi-nor/spi-nor.c
@@ -14,8 +14,10 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -1232,6 +1234,56 @@ static const struct flash_info spi_nor_ids[] = {
{ },
 };
 
+static bool spi_nor_is_dma_safe(const void *buf)
+{
+   if (is_vmalloc_addr(buf))
+   return false;
+
+#ifdef CONFIG_HIGHMEM
+   if ((unsigned long)buf >= PKMAP_BASE &&
+   (unsigned long)buf < (PKMAP_BASE + (LAST_PKMAP * PAGE_SIZE)))
+   return false;
+#endif
+
+   return true;
+}
+
+static int spi_nor_get_bounce_buffer(struct spi_nor *nor,
+u_char **buffer,
+size_t *buffer_size)
+{
+   const struct flash_info *info = nor->info;
+   /*
+* Limit the size of the bounce buffer to 256KB: this is currently
+* the largest known erase sector size (> page size) and should be
+* enough to still reach good performances if some day new memory
+* parts use even larger erase sector sizes.
+*/
+   size_t size = min_t(size_t, info->sector_size, SZ_256K);
+
+   /*
+* Allocate the bounce buffer once for all at the first time it is
+* actually needed. This prevents wasting some precious memory
+* in cases where it would never be needed.
+*/
+   if (unlikely(!nor->bounce_buffer)) {
+

[PATCH 1/3] mtd: spi-nor: add optional DMA-safe bounce buffer for data transfer

2017-12-23 Thread Cyrille Pitchen

This patch has two purposes:

1 - To fix the compatible issue between the MTD and SPI sub-systems

The MTD sub-system has no particular requirement about the memory areas it
uses. Especially, ubifs is well known for using vmalloc'ed buffers, which
then are not DMA-safe. There are reasons behind that, so we have to deal
with it.

On the other hand, the SPI sub-system clearly states in the kernel doc for
'struct spi-transfer' (include/linux/spi/spi.h) that both .tx_buf and
.rx_buf must point into "dma-safe memory". This requirement has not been
taken into account by the m25p80.c driver, at the border between MTD and
SPI, for a long time now. So it's high time to fix this issue.

2 - To help other SPI flash controller drivers to perform DMA transfers

Those controller drivers suffer the same issue as those behind the
m25p80.c driver in the SPI sub-system: They may be provided by the MTD
sub-system with buffers not suited to DMA operations. We want to avoid
each controller to implement its own bounce buffer so we offer them some
optional bounce buffer, allocated and managed by the spi-nor framework
itself, as an helper to add support to DMA transfers.

Then the patch adds two hardware capabilities for SPI flash controllers,
SNOR_HWCAPS_WR_BOUNCE and SNOR_HWCAPS_RD_BOUNCE.

SPI flash controller drivers are supposed to use them to request the
spi-nor framework to allocate an optional bounce buffer in some
DMA-safe memory area then to use it in some cases during (Fast) Read
and/or Page Program operations.

More precisely, the bounce buffer is used if and only if two conditions
are met:
1 - The SPI flash controller driver has declared the
SNOR_HWCAPS_RD_BOUNCE, resp. SNOR_HWCAPS_WR_BOUNCE for (Fast) Read,
resp. Page Program operations.
2 - The buffer provided by the above MTD layer is not already in a
DMA-safe area.

This policy avoid using the bounce buffer when not explicitly requested
or when not needed, hence limiting the performance penalty.

Besides, the bounce buffer is allocated once for all at the very first
time it is actually needed. This means that as long as all buffers
provided by the above MTD layer are allocated in some DMA-safe areas, the
bounce buffer itself is never allocated.

Finally, the bounce buffer size is limited to 256KiB, the currently known
maximum erase sector size. This tradeoff should still provide good
performances even if new memory parts come with even larger erase sector
sizes, limiting the memory footprint at the same time.

Signed-off-by: Cyrille Pitchen 
---
 drivers/mtd/devices/m25p80.c  |   4 +-
 drivers/mtd/spi-nor/spi-nor.c | 136 +++---
 include/linux/mtd/spi-nor.h   |   8 +++
 3 files changed, 139 insertions(+), 9 deletions(-)

diff --git a/drivers/mtd/devices/m25p80.c b/drivers/mtd/devices/m25p80.c
index a4e18f6aaa33..60878c62a654 100644
--- a/drivers/mtd/devices/m25p80.c
+++ b/drivers/mtd/devices/m25p80.c
@@ -239,7 +239,9 @@ static int m25p_probe(struct spi_device *spi)
struct spi_nor_hwcaps hwcaps = {
.mask = SNOR_HWCAPS_READ |
SNOR_HWCAPS_READ_FAST |
-   SNOR_HWCAPS_PP,
+   SNOR_HWCAPS_PP |
+   SNOR_HWCAPS_RD_BOUNCE |
+   SNOR_HWCAPS_WR_BOUNCE,
};
char *flash_name;
int ret;
diff --git a/drivers/mtd/spi-nor/spi-nor.c b/drivers/mtd/spi-nor/spi-nor.c
index 8bafd462f0ae..59f9fbd45234 100644
--- a/drivers/mtd/spi-nor/spi-nor.c
+++ b/drivers/mtd/spi-nor/spi-nor.c
@@ -14,8 +14,10 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -1232,6 +1234,56 @@ static const struct flash_info spi_nor_ids[] = {
{ },
 };
 
+static bool spi_nor_is_dma_safe(const void *buf)
+{
+   if (is_vmalloc_addr(buf))
+   return false;
+
+#ifdef CONFIG_HIGHMEM
+   if ((unsigned long)buf >= PKMAP_BASE &&
+   (unsigned long)buf < (PKMAP_BASE + (LAST_PKMAP * PAGE_SIZE)))
+   return false;
+#endif
+
+   return true;
+}
+
+static int spi_nor_get_bounce_buffer(struct spi_nor *nor,
+u_char **buffer,
+size_t *buffer_size)
+{
+   const struct flash_info *info = nor->info;
+   /*
+* Limit the size of the bounce buffer to 256KB: this is currently
+* the largest known erase sector size (> page size) and should be
+* enough to still reach good performances if some day new memory
+* parts use even larger erase sector sizes.
+*/
+   size_t size = min_t(size_t, info->sector_size, SZ_256K);
+
+   /*
+* Allocate the bounce buffer once for all at the first time it is
+* actually needed. This prevents wasting some precious memory
+* in cases where it would never be needed.
+*/
+   if (unlikely(!nor->bounce_buffer)) {
+   nor->bounce_buffer =

[PATCH 2/3] dt-bindings: mtd: atmel-quadspi: add an optional property 'dmacap,memcpy'

2017-12-23 Thread Cyrille Pitchen

The optional 'dmacap,memcpy' DT property tells the Atmel QSPI controller
driver to reserve some DMA channel then to use it to perform DMA
memcpy() during data transfers. This feature relies on the generic
bounce buffer helper from spi-nor.c.

Signed-off-by: Cyrille Pitchen 
---
 Documentation/devicetree/bindings/mtd/atmel-quadspi.txt | 5 +
 1 file changed, 5 insertions(+)

diff --git a/Documentation/devicetree/bindings/mtd/atmel-quadspi.txt 
b/Documentation/devicetree/bindings/mtd/atmel-quadspi.txt
index b93c1e2f25dd..002d3f0a445b 100644
--- a/Documentation/devicetree/bindings/mtd/atmel-quadspi.txt
+++ b/Documentation/devicetree/bindings/mtd/atmel-quadspi.txt
@@ -12,6 +12,10 @@ Required properties:
 - #address-cells: Should be <1>.
 - #size-cells:Should be <0>.
 
+Optional properties:
+- dmacap,memcpy:  Reserve a DMA channel to perform DMA memcpy() between the
+  system memory and the QSPI mapped memory.
+
 Example:
 
 spi@f002 {
@@ -24,6 +28,7 @@ spi@f002 {
#size-cells = <0>;
pinctrl-names = "default";
pinctrl-0 = <_spi0_default>;
+   dmacap,memcpy;
 
m25p80@0 {
...
-- 
2.11.0

[PATCH 2/3] dt-bindings: mtd: atmel-quadspi: add an optional property 'dmacap,memcpy'

2017-12-23 Thread Cyrille Pitchen

The optional 'dmacap,memcpy' DT property tells the Atmel QSPI controller
driver to reserve some DMA channel then to use it to perform DMA
memcpy() during data transfers. This feature relies on the generic
bounce buffer helper from spi-nor.c.

Signed-off-by: Cyrille Pitchen 
---
 Documentation/devicetree/bindings/mtd/atmel-quadspi.txt | 5 +
 1 file changed, 5 insertions(+)

diff --git a/Documentation/devicetree/bindings/mtd/atmel-quadspi.txt 
b/Documentation/devicetree/bindings/mtd/atmel-quadspi.txt
index b93c1e2f25dd..002d3f0a445b 100644
--- a/Documentation/devicetree/bindings/mtd/atmel-quadspi.txt
+++ b/Documentation/devicetree/bindings/mtd/atmel-quadspi.txt
@@ -12,6 +12,10 @@ Required properties:
 - #address-cells: Should be <1>.
 - #size-cells:Should be <0>.
 
+Optional properties:
+- dmacap,memcpy:  Reserve a DMA channel to perform DMA memcpy() between the
+  system memory and the QSPI mapped memory.
+
 Example:
 
 spi@f002 {
@@ -24,6 +28,7 @@ spi@f002 {
#size-cells = <0>;
pinctrl-names = "default";
pinctrl-0 = <_spi0_default>;
+   dmacap,memcpy;
 
m25p80@0 {
...
-- 
2.11.0

[PATCH 3/3] mtd: atmel-quadspi: add support of DMA memcpy()

2017-12-23 Thread Cyrille Pitchen

This patch takes advantage of the new bounce buffer helper from the
spi-nor framework to add support of memcpy() operations using the DMA
controller.

Since the number of DMA channels is limited and to avoid changing how
those DMA channels are used in existing boards, this new DMA memcpy()
feature is enabled only if the "dmacap,memcpy" boolean property is set in
the device-tree.

Signed-off-by: Cyrille Pitchen 
---
 drivers/mtd/spi-nor/atmel-quadspi.c | 132 +++-
 1 file changed, 129 insertions(+), 3 deletions(-)

diff --git a/drivers/mtd/spi-nor/atmel-quadspi.c 
b/drivers/mtd/spi-nor/atmel-quadspi.c
index 6c5708bacad8..5443e4dba416 100644
--- a/drivers/mtd/spi-nor/atmel-quadspi.c
+++ b/drivers/mtd/spi-nor/atmel-quadspi.c
@@ -22,6 +22,8 @@
 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -36,6 +38,8 @@
 #include 
 #include 
 
+#define QSPI_DMA_THRESHOLD 32
+
 /* QSPI register offsets */
 #define QSPI_CR  0x  /* Control Register */
 #define QSPI_MR  0x0004  /* Mode Register */
@@ -161,6 +165,11 @@ struct atmel_qspi {
struct spi_nor  nor;
u32 clk_rate;
struct completion   cmd_completion;
+
+   /* DMA transfers */
+   struct dma_chan *chan;
+   dma_addr_t  dma_mem;
+   struct completion   transfer_complete;
 };
 
 struct atmel_qspi_command {
@@ -197,11 +206,96 @@ static inline void qspi_writel(struct atmel_qspi *aq, u32 
reg, u32 value)
writel_relaxed(value, aq->regs + reg);
 }
 
+static void atmel_qspi_dma_callback(void *param)
+{
+   struct atmel_qspi *aq = param;
+
+   complete(>transfer_complete);
+}
+
+static int atmel_qspi_run_dma_transfer(struct atmel_qspi *aq,
+  const struct atmel_qspi_command *cmd)
+{
+   struct device *dev = >pdev->dev;
+   enum dma_data_direction dir;
+   struct dma_async_tx_descriptor *desc;
+   dma_addr_t src, dst, dma_addr;
+   dma_cookie_t cookie;
+   u32 offset;
+   void *ptr;
+   int err = 0;
+
+   if (cmd->tx_buf) {
+   ptr = (void *)cmd->tx_buf;
+   dir = DMA_TO_DEVICE;
+   } else if (cmd->rx_buf) {
+   ptr = cmd->rx_buf;
+   dir = DMA_FROM_DEVICE;
+   } else {
+   return -EINVAL;
+   }
+
+   dma_addr = dma_map_single(dev, ptr, cmd->buf_len, dir);
+   if (dma_mapping_error(dev, dma_addr))
+   return -ENOMEM;
+
+   offset = cmd->enable.bits.address ? cmd->address : 0;
+   if (cmd->tx_buf) {
+   dst = aq->dma_mem + offset;
+   src = dma_addr;
+   } else {
+   dst = dma_addr;
+   src = aq->dma_mem + offset;
+   }
+
+   desc = dmaengine_prep_dma_memcpy(aq->chan, dst, src, cmd->buf_len,
+DMA_CTRL_ACK | DMA_PREP_INTERRUPT);
+   if (!desc) {
+   err = -ENOMEM;
+   goto unmap;
+   }
+
+   reinit_completion(>transfer_complete);
+   desc->callback = atmel_qspi_dma_callback;
+   desc->callback_param = aq;
+
+   cookie = dmaengine_submit(desc);
+   err = dma_submit_error(cookie);
+   if (err) {
+   err = -EIO;
+   goto unmap;
+   }
+
+   dma_async_issue_pending(aq->chan);
+   err = wait_for_completion_timeout(>transfer_complete,
+ msecs_to_jiffies(cmd->buf_len));
+   if (err <= 0) {
+   dmaengine_terminate_sync(aq->chan);
+   err = -ETIMEDOUT;
+   goto unmap;
+   }
+
+   err = 0;
+
+ unmap:
+   dma_unmap_single(dev, dma_addr, cmd->buf_len, dir);
+   return err;
+}
+
 static int atmel_qspi_run_transfer(struct atmel_qspi *aq,
   const struct atmel_qspi_command *cmd)
 {
void __iomem *ahb_mem;
 
+   /* Try DMA transfer first. */
+   if (aq->chan && cmd->buf_len >= QSPI_DMA_THRESHOLD) {
+   int err = atmel_qspi_run_dma_transfer(aq, cmd);
+
+   /* If the DMA transfer has started, stop here. */
+   if (err != -ENOMEM)
+   return err;
+   }
+
/* Then fallback to a PIO transfer (memcpy() DOES NOT work!) */
ahb_mem = aq->mem;
if (cmd->enable.bits.address)
@@ -604,7 +698,7 @@ static irqreturn_t atmel_qspi_interrupt(int irq, void 
*dev_id)
 
 static int atmel_qspi_probe(struct platform_device *pdev)
 {
-   const struct spi_nor_hwcaps hwcaps = {
+   struct spi_nor_hwcaps hwcaps = {
.mask = SNOR_HWCAPS_READ |
SNOR_HWCAPS_READ_FAST |
SNOR_HWCAPS_READ_1_1_2 |
@@ -657,19 +751,44 @@ static int atmel_qspi_probe(struct platform_device *pdev)
goto exit;
}
 
+   aq->dma_mem = (dma_addr_t)res->start;
+
+

[PATCH 3/3] mtd: atmel-quadspi: add support of DMA memcpy()

2017-12-23 Thread Cyrille Pitchen

This patch takes advantage of the new bounce buffer helper from the
spi-nor framework to add support of memcpy() operations using the DMA
controller.

Since the number of DMA channels is limited and to avoid changing how
those DMA channels are used in existing boards, this new DMA memcpy()
feature is enabled only if the "dmacap,memcpy" boolean property is set in
the device-tree.

Signed-off-by: Cyrille Pitchen 
---
 drivers/mtd/spi-nor/atmel-quadspi.c | 132 +++-
 1 file changed, 129 insertions(+), 3 deletions(-)

diff --git a/drivers/mtd/spi-nor/atmel-quadspi.c 
b/drivers/mtd/spi-nor/atmel-quadspi.c
index 6c5708bacad8..5443e4dba416 100644
--- a/drivers/mtd/spi-nor/atmel-quadspi.c
+++ b/drivers/mtd/spi-nor/atmel-quadspi.c
@@ -22,6 +22,8 @@
 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -36,6 +38,8 @@
 #include 
 #include 
 
+#define QSPI_DMA_THRESHOLD 32
+
 /* QSPI register offsets */
 #define QSPI_CR  0x  /* Control Register */
 #define QSPI_MR  0x0004  /* Mode Register */
@@ -161,6 +165,11 @@ struct atmel_qspi {
struct spi_nor  nor;
u32 clk_rate;
struct completion   cmd_completion;
+
+   /* DMA transfers */
+   struct dma_chan *chan;
+   dma_addr_t  dma_mem;
+   struct completion   transfer_complete;
 };
 
 struct atmel_qspi_command {
@@ -197,11 +206,96 @@ static inline void qspi_writel(struct atmel_qspi *aq, u32 
reg, u32 value)
writel_relaxed(value, aq->regs + reg);
 }
 
+static void atmel_qspi_dma_callback(void *param)
+{
+   struct atmel_qspi *aq = param;
+
+   complete(>transfer_complete);
+}
+
+static int atmel_qspi_run_dma_transfer(struct atmel_qspi *aq,
+  const struct atmel_qspi_command *cmd)
+{
+   struct device *dev = >pdev->dev;
+   enum dma_data_direction dir;
+   struct dma_async_tx_descriptor *desc;
+   dma_addr_t src, dst, dma_addr;
+   dma_cookie_t cookie;
+   u32 offset;
+   void *ptr;
+   int err = 0;
+
+   if (cmd->tx_buf) {
+   ptr = (void *)cmd->tx_buf;
+   dir = DMA_TO_DEVICE;
+   } else if (cmd->rx_buf) {
+   ptr = cmd->rx_buf;
+   dir = DMA_FROM_DEVICE;
+   } else {
+   return -EINVAL;
+   }
+
+   dma_addr = dma_map_single(dev, ptr, cmd->buf_len, dir);
+   if (dma_mapping_error(dev, dma_addr))
+   return -ENOMEM;
+
+   offset = cmd->enable.bits.address ? cmd->address : 0;
+   if (cmd->tx_buf) {
+   dst = aq->dma_mem + offset;
+   src = dma_addr;
+   } else {
+   dst = dma_addr;
+   src = aq->dma_mem + offset;
+   }
+
+   desc = dmaengine_prep_dma_memcpy(aq->chan, dst, src, cmd->buf_len,
+DMA_CTRL_ACK | DMA_PREP_INTERRUPT);
+   if (!desc) {
+   err = -ENOMEM;
+   goto unmap;
+   }
+
+   reinit_completion(>transfer_complete);
+   desc->callback = atmel_qspi_dma_callback;
+   desc->callback_param = aq;
+
+   cookie = dmaengine_submit(desc);
+   err = dma_submit_error(cookie);
+   if (err) {
+   err = -EIO;
+   goto unmap;
+   }
+
+   dma_async_issue_pending(aq->chan);
+   err = wait_for_completion_timeout(>transfer_complete,
+ msecs_to_jiffies(cmd->buf_len));
+   if (err <= 0) {
+   dmaengine_terminate_sync(aq->chan);
+   err = -ETIMEDOUT;
+   goto unmap;
+   }
+
+   err = 0;
+
+ unmap:
+   dma_unmap_single(dev, dma_addr, cmd->buf_len, dir);
+   return err;
+}
+
 static int atmel_qspi_run_transfer(struct atmel_qspi *aq,
   const struct atmel_qspi_command *cmd)
 {
void __iomem *ahb_mem;
 
+   /* Try DMA transfer first. */
+   if (aq->chan && cmd->buf_len >= QSPI_DMA_THRESHOLD) {
+   int err = atmel_qspi_run_dma_transfer(aq, cmd);
+
+   /* If the DMA transfer has started, stop here. */
+   if (err != -ENOMEM)
+   return err;
+   }
+
/* Then fallback to a PIO transfer (memcpy() DOES NOT work!) */
ahb_mem = aq->mem;
if (cmd->enable.bits.address)
@@ -604,7 +698,7 @@ static irqreturn_t atmel_qspi_interrupt(int irq, void 
*dev_id)
 
 static int atmel_qspi_probe(struct platform_device *pdev)
 {
-   const struct spi_nor_hwcaps hwcaps = {
+   struct spi_nor_hwcaps hwcaps = {
.mask = SNOR_HWCAPS_READ |
SNOR_HWCAPS_READ_FAST |
SNOR_HWCAPS_READ_1_1_2 |
@@ -657,19 +751,44 @@ static int atmel_qspi_probe(struct platform_device *pdev)
goto exit;
}
 
+   aq->dma_mem = (dma_addr_t)res->start;
+
+   /* Reserve a DMA channel for

[PATCH 0/3] mtd: spi-nor: fix DMA-unsafe buffer issue between MTD and SPI

2017-12-23 Thread Cyrille Pitchen

Hi all,

this series tries to solve a long time issue of compatibility between the
MTD and SPI sub-systems about whether we should use DMA-safe memory.

This issue is visible espcecially when using a UBI file-system on a SPI
NOR memory accessed through a SPI controller behind the m25p80 driver.

The SPI sub-system has already implemented a work-around on its side,
based on the spi_map_buf() function. However this function has its own
limitation too. Especially, even if it builds a 'struct scatterlist' from
a vmalloc'ed buffer, calling dma_map_sg() is still not safe on all
architectures. Especially, on ARM cores using either VIPT or VIVT data
caches, dma_map_sg() doesn't take the cache aliases issue into account.
Then numerous crashes were reported for such architectures.

If think it's high time to provide a reliable solution. So let's try to
work together!

The proposed solution here is based on an former series from Vignesh R and
relies on a bounce buffer.

For this series, I will need some pieces of advice on how to implement
a reliable [spi_nor_]is_dma_safe() function. The proposed implementation
in patch 1 is based on the tests performed by spi_map_buf() but I was
wondring whether it would be more cautious to consider:
DMA-safe <=> allocated by kmalloc'ed.

Actually, it's better using the bounce buffer when not needed than not
using it when we should.

Also the bounce buffer solution in spi-nor is designed so it could be
used as an helper so spi flash controller drivers other than m25p80.c
could now easily add support to DMA transfers for (Fast) Read and/or
Page Program operations.

I've implemented and tested it on a sama5d2 xplained board + Macronix
mx25l25673g SPI NOR memory reading from and writing into some UBI
file-system. I've also tested with mtd_debug to write then read back
a raw data into the SPI NOR memory, later checking the data integrity with
sha1sum.

For the atmel-quadspi.c driver, DMA memcpy() transfers are enabled only if
the "dmacap,mempcy" boolean property is set in the device-tree.
I found this name in some other device-trees already using it for a
boolean property.

Best regards,

Cyrille

Cyrille Pitchen (3):
  mtd: spi-nor: add optional DMA-safe bounce buffer for data transfer
  dt-bindings: mtd: atmel-quadspi: add an optional property
'dmacap,memcpy'
  mtd: atmel-quadspi: add support of DMA memcpy()

 .../devicetree/bindings/mtd/atmel-quadspi.txt  |   5 +
 drivers/mtd/devices/m25p80.c   |   4 +-
 drivers/mtd/spi-nor/atmel-quadspi.c| 132 +++-
 drivers/mtd/spi-nor/spi-nor.c  | 136 +++--
 include/linux/mtd/spi-nor.h|   8 ++
 5 files changed, 273 insertions(+), 12 deletions(-)

-- 
2.11.0

[PATCH 0/3] mtd: spi-nor: fix DMA-unsafe buffer issue between MTD and SPI

2017-12-23 Thread Cyrille Pitchen

Hi all,

this series tries to solve a long time issue of compatibility between the
MTD and SPI sub-systems about whether we should use DMA-safe memory.

This issue is visible espcecially when using a UBI file-system on a SPI
NOR memory accessed through a SPI controller behind the m25p80 driver.

The SPI sub-system has already implemented a work-around on its side,
based on the spi_map_buf() function. However this function has its own
limitation too. Especially, even if it builds a 'struct scatterlist' from
a vmalloc'ed buffer, calling dma_map_sg() is still not safe on all
architectures. Especially, on ARM cores using either VIPT or VIVT data
caches, dma_map_sg() doesn't take the cache aliases issue into account.
Then numerous crashes were reported for such architectures.

If think it's high time to provide a reliable solution. So let's try to
work together!

The proposed solution here is based on an former series from Vignesh R and
relies on a bounce buffer.

For this series, I will need some pieces of advice on how to implement
a reliable [spi_nor_]is_dma_safe() function. The proposed implementation
in patch 1 is based on the tests performed by spi_map_buf() but I was
wondring whether it would be more cautious to consider:
DMA-safe <=> allocated by kmalloc'ed.

Actually, it's better using the bounce buffer when not needed than not
using it when we should.

Also the bounce buffer solution in spi-nor is designed so it could be
used as an helper so spi flash controller drivers other than m25p80.c
could now easily add support to DMA transfers for (Fast) Read and/or
Page Program operations.

I've implemented and tested it on a sama5d2 xplained board + Macronix
mx25l25673g SPI NOR memory reading from and writing into some UBI
file-system. I've also tested with mtd_debug to write then read back
a raw data into the SPI NOR memory, later checking the data integrity with
sha1sum.

For the atmel-quadspi.c driver, DMA memcpy() transfers are enabled only if
the "dmacap,mempcy" boolean property is set in the device-tree.
I found this name in some other device-trees already using it for a
boolean property.

Best regards,

Cyrille

Cyrille Pitchen (3):
  mtd: spi-nor: add optional DMA-safe bounce buffer for data transfer
  dt-bindings: mtd: atmel-quadspi: add an optional property
'dmacap,memcpy'
  mtd: atmel-quadspi: add support of DMA memcpy()

 .../devicetree/bindings/mtd/atmel-quadspi.txt  |   5 +
 drivers/mtd/devices/m25p80.c   |   4 +-
 drivers/mtd/spi-nor/atmel-quadspi.c| 132 +++-
 drivers/mtd/spi-nor/spi-nor.c  | 136 +++--
 include/linux/mtd/spi-nor.h|   8 ++
 5 files changed, 273 insertions(+), 12 deletions(-)

-- 
2.11.0

[PATCH] b43: Replace mdelay with msleep in b43_radio_2057_init_post

2017-12-23 Thread Jia-Ju Bai

b43_radio_2057_init_post is not called in an interrupt handler 
nor holding a spinlock.
The function mdelay in it can be replaced with msleep, to reduce busy wait.

Signed-off-by: Jia-Ju Bai 
---
 drivers/net/wireless/broadcom/b43/phy_n.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/broadcom/b43/phy_n.c 
b/drivers/net/wireless/broadcom/b43/phy_n.c
index a5557d7..5bc838e 100644
--- a/drivers/net/wireless/broadcom/b43/phy_n.c
+++ b/drivers/net/wireless/broadcom/b43/phy_n.c
@@ -1031,7 +1031,7 @@ static void b43_radio_2057_init_post(struct b43_wldev 
*dev)
 
b43_radio_set(dev, R2057_RFPLL_MISC_CAL_RESETN, 0x78);
b43_radio_set(dev, R2057_XTAL_CONFIG2, 0x80);
-   mdelay(2);
+   msleep(2);
b43_radio_mask(dev, R2057_RFPLL_MISC_CAL_RESETN, ~0x78);
b43_radio_mask(dev, R2057_XTAL_CONFIG2, ~0x80);
 
-- 
1.7.9.5

[PATCH] b43: Replace mdelay with msleep in b43_radio_2057_init_post

2017-12-23 Thread Jia-Ju Bai

b43_radio_2057_init_post is not called in an interrupt handler 
nor holding a spinlock.
The function mdelay in it can be replaced with msleep, to reduce busy wait.

Signed-off-by: Jia-Ju Bai 
---
 drivers/net/wireless/broadcom/b43/phy_n.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/broadcom/b43/phy_n.c 
b/drivers/net/wireless/broadcom/b43/phy_n.c
index a5557d7..5bc838e 100644
--- a/drivers/net/wireless/broadcom/b43/phy_n.c
+++ b/drivers/net/wireless/broadcom/b43/phy_n.c
@@ -1031,7 +1031,7 @@ static void b43_radio_2057_init_post(struct b43_wldev 
*dev)
 
b43_radio_set(dev, R2057_RFPLL_MISC_CAL_RESETN, 0x78);
b43_radio_set(dev, R2057_XTAL_CONFIG2, 0x80);
-   mdelay(2);
+   msleep(2);
b43_radio_mask(dev, R2057_RFPLL_MISC_CAL_RESETN, ~0x78);
b43_radio_mask(dev, R2057_XTAL_CONFIG2, ~0x80);
 
-- 
1.7.9.5

[PATCH] sky2: Replace mdelay with msleep in sky2_vpd_wait

2017-12-23 Thread Jia-Ju Bai

sky2_vpd_wait is not called in an interrupt handler nor holding a spinlock.
The function mdelay in it can be replaced with msleep, to reduce busy wait.

Signed-off-by: Jia-Ju Bai 
---
 drivers/net/ethernet/marvell/sky2.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/marvell/sky2.c 
b/drivers/net/ethernet/marvell/sky2.c
index 9efe177..9fe8530 100644
--- a/drivers/net/ethernet/marvell/sky2.c
+++ b/drivers/net/ethernet/marvell/sky2.c
@@ -4287,7 +4287,7 @@ static int sky2_vpd_wait(const struct sky2_hw *hw, int 
cap, u16 busy)
dev_err(>pdev->dev, "VPD cycle timed out\n");
return -ETIMEDOUT;
}
-   mdelay(1);
+   msleep(1);
}
 
return 0;
-- 
1.7.9.5

[PATCH] sky2: Replace mdelay with msleep in sky2_vpd_wait

2017-12-23 Thread Jia-Ju Bai

sky2_vpd_wait is not called in an interrupt handler nor holding a spinlock.
The function mdelay in it can be replaced with msleep, to reduce busy wait.

Signed-off-by: Jia-Ju Bai 
---
 drivers/net/ethernet/marvell/sky2.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/marvell/sky2.c 
b/drivers/net/ethernet/marvell/sky2.c
index 9efe177..9fe8530 100644
--- a/drivers/net/ethernet/marvell/sky2.c
+++ b/drivers/net/ethernet/marvell/sky2.c
@@ -4287,7 +4287,7 @@ static int sky2_vpd_wait(const struct sky2_hw *hw, int 
cap, u16 busy)
dev_err(>pdev->dev, "VPD cycle timed out\n");
return -ETIMEDOUT;
}
-   mdelay(1);
+   msleep(1);
}
 
return 0;
-- 
1.7.9.5

Re: [linus:master] BUILD REGRESSION d1f854ac240ea3928a99294390048e9b2aa6fa0e

2017-12-23 Thread Linus Torvalds

On Sat, Dec 23, 2017 at 4:28 PM, kbuild test robot
 wrote:
>
> Regressions in current branch:

This looks more like some odd compiler regression than a kernel one.

   Linus

Re: [linus:master] BUILD REGRESSION d1f854ac240ea3928a99294390048e9b2aa6fa0e

2017-12-23 Thread Linus Torvalds

On Sat, Dec 23, 2017 at 4:28 PM, kbuild test robot
 wrote:
>
> Regressions in current branch:

This looks more like some odd compiler regression than a kernel one.

   Linus

Re: PROBLEM: consolidated IDT invalidation causes kexec to reboot

2017-12-23 Thread Linus Torvalds

On Sat, Dec 23, 2017 at 5:44 PM, Alexandru Chirvasitu
 wrote:
>
> For testing purposes, I've altered machine_kexec_32.c making the
> following toy commit. It naively undoes part of e802a51, solely to
> confirm that's where it goes awry in my setup.

That's really funky.

The idt_invalidate() seems to do *exactly* the same thing. It uses
"load_idt()" on an IDT with size 0, and the supplied address.

Can you disassemble your "set_idt()" code vs the "idt_invalidate()"?

> Is this expected behaviour?

No. The code literally seems identical. The only difference is

 (a) where the 0 limit comes from

 (b) perhaps build flags and whether it is inlined or not due to being
in a different file

and neither of those should matter, but maybe they do.

Which is why I'd like you to actually look at the generated code and
see if you can see any difference..

Linus

Re: PROBLEM: consolidated IDT invalidation causes kexec to reboot

2017-12-23 Thread Linus Torvalds

On Sat, Dec 23, 2017 at 5:44 PM, Alexandru Chirvasitu
 wrote:
>
> For testing purposes, I've altered machine_kexec_32.c making the
> following toy commit. It naively undoes part of e802a51, solely to
> confirm that's where it goes awry in my setup.

That's really funky.

The idt_invalidate() seems to do *exactly* the same thing. It uses
"load_idt()" on an IDT with size 0, and the supplied address.

Can you disassemble your "set_idt()" code vs the "idt_invalidate()"?

> Is this expected behaviour?

No. The code literally seems identical. The only difference is

 (a) where the 0 limit comes from

 (b) perhaps build flags and whether it is inlined or not due to being
in a different file

and neither of those should matter, but maybe they do.

Which is why I'd like you to actually look at the generated code and
see if you can see any difference..

Linus

Re: PROBLEM: 4.15.0-rc3 APIC causes lockups on Core 2 Duo laptop

2017-12-23 Thread Dou Liyang


Hi Thomas,

At 12/23/2017 09:32 PM, Thomas Gleixner wrote:
[...]


The BUG_ON panic happens at line 147:
BUG_ON(!test_and_clear_bit(bit, cm->alloc_map));

I'm sure Thomas and Dou know it better than me.


I'll have a look after the holidays.



Merry Christmas!  :-)

I am trying to look into it.

Thanks,
dou

Re: PROBLEM: 4.15.0-rc3 APIC causes lockups on Core 2 Duo laptop

2017-12-23 Thread Dou Liyang


Hi Thomas,

At 12/23/2017 09:32 PM, Thomas Gleixner wrote:
[...]


The BUG_ON panic happens at line 147:
BUG_ON(!test_and_clear_bit(bit, cm->alloc_map));

I'm sure Thomas and Dou know it better than me.


I'll have a look after the holidays.



Merry Christmas!  :-)

I am trying to look into it.

Thanks,
dou

Re: [PATCH] zsmalloc: use U suffix for negative literals being shifted

2017-12-23 Thread Nick Desaulniers

On Sat, Dec 23, 2017 at 10:24 PM, Matthew Wilcox  wrote:
> On Sat, Dec 23, 2017 at 09:33:40PM -0500, Nick Desaulniers wrote:
>> Fixes warnings about shifting unsigned literals being undefined
>> behavior.
>
> Do you mean signed literals?


A sorry, s/unsigned/negative signed/g.  The warning is:

mm/zsmalloc.c:1059:20: warning: shifting a negative signed value is undefined
  [-Wshift-negative-value]
link->next = -1 << OBJ_TAG_BITS;
 ~~ ^

>
>>*/
>> - link->next = -1 << OBJ_TAG_BITS;
>> + link->next = -1U << OBJ_TAG_BITS;
>>   }
>
> I don't understand what -1U means.  Seems like a contradiction in terms,
> a negative unsigned number.  Is this supposed to be ~0U?

$ ag \\-1U[^L]

The code base is full of that literal.  I think of it as:

(unsigned) -1

Re: [PATCH] zsmalloc: use U suffix for negative literals being shifted

2017-12-23 Thread Nick Desaulniers

On Sat, Dec 23, 2017 at 10:24 PM, Matthew Wilcox  wrote:
> On Sat, Dec 23, 2017 at 09:33:40PM -0500, Nick Desaulniers wrote:
>> Fixes warnings about shifting unsigned literals being undefined
>> behavior.
>
> Do you mean signed literals?


A sorry, s/unsigned/negative signed/g.  The warning is:

mm/zsmalloc.c:1059:20: warning: shifting a negative signed value is undefined
  [-Wshift-negative-value]
link->next = -1 << OBJ_TAG_BITS;
 ~~ ^

>
>>*/
>> - link->next = -1 << OBJ_TAG_BITS;
>> + link->next = -1U << OBJ_TAG_BITS;
>>   }
>
> I don't understand what -1U means.  Seems like a contradiction in terms,
> a negative unsigned number.  Is this supposed to be ~0U?

$ ag \\-1U[^L]

The code base is full of that literal.  I think of it as:

(unsigned) -1

precedence bug in MAKE_PROCESS_CPUCLOCK macro?

2017-12-23 Thread Nick Desaulniers

I'm seeing the following warning compiling with Clang:

kernel/time/posix-cpu-timers.c:1397:29: warning: shifting a negative
signed value is undefined
  [-Wshift-negative-value]
return posix_cpu_clock_get(THREAD_CLOCK, tp);
   ^~~~
kernel/time/posix-cpu-timers.c:1367:22: note: expanded from macro 'THREAD_CLOCK'
#define THREAD_CLOCKMAKE_THREAD_CPUCLOCK(0, CPUCLOCK_SCHED)
^~~
./include/linux/posix-timers.h:48:2: note: expanded from macro
'MAKE_THREAD_CPUCLOCK'
MAKE_PROCESS_CPUCLOCK((tid), (clock) | CPUCLOCK_PERTHREAD_MASK)
^~~
./include/linux/posix-timers.h:46:23: note: expanded from macro
'MAKE_PROCESS_CPUCLOCK'
((~(clockid_t) (pid) << 3) | (clockid_t) (clock))
  ~~ ^

If I understand C's operator precedence rules
(http://en.cppreference.com/w/c/language/operator_precedence)
correctly, then I suspect the problem is in the sub-expression:

(~(clockid_t) (pid) << 3)

where pid (an argument to the macro) is first cast to a clockid_t (aka
[signed] int), then negated, then shifted by 3 (oops, undefined
behavior).

Should the result after negation be cast to an unsigned int, or should
the left shift happen before negation?

CPUCLOCK_PID and CLOCKID_TO_FD seem to shift then negate, but
FD_TO_CLOCKID seems to have the same issue as MAKE_PROCESS_CPUCLOCK.

Changing the sub-expression to:

(~(clockid_t) ((pid) << 3))

changes what it evaluates to.  Changing it to:

(~(unsigned) (pid) << 3)

or

((unsigned) ~(clockid_t) (pid) << 3)

or

(((unsigned) ~(clockid_t) (pid)) << 3) /* ugly */

does not.

I'm happy to send a patch with your suggestion.

precedence bug in MAKE_PROCESS_CPUCLOCK macro?

2017-12-23 Thread Nick Desaulniers

I'm seeing the following warning compiling with Clang:

kernel/time/posix-cpu-timers.c:1397:29: warning: shifting a negative
signed value is undefined
  [-Wshift-negative-value]
return posix_cpu_clock_get(THREAD_CLOCK, tp);
   ^~~~
kernel/time/posix-cpu-timers.c:1367:22: note: expanded from macro 'THREAD_CLOCK'
#define THREAD_CLOCKMAKE_THREAD_CPUCLOCK(0, CPUCLOCK_SCHED)
^~~
./include/linux/posix-timers.h:48:2: note: expanded from macro
'MAKE_THREAD_CPUCLOCK'
MAKE_PROCESS_CPUCLOCK((tid), (clock) | CPUCLOCK_PERTHREAD_MASK)
^~~
./include/linux/posix-timers.h:46:23: note: expanded from macro
'MAKE_PROCESS_CPUCLOCK'
((~(clockid_t) (pid) << 3) | (clockid_t) (clock))
  ~~ ^

If I understand C's operator precedence rules
(http://en.cppreference.com/w/c/language/operator_precedence)
correctly, then I suspect the problem is in the sub-expression:

(~(clockid_t) (pid) << 3)

where pid (an argument to the macro) is first cast to a clockid_t (aka
[signed] int), then negated, then shifted by 3 (oops, undefined
behavior).

Should the result after negation be cast to an unsigned int, or should
the left shift happen before negation?

CPUCLOCK_PID and CLOCKID_TO_FD seem to shift then negate, but
FD_TO_CLOCKID seems to have the same issue as MAKE_PROCESS_CPUCLOCK.

Changing the sub-expression to:

(~(clockid_t) ((pid) << 3))

changes what it evaluates to.  Changing it to:

(~(unsigned) (pid) << 3)

or

((unsigned) ~(clockid_t) (pid) << 3)

or

(((unsigned) ~(clockid_t) (pid)) << 3) /* ugly */

does not.

I'm happy to send a patch with your suggestion.

Re: [PATCH] zsmalloc: use U suffix for negative literals being shifted

2017-12-23 Thread Matthew Wilcox

On Sat, Dec 23, 2017 at 09:33:40PM -0500, Nick Desaulniers wrote:
> Fixes warnings about shifting unsigned literals being undefined
> behavior.

Do you mean signed literals?

>*/
> - link->next = -1 << OBJ_TAG_BITS;
> + link->next = -1U << OBJ_TAG_BITS;
>   }

I don't understand what -1U means.  Seems like a contradiction in terms,
a negative unsigned number.  Is this supposed to be ~0U?

Re: [PATCH] zsmalloc: use U suffix for negative literals being shifted

2017-12-23 Thread Matthew Wilcox

On Sat, Dec 23, 2017 at 09:33:40PM -0500, Nick Desaulniers wrote:
> Fixes warnings about shifting unsigned literals being undefined
> behavior.

Do you mean signed literals?

>*/
> - link->next = -1 << OBJ_TAG_BITS;
> + link->next = -1U << OBJ_TAG_BITS;
>   }

I don't understand what -1U means.  Seems like a contradiction in terms,
a negative unsigned number.  Is this supposed to be ~0U?

Re: [PATCH v20 4/7] virtio-balloon: VIRTIO_BALLOON_F_SG

2017-12-23 Thread Matthew Wilcox

On Tue, Dec 19, 2017 at 08:17:56PM +0800, Wei Wang wrote:
> +/*
> + * Send balloon pages in sgs to host. The balloon pages are recorded in the
> + * page xbitmap. Each bit in the bitmap corresponds to a page of PAGE_SIZE.
> + * The page xbitmap is searched for continuous "1" bits, which correspond
> + * to continuous pages, to chunk into sgs.
> + *
> + * @page_xb_start and @page_xb_end form the range of bits in the xbitmap that
> + * need to be searched.
> + */
> +static void tell_host_sgs(struct virtio_balloon *vb,
> +   struct virtqueue *vq,
> +   unsigned long page_xb_start,
> +   unsigned long page_xb_end)

I'm not crazy about the naming here.  I'd use pfn_min and pfn_max like
you use in the caller.

> +{
> + unsigned long pfn_start, pfn_end;
> + uint32_t max_len = round_down(UINT_MAX, PAGE_SIZE);
> + uint64_t len;
> +
> + pfn_start = page_xb_start;

And I think pfn_start is actually just 'pfn'.

'pfn_end' is perhaps just 'end'.  Or 'gap'.

> + while (pfn_start < page_xb_end) {
> + pfn_start = xb_find_set(>page_xb, page_xb_end + 1,
> + pfn_start);
> + if (pfn_start == page_xb_end + 1)
> + break;
> + pfn_end = xb_find_zero(>page_xb, page_xb_end + 1,
> +pfn_start);
> + len = (pfn_end - pfn_start) << PAGE_SHIFT;

> +static inline int xb_set_page(struct virtio_balloon *vb,
> +struct page *page,
> +unsigned long *pfn_min,
> +unsigned long *pfn_max)
> +{

I really don't like it that you're naming things after the 'xb'.
Things should be named by something that makes sense to the user, not
after the particular implementation.  If you changed the underlying
representation from an xbitmap to, say, a BTree, you wouldn't want to
rename this function to 'btree_set_page'.  Maybe this function is really
"vb_set_page".  Or "record_page".  Or something.  Someone who understands
this driver better than I do can probably weigh in with a better name.

> + unsigned long pfn = page_to_pfn(page);
> + int ret;
> +
> + *pfn_min = min(pfn, *pfn_min);
> + *pfn_max = max(pfn, *pfn_max);
> +
> + do {
> + if (xb_preload(GFP_NOWAIT | __GFP_NOWARN) < 0)
> + return -ENOMEM;
> +
> + ret = xb_set_bit(>page_xb, pfn);
> + xb_preload_end();
> + } while (unlikely(ret == -EAGAIN));

OK, so you don't need a spinlock because you're under a mutex?  But you
can't allocate memory because you're in the balloon driver, and so a
GFP_KERNEL allocation might recurse into your driver?  Would GFP_NOIO
do the job?  I'm a little hazy on exactly how the balloon driver works.

If you can't preload with anything better than that, I think that
xb_set_bit() should attempt an allocation with GFP_NOWAIT | __GFP_NOWARN,
and then you can skip the preload; it has no value for you.

> @@ -173,8 +292,15 @@ static unsigned fill_balloon(struct virtio_balloon *vb, 
> size_t num)
>  
>   while ((page = balloon_page_pop())) {
>   balloon_page_enqueue(>vb_dev_info, page);
> + if (use_sg) {
> + if (xb_set_page(vb, page, _min, _max) < 0) {
> + __free_page(page);
> + continue;
> + }
> + } else {
> + set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
> + }

Is this the right behaviour?  If we can't record the page in the xb,
wouldn't we rather send it across as a single page?

Re: [PATCH v20 4/7] virtio-balloon: VIRTIO_BALLOON_F_SG

2017-12-23 Thread Matthew Wilcox

On Tue, Dec 19, 2017 at 08:17:56PM +0800, Wei Wang wrote:
> +/*
> + * Send balloon pages in sgs to host. The balloon pages are recorded in the
> + * page xbitmap. Each bit in the bitmap corresponds to a page of PAGE_SIZE.
> + * The page xbitmap is searched for continuous "1" bits, which correspond
> + * to continuous pages, to chunk into sgs.
> + *
> + * @page_xb_start and @page_xb_end form the range of bits in the xbitmap that
> + * need to be searched.
> + */
> +static void tell_host_sgs(struct virtio_balloon *vb,
> +   struct virtqueue *vq,
> +   unsigned long page_xb_start,
> +   unsigned long page_xb_end)

I'm not crazy about the naming here.  I'd use pfn_min and pfn_max like
you use in the caller.

> +{
> + unsigned long pfn_start, pfn_end;
> + uint32_t max_len = round_down(UINT_MAX, PAGE_SIZE);
> + uint64_t len;
> +
> + pfn_start = page_xb_start;

And I think pfn_start is actually just 'pfn'.

'pfn_end' is perhaps just 'end'.  Or 'gap'.

> + while (pfn_start < page_xb_end) {
> + pfn_start = xb_find_set(>page_xb, page_xb_end + 1,
> + pfn_start);
> + if (pfn_start == page_xb_end + 1)
> + break;
> + pfn_end = xb_find_zero(>page_xb, page_xb_end + 1,
> +pfn_start);
> + len = (pfn_end - pfn_start) << PAGE_SHIFT;

> +static inline int xb_set_page(struct virtio_balloon *vb,
> +struct page *page,
> +unsigned long *pfn_min,
> +unsigned long *pfn_max)
> +{

I really don't like it that you're naming things after the 'xb'.
Things should be named by something that makes sense to the user, not
after the particular implementation.  If you changed the underlying
representation from an xbitmap to, say, a BTree, you wouldn't want to
rename this function to 'btree_set_page'.  Maybe this function is really
"vb_set_page".  Or "record_page".  Or something.  Someone who understands
this driver better than I do can probably weigh in with a better name.

> + unsigned long pfn = page_to_pfn(page);
> + int ret;
> +
> + *pfn_min = min(pfn, *pfn_min);
> + *pfn_max = max(pfn, *pfn_max);
> +
> + do {
> + if (xb_preload(GFP_NOWAIT | __GFP_NOWARN) < 0)
> + return -ENOMEM;
> +
> + ret = xb_set_bit(>page_xb, pfn);
> + xb_preload_end();
> + } while (unlikely(ret == -EAGAIN));

OK, so you don't need a spinlock because you're under a mutex?  But you
can't allocate memory because you're in the balloon driver, and so a
GFP_KERNEL allocation might recurse into your driver?  Would GFP_NOIO
do the job?  I'm a little hazy on exactly how the balloon driver works.

If you can't preload with anything better than that, I think that
xb_set_bit() should attempt an allocation with GFP_NOWAIT | __GFP_NOWARN,
and then you can skip the preload; it has no value for you.

> @@ -173,8 +292,15 @@ static unsigned fill_balloon(struct virtio_balloon *vb, 
> size_t num)
>  
>   while ((page = balloon_page_pop())) {
>   balloon_page_enqueue(>vb_dev_info, page);
> + if (use_sg) {
> + if (xb_set_page(vb, page, _min, _max) < 0) {
> + __free_page(page);
> + continue;
> + }
> + } else {
> + set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
> + }

Is this the right behaviour?  If we can't record the page in the xb,
wouldn't we rather send it across as a single page?

[PATCH] pid: Handle failure to allocate the first pid in a pid namespace

2017-12-23 Thread Eric W. Biederman


With the replacement of the pid bitmap and hashtable with an idr in
alloc_pid started occassionally failing when allocating the first pid
in a pid namespace.  Things were not completely reset resulting in
the first allocated pid getting the number 2 (not 1).  Which
further resulted in ns->proc_mnt not getting set and eventually
causing an oops in proc_flush_task.

Oops:  [#1] SMP
CPU: 2 PID: 6743 Comm: trinity-c117 Not tainted 4.15.0-rc4-think+ #2
RIP: 0010:proc_flush_task+0x8e/0x1b0
RSP: 0018:c9000bbffc40 EFLAGS: 00010286
RAX: 0001 RBX: 0001 RCX: fffb
RDX:  RSI: c9000bbffc50 RDI: 
RBP: c9000bbffc63 R08:  R09: 0002
R10: c9000bbffb70 R11: c9000bbffc64 R12: 0003
R13:  R14: 0003 R15: 8804c10d7840
FS:  7f7cb8965700() GS:88050a20() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2:  CR3: 0003e21ae003 CR4: 001606e0
DR0: 7fb1d6c22000 DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0600
Call Trace:
 ? release_task+0xaf/0x680
 release_task+0xd2/0x680
 ? wait_consider_task+0xb82/0xce0
 wait_consider_task+0xbe9/0xce0
 ? do_wait+0xe1/0x330
 do_wait+0x151/0x330
 kernel_wait4+0x8d/0x150
 ? task_stopped_code+0x50/0x50
 SYSC_wait4+0x95/0xa0
 ? rcu_read_lock_sched_held+0x6c/0x80
 ? syscall_trace_enter+0x2d7/0x340
 ? do_syscall_64+0x60/0x210
 do_syscall_64+0x60/0x210
 entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:0x7f7cb82603aa
RSP: 002b:7ffd60770bc8 EFLAGS: 0246
 ORIG_RAX: 003d
RAX: ffda RBX: 7f7cb6cd4000 RCX: 7f7cb82603aa
RDX: 000b RSI: 7ffd60770bd0 RDI: 7cca
RBP: 7cca R08: 7f7cb8965700 R09: 7ffd607c7080
R10:  R11: 0246 R12: 
R13: 7ffd60770bd0 R14: 7f7cb6cd4058 R15: cccd
Code: c1 e2 04 44 8b 60 30 48 8b 40 38 44 8b 34 11 48 c7 c2 60 3a f5 81 44 89 
e1 4c 8b 68 58 e8 4b b4 77 00 89 44 24 14 48 8d 74 24 10 <49> 8b 7d 00 e8 b9 6a 
f9 ff 48 85 c0 74 1a 48 89 c7 48 89 44 24
RIP: proc_flush_task+0x8e/0x1b0 RSP: c9000bbffc40
CR2: 
---[ end trace 53d67a6481059862 ]---

Improve the quality of the implementation by resetting the place to
start allocating pids on failure to allocate the first pid.

As improving the quality of the implementation is the goal remove the now
unnecesarry disable_pid_allocations call when we fail to mount proc.

Fixes: 95846ecf9dac ("pid: replace pid bitmap implementation with IDR API")
Fixes: 8ef047aaaeb8 ("pid namespaces: make alloc_pid(), free_pid() and 
put_pid() work with struct upid")
Reported-by: Dave Jones 
Signed-off-by: "Eric W. Biederman" 
---

I have tested this by forcing an allocation failure after allocating pid
1, and the symptoms are identical to what Dave Jones has observed.

I have further confirmed that this change causes the failure to go away.

If no one objects I will send a pull request to Linus in the next day or two.

 kernel/pid.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/kernel/pid.c b/kernel/pid.c
index b13b624e2c49..1e8bb6550ec4 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -193,10 +193,8 @@ struct pid *alloc_pid(struct pid_namespace *ns)
}
 
if (unlikely(is_child_reaper(pid))) {
-   if (pid_ns_prepare_proc(ns)) {
-   disable_pid_allocation(ns);
+   if (pid_ns_prepare_proc(ns))
goto out_free;
-   }
}
 
get_pid_ns(ns);
@@ -226,6 +224,10 @@ struct pid *alloc_pid(struct pid_namespace *ns)
while (++i <= ns->level)
idr_remove(>idr, (pid->numbers + i)->nr);
 
+   /* On failure to allocate the first pid, reset the state */
+   if (ns->pid_allocated == PIDNS_ADDING)
+   idr_set_cursor(>idr, 0);
+
spin_unlock_irq(_lock);
 
kmem_cache_free(ns->pid_cachep, pid);
-- 
2.14.1

[PATCH] pid: Handle failure to allocate the first pid in a pid namespace

2017-12-23 Thread Eric W. Biederman


With the replacement of the pid bitmap and hashtable with an idr in
alloc_pid started occassionally failing when allocating the first pid
in a pid namespace.  Things were not completely reset resulting in
the first allocated pid getting the number 2 (not 1).  Which
further resulted in ns->proc_mnt not getting set and eventually
causing an oops in proc_flush_task.

Oops:  [#1] SMP
CPU: 2 PID: 6743 Comm: trinity-c117 Not tainted 4.15.0-rc4-think+ #2
RIP: 0010:proc_flush_task+0x8e/0x1b0
RSP: 0018:c9000bbffc40 EFLAGS: 00010286
RAX: 0001 RBX: 0001 RCX: fffb
RDX:  RSI: c9000bbffc50 RDI: 
RBP: c9000bbffc63 R08:  R09: 0002
R10: c9000bbffb70 R11: c9000bbffc64 R12: 0003
R13:  R14: 0003 R15: 8804c10d7840
FS:  7f7cb8965700() GS:88050a20() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2:  CR3: 0003e21ae003 CR4: 001606e0
DR0: 7fb1d6c22000 DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0600
Call Trace:
 ? release_task+0xaf/0x680
 release_task+0xd2/0x680
 ? wait_consider_task+0xb82/0xce0
 wait_consider_task+0xbe9/0xce0
 ? do_wait+0xe1/0x330
 do_wait+0x151/0x330
 kernel_wait4+0x8d/0x150
 ? task_stopped_code+0x50/0x50
 SYSC_wait4+0x95/0xa0
 ? rcu_read_lock_sched_held+0x6c/0x80
 ? syscall_trace_enter+0x2d7/0x340
 ? do_syscall_64+0x60/0x210
 do_syscall_64+0x60/0x210
 entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:0x7f7cb82603aa
RSP: 002b:7ffd60770bc8 EFLAGS: 0246
 ORIG_RAX: 003d
RAX: ffda RBX: 7f7cb6cd4000 RCX: 7f7cb82603aa
RDX: 000b RSI: 7ffd60770bd0 RDI: 7cca
RBP: 7cca R08: 7f7cb8965700 R09: 7ffd607c7080
R10:  R11: 0246 R12: 
R13: 7ffd60770bd0 R14: 7f7cb6cd4058 R15: cccd
Code: c1 e2 04 44 8b 60 30 48 8b 40 38 44 8b 34 11 48 c7 c2 60 3a f5 81 44 89 
e1 4c 8b 68 58 e8 4b b4 77 00 89 44 24 14 48 8d 74 24 10 <49> 8b 7d 00 e8 b9 6a 
f9 ff 48 85 c0 74 1a 48 89 c7 48 89 44 24
RIP: proc_flush_task+0x8e/0x1b0 RSP: c9000bbffc40
CR2: 
---[ end trace 53d67a6481059862 ]---

Improve the quality of the implementation by resetting the place to
start allocating pids on failure to allocate the first pid.

As improving the quality of the implementation is the goal remove the now
unnecesarry disable_pid_allocations call when we fail to mount proc.

Fixes: 95846ecf9dac ("pid: replace pid bitmap implementation with IDR API")
Fixes: 8ef047aaaeb8 ("pid namespaces: make alloc_pid(), free_pid() and 
put_pid() work with struct upid")
Reported-by: Dave Jones 
Signed-off-by: "Eric W. Biederman" 
---

I have tested this by forcing an allocation failure after allocating pid
1, and the symptoms are identical to what Dave Jones has observed.

I have further confirmed that this change causes the failure to go away.

If no one objects I will send a pull request to Linus in the next day or two.

 kernel/pid.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/kernel/pid.c b/kernel/pid.c
index b13b624e2c49..1e8bb6550ec4 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -193,10 +193,8 @@ struct pid *alloc_pid(struct pid_namespace *ns)
}
 
if (unlikely(is_child_reaper(pid))) {
-   if (pid_ns_prepare_proc(ns)) {
-   disable_pid_allocation(ns);
+   if (pid_ns_prepare_proc(ns))
goto out_free;
-   }
}
 
get_pid_ns(ns);
@@ -226,6 +224,10 @@ struct pid *alloc_pid(struct pid_namespace *ns)
while (++i <= ns->level)
idr_remove(>idr, (pid->numbers + i)->nr);
 
+   /* On failure to allocate the first pid, reset the state */
+   if (ns->pid_allocated == PIDNS_ADDING)
+   idr_set_cursor(>idr, 0);
+
spin_unlock_irq(_lock);
 
kmem_cache_free(ns->pid_cachep, pid);
-- 
2.14.1

Re: [TEST PATCH] pid: fix allocating pid 2 for init (was Re: proc_flush_task oops)

2017-12-23 Thread Eric W. Biederman

Alexey Dobriyan  writes:

> On Fri, Dec 22, 2017 at 08:41:54AM -0600, Eric W. Biederman wrote:
>> Alexey Dobriyan  writes:
>
>> > unshare
>> > fork
>> > alloc_pid in level 1 succeeds
>> > alloc_pid in level 0 fails, ->idr_next is 2
>> > fork
>> > alloc pid 2
>> > exit
>> >
>> > Reliable reproducer and fail injection patch attached
>> >
>> > I'd say proper fix is allocating pids in the opposite order
>> > so that failure in the last layer doesn't move IDR cursor
>> > in baby child pidns.
>> 
>> I agree with you about changing the order.  That will make
>> the code simpler and in the second loop actually conforming C code,
>> and fix the immediate problem.
>
> Something like that (barely tested)

I have thought about this some more and I think we can do better.

I don't like the on stack pid_ns array.

The only reason the code calls disable_pid_allocation is that we
don't handle this error.

The semantics of least surprise, are that if we run out of resources
while allocating something, trying again when more resources are
available will make it work.

So it looks like handling the error will improve the quality of the
implemenation, and be a simpler, less dangerous patch.

diff --git a/kernel/pid.c b/kernel/pid.c
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -226,6 +224,10 @@ struct pid *alloc_pid(struct pid_namespace *ns)
while (++i <= ns->level)
idr_remove(>idr, (pid->numbers + i)->nr);

+   /* On failure to allocate the first pid, reset the state */
+   if (ns->pid_allocated == PIDNS_ADDING)
+   idr_set_cursor(>idr, 0);
+
spin_unlock_irq(_lock);

kmem_cache_free(ns->pid_cachep, pid);

Eric

Re: [TEST PATCH] pid: fix allocating pid 2 for init (was Re: proc_flush_task oops)

2017-12-23 Thread Eric W. Biederman

Alexey Dobriyan  writes:

> On Fri, Dec 22, 2017 at 08:41:54AM -0600, Eric W. Biederman wrote:
>> Alexey Dobriyan  writes:
>
>> > unshare
>> > fork
>> > alloc_pid in level 1 succeeds
>> > alloc_pid in level 0 fails, ->idr_next is 2
>> > fork
>> > alloc pid 2
>> > exit
>> >
>> > Reliable reproducer and fail injection patch attached
>> >
>> > I'd say proper fix is allocating pids in the opposite order
>> > so that failure in the last layer doesn't move IDR cursor
>> > in baby child pidns.
>> 
>> I agree with you about changing the order.  That will make
>> the code simpler and in the second loop actually conforming C code,
>> and fix the immediate problem.
>
> Something like that (barely tested)

I have thought about this some more and I think we can do better.

I don't like the on stack pid_ns array.

The only reason the code calls disable_pid_allocation is that we
don't handle this error.

The semantics of least surprise, are that if we run out of resources
while allocating something, trying again when more resources are
available will make it work.

So it looks like handling the error will improve the quality of the
implemenation, and be a simpler, less dangerous patch.

diff --git a/kernel/pid.c b/kernel/pid.c
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -226,6 +224,10 @@ struct pid *alloc_pid(struct pid_namespace *ns)
while (++i <= ns->level)
idr_remove(>idr, (pid->numbers + i)->nr);

+   /* On failure to allocate the first pid, reset the state */
+   if (ns->pid_allocated == PIDNS_ADDING)
+   idr_set_cursor(>idr, 0);
+
spin_unlock_irq(_lock);

kmem_cache_free(ns->pid_cachep, pid);

Eric

[PATCH] x86/xen/time: fix section mismatch for xen_init_time_ops()

2017-12-23 Thread Nick Desaulniers

The header declares this function as __init but is defined in __ref
section.

Signed-off-by: Nick Desaulniers 
---
 arch/x86/xen/xen-ops.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index 75011b8..3b34745 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -72,7 +72,7 @@ u64 xen_clocksource_read(void);
 void xen_setup_cpu_clockevents(void);
 void xen_save_time_memory_area(void);
 void xen_restore_time_memory_area(void);
-void __init xen_init_time_ops(void);
+void __ref xen_init_time_ops(void);
 void __init xen_hvm_init_time_ops(void);
 
 irqreturn_t xen_debug_interrupt(int irq, void *dev_id);
-- 
2.7.4

[PATCH] x86/xen/time: fix section mismatch for xen_init_time_ops()

2017-12-23 Thread Nick Desaulniers

The header declares this function as __init but is defined in __ref
section.

Signed-off-by: Nick Desaulniers 
---
 arch/x86/xen/xen-ops.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index 75011b8..3b34745 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -72,7 +72,7 @@ u64 xen_clocksource_read(void);
 void xen_setup_cpu_clockevents(void);
 void xen_save_time_memory_area(void);
 void xen_restore_time_memory_area(void);
-void __init xen_init_time_ops(void);
+void __ref xen_init_time_ops(void);
 void __init xen_hvm_init_time_ops(void);
 
 irqreturn_t xen_debug_interrupt(int irq, void *dev_id);
-- 
2.7.4

[PATCH] zsmalloc: use U suffix for negative literals being shifted

2017-12-23 Thread Nick Desaulniers

Fixes warnings about shifting unsigned literals being undefined
behavior.

Signed-off-by: Nick Desaulniers 
---
 mm/zsmalloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 685049a..5d31458 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1056,7 +1056,7 @@ static void init_zspage(struct size_class *class, struct 
zspage *zspage)
 * Reset OBJ_TAG_BITS bit to last link to tell
 * whether it's allocated object or not.
 */
-   link->next = -1 << OBJ_TAG_BITS;
+   link->next = -1U << OBJ_TAG_BITS;
}
kunmap_atomic(vaddr);
page = next_page;
-- 
2.7.4

[PATCH] zsmalloc: use U suffix for negative literals being shifted

2017-12-23 Thread Nick Desaulniers

Fixes warnings about shifting unsigned literals being undefined
behavior.

Signed-off-by: Nick Desaulniers 
---
 mm/zsmalloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 685049a..5d31458 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1056,7 +1056,7 @@ static void init_zspage(struct size_class *class, struct 
zspage *zspage)
 * Reset OBJ_TAG_BITS bit to last link to tell
 * whether it's allocated object or not.
 */
-   link->next = -1 << OBJ_TAG_BITS;
+   link->next = -1U << OBJ_TAG_BITS;
}
kunmap_atomic(vaddr);
page = next_page;
-- 
2.7.4

[PATCH] objtool: fix enum-conversion warning

2017-12-23 Thread Nick Desaulniers

When compiling with Clang, the following warning is observed:

  CC   tools/objtool/arch/x86/decode.o
arch/x86/decode.c:141:20: error: implicit conversion from enumeration
type 'enum op_src_type' to different enumeration
type 'enum op_dest_type' [-Werror,-Wenum-conversion]

  op->dest.type = OP_SRC_REG;
~ ^~

Signed-off-by: Nick Desaulniers 
---
 tools/objtool/arch/x86/decode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/objtool/arch/x86/decode.c b/tools/objtool/arch/x86/decode.c
index 8acfc47..540a209 100644
--- a/tools/objtool/arch/x86/decode.c
+++ b/tools/objtool/arch/x86/decode.c
@@ -138,7 +138,7 @@ int arch_decode_instruction(struct elf *elf, struct section 
*sec,
*type = INSN_STACK;
op->src.type = OP_SRC_ADD;
op->src.reg = op_to_cfi_reg[modrm_reg][rex_r];
-   op->dest.type = OP_SRC_REG;
+   op->dest.type = OP_DEST_REG;
op->dest.reg = CFI_SP;
}
break;
-- 
2.7.4

[PATCH] objtool: fix enum-conversion warning

2017-12-23 Thread Nick Desaulniers

When compiling with Clang, the following warning is observed:

  CC   tools/objtool/arch/x86/decode.o
arch/x86/decode.c:141:20: error: implicit conversion from enumeration
type 'enum op_src_type' to different enumeration
type 'enum op_dest_type' [-Werror,-Wenum-conversion]

  op->dest.type = OP_SRC_REG;
~ ^~

Signed-off-by: Nick Desaulniers 
---
 tools/objtool/arch/x86/decode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/objtool/arch/x86/decode.c b/tools/objtool/arch/x86/decode.c
index 8acfc47..540a209 100644
--- a/tools/objtool/arch/x86/decode.c
+++ b/tools/objtool/arch/x86/decode.c
@@ -138,7 +138,7 @@ int arch_decode_instruction(struct elf *elf, struct section 
*sec,
*type = INSN_STACK;
op->src.type = OP_SRC_ADD;
op->src.reg = op_to_cfi_reg[modrm_reg][rex_r];
-   op->dest.type = OP_SRC_REG;
+   op->dest.type = OP_DEST_REG;
op->dest.reg = CFI_SP;
}
break;
-- 
2.7.4

Re: linux/master crashes on boot with KASAN=y

2017-12-23 Thread Andy Lutomirski

On Sat, Dec 23, 2017 at 4:41 AM, Andrey Ryabinin
 wrote:
> On 12/23/2017 11:01 AM, Jakub Kicinski wrote:
>> Hi!
>>
>> I bisected a crash on boot to this:
>>
>> commit 21506525fb8ddb0342f2a2370812d47f6a1f3833 (HEAD, refs/bisect/bad)
>> Author: Andy Lutomirski 
>> Date:   Mon Dec 4 15:07:16 2017 +0100
>>
>> x86/kasan/64: Teach KASAN about the cpu_entry_area
>
>
> Thanks.
> There is nothing wrong with this patch, it just uncovered older bug.
> The actual problem comes from f06bdd4001c2 ("x86/mm: Adapt MODULES_END based 
> on fixmap section size")
> which is made kasan_mem_to_shadow(MODULES_END) potentially unaligned to page 
> boundary.
> And when we feed unaligned address to kasan_populate_zero_shadow() it doesn't 
> do the right thing.
>
> Could you tell me if the fix bellow works for you?
>
> ---
>  arch/x86/include/asm/kasan.h| 8 
>  arch/x86/include/asm/pgtable_64_types.h | 4 +++-
>  2 files changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/kasan.h b/arch/x86/include/asm/kasan.h
> index b577dd0916aa..0c580e4b2ccc 100644
> --- a/arch/x86/include/asm/kasan.h
> +++ b/arch/x86/include/asm/kasan.h
> @@ -5,6 +5,14 @@
>  #include 
>  #define KASAN_SHADOW_OFFSET _AC(CONFIG_KASAN_SHADOW_OFFSET, UL)
>
> +#ifndef KASAN_SHADOW_SCALE_SHIFT
> +# ifdef CONFIG_KASAN
> +#  define KASAN_SHADOW_SCALE_SHIFT 3
> +# else
> +#  define KASAN_SHADOW_SCALE_SHIFT 0
> +# endif
> +#endif
> +
>  /*
>   * Compiler uses shadow offset assuming that addresses start
>   * from 0. Kernel addresses don't start from 0, so shadow
> diff --git a/arch/x86/include/asm/pgtable_64_types.h 
> b/arch/x86/include/asm/pgtable_64_types.h
> index 6d5f45dcd4a1..d34a90d6c374 100644
> --- a/arch/x86/include/asm/pgtable_64_types.h
> +++ b/arch/x86/include/asm/pgtable_64_types.h
> @@ -6,6 +6,7 @@
>
>  #ifndef __ASSEMBLY__
>  #include 
> +#include 
>  #include 
>
>  /*
> @@ -96,7 +97,8 @@ typedef struct { pteval_t pte; } pte_t;
>  #define VMALLOC_END(VMALLOC_START + _AC((VMALLOC_SIZE_TB << 40) - 1, UL))
>  #define MODULES_VADDR(__START_KERNEL_map + KERNEL_IMAGE_SIZE)
>  /* The module sections ends with the start of the fixmap */
> -#define MODULES_END   __fix_to_virt(__end_of_fixed_addresses + 1)
> +#define MODULES_END   (__fix_to_virt(__end_of_fixed_addresses + 1) & \
> +   ~((PAGE_SIZE << KASAN_SHADOW_SCALE_SHIFT) - 1))

Could this be #define MODULES_END KASAN_ROUND_DOWN(__fix_to_virt(...)) instead?

Re: linux/master crashes on boot with KASAN=y

2017-12-23 Thread Andy Lutomirski

On Sat, Dec 23, 2017 at 4:41 AM, Andrey Ryabinin
 wrote:
> On 12/23/2017 11:01 AM, Jakub Kicinski wrote:
>> Hi!
>>
>> I bisected a crash on boot to this:
>>
>> commit 21506525fb8ddb0342f2a2370812d47f6a1f3833 (HEAD, refs/bisect/bad)
>> Author: Andy Lutomirski 
>> Date:   Mon Dec 4 15:07:16 2017 +0100
>>
>> x86/kasan/64: Teach KASAN about the cpu_entry_area
>
>
> Thanks.
> There is nothing wrong with this patch, it just uncovered older bug.
> The actual problem comes from f06bdd4001c2 ("x86/mm: Adapt MODULES_END based 
> on fixmap section size")
> which is made kasan_mem_to_shadow(MODULES_END) potentially unaligned to page 
> boundary.
> And when we feed unaligned address to kasan_populate_zero_shadow() it doesn't 
> do the right thing.
>
> Could you tell me if the fix bellow works for you?
>
> ---
>  arch/x86/include/asm/kasan.h| 8 
>  arch/x86/include/asm/pgtable_64_types.h | 4 +++-
>  2 files changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/kasan.h b/arch/x86/include/asm/kasan.h
> index b577dd0916aa..0c580e4b2ccc 100644
> --- a/arch/x86/include/asm/kasan.h
> +++ b/arch/x86/include/asm/kasan.h
> @@ -5,6 +5,14 @@
>  #include 
>  #define KASAN_SHADOW_OFFSET _AC(CONFIG_KASAN_SHADOW_OFFSET, UL)
>
> +#ifndef KASAN_SHADOW_SCALE_SHIFT
> +# ifdef CONFIG_KASAN
> +#  define KASAN_SHADOW_SCALE_SHIFT 3
> +# else
> +#  define KASAN_SHADOW_SCALE_SHIFT 0
> +# endif
> +#endif
> +
>  /*
>   * Compiler uses shadow offset assuming that addresses start
>   * from 0. Kernel addresses don't start from 0, so shadow
> diff --git a/arch/x86/include/asm/pgtable_64_types.h 
> b/arch/x86/include/asm/pgtable_64_types.h
> index 6d5f45dcd4a1..d34a90d6c374 100644
> --- a/arch/x86/include/asm/pgtable_64_types.h
> +++ b/arch/x86/include/asm/pgtable_64_types.h
> @@ -6,6 +6,7 @@
>
>  #ifndef __ASSEMBLY__
>  #include 
> +#include 
>  #include 
>
>  /*
> @@ -96,7 +97,8 @@ typedef struct { pteval_t pte; } pte_t;
>  #define VMALLOC_END(VMALLOC_START + _AC((VMALLOC_SIZE_TB << 40) - 1, UL))
>  #define MODULES_VADDR(__START_KERNEL_map + KERNEL_IMAGE_SIZE)
>  /* The module sections ends with the start of the fixmap */
> -#define MODULES_END   __fix_to_virt(__end_of_fixed_addresses + 1)
> +#define MODULES_END   (__fix_to_virt(__end_of_fixed_addresses + 1) & \
> +   ~((PAGE_SIZE << KASAN_SHADOW_SCALE_SHIFT) - 1))

Could this be #define MODULES_END KASAN_ROUND_DOWN(__fix_to_virt(...)) instead?

PROBLEM: consolidated IDT invalidation causes kexec to reboot

2017-12-23 Thread Alexandru Chirvasitu

Short description: loading a crash kernel with (a) kexec -l [..] or
(b) kexec -p [..] and then testing it with (a) kexec -e or (b) echo c
> /proc/sysrq-trigger results in a regular reboot (going through BIOS,
etc.).

The commit that starts exhibiting this behaviour for me is

e802a51: x86/idt: Consolidate IDT invalidation

with its parent 8f55868 behaving normally (in scenarios (a) and (b)
alike; (b) produces /proc/vmcore, etc.).

For testing purposes, I've altered machine_kexec_32.c making the
following toy commit. It naively undoes part of e802a51, solely to
confirm that's where it goes awry in my setup.


--

machine_kexec calls set_idt instead of idt_invalidate for testing purposes

diff --git a/arch/x86/kernel/machine_kexec_32.c 
b/arch/x86/kernel/machine_kexec_32.c
index 00bc751..70f7d05 100644
--- a/arch/x86/kernel/machine_kexec_32.c
+++ b/arch/x86/kernel/machine_kexec_32.c
@@ -26,6 +26,19 @@
 #include 
 #include 
 
+
+
+static void set_idt(void *newidt, __u16 limit)
+{
+   struct desc_ptr curidt;
+
+   /* ia32 supports unaliged loads & stores */
+   curidt.size= limit;
+   curidt.address = (unsigned long)newidt;
+
+   load_idt();
+}
+
 static void set_gdt(void *newgdt, __u16 limit)
 {
struct desc_ptr curgdt;
@@ -233,7 +246,7 @@ void machine_kexec(struct kimage *image)
 * If you want to load them you must set up your own idt & gdt.
 */
set_gdt(phys_to_virt(0), 0);
-   idt_invalidate(phys_to_virt(0));
+   set_idt(phys_to_virt(0), 0);
 
/* now call it */
image->start = relocate_kernel_ptr((unsigned long)image->head

--

The kernel compiled with these changes restores kexec functionality on
the machine I'm trying it on:

ASUS F5RL Core(TM)2 Duo CPU T5450 @ 1.66GHz

on Debian stable 9.3 32 bit. The loading command I use:

kexec [-l|-p] /boot/dump/vmlinuz-4.14.8-dump 
--initrd=/boot/dump/initrd.img-4.14.8-dump --append="root=/dev/sda1 1 irqpoll 
nr_cpus=1 reset_devices"

The nr_cpus=1 is a remnant I left in there; the dump kernel is an
SMP-disabled version of the latest stable one (4.14.8).

Is this expected behaviour?

The issue emerged while reporting a CPU lockup in another email
thread; as this seems different, I figured it wouldn't hurt to send
out a separate message.


Thank you.

PROBLEM: consolidated IDT invalidation causes kexec to reboot

2017-12-23 Thread Alexandru Chirvasitu

Short description: loading a crash kernel with (a) kexec -l [..] or
(b) kexec -p [..] and then testing it with (a) kexec -e or (b) echo c
> /proc/sysrq-trigger results in a regular reboot (going through BIOS,
etc.).

The commit that starts exhibiting this behaviour for me is

e802a51: x86/idt: Consolidate IDT invalidation

with its parent 8f55868 behaving normally (in scenarios (a) and (b)
alike; (b) produces /proc/vmcore, etc.).

For testing purposes, I've altered machine_kexec_32.c making the
following toy commit. It naively undoes part of e802a51, solely to
confirm that's where it goes awry in my setup.


--

machine_kexec calls set_idt instead of idt_invalidate for testing purposes

diff --git a/arch/x86/kernel/machine_kexec_32.c 
b/arch/x86/kernel/machine_kexec_32.c
index 00bc751..70f7d05 100644
--- a/arch/x86/kernel/machine_kexec_32.c
+++ b/arch/x86/kernel/machine_kexec_32.c
@@ -26,6 +26,19 @@
 #include 
 #include 
 
+
+
+static void set_idt(void *newidt, __u16 limit)
+{
+   struct desc_ptr curidt;
+
+   /* ia32 supports unaliged loads & stores */
+   curidt.size= limit;
+   curidt.address = (unsigned long)newidt;
+
+   load_idt();
+}
+
 static void set_gdt(void *newgdt, __u16 limit)
 {
struct desc_ptr curgdt;
@@ -233,7 +246,7 @@ void machine_kexec(struct kimage *image)
 * If you want to load them you must set up your own idt & gdt.
 */
set_gdt(phys_to_virt(0), 0);
-   idt_invalidate(phys_to_virt(0));
+   set_idt(phys_to_virt(0), 0);
 
/* now call it */
image->start = relocate_kernel_ptr((unsigned long)image->head

--

The kernel compiled with these changes restores kexec functionality on
the machine I'm trying it on:

ASUS F5RL Core(TM)2 Duo CPU T5450 @ 1.66GHz

on Debian stable 9.3 32 bit. The loading command I use:

kexec [-l|-p] /boot/dump/vmlinuz-4.14.8-dump 
--initrd=/boot/dump/initrd.img-4.14.8-dump --append="root=/dev/sda1 1 irqpoll 
nr_cpus=1 reset_devices"

The nr_cpus=1 is a remnant I left in there; the dump kernel is an
SMP-disabled version of the latest stable one (4.14.8).

Is this expected behaviour?

The issue emerged while reporting a CPU lockup in another email
thread; as this seems different, I figured it wouldn't hurt to send
out a separate message.


Thank you.

Re: [PATCH 4.4 00/78] 4.4.108-stable review

2017-12-23 Thread Guenter Roeck


On 12/22/2017 12:45 AM, Greg Kroah-Hartman wrote:

This is the start of the stable review cycle for the 4.4.108 release.
There are 78 patches in this series, all will be posted as a response
to this one.  If anyone has any issues with these being applied, please
let me know.

Responses should be made by Sun Dec 24 08:45:30 UTC 2017.
Anything received after that time might be too late.



For v4.4.107-79-g0989207:

Build results:
total: 145 pass: 144 fail: 1
Failed builds:
alpha:allmodconfig
Qemu test results:
total: 118 pass: 118 fail: 0

The alpha fix is missing from 
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
branch linux-4.4.y; it points to the same sha.

Guenter

Re: [PATCH 4.4 00/78] 4.4.108-stable review

2017-12-23 Thread Guenter Roeck


On 12/22/2017 12:45 AM, Greg Kroah-Hartman wrote:

This is the start of the stable review cycle for the 4.4.108 release.
There are 78 patches in this series, all will be posted as a response
to this one.  If anyone has any issues with these being applied, please
let me know.

Responses should be made by Sun Dec 24 08:45:30 UTC 2017.
Anything received after that time might be too late.



For v4.4.107-79-g0989207:

Build results:
total: 145 pass: 144 fail: 1
Failed builds:
alpha:allmodconfig
Qemu test results:
total: 118 pass: 118 fail: 0

The alpha fix is missing from 
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
branch linux-4.4.y; it points to the same sha.

Guenter

Re: [patch 0/4] timer/nohz: Fix timer/nohz woes

2017-12-23 Thread Paul E. McKenney

On Sat, Dec 23, 2017 at 05:21:20PM -0800, Paul E. McKenney wrote:
> On Fri, Dec 22, 2017 at 09:09:07AM -0800, Paul E. McKenney wrote:
> > On Fri, Dec 22, 2017 at 03:51:11PM +0100, Thomas Gleixner wrote:
> > > Paul was observing weird stalls which are hard to reproduce and decode. We
> > > were finally able to reproduce and decode the wreckage on RT.
> > > 
> > > The following series addresses the issues and hopefully nails the root
> > > cause completely.
> > > 
> > > Please review carefully and expose it to the dreaded rcu torture tests
> > > which seem to be the only way to trigger it.
> > 
> > Best Christmas present ever, thank you!!!
> > 
> > Just started up three concurrent 10-hour runs of the infamous rcutorture
> > TREE01 scenario, and will let you know how it goes!
> 
> Well, I messed up the first test and then reran it.  Which had the benefit
> of giving me a baseline.  The rerun (with all four patches) produced
> failures, so I ran it again with an additional patch of mine.  I score
> these tests by recording the time at first failure, or, if there is no
> failure, the duration of the test.  Summing the values gives the score.
> And here are the scores, where 30 is a perfect score:

Sigh.  They were five-hour tests, not ten-hour tests.  

1.  Baseline: 3.0+2.5+5=10.5

2.  Four patches from Anna-Marie and Thomas: 5+2.7+1.7=9.4

3.  Ditto plus the patch below: 5+4.3+5=14.3

Oh, and the reason for my suspecting that #2 is actually an improvement
over #1 is that my patch by itself produced a very small improvement in
reliability.  This leads to the hypothesis that #2 really is helping out
in some way or another.

Thanx, Paul

> 1.Baseline: 3.0+2.5+10=15.5
> 
> 2.Four patches from Anna-Marie and Thomas: 10+2.7+1.7=14.4
> 
> 3.Ditto plus the patch below: 10+4.3+10=24.3
> 
> Please note that these are nowhere near anything even resembling
> statistical significance.  However, they are encouraging.  I will do
> more runs, but also do shorter five-hour runs to increase the amount
> of data per unit time.  Please note also that my patch by itself never
> did provide that great of an improvement, so there might be some sort
> of combination effect going on here.  Or maybe it is just luck, who knows?
> 
> Please note that I have not yet ported my diagnostic patches on top of
> these, however, the stacks have the usual schedule_timeout() entries.
> This is not too surprising from a software-engineering viewpoint:
> Locating several bugs at a given point of time usually indicates that
> there are more to be found.  So in a sense we are lucky that the
> same test triggers at least one of those additional bugs.
> 
>   Thanx, Paul
> 
> 
> 
> commit accb0edb85526a05b934eac49658d05ea0216fc4
> Author: Paul E. McKenney 
> Date:   Thu Dec 7 13:18:44 2017 -0800
> 
> timers: Ensure that timer_base ->clk accounts for time offline
> 
> The timer_base ->must_forward_clk is set to indicate that the next timer
> operation on that timer_base must check for passage of time.  One instance
> of time passage is when the timer wheel goes idle, and another is when
> the corresponding CPU is offline.  Note that it is not appropriate to set
> ->is_idle because that could result in IPIing an offline CPU.  Therefore,
> this commit instead sets ->must_forward_clk at CPU-offline time.
> 
> Signed-off-by: Paul E. McKenney 
> 
> diff --git a/kernel/time/timer.c b/kernel/time/timer.c
> index ffebcf878fba..94cce780c574 100644
> --- a/kernel/time/timer.c
> +++ b/kernel/time/timer.c
> @@ -1875,6 +1875,7 @@ int timers_dead_cpu(unsigned int cpu)
>  
>   BUG_ON(old_base->running_timer);
>  
> + old_base->must_forward_clk = true;
>   for (i = 0; i < WHEEL_SIZE; i++)
>   migrate_timer_list(new_base, old_base->vectors + i);
>

Re: [patch 0/4] timer/nohz: Fix timer/nohz woes

2017-12-23 Thread Paul E. McKenney

On Sat, Dec 23, 2017 at 05:21:20PM -0800, Paul E. McKenney wrote:
> On Fri, Dec 22, 2017 at 09:09:07AM -0800, Paul E. McKenney wrote:
> > On Fri, Dec 22, 2017 at 03:51:11PM +0100, Thomas Gleixner wrote:
> > > Paul was observing weird stalls which are hard to reproduce and decode. We
> > > were finally able to reproduce and decode the wreckage on RT.
> > > 
> > > The following series addresses the issues and hopefully nails the root
> > > cause completely.
> > > 
> > > Please review carefully and expose it to the dreaded rcu torture tests
> > > which seem to be the only way to trigger it.
> > 
> > Best Christmas present ever, thank you!!!
> > 
> > Just started up three concurrent 10-hour runs of the infamous rcutorture
> > TREE01 scenario, and will let you know how it goes!
> 
> Well, I messed up the first test and then reran it.  Which had the benefit
> of giving me a baseline.  The rerun (with all four patches) produced
> failures, so I ran it again with an additional patch of mine.  I score
> these tests by recording the time at first failure, or, if there is no
> failure, the duration of the test.  Summing the values gives the score.
> And here are the scores, where 30 is a perfect score:

Sigh.  They were five-hour tests, not ten-hour tests.  

1.  Baseline: 3.0+2.5+5=10.5

2.  Four patches from Anna-Marie and Thomas: 5+2.7+1.7=9.4

3.  Ditto plus the patch below: 5+4.3+5=14.3

Oh, and the reason for my suspecting that #2 is actually an improvement
over #1 is that my patch by itself produced a very small improvement in
reliability.  This leads to the hypothesis that #2 really is helping out
in some way or another.

Thanx, Paul

> 1.Baseline: 3.0+2.5+10=15.5
> 
> 2.Four patches from Anna-Marie and Thomas: 10+2.7+1.7=14.4
> 
> 3.Ditto plus the patch below: 10+4.3+10=24.3
> 
> Please note that these are nowhere near anything even resembling
> statistical significance.  However, they are encouraging.  I will do
> more runs, but also do shorter five-hour runs to increase the amount
> of data per unit time.  Please note also that my patch by itself never
> did provide that great of an improvement, so there might be some sort
> of combination effect going on here.  Or maybe it is just luck, who knows?
> 
> Please note that I have not yet ported my diagnostic patches on top of
> these, however, the stacks have the usual schedule_timeout() entries.
> This is not too surprising from a software-engineering viewpoint:
> Locating several bugs at a given point of time usually indicates that
> there are more to be found.  So in a sense we are lucky that the
> same test triggers at least one of those additional bugs.
> 
>   Thanx, Paul
> 
> 
> 
> commit accb0edb85526a05b934eac49658d05ea0216fc4
> Author: Paul E. McKenney 
> Date:   Thu Dec 7 13:18:44 2017 -0800
> 
> timers: Ensure that timer_base ->clk accounts for time offline
> 
> The timer_base ->must_forward_clk is set to indicate that the next timer
> operation on that timer_base must check for passage of time.  One instance
> of time passage is when the timer wheel goes idle, and another is when
> the corresponding CPU is offline.  Note that it is not appropriate to set
> ->is_idle because that could result in IPIing an offline CPU.  Therefore,
> this commit instead sets ->must_forward_clk at CPU-offline time.
> 
> Signed-off-by: Paul E. McKenney 
> 
> diff --git a/kernel/time/timer.c b/kernel/time/timer.c
> index ffebcf878fba..94cce780c574 100644
> --- a/kernel/time/timer.c
> +++ b/kernel/time/timer.c
> @@ -1875,6 +1875,7 @@ int timers_dead_cpu(unsigned int cpu)
>  
>   BUG_ON(old_base->running_timer);
>  
> + old_base->must_forward_clk = true;
>   for (i = 0; i < WHEEL_SIZE; i++)
>   migrate_timer_list(new_base, old_base->vectors + i);
>

Re: linux/master crashes on boot with KASAN=y

2017-12-23 Thread Jakub Kicinski

On Sat, 23 Dec 2017 15:41:27 +0300, Andrey Ryabinin wrote:
> On 12/23/2017 11:01 AM, Jakub Kicinski wrote:
> > Hi!
> > 
> > I bisected a crash on boot to this:
> > 
> > commit 21506525fb8ddb0342f2a2370812d47f6a1f3833 (HEAD, refs/bisect/bad)
> > Author: Andy Lutomirski 
> > Date:   Mon Dec 4 15:07:16 2017 +0100
> > 
> > x86/kasan/64: Teach KASAN about the cpu_entry_area  
> 
> 
> Thanks.
> There is nothing wrong with this patch, it just uncovered older bug.
> The actual problem comes from f06bdd4001c2 ("x86/mm: Adapt MODULES_END based 
> on fixmap section size")
> which is made kasan_mem_to_shadow(MODULES_END) potentially unaligned to page 
> boundary.
> And when we feed unaligned address to kasan_populate_zero_shadow() it doesn't 
> do the right thing.
> 
> Could you tell me if the fix bellow works for you?

Works for me, thank you!

Tested-by: Jakub Kicinski 

>  arch/x86/include/asm/kasan.h| 8 
>  arch/x86/include/asm/pgtable_64_types.h | 4 +++-
>  2 files changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/kasan.h b/arch/x86/include/asm/kasan.h
> index b577dd0916aa..0c580e4b2ccc 100644
> --- a/arch/x86/include/asm/kasan.h
> +++ b/arch/x86/include/asm/kasan.h
> @@ -5,6 +5,14 @@
>  #include 
>  #define KASAN_SHADOW_OFFSET _AC(CONFIG_KASAN_SHADOW_OFFSET, UL)
>  
> +#ifndef KASAN_SHADOW_SCALE_SHIFT
> +# ifdef CONFIG_KASAN
> +#  define KASAN_SHADOW_SCALE_SHIFT 3
> +# else
> +#  define KASAN_SHADOW_SCALE_SHIFT 0
> +# endif
> +#endif
> +
>  /*
>   * Compiler uses shadow offset assuming that addresses start
>   * from 0. Kernel addresses don't start from 0, so shadow
> diff --git a/arch/x86/include/asm/pgtable_64_types.h 
> b/arch/x86/include/asm/pgtable_64_types.h
> index 6d5f45dcd4a1..d34a90d6c374 100644
> --- a/arch/x86/include/asm/pgtable_64_types.h
> +++ b/arch/x86/include/asm/pgtable_64_types.h
> @@ -6,6 +6,7 @@
>  
>  #ifndef __ASSEMBLY__
>  #include 
> +#include 
>  #include 
>  
>  /*
> @@ -96,7 +97,8 @@ typedef struct { pteval_t pte; } pte_t;
>  #define VMALLOC_END  (VMALLOC_START + _AC((VMALLOC_SIZE_TB << 40) - 1, UL))
>  #define MODULES_VADDR(__START_KERNEL_map + KERNEL_IMAGE_SIZE)
>  /* The module sections ends with the start of the fixmap */
> -#define MODULES_END   __fix_to_virt(__end_of_fixed_addresses + 1)
> +#define MODULES_END   (__fix_to_virt(__end_of_fixed_addresses + 1) & \
> + ~((PAGE_SIZE << KASAN_SHADOW_SCALE_SHIFT) - 1))
>  #define MODULES_LEN   (MODULES_END - MODULES_VADDR)
>  #define ESPFIX_PGD_ENTRY _AC(-2, UL)
>  #define ESPFIX_BASE_ADDR (ESPFIX_PGD_ENTRY << P4D_SHIFT)

Re: linux/master crashes on boot with KASAN=y

2017-12-23 Thread Jakub Kicinski

On Sat, 23 Dec 2017 15:41:27 +0300, Andrey Ryabinin wrote:
> On 12/23/2017 11:01 AM, Jakub Kicinski wrote:
> > Hi!
> > 
> > I bisected a crash on boot to this:
> > 
> > commit 21506525fb8ddb0342f2a2370812d47f6a1f3833 (HEAD, refs/bisect/bad)
> > Author: Andy Lutomirski 
> > Date:   Mon Dec 4 15:07:16 2017 +0100
> > 
> > x86/kasan/64: Teach KASAN about the cpu_entry_area  
> 
> 
> Thanks.
> There is nothing wrong with this patch, it just uncovered older bug.
> The actual problem comes from f06bdd4001c2 ("x86/mm: Adapt MODULES_END based 
> on fixmap section size")
> which is made kasan_mem_to_shadow(MODULES_END) potentially unaligned to page 
> boundary.
> And when we feed unaligned address to kasan_populate_zero_shadow() it doesn't 
> do the right thing.
> 
> Could you tell me if the fix bellow works for you?

Works for me, thank you!

Tested-by: Jakub Kicinski 

>  arch/x86/include/asm/kasan.h| 8 
>  arch/x86/include/asm/pgtable_64_types.h | 4 +++-
>  2 files changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/kasan.h b/arch/x86/include/asm/kasan.h
> index b577dd0916aa..0c580e4b2ccc 100644
> --- a/arch/x86/include/asm/kasan.h
> +++ b/arch/x86/include/asm/kasan.h
> @@ -5,6 +5,14 @@
>  #include 
>  #define KASAN_SHADOW_OFFSET _AC(CONFIG_KASAN_SHADOW_OFFSET, UL)
>  
> +#ifndef KASAN_SHADOW_SCALE_SHIFT
> +# ifdef CONFIG_KASAN
> +#  define KASAN_SHADOW_SCALE_SHIFT 3
> +# else
> +#  define KASAN_SHADOW_SCALE_SHIFT 0
> +# endif
> +#endif
> +
>  /*
>   * Compiler uses shadow offset assuming that addresses start
>   * from 0. Kernel addresses don't start from 0, so shadow
> diff --git a/arch/x86/include/asm/pgtable_64_types.h 
> b/arch/x86/include/asm/pgtable_64_types.h
> index 6d5f45dcd4a1..d34a90d6c374 100644
> --- a/arch/x86/include/asm/pgtable_64_types.h
> +++ b/arch/x86/include/asm/pgtable_64_types.h
> @@ -6,6 +6,7 @@
>  
>  #ifndef __ASSEMBLY__
>  #include 
> +#include 
>  #include 
>  
>  /*
> @@ -96,7 +97,8 @@ typedef struct { pteval_t pte; } pte_t;
>  #define VMALLOC_END  (VMALLOC_START + _AC((VMALLOC_SIZE_TB << 40) - 1, UL))
>  #define MODULES_VADDR(__START_KERNEL_map + KERNEL_IMAGE_SIZE)
>  /* The module sections ends with the start of the fixmap */
> -#define MODULES_END   __fix_to_virt(__end_of_fixed_addresses + 1)
> +#define MODULES_END   (__fix_to_virt(__end_of_fixed_addresses + 1) & \
> + ~((PAGE_SIZE << KASAN_SHADOW_SCALE_SHIFT) - 1))
>  #define MODULES_LEN   (MODULES_END - MODULES_VADDR)
>  #define ESPFIX_PGD_ENTRY _AC(-2, UL)
>  #define ESPFIX_BASE_ADDR (ESPFIX_PGD_ENTRY << P4D_SHIFT)

Re: [patch 0/4] timer/nohz: Fix timer/nohz woes

2017-12-23 Thread Paul E. McKenney

On Fri, Dec 22, 2017 at 09:09:07AM -0800, Paul E. McKenney wrote:
> On Fri, Dec 22, 2017 at 03:51:11PM +0100, Thomas Gleixner wrote:
> > Paul was observing weird stalls which are hard to reproduce and decode. We
> > were finally able to reproduce and decode the wreckage on RT.
> > 
> > The following series addresses the issues and hopefully nails the root
> > cause completely.
> > 
> > Please review carefully and expose it to the dreaded rcu torture tests
> > which seem to be the only way to trigger it.
> 
> Best Christmas present ever, thank you!!!
> 
> Just started up three concurrent 10-hour runs of the infamous rcutorture
> TREE01 scenario, and will let you know how it goes!

Well, I messed up the first test and then reran it.  Which had the benefit
of giving me a baseline.  The rerun (with all four patches) produced
failures, so I ran it again with an additional patch of mine.  I score
these tests by recording the time at first failure, or, if there is no
failure, the duration of the test.  Summing the values gives the score.
And here are the scores, where 30 is a perfect score:

1.  Baseline: 3.0+2.5+10=15.5

2.  Four patches from Anna-Marie and Thomas: 10+2.7+1.7=14.4

3.  Ditto plus the patch below: 10+4.3+10=24.3

Please note that these are nowhere near anything even resembling
statistical significance.  However, they are encouraging.  I will do
more runs, but also do shorter five-hour runs to increase the amount
of data per unit time.  Please note also that my patch by itself never
did provide that great of an improvement, so there might be some sort
of combination effect going on here.  Or maybe it is just luck, who knows?

Please note that I have not yet ported my diagnostic patches on top of
these, however, the stacks have the usual schedule_timeout() entries.
This is not too surprising from a software-engineering viewpoint:
Locating several bugs at a given point of time usually indicates that
there are more to be found.  So in a sense we are lucky that the
same test triggers at least one of those additional bugs.

Thanx, Paul

commit accb0edb85526a05b934eac49658d05ea0216fc4
Author: Paul E. McKenney 
Date:   Thu Dec 7 13:18:44 2017 -0800

timers: Ensure that timer_base ->clk accounts for time offline

The timer_base ->must_forward_clk is set to indicate that the next timer
operation on that timer_base must check for passage of time.  One instance
of time passage is when the timer wheel goes idle, and another is when
the corresponding CPU is offline.  Note that it is not appropriate to set
->is_idle because that could result in IPIing an offline CPU.  Therefore,
this commit instead sets ->must_forward_clk at CPU-offline time.

Signed-off-by: Paul E. McKenney 

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index ffebcf878fba..94cce780c574 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1875,6 +1875,7 @@ int timers_dead_cpu(unsigned int cpu)

BUG_ON(old_base->running_timer);

+   old_base->must_forward_clk = true;
for (i = 0; i < WHEEL_SIZE; i++)
migrate_timer_list(new_base, old_base->vectors + i);

Re: [patch 0/4] timer/nohz: Fix timer/nohz woes

2017-12-23 Thread Paul E. McKenney

On Fri, Dec 22, 2017 at 09:09:07AM -0800, Paul E. McKenney wrote:
> On Fri, Dec 22, 2017 at 03:51:11PM +0100, Thomas Gleixner wrote:
> > Paul was observing weird stalls which are hard to reproduce and decode. We
> > were finally able to reproduce and decode the wreckage on RT.
> > 
> > The following series addresses the issues and hopefully nails the root
> > cause completely.
> > 
> > Please review carefully and expose it to the dreaded rcu torture tests
> > which seem to be the only way to trigger it.
> 
> Best Christmas present ever, thank you!!!
> 
> Just started up three concurrent 10-hour runs of the infamous rcutorture
> TREE01 scenario, and will let you know how it goes!

Well, I messed up the first test and then reran it.  Which had the benefit
of giving me a baseline.  The rerun (with all four patches) produced
failures, so I ran it again with an additional patch of mine.  I score
these tests by recording the time at first failure, or, if there is no
failure, the duration of the test.  Summing the values gives the score.
And here are the scores, where 30 is a perfect score:

1.  Baseline: 3.0+2.5+10=15.5

2.  Four patches from Anna-Marie and Thomas: 10+2.7+1.7=14.4

3.  Ditto plus the patch below: 10+4.3+10=24.3

Please note that these are nowhere near anything even resembling
statistical significance.  However, they are encouraging.  I will do
more runs, but also do shorter five-hour runs to increase the amount
of data per unit time.  Please note also that my patch by itself never
did provide that great of an improvement, so there might be some sort
of combination effect going on here.  Or maybe it is just luck, who knows?

Please note that I have not yet ported my diagnostic patches on top of
these, however, the stacks have the usual schedule_timeout() entries.
This is not too surprising from a software-engineering viewpoint:
Locating several bugs at a given point of time usually indicates that
there are more to be found.  So in a sense we are lucky that the
same test triggers at least one of those additional bugs.

Thanx, Paul

commit accb0edb85526a05b934eac49658d05ea0216fc4
Author: Paul E. McKenney 
Date:   Thu Dec 7 13:18:44 2017 -0800

timers: Ensure that timer_base ->clk accounts for time offline

The timer_base ->must_forward_clk is set to indicate that the next timer
operation on that timer_base must check for passage of time.  One instance
of time passage is when the timer wheel goes idle, and another is when
the corresponding CPU is offline.  Note that it is not appropriate to set
->is_idle because that could result in IPIing an offline CPU.  Therefore,
this commit instead sets ->must_forward_clk at CPU-offline time.

Signed-off-by: Paul E. McKenney 

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index ffebcf878fba..94cce780c574 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1875,6 +1875,7 @@ int timers_dead_cpu(unsigned int cpu)

BUG_ON(old_base->running_timer);

+   old_base->must_forward_clk = true;
for (i = 0; i < WHEEL_SIZE; i++)
migrate_timer_list(new_base, old_base->vectors + i);

Legitimate !!!

2017-12-23 Thread Chen

Hi linux-kernel@vger.kernel.org  ,

   Can work to-gether legally  ? . IF YES for more info email ( 
chen.ya...@yandex.com  )

Legitimate !!!

2017-12-23 Thread Chen

Hi linux-kernel@vger.kernel.org  ,

   Can work to-gether legally  ? . IF YES for more info email ( 
chen.ya...@yandex.com  )

[linus:master] BUILD REGRESSION d1f854ac240ea3928a99294390048e9b2aa6fa0e

2017-12-23 Thread kbuild test robot

tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
 master
branch HEAD: d1f854ac240ea3928a99294390048e9b2aa6fa0e  Merge branch 
'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Regressions in current branch:

arch/c6x/platforms/plldata.c:279:33: error: implicit declaration of function 
'get_coreid'; did you mean 'get_order'? [-Werror=implicit-function-declaration]
drivers/tty/serial/8250/8250_core.c:1094:1: error: unrecognizable insn:
drivers/tty/serial/8250/8250_core.c:1094:1: internal compiler error: in 
extract_insn, at recog.c:2311
fs//xfs/xfs_ioctl.c:1624:1: internal compiler error: in change_address_1, at 
emit-rtl.c:2150
fs/xfs/xfs_ioctl.c:1629:1: internal compiler error: in change_address_1, at 
emit-rtl.c:2150
Please submit a full bug report,
{standard input}:1226: Error: displacement to undefined symbol .L329 overflows 
12-bit field
{standard input}:1233: Error: displacement to undefined symbol .L331 overflows 
12-bit field
{standard input}:1253: Error: displacement to undefined symbol .L359 overflows 
12-bit field
{standard input}:1278: Error: displacement to undefined symbol .L360 overflows 
12-bit field
{standard input}:1405: Error: displacement to undefined symbol .L255 overflows 
12-bit field
{standard input}:1408: Error: invalid operands for opcode
{standard input}:1408: Error: missing operand
{standard input}:1453: Error: displacement to undefined symbol .L285 overflows 
12-bit field
{standard input}:1457: Error: displacement to undefined symbol .L286 overflows 
12-bit field
{standard input}:1467: Error: displacement to undefined symbol .L257 overflows 
12-bit field
{standard input}:1893: Error: displacement to undefined symbol .L229 overflows 
12-bit field
{standard input}:199: Error: unknown opcode
{standard input}:2013: Error: displacement to undefined symbol .L235 overflows 
12-bit field
{standard input}:9613: Error: invalid operands for opcode
{standard input}:9613: Error: missing operand
{standard input}: Error: open CFI at the end of file; missing .cfi_endproc 
directive
verifier.c:(.text+0x31ec): undefined reference to `__multi3'

Error ids grouped by kconfigs:

recent_errors
├── c6x-evmc6472_defconfig
│   └── 
arch-c6x-platforms-plldata.c:error:implicit-declaration-of-function-get_coreid-did-you-mean-get_order
├── cris-allyesconfig
│   ├── drivers-tty-serial-8250_core.c:error:unrecognizable-insn:
│   └── 
drivers-tty-serial-8250_core.c:internal-compiler-error:in-extract_insn-at-recog.c
├── mips-64r6el_defconfig
│   └── verifier.c:(.text):undefined-reference-to-__multi3
├── sh-allyesconfig
│   ├── 
fs-xfs-xfs_ioctl.c:internal-compiler-error:in-change_address_1-at-emit-rtl.c
│   ├── Please-submit-a-full-bug-report
│   ├── 
standard-input:Error:displacement-to-undefined-symbol-.L229-overflows-bit-field
│   ├── 
standard-input:Error:displacement-to-undefined-symbol-.L235-overflows-bit-field
│   ├── standard-input:Error:invalid-operands-for-opcode
│   ├── standard-input:Error:missing-operand
│   └── 
standard-input:Error:open-CFI-at-the-end-of-file-missing-.cfi_endproc-directive
├── sh-j2_defconfig
│   └── standard-input:Error:unknown-opcode
├── sh-sdk7786_defconfig
│   ├── 
standard-input:Error:displacement-to-undefined-symbol-.L255-overflows-bit-field
│   ├── 
standard-input:Error:displacement-to-undefined-symbol-.L257-overflows-bit-field
│   ├── 
standard-input:Error:displacement-to-undefined-symbol-.L285-overflows-bit-field
│   └── 
standard-input:Error:displacement-to-undefined-symbol-.L286-overflows-bit-field
└── sh-titan_defconfig
├── 
fs-xfs-xfs_ioctl.c:internal-compiler-error:in-change_address_1-at-emit-rtl.c
├── Please-submit-a-full-bug-report
├── 
standard-input:Error:displacement-to-undefined-symbol-.L329-overflows-bit-field
├── 
standard-input:Error:displacement-to-undefined-symbol-.L331-overflows-bit-field
├── 
standard-input:Error:displacement-to-undefined-symbol-.L359-overflows-bit-field
├── 
standard-input:Error:displacement-to-undefined-symbol-.L360-overflows-bit-field
├── standard-input:Error:invalid-operands-for-opcode
└── standard-input:Error:missing-operand

elapsed time: 155m

configs tested: 144

x86_64 acpi-redef
x86_64   allyesdebian
x86_64nfsroot
i386   tinyconfig
i386   randconfig-x016-201752
i386   randconfig-x011-201752
i386   randconfig-x014-201752
i386   randconfig-x017-201752
i386   randconfig-x019-201752
i386   randconfig-x018-201752
i386   randconfig-x010-201752
i386   randconfig-x013-201752
i386   randconfig-x012-201752
i386   randconfig-x015-201752
i386 randconfig-n0-201752
x86_64 randconfig-x003-201752
x86_64 randconfig-x002-201752

[linus:master] BUILD REGRESSION d1f854ac240ea3928a99294390048e9b2aa6fa0e

2017-12-23 Thread kbuild test robot

tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
 master
branch HEAD: d1f854ac240ea3928a99294390048e9b2aa6fa0e  Merge branch 
'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Regressions in current branch:

arch/c6x/platforms/plldata.c:279:33: error: implicit declaration of function 
'get_coreid'; did you mean 'get_order'? [-Werror=implicit-function-declaration]
drivers/tty/serial/8250/8250_core.c:1094:1: error: unrecognizable insn:
drivers/tty/serial/8250/8250_core.c:1094:1: internal compiler error: in 
extract_insn, at recog.c:2311
fs//xfs/xfs_ioctl.c:1624:1: internal compiler error: in change_address_1, at 
emit-rtl.c:2150
fs/xfs/xfs_ioctl.c:1629:1: internal compiler error: in change_address_1, at 
emit-rtl.c:2150
Please submit a full bug report,
{standard input}:1226: Error: displacement to undefined symbol .L329 overflows 
12-bit field
{standard input}:1233: Error: displacement to undefined symbol .L331 overflows 
12-bit field
{standard input}:1253: Error: displacement to undefined symbol .L359 overflows 
12-bit field
{standard input}:1278: Error: displacement to undefined symbol .L360 overflows 
12-bit field
{standard input}:1405: Error: displacement to undefined symbol .L255 overflows 
12-bit field
{standard input}:1408: Error: invalid operands for opcode
{standard input}:1408: Error: missing operand
{standard input}:1453: Error: displacement to undefined symbol .L285 overflows 
12-bit field
{standard input}:1457: Error: displacement to undefined symbol .L286 overflows 
12-bit field
{standard input}:1467: Error: displacement to undefined symbol .L257 overflows 
12-bit field
{standard input}:1893: Error: displacement to undefined symbol .L229 overflows 
12-bit field
{standard input}:199: Error: unknown opcode
{standard input}:2013: Error: displacement to undefined symbol .L235 overflows 
12-bit field
{standard input}:9613: Error: invalid operands for opcode
{standard input}:9613: Error: missing operand
{standard input}: Error: open CFI at the end of file; missing .cfi_endproc 
directive
verifier.c:(.text+0x31ec): undefined reference to `__multi3'

Error ids grouped by kconfigs:

recent_errors
├── c6x-evmc6472_defconfig
│   └── 
arch-c6x-platforms-plldata.c:error:implicit-declaration-of-function-get_coreid-did-you-mean-get_order
├── cris-allyesconfig
│   ├── drivers-tty-serial-8250_core.c:error:unrecognizable-insn:
│   └── 
drivers-tty-serial-8250_core.c:internal-compiler-error:in-extract_insn-at-recog.c
├── mips-64r6el_defconfig
│   └── verifier.c:(.text):undefined-reference-to-__multi3
├── sh-allyesconfig
│   ├── 
fs-xfs-xfs_ioctl.c:internal-compiler-error:in-change_address_1-at-emit-rtl.c
│   ├── Please-submit-a-full-bug-report
│   ├── 
standard-input:Error:displacement-to-undefined-symbol-.L229-overflows-bit-field
│   ├── 
standard-input:Error:displacement-to-undefined-symbol-.L235-overflows-bit-field
│   ├── standard-input:Error:invalid-operands-for-opcode
│   ├── standard-input:Error:missing-operand
│   └── 
standard-input:Error:open-CFI-at-the-end-of-file-missing-.cfi_endproc-directive
├── sh-j2_defconfig
│   └── standard-input:Error:unknown-opcode
├── sh-sdk7786_defconfig
│   ├── 
standard-input:Error:displacement-to-undefined-symbol-.L255-overflows-bit-field
│   ├── 
standard-input:Error:displacement-to-undefined-symbol-.L257-overflows-bit-field
│   ├── 
standard-input:Error:displacement-to-undefined-symbol-.L285-overflows-bit-field
│   └── 
standard-input:Error:displacement-to-undefined-symbol-.L286-overflows-bit-field
└── sh-titan_defconfig
├── 
fs-xfs-xfs_ioctl.c:internal-compiler-error:in-change_address_1-at-emit-rtl.c
├── Please-submit-a-full-bug-report
├── 
standard-input:Error:displacement-to-undefined-symbol-.L329-overflows-bit-field
├── 
standard-input:Error:displacement-to-undefined-symbol-.L331-overflows-bit-field
├── 
standard-input:Error:displacement-to-undefined-symbol-.L359-overflows-bit-field
├── 
standard-input:Error:displacement-to-undefined-symbol-.L360-overflows-bit-field
├── standard-input:Error:invalid-operands-for-opcode
└── standard-input:Error:missing-operand

elapsed time: 155m

configs tested: 144

x86_64 acpi-redef
x86_64   allyesdebian
x86_64nfsroot
i386   tinyconfig
i386   randconfig-x016-201752
i386   randconfig-x011-201752
i386   randconfig-x014-201752
i386   randconfig-x017-201752
i386   randconfig-x019-201752
i386   randconfig-x018-201752
i386   randconfig-x010-201752
i386   randconfig-x013-201752
i386   randconfig-x012-201752
i386   randconfig-x015-201752
i386 randconfig-n0-201752
x86_64 randconfig-x003-201752
x86_64 randconfig-x002-201752

Re: [PATCH v5 3/6] cx25840: add pin to pad mapping and output format configuration

2017-12-23 Thread Maciej S. Szmigiero

On 23.12.2017 15:08, Philippe Ombredanne wrote:
> On Sat, Dec 23, 2017 at 12:18 AM, Maciej S. Szmigiero
>  wrote:
>> This commit adds pin to pad mapping and output format configuration support
>> in CX2584x-series chips to cx25840 driver.
>>
>> This functionality is then used to allow disabling ivtv-specific hacks
>> (called a "generic mode"), so cx25840 driver can be used for other devices
>> not needing them without risking compatibility problems.
>>
>> Signed-off-by: Maciej S. Szmigiero 
>> ---
>>  drivers/media/i2c/cx25840/cx25840-core.c | 396 
>> ++-
>>  drivers/media/i2c/cx25840/cx25840-core.h |  13 +
>>  drivers/media/i2c/cx25840/cx25840-vbi.c  |   3 +
>>  include/media/drv-intf/cx25840.h |  74 +-
>>  4 files changed, 484 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/media/i2c/cx25840/cx25840-core.c 
>> b/drivers/media/i2c/cx25840/cx25840-core.c
>> index 2189980a0f29..e2fd33a64550 100644
>> --- a/drivers/media/i2c/cx25840/cx25840-core.c
>> +++ b/drivers/media/i2c/cx25840/cx25840-core.c
>> @@ -21,6 +21,9 @@
>>   * CX23888 DIF support for the HVR1850
>>   * Copyright (C) 2011 Steven Toth 
>>   *
>> + * CX2584x pin to pad mapping and output format configuration support are
>> + * Copyright (C) 2011 Maciej S. Szmigiero 
>> + *
>>   * This program is free software; you can redistribute it and/or
>>   * modify it under the terms of the GNU General Public License
>>   * as published by the Free Software Foundation; either version 2
> 
> Since you are touching the copyright here, I wonder if you could reach
> out to other copyright holders and switch to using an SPDX tag
> instead?
> 

Well, I'm really just adding two functionalities to the cx25840 driver,
which consist of 6 additional files besides "cx25840-core.c".

All this code is mostly written by other people, in addition to this
there is also a note at the top of this file that the driver is
"[b]ased on the saa7115 driver and on the first version of Chris
Kennedy's cx25840 driver".

Since of 119 *.c files in drivers/media/i2c/ directory only 3 already
have SPDX tags the rest will (probably) be amended with this tag anyway,
and that's when the cx25840 driver can also be tagged, using an
appropriate process for this (IANAL).

> -- 
> Cordially
> Philippe Ombredanne

Best regards,
Maciej Szmigiero

Re: [PATCH v5 3/6] cx25840: add pin to pad mapping and output format configuration

2017-12-23 Thread Maciej S. Szmigiero

On 23.12.2017 15:08, Philippe Ombredanne wrote:
> On Sat, Dec 23, 2017 at 12:18 AM, Maciej S. Szmigiero
>  wrote:
>> This commit adds pin to pad mapping and output format configuration support
>> in CX2584x-series chips to cx25840 driver.
>>
>> This functionality is then used to allow disabling ivtv-specific hacks
>> (called a "generic mode"), so cx25840 driver can be used for other devices
>> not needing them without risking compatibility problems.
>>
>> Signed-off-by: Maciej S. Szmigiero 
>> ---
>>  drivers/media/i2c/cx25840/cx25840-core.c | 396 
>> ++-
>>  drivers/media/i2c/cx25840/cx25840-core.h |  13 +
>>  drivers/media/i2c/cx25840/cx25840-vbi.c  |   3 +
>>  include/media/drv-intf/cx25840.h |  74 +-
>>  4 files changed, 484 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/media/i2c/cx25840/cx25840-core.c 
>> b/drivers/media/i2c/cx25840/cx25840-core.c
>> index 2189980a0f29..e2fd33a64550 100644
>> --- a/drivers/media/i2c/cx25840/cx25840-core.c
>> +++ b/drivers/media/i2c/cx25840/cx25840-core.c
>> @@ -21,6 +21,9 @@
>>   * CX23888 DIF support for the HVR1850
>>   * Copyright (C) 2011 Steven Toth 
>>   *
>> + * CX2584x pin to pad mapping and output format configuration support are
>> + * Copyright (C) 2011 Maciej S. Szmigiero 
>> + *
>>   * This program is free software; you can redistribute it and/or
>>   * modify it under the terms of the GNU General Public License
>>   * as published by the Free Software Foundation; either version 2
> 
> Since you are touching the copyright here, I wonder if you could reach
> out to other copyright holders and switch to using an SPDX tag
> instead?
> 

Well, I'm really just adding two functionalities to the cx25840 driver,
which consist of 6 additional files besides "cx25840-core.c".

All this code is mostly written by other people, in addition to this
there is also a note at the top of this file that the driver is
"[b]ased on the saa7115 driver and on the first version of Chris
Kennedy's cx25840 driver".

Since of 119 *.c files in drivers/media/i2c/ directory only 3 already
have SPDX tags the rest will (probably) be amended with this tag anyway,
and that's when the cx25840 driver can also be tagged, using an
appropriate process for this (IANAL).

> -- 
> Cordially
> Philippe Ombredanne

Best regards,
Maciej Szmigiero

1 2 3 4 >

1 - 100 of 384 matches

Mail list logo