date:20160212

Re: linux-next: Tree for Feb 12

2016-02-12 Thread Guenter Roeck

On Fri, Feb 12, 2016 at 11:32:42AM +0530, Sudip Mukherjee wrote:
> On Fri, Feb 12, 2016 at 04:20:35PM +1100, Stephen Rothwell wrote:
> > Hi all,
> > 
> > Changes since 20160211:
> 
> For the last few days, with gcc-4.6.3, x86_64 and i386 defconfig and
> allmodconfig builds have been failing with the error:
> "arch/x86/include/asm/tlbflush.h:26:18: error: memory input 0 is not
> directly addressable"
> 
> But with gcc-4.8.2 and later it builds fine. Should I try to bisect and
> find the problem, or is it not worth it for gcc-4.6.3?
> 
Bisect log attached. Looks like older versions of gcc (or at least 4.6.3)
don't like the newly introduced __invpcid().

Guenter

---
# bad: [64d9a3617b3b8bc0734ba97caeb433b7019c6187] Add linux-next specific files for 20160212
# good: [388f7b1d6e8ca06762e2454d28d6c3c55ad0fe95] Linux 4.5-rc3
git bisect start 'HEAD' 'v4.5-rc3'
# good: [597dc9d36e8bc04941b61b26ac7aa3f8a33aba53] Merge remote-tracking branch 'sound-asoc/for-next'
git bisect good 597dc9d36e8bc04941b61b26ac7aa3f8a33aba53
# bad: [91fe8ea815243ec595753ccf7e14126b6f87f2bf] Merge remote-tracking branch 'usb-chipidea-next/ci-for-usb-next'
git bisect bad 91fe8ea815243ec595753ccf7e14126b6f87f2bf
# bad: [1d6796e67f265e835bcb1a19d27ba0433dbd75e4] Merge remote-tracking branch 'tip/auto-latest'
git bisect bad 1d6796e67f265e835bcb1a19d27ba0433dbd75e4
# good: [4abecd5aab4960c786530db5ce4ca332ceba2b73] Merge branch 'x86/microcode'
git bisect good 4abecd5aab4960c786530db5ce4ca332ceba2b73
# good: [84d5092d655bc9532c5fc88e7b2308090889187d] Merge remote-tracking branch 'kgdb/kgdb-next'
git bisect good 84d5092d655bc9532c5fc88e7b2308090889187d
# good: [e4e0cfcfc9fd3885ff7dadfb5a4b553495d011e4] Merge remote-tracking branch 'iommu/next'
git bisect good e4e0cfcfc9fd3885ff7dadfb5a4b553495d011e4
# good: [e908e75fc3833b413837aca4667db96c372c843e] Merge remote-tracking branch 'spi/topic/ti-qspi' into spi-next
git bisect good e908e75fc3833b413837aca4667db96c372c843e
# good: [88e2211191dabe9ae2a953d5a4326d3e4b7f2901] Merge remote-tracking branch 'trivial/for-next'
git bisect good 88e2211191dabe9ae2a953d5a4326d3e4b7f2901
# good: [d12a72b844a49d4162f24cefdab30bed3f86730e] x86/mm: Add a 'noinvpcid' boot option to turn off INVPCID
git bisect good d12a72b844a49d4162f24cefdab30bed3f86730e
# good: [a135746b7d4386c17290e030101d037e86f8] Merge remote-tracking branch 'dt-rh/for-next'
git bisect good a135746b7d4386c17290e030101d037e86f8
# bad: [ce1143aa60273220a9f89012f2aaaed04f97e9a2] x86/dmi: Switch dmi_remap() from ioremap() [uncached] to ioremap_cache()
git bisect bad ce1143aa60273220a9f89012f2aaaed04f97e9a2
# bad: [d8bced79af1db6734f66b42064cc773cada2ce99] x86/mm: If INVPCID is available, use it to flush global mappings
git bisect bad d8bced79af1db6734f66b42064cc773cada2ce99
# first bad commit: [d8bced79af1db6734f66b42064cc773cada2ce99] x86/mm: If INVPCID is available, use it to flush global mappings

Allowing external modules to run across more configs and distros

2016-02-12 Thread David F.

While creating a Linux module that should be usable across a wide
array of Linux versions and builds, I've run into struct module
(THIS_MODULE) being a problem.  It's the only internal struct the
module has to access, and only because struct block_device_operations
requires .owner.  It's a bit annoying for this module to be rejected
over something it has no interest in using.  So I was thinking of a
quick solution, but I don't want to waste my time if people will
resist it (I'm not sure when I'll have time to do it, but I should be
able to do it quickly once I have all the source local).  The idea is:

Take all #if defines out of struct module and place them in a
separate struct (struct modconfigopts).  Place a single member
variable at the end of struct module as void * (void *options) to
access the options.  Have a single macro to access the options member
variable for the internal code that accesses it (#define MOD_OPT(v)
((struct modconfigopts*)(v))).  So internal code would use
module->MOD_OPT(options)->param_lock for example.

When the module structure is created, it would initialize the options
member variable (the memory allocated (sizeof(struct
module)+sizeof(struct modconfigopts)) could be contiguous so it's like
one big structure, then cleanup would be the same).

Doing this should then allow "external" modules that don't need access
to the "internal" config options to continue to load and work across a
much greater range of Linux distributions.
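
A rough userspace sketch of that layout, only to make the idea
concrete.  This is not kernel code: struct module_sketch, struct
module_alloc, module_sketch_alloc() and the int placeholder for
param_lock are all made up for illustration.  The accessor is written
as MOD_OPT(mod)->field here, since a macro cannot expand in the
member position of module->MOD_OPT(options).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the config-dependent members of struct module. */
struct modconfigopts {
	int param_lock;		/* placeholder; a real lock type in the kernel */
	int tracepoint_count;	/* placeholder for another #ifdef'd member */
};

/* Hypothetical stand-in for struct module with a config-independent layout. */
struct module_sketch {
	char name[56];
	void *options;		/* last member; points at the optional part */
};

/* Single accessor so internal code never casts by hand. */
#define MOD_OPT(mod) ((struct modconfigopts *)((mod)->options))

/* One contiguous allocation holds both structures. */
struct module_alloc {
	struct module_sketch mod;
	struct modconfigopts opts;
};

static struct module_sketch *module_sketch_alloc(void)
{
	struct module_alloc *a = calloc(1, sizeof(*a));

	/* free() below relies on mod being the first member. */
	assert(offsetof(struct module_alloc, mod) == 0);
	if (!a)
		return NULL;
	a->mod.options = &a->opts;
	return &a->mod;
}

static void module_sketch_free(struct module_sketch *mod)
{
	free(mod);	/* same address as the enclosing allocation */
}
```

An external module built against this layout would only ever touch
the fixed part, so the size of struct modconfigopts could vary with
the kernel config without changing the offsets it depends on.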

What do you think?

Re: [PATCH] ARM: omapfb: Add early framebuffer memory allocator

2016-02-12 Thread Ivaylo Dimitrov


Hi Tomi,

On 11.01.2016 20:34, Tomi Valkeinen wrote:


So, I'm not very enthusiastic about adding this feature as an omapfb
specific boot parameter.



What about something like (not properly formatted, just want your 
opinion on the idea):


diff --git a/arch/arm/mach-omap2/fb.c b/arch/arm/mach-omap2/fb.c
index 1f1ecf8..0d109d8 100644
--- a/arch/arm/mach-omap2/fb.c
+++ b/arch/arm/mach-omap2/fb.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 

 #include 

@@ -110,6 +111,49 @@ int __init omap_init_fb(void)
 {
return platform_device_register(&omap_fb_device);
 }
+
+static int rmem_omapfb_device_init(struct reserved_mem *rmem, struct device *dev)
+{
+   int dma;
+
+   if (rmem->priv)
+   return 0;
+
+   dma = dma_declare_coherent_memory(&omap_fb_device.dev, rmem->base,
+ rmem->base, rmem->size,
+ DMA_MEMORY_MAP |
+ DMA_MEMORY_EXCLUSIVE);
+
+   if (!(dma & DMA_MEMORY_MAP)) {
+   pr_err("omapfb: dma_declare_coherent_memory failed\n");
+   return -ENOMEM;
+   }
+   else
+   rmem->priv = omap_fb_device.dev.dma_mem;
+
+   return 0;
+}
+
+static void rmem_omapfb_device_release(struct reserved_mem *rmem,
+  struct device *dev)
+{
+   dma_release_declared_memory(&omap_fb_device.dev);
+}
+
+static const struct reserved_mem_ops rmem_omapfb_ops = {
+   .device_init= rmem_omapfb_device_init,
+   .device_release = rmem_omapfb_device_release,
+};
+
+static int __init rmem_omapfb_setup(struct reserved_mem *rmem)
+{
+   rmem->ops = &rmem_omapfb_ops;
+   pr_info("omapfb: reserved %d bytes at %pa\n", rmem->size, &rmem->base);
+
+   return 0;
+}
+
+RESERVEDMEM_OF_DECLARE(dss, "ti,omapfb-memsize", rmem_omapfb_setup);
 #else
 int __init omap_init_fb(void) { return 0; }
 #endif

diff --git a/arch/arm/mach-omap2/display.c b/arch/arm/mach-omap2/display.c
index 6ab13d1..6f0ba03 100644
--- a/arch/arm/mach-omap2/display.c
+++ b/arch/arm/mach-omap2/display.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 

 #include 
 #include "omap_hwmod.h"
@@ -640,6 +641,7 @@ int __init omapdss_init_of(void)
omap_display_device.dev.platform_data = _data;

r = platform_device_register(&omap_display_device);
+
if (r < 0) {
pr_err("Unable to register omapdss device\n");
return r;
@@ -666,6 +668,9 @@ int __init omapdss_init_of(void)
return r;
}

+   /* Init fb reserved memory, there may be none so ignore the result */
+   of_reserved_mem_device_init(>dev);
+
/* create V4L2 display device */
r = omap_init_vout();
if (r < 0) {


Regards,
Ivo

Re: [RFC v2b 3/5] fs: btrfs: Use vfs_time accessors

2016-02-12 Thread Deepa Dinamani

On Fri, Feb 12, 2016 at 5:57 AM, Arnd Bergmann  wrote:
> On Friday 12 February 2016 01:45:47 Deepa Dinamani wrote:
>> +   ts = vfs_time_to_timespec(inode->i_mtime);
>> +   if (!timespec_equal(&ts, &now))
>> +   inode->i_mtime = timespec_to_vfs_time(now);
>> +
>> +   ts = vfs_time_to_timespec(inode->i_mtime);
>> +   if (!timespec_equal(&ts, &now))
>> +   inode->i_ctime = timespec_to_vfs_time(now);
>>
>
> The second one needs to be vfs_time_to_timespec(inode->i_ctime), not i_mtime.


Yes, you are correct.
I will wait for some consensus on the proposal to figure out which
version to post again.
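
For reference, a small userspace sketch of the corrected pattern.
The *_sketch types and helpers are stand-ins written for this note
(here vfs_time is simply a struct timespec), not the accessors from
the RFC series.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in: in this sketch vfs_time is just a timespec. */
typedef struct timespec vfs_time_sketch;

struct inode_sketch {
	vfs_time_sketch i_mtime;
	vfs_time_sketch i_ctime;
};

static struct timespec vfs_time_to_timespec_sketch(vfs_time_sketch t)
{
	return t;
}

static vfs_time_sketch timespec_to_vfs_time_sketch(struct timespec ts)
{
	return ts;
}

static bool timespec_equal_sketch(const struct timespec *a,
				  const struct timespec *b)
{
	return a->tv_sec == b->tv_sec && a->tv_nsec == b->tv_nsec;
}

/* Update each timestamp only if it differs from now; true if any changed. */
static bool update_times_sketch(struct inode_sketch *inode, struct timespec now)
{
	struct timespec ts;
	bool changed = false;

	assert(inode != NULL);

	ts = vfs_time_to_timespec_sketch(inode->i_mtime);
	if (!timespec_equal_sketch(&ts, &now)) {
		inode->i_mtime = timespec_to_vfs_time_sketch(now);
		changed = true;
	}

	/* The ctime check reads i_ctime, not i_mtime. */
	ts = vfs_time_to_timespec_sketch(inode->i_ctime);
	if (!timespec_equal_sketch(&ts, &now)) {
		inode->i_ctime = timespec_to_vfs_time_sketch(now);
		changed = true;
	}
	return changed;
}
```

With the i_mtime copy-and-paste in the second check, an inode whose
mtime already equals now would never get its ctime refreshed.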

Thanks,
-Deepa

Re: [Y2038] [RFC v2] vfs 64 bit time transition proposals

2016-02-12 Thread Deepa Dinamani

> Regarding the three versions, I think all of them are
> doable, and they all have their upsides and downsides but no
> showstoppers.

I agree that all the approaches are doable.

> Let me summarize what I see in the patches:
>
> 2a is the smallest set of changes in number of lines, as you indicated
>in the previous discussion (I was skeptical here initially, but
>you were right). The main downside is that each patch has to
>carefully consider what happens at the point when the type gets
>flipped, so that printk format strings are correct and assignments
>to local variables don't truncate the range. It also requires
>changing the types again after the VFS change, but that is
>something we can automate using coccinelle.

2c has the same downside as this.
It also has to carefully consider what happens when you switch end filesystems
to timespec64, be it for printks or assignments.
I would say that the effort to do this was the same for 2a and 2c.

And, 2c also needs to get rid of the abstraction macros when vfs is transitioned
to using timespec64.

> 2b has the main advantage of not changing behavior with the flip, so
>we can convert all file systems to use vfs_time relatively easily
>and then later make them actually use 64-bit timestamps with
>a patch that each file system developer can do for themselves.
>One downside is that it leads to rather ugly code as discussed
>before, examples are in "[RFC v2b 5/5] fs: xfs: change inode
>times to use vfs_time data type" and "[RFC v2b 3/5] fs: btrfs:
>Use vfs_time accessors".

Here is the breakdown of the number of changes required, from the table
in the cover letter (https://lkml.org/lkml/2016/2/12/76):

* # Changes needed in 2a = row 1 + row 7 + row 8 + row 9 + row 10
= 34 + 80 + 10 + 3 + 3 = 130
* # Changes needed in 2b = row 1 + row 4 + row 5 + row 6 + row 7 * (~3)
= 34 + 80 + 141 + 74 + 85 + 240 = 654
* # Changes needed in 2c = Changes in 2b + some more

It is clear from the above breakdown that the number of such changes will be
considerably higher for approaches 2b and 2c.

And, 2b is not even close to what we want to achieve and will again confuse
developers even more as there will be 2 sets of abstraction apis now:
1. vfs_time apis
2. timespec64 to timespec/ timespec to timespec64 apis
Since there is no clean up effort here after vfs is switched over, we are just
making all filesystems that use these apis harder to read.

> 2c gets us the furthest along the way for the conversion, close
>to where we want to end up in the long run, so we could do that
>to file systems one by one. The behavior change is immediate,
>so there are fewer possible surprises than with 2a, but it
>also means the most upfront work.

2c abstractions can be used in more than one way.
And, 2c also introduces a new timestamp data type along with
timespec64 in the filesystem code.
The above two factors can make it confusing for the developers
until we transition vfs and remove abstractions from individual
filesystems. And, this is a problem as we want to remove
abstractions in a different kernel release than the one we do the
transition in, as we discussed previously.

2a still seems like the right choice to me, and it will have the
least number of changes.

As Arnd thinks all of them are doable, if anybody else has other
concerns that we missed, please comment.

-Deepa

RE: [PATCH V7 4/6] i2c: qup: Add bam dma capabilities

2016-02-12 Thread Sricharan

Hi Wolfram,

> -----Original Message-----
> From: Wolfram Sang [mailto:w...@the-dreams.de]
> Sent: Saturday, February 13, 2016 12:08 AM
> To: Sricharan R
> Cc: devicet...@vger.kernel.org; linux-arm-...@vger.kernel.org;
> agr...@codeaurora.org; linux-kernel@vger.kernel.org; linux-
> i...@vger.kernel.org; iiva...@mm-sol.com; ga...@codeaurora.org;
> dmaeng...@vger.kernel.org; linux-arm-ker...@lists.infradead.org;
> andy.gr...@linaro.org; ntel...@codeaurora.org; arch...@codeaurora.org
> Subject: Re: [PATCH V7 4/6] i2c: qup: Add bam dma capabilities
> 
> On Tue, Jan 19, 2016 at 03:32:44PM +0530, Sricharan R wrote:
> > QUP cores can be attached to a BAM module, which acts as a dma engine
> > for the QUP core. When DMA with BAM is enabled, the BAM consumer pipe
> > transmitted data is written to the output FIFO and the BAM producer
> > pipe received data is read from the input FIFO.
> >
> > With BAM capabilities, qup-i2c core can transfer more than
> > 256 bytes, without a 'stop' which is not possible otherwise.
> >
> > Signed-off-by: Sricharan R 
> > Reviewed-by: Andy Gross 
> > Tested-by: Archit Taneja 
> > Tested-by: Telkar Nagender 
> 
> My code checkers found some issues:
> 
> SPARSE
> drivers/i2c/busses/i2c-qup.c:555:6: warning: symbol 'qup_sg_set_buf' was
> not declared. Should it be static?
> drivers/i2c/busses/i2c-qup.c:1243:50: warning: dubious: !x & !y
> SMATCH
> drivers/i2c/busses/i2c-qup.c:165 qup_sg_set_buf warn: unused return: s =
> sg_next()
> drivers/i2c/busses/i2c-qup.c:165 qup_sg_set_buf warn: unused return: s =
> sg_next()
> drivers/i2c/busses/i2c-qup.c:1243 qup_i2c_xfer_v2() warn: add some
> parenthesis here?
> CPPCHECK
> drivers/i2c/busses/i2c-qup.c:1243: style: Boolean result is used in bitwise
> operation. Clarify expression with parentheses.
> SPATCH
> drivers/i2c/busses/i2c-qup.c:1380:2-13: WARNING: Assignment of bool to 0/1
> drivers/i2c/busses/i2c-qup.c:1481:1-13: WARNING: Assignment of bool to 0/1
>   CC  drivers/i2c/busses/i2c-qup.o
> drivers/i2c/busses/i2c-qup.c:555:6: warning: no previous prototype for
> 'qup_sg_set_buf' [-Wmissing-prototypes]  void qup_sg_set_buf(struct
> scatterlist *sg, void *buf, struct qup_i2c_tag *tg,
> 
> Can you fix them and resend??

Sorry about this, I will resend the patch.

Regards,
 Sricharan

Re: [PATCH 2/2 v2] devicetree: Add DTS file to support the Nexus7 2013 (flo) device.

2016-02-12 Thread Bjorn Andersson

On Fri 12 Feb 21:16 PST 2016, John Stultz wrote:

> On Fri, Feb 5, 2016 at 11:21 AM, John Stultz  wrote:
> > This patch adds a dts file to support the Nexus7 2013
> > device. It's based off of the qcom-apq8064-ifc6410.dts
> > which is similar hardware.
> >
> > Also includes some comments and context folded in
> > from Vinay Simha BN 
> >
> > Cc: Rob Herring 
> > Cc: Arnd Bergmann 
> > Cc: Pawel Moll 
> > Cc: Mark Rutland 
> > Cc: Ian Campbell 
> > Cc: Kumar Gala 
> > Cc: Andy Gross 
> > Cc: Russell King 
> > Cc: Vinay Simha BN 
> > Cc: Bjorn Andersson 
> > Cc: Stephen Boyd 
> > Cc: linux-arm-...@vger.kernel.org
> > Cc: devicet...@vger.kernel.org
> > Acked-by: Rob Herring 
> > Signed-off-by: John Stultz 
> > ---
> > v2: Fix dts/dtb typo in makefile pointed out by Rob
> 
> 
> Ping?  Is there someone I'm missing on the CC list for this? Or is
> there some special submission process?
> 

You're doing it right, sorry for not looking at this earlier.

Acked-by: Bjorn Andersson 

Andy, can you apply these two patches, please?

Regards,
Bjorn

Re: [PATCH v6 0/3] mailbox: Add APM X-Gene platform mailbox driver

2016-02-12 Thread Duc Dang

Hi Itaru,

On Friday, February 12, 2016, Itaru Kitayama  wrote:
>
> Hi Duc,
>
> I've been testing your patch set for v4.5-rc1 on Mustang, with ACPI; however,
> the boot hangs partway through:
>
> EFI stub: Booting Linux Kernel...
> EFI stub: Using DTB from configuration table
> EFI stub: Exiting boot services and installing virtual address map...
> L3c Cache: 8MB
> Booting Linux on physical CPU 0x0
> Linux version 4.5.0-rc1+ (itaru.kitayama@r2-a21) (gcc version 5.3.1 20151207 (Red Hat 5.3.1-2) (GCC) ) #96 SMP PREEMPT Fri Feb 12 22:43:54 CST 2016
> Boot CPU: AArch64 Processor [500f0001]
> earlycon: Early serial console at MMIO32 0x1c02 (options '')
> bootconsole [uart0] enabled
> efi: Getting EFI parameters from FDT:
> EFI v2.40 by X-Gene Mustang Board EFI Nov 24 2015 13:22:41
> efi:  ACPI=0x47fa869000  ACPI 2.0=0x47fa869014  SMBIOS 3.0=0x47fa867000
> cma: Reserved 512 MiB at 0x0040e000
> ACPI: Early table checksum verification disabled
> ACPI: RSDP 0x0047FA869014 24 (v02 APM   )
> ACPI: XSDT 0x0047FA8680E8 6C (v01 APMXGENE0003 0113)
> ACPI: FACP 0x0047FA85F000 00010C (v05 APMXGENE0003 INTL 20140724)
> ACPI: DSDT 0x0047FA86 00495F (v05 APMAPM88xxx 0001 INTL 20140724)
> ACPI: DBG2 0x0047FA865000 AA (v00 APMC0D XGENEDBG  INTL 20140724)
> ACPI: GTDT 0x0047FA85D000 E0 (v02 APMXGENE0001 INTL 20140724)
> ACPI: MCFG 0x0047FA85C000 3C (v01 APMXGENE0002 INTL 20140724)
> ACPI: SPCR 0x0047FA85B000 50 (v02 APMC0D XGENESPC  INTL 20140724)
> ACPI: SSDT 0x0047FA85A000 2D (v02 APMXGENE0001 INTL 20140724)
> ACPI: APIC 0x0047FA859000 0002A4 (v03 APMXGENE0003 0113)
> ACPI: SSDT 0x0047FA858000 78 (v02 REDHAT MACADDRS 0001 0113)
> ACPI: SSDT 0x0047FA857000 32 (v02 REDHAT UARTCLKS 0001 0113)
> psci: is not implemented in ACPI.
> Unsupported ACPI enable-method
> Unsupported ACPI enable-method
> Unsupported ACPI enable-method
> Unsupported ACPI enable-method
> Unsupported ACPI enable-method
> Unsupported ACPI enable-method
> Unsupported ACPI enable-method
> Unsupported ACPI enable-method
> PERCPU: Embedded 2 pages/cpu @fe07fff9 s41472 r8192 d81408 u131072
> Detected PIPT I-cache on CPU0
> Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 523776
> Kernel command line: BOOT_IMAGE=/vmlinuz-4.5.0-rc1+ root=UUID=0e305934-49d5-4f66-b10b-23ec6029fab9 ro acpi=force earlycon=uart8250,mmio32,0x1c02 console=ttyS0,115200 LANG=en_US.UTF-8
> PID hash table entries: 4096 (order: -1, 32768 bytes)
> Dentry cache hash table entries: 4194304 (order: 9, 33554432 bytes)
> Inode-cache hash table entries: 2097152 (order: 8, 16777216 bytes)
> software IO TLB [mem 0x40dbff-0x40dfff] (64MB) mapped at [fe00dbff-fe00dffe]
> Memory: 32856256K/33554432K available (6158K kernel code, 725K rwdata, 4032K rodata, 768K init, 399K bss, 173888K reserved, 524288K cma-reserved)
> Virtual kernel memory layout:
> vmalloc : 0xfc00 - 0xfdfedfff   (  2043 GB)
> vmemmap : 0xfdfee000 - 0xfdffe000   ( 4 GB maximum)
>   0xfdfef000 - 0xfdfef200   (32 MB actual)
> fixed   : 0xfdfffa7d - 0xfdfffac0   (  4288 KB)
> PCI I/O : 0xfdfffae0 - 0xfdfffbe0   (16 MB)
> modules : 0xfdfffc00 - 0xfe00   (64 MB)
> memory  : 0xfe00 - 0xfe08   ( 32768 MB)
>   .init : 0xfea8 - 0xfeb4   (   768 KB)
>   .text : 0xfe08 - 0xfea75b44   ( 10199 KB)
>   .data : 0xfeb4 - 0xfebf5400   (   725 KB)
> SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
> Preemptible hierarchical RCU implementation.
> Build-time adjustment of leaf fanout to 64.
> RCU restricting CPUs from NR_CPUS=64 to nr_cpu_ids=1.
> RCU: Adjusting geometry for rcu_fanout_leaf=64, nr_cpu_ids=1
> NR_IRQS:64 nr_irqs:64 0
> GIC: Using split EOI/Deactivate mode
> Architected cp15 timer(s) running at 50.00MHz (phys).
> clocksource: arch_sys_counter: mask: 0xff max_cycles: 0xb8812736b, max_idle_ns: 440795202655 ns
> sched_clock: 56 bits at 50MHz, resolution 20ns, wraps every 4398046511100ns
> Console: colour dummy device 80x25
> Calibrating delay loop (skipped), value calculated using timer frequency.. 100.00 BogoMIPS (lpj=20)
> pid_max: default: 32768 minimum: 301
> ACPI: Core revision 20160108
> ACPI: 4 ACPI AML tables successfully acquired and loaded
>
> Security Framework initialized
> Mount-cache hash table entries: 65536 (order: 3, 524288 bytes)
> Mountpoint-cache hash table entries: 65536 (order: 3, 524288 bytes)
> ASID allocator initialised with 65536 entries
>
> Is the patch set supposed to be evaluated without ACPI?

The patch set is

Re: [PATCH 2/2 v2] devicetree: Add DTS file to support the Nexus7 2013 (flo) device.

2016-02-12 Thread John Stultz

On Fri, Feb 5, 2016 at 11:21 AM, John Stultz  wrote:
> This patch adds a dts file to support the Nexus7 2013
> device. It's based off of the qcom-apq8064-ifc6410.dts
> which is similar hardware.
>
> Also includes some comments and context folded in
> from Vinay Simha BN 
>
> Cc: Rob Herring 
> Cc: Arnd Bergmann 
> Cc: Pawel Moll 
> Cc: Mark Rutland 
> Cc: Ian Campbell 
> Cc: Kumar Gala 
> Cc: Andy Gross 
> Cc: Russell King 
> Cc: Vinay Simha BN 
> Cc: Bjorn Andersson 
> Cc: Stephen Boyd 
> Cc: linux-arm-...@vger.kernel.org
> Cc: devicet...@vger.kernel.org
> Acked-by: Rob Herring 
> Signed-off-by: John Stultz 
> ---
> v2: Fix dts/dtb typo in makefile pointed out by Rob


Ping?  Is there someone I'm missing on the CC list for this? Or is
there some special submission process?

thanks
-john

RE: [PATCH v6 0/3] mailbox: Add APM X-Gene platform mailbox driver

2016-02-12 Thread Itaru Kitayama


Hi Duc,

I've been testing your patch set for v4.5-rc1 on Mustang, with ACPI;
however, the boot hangs partway through:

EFI stub: Booting Linux Kernel...
EFI stub: Using DTB from configuration table
EFI stub: Exiting boot services and installing virtual address map...
L3c Cache: 8MB
Booting Linux on physical CPU 0x0
Linux version 4.5.0-rc1+ (itaru.kitayama@r2-a21) (gcc version 5.3.1 20151207 (Red Hat 5.3.1-2) (GCC) ) #96 SMP PREEMPT Fri Feb 12 22:43:54 CST 2016
Boot CPU: AArch64 Processor [500f0001]
earlycon: Early serial console at MMIO32 0x1c02 (options '')
bootconsole [uart0] enabled
efi: Getting EFI parameters from FDT:
EFI v2.40 by X-Gene Mustang Board EFI Nov 24 2015 13:22:41
efi:  ACPI=0x47fa869000  ACPI 2.0=0x47fa869014  SMBIOS 3.0=0x47fa867000
cma: Reserved 512 MiB at 0x0040e000
ACPI: Early table checksum verification disabled
ACPI: RSDP 0x0047FA869014 24 (v02 APM   )
ACPI: XSDT 0x0047FA8680E8 6C (v01 APMXGENE0003 0113)
ACPI: FACP 0x0047FA85F000 00010C (v05 APMXGENE0003 INTL 20140724)
ACPI: DSDT 0x0047FA86 00495F (v05 APMAPM88xxx 0001 INTL 20140724)
ACPI: DBG2 0x0047FA865000 AA (v00 APMC0D XGENEDBG  INTL 20140724)
ACPI: GTDT 0x0047FA85D000 E0 (v02 APMXGENE0001 INTL 20140724)
ACPI: MCFG 0x0047FA85C000 3C (v01 APMXGENE0002 INTL 20140724)
ACPI: SPCR 0x0047FA85B000 50 (v02 APMC0D XGENESPC  INTL 20140724)
ACPI: SSDT 0x0047FA85A000 2D (v02 APMXGENE0001 INTL 20140724)
ACPI: APIC 0x0047FA859000 0002A4 (v03 APMXGENE0003 0113)
ACPI: SSDT 0x0047FA858000 78 (v02 REDHAT MACADDRS 0001 0113)
ACPI: SSDT 0x0047FA857000 32 (v02 REDHAT UARTCLKS 0001 0113)
psci: is not implemented in ACPI.
Unsupported ACPI enable-method
Unsupported ACPI enable-method
Unsupported ACPI enable-method
Unsupported ACPI enable-method
Unsupported ACPI enable-method
Unsupported ACPI enable-method
Unsupported ACPI enable-method
Unsupported ACPI enable-method
PERCPU: Embedded 2 pages/cpu @fe07fff9 s41472 r8192 d81408 u131072
Detected PIPT I-cache on CPU0
Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 523776
Kernel command line: BOOT_IMAGE=/vmlinuz-4.5.0-rc1+ root=UUID=0e305934-49d5-4f66-b10b-23ec6029fab9 ro acpi=force earlycon=uart8250,mmio32,0x1c02 console=ttyS0,115200 LANG=en_US.UTF-8
PID hash table entries: 4096 (order: -1, 32768 bytes)
Dentry cache hash table entries: 4194304 (order: 9, 33554432 bytes)
Inode-cache hash table entries: 2097152 (order: 8, 16777216 bytes)
software IO TLB [mem 0x40dbff-0x40dfff] (64MB) mapped at [fe00dbff-fe00dffe]
Memory: 32856256K/33554432K available (6158K kernel code, 725K rwdata, 4032K rodata, 768K init, 399K bss, 173888K reserved, 524288K cma-reserved)
Virtual kernel memory layout:
vmalloc : 0xfc00 - 0xfdfedfff   (  2043 GB)
vmemmap : 0xfdfee000 - 0xfdffe000   ( 4 GB maximum)
  0xfdfef000 - 0xfdfef200   (32 MB actual)
fixed   : 0xfdfffa7d - 0xfdfffac0   (  4288 KB)
PCI I/O : 0xfdfffae0 - 0xfdfffbe0   (16 MB)
modules : 0xfdfffc00 - 0xfe00   (64 MB)
memory  : 0xfe00 - 0xfe08   ( 32768 MB)
  .init : 0xfea8 - 0xfeb4   (   768 KB)
  .text : 0xfe08 - 0xfea75b44   ( 10199 KB)
  .data : 0xfeb4 - 0xfebf5400   (   725 KB)
SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
Preemptible hierarchical RCU implementation.
Build-time adjustment of leaf fanout to 64.
RCU restricting CPUs from NR_CPUS=64 to nr_cpu_ids=1.
RCU: Adjusting geometry for rcu_fanout_leaf=64, nr_cpu_ids=1
NR_IRQS:64 nr_irqs:64 0
GIC: Using split EOI/Deactivate mode
Architected cp15 timer(s) running at 50.00MHz (phys).
clocksource: arch_sys_counter: mask: 0xff max_cycles: 0xb8812736b, max_idle_ns: 440795202655 ns
sched_clock: 56 bits at 50MHz, resolution 20ns, wraps every 4398046511100ns
Console: colour dummy device 80x25
Calibrating delay loop (skipped), value calculated using timer frequency.. 100.00 BogoMIPS (lpj=20)
pid_max: default: 32768 minimum: 301
ACPI: Core revision 20160108
ACPI: 4 ACPI AML tables successfully acquired and loaded

Security Framework initialized
Mount-cache hash table entries: 65536 (order: 3, 524288 bytes)
Mountpoint-cache hash table entries: 65536 (order: 3, 524288 bytes)
ASID allocator initialised with 65536 entries

Is the patch set supposed to be evaluated without ACPI?

Re: [PATCH v2 0/2] DAX bdev fixes - move flushing calls to FS

2016-02-12 Thread Ross Zwisler

On Sat, Feb 13, 2016 at 01:38:49PM +1100, Dave Chinner wrote:
> On Fri, Feb 12, 2016 at 12:03:20PM -0700, Ross Zwisler wrote:
> > On Thu, Feb 11, 2016 at 01:43:04PM +0100, Jan Kara wrote:
> > > On Wed 10-02-16 13:48:54, Ross Zwisler wrote:
> > > > 3) In filemap_write_and_wait() and filemap_write_and_wait_range(), continue
> > > > the writeback in the case that DAX is enabled but we only have a nonzero
> > > > mapping->nrpages.  As with 1) and 2), I believe this is necessary to
> > > > properly writeback metadata changes.  If this sounds wrong, please let me
> > > > know and I'll get more info.
> > > 
> > > And I'm surprised here as well. If there are dax_mapping() inodes that have
> > > pagecache pages, then we have issues with radix tree handling as well. So
> > > how come dax_mapping() inodes have pages attached? If it is about block
> > > device inodes, then I find it buggy, that S_DAX gets set for such inodes
> > > when filesystem is mounted on them because in such cases we are IMO asking
> > > for data corruption sooner rather than later...
> > 
> > I think I've figured this one out, at least partially.
> > 
> > For ext2 the issues I was seeing were due to the fact that directory inodes
> > have S_DAX set, but have dirty page cache pages.   In testing with
> > generic/002, I see two ext2 inodes with S_DAX trying to do a writeback while
> > they have dirty page cache pages.  The first has i_ino=2, which is the
> > EXT2_ROOT_INO.
> 
> > As far as I can see, XFS does not have these issues - returning immediately
> > having done just the DAX writeback in xfs_vm_writepages() lets all my xfstests
> > pass.
> 
> XFS will not have issues because it does not dirty directory inodes
> at the VFS level, nor does it use the page cache for directory data.
> However, looking at the code I think it does still set S_DAX on
> directory inodes, which it shouldn't be doing.
> 
> I've got a couple of fixes I need to do in this area - hopefully
> I'll get it done on Monday.

Cool.  I've got a quick patch that stops S_DAX from being set on everything
but regular inodes for ext2 and ext4.  This solved a lot of my xfstests
failures.

Even after that I'm seeing two last failures with ext4 - I'll keep working on
those.

- Ross

Re: [GIT PULL] bcm2835 DT changes for 4.6

2016-02-12 Thread Florian Fainelli

On 12/02/2016 16:53, Eric Anholt wrote:
> Florian Fainelli  writes:
> 
>> On 10/02/2016 10:51, Eric Anholt wrote:
>>> Martin Sperl  writes:
>>>
>>>>> On 09.02.2016, at 01:32, Eric Anholt  wrote:
>>>>>
>>>>> Hi Florian.  Here's the first set of patches for bcm2835 for 4.6.
>>>>> We've got more DT patches that are going to happen for new boards,
>>>>> too, but they're still getting polished.
>>>>>
>>>>> The following changes since commit 92e963f50fc74041b5e9e744c330dca48e04f08d:
>>>>>
>>>>>  Linux 4.5-rc1 (2016-01-24 13:06:47 -0800)
>>>>>
>>>>> are available in the git repository at:
>>>>>
>>>>>  g...@github.com:anholt/linux.git tags/bcm2835-dt-next-2016-02-04
>>>>>
>>>>> for you to fetch changes up to 5ec6f2cd8ec4bcd38ba199ea8711a5ec906d85e7:
>>>>>
>>>>>  ARM: bcm2835: Add the Raspberry Pi power domain driver to the DT. (2016-02-02 20:02:45 -0800)
>>>>>
>>>>>
>>>>> This pull request covers mostly DT changes that didn't make it into
>>>>> 4.5 because required header files went through other trees.
>>>>>
>>>>>
>>>>> Alexander Aring (1):
>>>>>  ARM: bcm2835: Add the Raspberry Pi power domain driver to the DT.
>>>>>
>>>>> Lubomir Rintel (1):
>>>>>  ARM: bcm2835: dt: Add Raspberry Pi Model A
>>>>>
>>>>> Martin Sperl (2):
>>>>>  ARM: bcm2835: add the auxiliary spi1 and spi2 to the device tree
>>>>>  ARM: bcm2835: follow dt uart node-naming convention
>>>>
>>>> Do you want me to resend a rebased version of:
>>>>  ARM: bcm2835: add bcm2835-aux-uart support to default DT
>>>>
>>>> The corresponding driver has been added to tty/tty-next.
>>>
>>> It hadn't landed last time I checked.  A rebased version that you've
>>> tested would be great!
>>
>> OK, please submit this in the next week or so at most, so we can get
>> this pull request merged, thanks!
>>
>> Eric, do you have other changes outside of Device Tree?
> 
> We've got bcm2835_defconfig changes that I need to test and tag.

Ok, so that one needs to be a pull request.

> 
> There are also the multi_v7_defconfig updates to enable bcm2835.  Would
> I be pulling those, or someone above me?

This one could too, or it could be an individual patch that the arm-soc
maintainers take directly, either way is fine AFAIR.
--
Florian

[PATCH v6 3/3] arm64: dts: mailbox device tree node for APM X-Gene platform.

2016-02-12 Thread Duc Dang

Mailbox device tree node for APM X-Gene platform.

Signed-off-by: Feng Kan 
Signed-off-by: Duc Dang 
---
Changes since v5:
- None

Changes since v4:
- Rebase over v4.5-rc1
- Change node name to mailbox@1054

Changes since v3:
- Rebase over v4.4

 arch/arm64/boot/dts/apm/apm-storm.dtsi | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/arch/arm64/boot/dts/apm/apm-storm.dtsi 
b/arch/arm64/boot/dts/apm/apm-storm.dtsi
index fe30f76..d91338a 100644
--- a/arch/arm64/boot/dts/apm/apm-storm.dtsi
+++ b/arch/arm64/boot/dts/apm/apm-storm.dtsi
@@ -691,6 +691,20 @@
msi-parent = <>;
};
 
+   mailbox: mailbox@1054 {
+   compatible = "apm,xgene-slimpro-mbox";
+   reg = <0x0 0x1054 0x0 0xa000>;
+   #mbox-cells = <1>;
+   interrupts =<0x0 0x0 0x4>,
+   <0x0 0x1 0x4>,
+   <0x0 0x2 0x4>,
+   <0x0 0x3 0x4>,
+   <0x0 0x4 0x4>,
+   <0x0 0x5 0x4>,
+   <0x0 0x6 0x4>,
+   <0x0 0x7 0x4>;
+   };
+
serial0: serial@1c02 {
status = "disabled";
device_type = "serial";
-- 
1.9.1

[PATCH v6 1/3] mailbox: Add support for APM X-Gene platform mailbox driver

2016-02-12 Thread Duc Dang

X-Gene mailbox controller provides 8 mailbox channels, with
each channel having a dedicated interrupt line.

Signed-off-by: Feng Kan 
Signed-off-by: Duc Dang 
---
Changes since v5:
- Add description for struct slimpro_mbox_chan
and struct slimpro_mbox

Changes since v4:
- Rebase over v4.5-rc1
- Fix section mismatch warning by removing
__init in slimpro_mbox_probe declaration
- Correctly print channel number when
there is no IRQ for that channel

Changes since v3:
- Rebase over v4.4
- Remove 'id' in slimpro_mbox_chan structure
- Remove small functions that are only called once
and fold them into the other callers
- Remove void* pointer type cast
- Relax the number of mailbox IRQs condition
- Use subsys_initcall to guarantee mailbox driver
will be registered before any other dependent driver
is loaded.

 drivers/mailbox/Kconfig |   9 +
 drivers/mailbox/Makefile|   2 +
 drivers/mailbox/mailbox-xgene-slimpro.c | 284 
 3 files changed, 295 insertions(+)
 create mode 100644 drivers/mailbox/mailbox-xgene-slimpro.c

diff --git a/drivers/mailbox/Kconfig b/drivers/mailbox/Kconfig
index 546d05f..678e434 100644
--- a/drivers/mailbox/Kconfig
+++ b/drivers/mailbox/Kconfig
@@ -85,4 +85,13 @@ config MAILBOX_TEST
  Test client to help with testing new Controller driver
  implementations.
 
+config XGENE_SLIMPRO_MBOX
+   tristate "APM SoC X-Gene SLIMpro Mailbox Controller"
+   depends on ARCH_XGENE
+   help
+ An implementation of the APM X-Gene Interprocessor Communication
+ Mailbox (IPCM) between the ARM 64-bit cores and SLIMpro controller.
+ It is used to send short messages between ARM64-bit cores and
+ the SLIMpro Management Engine, primarily for PM. Say Y here if you
+ want to use the APM X-Gene SLIMpro IPCM support.
 endif
diff --git a/drivers/mailbox/Makefile b/drivers/mailbox/Makefile
index 92435ef..b602ef8 100644
--- a/drivers/mailbox/Makefile
+++ b/drivers/mailbox/Makefile
@@ -17,3 +17,5 @@ obj-$(CONFIG_ALTERA_MBOX) += mailbox-altera.o
 obj-$(CONFIG_BCM2835_MBOX) += bcm2835-mailbox.o
 
 obj-$(CONFIG_STI_MBOX) += mailbox-sti.o
+
+obj-$(CONFIG_XGENE_SLIMPRO_MBOX) += mailbox-xgene-slimpro.o
diff --git a/drivers/mailbox/mailbox-xgene-slimpro.c 
b/drivers/mailbox/mailbox-xgene-slimpro.c
new file mode 100644
index 000..b5f5106
--- /dev/null
+++ b/drivers/mailbox/mailbox-xgene-slimpro.c
@@ -0,0 +1,284 @@
+/*
+ * APM X-Gene SLIMpro MailBox Driver
+ *
+ * Copyright (c) 2015, Applied Micro Circuits Corporation
+ * Author: Feng Kan f...@apm.com
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of
+ * the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see .
+ *
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define MBOX_CON_NAME  "slimpro-mbox"
+#define MBOX_REG_SET_OFFSET 0x1000
+#define MBOX_CNT   8
+#define MBOX_STATUS_AVAIL_MASK BIT(16)
+#define MBOX_STATUS_ACK_MASK   BIT(0)
+
+/* Configuration and Status Registers */
+#define REG_DB_IN       0x00
+#define REG_DB_DIN0     0x04
+#define REG_DB_DIN1     0x08
+#define REG_DB_OUT      0x10
+#define REG_DB_DOUT0    0x14
+#define REG_DB_DOUT1    0x18
+#define REG_DB_STAT     0x20
+#define REG_DB_STATMASK 0x24
+
+/**
+ * X-Gene SlimPRO mailbox channel information
+ *
+ * @dev:   Device to which it is attached
+ * @chan:  Pointer to mailbox communication channel
+ * @reg:   Base address to access channel registers
+ * @irq:   Interrupt number of the channel
+ * @rx_msg:    Received message storage
+ */
+struct slimpro_mbox_chan {
+   struct device    *dev;
+   struct mbox_chan *chan;
+   void __iomem     *reg;
+   int              irq;
+   u32              rx_msg[3];
+};
+
+/**
+ * X-Gene SlimPRO Mailbox controller data
+ *
+ * X-Gene SlimPRO Mailbox controller has 8 communication channels.
+ * Each channel has a separate IRQ number assigned to it.
+ *
+ * @mb_ctrl:   Representation of the communication channel controller
+ * @mc:        Array of SlimPRO mailbox channels of the controller
+ * @chans:

[PATCH v6 0/3] mailbox: Add APM X-Gene platform mailbox driver

2016-02-12 Thread Duc Dang

APM X-Gene SoC has a mailbox controller that provides
communication mechanism for X-Gene Arm64 cores to communicate
with X-Gene SoC's Cortex M3 (SLIMpro) processor.

X-Gene mailbox controller provides 8 mailbox channels, each of
which has a dedicated interrupt line.

Changes since v5:
- Add more description into SlimPRO
mailbox data structure

Changes since v4:
- Rebase over v4.5-rc1
- Fix section mismatch warning during compiling
- Correctly print channel number when there is
no IRQ for that channel
- Change node name to mailbox@1054
- Correct the number of IRQs in documentation

Changes since v3:
- Rebase over v4.4
- Remove 'id' in slimpro_mbox_chan structure
- Remove functions that are only called once
and fold them into the other callers
- Remove void* pointer type cast
- Relax the number of mailbox IRQs condition
- Fix error and address comment in documentation
(xgene-slimpro-mailbox.txt)

Changes since v2:
- Rebase Feng's patch set over v4.3-rc5
- Remove unnecessary 'inline' in function definition
- Use module_platform_driver instead of subsys_initcall
- Minor coding style clean up

Changes since v1:
- Add ACPI support
- Use defines for reg offset

Duc Dang (3):
  mailbox: Add support for APM X-Gene platform mailbox driver
  Documentation: mailbox: Add APM X-Gene SLIMpro mailbox dts
documentation
  arm64: dts: mailbox device tree node for APM X-Gene platform.

 .../bindings/mailbox/xgene-slimpro-mailbox.txt |  35 +++
 arch/arm64/boot/dts/apm/apm-storm.dtsi |  14 +
 drivers/mailbox/Kconfig|   9 +
 drivers/mailbox/Makefile   |   2 +
 drivers/mailbox/mailbox-xgene-slimpro.c| 284 +
 5 files changed, 344 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/mailbox/xgene-slimpro-mailbox.txt
 create mode 100644 drivers/mailbox/mailbox-xgene-slimpro.c

-- 
1.9.1

[PATCH v6 2/3] Documentation: mailbox: Add APM X-Gene SLIMpro mailbox dts documentation

2016-02-12 Thread Duc Dang

This adds the APM X-Gene SLIMpro mailbox device tree
node documentation.

Signed-off-by: Feng Kan 
Signed-off-by: Duc Dang 
Acked-by: Rob Herring 
---
Changes since v5:
- None

Changes since v4:
- Rebase over v4.5-rc1
- Fix number of total interrupts in
introduction text
- Change node name to mailbox@1054

Changes since v3:
- Rebase over v4.4
- Change number of mailbox IRQs to 8
- Fix white spaces, typos.

 .../bindings/mailbox/xgene-slimpro-mailbox.txt | 35 ++
 1 file changed, 35 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/mailbox/xgene-slimpro-mailbox.txt

diff --git 
a/Documentation/devicetree/bindings/mailbox/xgene-slimpro-mailbox.txt 
b/Documentation/devicetree/bindings/mailbox/xgene-slimpro-mailbox.txt
new file mode 100644
index 000..e46451b
--- /dev/null
+++ b/Documentation/devicetree/bindings/mailbox/xgene-slimpro-mailbox.txt
@@ -0,0 +1,35 @@
+The APM X-Gene SLIMpro mailbox is used to communicate messages between
+the ARM64 processors and the Cortex M3 (dubbed SLIMpro). It uses a simple
+interrupt based door bell mechanism and can exchange simple messages using the
+internal registers.
+
+There are a total of 8 interrupts in this mailbox, each used for an individual
+door bell (or mailbox channel).
+
+Required properties:
+- compatible:  Should be as "apm,xgene-slimpro-mbox".
+
+- reg: Contains the mailbox register address range.
+
+- interrupts:  8 interrupts, numbered 0 to 7. Interrupt 0 is the
+   interrupt for mailbox channel 0, interrupt 1 is for
+   mailbox channel 1, and likewise for the remainder.
+
+- #mbox-cells: only one to specify the mailbox channel number.
+
+Example:
+
+Mailbox Node:
+   mailbox: mailbox@1054 {
+   compatible = "apm,xgene-slimpro-mbox";
+   reg = <0x0 0x1054 0x0 0xa000>;
+   #mbox-cells = <1>;
+   interrupts =<0x0 0x0 0x4>,
+   <0x0 0x1 0x4>,
+   <0x0 0x2 0x4>,
+   <0x0 0x3 0x4>,
+   <0x0 0x4 0x4>,
+   <0x0 0x5 0x4>,
+   <0x0 0x6 0x4>,
+   <0x0 0x7 0x4>;
+   };
-- 
1.9.1

[PATCH v7 0/8] Patchset enabling hardware based cross-timestamps for next gen Intel platforms

2016-02-12 Thread Christopher S. Hall

Modern Intel hardware adds an Always Running Timer (ART) that allows the
network and audio device clocks to precisely cross timestamp the device
clock with the system clock. This allows a precise correlation of the
device time and system time.

This patchset adds interfaces to the timekeeping code allowing drivers
to translate ART time to system time.

Changelog:

Changes from v6 to v7:

*   Reorder several patches
*   Removed correlated clocksource
*   Fixed 32-bit compile issues
*   Added multiplication overflow detection to history computation
*   Added invariant tsc CPU feature - this is related to ART, but
is a separate feature


Changes from v5 to v6:

*   Pulled supporting code for snapshotting, correlated
clocksource, and cycles to nanoseconds translation to separate
patches. Added patches are marked as NEW below. There is,
however, very little *actually* new code, just reorganized
code
*   Renamed and moved clocksource change sequence to timekeeper
struct (out of tk_read_base)
*   Renamed structs for system counter and synced device time
callback to system_counterval_t and sync_device_time_cb,
respectively
*   Changed PTP cross-timestamp callback name to getcrosststamp
for consistency with the timekeeping code - corresponding
function name changes in e1000e driver
*   Simplified PTP time calculations making use of ktime_to_* code

Changes from v4 to v5:

*   Changes the history mechanism to interpolate system time using
a single historic system time pair (monotonic raw, realtime)
rather than implementing a precise history using shadow
timekeeper (see v4 changes). The advantage of this approach is
that the history can be arbitrarily long. This approach may
also be simpler in terms of coding. The major disadvantage is
that the realtime clock can be adjusted.  When adjusted, the
realtime clock time (when interpolating from history) is
always approximate. In general, the longer the interpolation
period the larger the potential error. There isn't any error
interpolating the monotonic raw clock time.
*   This patchset also addresses objections to the previous
patchset's overly complex correlated timestamp structure. This
patchset splits that structure into several smaller
structures.  The correlated timestamp interface is renamed
cross timestamp to avoid any confusion with the correlated
clocksource.
*   The correlated clocksource is separated from the cross
timestamp mechanism.
*   Add monotonic raw to the PTP user interface
*   Add e1000e driver configuration option that wraps Intel PCH
specific code

Changes v3 to v4: 

*   Adds a history mechanism to accommodate slower devices. In this
case the response time for timestamp reads to the Intel DSP
is too slow to be accommodated by the original correlated time
mechanism. The history mechanism turns shadow timekeeper into
an array where the history is stored.

Christopher S. Hall (8):
  time: Add cycles to nanoseconds translation
  time: Add timekeeping snapshot code capturing system time and counter
  time: Remove duplicated code in ktime_get_raw_and_real()
  time: Add driver cross timestamp interface for higher precision time
synchronization
  time: Add history to cross timestamp interface supporting slower
devices
  x86: tsc: Always Running Timer (ART) correlated clocksource
  ptp: Add PTP_SYS_OFFSET_PRECISE for driver crosstimestamping
  net: e1000e: Adds hardware supported cross timestamp on e1000e nic

 Documentation/ptp/testptp.c |   6 +-
 arch/x86/include/asm/cpufeature.h   |   3 +-
 arch/x86/include/asm/tsc.h  |   2 +
 arch/x86/kernel/cpu/scattered.c |   1 +
 arch/x86/kernel/tsc.c   |  50 +
 drivers/net/ethernet/intel/Kconfig  |   9 +
 drivers/net/ethernet/intel/e1000e/defines.h |   5 +
 drivers/net/ethernet/intel/e1000e/ptp.c |  85 
 drivers/net/ethernet/intel/e1000e/regs.h|   4 +
 drivers/ptp/ptp_chardev.c   |  27 +++
 include/linux/pps_kernel.h  |  17 +-
 include/linux/ptp_clock_kernel.h|   8 +
 include/linux/timekeeper_internal.h |   2 +
 include/linux/timekeeping.h |  58 ++
 include/uapi/linux/ptp_clock.h  |  13 +-
 kernel/time/timekeeping.c   | 289 +---
 16 files changed, 539 insertions(+), 40 deletions(-)

-- 
2.1.4

[PATCH v7 2/8] time: Add timekeeping snapshot code capturing system time and counter

2016-02-12 Thread Christopher S. Hall

In the current timekeeping code there isn't any interface to
atomically capture the current relationship between the system counter
and system time. ktime_get_snapshot() returns this triple (counter,
monotonic raw, realtime) in the system_time_snapshot struct.
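For readers less familiar with the sequence counter pattern this relies on, here is a rough user-space sketch of the same retry idea. Every name in it (demo_timekeeper, demo_update, demo_get_snapshot) is made up for illustration, and it uses C11 atomics instead of the kernel seqcount API, so treat it as an analogy rather than the code in this patch.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for timekeeper state guarded by a sequence counter. */
struct demo_timekeeper {
	atomic_uint seq;          /* even: stable, odd: update in progress */
	_Atomic uint64_t cycles;
	_Atomic int64_t real_ns;
	_Atomic int64_t raw_ns;
};

struct demo_snapshot {
	uint64_t cycles;
	int64_t real_ns;
	int64_t raw_ns;
};

static void demo_init(struct demo_timekeeper *tk)
{
	atomic_init(&tk->seq, 0);
	atomic_init(&tk->cycles, 0);
	atomic_init(&tk->real_ns, 0);
	atomic_init(&tk->raw_ns, 0);
}

/* Writer: make seq odd, publish the new values, make seq even again. */
static void demo_update(struct demo_timekeeper *tk, uint64_t cycles,
			int64_t real_ns, int64_t raw_ns)
{
	atomic_fetch_add(&tk->seq, 1);
	atomic_store(&tk->cycles, cycles);
	atomic_store(&tk->real_ns, real_ns);
	atomic_store(&tk->raw_ns, raw_ns);
	atomic_fetch_add(&tk->seq, 1);
}

/* Reader: retry until all three values were read under one even seq. */
static void demo_get_snapshot(struct demo_timekeeper *tk,
			      struct demo_snapshot *snap)
{
	unsigned int seq;

	do {
		seq = atomic_load(&tk->seq);
		snap->cycles = atomic_load(&tk->cycles);
		snap->real_ns = atomic_load(&tk->real_ns);
		snap->raw_ns = atomic_load(&tk->raw_ns);
	} while ((seq & 1) || atomic_load(&tk->seq) != seq);
}
```

The point of the loop is that the triple is only accepted if no update ran while it was being read, which is what makes the three values mutually consistent.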

Signed-off-by: Christopher S. Hall 
[jstultz: Moved structure definitions around to clean things up]
Signed-off-by: John Stultz 
---
 include/linux/timekeeping.h | 18 ++
 kernel/time/timekeeping.c   | 30 ++
 2 files changed, 48 insertions(+)

diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
index ec89d84..af220e1 100644
--- a/include/linux/timekeeping.h
+++ b/include/linux/timekeeping.h
@@ -267,6 +267,24 @@ extern void ktime_get_raw_and_real_ts64(struct timespec64 
*ts_raw,
struct timespec64 *ts_real);
 
 /*
+ * struct system_time_snapshot - simultaneous raw/real time capture with
+ * counter value
+ * @cycles:Clocksource counter value to produce the system times
+ * @real:  Realtime system time
+ * @raw:   Monotonic raw system time
+ */
+struct system_time_snapshot {
+   cycles_t cycles;
+   ktime_t real;
+   ktime_t raw;
+};
+
+/*
+ * Simultaneously snapshot realtime and monotonic raw clocks
+ */
+extern void ktime_get_snapshot(struct system_time_snapshot *systime_snapshot);
+
+/*
  * Persistent clock related interfaces
  */
 extern int persistent_clock_is_local;
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 4243d28..89b4695 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -874,6 +874,36 @@ time64_t __ktime_get_real_seconds(void)
return tk->xtime_sec;
 }
 
+/**
+ * ktime_get_snapshot - snapshots the realtime/monotonic raw clocks with 
counter
+ * @systime_snapshot:  pointer to struct receiving the system time snapshot
+ */
+void ktime_get_snapshot(struct system_time_snapshot *systime_snapshot)
+{
+   struct timekeeper *tk = &tk_core.timekeeper;
+   unsigned long seq;
+   ktime_t base_raw;
+   ktime_t base_real;
+   s64 nsec_raw;
+   s64 nsec_real;
+   cycle_t now;
+
+   do {
+   seq = read_seqcount_begin(&tk_core.seq);
+
+   now = tk->tkr_mono.read(tk->tkr_mono.clock);
+   base_real = ktime_add(tk->tkr_mono.base,
+ tk_core.timekeeper.offs_real);
+   base_raw = tk->tkr_raw.base;
+   nsec_real = timekeeping_cycles_to_ns(&tk->tkr_mono, now);
+   nsec_raw  = timekeeping_cycles_to_ns(&tk->tkr_raw, now);
+   } while (read_seqcount_retry(&tk_core.seq, seq));
+
+   systime_snapshot->cycles = now;
+   systime_snapshot->real = ktime_add_ns(base_real, nsec_real);
+   systime_snapshot->raw = ktime_add_ns(base_raw, nsec_raw);
+}
+EXPORT_SYMBOL_GPL(ktime_get_snapshot);
 
 #ifdef CONFIG_NTP_PPS
 
-- 
2.1.4

[PATCH v7 4/8] time: Add driver cross timestamp interface for higher precision time synchronization

2016-02-12 Thread Christopher S. Hall

ACKNOWLEDGMENT: cross timestamp code was developed by Thomas Gleixner.
It has changed considerably and any mistakes are mine.

The precision with which events on multiple networked systems can be
synchronized using, as an example, PTP (IEEE 1588, 802.1AS) is limited
by the precision of the cross timestamps between the system clock and
the device (timestamp) clock. Precision here is the degree of
simultaneity when capturing the cross timestamp.

Currently the PTP cross timestamp is captured in software using the
PTP device driver ioctl PTP_SYS_OFFSET. Reads of the device clock are
interleaved with reads of the realtime clock. At best, the precision
of this cross timestamp is on the order of several microseconds due to
software latencies. Sub-microsecond precision is required for
industrial control and some media applications. To achieve this level
of precision hardware supported cross timestamping is needed.

The function get_device_system_crosststamp() allows device drivers
to return a cross timestamp with system time properly scaled to
nanoseconds.  The realtime value is needed to discipline that clock
using PTP and the monotonic raw value is used for applications that
don't require a "real" time, but need an unadjusted clock time.  The
get_device_system_crosststamp() code calls back into the driver to
ensure that the system counter is within the current timekeeping
update interval.
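To make the callback flow a bit more concrete, here is a simplified user-space sketch. All of the demo_* names, the mult/shift scaling and the -1 error value are assumptions chosen for the example; the real interface is the one declared in the diff below. The sketch only shows the shape: the core calls the driver, checks that the returned counter is not older than the start of the current interval, and scales it to nanoseconds.

```c
#include <stdint.h>

struct demo_counterval {
	uint64_t cycles;
};

struct demo_xtstamp {
	int64_t device_ns;
	int64_t sys_ns;
};

/* Hypothetical clock state: ns = base_ns + ((cycles - cycle_last) * mult) >> shift */
struct demo_clock {
	uint32_t mult;
	uint32_t shift;
	uint64_t cycle_last;
	int64_t base_ns;
};

typedef int (*demo_get_time_fn)(int64_t *device_ns,
				struct demo_counterval *cv, void *ctx);

static int demo_get_crosststamp(const struct demo_clock *clk,
				demo_get_time_fn fn, void *ctx,
				struct demo_xtstamp *xt)
{
	struct demo_counterval cv;
	int64_t device_ns;
	int ret = fn(&device_ns, &cv, ctx);

	if (ret)
		return ret;
	/* Reject counter values from before the current update interval. */
	if (cv.cycles < clk->cycle_last)
		return -1;

	xt->device_ns = device_ns;
	xt->sys_ns = clk->base_ns +
		(int64_t)(((cv.cycles - clk->cycle_last) * clk->mult) >> clk->shift);
	return 0;
}

/* Fake "driver": hands back a device time and counter captured together. */
struct demo_hw {
	int64_t device_ns;
	uint64_t counter;
};

static int demo_hw_get_time(int64_t *device_ns, struct demo_counterval *cv,
			    void *ctx)
{
	const struct demo_hw *hw = ctx;

	*device_ns = hw->device_ns;
	cv->cycles = hw->counter;
	return 0;
}
```

The ctx pointer is what lets one core function serve any driver without knowing its private state.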

Modern Intel hardware provides an Always Running Timer (ART) which is
exactly related to TSC through a known frequency ratio. The ART is
routed to devices on the system and is used to precisely and
simultaneously capture the device clock with the ART.

Signed-off-by: Christopher S. Hall 
[jstultz: Reworked to remove extra structures and simplify calling]
Signed-off-by: John Stultz 
---
 include/linux/timekeeping.h | 35 +++
 kernel/time/timekeeping.c   | 58 +
 2 files changed, 93 insertions(+)

diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
index af220e1..75bb836 100644
--- a/include/linux/timekeeping.h
+++ b/include/linux/timekeeping.h
@@ -280,6 +280,41 @@ struct system_time_snapshot {
 };
 
 /*
+ * struct system_device_crosststamp - system/device cross-timestamp
+ * (synchronized capture)
+ * @device:        Device time
+ * @sys_realtime:  Realtime simultaneous with device time
+ * @sys_monoraw:   Monotonic raw simultaneous with device time
+ */
+struct system_device_crosststamp {
+   ktime_t device;
+   ktime_t sys_realtime;
+   ktime_t sys_monoraw;
+};
+
+/*
+ * struct system_counterval_t - system counter value with the pointer to the
+ * corresponding clocksource
+ * @cycles:    System counter value
+ * @cs:        Clocksource corresponding to system counter value. Used by
+ *             timekeeping code to verify comparability of two cycle values
+ */
+struct system_counterval_t {
+   cycle_t cycles;
+   struct clocksource  *cs;
+};
+
+/*
+ * Get cross timestamp between system clock and device clock
+ */
+extern int get_device_system_crosststamp(
+   int (*get_time_fn)(ktime_t *device_time,
+   struct system_counterval_t *system_counterval,
+   void *ctx),
+   void *ctx,
+   struct system_device_crosststamp *xtstamp);
+
+/*
  * Simultaneously snapshot realtime and monotonic raw clocks
  */
 extern void ktime_get_snapshot(struct system_time_snapshot *systime_snapshot);
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index f1a1c97..8c53398 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -908,6 +908,64 @@ void ktime_get_snapshot(struct system_time_snapshot 
*systime_snapshot)
 EXPORT_SYMBOL_GPL(ktime_get_snapshot);
 
 /**
+ * get_device_system_crosststamp - Synchronously capture system/device timestamp
+ * @get_time_fn:   Callback to get simultaneous device time and
+ *                 system counter from the device driver
+ * @ctx:           Context passed to get_time_fn()
+ * @xtstamp:       Receives simultaneously captured system and device time
+ *
+ * Reads a timestamp from a device and correlates it to system time
+ */
+int get_device_system_crosststamp(int (*get_time_fn)
+ (ktime_t *device_time,
+  struct system_counterval_t *sys_counterval,
+  void *ctx),
+ void *ctx,
+ struct system_device_crosststamp *xtstamp)
+{
+   struct timekeeper *tk = &tk_core.timekeeper;
+   unsigned long seq;
+   struct system_counterval_t system_counterval;
+   ktime_t base_raw;
+   ktime_t base_real;
+   s64 nsec_raw;
+   s64 nsec_real;
+   int ret;
+
+   do {
+   seq = read_seqcount_begin(&tk_core.seq);
+   /*
+* Try to synchronously capture device

Re: [PATCH 2/3] tpm: Get rid of chip->pdev

2016-02-12 Thread Jason Gunthorpe

On Fri, Feb 12, 2016 at 08:31:21PM -0500, Stefan Berger wrote:
> > I'll send you something else that might work for vtpm...'
> 
> The vtpm driver will introduce chip->priv, which will point to vtpm_dev. For
> this reason we need to hold a reference to the vtpm_dev->dev in the
> front end.

This should take care of it for all drivers including vtpm.

https://github.com/jgunthorpe/linux/commits/for-jarkko

At the very least this turns silent use after free into a null pointer
oops.

We should also discuss if we want to continue to have the driver
module locked while /dev/tpmX is open, that is no longer needed for
correctness.

From 52c5710ff585e936687e57ca5e267e82e334ebc5 Mon Sep 17 00:00:00 2001
From: Jason Gunthorpe 
Date: Fri, 12 Feb 2016 20:29:53 -0700
Subject: [PATCH 3/3] tpm: Provide strong locking for device removal

Add a read/write semaphore around the ops function pointers so
ops can be set to null when the driver un-registers.

Previously the tpm core expected module locking to be enough to
ensure that tpm_unregister could not be called during certain times,
however that hasn't been sufficient for a long time.

Introduce a read/write semaphore around 'ops' so the core can set
it to null when unregistering. This provides a strong fence around
the driver callbacks, guaranteeing to the driver that no callbacks
are running or will run again.

For now the ops_lock is placed very high in the call stack, it could
be pushed down and made more granular in future if necessary.

Signed-off-by: Jason Gunthorpe 
---
 drivers/char/tpm/tpm-chip.c  | 72 
 drivers/char/tpm/tpm-dev.c   | 10 +-
 drivers/char/tpm/tpm-interface.c | 18 +-
 drivers/char/tpm/tpm-sysfs.c |  5 +++
 drivers/char/tpm/tpm.h   | 14 +---
 5 files changed, 98 insertions(+), 21 deletions(-)

diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c
index df4132e8b982..647fdf327537 100644
--- a/drivers/char/tpm/tpm-chip.c
+++ b/drivers/char/tpm/tpm-chip.c
@@ -36,9 +36,59 @@ static DEFINE_SPINLOCK(driver_lock);
 struct class *tpm_class;
 dev_t tpm_devt;
 
-/*
- * tpm_chip_find_get - return tpm_chip for a given chip number
- * @chip_num the device number for the chip
+/**
+ * tpm_try_get_ops() - Get a ref to the tpm_chip
+ * @chip: Chip to ref
+ *
+ * The caller must already have some kind of locking to ensure that chip is
+ * valid. This function will lock the chip so that the ops member can be
+ * accessed safely. The locking prevents tpm_chip_unregister from
+ * completing, so it should not be held for long periods.
+ *
+ * Returns -ERRNO if the chip could not be got.
+ */
+int tpm_try_get_ops(struct tpm_chip *chip)
+{
+   int rc = -EIO;
+
+   get_device(&chip->dev);
+
+   down_read(&chip->ops_sem);
+   if (!chip->ops)
+   goto out_lock;
+
+   if (!try_module_get(chip->dev.parent->driver->owner))
+   goto out_lock;
+
+   return 0;
+out_lock:
+   up_read(&chip->ops_sem);
+   put_device(&chip->dev);
+   return rc;
+}
+EXPORT_SYMBOL_GPL(tpm_try_get_ops);
+
+/**
+ * tpm_put_ops() - Release a ref to the tpm_chip
+ * @chip: Chip to put
+ *
+ * This is the opposite pair to tpm_try_get_ops(). After this returns chip may
+ * be kfree'd.
+ */
+void tpm_put_ops(struct tpm_chip *chip)
+{
+   module_put(chip->dev.parent->driver->owner);
+   up_read(&chip->ops_sem);
+   put_device(&chip->dev);
+}
+EXPORT_SYMBOL_GPL(tpm_put_ops);
+
+/**
+ * tpm_chip_find_get() - return tpm_chip for a given chip number
+ * @chip_num: id to find
+ *
+ * The return'd chip has been tpm_try_get_ops'd and must be released via
+ * tpm_put_ops
  */
 struct tpm_chip *tpm_chip_find_get(int chip_num)
 {
@@ -49,10 +99,10 @@ struct tpm_chip *tpm_chip_find_get(int chip_num)
if (chip_num != TPM_ANY_NUM && chip_num != pos->dev_num)
continue;
 
-   if (try_module_get(pos->dev.parent->driver->owner)) {
+   /* rcu prevents chip from being free'd */
+   if (!tpm_try_get_ops(pos))
chip = pos;
-   break;
-   }
+   break;
}
rcu_read_unlock();
return chip;
@@ -95,6 +145,7 @@ struct tpm_chip *tpmm_chip_alloc(struct device *dev,
return ERR_PTR(-ENOMEM);
 
mutex_init(&chip->tpm_mutex);
+   init_rwsem(&chip->ops_sem);
INIT_LIST_HEAD(&chip->list);
 
chip->ops = ops;
@@ -174,6 +225,12 @@ static int tpm_dev_add_device(struct tpm_chip *chip)
 static void tpm_dev_del_device(struct tpm_chip *chip)
 {
cdev_del(&chip->cdev);
+
+   /* Make the driver uncallable. */
+   down_write(&chip->ops_sem);
+   chip->ops = NULL;
+   up_write(&chip->ops_sem);
+
device_unregister(&chip->dev);
 }
 
@@ -259,6 +316,9 @@ EXPORT_SYMBOL_GPL(tpm_chip_register);
  * Takes the chip first away from the list of available TPM chips and then
  * cleans up all the resources reserved by tpm_chip_register().
  *
+ * Once

[PATCH v7 6/8] x86: tsc: Always Running Timer (ART) correlated clocksource

2016-02-12 Thread Christopher S. Hall

On modern Intel systems TSC is derived from the new Always Running Timer
(ART). ART can be captured simultaneously with the capture of
audio and network device clocks, allowing a correlation between timebases
to be constructed. Upon capture, the driver converts the captured ART
value to the appropriate system clock using the correlated clocksource
mechanism.

On systems that support ART a new CPUID leaf (0x15) returns parameters
“m” and “n” such that:

TSC_value = (ART_value * m) / n + k [n >= 2]

[k is an offset that can be adjusted by a privileged agent. The
IA32_TSC_ADJUST MSR is an example of an interface to adjust k.
See 17.14.4 of the Intel SDM for more details]
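As an illustration of the formula above, the conversion can be done without overflowing 64 bits by splitting the ART value into a quotient and remainder with respect to n. This is a user-space sketch only; the name art_to_tsc and passing k in as a parameter are assumptions made for the example, not the interface this patch adds.

```c
#include <stdint.h>

/*
 * Hypothetical helper: TSC = (ART * m) / n + k, computed as
 * (ART / n) * m + ((ART % n) * m) / n + k so that the full product
 * ART * m never has to fit in 64 bits.
 */
static uint64_t art_to_tsc(uint64_t art, uint32_t m, uint32_t n, uint64_t k)
{
	uint64_t quot = art / n;  /* whole multiples of n */
	uint64_t rem = art % n;   /* remainder, always below n < 2^32 */

	/* rem * m cannot overflow: both factors are below 2^32. */
	return quot * m + (rem * m) / n + k;
}
```

Because ART = quot * n + rem, the split gives exactly the same result as the truncated division of the full product. For example, with m = 3, n = 2 and k = 5, an ART value of 7 maps to 3 * 3 + (1 * 3) / 2 + 5 = 15.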

Signed-off-by: Christopher S. Hall 
[jstultz: Tweaked to fix build issue, also reworked math for
64bit division on 32bit systems]
Signed-off-by: John Stultz 
---
 arch/x86/include/asm/cpufeature.h |  3 ++-
 arch/x86/include/asm/tsc.h|  2 ++
 arch/x86/kernel/cpu/scattered.c   |  1 +
 arch/x86/kernel/tsc.c | 50 +++
 4 files changed, 55 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeature.h 
b/arch/x86/include/asm/cpufeature.h
index 7ad8c94..111b892 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -85,7 +85,7 @@
 #define X86_FEATURE_P4 ( 3*32+ 7) /* "" P4 */
 #define X86_FEATURE_CONSTANT_TSC ( 3*32+ 8) /* TSC ticks at a constant rate */
 #define X86_FEATURE_UP ( 3*32+ 9) /* smp kernel running on up */
-/* free, was #define X86_FEATURE_FXSAVE_LEAK ( 3*32+10) * "" FXSAVE leaks FOP/FIP/FOP */
+#define X86_FEATURE_ART (3*32+10) /* Platform has always running timer (ART) */
 #define X86_FEATURE_ARCH_PERFMON ( 3*32+11) /* Intel Architectural PerfMon */
 #define X86_FEATURE_PEBS   ( 3*32+12) /* Precise-Event Based Sampling */
 #define X86_FEATURE_BTS ( 3*32+13) /* Branch Trace Store */
@@ -188,6 +188,7 @@
 
 #define X86_FEATURE_CPB ( 7*32+ 2) /* AMD Core Performance Boost */
 #define X86_FEATURE_EPB ( 7*32+ 3) /* IA32_ENERGY_PERF_BIAS support */
+#define X86_FEATURE_INVARIANT_TSC (7*32+4) /* Intel Invariant TSC */
 
 #define X86_FEATURE_HW_PSTATE  ( 7*32+ 8) /* AMD HW-PState */
 #define X86_FEATURE_PROC_FEEDBACK ( 7*32+ 9) /* AMD ProcFeedbackInterface */
diff --git a/arch/x86/include/asm/tsc.h b/arch/x86/include/asm/tsc.h
index 6d7c547..174c421 100644
--- a/arch/x86/include/asm/tsc.h
+++ b/arch/x86/include/asm/tsc.h
@@ -29,6 +29,8 @@ static inline cycles_t get_cycles(void)
return rdtsc();
 }
 
+extern struct system_counterval_t convert_art_to_tsc(cycle_t art);
+
 extern void tsc_init(void);
 extern void mark_tsc_unstable(char *reason);
 extern int unsynchronized_tsc(void);
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index 8cb57df..af0ecd7 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -35,6 +35,7 @@ void init_scattered_cpuid_features(struct cpuinfo_x86 *c)
{ X86_FEATURE_APERFMPERF,   CR_ECX, 0, 0x0006, 0 },
{ X86_FEATURE_EPB,  CR_ECX, 3, 0x0006, 0 },
{ X86_FEATURE_HW_PSTATE,CR_EDX, 7, 0x8007, 0 },
+   { X86_FEATURE_INVARIANT_TSC,CR_EDX, 8, 0x8007, 0 },
{ X86_FEATURE_CPB,  CR_EDX, 9, 0x8007, 0 },
{ X86_FEATURE_PROC_FEEDBACK,CR_EDX,11, 0x8007, 0 },
{ 0, 0, 0, 0, 0 }
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 3d743da..0ee3b62 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -43,6 +43,10 @@ static DEFINE_STATIC_KEY_FALSE(__use_tsc);
 
 int tsc_clocksource_reliable;
 
+static u32 art_to_tsc_numerator;
+static u32 art_to_tsc_denominator;
+struct clocksource *art_related_clocksource;
+
 /*
  * Use a ring-buffer like data structure, where a writer advances the head by
  * writing a new data entry and a reader advances the tail when it observes a
@@ -949,10 +953,35 @@ static struct notifier_block time_cpufreq_notifier_block 
= {
.notifier_call  = time_cpufreq_notifier
 };
 
+#define ART_CPUID_LEAF (0x15)
+/* The denominator will never be less than 2 */
+#define ART_MIN_DENOMINATOR (2)
+
+
+/*
+ * If ART is present detect the numerator:denominator to convert to TSC
+ */
+static void detect_art(void)
+{
+   unsigned int unused[2];
+
+   if (boot_cpu_data.cpuid_level >= ART_CPUID_LEAF) {
+   cpuid(ART_CPUID_LEAF, &art_to_tsc_denominator,
+ &art_to_tsc_numerator, unused, unused+1);
+
+   if (boot_cpu_has(X86_FEATURE_INVARIANT_TSC) &&
+   art_to_tsc_denominator >= ART_MIN_DENOMINATOR)
+   set_cpu_cap(&boot_cpu_data, X86_FEATURE_ART);
+   }
+}
+
 static int __init cpufreq_tsc(void)
 {
if (!cpu_has_tsc)
return 0;
+
+   detect_art();
+
if

[PATCH v7 5/8] time: Add history to cross timestamp interface supporting slower devices

2016-02-12 Thread Christopher S. Hall

Another representative use case of time sync and the correlated
clocksource (in addition to PTP noted above) is PTP synchronized
audio.

In a streaming application, as an example, samples will be sent and/or
received by multiple devices with a presentation time that is in terms
of the PTP master clock. Synchronizing the audio output on these
devices requires correlating the audio clock with the PTP master
clock. The more precise this correlation is, the better the audio
quality (i.e. out of sync audio sounds bad).

From an application standpoint, to correlate the PTP master clock with
the audio device clock, the system clock is used as an intermediate
timebase. The transforms such an application would perform are:

System Clock <-> Audio clock
System Clock <-> Network Device Clock [<-> PTP Master Clock]

Modern Intel platforms can perform a more accurate cross timestamp in
hardware (ART, audio device clock).  The audio driver requires
ART->system time transforms -- the same as required for the network
driver. These platforms offload audio processing (including
cross-timestamps) to a DSP which, to ensure uninterrupted audio
processing, communicates with and responds to the host only once every
millisecond. As a result, it takes up to a millisecond for the DSP to
receive a request; the request is then processed by the DSP, the audio
output hardware is polled for completion, the result is copied into
shared memory, and the host is notified. All of these operations occur
on a millisecond cadence.  This transaction requires about 2 ms, but
under heavier workloads it may take up to 4 ms.

Adding a history allows these slow devices the option of providing an
ART value outside of the current interval. In this case, the callback
provided is an accessor function for the previously obtained counter
value. If get_device_system_crosststamp() receives a counter value
previous to cycle_last, it consults the history provided as an
argument in history_ref and interpolates the realtime and monotonic
raw system time using the provided counter value. If there are any
clock discontinuities, e.g. from calling settimeofday(), the monotonic
raw time is interpolated in the usual way, but the realtime clock time
is adjusted by scaling the monotonic raw adjustment.

When an accessor function is used a history argument *must* be
provided. The history is initialized using ktime_get_snapshot(), which
must be called before the counter values are read.
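The interpolation idea can be sketched in user space as plain linear interpolation between the history snapshot and the current reading. This is not the code in this patch: the helper name interpolate_ns, its arguments and the -1 error value are assumptions for illustration, and the overflow check uses the GCC/Clang builtin __builtin_mul_overflow.

```c
#include <stdint.h>

/*
 * Hypothetical helper: estimate the system time (ns) at counter value
 * `cycles`, which lies between a history snapshot and the current reading.
 * Returns 0 on success, -1 if the inputs are out of order or the
 * intermediate product would overflow 64 bits.
 */
static int interpolate_ns(uint64_t hist_cycles, int64_t hist_ns,
			  uint64_t now_cycles, int64_t now_ns,
			  uint64_t cycles, int64_t *result_ns)
{
	uint64_t total, partial, span_ns, scaled;

	if (hist_ns < 0 || now_ns < hist_ns || now_cycles <= hist_cycles)
		return -1;
	if (cycles < hist_cycles || cycles > now_cycles)
		return -1;

	total = now_cycles - hist_cycles;     /* cycles covered by the history */
	partial = cycles - hist_cycles;       /* cycles up to the requested point */
	span_ns = (uint64_t)(now_ns - hist_ns);

	if (__builtin_mul_overflow(span_ns, partial, &scaled))
		return -1;

	*result_ns = hist_ns + (int64_t)(scaled / total);
	return 0;
}
```

With a snapshot at (1000 cycles, 5000 ns) and a current reading of (2000 cycles, 7000 ns), a counter value of 1500 lands halfway, at 6000 ns. As the cover letter notes, this is exact only for a clock that was not adjusted inside the interval.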

Signed-off-by: Christopher S. Hall 
Signed-off-by: John Stultz 
---
 include/linux/timekeeper_internal.h |   2 +
 include/linux/timekeeping.h |   5 ++
 kernel/time/timekeeping.c   | 172 +++-
 3 files changed, 178 insertions(+), 1 deletion(-)

diff --git a/include/linux/timekeeper_internal.h 
b/include/linux/timekeeper_internal.h
index 2524722..e880054 100644
--- a/include/linux/timekeeper_internal.h
+++ b/include/linux/timekeeper_internal.h
@@ -50,6 +50,7 @@ struct tk_read_base {
  * @offs_tai:  Offset clock monotonic -> clock tai
  * @tai_offset:         The current UTC to TAI offset in seconds
  * @clock_was_set_seq:  The sequence number of clock was set events
+ * @cs_was_changed_seq: The sequence number of clocksource change events
  * @next_leap_ktime:   CLOCK_MONOTONIC time value of a pending leap-second
  * @raw_time:  Monotonic raw base time in timespec64 format
  * @cycle_interval:Number of clock cycles in one NTP interval
@@ -91,6 +92,7 @@ struct timekeeper {
ktime_t offs_tai;
s32 tai_offset;
unsigned int clock_was_set_seq;
+   u8  cs_was_changed_seq;
ktime_t next_leap_ktime;
struct timespec64   raw_time;
 
diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
index 75bb836..8b90d06 100644
--- a/include/linux/timekeeping.h
+++ b/include/linux/timekeeping.h
@@ -272,11 +272,15 @@ extern void ktime_get_raw_and_real_ts64(struct timespec64 
*ts_raw,
  * @cycles:Clocksource counter value to produce the system times
  * @real:  Realtime system time
  * @raw:   Monotonic raw system time
+ * @clock_was_set_seq:  The sequence number of clock was set events
+ * @cs_was_changed_seq: The sequence number of clocksource change events
  */
 struct system_time_snapshot {
cycles_t cycles;
ktime_t real;
ktime_t raw;
+   unsigned int clock_was_set_seq;
+   u8 cs_was_changed_seq;
 };
 
 /*
@@ -312,6 +316,7 @@ extern int get_device_system_crosststamp(
struct system_counterval_t *system_counterval,
void *ctx),
void *ctx,
+   struct system_time_snapshot *history,
struct system_device_crosststamp *xtstamp);
 
 /*
diff --git a/kernel/time/timekeeping.c

[PATCH v7 7/8] ptp: Add PTP_SYS_OFFSET_PRECISE for driver crosstimestamping

2016-02-12 Thread Christopher S. Hall

Currently, network/system cross-timestamping is performed in the
PTP_SYS_OFFSET ioctl. The PTP clock driver reads gettimeofday() and
the gettime64() callback provided by the driver. The cross-timestamp
is best effort: the latency between the capture of system time
(getnstimeofday()) and the device time (driver callback) may be
significant.

The getcrosststamp() callback and corresponding PTP_SYS_OFFSET_PRECISE
ioctl allow the driver to perform this device/system correlation when,
for example, cross-timestamp hardware is available. Modern Intel
systems can do this for onboard Ethernet controllers using the ART
counter. There is virtually zero latency between captures of the ART
and network device clock.

The capabilities ioctl (PTP_CLOCK_GETCAPS) is augmented to let
applications query whether or not drivers implement the
getcrosststamp callback, which provides more precise cross timestamping.
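
As a rough illustration of what an application might do with one precise
sample, the sketch below computes the device-minus-system offset from the
timestamps of a single capture. The struct and function names here
(ts_pair, precise_sample, device_offset_ns) are hypothetical stand-ins
written for this example; they only mirror the sec/nsec layout used by the
ioctl and are not the uapi definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of a sec/nsec timestamp pair (not the uapi type). */
struct ts_pair {
	int64_t  sec;
	uint32_t nsec;
};

/* Hypothetical mirror of the three timestamps taken in one capture. */
struct precise_sample {
	struct ts_pair device;        /* PTP hardware clock */
	struct ts_pair sys_realtime;  /* system realtime clock */
	struct ts_pair sys_monoraw;   /* system monotonic raw clock */
};

/* Flatten a sec/nsec pair into a signed nanosecond count. */
static int64_t ts_to_ns(struct ts_pair t)
{
	assert(t.nsec < 1000000000u);
	return t.sec * 1000000000LL + (int64_t)t.nsec;
}

/* Device clock minus system realtime, in nanoseconds. */
static int64_t device_offset_ns(const struct precise_sample *s)
{
	return ts_to_ns(s->device) - ts_to_ns(s->sys_realtime);
}
```

Because all timestamps in a sample describe the same instant, the
difference needs no latency estimate, unlike the interleaved reads of
PTP_SYS_OFFSET.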

Acked-by: Richard Cochran 
Signed-off-by: Christopher S. Hall 
[jstultz: Commit subject tweaks]
Signed-off-by: John Stultz 
---
 Documentation/ptp/testptp.c  |  6 --
 drivers/ptp/ptp_chardev.c| 27 +++
 include/linux/ptp_clock_kernel.h |  8 
 include/uapi/linux/ptp_clock.h   | 13 -
 4 files changed, 51 insertions(+), 3 deletions(-)

diff --git a/Documentation/ptp/testptp.c b/Documentation/ptp/testptp.c
index 6c6247a..d99012f 100644
--- a/Documentation/ptp/testptp.c
+++ b/Documentation/ptp/testptp.c
@@ -277,13 +277,15 @@ int main(int argc, char *argv[])
   "  %d external time stamp channels\n"
   "  %d programmable periodic signals\n"
   "  %d pulse per second\n"
-  "  %d programmable pins\n",
+  "  %d programmable pins\n"
+  "  %d cross timestamping\n",
   caps.max_adj,
   caps.n_alarm,
   caps.n_ext_ts,
   caps.n_per_out,
   caps.pps,
-  caps.n_pins);
+  caps.n_pins,
+  caps.cross_timestamping);
}
}
 
diff --git a/drivers/ptp/ptp_chardev.c b/drivers/ptp/ptp_chardev.c
index da7bae9..579fd65 100644
--- a/drivers/ptp/ptp_chardev.c
+++ b/drivers/ptp/ptp_chardev.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "ptp_private.h"
 
@@ -120,11 +121,13 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, 
unsigned long arg)
struct ptp_clock_caps caps;
struct ptp_clock_request req;
struct ptp_sys_offset *sysoff = NULL;
+   struct ptp_sys_offset_precise precise_offset;
struct ptp_pin_desc pd;
struct ptp_clock *ptp = container_of(pc, struct ptp_clock, clock);
struct ptp_clock_info *ops = ptp->info;
struct ptp_clock_time *pct;
struct timespec64 ts;
+   struct system_device_crosststamp xtstamp;
int enable, err = 0;
unsigned int i, pin_index;
 
@@ -138,6 +141,7 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, 
unsigned long arg)
caps.n_per_out = ptp->info->n_per_out;
caps.pps = ptp->info->pps;
caps.n_pins = ptp->info->n_pins;
+   caps.cross_timestamping = ptp->info->getcrosststamp != NULL;
if (copy_to_user((void __user *)arg, &caps, sizeof(caps)))
err = -EFAULT;
break;
@@ -180,6 +184,29 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, 
unsigned long arg)
err = ops->enable(ops, &req, enable);
break;
 
+   case PTP_SYS_OFFSET_PRECISE:
+   if (!ptp->info->getcrosststamp) {
+   err = -EOPNOTSUPP;
+   break;
+   }
+   err = ptp->info->getcrosststamp(ptp->info, &xtstamp);
+   if (err)
+   break;
+
+   ts = ktime_to_timespec64(xtstamp.device);
+   precise_offset.device.sec = ts.tv_sec;
+   precise_offset.device.nsec = ts.tv_nsec;
+   ts = ktime_to_timespec64(xtstamp.sys_realtime);
+   precise_offset.sys_realtime.sec = ts.tv_sec;
+   precise_offset.sys_realtime.nsec = ts.tv_nsec;
+   ts = ktime_to_timespec64(xtstamp.sys_monoraw);
+   precise_offset.sys_monoraw.sec = ts.tv_sec;
+   precise_offset.sys_monoraw.nsec = ts.tv_nsec;
+   if (copy_to_user((void __user *)arg, &precise_offset,
+sizeof(precise_offset)))
+   err = -EFAULT;
+   break;
+
case PTP_SYS_OFFSET:
sysoff = kmalloc(sizeof(*sysoff), GFP_KERNEL);
if (!sysoff) {
diff --git a/include/linux/ptp_clock_kernel.h

[PATCH v7 8/8] net: e1000e: Adds hardware supported cross timestamp on e1000e nic

2016-02-12 Thread Christopher S. Hall

Modern Intel systems support cross timestamping of the network device
clock and Always Running Timer (ART) in hardware.  This allows the
device time and system time to be precisely correlated. The timestamp
pair is returned through e1000e_phc_get_syncdevicetime() used by
get_system_device_crosststamp().  The hardware cross-timestamp result
is made available to applications through the PTP_SYS_OFFSET_PRECISE
ioctl which calls e1000e_phc_getcrosststamp().

Signed-off-by: Christopher S. Hall 
[jstultz: Reworked to use new interface, commit message tweaks]
Signed-off-by: John Stultz 
---
 drivers/net/ethernet/intel/Kconfig  |  9 +++
 drivers/net/ethernet/intel/e1000e/defines.h |  5 ++
 drivers/net/ethernet/intel/e1000e/ptp.c | 85 +
 drivers/net/ethernet/intel/e1000e/regs.h|  4 ++
 4 files changed, 103 insertions(+)

diff --git a/drivers/net/ethernet/intel/Kconfig 
b/drivers/net/ethernet/intel/Kconfig
index fa593dd..3772f3a 100644
--- a/drivers/net/ethernet/intel/Kconfig
+++ b/drivers/net/ethernet/intel/Kconfig
@@ -83,6 +83,15 @@ config E1000E
  To compile this driver as a module, choose M here. The module
  will be called e1000e.
 
+config E1000E_HWTS
+   bool "Support HW cross-timestamp on PCH devices"
+   default y
+   depends on E1000E && X86
+   ---help---
+Say Y to enable hardware supported cross-timestamping on PCH
+devices. The cross-timestamp is available through the PTP clock
+driver precise cross-timestamp ioctl (PTP_SYS_OFFSET_PRECISE).
+
 config IGB
tristate "Intel(R) 82575/82576 PCI-Express Gigabit Ethernet support"
depends on PCI
diff --git a/drivers/net/ethernet/intel/e1000e/defines.h 
b/drivers/net/ethernet/intel/e1000e/defines.h
index f7c7804..0641c00 100644
--- a/drivers/net/ethernet/intel/e1000e/defines.h
+++ b/drivers/net/ethernet/intel/e1000e/defines.h
@@ -528,6 +528,11 @@
 #define E1000_RXCW_C  0x2000/* Receive config */
 #define E1000_RXCW_SYNCH  0x4000/* Receive config synch */
 
+/* HH Time Sync */
+#define E1000_TSYNCTXCTL_MAX_ALLOWED_DLY_MASK  0xF000 /* max delay */
+#define E1000_TSYNCTXCTL_SYNC_COMP 0x4000 /* sync complete */
+#define E1000_TSYNCTXCTL_START_SYNC0x8000 /* initiate sync */
+
 #define E1000_TSYNCTXCTL_VALID 0x0001 /* Tx timestamp valid */
 #define E1000_TSYNCTXCTL_ENABLED   0x0010 /* enable Tx timestamping */
 
diff --git a/drivers/net/ethernet/intel/e1000e/ptp.c 
b/drivers/net/ethernet/intel/e1000e/ptp.c
index 25a0ad5..e2ff3ef 100644
--- a/drivers/net/ethernet/intel/e1000e/ptp.c
+++ b/drivers/net/ethernet/intel/e1000e/ptp.c
@@ -26,6 +26,12 @@
 
 #include "e1000.h"
 
+#ifdef CONFIG_E1000E_HWTS
+#include 
+#include 
+#include 
+#endif
+
 /**
  * e1000e_phc_adjfreq - adjust the frequency of the hardware clock
  * @ptp: ptp clock structure
@@ -98,6 +104,78 @@ static int e1000e_phc_adjtime(struct ptp_clock_info *ptp, 
s64 delta)
return 0;
 }
 
+#ifdef CONFIG_E1000E_HWTS
+#define MAX_HW_WAIT_COUNT (3)
+
+/**
+ * e1000e_phc_get_syncdevicetime - Callback given to timekeeping code reads 
system/device registers
+ * @device: current device time
+ * @system: system counter value read synchronously with device time
+ * @ctx: context provided by timekeeping code
+ *
+ * Read device and system (ART) clock simultaneously and return the corrected
+ * clock values in ns.
+ **/
+static int e1000e_phc_get_syncdevicetime(ktime_t *device,
+struct system_counterval_t *system,
+void *ctx)
+{
+   struct e1000_adapter *adapter = (struct e1000_adapter *)ctx;
+   struct e1000_hw *hw = &adapter->hw;
+   unsigned long flags;
+   int i;
+   u32 tsync_ctrl;
+   cycle_t dev_cycles;
+   cycle_t sys_cycles;
+
+   tsync_ctrl = er32(TSYNCTXCTL);
+   tsync_ctrl |= E1000_TSYNCTXCTL_START_SYNC |
+   E1000_TSYNCTXCTL_MAX_ALLOWED_DLY_MASK;
+   ew32(TSYNCTXCTL, tsync_ctrl);
+   for (i = 0; i < MAX_HW_WAIT_COUNT; ++i) {
+   udelay(1);
+   tsync_ctrl = er32(TSYNCTXCTL);
+   if (tsync_ctrl & E1000_TSYNCTXCTL_SYNC_COMP)
+   break;
+   }
+
+   if (i == MAX_HW_WAIT_COUNT)
+   return -ETIMEDOUT;
+
+   dev_cycles = er32(SYSSTMPH);
+   dev_cycles <<= 32;
+   dev_cycles |= er32(SYSSTMPL);
+   spin_lock_irqsave(&adapter->systim_lock, flags);
+   *device = ns_to_ktime(timecounter_cyc2time(&adapter->tc, dev_cycles));
+   spin_unlock_irqrestore(&adapter->systim_lock, flags);
+
+   sys_cycles = er32(PLTSTMPH);
+   sys_cycles <<= 32;
+   sys_cycles |= er32(PLTSTMPL);
+   *system = convert_art_to_tsc(sys_cycles);
+
+   return 0;
+}
+
+/**
+ * e1000e_phc_getsynctime - Reads the current system/device cross timestamp
+ * @ptp: ptp clock structure
+ * @cts: structure containing timestamp
+ *
+ *

[PATCH v7 1/8] time: Add cycles to nanoseconds translation

2016-02-12 Thread Christopher S. Hall

The timekeeping code does not currently provide a way to translate
externally provided clocksource cycles to system time. The cycle count
is always provided by the result of the clocksource read() method
internal to the timekeeping code. The added function
timekeeping_cycles_to_ns() calculates a nanosecond value from a cycle
count that can be added to the tk_read_base.base value, yielding the
current system time. This allows clocksource cycle values external to
the timekeeping code to provide a cycle count that can be transformed
to system time.
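
The arithmetic described above can be sketched in user space as follows.
This is a minimal illustration, not the kernel code: struct read_base is
a hypothetical stand-in for struct tk_read_base, the arch time offset is
left out, and the delta helper is a simplified version of what
clocksource_delta() is assumed to do.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct tk_read_base. */
struct read_base {
	uint64_t cycle_last;  /* counter value at the last update */
	uint64_t mask;        /* bitmask of valid counter bits */
	uint32_t mult;        /* cycles to shifted-ns multiplier */
	uint32_t shift;       /* right shift applied after multiplying */
	uint64_t xtime_nsec;  /* accumulated shifted nanoseconds */
};

/* Cycles elapsed since 'last'; masking keeps this correct across a wrap. */
static uint64_t cycles_delta(uint64_t now, uint64_t last, uint64_t mask)
{
	return (now - last) & mask;
}

/* Translate an externally supplied cycle count to ns since the last update. */
static int64_t cycles_to_ns(const struct read_base *b, uint64_t cycles)
{
	uint64_t delta = cycles_delta(cycles, b->cycle_last, b->mask);
	uint64_t nsec;

	assert(b->shift < 64);
	nsec = delta * b->mult + b->xtime_nsec;
	return (int64_t)(nsec >> b->shift);
}
```

With mult = 256 and shift = 8, one cycle corresponds to one nanosecond,
so a counter that advanced 500 cycles since the last update translates to
500 ns plus whatever whole nanoseconds xtime_nsec contributes.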

Signed-off-by: Christopher S. Hall 
Signed-off-by: John Stultz 
---
 kernel/time/timekeeping.c | 25 +
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 34b4ced..4243d28 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -298,17 +298,34 @@ u32 (*arch_gettimeoffset)(void) = 
default_arch_gettimeoffset;
 static inline u32 arch_gettimeoffset(void) { return 0; }
 #endif
 
+static inline s64 timekeeping_delta_to_ns(struct tk_read_base *tkr,
+ cycle_t delta)
+{
+   s64 nsec;
+
+   nsec = delta * tkr->mult + tkr->xtime_nsec;
+   nsec >>= tkr->shift;
+
+   /* If arch requires, add in get_arch_timeoffset() */
+   return nsec + arch_gettimeoffset();
+}
+
 static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
 {
cycle_t delta;
-   s64 nsec;
 
delta = timekeeping_get_delta(tkr);
+   return timekeeping_delta_to_ns(tkr, delta);
+}
 
-   nsec = (delta * tkr->mult + tkr->xtime_nsec) >> tkr->shift;
+static inline s64 timekeeping_cycles_to_ns(struct tk_read_base *tkr,
+   cycle_t cycles)
+{
+   cycle_t delta;
 
-   /* If arch requires, add in get_arch_timeoffset() */
-   return nsec + arch_gettimeoffset();
+   /* calculate the delta since the last update_wall_time */
+   delta = clocksource_delta(cycles, tkr->cycle_last, tkr->mask);
+   return timekeeping_delta_to_ns(tkr, delta);
 }
 
 /**
-- 
2.1.4

[PATCH v7 3/8] time: Remove duplicated code in ktime_get_raw_and_real()

2016-02-12 Thread Christopher S. Hall

The code in ktime_get_snapshot() is a superset of the code in
ktime_get_raw_and_real(). Further, ktime_get_raw_and_real() is
called only by the PPS code, pps_get_ts(). Consolidate the
pps_get_ts() code into a single function calling ktime_get_snapshot()
and eliminate ktime_get_raw_and_real(). A side effect of this is that
the raw and real results of pps_get_ts() correspond to exactly the
same clock cycle. Previously these values represented separate reads
of the system clock.

Signed-off-by: Christopher S. Hall 
Signed-off-by: John Stultz 
---
 include/linux/pps_kernel.h | 17 ++---
 kernel/time/timekeeping.c  | 40 ++--
 2 files changed, 8 insertions(+), 49 deletions(-)

diff --git a/include/linux/pps_kernel.h b/include/linux/pps_kernel.h
index 54bf148..35ac903 100644
--- a/include/linux/pps_kernel.h
+++ b/include/linux/pps_kernel.h
@@ -111,22 +111,17 @@ static inline void timespec_to_pps_ktime(struct pps_ktime 
*kt,
kt->nsec = ts.tv_nsec;
 }
 
-#ifdef CONFIG_NTP_PPS
-
 static inline void pps_get_ts(struct pps_event_time *ts)
 {
-   ktime_get_raw_and_real_ts64(&ts->ts_raw, &ts->ts_real);
-}
+   struct system_time_snapshot snap;
 
-#else /* CONFIG_NTP_PPS */
-
-static inline void pps_get_ts(struct pps_event_time *ts)
-{
-   ktime_get_real_ts64(&ts->ts_real);
+   ktime_get_snapshot(&snap);
+   ts->ts_real = ktime_to_timespec64(snap.real);
+#ifdef CONFIG_NTP_PPS
+   ts->ts_raw = ktime_to_timespec64(snap.raw);
+#endif
 }
 
-#endif /* CONFIG_NTP_PPS */
-
 /* Subtract known time delay from PPS event time(s) */
 static inline void pps_sub_ts(struct pps_event_time *ts, struct timespec64 
delta)
 {
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 89b4695..f1a1c97 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -888,6 +888,8 @@ void ktime_get_snapshot(struct system_time_snapshot 
*systime_snapshot)
s64 nsec_real;
cycle_t now;
 
+   WARN_ON(timekeeping_suspended);
+
do {
seq = read_seqcount_begin(&tk_core.seq);
 
@@ -905,44 +907,6 @@ void ktime_get_snapshot(struct system_time_snapshot 
*systime_snapshot)
 }
 EXPORT_SYMBOL_GPL(ktime_get_snapshot);
 
-#ifdef CONFIG_NTP_PPS
-
-/**
- * ktime_get_raw_and_real_ts64 - get day and raw monotonic time in timespec 
format
- * @ts_raw:pointer to the timespec to be set to raw monotonic time
- * @ts_real:   pointer to the timespec to be set to the time of day
- *
- * This function reads both the time of day and raw monotonic time at the
- * same time atomically and stores the resulting timestamps in timespec
- * format.
- */
-void ktime_get_raw_and_real_ts64(struct timespec64 *ts_raw, struct timespec64 
*ts_real)
-{
-   struct timekeeper *tk = &tk_core.timekeeper;
-   unsigned long seq;
-   s64 nsecs_raw, nsecs_real;
-
-   WARN_ON_ONCE(timekeeping_suspended);
-
-   do {
-   seq = read_seqcount_begin(&tk_core.seq);
-
-   *ts_raw = tk->raw_time;
-   ts_real->tv_sec = tk->xtime_sec;
-   ts_real->tv_nsec = 0;
-
-   nsecs_raw  = timekeeping_get_ns(&tk->tkr_raw);
-   nsecs_real = timekeeping_get_ns(&tk->tkr_mono);
-
-   } while (read_seqcount_retry(&tk_core.seq, seq));
-
-   timespec64_add_ns(ts_raw, nsecs_raw);
-   timespec64_add_ns(ts_real, nsecs_real);
-}
-EXPORT_SYMBOL(ktime_get_raw_and_real_ts64);
-
-#endif /* CONFIG_NTP_PPS */
-
 /**
  * do_gettimeofday - Returns the time of day in a timeval
  * @tv:pointer to the timeval to be set
-- 
2.1.4

Re: [PATCH v5 1/3] mailbox: Add support for APM X-Gene platform mailbox driver

2016-02-12 Thread Duc Dang

On Wed, Feb 10, 2016 at 6:41 AM, Mathieu Poirier
 wrote:
> On 9 February 2016 at 20:46, Duc Dang  wrote:
>> On Tue, Feb 9, 2016 at 8:40 AM, Mathieu Poirier
>>  wrote:
>>> On 8 February 2016 at 15:04, Duc Dang  wrote:
 X-Gene mailbox controller provides 8 mailbox channels, each with
 a dedicated interrupt line.

 Signed-off-by: Feng Kan 
 Signed-off-by: Duc Dang 
 ---
 Changes since v4:
 - Rebase over v4.5-rc1
 - Fix section mismatch warning by removing
 __init in slimpro_mbox_probe declaration
 - Correctly print channel number when
 there is no IRQ for that channel

 Changes since v3:
 - Rebase over v4.4
 - Remove 'id' in slimpro_mbox_chan structure
 - Remove small functions that are only called once
 and fold them into the other callers
 - Remove void* pointer type cast
 - Relax the number of mailbox IRQs condition
 - Use subsys_initcall to guarantee mailbox driver
 will be registered before any other dependent driver
 is loaded.

  drivers/mailbox/Kconfig |   9 ++
  drivers/mailbox/Makefile|   2 +
  drivers/mailbox/mailbox-xgene-slimpro.c | 264 
 
  3 files changed, 275 insertions(+)
  create mode 100644 drivers/mailbox/mailbox-xgene-slimpro.c

 diff --git a/drivers/mailbox/Kconfig b/drivers/mailbox/Kconfig
 index 546d05f..678e434 100644
 --- a/drivers/mailbox/Kconfig
 +++ b/drivers/mailbox/Kconfig
 @@ -85,4 +85,13 @@ config MAILBOX_TEST
   Test client to help with testing new Controller driver
   implementations.

 +config XGENE_SLIMPRO_MBOX
 +   tristate "APM SoC X-Gene SLIMpro Mailbox Controller"
 +   depends on ARCH_XGENE
 +   help
 + An implementation of the APM X-Gene Interprocessor Communication
 + Mailbox (IPCM) between the ARM 64-bit cores and SLIMpro 
 controller.
 + It is used to send short messages between ARM64-bit cores and
 + the SLIMpro Management Engine, primarily for PM. Say Y here if 
 you
 + want to use the APM X-Gene SLIMpro IPCM support.
  endif
 diff --git a/drivers/mailbox/Makefile b/drivers/mailbox/Makefile
 index 92435ef..b602ef8 100644
 --- a/drivers/mailbox/Makefile
 +++ b/drivers/mailbox/Makefile
 @@ -17,3 +17,5 @@ obj-$(CONFIG_ALTERA_MBOX) += mailbox-altera.o
  obj-$(CONFIG_BCM2835_MBOX) += bcm2835-mailbox.o

  obj-$(CONFIG_STI_MBOX) += mailbox-sti.o
 +
 +obj-$(CONFIG_XGENE_SLIMPRO_MBOX) += mailbox-xgene-slimpro.o
 diff --git a/drivers/mailbox/mailbox-xgene-slimpro.c 
 b/drivers/mailbox/mailbox-xgene-slimpro.c
 new file mode 100644
 index 000..0ea1eb8
 --- /dev/null
 +++ b/drivers/mailbox/mailbox-xgene-slimpro.c
 @@ -0,0 +1,264 @@
 +/*
 + * APM X-Gene SLIMpro MailBox Driver
 + *
 + * Copyright (c) 2015, Applied Micro Circuits Corporation
 + * Author: Feng Kan f...@apm.com
 + *
 + * This program is free software; you can redistribute it and/or
 + * modify it under the terms of the GNU General Public License as
 + * published by the Free Software Foundation; either version 2 of
 + * the License, or (at your option) any later version.
 + *
 + * This program is distributed in the hope that it will be useful,
 + * but WITHOUT ANY WARRANTY; without even the implied warranty of
 + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 + * GNU General Public License for more details.
 + *
 + * You should have received a copy of the GNU General Public License
 + * along with this program; if not, see .
 + *
 + */
 +#include 
 +#include 
 +#include 
 +#include 
 +#include 
 +#include 
 +#include 
 +#include 
 +#include 
 +
 +#define MBOX_CON_NAME  "slimpro-mbox"
 +#define MBOX_REG_SET_OFFSET0x1000
 +#define MBOX_CNT   8
 +#define MBOX_STATUS_AVAIL_MASK BIT(16)
 +#define MBOX_STATUS_ACK_MASK   BIT(0)
 +
 +/* Configuration and Status Registers */
 +#define REG_DB_IN  0x00
 +#define REG_DB_DIN00x04
 +#define REG_DB_DIN10x08
 +#define REG_DB_OUT 0x10
 +#define REG_DB_DOUT0   0x14
 +#define REG_DB_DOUT1   0x18
 +#define REG_DB_STAT0x20
 +#define REG_DB_STATMASK0x24
 +
 +struct slimpro_mbox_chan {
 +   struct device *dev;
 +   struct mbox_chan *chan;
 +   void __iomem *reg;
 +   int irq;
 +   u32 rx_msg[3];
 +};
 +
 +struct

Re: [RFC] A first shot at asciidoc-based formatted docs

2016-02-12 Thread Keith Packard

Keith Packard  writes:

> The goal would be to create an html document which could be used without
> javascript, and that would work without css as well.

I've managed to hack up asciidoc to generate the TOC within the
document, rather than requiring javascript. The changes are fairly
minor, and seem to add a nice generalization to the asciidoc environment
which should be useful in other contexts.

The changes consist of two bits -- the first is to allow the diversion
of some text from .conf file sections, the second is to postpone some
attribute processing to a second pass over the document so that the TOC
can be inserted in the desired location, instead of requiring that it be
placed at the bottom.

I've sent these changes upstream, and also pushed them to a personal
asciidoc git repository at :

git clone git://keithp.com/git/asciidoc

-- 
-keith


[PATCH 1/3] f2fs: use correct errno

2016-02-12 Thread Jaegeuk Kim

This patch fixes a misused error number.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/segment.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 57a5f7b..47fbb72 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -505,7 +505,7 @@ static int f2fs_issue_discard(struct f2fs_sb_info *sbi,
 
 bool discard_next_dnode(struct f2fs_sb_info *sbi, block_t blkaddr)
 {
-   int err = -ENOTSUPP;
+   int err = -EOPNOTSUPP;
 
if (test_opt(sbi, DISCARD)) {
struct seg_entry *se = get_seg_entry(sbi,
-- 
2.6.3

[PATCH 3/3] f2fs: avoid garbage lengths in dentries

2016-02-12 Thread Jaegeuk Kim

This patch eliminates garbage name lengths in dentries so that
readdir returns correct results.

For example, if a valid dentry consists of:
 bitmap : 1   1 1 1
 len    : 32  0 x 0,

readdir can start at the second bit_pos, which has len = 0.
Or, it can start at the third bit_pos, which has a garbage length.

In both cases, we should avoid trying to fill dentries.
So, this patch not only removes any garbage length, but also avoids entering
the zero-length case in readdir.
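
The readdir behaviour described above can be modelled with a small
user-space sketch. Everything here is hypothetical scaffolding for
illustration (struct slot, NR_SLOTS, count_entries); the 8-byte slot
capacity is an assumption chosen so that a 32-byte name occupies four
slots, matching the example in this message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_SLOTS 8  /* hypothetical number of slots in the sketch block */
#define SLOT_LEN 8  /* assumed name bytes held by one slot */

/* Hypothetical, stripped-down dentry slot. */
struct slot {
	uint16_t name_len;  /* 0 marks a continuation slot of a longer name */
};

/* Number of slots a name of the given length occupies. */
static unsigned int slots_for(uint16_t name_len)
{
	return (name_len + SLOT_LEN - 1) / SLOT_LEN;
}

/* Count entries a readdir pass starting at start_pos would emit. */
static size_t count_entries(const uint8_t *bitmap, const struct slot *slots,
			    unsigned int start_pos)
{
	size_t emitted = 0;
	unsigned int pos = start_pos;

	assert(bitmap && slots);
	while (pos < NR_SLOTS) {
		/* Skip slots that are not in use at all. */
		if (!(bitmap[pos / 8] & (1u << (pos % 8)))) {
			pos++;
			continue;
		}
		/* Skip continuation slots instead of treating them as entries. */
		if (slots[pos].name_len == 0) {
			pos++;
			continue;
		}
		emitted++;
		pos += slots_for(slots[pos].name_len);
	}
	return emitted;
}
```

The skip only works if continuation slots really hold a zero length,
which is why the update path has to clear them when an entry is written.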

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/dir.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
index 8950fc3..ca41b2a 100644
--- a/fs/f2fs/dir.c
+++ b/fs/f2fs/dir.c
@@ -511,8 +511,12 @@ void f2fs_update_dentry(nid_t ino, umode_t mode, struct 
f2fs_dentry_ptr *d,
memcpy(d->filename[bit_pos], name->name, name->len);
de->ino = cpu_to_le32(ino);
set_de_type(de, mode);
-   for (i = 0; i < slots; i++)
+   for (i = 0; i < slots; i++) {
test_and_set_bit_le(bit_pos + i, (void *)d->bitmap);
+   /* avoid wrong garbage data for readdir */
+   if (i)
+   (de + i)->name_len = 0;
+   }
 }
 
 /*
@@ -792,6 +796,12 @@ bool f2fs_fill_dentries(struct dir_context *ctx, struct 
f2fs_dentry_ptr *d,
break;
 
de = &d->dentry[bit_pos];
+   if (de->name_len == 0) {
+   bit_pos++;
+   ctx->pos = start_pos + bit_pos;
+   continue;
+   }
+
if (de->file_type < F2FS_FT_MAX)
d_type = f2fs_filetype_table[de->file_type];
else
-- 
2.6.3

[PATCH 2/3] f2fs crypto: sync with ext4's fname padding

2016-02-12 Thread Jaegeuk Kim

This patch fixes the incorrect adoption of ext4's fname padding.

Signed-off-by: Jaegeuk Kim 
---
 fs/f2fs/crypto_fname.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/f2fs/crypto_fname.c b/fs/f2fs/crypto_fname.c
index 905c065..73741fb 100644
--- a/fs/f2fs/crypto_fname.c
+++ b/fs/f2fs/crypto_fname.c
@@ -267,13 +267,13 @@ int f2fs_fname_crypto_alloc_buffer(struct inode *inode,
   u32 ilen, struct f2fs_str *crypto_str)
 {
unsigned int olen;
-   int padding = 16;
+   int padding = 32;
struct f2fs_crypt_info *ci = F2FS_I(inode)->i_crypt_info;
 
if (ci)
padding = 4 << (ci->ci_flags & F2FS_POLICY_FLAGS_PAD_MASK);
-   if (padding < F2FS_CRYPTO_BLOCK_SIZE)
-   padding = F2FS_CRYPTO_BLOCK_SIZE;
+   if (ilen < F2FS_CRYPTO_BLOCK_SIZE)
+   ilen = F2FS_CRYPTO_BLOCK_SIZE;
olen = f2fs_fname_crypto_round_up(ilen, padding);
crypto_str->len = olen;
if (olen < F2FS_FNAME_CRYPTO_DIGEST_SIZE * 2)
-- 
2.6.3

Re: [PATCH v2 0/2] DAX bdev fixes - move flushing calls to FS

2016-02-12 Thread Dave Chinner

On Fri, Feb 12, 2016 at 12:03:20PM -0700, Ross Zwisler wrote:
> On Thu, Feb 11, 2016 at 01:43:04PM +0100, Jan Kara wrote:
> > On Wed 10-02-16 13:48:54, Ross Zwisler wrote:
> > > 3) In filemap_write_and_wait() and filemap_write_and_wait_range(), 
> > > continue
> > > the writeback in the case that DAX is enabled but we only have a nonzero
> > > mapping->nrpages.  As with 1) and 2), I believe this is necessary to
> > > properly writeback metadata changes.  If this sounds wrong, please let me
> > > know and I'll get more info.
> > 
> > And I'm surprised here as well. If there are dax_mapping() inodes that have
> > pagecache pages, then we have issues with radix tree handling as well. So
> > how come dax_mapping() inodes have pages attached? If it is about block
> > device inodes, then I find it buggy, that S_DAX gets set for such inodes
> > when filesystem is mounted on them because in such cases we are IMO asking
> > for data corruption sooner rather than later...
> 
> I think I've figured this one out, at least partially.
> 
> For ext2 the issues I was seeing were due to the fact that directory inodes
> have S_DAX set, but have dirty page cache pages.   In testing with
> generic/002, I see two ext2 inodes with S_DAX trying to do a writeback while
> they have dirty page cache pages.  The first has i_ino=2, which is the
> EXT2_ROOT_INO.

> As far as I can see, XFS does not have these issues - returning immediately
> having done just the DAX writeback in xfs_vm_writepages() lets all my xfstests
> pass.

XFS will not have issues because it does not dirty directory inodes
at the VFS level, nor does it use the page cache for directory data.
However, looking at the code I think it does still set S_DAX on
directory inodes, which it shouldn't be doing.

I've got a couple of fixes I need to do in this area - hopefully
I'll get it done on Monday.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com

Re: [PATCH v2 0/2] clk: sunxi: Fix APBS clock for Allwinner A80

2016-02-12 Thread Chen-Yu Tsai

Hi,

On Fri, Feb 12, 2016 at 5:31 PM, Chen-Yu Tsai  wrote:
> Hi everyone,
>
> This is v2 of the A80 APBS clock fixes series.
>
> When I did the A80 PRCM support, I failed to notice the A80's APBS clock
> was not the same as the A23's APB0 clock. The former is a zero-based
> divider, while the latter is a power-of-two divider. But the lowest 2
> dividers are the same.

It turns out that clk-sun8i-a23-apb0 was wrong. It's a zero-based divider,
not a power-of-two divider as the driver says. Thanks to Vishnu for pointing
it out. So NACK on this series.
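
To make the difference between the two divider types concrete, here is a
small sketch. The function names and the 2-bit field width are
assumptions made for illustration; this is not the clk driver code.

```c
#include <assert.h>
#include <stdint.h>

/* Zero-based divider: register value n divides the parent rate by n + 1. */
static uint32_t rate_zero_based(uint32_t parent_hz, uint32_t regval)
{
	assert(regval < 4);  /* assumed 2-bit divider field */
	return parent_hz / (regval + 1);
}

/* Power-of-two divider: register value n divides the parent rate by 2^n. */
static uint32_t rate_power_of_two(uint32_t parent_hz, uint32_t regval)
{
	assert(regval < 4);  /* assumed 2-bit divider field */
	return parent_hz >> regval;
}
```

For register values 0 and 1 both interpretations give /1 and /2, so a
clock left at the hardware default behaves the same under either driver.
They diverge from value 2 onward (/3 versus /4).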

I'll send a patch fixing this, but what should we do for old kernels?
I don't think backporting the CLK_OF_DECLARE changes is acceptable.
Maybe a stable patch based on the fix?

Sorry for the noise.

ChenYu

> The hardware defaults to the lowest setting, or a /1 divider. Since the
> child gates do not propagate clk_set_rate up, and no consumers here do
> clk_set_rate, this actually works.
>
> I realized my mistake while reviewing the A83T's PRCM patches. The A83T
> shares the same PRCM clocks as the A80.
>
> Maxime, since this was introduced in 4.5-rc1, please apply this series
> for 4.5 so we fix it before the release.
>
>
> Changes since v1:
>
>   - Replace the CLK_OF_DECLARE version of sun8i-a23-apb0-clk with the
> A80 APBS version, instead of writing a new driver.
>
> Regards
> ChenYu
>
>
> Chen-Yu Tsai (2):
>   clk: sunxi: Add support for A80 APBS clock
>   ARM: dts: sun9i: Fix apbs clock compatible
>
>  Documentation/devicetree/bindings/clock/sunxi.txt |  1 +
>  arch/arm/boot/dts/sun9i-a80.dtsi  |  2 +-
>  drivers/clk/sunxi/clk-sun8i-apb0.c| 23 
> ---
>  3 files changed, 10 insertions(+), 16 deletions(-)
>
> --
> 2.7.0
>

Re: [RFC v2b 5/5] fs: xfs: change inode times to use vfs_time data type

2016-02-12 Thread Dave Chinner

On Fri, Feb 12, 2016 at 01:45:49AM -0800, Deepa Dinamani wrote:
> This is in preparation for changing VFS inode timestamps to
> use 64 bit time.
> The VFS inode timestamps are not y2038 safe as they use
> struct timespec. These will be changed to use struct timespec64
> instead and that is y2038 safe.
> But, since the above data type conversion will break the
> end file systems, use vfs_time functions to access inode times.
> 
> current_fs_time() will change along with vfs timestamp data
> type changes.
> 
> xfs_vn_update_time() is a .update callback for inode operations
> and this needs to change along with vfs inode times.

This code is all different in the current XFS for-next branch.
XFS no longer has its own internal timestamps - it only uses the
timestamps in the struct inode now.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com

Re: [PATCH 2/3] tpm: Get rid of chip->pdev

2016-02-12 Thread Jason Gunthorpe

On Fri, Feb 12, 2016 at 08:31:21PM -0500, Stefan Berger wrote:

> The vtpm driver will introduce chip->priv, which will point to
> vtpm_dev. For

Why not just use chip->vendor.priv? Aka TPM_VPRIV

> this reason we need to hold a reference to the vtpm_dev->dev in the
> front end.

Yes, but all drivers are like this. Most will just kfree their priv immediately

All sane Linux core subsystems guarantee that after their unregister
returns the driver callbacks will be done and uncallable, it is a bug
that tpm does not do this.

> So we could optimize it:
> 
> if (chip->priv)
> get_device(chip->dev.parent);

That doesn't address the race

Jason

Re: [PATCH net-next v8 05/19] net: ethtool: add new ETHTOOL_GSETTINGS/SSETTINGS API

2016-02-12 Thread Ben Hutchings

On Tue, 2016-02-09 at 16:29 -0800, David Decotigny wrote:
> From: David Decotigny 
> 
> This patch defines a new ETHTOOL_GSETTINGS/SSETTINGS API, handled by
> the new get_ksettings/set_ksettings callbacks. This API provides
> support for most legacy ethtool_cmd fields, adds support for larger
> link mode masks (up to 4064 bits, variable length), and removes
> ethtool_cmd deprecated fields (transceiver/maxrxpkt/maxtxpkt).
[...]

I previously asked you to include 'link' in the command names and
structure name.  This would clarify that these are now only for link
settings and reduce the risk of confusion between old and new commands.
However, you didn't reply to that review.  Do you have any objection to
doing this?

Ben.

-- 
Ben Hutchings
Sturgeon's Law: Ninety percent of everything is crap.


Re: [RFC PATCH 1/7] arm64/perf: Basic uncore counter support for Cavium ThunderX

2016-02-12 Thread David Daney


On 02/12/2016 09:36 AM, Mark Rutland wrote:

On Fri, Feb 12, 2016 at 05:55:06PM +0100, Jan Glauber wrote:

[...]

2) Counters are summarized across the different units of the same type,
e.g. L2C TAD 0..7 is presented as a single counter (adding the
values from TAD 0 to 7). Although losing the ability to read a
single value the merged values are easier to use and yield
enough information.


I'm not sure I follow this. What is easier? What are you doing, and what
are you comparing that with to say that your approach is easier?

It sounds like it should be possible to handle multiple counters like
this, so I don't follow why you want to amalgamate them in-kernel.



The values of the individual counters are close to meaningless.  The 
only thing that is meaningful to someone reading the counters is the 
aggregate sum of all the counts.




[...]


+#include 
+#include 


I don't see why you should need these two if this is truly an uncore
device probed solely from PCI.


+void thunder_uncore_read(struct perf_event *event)
+{
+   struct thunder_uncore *uncore = event_to_thunder_uncore(event);
+   struct hw_perf_event *hwc = &event->hw;
+   u64 prev, new = 0;
+   s64 delta;
+   int i;
+
+   /*
+* since we do not enable counter overflow interrupts,
+* we do not have to worry about prev_count changing on us
+*/


Without overflow interrupts, how do you ensure that you account for
overflow in a reasonable time window (i.e. before the counter runs past
its initial value)?


Two reasons:

  1) There are no interrupts.

  2) The counters are 64-bit, they never overflow.
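
As a back-of-the-envelope check of the second point, the sketch below
estimates how long a 64-bit counter takes to wrap and shows the
sum-then-delta scheme discussed in this thread. The helper names are
hypothetical, and the event rate passed in is an assumed figure, not a
ThunderX specification.

```c
#include <assert.h>
#include <stdint.h>

#define SECONDS_PER_YEAR (365ULL * 24 * 60 * 60)

/* Whole years before a 64-bit counter ticking at events_per_sec wraps. */
static uint64_t years_until_wrap(uint64_t events_per_sec)
{
	assert(events_per_sec > 0);
	return UINT64_MAX / events_per_sec / SECONDS_PER_YEAR;
}

/* Sum the per-unit counters and return the change since the previous sum. */
static uint64_t aggregate_delta(const uint64_t *units, int nr_units,
				uint64_t *prev)
{
	uint64_t sum = 0, delta;
	int i;

	for (i = 0; i < nr_units; i++)
		sum += units[i];

	delta = sum - *prev;  /* unsigned subtraction is exact modulo 2^64 */
	*prev = sum;
	return delta;
}
```

At an assumed one billion events per second, a single counter would need
more than five centuries to wrap.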




+
+   prev = local64_read(&hwc->prev_count);
+
+   /* read counter values from all units */
+   for (i = 0; i < uncore->nr_units; i++)
+   new += readq(map_offset(hwc->event_base, uncore, i));


There's no bit to determine whether an overflow occurred?


No.





+
+   local64_set(&hwc->prev_count, new);
+   delta = new - prev;
+   local64_add(delta, &event->count);
+}
+
+void thunder_uncore_del(struct perf_event *event, int flags)
+{
+   struct thunder_uncore *uncore = event_to_thunder_uncore(event);
+   struct hw_perf_event *hwc = &event->hw;
+   int i;
+
+   event->pmu->stop(event, PERF_EF_UPDATE);
+
+   for (i = 0; i < uncore->num_counters; i++) {
+   if (cmpxchg(&uncore->events[i], event, NULL) == event)
+   break;
+   }
+   hwc->idx = -1;
+}


Why not just place the event at uncore->events[hwc->idx] ?

That way removing the event is trivial.


+int thunder_uncore_event_init(struct perf_event *event)
+{
+   struct hw_perf_event *hwc = &event->hw;
+   struct thunder_uncore *uncore;
+
+   if (event->attr.type != event->pmu->type)
+   return -ENOENT;
+
+   /* we do not support sampling */
+   if (is_sampling_event(event))
+   return -EINVAL;
+
+   /* counters do not have these bits */
+   if (event->attr.exclude_user ||
+   event->attr.exclude_kernel   ||
+   event->attr.exclude_host ||
+   event->attr.exclude_guest||
+   event->attr.exclude_hv   ||
+   event->attr.exclude_idle)
+   return -EINVAL;


We should _really_ make these features opt-in at the core level. It's
crazy that each and every PMU driver has to explicitly test and reject
things it doesn't support.


+
+   /* and we do not enable counter overflow interrupts */


That statement raises far more questions than it answers.

_why_ do we not use overflow interrupts?


As stated above, there are *no* overflow interrupts.

The events we are counting cannot be attributed to any one (or even any) 
CPU, they run asynchronous to the CPU, so even if there were interrupts, 
they would be meaningless.



David Daney

[PATCH] selftests: add a new test for Media Controller API

2016-02-12 Thread Shuah Khan

This test opens user specified Media Device and calls
MEDIA_IOC_DEVICE_INFO ioctl in a loop once every 10
seconds. This test is for detecting errors in device
removal path.

Usage:
sudo ./media_devkref_test -d /dev/mediaX

While test is running, remove the device and
ensure there are no use after free errors and
other Oops in the dmesg. Enable KaSan kernel
config option for use-after-free error detection.

Signed-off-by: Shuah Khan 
---
 tools/testing/selftests/media_tests/.gitignore |  1 +
 tools/testing/selftests/media_tests/Makefile   |  7 ++
 .../selftests/media_tests/media_device_test.c  | 94 ++
 3 files changed, 102 insertions(+)
 create mode 100644 tools/testing/selftests/media_tests/.gitignore
 create mode 100644 tools/testing/selftests/media_tests/Makefile
 create mode 100644 tools/testing/selftests/media_tests/media_device_test.c

diff --git a/tools/testing/selftests/media_tests/.gitignore 
b/tools/testing/selftests/media_tests/.gitignore
new file mode 100644
index 0000000..1c07117
--- /dev/null
+++ b/tools/testing/selftests/media_tests/.gitignore
@@ -0,0 +1 @@
+media_device_test
diff --git a/tools/testing/selftests/media_tests/Makefile 
b/tools/testing/selftests/media_tests/Makefile
new file mode 100644
index 0000000..7071bcc
--- /dev/null
+++ b/tools/testing/selftests/media_tests/Makefile
@@ -0,0 +1,7 @@
+TEST_PROGS := media_device_test
+all: $(TEST_PROGS)
+
+include ../lib.mk
+
+clean:
+   rm -fr media_device_test
diff --git a/tools/testing/selftests/media_tests/media_device_test.c 
b/tools/testing/selftests/media_tests/media_device_test.c
new file mode 100644
index 0000000..a47880b
--- /dev/null
+++ b/tools/testing/selftests/media_tests/media_device_test.c
@@ -0,0 +1,94 @@
+/*
+ * media_devkref_test.c - Media Controller Device Kref API Test
+ *
+ * Copyright (c) 2016 Shuah Khan 
+ * Copyright (c) 2016 Samsung Electronics Co., Ltd.
+ *
+ * This file is released under the GPLv2.
+ */
+
+/*
+ * This file adds a test for Media Controller API.
+ * This test should be run as root and should not be
+ * included in the Kselftest run. This test should be
+ * run when hardware and a driver that make use of the Media
+ * Controller API are present in the system.
+ *
+ * This test opens user specified Media Device and calls
+ * MEDIA_IOC_DEVICE_INFO ioctl in a loop once every 10
+ * seconds.
+ *
+ * Usage:
+ * sudo ./media_devkref_test -d /dev/mediaX
+ *
+ * While test is running, remove the device and
+ * ensure there are no use after free errors and
+ * other Oops in the dmesg. Enable KaSan kernel
+ * config option for use-after-free error detection.
+*/
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+int main(int argc, char **argv)
+{
+   int opt;
+   char media_device[256];
+   int count = 0;
+   struct media_device_info mdi;
+   int ret;
+   int fd;
+
+   if (argc < 2) {
+   printf("Usage: %s [-d ]\n", argv[0]);
+   exit(-1);
+   }
+
+   /* Process arguments */
+   while ((opt = getopt(argc, argv, "d:")) != -1) {
+   switch (opt) {
+   case 'd':
+   strncpy(media_device, optarg, sizeof(media_device) - 1);
+   media_device[sizeof(media_device)-1] = '\0';
+   break;
+   default:
+   printf("Usage: %s [-d ]\n", argv[0]);
+   exit(-1);
+   }
+   }
+
+   if (getuid() != 0) {
+   printf("Please run the test as root - Exiting.\n");
+   exit(-1);
+   }
+
+   /* Open Media device and keep it open */
+   fd = open(media_device, O_RDWR);
+   if (fd == -1) {
+   printf("Media Device open errno %s\n", strerror(errno));
+   exit(-1);
+   }
+
+   printf("\nNote:\n"
+  "While test is running, remove the device and\n"
+  "ensure there are no use after free errors and\n"
+  "other Oops in the dmesg. Enable KaSan kernel\n"
+  "config option for use-after-free error detection.\n\n");
+
+   while (count < 100) {
+   ret = ioctl(fd, MEDIA_IOC_DEVICE_INFO, &mdi);
+   if (ret < 0)
+   printf("Media Device Info errno %s\n", strerror(errno));
+   printf("Media device model %s driver %s\n",
+   mdi.model, mdi.driver);
+   sleep(10);
+   count++;
+   }
+}
-- 
2.5.0

Re: [PATCH net-next v8 02/19] test_bitmap: unit tests for lib/bitmap.c

2016-02-12 Thread Ben Hutchings

On Tue, 2016-02-09 at 16:29 -0800, David Decotigny wrote:
> From: David Decotigny 
> 
> This is mainly testing bitmap construction and conversion to/from u32[]
> for now.
> 
> Tested:
>   qemu i386, x86_64, ppc, ppc64 BE and LE, ARM.
> 
> Signed-off-by: David Decotigny 
[...]
> diff --git a/tools/testing/selftests/lib/bitmap.sh 
> b/tools/testing/selftests/lib/bitmap.sh
> new file mode 100644

This needs to have mode 755.

Ben.

> index 000..2da187b
> --- /dev/null
> +++ b/tools/testing/selftests/lib/bitmap.sh
> @@ -0,0 +1,10 @@
> +#!/bin/sh
> +# Runs bitmap infrastructure tests using test_bitmap kernel module
> +
> +if /sbin/modprobe -q test_bitmap; then
> + /sbin/modprobe -q -r test_bitmap
> + echo "bitmap: ok"
> +else
> + echo "bitmap: [FAIL]"
> + exit 1
> +fi
-- 
Ben Hutchings
Sturgeon's Law: Ninety percent of everything is crap.

signature.asc
Description: This is a digitally signed message part

Re: [PATCH v8 8/8] livepatch: Detect offset for the ftrace location during build

2016-02-12 Thread Balbir Singh

On Fri, 2016-02-12 at 17:45 +0100, Petr Mladek wrote:
> On Sat 2016-02-13 03:13:29, Balbir Singh wrote:
> > On Thu, 2016-01-28 at 16:32 +0100, Torsten Duwe wrote:
> > > From: Petr Mladek 
> > > 
> > > Livepatch works on x86_64 and s390 only when the ftrace call
> > > is at the very beginning of the function. But PPC is different.
> > > We need to handle TOC and save LR there before calling the
> > > global ftrace handler.
> > > 
> > > Now, the problem is that the extra operations have different
> > > length on PPC depending on the used gcc version. It is
> > > 4 instructions (16 bytes) before gcc-6 and only 3 instructions
> > > (12 bytes) with gcc-6.
> > > 
> > > This patch tries to detect the offset a generic way during
> > > build. It assumes that the offset of the ftrace location
> > > is the same for all functions. It modifies the existing
> > > recordmcount tool that is able to find read mcount locations
> > > directly from the object files. It adds an option -p
> > > to print the first found offset.
> > > 
> > > The recordmcount tool is then used in the kernel/livepatch
> > > subdirectory to generate a header file. It defines
> > > a constant that is used to compute the ftrace location
> > > from the function address.
> > > 
> > > Finally, we have to enable the C implementation of the
> > > recordmcount tool to be used on PPC and S390. It seems
> > > to work fine there. It should be more reliable because
> > > it reads the standardized elf structures. The old perl
> > > implementation uses rather complex regular expressions
> > > to parse objdump output and is therefore much more tricky.
> > 
> > I'm still missing something, I'm getting offset as 8
> > 
> > When I run, I get
> > 
> > scripts/recordmcount -p kernel/livepatch/core.o 
> > #define KLP_FTRACE_LOCATION 8
> > 
> > scripts/recordmcount -p kernel/livepatch/ftrace-test.o 
> > #define KLP_FTRACE_LOCATION 8
> > 
> > My sample fails as well, since the expected offset is 16.
> > I guess the script is being run against a not so good
> > test.
> 
> I guess that you used a broken gcc and cheated the check
> to pass the compilation. Did you?
> 
> The test used to detect the offset is using a minimalistic
> function that is affected by the gcc bug.
> 
> The patch below might be used to cheat the offset check as well.
> 
> Torsten, please mention this somewhere if you, just by chance,
> send a new version of the patchset.
> 
> From f6a438a3f2f60cc1acc859b41d0cc9259c9a331e Mon Sep 17 00:00:00 2001
> From: root 
> Date: Tue, 2 Feb 2016 15:35:06 +0100
> Subject: [PATCH 2/2] livepatch: Make sure the TOC is handled when detecting
>  ftrace location
> 
> There seems to be a bug in gcc on PPC. It does not handle TOC
> if the function does not access global variables or functions
> by default. But it should when profiling is enabled.
> 

Yep.. Please see http://marc.info/?l=linux-kernel&m=145518015816435&w=2
and my question at http://marc.info/?l=linuxppc-embedded&m=145518330317496&w=2

> This patch works around this problem by adding a call
> to a global function.
> 
> This patch is for testing only!
> ---
>  kernel/livepatch/ftrace-test.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/kernel/livepatch/ftrace-test.c b/kernel/livepatch/ftrace-test.c
> index 22f0c54bf7b3..a3b7aabb67e5 100644
> --- a/kernel/livepatch/ftrace-test.c
> +++ b/kernel/livepatch/ftrace-test.c
> @@ -1,6 +1,9 @@
>  /* Sample code to figure out mcount location offset */
> +#include 
> +
>  
>  int test(int a)
>  {
> + printk("%d\n", a);
>   return ++a;
>  }

This is much better, I see the offset of 16.

Balbir Singh

Re: [PATCH v42 5/6] clk: clk_put WARNs if user has not disabled clk

2016-02-12 Thread Stephen Boyd

On 02/11, Michael Turquette wrote:
> From the clk_put kerneldoc in include/linux/clk.h:
> 
> """
> Note: drivers must ensure that all clk_enable calls made on this clock
> source are balanced by clk_disable calls prior to calling this function.
> """
> 
> The common clock framework implementation of the clk.h api has per-user
> reference counts for calls to clk_prepare and clk_disable. As such it
> can enforce the requirement to properly call clk_disable and
> clk_unprepare before calling clk_put.
> 
> Because this requirement is probably violated in many places, this patch
> starts with a simple warning. Once offending code has been fixed this
> check could additionally release the reference counts automatically.

Do we have any fixes for pm code in the works? I'm worried we're
going to be giving a warning and nobody will fix them or has a
plan to fix them.
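
For readers following the series, here is a small user-space sketch of the idea under review: each consumer handle keeps its own prepare and enable counts, an unbalanced disable is refused, and put only warns when references are still held. All sketch_* names are hypothetical; this is not the code from the patch.

```c
#include <stdio.h>

/* Hypothetical per-user handle; field names are illustrative. */
struct sketch_clk {
	unsigned int prepare_count;
	unsigned int enable_count;
};

static void sketch_clk_prepare(struct sketch_clk *clk)
{
	clk->prepare_count++;
}

static void sketch_clk_enable(struct sketch_clk *clk)
{
	clk->enable_count++;
}

/* Refuse an unbalanced disable instead of touching shared state. */
static int sketch_clk_disable(struct sketch_clk *clk)
{
	if (clk->enable_count == 0) {
		fprintf(stderr, "warning: unbalanced disable\n");
		return -1;
	}
	clk->enable_count--;
	return 0;
}

/* Returns the number of references the user failed to drop. */
static unsigned int sketch_clk_put(const struct sketch_clk *clk)
{
	unsigned int leaked = clk->prepare_count + clk->enable_count;

	if (leaked)
		fprintf(stderr,
			"warning: put with %u prepare, %u enable refs held\n",
			clk->prepare_count, clk->enable_count);
	return leaked;
}
```

A later step, as the commit message hints, could use the returned count to release the leaked references automatically instead of only warning.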

> 
> Signed-off-by: Michael Turquette 
> ---

Reviewed-by: Stephen Boyd 

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

Re: [PATCH v42 4/6] clk: per-user clk prepare & enable ref counts

2016-02-12 Thread Stephen Boyd

On 02/11, Michael Turquette wrote:
> This patch adds prepare and enable reference counts for the per-user
> handles that clock consumers have for a clock node. This patch warns if
> an imbalance occurs while trying to disable or unprepare a clock and
> aborts, leaving the hardware unaffected.
> 
> Signed-off-by: Michael Turquette 
> ---

Reviewed-by: Stephen Boyd 

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

Re: [PATCH v42 2/6] clk: WARN_ON about to disable a critical clock

2016-02-12 Thread Stephen Boyd

On 02/11, Michael Turquette wrote:
> From: Lee Jones 
> 
> Signed-off-by: Lee Jones 
> Signed-off-by: Michael Turquette 
> ---

Reviewed-by: Stephen Boyd 

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

Re: [PATCH 3/3] tpm: Get rid of devname

2016-02-12 Thread Jason Gunthorpe

On Sat, Feb 13, 2016 at 09:01:06AM +0800, kbuild test robot wrote:

> url:
> https://github.com/0day-ci/linux/commits/Jason-Gunthorpe/tpm-Hold-the-kref-during-tpm_chip_find_get/20160213-080824
> config: xtensa-allyesconfig (attached as .config)
> reproduce:
> wget 
> https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross
>  -O ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # save the attached .config to linux build tree
> make.cross ARCH=xtensa 
> 
> All warnings (new ones prefixed by >>):
>
>drivers/char/tpm/tpm-chip.c: In function 'tpm1_chip_register':
> >> drivers/char/tpm/tpm-chip.c:193:19: warning: passing argument 1 of 
> >> 'tpm_bios_log_setup' discards 'const' qualifier from pointer target type
>  chip->bios_dir = tpm_bios_log_setup(dev_name(&chip->dev));
>   ^
>In file included from drivers/char/tpm/tpm-chip.c:30:0:
>drivers/char/tpm/tpm_eventlog.h:83:31: note: expected 'char *' but 
> argument is of type 'const char *'
> static inline struct dentry **tpm_bios_log_setup(char *name)

Got it, thanks, didn't notice that kconfig variation.
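
For context, one common way to silence this class of warning is to constify the parameter of the function being called. The sketch below only illustrates that pattern with made-up sketch_* names; it is not necessarily the fix that will be sent for this series.

```c
#include <string.h>

/*
 * Taking 'const char *' lets callers pass read-only strings, such as
 * the result of a dev_name()-style accessor, without a cast.
 */
static size_t sketch_log_setup(const char *name)
{
	return strlen(name);	/* stand-in for the real setup work */
}

/* Hypothetical accessor returning a const string, like dev_name(). */
static const char *sketch_dev_name(void)
{
	return "tpm0";
}
```

Calling sketch_log_setup(sketch_dev_name()) compiles cleanly, whereas a 'char *' parameter would trigger the "discards 'const' qualifier" diagnostic quoted above.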

Jason

Re: [PATCH v42 1/6] clk: Allow clocks to be marked as CRITICAL

2016-02-12 Thread Stephen Boyd

On 02/11, Michael Turquette wrote:
> diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
> index b4db67a..993f775 100644
> --- a/drivers/clk/clk.c
> +++ b/drivers/clk/clk.c
> @@ -2484,6 +2484,11 @@ static int __clk_init(struct device *dev, struct clk 
> *clk_user)
>   if (core->ops->init)
>   core->ops->init(core->hw);
>  
> + if (core->flags & CLK_IS_CRITICAL) {
> + clk_core_prepare(core);
> + clk_core_enable(core);
> + }

What do we do if this is an orphan clk? From what I can tell
we're not going to increment the ref count on the parents that
may or may not appear at some later time when this flag is set.
Furthermore, do we want to propagate the CLK_IS_CRITICAL flag up
to all the parent clocks so that the warning mechanism spits out
errors for parent clocks? I suppose that may not be very useful
assuming refcounts are correct, but it may be useful to know
which clocks are critical and which ones aren't during debug.

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

Re: [PATCH 2/3] tpm: Get rid of chip->pdev

2016-02-12 Thread Jason Gunthorpe

On Fri, Feb 12, 2016 at 07:37:10PM -0500, Stefan Berger wrote:
>Jason Gunthorpe  wrote on 02/12/2016
>07:04:30 PM:
>>
>> This is a hold over from before the struct device conversion.
>>
>> - All prints should be using &chip->dev, which is the Linux
>>   standard. This changes prints to use tpm0 as the device name,
>>   not the PnP/etc ID.
>> - The few places involving sysfs/modules that really do need the
>>   parent just use chip->dev.parent instead
>> - We no longer need to get_device(pdev) in any places since it is no
>>   longer used by any of the code. The kref on the parent is held
>>   by the device core during device_add and dropped in device_del
>That is exactly what was needed for the vtpm driver and now you're
>removing it. Is that still going to work after this change? Or do we
>need to re-add it as get/put_device(chip->dev.parent) ?

That code was not correct, the get_device side has racy
lack-of-locking problems and it serves no purpose for the tpm core or
any existing driver.

I already fixed this once in commit ba0ef85479c46a 'tpm: Fix
initialization of the cdev' - that solves the racing of get_device,
and grabs the correct device kref, but I forgot to delete the broken
residual get_device. Sigh.

It is unfortunate that bogus code sent you down this rabbit hole. My
bad :(

I'll send you something else that might work for vtpm...

Jason

Re: [PATCH v42 3/6] clk: Provide OF helper to mark clocks as CRITICAL

2016-02-12 Thread Stephen Boyd

On 02/11, Michael Turquette wrote:
> +int of_clk_mark_if_critical(struct device_node *np,
> +   int index, unsigned long *flags)
> +{
> + struct property *prop;
> + const __be32 *cur;
> + uint32_t idx;
> +
> + if (!np || !flags)
> + return -EINVAL;
> +
> + of_property_for_each_u32(np, "clock-critical", prop, cur, idx)
> + if (index == idx)
> + *flags |= CLK_IS_CRITICAL;
> +
> + return 0;
> +}

I hope we don't have to export this to modules...

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

Re: [PATCH net-next v8 01/19] lib/bitmap.c: conversion routines to/from u32 array

2016-02-12 Thread Ben Hutchings

On Tue, 2016-02-09 at 16:29 -0800, David Decotigny wrote:
> From: David Decotigny 
> 
> Aimed at transferring bitmaps to/from user-space in a 32/64-bit agnostic
> way.
> 
> Tested:
>   unit tests (next patch) on qemu i386, x86_64, ppc, ppc64 BE and LE,
>   ARM.
> 
> Signed-off-by: David Decotigny 

Reviewed-by: Ben Hutchings 
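
As an illustration of what a 32/64-bit agnostic conversion has to do, here is a user-space sketch that copies a bitmap stored in unsigned long words into an array of 32-bit words. The function name and signature are assumptions for this example and are not copied from the patch.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Copy the low nbits of an unsigned-long bitmap into 32-bit words,
 * with word 0 holding bits 0..31. Bits beyond nbits and any unused
 * destination words are left as zero.
 */
static void sketch_bitmap_to_u32(uint32_t *dst, size_t nwords,
				 const unsigned long *src, size_t nbits)
{
	size_t i, bit;

	for (i = 0; i < nwords; i++)
		dst[i] = 0;

	for (bit = 0; bit < nbits && bit / 32 < nwords; bit++) {
		unsigned long w = src[bit / SKETCH_BITS_PER_LONG];

		if ((w >> (bit % SKETCH_BITS_PER_LONG)) & 1UL)
			dst[bit / 32] |= (uint32_t)1 << (bit % 32);
	}
}
```

The bit-by-bit loop is deliberately simple; it gives the same u32 layout whether unsigned long is 32 or 64 bits wide, which is the property user space relies on.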

[...]

-- 
Ben Hutchings
Sturgeon's Law: Ninety percent of everything is crap.


Re: [PATCH 3/3] tpm: Get rid of devname

2016-02-12 Thread kbuild test robot

Hi Jason,

[auto build test WARNING on char-misc/char-misc-testing]
[also build test WARNING on v4.5-rc3 next-20160212]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improving the system]

url:
https://github.com/0day-ci/linux/commits/Jason-Gunthorpe/tpm-Hold-the-kref-during-tpm_chip_find_get/20160213-080824
config: xtensa-allyesconfig (attached as .config)
reproduce:
wget 
https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross
 -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=xtensa 

All warnings (new ones prefixed by >>):

   drivers/char/tpm/tpm-chip.c: In function 'tpm1_chip_register':
>> drivers/char/tpm/tpm-chip.c:193:19: warning: passing argument 1 of 
>> 'tpm_bios_log_setup' discards 'const' qualifier from pointer target type
 chip->bios_dir = tpm_bios_log_setup(dev_name(&chip->dev));
  ^
   In file included from drivers/char/tpm/tpm-chip.c:30:0:
   drivers/char/tpm/tpm_eventlog.h:83:31: note: expected 'char *' but argument 
is of type 'const char *'
static inline struct dentry **tpm_bios_log_setup(char *name)
  ^

vim +193 drivers/char/tpm/tpm-chip.c

   177  {
   178  cdev_del(&chip->cdev);
   179  device_unregister(&chip->dev);
   180  }
   181  
   182  static int tpm1_chip_register(struct tpm_chip *chip)
   183  {
   184  int rc;
   185  
   186  if (chip->flags & TPM_CHIP_FLAG_TPM2)
   187  return 0;
   188  
   189  rc = tpm_sysfs_add_device(chip);
   190  if (rc)
   191  return rc;
   192  
 > 193  chip->bios_dir = tpm_bios_log_setup(dev_name(&chip->dev));
   194  
   195  return 0;
   196  }
   197  
   198  static void tpm1_chip_unregister(struct tpm_chip *chip)
   199  {
   200  if (chip->flags & TPM_CHIP_FLAG_TPM2)
   201  return;

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: Binary data

Re: [GIT PULL] bcm2835 DT changes for 4.6

2016-02-12 Thread Eric Anholt

Florian Fainelli  writes:

> On 10/02/2016 10:51, Eric Anholt wrote:
>> Martin Sperl  writes:
>> 
>>>> On 09.02.2016, at 01:32, Eric Anholt wrote:
>>>>
>>>> Hi Florian.  Here's the first set of patches for bcm2835 for 4.6.
>>>> We've got more DT patches that are going to happen for new boards,
>>>> too, but they're still getting polished.
>>>>
>>>> The following changes since commit
>>>> 92e963f50fc74041b5e9e744c330dca48e04f08d:
>>>>
>>>>   Linux 4.5-rc1 (2016-01-24 13:06:47 -0800)
>>>>
>>>> are available in the git repository at:
>>>>
>>>>   g...@github.com:anholt/linux.git tags/bcm2835-dt-next-2016-02-04
>>>>
>>>> for you to fetch changes up to 5ec6f2cd8ec4bcd38ba199ea8711a5ec906d85e7:
>>>>
>>>>   ARM: bcm2835: Add the Raspberry Pi power domain driver to the DT.
>>>> (2016-02-02 20:02:45 -0800)
>>>>
>>>> This pull request covers mostly DT changes that didn't make it into
>>>> 4.5 because required header files went through other trees.
>>>>
>>>> Alexander Aring (1):
>>>>   ARM: bcm2835: Add the Raspberry Pi power domain driver to the DT.
>>>>
>>>> Lubomir Rintel (1):
>>>>   ARM: bcm2835: dt: Add Raspberry Pi Model A
>>>>
>>>> Martin Sperl (2):
>>>>   ARM: bcm2835: add the auxiliary spi1 and spi2 to the device tree
>>>>   ARM: bcm2835: follow dt uart node-naming convention
>>>
>>> Do you want me to resend a rebased version of:
>>>  ARM: bcm2835: add bcm2835-aux-uart support to default DT
>>>
>>> The corresponding driver has been added to tty/tty-next.
>> 
>> It hadn't landed last time I checked.  A rebased version that you've
>> tested would be great!
>
> OK, please submit this in the next week or so at most, so we can get
> this pull request merged, thanks!
>
> Eric, do you have other changes outside of Device Tree?

We've got bcm2835_defconfig changes that I need to test and tag.

There are also the multi_v7_defconfig updates to enable bcm2835.  Would
I be pulling those, or someone above me?

signature.asc
Description: PGP signature

Re: [PATCH v2 6/8] power: cros_usbpd-charger: Add EC-based USB PD charger driver

2016-02-12 Thread Stephen Boyd

On 02/12/2016 04:57 AM, Tomeu Vizoso wrote:
>
>
> diff --git a/drivers/power/cros_usbpd-charger.c 
> b/drivers/power/cros_usbpd-charger.c
> new file mode 100644
> index 000000000000..c1aa58b47f56
> --- /dev/null
> +++ b/drivers/power/cros_usbpd-charger.c
> @@ -0,0 +1,908 @@
> +/*
> + * Power supply driver for ChromeOS EC based USB PD Charger.
> + *
> + * Copyright (c) 2014 Google, Inc
> + *
> + * This software is licensed under the terms of the GNU General Public
> + * License version 2, as published by the Free Software Foundation, and
> + * may be copied, distributed, and modified under those terms.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#define CROS_USB_PD_MAX_PORTS 8
> +#define CROS_USB_PD_MAX_LOG_ENTRIES  30
> +
> +#define CROS_USB_PD_LOG_UPDATE_DELAY msecs_to_jiffies(6)
> +#define CROS_USB_PD_CACHE_UPDATE_DELAY msecs_to_jiffies(500)
> +
> +/* Buffer + macro for building PDLOG string */
> +#define BUF_SIZE 80
> +#define APPEND_STRING(buf, len, str, ...) ((len) += \
> + snprintf((buf) + (len), max(BUF_SIZE - (len), 0), (str), ##__VA_ARGS__))
> +
> +#define CHARGER_DIR_NAME "CROS_USB_PD_CHARGER%d"
> +#define CHARGER_DIR_NAME_LENGTH  sizeof(CHARGER_DIR_NAME)
> +
> +#define MANUFACTURER_MODEL_LENGTH 32
> +
> +struct port_data {
> + int port_number;
> + char name[CHARGER_DIR_NAME_LENGTH];
> + char manufacturer[MANUFACTURER_MODEL_LENGTH];
> + char model_name[MANUFACTURER_MODEL_LENGTH];
> + struct power_supply *psy;
> + struct power_supply_desc psy_desc;
> + int psy_type;
> + int psy_online;
> + int psy_status;
> + int psy_current_max;
> + int psy_voltage_max_design;
> + int psy_voltage_now;
> + int psy_power_max;
> + struct charger_data *charger;
> + unsigned long last_update;
> +};
> +
> +struct charger_data {
> + struct device *dev;
> + struct cros_ec_dev *ec_dev;
> + struct cros_ec_device *ec_device;
> + int num_charger_ports;
> + int num_registered_psy;
> + struct port_data *ports[CROS_USB_PD_MAX_PORTS];
> + struct delayed_work log_work;
> + struct workqueue_struct *log_workqueue;
> + struct notifier_block notifier;
> + bool suspended;
> +};
> +
> +#define EC_MAX_IN_SIZE EC_PROTO2_MAX_REQUEST_SIZE
> +#define EC_MAX_OUT_SIZE EC_PROTO2_MAX_RESPONSE_SIZE
> +uint8_t ec_inbuf[EC_MAX_IN_SIZE];
> +uint8_t ec_outbuf[EC_MAX_OUT_SIZE];

static? Why can't these be part of charger_data?

>
> +
> +static int set_ec_usb_pd_override_ports(struct charger_data *charger,
> + int port_num)
> +{
> + struct device *dev = charger->dev;
> + struct ec_params_charge_port_override req;
> + int ret;
> +
> + req.override_port = port_num;
> +
> + ret = ec_command(charger->ec_dev, 0, EC_CMD_PD_CHARGE_PORT_OVERRIDE,
> +  (uint8_t *)&req, sizeof(req),
> +  NULL, 0);
> + if (ret < 0) {
> + dev_info(dev, "Port Override command returned 0x%x\n", ret);

dev_err?

> + return -EINVAL;
> + }
> +
> + return 0;
> +}
> +
> +
[...]
> +
> +static void cros_usb_pd_print_log_entry(struct ec_response_pd_log *r,
> + ktime_t tstamp)
> +{
> + static const char * const fault_names[] = {
> + "---", "OCP", "fast OCP", "OVP", "Discharge"
> + };
> + static const char * const role_names[] = {
> + "Disconnected", "SRC", "SNK", "SNK (not charging)"
> + };
> + static const char * const chg_type_names[] = {
> + "None", "PD", "Type-C", "Proprietary",
> + "DCP", "CDP", "SDP", "Other", "VBUS"
> + };
> + int i;
> + int role_idx, type_idx;
> + const char *fault, *role, *chg_type;
> + struct usb_chg_measures *meas;
> + struct mcdp_info *minfo;
> + struct rtc_time rt;
> + int len = 0;
> + char buf[BUF_SIZE + 1];
> +
> + /* the timestamp is the number of 1024th of seconds in the past */
> + tstamp = ktime_sub_us(tstamp,
> +  (uint64_t)r->timestamp << PD_LOG_TIMESTAMP_SHIFT);
> + rt = rtc_ktime_to_tm(tstamp);
> +
> + switch (r->type) {
> + case PD_EVENT_MCU_CHARGE:
> + if (r->data & CHARGE_FLAGS_OVERRIDE)
> + APPEND_STRING(buf, len, "override ");
> + if (r->data & CHARGE_FLAGS_DELAYED_OVERRIDE)
> + APPEND_STRING(buf, len, "pending_override ");
> + role_idx = r->data & CHARGE_FLAGS_ROLE_MASK;
> + role = role_idx < ARRAY_SIZE(role_names) ?
> + role_names[role_idx] :

Re: [kernel-hardening] [PATCH] arm64: vdso: Mark vDSO code as read-only

2016-02-12 Thread David Brown


On Thu, Feb 11, 2016 at 03:19:20PM +0100, Ard Biesheuvel wrote:


diff --git a/arch/arm64/kernel/vdso/vdso.S b/arch/arm64/kernel/vdso/vdso.S
index 60c1db5..db7c0f2 100644
--- a/arch/arm64/kernel/vdso/vdso.S
+++ b/arch/arm64/kernel/vdso/vdso.S
@@ -24,6 +24,7 @@
__PAGE_ALIGNED_DATA


^^ You can get rid of this now as well


Can we?  The page is getting mapped to userspace, and if we didn't
page align it, we could leak kernel read-only data to every userspace
process.

David

Re: [PATCH] af_unix: Don't set err in unix_stream_read_generic unless there was an error

2016-02-12 Thread Ben Hutchings

On Fri, 2016-02-05 at 22:30 +0000, Rainer Weikusat wrote:
> The present unix_stream_read_generic contains various code sequences of
> the form
> 
> err = -EDISASTER;
> if ()
>   goto out;
[...]

I wish people would stop writing code like this.  At one time it may
have been a useful micro-optimisation, avoiding an extra branch in the
successful case, but gcc now appears to do that itself.  So it makes
the code less clear and runs the risk of introducing this sort of bug,
for no obvious benefit.
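
To make the objection concrete, here is a contrived sketch of the two styles. The conditions and error codes are invented for illustration and do not come from unix_stream_read_generic.

```c
#include <errno.h>

/*
 * Style criticised above: err is set speculatively before each test,
 * so a path that falls through to 'out' can return a stale error.
 */
static int preset_style(int have_data, int shutdown)
{
	int err;

	err = -EAGAIN;
	if (!have_data)
		goto out;

	err = -EPIPE;
	if (shutdown)
		goto out;

	/* success path: err was never reset, so -EPIPE leaks out */
out:
	return err;
}

/* Alternative: assign the error only on the branch that fails. */
static int explicit_style(int have_data, int shutdown)
{
	int err = 0;

	if (!have_data) {
		err = -EAGAIN;
		goto out;
	}
	if (shutdown) {
		err = -EPIPE;
		goto out;
	}
out:
	return err;
}
```

With have_data set and shutdown clear, preset_style() returns -EPIPE even though nothing failed, while explicit_style() returns 0. That is the class of bug the speculative assignment invites.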

Ben.

-- 
Ben Hutchings
Sturgeon's Law: Ninety percent of everything is crap.


Re: [PATCH v3] dmi: Make dmi_walk and dmi_walk_early return real error codes

2016-02-12 Thread Darren Hart

On Fri, Feb 12, 2016 at 10:59:10AM -0800, Andy Lutomirski wrote:
> On Tue, Feb 2, 2016 at 9:00 AM, Darren Hart  wrote:
> > On Sat, Jan 30, 2016 at 08:18:50PM +0100, Jean Delvare wrote:
> >> On Sat, 30 Jan 2016 10:13:09 -0800, Andy Lutomirski wrote:
> >> > On Sat, Jan 30, 2016 at 10:05 AM, Darren Hart  
> >> > wrote:
> >> > > If I understand this correctly, this is the first of 5 patches, and 
> >> > > this one has
> >> > > some unanswered questions from Jean here. If this patch gets respun, 
> >> > > the
> >> > > following are also impacted:
> >> > >
> >> > > dell-wmi: Stop storing pointers to DMI tables
> >> > > dell-wmi, dell-laptop: select DMI
> >> > > dell-wmi: Clean up hotkey table size check
> >> > > dell-wmi: Support new hotkeys on the XPS 13 9350 (Skylake)
> >> > >
> >> > > Is that correct?
> >> >
> >> > Not really.  It's just the three patches here:
> >> >
> >> > http://article.gmane.org/gmane.linux.drivers.platform.x86.devel/8503
> >> >
> >> > This patch (the dmi_walk error code one) is no longer really related.
> >> > Due to Jean's earlier comment about what happens if DMI isn't enabled
> >> > at all, I no longer propagate the error code from dmi_walk in
> >> > dell-wmi, so the error code won't have any effect.  (Instead I just
> >> > warn and let the driver load in legacy mode, which matches the current
> >> > behavior.)
> >> >
> >> > I think the way to go is for the v3 "dell-wmi: DMI misuse fixes"
> >> > series to go in through your tree, and I'll hash out the error code
> >> > thing separately with Jean.
> >> >
> >> > Does that seem sensible?
> >>
> >> Yes, I agree that this patch is independent from the dell-wmi patch
> >> series now.
> >
> > Excellent, works for me.
> >
> 
> Is any further action from me needed here?

Now that the Dell SMBIOS update is in (yesterday), I am looking for a
consolidated patch series from you, ending with the skylake support.

-- 
Darren Hart
Intel Open Source Technology Center

[PATCH v2 03/11] vfio: Define sparse mmap capability for regions

2016-02-12 Thread Alex Williamson

We can't always support mmap across an entire device region, for
example we deny mmaps covering the MSI-X table of PCI devices, but
we don't really have a way to report it.  We expect the user to
implicitly know this restriction.  We also can't split the region
because vfio-pci defines an API with fixed region index to BAR
number mapping.  We therefore define a new capability which lists
areas within the region that may be mmap'd.  In addition to the
MSI-X case, this potentially enables in-kernel emulation and
extensions to devices.

Signed-off-by: Alex Williamson 
---
 include/uapi/linux/vfio.h |   26 +-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index d508adf..fde7b1e 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -221,13 +221,37 @@ struct vfio_region_info {
 #define VFIO_REGION_INFO_FLAG_READ (1 << 0) /* Region supports read */
 #define VFIO_REGION_INFO_FLAG_WRITE (1 << 1) /* Region supports write */
 #define VFIO_REGION_INFO_FLAG_MMAP (1 << 2) /* Region supports mmap */
+#define VFIO_REGION_INFO_FLAG_CAPS (1 << 3) /* Info supports caps */
__u32   index;  /* Region index */
-   __u32   resv;   /* Reserved for alignment */
+   __u32   cap_offset; /* Offset within info struct of first cap */
__u64   size;   /* Region size (bytes) */
__u64   offset; /* Region offset from start of device fd */
 };
 #define VFIO_DEVICE_GET_REGION_INFO _IO(VFIO_TYPE, VFIO_BASE + 8)
 
+/*
+ * The sparse mmap capability allows finer granularity of specifying areas
+ * within a region with mmap support.  When specified, the user should only
+ * mmap the offset ranges specified by the areas array.  mmaps outside of the
+ * areas specified may fail (such as the range covering a PCI MSI-X table) or
+ * may result in improper device behavior.
+ *
+ * The structures below define version 1 of this capability.
+ */
+#define VFIO_REGION_INFO_CAP_SPARSE_MMAP   1
+
+struct vfio_region_sparse_mmap_area {
+   __u64   offset; /* Offset of mmap'able area within region */
+   __u64   size;   /* Size of mmap'able area */
+};
+
+struct vfio_region_info_cap_sparse_mmap {
+   struct vfio_info_cap_header header;
+   __u32   nr_areas;
+   __u32   reserved;
+   struct vfio_region_sparse_mmap_area areas[];
+};
+
 /**
  * VFIO_DEVICE_GET_IRQ_INFO - _IOWR(VFIO_TYPE, VFIO_BASE + 9,
  * struct vfio_irq_info)
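
To show how a user might consume the sparse mmap capability described in this patch, here is a user-space sketch that checks whether a requested range falls entirely inside one of the advertised areas. The struct is a local mirror of the offset/size pair from the patch, and the sketch_* names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the area layout described in the patch. */
struct sketch_area {
	uint64_t offset;	/* start of mmap'able area within region */
	uint64_t size;		/* length of the area in bytes */
};

/* Return 1 if [off, off + len) lies entirely inside one listed area. */
static int sketch_range_mmapable(const struct sketch_area *areas,
				 uint32_t nr_areas,
				 uint64_t off, uint64_t len)
{
	uint32_t i;

	for (i = 0; i < nr_areas; i++) {
		const struct sketch_area *a = &areas[i];

		/* Written with subtractions so nothing can overflow. */
		if (off >= a->offset && len <= a->size &&
		    off - a->offset <= a->size - len)
			return 1;
	}
	return 0;
}
```

A range that straddles a hole between two areas (for example one covering an MSI-X table) is rejected, which matches the "should only mmap the offset ranges specified" wording of the capability comment.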

[PATCH v2 01/11] vfio: Define capability chains

2016-02-12 Thread Alex Williamson

We have a few cases where we need to extend the data returned from the
INFO ioctls in VFIO.  For instance we already have devices exposed
through vfio-pci where VFIO_DEVICE_GET_REGION_INFO reports the region
as mmap-capable, but really only supports sparse mmaps, avoiding the
MSI-X table.  If we wanted to provide in-kernel emulation or extended
functionality for devices, we'd also want the ability to tell the
user not to mmap various regions, rather than forcing them to figure
it out on their own.

Another example is VFIO_IOMMU_GET_INFO.  We'd really like to expose
the actual IOVA capabilities of the IOMMU rather than letting the
user assume the address space they have available to them.  We could
add IOVA base and size fields to struct vfio_iommu_type1_info, but
what if we have multiple IOVA ranges?  For instance x86 uses a range
of addresses at 0xfee0 for MSI vectors.  These typically are not
available for standard DMA IOVA mappings and splits our available IOVA
space into two ranges.  POWER systems have both an IOVA window below
4G as well as a dynamic data window which they can use to remap all of
guest memory.

Representing variable sized arrays within a fixed structure makes it
very difficult to parse; we'd therefore like to put this data beyond
the fixed fields within the data structures.  One way to do this is to
emulate capabilities in PCI configuration space.  A new flag indicates
whether capabilities are supported and a new fixed field reports the
offset of the first entry.  Users can then walk the chain to find
capabilities, adding capabilities does not require additional fields
in the fixed structure, and parsing variable sized data becomes
trivial.

This patch outlines the theory and base header structure, which
should be shared by all future users.
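
As an illustration of the chain walk described above, here is a minimal
userspace sketch.  It is not part of the patch: struct cap_header is a
local mirror of the proposed vfio_info_cap_header, find_cap() is a
hypothetical helper, and treating a next offset of zero as the end of
the chain is an assumption borrowed from PCI capability lists.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the proposed header (the uapi version uses __u16/__u32). */
struct cap_header {
	uint16_t id;      /* identifies the capability */
	uint16_t version; /* version specific to the capability id */
	uint32_t next;    /* offset of next capability from buffer start */
};

/*
 * Hypothetical helper: return the offset of the capability with the given
 * id, or 0 if it is not found.  Offset 0 is never a valid capability since
 * the fixed INFO structure lives at the start of the buffer.
 * Assumption: next == 0 terminates the chain.
 */
static size_t find_cap(const uint8_t *buf, size_t len, size_t first,
		       uint16_t id)
{
	size_t off = first;
	size_t hops = 0;

	while (off != 0) {
		struct cap_header hdr;

		/* Reject headers that do not fit inside the buffer. */
		if (off > len || len - off < sizeof(hdr))
			return 0;
		/* Bail out of malformed chains that loop back on themselves. */
		if (++hops > len / sizeof(hdr))
			return 0;

		memcpy(&hdr, buf + off, sizeof(hdr));
		if (hdr.id == id)
			return off;
		off = hdr.next;
	}
	return 0;
}
```

A caller would pass the whole INFO buffer, the argsz the kernel reported,
and the first-capability offset from the fixed structure, after checking
that the capability flag bit is set.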

Signed-off-by: Alex Williamson 
---
 include/uapi/linux/vfio.h |   27 +++
 1 file changed, 27 insertions(+)

diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 7d7a4c6..d508adf 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -59,6 +59,33 @@
 #define VFIO_TYPE  (';')
 #define VFIO_BASE  100
 
+/*
+ * For extension of INFO ioctls, VFIO makes use of a capability chain
+ * designed after PCI/e capabilities.  A flag bit indicates whether
+ * this capability chain is supported and a field defined in the fixed
+ * structure defines the offset of the first capability in the chain.
+ * This field is only valid when the corresponding bit in the flags
+ * bitmap is set.  This offset field is relative to the start of the
+ * INFO buffer, as is the next field within each capability header.
+ * The id within the header is a shared address space per INFO ioctl,
+ * while the version field is specific to the capability id.  The
+ * contents following the header are specific to the capability id.
+ */
+struct vfio_info_cap_header {
+   __u16   id; /* Identifies capability */
+   __u16   version;/* Version specific to the capability ID */
+   __u32   next;   /* Offset of next capability */
+};
+
+/*
+ * Callers of INFO ioctls passing insufficiently sized buffers will see
+ * the capability chain flag bit set, a zero value for the first capability
+ * offset (if available within the provided argsz), and argsz will be
+ * updated to report the necessary buffer size.  For compatibility, the
+ * INFO ioctl will not report error in this case, but the capability chain
+ * will not be available.
+ */
+
 /*  IOCTLs for VFIO file descriptor (/dev/vfio/vfio)  */
 
 /**

[PATCH 2/3] tpm: Get rid of chip->pdev

2016-02-12 Thread Jason Gunthorpe

This is a holdover from before the struct device conversion.

- All prints should be using &chip->dev, which is the Linux
  standard. This changes prints to use tpm0 as the device name,
  not the PnP/etc ID.
- The few places involving sysfs/modules that really do need the
  parent just use chip->dev.parent instead
- We no longer need to get_device(pdev) in any places since it is no
  longer used by any of the code. The kref on the parent is held
  by the device core during device_add and dropped in device_del

Signed-off-by: Jason Gunthorpe 
---
 drivers/char/tpm/tpm-chip.c | 15 ++-
 drivers/char/tpm/tpm-dev.c  |  4 +---
 drivers/char/tpm/tpm-interface.c| 30 --
 drivers/char/tpm/tpm-sysfs.c|  6 +++---
 drivers/char/tpm/tpm.h  |  3 +--
 drivers/char/tpm/tpm2-cmd.c |  8 
 drivers/char/tpm/tpm_atmel.c| 14 +++---
 drivers/char/tpm/tpm_i2c_atmel.c| 16 
 drivers/char/tpm/tpm_i2c_infineon.c |  6 +++---
 drivers/char/tpm/tpm_i2c_nuvoton.c  | 26 +-
 drivers/char/tpm/tpm_infineon.c | 22 +++---
 drivers/char/tpm/tpm_nsc.c  | 20 ++--
 drivers/char/tpm/tpm_tis.c  | 16 
 13 files changed, 91 insertions(+), 95 deletions(-)

diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c
index ae2fed8a162b..b1364bf62492 100644
--- a/drivers/char/tpm/tpm-chip.c
+++ b/drivers/char/tpm/tpm-chip.c
@@ -49,7 +49,7 @@ struct tpm_chip *tpm_chip_find_get(int chip_num)
if (chip_num != TPM_ANY_NUM && chip_num != pos->dev_num)
continue;
 
-   if (try_module_get(pos->pdev->driver->owner)) {
+   if (try_module_get(pos->dev.parent->driver->owner)) {
chip = pos;
break;
}
@@ -114,13 +114,11 @@ struct tpm_chip *tpmm_chip_alloc(struct device *dev,
 
scnprintf(chip->devname, sizeof(chip->devname), "tpm%d", chip->dev_num);
 
-   chip->pdev = dev;
-
dev_set_drvdata(dev, chip);
 
chip->dev.class = tpm_class;
chip->dev.release = tpm_dev_release;
-   chip->dev.parent = chip->pdev;
+   chip->dev.parent = dev;
 #ifdef CONFIG_ACPI
chip->dev.groups = chip->groups;
 #endif
@@ -135,7 +133,7 @@ struct tpm_chip *tpmm_chip_alloc(struct device *dev,
device_initialize(&chip->dev);
 
cdev_init(&chip->cdev, &tpm_fops);
-   chip->cdev.owner = chip->pdev->driver->owner;
+   chip->cdev.owner = dev->driver->owner;
chip->cdev.kobj.parent = &chip->dev.kobj;
 
return chip;
@@ -236,9 +234,8 @@ int tpm_chip_register(struct tpm_chip *chip)
chip->flags |= TPM_CHIP_FLAG_REGISTERED;
 
if (!(chip->flags & TPM_CHIP_FLAG_TPM2)) {
-   rc = __compat_only_sysfs_link_entry_to_kobj(&chip->pdev->kobj,
-   &chip->dev.kobj,
-   "ppi");
+   rc = __compat_only_sysfs_link_entry_to_kobj(
+   &chip->dev.parent->kobj, &chip->dev.kobj, "ppi");
if (rc && rc != -ENOENT) {
tpm_chip_unregister(chip);
return rc;
@@ -273,7 +270,7 @@ void tpm_chip_unregister(struct tpm_chip *chip)
synchronize_rcu();
 
if (!(chip->flags & TPM_CHIP_FLAG_TPM2))
-   sysfs_remove_link(&chip->pdev->kobj, "ppi");
+   sysfs_remove_link(&chip->dev.parent->kobj, "ppi");
 
tpm1_chip_unregister(chip);
tpm_dev_del_device(chip);
diff --git a/drivers/char/tpm/tpm-dev.c b/drivers/char/tpm/tpm-dev.c
index de0337ebd658..4009765c14fd 100644
--- a/drivers/char/tpm/tpm-dev.c
+++ b/drivers/char/tpm/tpm-dev.c
@@ -61,7 +61,7 @@ static int tpm_open(struct inode *inode, struct file *file)
 * by the check of is_open variable, which is protected
 * by driver_lock. */
if (test_and_set_bit(0, &chip->is_open)) {
-   dev_dbg(chip->pdev, "Another process owns this TPM\n");
+   dev_dbg(&chip->dev, "Another process owns this TPM\n");
return -EBUSY;
}
 
@@ -79,7 +79,6 @@ static int tpm_open(struct inode *inode, struct file *file)
INIT_WORK(>work, timeout_work);
 
file->private_data = priv;
-   get_device(chip->pdev);
return 0;
 }
 
@@ -166,7 +165,6 @@ static int tpm_release(struct inode *inode, struct file 
*file)
file->private_data = NULL;
atomic_set(>data_pending, 0);
clear_bit(0, >chip->is_open);
-   put_device(priv->chip->pdev);
kfree(priv);
return 0;
 }
diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c
index e2fa89c88304..483f86ff6a0a 100644
--- a/drivers/char/tpm/tpm-interface.c
+++ b/drivers/char/tpm/tpm-interface.c
@@ -345,7 +345,7 @@ ssize_t tpm_transmit(struct tpm_chip *chip, const char *buf,
if (count ==

[PATCH 1/3] tpm: Hold the kref during tpm_chip_find_get

2016-02-12 Thread Jason Gunthorpe

This was missed during the struct device conversion; we
need to hold a kref on the chip to make sure it isn't freed.

Signed-off-by: Jason Gunthorpe 
---
 drivers/char/tpm/tpm-chip.c | 2 ++
 drivers/char/tpm/tpm.h  | 1 +
 2 files changed, 3 insertions(+)

diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c
index 45cc39aabeee..ae2fed8a162b 100644
--- a/drivers/char/tpm/tpm-chip.c
+++ b/drivers/char/tpm/tpm-chip.c
@@ -53,6 +53,8 @@ struct tpm_chip *tpm_chip_find_get(int chip_num)
chip = pos;
break;
}
+
+   get_device(>dev);
}
rcu_read_unlock();
return chip;
diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
index 542a80cbfd9c..f6ba79d91857 100644
--- a/drivers/char/tpm/tpm.h
+++ b/drivers/char/tpm/tpm.h
@@ -207,6 +207,7 @@ struct tpm_chip {
 static inline void tpm_chip_put(struct tpm_chip *chip)
 {
module_put(chip->pdev->driver->owner);
+   put_device(&chip->dev);
 }
 
 static inline int tpm_read_index(int base, int index)
-- 
2.1.4

[PATCH 3/3] tpm: Get rid of devname

2016-02-12 Thread Jason Gunthorpe

Now that we have a proper struct device just use dev_name() to
access this value instead of keeping two copies.

Signed-off-by: Jason Gunthorpe 
---
 drivers/char/tpm/tpm-chip.c| 17 +++--
 drivers/char/tpm/tpm.h |  1 -
 drivers/char/tpm/tpm_eventlog.c|  2 +-
 drivers/char/tpm/tpm_eventlog.h|  2 +-
 drivers/char/tpm/tpm_i2c_nuvoton.c |  2 +-
 drivers/char/tpm/tpm_tis.c |  2 +-
 6 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c
index b1364bf62492..caa52a6110ec 100644
--- a/drivers/char/tpm/tpm-chip.c
+++ b/drivers/char/tpm/tpm-chip.c
@@ -90,6 +90,7 @@ struct tpm_chip *tpmm_chip_alloc(struct device *dev,
 const struct tpm_class_ops *ops)
 {
struct tpm_chip *chip;
+   int err;
 
chip = kzalloc(sizeof(*chip), GFP_KERNEL);
if (chip == NULL)
@@ -112,8 +113,6 @@ struct tpm_chip *tpmm_chip_alloc(struct device *dev,
 
set_bit(chip->dev_num, dev_mask);
 
-   scnprintf(chip->devname, sizeof(chip->devname), "tpm%d", chip->dev_num);
-
dev_set_drvdata(dev, chip);
 
chip->dev.class = tpm_class;
@@ -128,7 +127,9 @@ struct tpm_chip *tpmm_chip_alloc(struct device *dev,
else
chip->dev.devt = MKDEV(MAJOR(tpm_devt), chip->dev_num);
 
-   dev_set_name(&chip->dev, "%s", chip->devname);
+   err = dev_set_name(&chip->dev, "tpm%d", chip->dev_num);
+   if (err)
+   goto out;
 
device_initialize(&chip->dev);
 
@@ -137,6 +138,10 @@ struct tpm_chip *tpmm_chip_alloc(struct device *dev,
chip->cdev.kobj.parent = &chip->dev.kobj;
 
return chip;
+
+out:
+   put_device(&chip->dev);
+   return ERR_PTR(err);
 }
 EXPORT_SYMBOL_GPL(tpmm_chip_alloc);
 
@@ -148,7 +153,7 @@ static int tpm_dev_add_device(struct tpm_chip *chip)
if (rc) {
dev_err(&chip->dev,
"unable to cdev_add() %s, major %d, minor %d, err=%d\n",
-   chip->devname, MAJOR(chip->dev.devt),
+   dev_name(&chip->dev), MAJOR(chip->dev.devt),
MINOR(chip->dev.devt), rc);
 
device_unregister(&chip->dev);
@@ -159,7 +164,7 @@ static int tpm_dev_add_device(struct tpm_chip *chip)
if (rc) {
dev_err(&chip->dev,
"unable to device_register() %s, major %d, minor %d, err=%d\n",
-   chip->devname, MAJOR(chip->dev.devt),
+   dev_name(&chip->dev), MAJOR(chip->dev.devt),
MINOR(chip->dev.devt), rc);
 
return rc;
@@ -185,7 +190,7 @@ static int tpm1_chip_register(struct tpm_chip *chip)
if (rc)
return rc;
 
-   chip->bios_dir = tpm_bios_log_setup(chip->devname);
+   chip->bios_dir = tpm_bios_log_setup(dev_name(&chip->dev));
 
return 0;
 }
diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
index 371f75f4d2a7..a53fc699027b 100644
--- a/drivers/char/tpm/tpm.h
+++ b/drivers/char/tpm/tpm.h
@@ -181,7 +181,6 @@ struct tpm_chip {
unsigned int flags;
 
int dev_num;/* /dev/tpm# */
-   char devname[7];
unsigned long is_open;  /* only one allowed */
int time_expired;
 
diff --git a/drivers/char/tpm/tpm_eventlog.c b/drivers/char/tpm/tpm_eventlog.c
index bd72fb04225e..49e50976efc8 100644
--- a/drivers/char/tpm/tpm_eventlog.c
+++ b/drivers/char/tpm/tpm_eventlog.c
@@ -397,7 +397,7 @@ static int is_bad(void *p)
return 0;
 }
 
-struct dentry **tpm_bios_log_setup(char *name)
+struct dentry **tpm_bios_log_setup(const char *name)
 {
struct dentry **ret = NULL, *tpm_dir, *bin_file, *ascii_file;
 
diff --git a/drivers/char/tpm/tpm_eventlog.h b/drivers/char/tpm/tpm_eventlog.h
index 267bfbd1b7bb..f072a1a1d5cc 100644
--- a/drivers/char/tpm/tpm_eventlog.h
+++ b/drivers/char/tpm/tpm_eventlog.h
@@ -77,7 +77,7 @@ int read_log(struct tpm_bios_log *log);
 
 #if defined(CONFIG_TCG_IBMVTPM) || defined(CONFIG_TCG_IBMVTPM_MODULE) || \
defined(CONFIG_ACPI)
-extern struct dentry **tpm_bios_log_setup(char *);
+extern struct dentry **tpm_bios_log_setup(const char *name);
 extern void tpm_bios_log_teardown(struct dentry **);
 #else
 static inline struct dentry **tpm_bios_log_setup(char *name)
diff --git a/drivers/char/tpm/tpm_i2c_nuvoton.c 
b/drivers/char/tpm/tpm_i2c_nuvoton.c
index 8fb378f502e4..6dd74d114fb3 100644
--- a/drivers/char/tpm/tpm_i2c_nuvoton.c
+++ b/drivers/char/tpm/tpm_i2c_nuvoton.c
@@ -560,7 +560,7 @@ static int i2c_nuvoton_probe(struct i2c_client *client,
rc = devm_request_irq(dev, chip->vendor.irq,
  i2c_nuvoton_int_handler,
  IRQF_TRIGGER_LOW,
- chip->devname,
+ dev_name(>dev),
  chip);
if (rc) {
dev_err(dev,

[PATCH 0/3] Various struct device cleanups

2016-02-12 Thread Jason Gunthorpe

These little cleanups were missed during the struct device conversion of
tpm_chip. They were noticed when looking at Stefan's vtpm patch sets.

Nothing very significant
 - Add some missing krefs
 - Replace chip->devname with dev_name
 - Replace chip->pdev with chip->dev.parent

Jason Gunthorpe (3):
  tpm: Hold the kref during tpm_chip_find_get
  tpm: Get rid of chip->pdev
  tpm: Get rid of devname

 drivers/char/tpm/tpm-chip.c | 34 +++---
 drivers/char/tpm/tpm-dev.c  |  4 +---
 drivers/char/tpm/tpm-interface.c| 30 --
 drivers/char/tpm/tpm-sysfs.c|  6 +++---
 drivers/char/tpm/tpm.h  |  5 ++---
 drivers/char/tpm/tpm2-cmd.c |  8 
 drivers/char/tpm/tpm_atmel.c| 14 +++---
 drivers/char/tpm/tpm_eventlog.c |  2 +-
 drivers/char/tpm/tpm_eventlog.h |  2 +-
 drivers/char/tpm/tpm_i2c_atmel.c| 16 
 drivers/char/tpm/tpm_i2c_infineon.c |  6 +++---
 drivers/char/tpm/tpm_i2c_nuvoton.c  | 28 ++--
 drivers/char/tpm/tpm_infineon.c | 22 +++---
 drivers/char/tpm/tpm_nsc.c  | 20 ++--
 drivers/char/tpm/tpm_tis.c  | 18 +-
 15 files changed, 109 insertions(+), 106 deletions(-)

-- 
2.1.4

[PATCH v2 06/11] vfio/pci: Add infrastructure for additional device specific regions

2016-02-12 Thread Alex Williamson

Add support for additional regions with indexes starting after the
already defined fixed regions.  Device specific code can register
these regions with the new vfio_pci_register_dev_region() function.
The ops structure per region currently only includes read/write
access and a release function, allowing automatic cleanup when the
device is closed.  mmap support is only missing here because it's
not needed by the first user queued for this support.
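
The registration and index mapping described above can be modelled in
plain userspace C.  This is only a sketch under stated assumptions:
realloc()/free() stand in for krealloc()/kfree(), NUM_FIXED_REGIONS and
OFFSET_SHIFT are placeholders for VFIO_PCI_NUM_REGIONS and
VFIO_PCI_OFFSET_SHIFT (the values 9 and 40 are assumed here), and all
struct and function names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

#define NUM_FIXED_REGIONS 9   /* assumption: stands in for VFIO_PCI_NUM_REGIONS */
#define OFFSET_SHIFT      40  /* assumption: stands in for VFIO_PCI_OFFSET_SHIFT */

struct dev_region {
	unsigned int type;
	unsigned int subtype;
	size_t size;
	void *data;
};

struct device {
	struct dev_region *region; /* grows by one entry per registration */
	int num_regions;
};

/* Append one device specific region; returns 0 on success, -1 on failure. */
static int register_region(struct device *d, unsigned int type,
			   unsigned int subtype, size_t size, void *data)
{
	struct dev_region *r;

	r = realloc(d->region, (d->num_regions + 1) * sizeof(*r));
	if (!r)
		return -1; /* the old array is untouched and still owned by d */

	d->region = r;
	r[d->num_regions] = (struct dev_region){ type, subtype, size, data };
	d->num_regions++;
	return 0;
}

/* Map a user-visible region index to its entry, or NULL if out of range. */
static struct dev_region *lookup_region(struct device *d, unsigned int index)
{
	if (index < NUM_FIXED_REGIONS ||
	    index >= NUM_FIXED_REGIONS + (unsigned int)d->num_regions)
		return NULL;
	return &d->region[index - NUM_FIXED_REGIONS];
}

/* File offset at which a region index begins. */
static uint64_t index_to_offset(unsigned int index)
{
	return (uint64_t)index << OFFSET_SHIFT;
}

/* Drop all device specific regions, as done when the device is closed. */
static void release_regions(struct device *d)
{
	free(d->region);
	d->region = NULL; /* a later realloc() must not see a freed pointer */
	d->num_regions = 0;
}
```

Resetting the pointer to NULL on release mirrors the comment in the
patch: the next registration reallocates from scratch rather than from
freed memory.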

Signed-off-by: Alex Williamson 
---
 drivers/vfio/pci/vfio_pci.c |   81 +--
 drivers/vfio/pci/vfio_pci_private.h |   27 
 2 files changed, 103 insertions(+), 5 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 4682207..813a2e6 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -175,7 +175,7 @@ static int vfio_pci_enable(struct vfio_pci_device *vdev)
 static void vfio_pci_disable(struct vfio_pci_device *vdev)
 {
struct pci_dev *pdev = vdev->pdev;
-   int bar;
+   int i, bar;
 
/* Stop the device from further DMA */
pci_clear_master(pdev);
@@ -186,6 +186,13 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev)
 
vdev->virq_disabled = false;
 
+   for (i = 0; i < vdev->num_regions; i++)
+   vdev->region[i].ops->release(vdev, &vdev->region[i]);
+
+   vdev->num_regions = 0;
+   kfree(vdev->region);
+   vdev->region = NULL; /* don't krealloc a freed pointer */
+
vfio_config_free(vdev);
 
for (bar = PCI_STD_RESOURCES; bar <= PCI_STD_RESOURCE_END; bar++) {
@@ -463,6 +470,51 @@ static int msix_sparse_mmap_cap(struct vfio_pci_device 
*vdev,
return 0;
 }
 
+static int region_type_cap(struct vfio_pci_device *vdev,
+  struct vfio_info_cap *caps,
+  unsigned int type, unsigned int subtype)
+{
+   struct vfio_info_cap_header *header;
+   struct vfio_region_info_cap_type *cap;
+
+   header = vfio_info_cap_add(caps, sizeof(*cap),
+  VFIO_REGION_INFO_CAP_TYPE, 1);
+   if (IS_ERR(header))
+   return PTR_ERR(header);
+
+   cap = container_of(header, struct vfio_region_info_cap_type, header);
+   cap->type = type;
+   cap->subtype = subtype;
+
+   return 0;
+}
+
+int vfio_pci_register_dev_region(struct vfio_pci_device *vdev,
+unsigned int type, unsigned int subtype,
+const struct vfio_pci_regops *ops,
+size_t size, u32 flags, void *data)
+{
+   struct vfio_pci_region *region;
+
+   region = krealloc(vdev->region,
+ (vdev->num_regions + 1) * sizeof(*region),
+ GFP_KERNEL);
+   if (!region)
+   return -ENOMEM;
+
+   vdev->region = region;
+   vdev->region[vdev->num_regions].type = type;
+   vdev->region[vdev->num_regions].subtype = subtype;
+   vdev->region[vdev->num_regions].ops = ops;
+   vdev->region[vdev->num_regions].size = size;
+   vdev->region[vdev->num_regions].flags = flags;
+   vdev->region[vdev->num_regions].data = data;
+
+   vdev->num_regions++;
+
+   return 0;
+}
+
 static long vfio_pci_ioctl(void *device_data,
   unsigned int cmd, unsigned long arg)
 {
@@ -485,7 +537,7 @@ static long vfio_pci_ioctl(void *device_data,
if (vdev->reset_works)
info.flags |= VFIO_DEVICE_FLAGS_RESET;
 
-   info.num_regions = VFIO_PCI_NUM_REGIONS;
+   info.num_regions = VFIO_PCI_NUM_REGIONS + vdev->num_regions;
info.num_irqs = VFIO_PCI_NUM_IRQS;
 
return copy_to_user((void __user *)arg, &info, minsz);
@@ -494,7 +546,7 @@ static long vfio_pci_ioctl(void *device_data,
struct pci_dev *pdev = vdev->pdev;
struct vfio_region_info info;
struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
-   int ret;
+   int i, ret;
 
minsz = offsetofend(struct vfio_region_info, offset);
 
@@ -568,7 +620,21 @@ static long vfio_pci_ioctl(void *device_data,
 
break;
default:
-   return -EINVAL;
+   if (info.index >=
+   VFIO_PCI_NUM_REGIONS + vdev->num_regions)
+   return -EINVAL;
+
+   i = info.index - VFIO_PCI_NUM_REGIONS;
+
+   info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index);
+   info.size = vdev->region[i].size;
+   info.flags = vdev->region[i].flags;
+
+   ret = region_type_cap(vdev, &caps,
+ vdev->region[i].type,
+ vdev->region[i].subtype);
+

[PATCH v2 10/11] vfio/pci: Hide stolen memory from the user

2016-02-12 Thread Alex Williamson

We do not provide a way for the user to access the host stolen memory
therefore hide both the base address and the size.  This is mostly
useful for VM users where the driver may try to make use of the memory
referenced by these registers, whether the VM considers it reserved
for the GPU or not.

Unfortunately the format of the GMCH_CTRL register is not constant
between chip revisions, so we're stuck with ongoing maintenance to try
to track the latest device IDs.  Perhaps if the register is stable now
we should blacklist pre-SandyBridge devices, identify the old format
for Gen 6 and 7 devices, and assume anything else uses Gen 8 format.
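
To make the per-generation masking concrete, here is a small standalone
sketch of clearing the stolen memory size field from a GMCH_CTRL value.
The GEN6_/GEN8_ shift and mask values are assumptions modelled on the
SNB_GMCH_GMS_* and BDW_GMCH_GMS_* definitions the patch references, and
hide_stolen() is a hypothetical helper, not a function from the patch.

```c
#include <stdint.h>

/* Assumed field layouts; the patch takes the real ones from i915 headers. */
#define GEN6_GMS_SHIFT 3
#define GEN6_GMS_MASK  0x1f
#define GEN8_GMS_SHIFT 8
#define GEN8_GMS_MASK  0xff

/*
 * Clear the stolen memory size field so the register reads back as
 * "no stolen memory"; every other bit is passed through unchanged.
 * field_mask is the mask already shifted into position.
 */
static uint16_t hide_stolen(uint16_t gmch, uint16_t field_mask)
{
	return gmch & (uint16_t)~field_mask;
}
```

With these assumed layouts, the same raw value is filtered differently
depending on the generation, which is why the patch needs the device ID
table at all.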

Signed-off-by: Alex Williamson 
---
 drivers/vfio/pci/vfio_pci_igd.c |   80 +++
 1 file changed, 80 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci_igd.c b/drivers/vfio/pci/vfio_pci_igd.c
index 6394b16..03d916e 100644
--- a/drivers/vfio/pci/vfio_pci_igd.c
+++ b/drivers/vfio/pci/vfio_pci_igd.c
@@ -18,11 +18,15 @@
 #include 
 #include 
 
+#include 
+#include 
+
 #include "vfio_pci_private.h"
 
 #define OPREGION_SIGNATURE "IntelGraphicsMem"
 #define OPREGION_SIZE  (8 * 1024)
 #define OPREGION_PCI_ADDR  0xfc
+#define BDSM_PCI_ADDR  0x5c /* Base Data Stolen Memory */
 
 static size_t vfio_pci_igd_rw(struct vfio_pci_device *vdev, char __user *buf,
  size_t count, loff_t *ppos, bool iswrite)
@@ -264,8 +268,64 @@ static int vfio_pci_igd_cfg_init(struct vfio_pci_device 
*vdev)
return 0;
 }
 
+struct vfio_pci_igd_info {
+   u16 gmch_gsm_mask;
+};
+
+static const struct vfio_pci_igd_info igd_gen6 = {
+   .gmch_gsm_mask = SNB_GMCH_GMS_MASK << SNB_GMCH_GMS_SHIFT,
+};
+
+static const struct vfio_pci_igd_info igd_gen8 = {
+   .gmch_gsm_mask = BDW_GMCH_GMS_MASK << BDW_GMCH_GMS_SHIFT,
+};
+
+static const struct pci_device_id vfio_pci_igd_ids[] = {
+   /* Gen6 - SandyBridge */
+   INTEL_SNB_D_IDS(&igd_gen6),
+   INTEL_SNB_M_IDS(&igd_gen6),
+   /* Gen7 - IvyBridge, ValleyView, Haswell */
+   INTEL_IVB_D_IDS(&igd_gen6),
+   INTEL_IVB_M_IDS(&igd_gen6),
+   INTEL_IVB_Q_IDS(&igd_gen6),
+   INTEL_VLV_M_IDS(&igd_gen6),
+   INTEL_VLV_D_IDS(&igd_gen6),
+   INTEL_HSW_D_IDS(&igd_gen6),
+   INTEL_HSW_M_IDS(&igd_gen6),
+   /* Gen8 - BroadWell, CherryView */
+   INTEL_BDW_GT12D_IDS(&igd_gen8),
+   INTEL_BDW_GT12M_IDS(&igd_gen8),
+   INTEL_BDW_GT3D_IDS(&igd_gen8),
+   INTEL_BDW_GT3M_IDS(&igd_gen8),
+   INTEL_CHV_IDS(&igd_gen8),
+   /* Gen9 - SkyLake, Broxton, KabyLake */
+   INTEL_SKL_GT1_IDS(&igd_gen8),
+   INTEL_SKL_GT2_IDS(&igd_gen8),
+   INTEL_SKL_GT3_IDS(&igd_gen8),
+   INTEL_SKL_GT4_IDS(&igd_gen8),
+   INTEL_BXT_IDS(&igd_gen8),
+   INTEL_KBL_GT1_IDS(&igd_gen8),
+   INTEL_KBL_GT2_IDS(&igd_gen8),
+   INTEL_KBL_GT3_IDS(&igd_gen8),
+   INTEL_KBL_GT4_IDS(&igd_gen8),
+   { 0 }
+};
+
+static struct vfio_pci_igd_info *vfio_pci_igd_info(struct pci_dev *pdev)
+{
+   const struct pci_device_id *id;
+
+   id = pci_match_id(vfio_pci_igd_ids, pdev);
+   if (!id)
+   return NULL;
+
+   return (struct vfio_pci_igd_info *)id->driver_data;
+}
+
 int vfio_pci_igd_init(struct vfio_pci_device *vdev)
 {
+   struct vfio_pci_igd_info *info;
+   u16 gmch;
int ret;
 
ret = vfio_pci_igd_opregion_init(vdev);
@@ -276,5 +336,25 @@ int vfio_pci_igd_init(struct vfio_pci_device *vdev)
if (ret)
return ret;
 
+   memset(vdev->vconfig + BDSM_PCI_ADDR, 0, 4);
+   memset(vdev->pci_config_map + BDSM_PCI_ADDR,
+  PCI_CAP_ID_INVALID_VIRT, 4);
+
+   info = vfio_pci_igd_info(vdev->pdev);
+   if (!info) {
+   dev_warn(&vdev->pdev->dev,
+"Unknown/Unsupported Intel IGD device\n");
+   return 0;
+   }
+
+   ret = pci_read_config_word(vdev->pdev, SNB_GMCH_CTRL, &gmch);
+   if (ret)
+   return ret;
+
+   gmch &= ~info->gmch_gsm_mask;
+   *(__le16 *)(vdev->vconfig + SNB_GMCH_CTRL) = cpu_to_le16(gmch);
+   memset(vdev->pci_config_map + SNB_GMCH_CTRL,
+  PCI_CAP_ID_INVALID_VIRT, 2);
+
return 0;
 }

[PATCH v2 08/11] vfio/pci: Intel IGD OpRegion support

2016-02-12 Thread Alex Williamson

This is the first consumer of vfio device specific resource support,
providing read-only access to the OpRegion for Intel graphics devices.
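
The read-only access pattern can be sketched in userspace as follows.
This is an illustration under assumptions, not the driver code:
region_read() is a hypothetical name, memcpy() stands in for
copy_to_user(), and -1 stands in for the kernel's -EINVAL.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy up to count bytes out of a read-only region starting at *pos.
 * Writes and reads that start at or past the end are rejected; reads
 * that would run past the end are shortened.  Returns the number of
 * bytes copied, or -1 on error, and advances *pos on success.
 */
static long region_read(const void *base, size_t region_size,
			char *buf, size_t count, size_t *pos, int iswrite)
{
	if (iswrite || *pos >= region_size)
		return -1;

	if (count > region_size - *pos)
		count = region_size - *pos; /* clamp to the region end */

	memcpy(buf, (const char *)base + *pos, count);
	*pos += count;

	return (long)count;
}
```

Clamping rather than failing lets a caller read the region in fixed
size chunks without knowing its exact length in advance.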

Signed-off-by: Alex Williamson 
---
 drivers/vfio/pci/Kconfig|4 +
 drivers/vfio/pci/Makefile   |1 
 drivers/vfio/pci/vfio_pci.c |7 ++
 drivers/vfio/pci/vfio_pci_igd.c |  111 +++
 drivers/vfio/pci/vfio_pci_private.h |8 +++
 include/uapi/linux/vfio.h   |5 ++
 6 files changed, 136 insertions(+)
 create mode 100644 drivers/vfio/pci/vfio_pci_igd.c

diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
index 02912f1..24ee260 100644
--- a/drivers/vfio/pci/Kconfig
+++ b/drivers/vfio/pci/Kconfig
@@ -26,3 +26,7 @@ config VFIO_PCI_MMAP
 config VFIO_PCI_INTX
depends on VFIO_PCI
def_bool y if !S390
+
+config VFIO_PCI_IGD
+   depends on VFIO_PCI
+   def_bool y if X86
diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
index 1310792..76d8ec0 100644
--- a/drivers/vfio/pci/Makefile
+++ b/drivers/vfio/pci/Makefile
@@ -1,4 +1,5 @@
 
 vfio-pci-y := vfio_pci.o vfio_pci_intrs.o vfio_pci_rdwr.o vfio_pci_config.o
+vfio-pci-$(CONFIG_VFIO_PCI_IGD) += vfio_pci_igd.o
 
 obj-$(CONFIG_VFIO_PCI) += vfio-pci.o
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 813a2e6..cb2624d 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -169,6 +169,13 @@ static int vfio_pci_enable(struct vfio_pci_device *vdev)
if (!vfio_vga_disabled() && vfio_pci_is_vga(pdev))
vdev->has_vga = true;
 
+
+   if (vfio_pci_is_vga(pdev) && pdev->vendor == PCI_VENDOR_ID_INTEL) {
+   if (vfio_pci_igd_opregion_init(vdev) == 0)
+   dev_info(&pdev->dev,
+"Intel IGD OpRegion support enabled\n");
+   }
+
return 0;
 }
 
diff --git a/drivers/vfio/pci/vfio_pci_igd.c b/drivers/vfio/pci/vfio_pci_igd.c
new file mode 100644
index 000..3b6a6f7
--- /dev/null
+++ b/drivers/vfio/pci/vfio_pci_igd.c
@@ -0,0 +1,111 @@
+/*
+ * VFIO PCI Intel Graphics support
+ *
+ * Copyright (C) 2016 Red Hat, Inc.  All rights reserved.
+ * Author: Alex Williamson 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Register a device specific region through which to provide read-only
+ * access to the Intel IGD opregion.  The register defining the opregion
+ * address is also virtualized to prevent user modification.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include "vfio_pci_private.h"
+
+#define OPREGION_SIGNATURE "IntelGraphicsMem"
+#define OPREGION_SIZE  (8 * 1024)
+#define OPREGION_PCI_ADDR  0xfc
+
+static size_t vfio_pci_igd_rw(struct vfio_pci_device *vdev, char __user *buf,
+ size_t count, loff_t *ppos, bool iswrite)
+{
+   unsigned int i = VFIO_PCI_OFFSET_TO_INDEX(*ppos) - VFIO_PCI_NUM_REGIONS;
+   void *base = vdev->region[i].data;
+   loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
+
+   if (pos >= vdev->region[i].size || iswrite)
+   return -EINVAL;
+
+   count = min(count, (size_t)(vdev->region[i].size - pos));
+
+   if (copy_to_user(buf, base + pos, count))
+   return -EFAULT;
+
+   *ppos += count;
+
+   return count;
+}
+
+static void vfio_pci_igd_release(struct vfio_pci_device *vdev,
+struct vfio_pci_region *region)
+{
+   memunmap(region->data);
+}
+
+static const struct vfio_pci_regops vfio_pci_igd_regops = {
+   .rw = vfio_pci_igd_rw,
+   .release= vfio_pci_igd_release,
+};
+
+int vfio_pci_igd_opregion_init(struct vfio_pci_device *vdev)
+{
+   __le32 *dwordp = (__le32 *)(vdev->vconfig + OPREGION_PCI_ADDR);
+   u32 addr, size;
+   void *base;
+   int ret;
+
+   ret = pci_read_config_dword(vdev->pdev, OPREGION_PCI_ADDR, &addr);
+   if (ret)
+   return ret;
+
+   if (!addr || !(~addr))
+   return -ENODEV;
+
+   base = memremap(addr, OPREGION_SIZE, MEMREMAP_WB);
+   if (!base)
+   return -ENOMEM;
+
+   if (memcmp(base, OPREGION_SIGNATURE, 16)) {
+   memunmap(base);
+   return -EINVAL;
+   }
+
+   size = le32_to_cpu(*(__le32 *)(base + 16));
+   if (!size) {
+   memunmap(base);
+   return -EINVAL;
+   }
+
+   size *= 1024; /* In KB */
+
+   if (size != OPREGION_SIZE) {
+   memunmap(base);
+   base = memremap(addr, size, MEMREMAP_WB);
+   if (!base)
+   return -ENOMEM;
+   }
+
+   ret = vfio_pci_register_dev_region(vdev,
+   PCI_VENDOR_ID_INTEL | VFIO_REGION_TYPE_PCI_VENDOR_TYPE,
+

[PATCH v2 11/11] vfio/pci: Expose shadow ROM as PCI option ROM

2016-02-12 Thread Alex Williamson

Integrated graphics may have their ROM shadowed at 0xc rather than
implement a PCI option ROM.  Make this ROM appear to the user using
the ROM BAR.
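
For readers unfamiliar with how a ROM BAR advertises its size, the mask
arithmetic used in the config fixup can be sketched as below.  The
helper names are hypothetical, ROM_ADDRESS_ENABLE mirrors bit 0 of the
ROM BAR (PCI_ROM_ADDRESS_ENABLE), and the 64 KiB size in the usage note
is purely illustrative rather than the shadow ROM size from the patch.

```c
#include <stdint.h>

#define ROM_ADDRESS_ENABLE 0x01u /* bit 0 of the expansion ROM BAR */

/*
 * For a power-of-two size, ~(size - 1) keeps only the address bits a
 * device of that size would decode; the enable bit stays writable.
 */
static uint32_t rom_bar_mask(uint32_t size)
{
	return ~(size - 1) | ROM_ADDRESS_ENABLE;
}

/* Apply the mask to a virtual BAR value; a zero size means no ROM. */
static uint32_t rom_bar_fixup(uint32_t bar, uint32_t size)
{
	if (!size)
		return 0;
	return bar & rom_bar_mask(size);
}
```

A guest sizing the BAR writes all ones and reads the value back; with
an illustrative 64 KiB ROM, rom_bar_fixup(0xffffffff, 0x10000) leaves
0xffff0001, from which the guest derives the size.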

Signed-off-by: Alex Williamson 
---
 drivers/vfio/pci/vfio_pci.c|   10 --
 drivers/vfio/pci/vfio_pci_config.c |   11 ---
 drivers/vfio/pci/vfio_pci_rdwr.c   |9 ++---
 3 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 74a3752..1ce1d36 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -609,8 +609,14 @@ static long vfio_pci_ioctl(void *device_data,
 
/* Report the BAR size, not the ROM size */
info.size = pci_resource_len(pdev, info.index);
-   if (!info.size)
-   break;
+   if (!info.size) {
+   /* Shadow ROMs appear as PCI option ROMs */
+   if (pdev->resource[PCI_ROM_RESOURCE].flags &
+   IORESOURCE_ROM_SHADOW)
+   info.size = 0x2;
+   else
+   break;
+   }
 
/* Is it really there? */
io = pci_map_rom(pdev, &size);
diff --git a/drivers/vfio/pci/vfio_pci_config.c 
b/drivers/vfio/pci/vfio_pci_config.c
index 88dc646..142c533 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -475,14 +475,19 @@ static void vfio_bar_fixup(struct vfio_pci_device *vdev)
bar = (__le32 *)&vdev->vconfig[PCI_ROM_ADDRESS];
 
/*
-* NB. we expose the actual BAR size here, regardless of whether
-* we can read it.  When we report the REGION_INFO for the ROM
-* we report what PCI tells us is the actual ROM size.
+* NB. REGION_INFO will have reported zero size if we weren't able
+* to read the ROM, but we still return the actual BAR size here if
+* it exists (or the shadow ROM space).
 */
if (pci_resource_start(pdev, PCI_ROM_RESOURCE)) {
mask = ~(pci_resource_len(pdev, PCI_ROM_RESOURCE) - 1);
mask |= PCI_ROM_ADDRESS_ENABLE;
*bar &= cpu_to_le32((u32)mask);
+   } else if (pdev->resource[PCI_ROM_RESOURCE].flags &
+   IORESOURCE_ROM_SHADOW) {
+   mask = ~(0x2 - 1);
+   mask |= PCI_ROM_ADDRESS_ENABLE;
+   *bar &= cpu_to_le32((u32)mask);
} else
*bar = 0;
 
diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c
index 210db24..5ffd1d9 100644
--- a/drivers/vfio/pci/vfio_pci_rdwr.c
+++ b/drivers/vfio/pci/vfio_pci_rdwr.c
@@ -124,11 +124,14 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_device *vdev, 
char __user *buf,
void __iomem *io;
ssize_t done;
 
-   if (!pci_resource_start(pdev, bar))
+   if (pci_resource_start(pdev, bar))
+   end = pci_resource_len(pdev, bar);
+   else if (bar == PCI_ROM_RESOURCE &&
+pdev->resource[bar].flags & IORESOURCE_ROM_SHADOW)
+   end = 0x2;
+   else
return -EINVAL;
 
-   end = pci_resource_len(pdev, bar);
-
if (pos >= end)
return -EINVAL;

[PATCH v2 07/11] vfio/pci: Enable virtual register in PCI config space

2016-02-12 Thread Alex Williamson

Typically config space for a device is mapped out into capability
specific handlers and unassigned space.  The latter allows direct
read/write access to config space.  Sometimes we know about registers
living in this void space and would like an easy way to virtualize
them, similar to how BAR registers are managed.  To do this, create
one more pseudo (fake) PCI capability to be handled as purely virtual
space.  Reads and writes are serviced entirely from virtual config
space.
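
The idea of a per-byte map selecting between raw and purely virtual
access can be modelled in a few lines of userspace C.  This is a
simplified sketch, not the driver: struct cfg and the cfg_* helpers are
hypothetical, the hw array stands in for physical config space, and the
handler is chosen from the first byte only, assuming an access never
straddles two differently mapped ranges.

```c
#include <stdint.h>
#include <string.h>

#define CFG_SIZE 256
#define ID_RAW   0xFF /* mirrors PCI_CAP_ID_INVALID: raw access */
#define ID_VIRT  0xFE /* mirrors PCI_CAP_ID_INVALID_VIRT: virtual access */

struct cfg {
	uint8_t hw[CFG_SIZE];      /* stand-in for physical config space */
	uint8_t vconfig[CFG_SIZE]; /* virtual copy shown to the user */
	uint8_t map[CFG_SIZE];     /* per-byte owner id */
};

/* Mark a byte range as purely virtual. */
static void cfg_virtualize(struct cfg *c, int pos, int count)
{
	memset(c->map + pos, ID_VIRT, count);
}

/* Reads of virtual bytes never touch the hardware copy. */
static int cfg_read(const struct cfg *c, int pos, int count, void *val)
{
	if (pos < 0 || count < 0 || pos + count > CFG_SIZE)
		return -1;
	if (c->map[pos] == ID_VIRT)
		memcpy(val, c->vconfig + pos, count);
	else
		memcpy(val, c->hw + pos, count);
	return count;
}

/* Writes to virtual bytes land in vconfig only. */
static int cfg_write(struct cfg *c, int pos, int count, const void *val)
{
	if (pos < 0 || count < 0 || pos + count > CFG_SIZE)
		return -1;
	if (c->map[pos] == ID_VIRT)
		memcpy(c->vconfig + pos, val, count);
	else
		memcpy(c->hw + pos, val, count);
	return count;
}
```

Marking a register virtual and zeroing its vconfig bytes is enough to
hide the hardware value from the user, which is how later patches in
this series use the new capability ID.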

Signed-off-by: Alex Williamson 
---
 drivers/vfio/pci/vfio_pci_config.c  |   34 ++
 drivers/vfio/pci/vfio_pci_private.h |4 
 2 files changed, 34 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_config.c 
b/drivers/vfio/pci/vfio_pci_config.c
index fe2b470..88dc646 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -33,9 +33,8 @@
 
 #define PCI_CFG_SPACE_SIZE 256
 
-/* Useful "pseudo" capabilities */
+/* Fake capability ID for standard config space */
 #define PCI_CAP_ID_BASIC   0
-#define PCI_CAP_ID_INVALID 0xFF
 
 #define is_bar(offset) \
((offset >= PCI_BASE_ADDRESS_0 && offset < PCI_BASE_ADDRESS_5 + 4) || \
@@ -301,6 +300,23 @@ static int vfio_raw_config_read(struct vfio_pci_device 
*vdev, int pos,
return count;
 }
 
+/* Virt access uses only virtualization */
+static int vfio_virt_config_write(struct vfio_pci_device *vdev, int pos,
+ int count, struct perm_bits *perm,
+ int offset, __le32 val)
+{
+   memcpy(vdev->vconfig + pos, &val, count);
+   return count;
+}
+
+static int vfio_virt_config_read(struct vfio_pci_device *vdev, int pos,
+int count, struct perm_bits *perm,
+int offset, __le32 *val)
+{
+   memcpy(val, vdev->vconfig + pos, count);
+   return count;
+}
+
 /* Default capability regions to read-only, no-virtualization */
 static struct perm_bits cap_perms[PCI_CAP_ID_MAX + 1] = {
[0 ... PCI_CAP_ID_MAX] = { .readfn = vfio_direct_config_read }
@@ -319,6 +335,11 @@ static struct perm_bits unassigned_perms = {
.writefn = vfio_raw_config_write
 };
 
+static struct perm_bits virt_perms = {
+   .readfn = vfio_virt_config_read,
+   .writefn = vfio_virt_config_write
+};
+
 static void free_perm_bits(struct perm_bits *perm)
 {
kfree(perm->virt);
@@ -1332,6 +1353,8 @@ static int vfio_cap_init(struct vfio_pci_device *vdev)
pos + i, map[pos + i], cap);
}
 
+   BUILD_BUG_ON(PCI_CAP_ID_MAX >= PCI_CAP_ID_INVALID_VIRT);
+
memset(map + pos, cap, len);
ret = vfio_fill_vconfig_bytes(vdev, pos, len);
if (ret)
@@ -1419,9 +1442,9 @@ static int vfio_ecap_init(struct vfio_pci_device *vdev)
/*
 * Even though ecap is 2 bytes, we're currently a long way
 * from exceeding 1 byte capabilities.  If we ever make it
-* up to 0xFF we'll need to up this to a two-byte, byte map.
+* up to 0xFE we'll need to up this to a two-byte, byte map.
 */
-   BUILD_BUG_ON(PCI_EXT_CAP_ID_MAX >= PCI_CAP_ID_INVALID);
+   BUILD_BUG_ON(PCI_EXT_CAP_ID_MAX >= PCI_CAP_ID_INVALID_VIRT);
 
memset(map + epos, ecap, len);
ret = vfio_fill_vconfig_bytes(vdev, epos, len);
@@ -1597,6 +1620,9 @@ static ssize_t vfio_config_do_rw(struct vfio_pci_device 
*vdev, char __user *buf,
if (cap_id == PCI_CAP_ID_INVALID) {
perm = &unassigned_perms;
cap_start = *ppos;
+   } else if (cap_id == PCI_CAP_ID_INVALID_VIRT) {
+   perm = &virt_perms;
+   cap_start = *ppos;
} else {
if (*ppos >= PCI_CFG_SPACE_SIZE) {
WARN_ON(cap_id > PCI_EXT_CAP_ID_MAX);
diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
index 0710bda..b1e4032 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -25,6 +25,10 @@
 #define VFIO_PCI_INDEX_TO_OFFSET(index)	((u64)(index) << VFIO_PCI_OFFSET_SHIFT)
 #define VFIO_PCI_OFFSET_MASK   (((u64)(1) << VFIO_PCI_OFFSET_SHIFT) - 1)
 
+/* Special capability IDs predefined access */
+#define PCI_CAP_ID_INVALID		0xFF	/* default raw access */
+#define PCI_CAP_ID_INVALID_VIRT	0xFE	/* default virt access */
+
 struct vfio_pci_irq_ctx {
struct eventfd_ctx  *trigger;
struct virqfd   *unmask;

[PATCH v2 09/11] vfio/pci: Intel IGD host and LPC bridge config space access

2016-02-12 Thread Alex Williamson

Provide read-only access to PCI config space of the PCI host bridge
and LPC bridge through device specific regions.  This may be used to
configure a VM with matching register contents to satisfy driver
requirements.  Providing this through the vfio file descriptor removes
an additional userspace requirement for access through pci-sysfs and
removes the CAP_SYS_ADMIN requirement that doesn't appear to apply to
the specific devices we're accessing.
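
For reference, the read handler added in this patch breaks an arbitrary
(pos, count) request into naturally aligned byte, word and dword config
reads.  Below is a minimal userspace sketch of just that splitting,
assuming nothing beyond the ordering visible in the diff.  struct
cfg_access and plan_cfg_accesses() are hypothetical names used for
illustration, not kernel interfaces.

```c
#include <assert.h>
#include <stddef.h>

/* One naturally aligned config-space access of 1, 2 or 4 bytes. */
struct cfg_access {
	unsigned int pos;
	unsigned int width;
};

/*
 * Split the byte range [pos, pos + size) into aligned accesses.
 * Hypothetical helper: it mirrors the byte/word/dword ordering of the
 * read loop in the patch, but performs no hardware access itself.
 * Returns the number of entries written to 'out' (at most 'max').
 */
static size_t plan_cfg_accesses(unsigned int pos, size_t size,
				struct cfg_access *out, size_t max)
{
	size_t n = 0;

	assert(out != NULL);

	/* A leading byte brings an odd offset up to word alignment. */
	if ((pos & 1) && size && n < max) {
		out[n++] = (struct cfg_access){ pos, 1 };
		pos += 1;
		size -= 1;
	}

	/* A leading word brings the offset up to dword alignment. */
	if ((pos & 3) && size > 2 && n < max) {
		out[n++] = (struct cfg_access){ pos, 2 };
		pos += 2;
		size -= 2;
	}

	/* Bulk of the transfer as aligned dwords. */
	while (size > 3 && n < max) {
		out[n++] = (struct cfg_access){ pos, 4 };
		pos += 4;
		size -= 4;
	}

	/* Trailing word, then trailing byte. */
	while (size >= 2 && n < max) {
		out[n++] = (struct cfg_access){ pos, 2 };
		pos += 2;
		size -= 2;
	}
	while (size && n < max) {
		out[n++] = (struct cfg_access){ pos, 1 };
		pos += 1;
		size -= 1;
	}

	return n;
}
```

For a read of 8 bytes at offset 1 this plans a byte at 1, a word at 2,
a dword at 4 and a trailing byte at 8.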

Signed-off-by: Alex Williamson 
---
 drivers/vfio/pci/vfio_pci.c |   15 ++-
 drivers/vfio/pci/vfio_pci_igd.c |  171 +++
 drivers/vfio/pci/vfio_pci_private.h |4 -
 include/uapi/linux/vfio.h   |3 +
 4 files changed, 186 insertions(+), 7 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index cb2624d..74a3752 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -111,6 +111,7 @@ static inline bool vfio_pci_is_vga(struct pci_dev *pdev)
 }
 
 static void vfio_pci_try_bus_reset(struct vfio_pci_device *vdev);
+static void vfio_pci_disable(struct vfio_pci_device *vdev);
 
 static int vfio_pci_enable(struct vfio_pci_device *vdev)
 {
@@ -170,10 +171,16 @@ static int vfio_pci_enable(struct vfio_pci_device *vdev)
vdev->has_vga = true;
 
 
-   if (vfio_pci_is_vga(pdev) && pdev->vendor == PCI_VENDOR_ID_INTEL) {
-   if (vfio_pci_igd_opregion_init(vdev) == 0)
-   dev_info(&pdev->dev,
-"Intel IGD OpRegion support enabled\n");
+   if (vfio_pci_is_vga(pdev) &&
+   pdev->vendor == PCI_VENDOR_ID_INTEL &&
+   IS_ENABLED(CONFIG_VFIO_PCI_IGD)) {
+   ret = vfio_pci_igd_init(vdev);
+   if (ret) {
+   dev_warn(&vdev->pdev->dev,
+"Failed to setup Intel IGD regions\n");
+   vfio_pci_disable(vdev);
+   return ret;
+   }
}
 
return 0;
diff --git a/drivers/vfio/pci/vfio_pci_igd.c b/drivers/vfio/pci/vfio_pci_igd.c
index 3b6a6f7..6394b16 100644
--- a/drivers/vfio/pci/vfio_pci_igd.c
+++ b/drivers/vfio/pci/vfio_pci_igd.c
@@ -55,7 +55,7 @@ static const struct vfio_pci_regops vfio_pci_igd_regops = {
.release= vfio_pci_igd_release,
 };
 
-int vfio_pci_igd_opregion_init(struct vfio_pci_device *vdev)
+static int vfio_pci_igd_opregion_init(struct vfio_pci_device *vdev)
 {
__le32 *dwordp = (__le32 *)(vdev->vconfig + OPREGION_PCI_ADDR);
u32 addr, size;
@@ -109,3 +109,172 @@ int vfio_pci_igd_opregion_init(struct vfio_pci_device *vdev)
 
return ret;
 }
+
+static size_t vfio_pci_igd_cfg_rw(struct vfio_pci_device *vdev,
+ char __user *buf, size_t count, loff_t *ppos,
+ bool iswrite)
+{
+   unsigned int i = VFIO_PCI_OFFSET_TO_INDEX(*ppos) - VFIO_PCI_NUM_REGIONS;
+   struct pci_dev *pdev = vdev->region[i].data;
+   loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK;
+   size_t size;
+   int ret;
+
+   if (pos >= vdev->region[i].size || iswrite)
+   return -EINVAL;
+
+   size = count = min(count, (size_t)(vdev->region[i].size - pos));
+
+   if ((pos & 1) && size) {
+   u8 val;
+
+   ret = pci_user_read_config_byte(pdev, pos, &val);
+   if (ret)
+   return pcibios_err_to_errno(ret);
+
+   if (copy_to_user(buf + count - size, &val, 1))
+   return -EFAULT;
+
+   pos++;
+   size--;
+   }
+
+   if ((pos & 3) && size > 2) {
+   u16 val;
+
+   ret = pci_user_read_config_word(pdev, pos, &val);
+   if (ret)
+   return pcibios_err_to_errno(ret);
+
+   val = cpu_to_le16(val);
+   if (copy_to_user(buf + count - size, &val, 2))
+   return -EFAULT;
+
+   pos += 2;
+   size -= 2;
+   }
+
+   while (size > 3) {
+   u32 val;
+
+   ret = pci_user_read_config_dword(pdev, pos, &val);
+   if (ret)
+   return pcibios_err_to_errno(ret);
+
+   val = cpu_to_le32(val);
+   if (copy_to_user(buf + count - size, &val, 4))
+   return -EFAULT;
+
+   pos += 4;
+   size -= 4;
+   }
+
+   while (size >= 2) {
+   u16 val;
+
+   ret = pci_user_read_config_word(pdev, pos, &val);
+   if (ret)
+   return pcibios_err_to_errno(ret);
+
+   val = cpu_to_le16(val);
+   if (copy_to_user(buf + count - size, &val, 2))
+   return -EFAULT;
+
+   pos += 2;
+   size -= 2;
+   }
+
+   while (size) {
+   u8 val;
+
+   ret = pci_user_read_config_byte(pdev, pos, &val);
+   if

[PATCH v2 05/11] vfio: Define device specific region type capability

2016-02-12 Thread Alex Williamson

To this point vfio has only provided an interface to the user that
allows them to determine the number of regions and specifics about
each region.  What the region represents is left to the vfio bus
driver.  vfio-pci chooses to use fixed indexes for fixed resources,
index 0 is BAR0, 1 is BAR1,... 7 is config space, etc.  This works
pretty well since all PCI devices have these regions, even if they
don't necessarily populate all of them.  Then we start to add things
like VGA, which only certain devices even support.  We added this the
same way, but now we've wasted a region index, and due to our offset
implementation the corresponding address space, for all devices.

Rather than continuing that process, let's try to make regions self
describing by including a capability that defines their type.  For
vfio-pci we'll make the current VFIO_PCI_NUM_REGIONS fixed, defining
the end of the static indexes and the beginning of self describing
regions.
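
To illustrate why a fixed index costs address space, here is a small
sketch of how a region index maps onto the device file descriptor
offset.  The 40-bit shift is an assumption on my part (the value of
VFIO_PCI_OFFSET_SHIFT is not shown in this thread), and the helper
names are made up for the example; only the "index >= 9 is device
specific" rule comes from the patch below.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed width of the per-region offset field; not taken from this thread. */
#define OFFSET_SHIFT		40
#define OFFSET_MASK		((UINT64_C(1) << OFFSET_SHIFT) - 1)

/* Matches VFIO_PCI_NUM_REGIONS = 9 as fixed by this patch. */
#define NUM_FIXED_REGIONS	9

/* Each region index owns one 2^OFFSET_SHIFT byte window of the fd. */
static inline uint64_t index_to_offset(unsigned int index)
{
	/* Keep the shifted value within 64 bits. */
	assert(index < (1u << (64 - OFFSET_SHIFT)));
	return (uint64_t)index << OFFSET_SHIFT;
}

/* Recover the region index from a file offset. */
static inline unsigned int offset_to_index(uint64_t offset)
{
	return (unsigned int)(offset >> OFFSET_SHIFT);
}

/* Indexes past the static set are described by the type capability. */
static inline int is_device_specific(unsigned int index)
{
	return index >= NUM_FIXED_REGIONS;
}
```

Under that assumption every static index, used or not, reserves a full
window of offsets, which is the waste the description above refers to.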

Signed-off-by: Alex Williamson 
---
 include/uapi/linux/vfio.h |   31 ++-
 1 file changed, 30 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index fde7b1e..1c37a0e 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -252,6 +252,34 @@ struct vfio_region_info_cap_sparse_mmap {
struct vfio_region_sparse_mmap_area areas[];
 };
 
+/*
+ * The device specific type capability allows regions unique to a specific
+ * device or class of devices to be exposed.  This helps solve the problem for
+ * vfio bus drivers of defining which region indexes correspond to which region
+ * on the device, without needing to resort to static indexes, as done by
+ * vfio-pci.  For instance, if we were to go back in time, we might remove
+ * VFIO_PCI_VGA_REGION_INDEX and let vfio-pci simply define that all indexes
+ * greater than or equal to VFIO_PCI_NUM_REGIONS are device specific and we'd
+ * make a "VGA" device specific type to describe the VGA access space.  This
+ * means that non-VGA devices wouldn't need to waste this index, and thus the
+ * address space associated with it due to implementation of device file
+ * descriptor offsets in vfio-pci.
+ *
+ * The current implementation is now part of the user ABI, so we can't use this
+ * for VGA, but there are other upcoming use cases, such as opregions for Intel
+ * IGD devices and framebuffers for vGPU devices.  We missed VGA, but we'll
+ * use this for future additions.
+ *
+ * The structure below defines version 1 of this capability.
+ */
+#define VFIO_REGION_INFO_CAP_TYPE  2
+
+struct vfio_region_info_cap_type {
+   struct vfio_info_cap_header header;
+   __u32 type; /* global per bus driver */
+   __u32 subtype;  /* type specific */
+};
+
 /**
  * VFIO_DEVICE_GET_IRQ_INFO - _IOWR(VFIO_TYPE, VFIO_BASE + 9,
  * struct vfio_irq_info)
@@ -387,7 +415,8 @@ enum {
 * between described ranges are unimplemented.
 */
VFIO_PCI_VGA_REGION_INDEX,
-   VFIO_PCI_NUM_REGIONS
+   VFIO_PCI_NUM_REGIONS = 9 /* Fixed user ABI, region indexes >=9 use */
+/* device specific cap to define content. */
 };
 
 enum {

[PATCH v2 00/11] vfio: capability chains, sparse mmaps, device specific regions, IGD support

2016-02-12 Thread Alex Williamson

v2:

v2 includes more IGD support.  Read only access to the host bridge
and LPC bridge config space is provided to allow configuration of
emulated devices for VM use cases.  We also try to hide the stolen
memory window from the user.  This probably needs additional work as
I'd prefer not to need to update the code for every new graphics chip.
Perhaps we should instead blacklist anything pre-SandyBridge, identify
SandyBridge specifically, and hope that anything unknown after that
uses the gen8+ register layout.  Additionally since IGD doesn't have
a real ROM BAR, but does typically have its vBIOS in shadow ROM space,
expose that as if it were a ROM BAR.  This series should be used in
conjunction with:

Subject: [PATCH] pci: Wait for up to an additional 1000ms after FLR reset

for use on laptops.

v1:

We have a number of cases where we want to extend the vfio API to
provide further details in the vfio INFO ioctls.  For instance we take
it as implicit that we can't mmap over MSI-X vector tables of a BAR,
but we'd prefer to have the API define that explicitly as a sparse
mmap capable region.  We have some devices that need additional
regions, but we don't want to "burn" a region index for something
specific to a single device.  We also have the ongoing problem of
describing valid IOVA ranges for an IOMMU.  This series doesn't solve
every case of those problems, but it solves some and gives us the vfio
level API to solve the others.

To do this we use capability chains, much like they're used in PCI.
A flag bit in the INFO ioctl structure tells us whether a capability
chain is present and new fields are defined to provide the buffer
index of the first capability.  Each capability provides the start
index of the next capability along with an identifier and version of
itself.  The existing argsz field is used to convey to the user the
necessary buffer size to retrieve all of the capabilities.  A few
helpers in the vfio core simplify the mechanics of adding
capabilities for the bus and iommu drivers to make use of.
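
As a rough picture of what the user side of this looks like, the sketch
below walks a chain inside a buffer returned by an INFO ioctl.  The
header layout (id, version, next offset) follows the description above,
but struct cap_header and find_cap() are stand-ins written for this
example, not the uapi definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the capability header: identifier, version, next offset. */
struct cap_header {
	uint16_t id;
	uint16_t version;
	uint32_t next;	/* offset from the start of the buffer, 0 ends the chain */
};

/*
 * Look for capability 'id' in 'buf' (of 'len' bytes), starting at the
 * offset 'first' reported by the ioctl.  Returns the offset of the
 * matching header, or 0 if it is absent or the chain is malformed.
 */
static uint32_t find_cap(const uint8_t *buf, size_t len, uint32_t first,
			 uint16_t id)
{
	uint32_t off = first;
	size_t hops = 0;

	assert(buf != NULL);

	/* Bounding the hop count guards against a looping chain. */
	while (off && hops++ < len / sizeof(struct cap_header)) {
		struct cap_header hdr;

		if (off > len || len - off < sizeof(hdr))
			return 0;	/* offset points outside the buffer */

		/* Copy out rather than cast, to avoid unaligned access. */
		memcpy(&hdr, buf + off, sizeof(hdr));
		if (hdr.id == id)
			return off;

		off = hdr.next;
	}

	return 0;
}
```

If the returned argsz is larger than the buffer that was passed in, the
user would presumably retry with a bigger buffer before walking the
chain.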

The sparse mmap capability solves the problem of regions which can
only be partially mmaped, such as when an MSI-X table is present.
This is also expected to be useful for vGPU support should a device
have a mix of direct access and emulated access within the same
region.

The device specific region capability allows us to easily add new
regions that are device specific.  Included here is the IGD OpRegion,
which is a host memory region exclusively for the configuration and
use of Intel graphics devices, but is not part of the device in the
PCI sense.  There are potentially other regions we can expose on this
device to further facilitate use of it.

I particularly welcome feedback on how we identify device specific
regions.  Here I've used a type and sub-type field where I've defined
one bit of the type field to identify a vendor specific type with a
mask to identify the vendor.  In the OpRegion case here, that defines
an 8086 set of sub-types where I've simply defined sub-type 1 as an
IGD OpRegion.  We could of course get the vendor from the device
itself, but this method might promote code re-use if we eventually
have multiple vendors using regions for the same purpose.  At least
that's my thinking.
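
To make the encoding easier to comment on, here is a sketch of the
type/sub-type scheme as described above.  The flag bit position and the
macro names are illustrative assumptions for this example (the real
definitions live in the patches, not in this message); only the 16-bit
vendor mask idea, vendor 8086 and sub-type 1 for the IGD OpRegion come
from the text.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values only; bit 31 as the vendor flag is an assumption. */
#define REGION_TYPE_VENDOR_FLAG	(UINT32_C(1) << 31)
#define REGION_TYPE_VENDOR_MASK	UINT32_C(0xffff)

/* Sub-type 1 within the 8086 vendor type, per the description above. */
#define REGION_SUBTYPE_IGD_OPREGION	1

/* Build a vendor specific type from a PCI vendor ID. */
static inline uint32_t vendor_region_type(uint16_t vendor)
{
	return REGION_TYPE_VENDOR_FLAG | vendor;
}

/* True if 'type' carries the vendor specific flag. */
static inline int is_vendor_type(uint32_t type)
{
	return (type & REGION_TYPE_VENDOR_FLAG) != 0;
}

/* Extract the vendor ID from a vendor specific type. */
static inline uint16_t type_vendor(uint32_t type)
{
	assert(is_vendor_type(type));
	return (uint16_t)(type & REGION_TYPE_VENDOR_MASK);
}
```

A user would then match on the pair (vendor type for 8086, sub-type 1)
to recognize the OpRegion, independent of which region index it lands
on.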

Appreciate feedback.  Thanks,

Alex

---

Alex Williamson (11):
  vfio: Define capability chains
  vfio: Add capability chain helpers
  vfio: Define sparse mmap capability for regions
  vfio/pci: Include sparse mmap capability for MSI-X table regions
  vfio: Define device specific region type capability
  vfio/pci: Add infrastructure for additional device specific regions
  vfio/pci: Enable virtual register in PCI config space
  vfio/pci: Intel IGD OpRegion support
  vfio/pci: Intel IGD host and LPC bridge config space access
  vfio/pci: Hide stolen memory from the user
  vfio/pci: Expose shadow ROM as PCI option ROM


 drivers/vfio/pci/Kconfig|4 
 drivers/vfio/pci/Makefile   |1 
 drivers/vfio/pci/vfio_pci.c |  176 -
 drivers/vfio/pci/vfio_pci_config.c  |   45 
 drivers/vfio/pci/vfio_pci_igd.c |  360 +++
 drivers/vfio/pci/vfio_pci_private.h |   39 
 drivers/vfio/pci/vfio_pci_rdwr.c|9 +
 drivers/vfio/vfio.c |   54 +
 include/linux/vfio.h|   11 +
 include/uapi/linux/vfio.h   |   92 +
 10 files changed, 772 insertions(+), 19 deletions(-)
 create mode 100644 drivers/vfio/pci/vfio_pci_igd.c

[PATCH v2 04/11] vfio/pci: Include sparse mmap capability for MSI-X table regions

2016-02-12 Thread Alex Williamson

vfio-pci has never allowed the user to directly mmap the MSI-X vector
table, but we've always relied on implicit knowledge of the user that
they cannot do this.  Now that we have capability chains that we can
expose in the region info ioctl and a sparse mmap capability that
represents the sub-areas within the region that can be mmap'd, we can
make the mmap constraints more explicit.
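
As an illustration of the constraint being exposed, the sketch below
computes the mmap-able areas of a BAR around a page-rounded MSI-X
table.  It is a userspace approximation of the idea, not the function
in this patch: struct area and msix_sparse_areas() are hypothetical
names, and the page size is passed in explicitly instead of using the
kernel's PAGE_MASK and PAGE_ALIGN().

```c
#include <assert.h>
#include <stdint.h>

/* One mmap-able sub-range of a region. */
struct area {
	uint64_t offset;
	uint64_t size;
};

/*
 * Compute the areas of a BAR of 'bar_len' bytes that remain mmap-able
 * when the MSI-X table occupies [tbl_off, tbl_off + tbl_size).  The
 * table is widened to whole pages.  'page_size' must be a power of two.
 * Returns the number of areas stored in 'out' (0, 1 or 2).
 */
static int msix_sparse_areas(uint64_t bar_len, uint64_t tbl_off,
			     uint64_t tbl_size, uint64_t page_size,
			     struct area out[2])
{
	uint64_t lo, hi;
	int n = 0;

	assert(page_size && (page_size & (page_size - 1)) == 0);

	/* Round the table start down and the table end up to a page. */
	lo = tbl_off & ~(page_size - 1);
	hi = (tbl_off + tbl_size + page_size - 1) & ~(page_size - 1);

	/* Area below the table, if the table does not start in page 0. */
	if (lo > 0) {
		out[n].offset = 0;
		out[n].size = lo;
		n++;
	}

	/* Area above the table, if the table does not reach the end. */
	if (hi < bar_len) {
		out[n].offset = hi;
		out[n].size = bar_len - hi;
		n++;
	}

	return n;
}
```

For a 16K BAR with a 256 byte table at offset 0x1000 and 4K pages, this
yields [0, 0x1000) and [0x2000, 0x4000).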

Signed-off-by: Alex Williamson 
---
 drivers/vfio/pci/vfio_pci.c |   73 ++-
 1 file changed, 72 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 2760a7b..4682207 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -421,6 +421,48 @@ static int vfio_pci_for_each_slot_or_bus(struct pci_dev *pdev,
return walk.ret;
 }
 
+static int msix_sparse_mmap_cap(struct vfio_pci_device *vdev,
+   struct vfio_info_cap *caps)
+{
+   struct vfio_info_cap_header *header;
+   struct vfio_region_info_cap_sparse_mmap *sparse;
+   size_t end, size;
+   int nr_areas = 2, i = 0;
+
+   end = pci_resource_len(vdev->pdev, vdev->msix_bar);
+
+   /* If MSI-X table is aligned to the start or end, only one area */
+   if (((vdev->msix_offset & PAGE_MASK) == 0) ||
+   (PAGE_ALIGN(vdev->msix_offset + vdev->msix_size) >= end))
+   nr_areas = 1;
+
+   size = sizeof(*sparse) + (nr_areas * sizeof(*sparse->areas));
+
+   header = vfio_info_cap_add(caps, size,
+  VFIO_REGION_INFO_CAP_SPARSE_MMAP, 1);
+   if (IS_ERR(header))
+   return PTR_ERR(header);
+
+   sparse = container_of(header,
+ struct vfio_region_info_cap_sparse_mmap, header);
+   sparse->nr_areas = nr_areas;
+
+   if (vdev->msix_offset & PAGE_MASK) {
+   sparse->areas[i].offset = 0;
+   sparse->areas[i].size = vdev->msix_offset & PAGE_MASK;
+   i++;
+   }
+
+   if (PAGE_ALIGN(vdev->msix_offset + vdev->msix_size) < end) {
+   sparse->areas[i].offset = PAGE_ALIGN(vdev->msix_offset +
+vdev->msix_size);
+   sparse->areas[i].size = end - sparse->areas[i].offset;
+   i++;
+   }
+
+   return 0;
+}
+
 static long vfio_pci_ioctl(void *device_data,
   unsigned int cmd, unsigned long arg)
 {
@@ -451,6 +493,8 @@ static long vfio_pci_ioctl(void *device_data,
} else if (cmd == VFIO_DEVICE_GET_REGION_INFO) {
struct pci_dev *pdev = vdev->pdev;
struct vfio_region_info info;
+   struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
+   int ret;
 
minsz = offsetofend(struct vfio_region_info, offset);
 
@@ -479,8 +523,15 @@ static long vfio_pci_ioctl(void *device_data,
 VFIO_REGION_INFO_FLAG_WRITE;
if (IS_ENABLED(CONFIG_VFIO_PCI_MMAP) &&
pci_resource_flags(pdev, info.index) &
-   IORESOURCE_MEM && info.size >= PAGE_SIZE)
+   IORESOURCE_MEM && info.size >= PAGE_SIZE) {
info.flags |= VFIO_REGION_INFO_FLAG_MMAP;
+   if (info.index == vdev->msix_bar) {
+   ret = msix_sparse_mmap_cap(vdev, &caps);
+   if (ret)
+   return ret;
+   }
+   }
+
break;
case VFIO_PCI_ROM_REGION_INDEX:
{
@@ -520,6 +571,26 @@ static long vfio_pci_ioctl(void *device_data,
return -EINVAL;
}
 
+   if (caps.size) {
+   info.flags |= VFIO_REGION_INFO_FLAG_CAPS;
+   if (info.argsz < sizeof(info) + caps.size) {
+   info.argsz = sizeof(info) + caps.size;
+   info.cap_offset = 0;
+   } else {
+   vfio_info_cap_shift(&caps, sizeof(info));
+   ret = copy_to_user((void __user *)arg +
+  sizeof(info), caps.buf,
+  caps.size);
+   if (ret) {
+   kfree(caps.buf);
+   return ret;
+   }
+   info.cap_offset = sizeof(info);
+   }
+
+   kfree(caps.buf);
+   }
+
return copy_to_user((void __user *)arg, &info, minsz);
 
} else if (cmd == VFIO_DEVICE_GET_IRQ_INFO) {

[PATCH v2 02/11] vfio: Add capability chain helpers

2016-02-12 Thread Alex Williamson

Allow sub-modules to easily reallocate a buffer for managing
capability chains for info ioctls.
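
For anyone wanting to play with the mechanics outside the kernel, here
is a small userspace model of the two helpers: grow a buffer by one
capability, link it at the tail, then shift all next offsets before
copying the chain behind the info structure.  This is a sketch under
assumptions, not the code below: it uses realloc() instead of
krealloc(), returns NULL instead of an ERR_PTR, resolves every next
offset against the buffer base, and cap_add()/cap_shift() are names
invented for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in capability header; next is an offset from the buffer base. */
struct cap_header {
	uint16_t id;
	uint16_t version;
	uint32_t next;
};

/* Growing buffer holding a chain of capabilities. */
struct cap_buf {
	uint8_t *buf;
	size_t size;
};

/* Append a zeroed capability of 'size' bytes and link it at the tail. */
static struct cap_header *cap_add(struct cap_buf *caps, size_t size,
				  uint16_t id, uint16_t version)
{
	struct cap_header *header, *tmp;
	uint8_t *buf;

	/* Keep every header aligned within the buffer. */
	assert(size >= sizeof(*header) && size % sizeof(uint32_t) == 0);

	buf = realloc(caps->buf, caps->size + size);
	if (!buf) {
		free(caps->buf);
		caps->buf = NULL;
		caps->size = 0;
		return NULL;
	}

	caps->buf = buf;
	header = (struct cap_header *)(buf + caps->size);
	memset(header, 0, size);
	header->id = id;
	header->version = version;

	/* Find the current tail; for the first entry this is 'header'. */
	for (tmp = (struct cap_header *)buf; tmp->next;
	     tmp = (struct cap_header *)(buf + tmp->next))
		; /* nothing */

	tmp->next = (uint32_t)caps->size;
	caps->size += size;

	return header;
}

/* Rebase all next offsets by 'offset' prior to copying to the user. */
static void cap_shift(struct cap_buf *caps, size_t offset)
{
	struct cap_header *tmp;

	assert(caps->buf != NULL);

	tmp = (struct cap_header *)caps->buf;
	while (tmp->next) {
		uint32_t next = tmp->next;	/* still relative to caps->buf */

		tmp->next = next + (uint32_t)offset;
		tmp = (struct cap_header *)(caps->buf + next);
	}
}
```

The last entry keeps next == 0, so the chain stays terminated after the
shift.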

Signed-off-by: Alex Williamson 
---
 drivers/vfio/vfio.c  |   54 ++
 include/linux/vfio.h |   11 ++
 2 files changed, 65 insertions(+)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 4cc961b..6fd6fa5 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1729,6 +1729,60 @@ long vfio_external_check_extension(struct vfio_group *group, unsigned long arg)
 EXPORT_SYMBOL_GPL(vfio_external_check_extension);
 
 /**
+ * Sub-module support
+ */
+/*
+ * Helper for managing a buffer of info chain capabilities, allocate or
+ * reallocate a buffer with additional @size, filling in @id and @version
+ * of the capability.  A pointer to the new capability is returned.
+ *
+ * NB. The chain is based at the head of the buffer, so new entries are
+ * added to the tail, vfio_info_cap_shift() should be called to fixup the
+ * next offsets prior to copying to the user buffer.
+ */
+struct vfio_info_cap_header *vfio_info_cap_add(struct vfio_info_cap *caps,
+  size_t size, u16 id, u16 version)
+{
+   void *buf;
+   struct vfio_info_cap_header *header, *tmp;
+
+   buf = krealloc(caps->buf, caps->size + size, GFP_KERNEL);
+   if (!buf) {
+   kfree(caps->buf);
+   caps->size = 0;
+   return ERR_PTR(-ENOMEM);
+   }
+
+   caps->buf = buf;
+   header = buf + caps->size;
+
+   /* Eventually copied to user buffer, zero */
+   memset(header, 0, size);
+
+   header->id = id;
+   header->version = version;
+
+   /* Add to the end of the capability chain */
+   for (tmp = caps->buf; tmp->next; tmp = (void *)tmp + tmp->next)
+   ; /* nothing */
+
+   tmp->next = caps->size;
+   caps->size += size;
+
+   return header;
+}
+EXPORT_SYMBOL_GPL(vfio_info_cap_add);
+
+void vfio_info_cap_shift(struct vfio_info_cap *caps, size_t offset)
+{
+   struct vfio_info_cap_header *tmp;
+
+   for (tmp = caps->buf; tmp->next; tmp = (void *)tmp + tmp->next - offset)
+   tmp->next += offset;
+}
+EXPORT_SYMBOL_GPL(vfio_info_cap_shift);
+
+/**
  * Module/class support
  */
 static char *vfio_devnode(struct device *dev, umode_t *mode)
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 610a86a..0ecae0b 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -92,6 +92,17 @@ extern int vfio_external_user_iommu_id(struct vfio_group *group);
 extern long vfio_external_check_extension(struct vfio_group *group,
  unsigned long arg);
 
+/*
+ * Sub-module helpers
+ */
+struct vfio_info_cap {
+   struct vfio_info_cap_header *buf;
+   size_t size;
+};
+extern struct vfio_info_cap_header *vfio_info_cap_add(
+   struct vfio_info_cap *caps, size_t size, u16 id, u16 version);
+extern void vfio_info_cap_shift(struct vfio_info_cap *caps, size_t offset);
+
 struct pci_dev;
 #ifdef CONFIG_EEH
 extern void vfio_spapr_pci_eeh_open(struct pci_dev *pdev);

Re: [PATCH 0/5] ARM: dts: NSP: DT Updates

2016-02-12 Thread Florian Fainelli

On 05/02/16 14:43, Jon Mason wrote:
> Northstar Plus device tree changes.  The first 2 are bug fixes that
> probably should go in ASAP.  The other 3 enable new hardware and can be
> pushed into the next merge window.

Series applied to devicetree/next, with the update to the last patch
based on Sergei's comment, and I took the v2 of patch 4.

Thanks everyone.
-- 
Florian

Re: [PATCH 0/6] More updates for NS2 DT

2016-02-12 Thread Florian Fainelli

On 09/02/16 22:10, Anup Patel wrote:
> This patchset primarily adds more DT nodes for NS2 SVK. It also does
> minor update to arch/arm64/Kconfig.platforms and adds missing DT
> bindings document for sp805 driver.
> 
> The patchset is based on v4.5-rc3 tag and is available in ns2_dt2_v1
> branch of https://github.com/Broadcom/arm64-linux.git
> 
> All patches have been tested on Broadcom NS2 SVK.
> 
> Anup Patel (5):
>   arm64: Select COMMON_CLK_IPROC, PINCTRL and GPIOLIB for iProc SoCs
>   arm64: dts: Add SDHCI DT node for NS2
>   arm64: dts: Add ARM SP804 timer DT nodes for NS2
>   dt-bindings: watchdog: Add ARM SP805 DT bindings
>   arm64: dts: Add ARM SP805 watchdog DT node for NS2
> 
> Ray Jui (1):
>   arm64: dts: Add PCIe0 and PCIe4 DT nodes for NS2

Patch 1 applied to soc-arm64/next

Patch 2-6 applied to devicetree-arm64/next, with Rob's Acked-by and the
suggested rename from wdt@ to watchdog@ that he recommended, thanks
everyone!
-- 
Florian

[PATCH] pci: Wait for up to an additional 1000ms after FLR reset

2016-02-12 Thread Alex Williamson

Some devices take longer than the spec indicates to return from FLR
reset, a notable case of this is Intel integrated graphics (IGD),
which can often take an additional 300ms powering down an attached
LCD panel as part of the FLR.  Allow devices up to an additional
1000ms, testing every 100ms whether the first dword of config space
is read as -1.
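
The polling policy itself is simple enough to model in isolation.  The
sketch below is a userspace approximation of it, assuming caller
supplied callbacks in place of the config read and msleep(); the
fake_dev type and both helper callbacks exist only to exercise the loop
and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t (*read_id_fn)(void *ctx);
typedef void (*sleep_ms_fn)(unsigned int ms);

/*
 * Poll the first config dword up to 10 times, 100ms apart, until it no
 * longer reads as all ones.  Returns the additional milliseconds that
 * were needed, or -1 if the device never came back.
 */
static int wait_alive(read_id_fn read_id, void *ctx, sleep_ms_fn sleep_ms)
{
	int i;

	assert(read_id && sleep_ms);

	for (i = 0; i < 10; i++) {
		if (read_id(ctx) != UINT32_MAX)
			return i * 100;
		sleep_ms(100);
	}

	return -1;
}

/* Demonstration stand-in: a device that needs a few reads to recover. */
struct fake_dev {
	int reads_until_alive;
};

static uint32_t fake_read_id(void *ctx)
{
	struct fake_dev *dev = ctx;

	if (dev->reads_until_alive > 0) {
		dev->reads_until_alive--;
		return UINT32_MAX;	/* config reads fail as all ones */
	}

	return UINT32_C(0x01668086);	/* arbitrary device/vendor ID pair */
}

/* Demonstration stand-in: skip the real delay. */
static void no_sleep(unsigned int ms)
{
	(void)ms;
}
```

A device that fails three reads before responding reports 300ms of
additional delay; one that never responds falls through to the failure
case after ten attempts.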

Signed-off-by: Alex Williamson 
---

Copying KVM list as this patch is required for IGD assignment.

 drivers/pci/pci.c |   21 +
 1 file changed, 21 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 602eb42..3b90a42 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -3414,6 +3414,25 @@ int pci_wait_for_pending_transaction(struct pci_dev *dev)
 }
 EXPORT_SYMBOL(pci_wait_for_pending_transaction);
 
+static void pci_wait_alive(struct pci_dev *dev)
+{
+   int i;
+   u32 id;
+
+   for (i = 0; i < 10; i++) {
+   pci_read_config_dword(dev, PCI_VENDOR_ID, &id);
+   if (~id != 0) {
+   if (i > 0)
+   dev_info(&dev->dev, "Required additional %d"
+"ms to return from reset\n", i * 100);
+   return;
+   }
+   msleep(100);
+   }
+
+   dev_warn(&dev->dev, "Failed to return from reset\n");
+}
+
 static int pcie_flr(struct pci_dev *dev, int probe)
 {
u32 cap;
@@ -3430,6 +3449,7 @@ static int pcie_flr(struct pci_dev *dev, int probe)
 
pcie_capability_set_word(dev, PCI_EXP_DEVCTL, PCI_EXP_DEVCTL_BCR_FLR);
msleep(100);
+   pci_wait_alive(dev);
return 0;
 }
 
@@ -3460,6 +3480,7 @@ static int pci_af_flr(struct pci_dev *dev, int probe)
 
pci_write_config_byte(dev, pos + PCI_AF_CTRL, PCI_AF_CTRL_FLR);
msleep(100);
+   pci_wait_alive(dev);
return 0;
 }

[PATCH] mm/Kconfig: remove redundant arch depend for memory hotplug

2016-02-12 Thread Yang Shi

MEMORY_HOTPLUG already depends on ARCH_ENABLE_MEMORY_HOTPLUG which is selected
by the supported architectures, so the following arch depend is unnecessary.

Signed-off-by: Yang Shi 
---
 mm/Kconfig | 1 -
 1 file changed, 1 deletion(-)

diff --git a/mm/Kconfig b/mm/Kconfig
index 03cbfa0..c077765 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -187,7 +187,6 @@ config MEMORY_HOTPLUG
bool "Allow for memory hot-add"
depends on SPARSEMEM || X86_64_ACPI_NUMA
depends on ARCH_ENABLE_MEMORY_HOTPLUG
-   depends on (IA64 || X86 || PPC_BOOK3S_64 || SUPERH || S390)
 
 config MEMORY_HOTPLUG_SPARSE
def_bool y
-- 
2.0.2

[PATCH] ARM: dts: pxa: fix dma engine node to pxa3xx-nand

2016-02-12 Thread Robert Jarzmik

Since the switch from mmp_pdma to pxa_dma driver for pxa architectures,
the pxa_dma requires 2 arguments, namely the requestor line and the
requested priority.

Fix the only left device node which was still passing only one argument,
making the pxa3xx-nand driver misbehave in a device-tree configuration,
ie. failing all data transfers.

Fixes: c943646d1f49 ("ARM: dts: pxa: add dma engine node to pxa3xx-nand")
Signed-off-by: Robert Jarzmik 
---
 arch/arm/boot/dts/pxa3xx.dtsi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/boot/dts/pxa3xx.dtsi b/arch/arm/boot/dts/pxa3xx.dtsi
index bea454f50ff9..fec47bcd8292 100644
--- a/arch/arm/boot/dts/pxa3xx.dtsi
+++ b/arch/arm/boot/dts/pxa3xx.dtsi
@@ -31,7 +31,7 @@
reg = <0x4310 90>;
interrupts = <45>;
clocks = < CLK_NAND>;
-   dmas = < 97>;
+   dmas = < 97 3>;
dma-names = "data";
#address-cells = <1>;
#size-cells = <1>;  
-- 
2.1.4

[ANNOUNCE] 4.4.1-rt6

2016-02-12 Thread Sebastian Andrzej Siewior

Dear RT folks!

I'm pleased to announce the v4.4.1-rt6 patch set.
Changes since v4.4.1-rt5:

- The rtmutex wait_lock is taken with interrupts disabled again. It
  fixes a possible deadlock in the posix timer code. Patch by Thomas
  Gleixner.

- Don't disable interrupts around atomic_dec_and_lock() in
  wb_congested_put()

- Use an RCU lock in call_step_hook() on ARM64 to avoid a "sleeping
  while atomic" issue. Patch by Yang Shi.

- In migrate_disable() we now use the fast / atomic path if we are
  called with interrupts disabled. This avoids a recursion with lockdep
  in some cases.

- The migrate_disable()/_enable() invocation has been moved from the
  locking macro into the used rt_mutex functions. This makes the kernel a
  tiny bit smaller.

- We now try to invoke migrate_enable() before we schedule() out while
  waiting for a lock. This optimization should allow the scheduler to
  put the task on another CPU once it became runnable and the original
  CPU is busy. This does not work for nested locks. Patch by Thomas
  Gleixner.

- The stop_machine.c was converted to use raw_locks. This patch has been
  identified to cause problems during hotplug and was reverted.

- There is a useless rcu_bh thread which has been deactivated.

- Manish Jaggi reported a "sleeping while atomic" issue on ARM64 with
  KVM. Josh Cartwright sent a patch.

Known issues:
  - bcache stays disabled

  - CPU hotplug got a little better but can deadlock.

  - The netlink_release() OOPS, reported by Clark, is still on the
list, but unsolved due to lack of information.
Since Clark can not reproduce it anymore and hasn't seen it, it will
be removed from this list and moved to the bugzilla.

The delta patch against 4.4.1-rt5 is appended below and can be found here:


https://cdn.kernel.org/pub/linux/kernel/projects/rt/4.4/incr/patch-4.4.1-rt5-rt6.patch.xz

You can get this release via the git tree at:

git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git v4.4.1-rt6

The RT patch against 4.4.1 can be found here:


https://cdn.kernel.org/pub/linux/kernel/projects/rt/4.4/patch-4.4.1-rt6.patch.xz

The split quilt queue is available at:


https://cdn.kernel.org/pub/linux/kernel/projects/rt/4.4/patches-4.4.1-rt6.tar.xz

Sebastian

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 4f5c42a0924c..2ce9cc2717ac 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -568,7 +568,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 * involves poking the GIC, which must be done in a
 * non-preemptible context.
 */
-   preempt_disable();
+   migrate_disable();
kvm_timer_flush_hwstate(vcpu);
kvm_vgic_flush_hwstate(vcpu);
 
@@ -587,7 +587,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
local_irq_enable();
kvm_timer_sync_hwstate(vcpu);
kvm_vgic_sync_hwstate(vcpu);
-   preempt_enable();
+   migrate_enable();
continue;
}
 
@@ -641,7 +641,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 
kvm_vgic_sync_hwstate(vcpu);
 
-   preempt_enable();
+   migrate_enable();
 
ret = handle_exit(vcpu, run, ret);
}
diff --git a/arch/arm64/kernel/debug-monitors.c b/arch/arm64/kernel/debug-monitors.c
index 8aee3aeec3e6..c1492ba1f6d1 100644
--- a/arch/arm64/kernel/debug-monitors.c
+++ b/arch/arm64/kernel/debug-monitors.c
@@ -186,20 +186,21 @@ static void clear_regs_spsr_ss(struct pt_regs *regs)
 
 /* EL1 Single Step Handler hooks */
 static LIST_HEAD(step_hook);
-static DEFINE_RWLOCK(step_hook_lock);
+static DEFINE_SPINLOCK(step_hook_lock);
 
 void register_step_hook(struct step_hook *hook)
 {
-   write_lock(&step_hook_lock);
-   list_add(&hook->node, &step_hook);
-   write_unlock(&step_hook_lock);
+   spin_lock(&step_hook_lock);
+   list_add_rcu(&hook->node, &step_hook);
+   spin_unlock(&step_hook_lock);
 }
 
 void unregister_step_hook(struct step_hook *hook)
 {
-   write_lock(&step_hook_lock);
-   list_del(&hook->node);
-   write_unlock(&step_hook_lock);
+   spin_lock(&step_hook_lock);
+   list_del_rcu(&hook->node);
+   spin_unlock(&step_hook_lock);
+   synchronize_rcu();
 }
 
 /*
@@ -213,15 +214,15 @@ static int call_step_hook(struct pt_regs *regs, unsigned int esr)
struct step_hook *hook;
int retval = DBG_HOOK_ERROR;
 
-   read_lock(&step_hook_lock);
+   rcu_read_lock();
 
-   list_for_each_entry(hook, &step_hook, node) {
+   list_for_each_entry_rcu(hook, &step_hook, node) {
retval = hook->fn(regs, esr);
if (retval == DBG_HOOK_HANDLED)
break;
}
 
-   read_unlock(&step_hook_lock);
+   rcu_read_unlock();
 
return retval;
 }
diff --git

Re: call_usermodehelper in containers

2016-02-12 Thread Ian Kent

On Fri, 2013-11-15 at 15:54 +0400, Stanislav Kinsbursky wrote:
> 15.11.2013 15:03, Eric W. Biederman wrote:
> > Stanislav Kinsbursky  writes:
> > 
> > > 12.11.2013 17:30, Jeff Layton wrote:
> > > > On Tue, 12 Nov 2013 17:02:36 +0400
> > > > Stanislav Kinsbursky  wrote:
> > > > 
> > > > > 12.11.2013 15:12, Jeff Layton wrote:
> > > > > > On Mon, 11 Nov 2013 16:47:03 -0800
> > > > > > Greg KH  wrote:
> > > > > > 
> > > > > > > On Mon, Nov 11, 2013 at 07:18:25AM -0500, Jeff Layton
> > > > > > > wrote:
> > > > > > > > We have a bit of a problem wrt to upcalls that use
> > > > > > > > call_usermodehelper
> > > > > > > > with containers and I'd like to bring this to some sort
> > > > > > > > of resolution...
> > > > > > > > 
> > > > > > > > A particularly problematic case (though there are
> > > > > > > > others) is the
> > > > > > > > nfsdcltrack upcall. It basically uses
> > > > > > > > call_usermodehelper to run a
> > > > > > > > program in userland to track some information on stable
> > > > > > > > storage for
> > > > > > > > nfsd.
> > > > > > > 
> > > > > > > I thought the discussion at the kernel summit about this
> > > > > > > issue was:
> > > > > > >   - don't do this.
> > > > > > >   - don't do it.
> > > > > > >   - if you really need to do this, fix nfsd
> > > > > > > 
> > > > > > 
> > > > > > Sorry, I couldn't make the kernel summit so I missed that
> > > > > > discussion. I
> > > > > > guess LWN didn't cover it?
> > > > > > 
> > > > > > In any case, I guess then that we'll either have to come up
> > > > > > with some
> > > > > > way to fix nfsd here, or simply ensure that nfsd can never
> > > > > > be started
> > > > > > unless root in the container has a full set of
> > > > > > capabilities.
> > > > > > 
> > > > > > One sort of Rube Goldberg possibility to fix nfsd is:
> > > > > > 
> > > > > > - when we start nfsd in a container, fork off an extra
> > > > > > kernel thread
> > > > > >  that just sits idle. That thread would need to be a
> > > > > > descendant of the
> > > > > >  userland process that started nfsd, so we'd need to
> > > > > > create it with
> > > > > >  kernel_thread().
> > > > > > 
> > > > > > - Have the kernel just start up the UMH program in the
> > > > > > init_ns mount
> > > > > >  namespace as it currently does, but also pass the pid
> > > > > > of the idle
> > > > > >  kernel thread to the UMH upcall.
> > > > > > 
> > > > > > - The program will then use /proc//root and
> > > > > > /proc//ns/* to set
> > > > > >  itself up for doing things properly.
> > > > > > 
> > > > > > Note that with this mechanism we can't actually run a
> > > > > > different binary
> > > > > > per container, but that's probably fine for most purposes.
> > > > > > 
> > > > > 
> > > > > Hmmm... Why we can't? We can go a bit further with userspace
> > > > > idea.
> > > > > 
> > > > > We use UMH some very limited number of user programs. For 2,
> > > > > actually:
> > > > > 1) /sbin/nfs_cache_getent
> > > > > 2) /sbin/nfsdcltrack
> > > > > 
> > > > 
> > > > No, the kernel uses them for a lot more than that. Pretty much
> > > > all of
> > > > the keys API upcalls use it. See all of the callers of
> > > > call_usermodehelper. All of them are running user binaries out
> > > > of the
> > > > kernel, and almost all of them are certainly broken wrt
> > > > containers.
> > > > 
> > > > > If we convert them into proxies, which use /proc//root
> > > > > and /proc//ns/*, this will allow us to lookup the right
> > > > > binary.
> > > > > The only limitation here is presence of this "proxy" binaries
> > > > > on "host".
> > > > > 
> > > > 
> > > > Suppose I spawn my own container as a user, using all of this
> > > > spiffy
> > > > new user namespace stuff. Then I make the kernel use
> > > > call_usermodehelper to call the upcall in the init_ns, and then
> > > > trick
> > > > it into running my new "escape_from_namespace" program with
> > > > "real" root
> > > > privileges.
> > > > 
> > > > I don't think we can reasonably assume that having the kernel
> > > > exec an
> > > > arbitrary binary inside of a container is safe. Doing so inside
> > > > of the
> > > > init_ns is marginally more safe, but only marginally so...
> > > > 
> > > > > And we don't need any significant changes in kernel.
> > > > > 
> > > > > BTW, Jeff, could you remind me, please, why exactly we need to
> > > > > use UMH to run the binary?
> > > > > What are this capabilities, which force us to do so?
> > > > > 
> > > > 
> > > > Nothing _forces_ us to do so, but upcalls are very difficult to
> > > > handle,
> > > > and UMH has a lot of advantages over a long-running daemon
> > > > launched by
> > > > userland.
> > > > 
> > > > Originally, I created the nfsdcltrack upcall as a running daemon
> > > > called
> > > > nfsdcld, and the kernel used rpc_pipefs to communicate with it.
> > > > 
> > > > Everyone hated it because no one likes to have to run daemons
> > > > for
> > > > infrequently used upcalls. It's a pain for

Re: [PATCH 3/3] ACPI: Change NFIT driver to set PMEM type to iomem entry

2016-02-12 Thread Dan Williams

On Fri, Feb 12, 2016 at 2:30 PM, Toshi Kani  wrote:
> On Fri, 2016-02-12 at 11:41 -0800, Dan Williams wrote:
>> On Tue, Feb 2, 2016 at 10:55 AM, Toshi Kani  wrote:
>> > Change acpi_nfit_register_region() to call iomem_set_desc() with
>> > IORES_DESC_PERSISTENT_MEMORY for NFIT_SPA_PM ranges found in ACPI
>> > NFIT table.
>> >
>> > When FW sets E820_PMEM in e820 and EFI_PERSISTENT_MEMORY in EFI,
>> > this code simply sets PMEM type again to "Persistent Memory" entries
>> > in the iomem table.  When FW sets reserved type for persistent
>> > memory ranges, it sets PMEM type to "reserved" entries covering
>> > PMEM ranges.
>> >
>> > This allows the EINJ driver, which calls region_intersects() with
>> > IORES_DESC_PERSISTENT_MEMORY to check persistent memory ranges,
>> > to work continuously even if FW sets reserved type to persistent
>> > memory in e820 and EFI.
>> >
>> > Signed-off-by: Toshi Kani 
>> > Cc: Rafael J. Wysocki 
>> > Cc: Dan Williams 
>> > Cc: Ingo Molnar 
>> > Cc: Borislav Petkov 
>> > Cc: Andrew Morton 
>> > ---
>> >  drivers/acpi/nfit.c |6 ++
>> >  1 file changed, 6 insertions(+)
>> >
>> > diff --git a/drivers/acpi/nfit.c b/drivers/acpi/nfit.c
>> > index ad6d8c6..add04f0 100644
>> > --- a/drivers/acpi/nfit.c
>> > +++ b/drivers/acpi/nfit.c
>> > @@ -1781,6 +1781,12 @@ static int acpi_nfit_register_region(struct
>> > acpi_nfit_desc *acpi_desc,
>> >
>> > nvdimm_bus = acpi_desc->nvdimm_bus;
>> > if (nfit_spa_type(spa) == NFIT_SPA_PM) {
>> > +   rc = iomem_set_desc(spa->address, spa->length,
>> > +   IORES_DESC_PERSISTENT_MEMORY);
>> > +   if (rc)
>> > +   dev_dbg(acpi_desc->dev,
>> > +   "error setting iomem desc: %d\n", rc);
>> > +
>>
>> Hmm, if we set the type on driver load, should we clear the type on
>> driver unload?
>
> I think this type update should stay for the life-cycle of this iomem entry
> itself since this range is PMEM even after the driver is unloaded.  This is
> an extension of the boot-time iomem table initialization from e820/EFI,
> which allows ACPI to set a correct type.  This is independent from driver's
> resource allocations.
>
>> Actually it might be more straightforward to specify a type at
>> request_region() time.  That way it gets released at release_region().
>> We're already setting a resource name at request_region time, adding a
>> type annotation at the time seems appropriate.
>
> I first considered simply setting "namespaceX.X" as PMEM.  However,
> region_intersects() and its friends only check the top-level entries, not
> their children, of the iomem table.  And I think a child should have the
> same type as the parent as I fixed it in patch 1/3.

Did we investigate updating region_intersects() to check children?
When a child sub-divides a region with different types it may be the
wrong answer to check the parent.  Is there a problem with moving
checking to the child?

RE: [lustre-devel] [PATCH 5/7] staging:lustre: simplify libcfs_psdev_[open|release]

2016-02-12 Thread Simmons, James A.

>> diff --git a/drivers/staging/lustre/lustre/libcfs/linux/linux-module.c 
>> b/drivers/staging/lustre/lustre/libcfs/linux/linux-module.c
>> index 33f6036..64f0fbf 100644
>> --- a/drivers/staging/lustre/lustre/libcfs/linux/linux-module.c
>> +++ b/drivers/staging/lustre/lustre/libcfs/linux/linux-module.c
>> @@ -98,30 +98,22 @@ int libcfs_ioctl_popdata(void *arg, void *data, int size)
>>  static int
>>  libcfs_psdev_open(struct inode *inode, struct file *file)
>>  {
>> -intrc = 0;
>> -
>>  if (!inode)
>>  return -EINVAL;
>> -if (libcfs_psdev_ops.p_open != NULL)
>> -rc = libcfs_psdev_ops.p_open(0, NULL);
>> -else
>> -return -EPERM;
>> -return rc;
>> +
>> +try_module_get(THIS_MODULE);
>
>Note, code like this is racy and incorrect and never needed, please fix
>this up properly (hint, set the module in the file operations.)
>
>Again, if you ever see code with that line, it is incorrect.

So simple

static struct file_operations libcfs_fops = {
.owner  = THIS_MODULE,
.unlocked_ioctl = libcfs_psdev_ioctl,
};

With the open and release deleted should do the trick then.

Re: [REGRESSION] i915: No HDMI output with 4.4

2016-02-12 Thread Ville Syrjälä

On Fri, Feb 12, 2016 at 09:52:03AM +0200, Oleksandr Natalenko wrote:
> Ville,
> 
> I've applied patch you've provided and did couple of replugging with 
> intel_reg in between. Here are the results.
> 
> I used additional VGA cable to see what actually I type in console :).
>

My life would have been a bit easier if you had included
the reg dumps in the mail. Copy paste manually this time.

> Both HDMI and VGA cables plugged: [1]
(0x000c4000): 0x0008
(0x000c4004): 0xf1b5
(0x000c4008): 0x
(0x000c400c): 0x
(0x000c4030): 0x00101010
> Both HDMI and VGA cables unplugged: [2]
(0x000c4000): 0x
(0x000c4004): 0xf1b5
(0x000c4008): 0x
(0x000c400c): 0x
(0x000c4030): 0x00101010
> Only HDMI cable plugged: [3]
(0x000c4000): 0x
(0x000c4004): 0xf1b5
(0x000c4008): 0x
(0x000c400c): 0x
(0x000c4030): 0x00101010
> Only VGA cable plugged: [4]
(0x000c4000): 0x0008
(0x000c4004): 0xf1b4
(0x000c4008): 0x
(0x000c400c): 0x
(0x000c4030): 0x00101010

What these show is that the live status for the digital ports never
goes to 1, which is rather wtf. VGA gets reported correctly. Everything
else looks normal to me.

> And here goes dmesg with all the stuff logged: [5]

лют 12 09:37:01 pfactum.lanet kernel: port C live status
  
__
  
__
  
__
  
__
  
__

Same deal here. The live status never indicates anything being 
present during the 250ms that we poll it.

Few other ideas:
- Was the monitor sleeping when you tried this? Can you maybe push
  some button on it and then immediately run the intel_reg read command
  again?
- Do you have another monitor to try?
- Do you have another cable to try?
- Maybe the pullup/down on the hpd line is misconfigured or something.
  Any chance of updating the BIOS on the machine?
- What does 'intel_reg read 0xc2000 0xc2004 0xc2020' say?
- The spec claims the TMDS vs. SDVO select has something to do with
  hpd generation. I can't see any difference on my IVB though, so not
  sure it's really true.

  What does 'intel_reg read 0xe1140 0xe1150 0xe1160' tell us?
  
  Let's try these anyway (with the cable plugged in):
 
  intel_reg write 0xe1140 0x0
  intel_reg write 0xe1150 0x0
  intel_reg write 0xe1160 0x0
  sleep 1
  intel_reg read 0xc4000

  intel_reg write 0xe1140 0x800
  intel_reg write 0xe1150 0x800
  intel_reg write 0xe1160 0x800
  sleep 1
  intel_reg read 0xc4000

  intel_reg write 0xe1140 0x800800
  intel_reg write 0xe1150 0x800800
  intel_reg write 0xe1160 0x800800
  sleep 1
  intel_reg read 0xc4000

  intel_reg write 0xe1140 0x80
  intel_reg write 0xe1150 0x80
  intel_reg write 0xe1160 0x80
  sleep 1
  intel_reg read 0xc4000

> 
> Hope this helps.
> 
> [1] https://gist.github.com/58a0eb50dcf84e104555
> [2] https://gist.github.com/7e8749a3e2cc58ea8aac
> [3] https://gist.github.com/9d76930da7380634b845
> [4] https://gist.github.com/c0d2e2f64242ad4f01f2
> [5] https://gist.github.com/fda3b9fed3ca4d31cd20
> 
> 11.02.2016 16:01, Ville Syrjälä wrote:
> > OK, so the hpd interrupt does happen, and yet the live status 
> > supposedly
> > claims that nothing is there. Port C live status definitely works here
> > on my IVB, so not sure what the deal is.
> > 
> > Can you grab intel-gpu-tools and run
> > intel_reg read 0xc4000 0xc4004 0xc4008 0xc400c 0xc4030
> > a couple of times after plugging the monitor in, and also run it when
> > nothing is plugged in.
> > 
> > Also you could try something like the following patch so we might
> > observe the live status with a bit more detail. Though the fact that it
> > doesn't seem to work for you even when the monitor was already plugged
> > in is somewhat troubling:
> > 
> > --- a/drivers/gpu/drm/i915/intel_hdmi.c
> > +++ b/drivers/gpu/drm/i915/intel_hdmi.c
> > @@ -1392,12 +1392,17 @@ intel_hdmi_detect(struct

Re: [PATCH v6 0/3] cpufreq: Replace timers with utilization update callbacks

2016-02-12 Thread Rafael J. Wysocki

On Fri, Feb 12, 2016 at 6:33 PM, Doug Smythies  wrote:
> On 2016.02.12 05:39 Rafael J. Wysocki wrote:
>> On Fri, Feb 12, 2016 at 8:25 AM, Doug Smythies  wrote:
>>> On 2016.02.11 14:50 Doug Smythies wrote:
 On 2016.02.10 22:03 Srinivas Pandruvada wrote:
> On Wednesday, February 10, 2016 03:11:43 PM Doug Smythies wrote:
>>>
>> My test computer has an older model i7 (Intel(R) Core(TM) i7-2600K CPU @ 
>> 3.40GHz)
 Thanks Doug. If you have specific workloads, please compare performance.
>>>
 My work so far has been testing functionality, with unrealistic workloads 
 specifically
 designed to exaggerate issues, in this case the duration problem.

 I'll look at some real world workload scenarios.
>>>
>>> Turbostat used for package power, starts before Phoronix tests starts,
>>> and ends after Phoronix test ends.
>>>
>>> Control Sample: Kernel 4.5-rc3:
>>> Phoronix ffmpeg: turbostat 180 Sec. 12.07 Sec. Ave. 27.14 Watts.
>>> Phoronix apache: turbostat 200 Sec. 19797.0 R.P.S. Ave. 34.01 Watts.
>>> Phoronix kernel: turbostat 180 Sec. 139.93 Sec. 49.09 Watts.
>>> Phoronix Postmark (Disk Test): turbostat 200 Sec. 5813 T.P.S. Ave. 21.33 
>>> Watts.
>>>
>>> Kernel 4.5-rc3 + RJW 3 patch set version 7:
>>> Phoronix ffmpeg: turbostat 180 Sec. 11.67 Sec. Ave. 27.35 Watts.
>>> Phoronix apache: turbostat 200 Sec. 19430.7 R.P.S. Ave. 34.18 Watts.
>>> Phoronix kernel: turbostat 180 Sec. 139.81 Sec. 48.80 Watts.
>>> Phoronix Postmark (Disk Test): turbostat 200 Sec. 5683 T.P.S. Ave. 22.41 
>>> Watts.
>
>> Thanks for the results!
>>
>> The Postmark result is somewhat below expectations (especially with
>> respect to the energy consumption), but we should be able to improve
>> that by using the util numbers intelligently.
>>
>> Do you have full turbostat reports from those runs by any chance?  I'm
>> wondering what happens to the idle state residencies, for example.
>
> I did not keep the turbostat output, however it is easy enough to
> re-do the tests. I'll send you the stuff off-list, and copy
> Srinivas.

Thanks!

> By the way, there is an anomaly in my 2 hour idle data (v7), where
> CPU 7 should have had sample passes through the intel_pstate driver.
> It did not, rather hitting the 4 second time limit instead.

That most likely means that we had not scheduled anything on that CPU
for that time.  Not entirely unlikely if the system was generally
mostly idle.

The CPU activity you observed might be related to interrupts in which
case we wouldn't receive updates from the scheduler.

> 10 occurrences in 7200 seconds. I sent you an off-list html format
> e-mail with more details. There may be other anomalies I didn't
> find yet.

Well, I guess we'll see.

Thanks,
Rafael
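
As a rough check on the Postmark figures quoted above (5813 vs. 5683 T.P.S., 21.33 vs. 22.41 Watts), the relative change can be worked out with a small helper. This is only an illustrative sketch; percent_change() is a made-up name and not part of turbostat or Phoronix.

```c
/*
 * Relative change from a baseline measurement to a patched one,
 * expressed in percent. Negative means the value went down.
 * Hypothetical helper, for illustration only.
 */
double percent_change(double baseline, double patched)
{
	return (patched - baseline) / baseline * 100.0;
}
```

Fed the Postmark numbers from the thread, this gives roughly 2.2% fewer transactions per second at roughly 5.1% more package power, which matches the "somewhat below expectations" reading.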

RE: [lustre-devel] [PATCH 06/11] staging: lustre: add missing spaces for LNet layer reported by checkpatch.pl

2016-02-12 Thread Simmons, James A.

>On Fri, 2016-02-12 at 12:06 -0500, James Simmons wrote:
>> Add missing spaces in the code reported by checkpatch.pl.
>[]
>> diff --git a/drivers/staging/lustre/include/linux/lnet/lib-types.h 
>> b/drivers/staging/lustre/include/linux/lnet/lib-types.h
>[]
>> @@ -112,7 +112,7 @@ typedef struct lnet_libhandle {
>>  } lnet_libhandle_t;
>>  
>>  #define lh_entry(ptr, type, member) \
>> -((type *)((char *)(ptr)-(char *)(&((type *)0)->member)))
>> +((type *)((char *)(ptr) - (char *)(&((type *)0)->member)))
>
>This could use offsetof(type, member)

Will send a later patch to cover this.
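
For reference, a minimal userspace sketch of what the offsetof() variant could look like. The struct names, the lh_entry_sketch macro and container_from_handle() are hypothetical stand-ins, not the Lustre definitions.

```c
#include <stddef.h>

/* Hypothetical stand-in for a handle embedded in a larger object. */
struct handle {
	unsigned long cookie;
};

/* Hypothetical container; only the layout matters here. */
struct container {
	int id;
	struct handle lh;
};

/* Same result as the open-coded null-pointer arithmetic, via offsetof(). */
#define lh_entry_sketch(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Recover the enclosing container from a pointer to its embedded handle. */
struct container *container_from_handle(struct handle *h)
{
	return lh_entry_sketch(h, struct container, lh);
}
```

The macro subtracts the member's byte offset from the member's address, which is the same arithmetic as before without dereferencing a null pointer expression.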

Re: [lxc-devel] CGroup Namespaces (v10)

2016-02-12 Thread Serge E. Hallyn

On Fri, Feb 12, 2016 at 11:09:06AM -0500, Tejun Heo wrote:
> Hello,
> 
> On Fri, Feb 12, 2016 at 12:18:28AM +0100, Alban Crequy wrote:
> > I just noticed commit c38c4597e4bf ("netfilter: implement xt_cgroup
> > cgroup2 path match") which, as far as I understand, introduces a new
> > userland facing API containing the full cgroup path. Does it mean that
> > the cgroupns patchset should include cgroup path translation in
> > xt_cgroup?
> 
> I don't think so.  None of netfilter configuration is namespaced in
> any way.  They're system-global by nature.

I assume at some point you'll want the set ported onto for-4.6 or
linux-next?  My 2016-02-03/cgns set still cherry-picks cleanly onto
for-4.6 at the moment, but I haven't tried linux-next, and I haven't
done build+test since 4.5-rc1 came out.

thanks,
-serge

[PATCH V2] AHCI: Workaround for ThunderX Errata#22536

2016-02-12 Thread tchalamarla

From: Tirumalesh Chalamarla 

Due to Errata in ThunderX, HOST_IRQ_STAT should be
cleared before leaving the interrupt handler.
The patch attempts to satisfy the need.

Changes from V1:
- Rebased on top of libata/for-4.6
- Moved ThunderX intr handler to new file

Signed-off-by: Tirumalesh Chalamarla 
---
 drivers/ata/Makefile|  2 +-
 drivers/ata/ahci.c  |  3 ++
 drivers/ata/ahci.h  |  1 +
 drivers/ata/ahci_thunderx.c | 73 +
 4 files changed, 78 insertions(+), 1 deletion(-)
 create mode 100644 drivers/ata/ahci_thunderx.c

diff --git a/drivers/ata/Makefile b/drivers/ata/Makefile
index 1857952..a36e70d 100644
--- a/drivers/ata/Makefile
+++ b/drivers/ata/Makefile
@@ -2,7 +2,7 @@
 obj-$(CONFIG_ATA)  += libata.o
 
 # non-SFF interface
-obj-$(CONFIG_SATA_AHCI)+= ahci.o libahci.o
+obj-$(CONFIG_SATA_AHCI)+= ahci.o libahci.o ahci_thunderx.o
 obj-$(CONFIG_SATA_ACARD_AHCI)  += acard-ahci.o libahci.o
 obj-$(CONFIG_SATA_AHCI_PLATFORM) += ahci_platform.o libahci.o 
libahci_platform.o
 obj-$(CONFIG_SATA_FSL) += sata_fsl.o
diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index 546a369..76e310e 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -1560,6 +1560,9 @@ static int ahci_init_one(struct pci_dev *pdev, const 
struct pci_device_id *ent)
if (ahci_broken_devslp(pdev))
hpriv->flags |= AHCI_HFLAG_NO_DEVSLP;
 
+   if (pdev->vendor == 0x177d && pdev->device == 0xa01c)
+   ahci_thunderx_init(&pdev->dev, hpriv);
+
/* save initial config */
ahci_pci_save_initial_config(pdev, hpriv);
 
diff --git a/drivers/ata/ahci.h b/drivers/ata/ahci.h
index 167ba7e..77ae20d 100644
--- a/drivers/ata/ahci.h
+++ b/drivers/ata/ahci.h
@@ -425,6 +425,7 @@ void ahci_print_info(struct ata_host *host, const char 
*scc_s);
 int ahci_host_activate(struct ata_host *host, struct scsi_host_template *sht);
 void ahci_error_handler(struct ata_port *ap);
 u32 ahci_handle_port_intr(struct ata_host *host, u32 irq_masked);
+void ahci_thunderx_init(struct device *dev, struct ahci_host_priv *hpriv);
 
 static inline void __iomem *__ahci_port_base(struct ata_host *host,
 unsigned int port_no)
diff --git a/drivers/ata/ahci_thunderx.c b/drivers/ata/ahci_thunderx.c
new file mode 100644
index 000..223e170
--- /dev/null
+++ b/drivers/ata/ahci_thunderx.c
@@ -0,0 +1,73 @@
+/*
+ * SATA glue for Cavium Thunder SOCs.
+ *
+ *
+ * This file is subject to the terms and conditions of the GNU General Public
+ * License.  See the file "COPYING" in the main directory of this archive
+ * for more details.
+ *
+ * Copyright (C) 2010-2016 Cavium Networks
+ *
+ */
+
+#include 
+#include "ahci.h"
+#include "libata.h"
+
+static irqreturn_t ahci_thunderx_irq_intr(int irq, void *dev_instance)
+{
+   struct ata_host *host = dev_instance;
+   struct ahci_host_priv *hpriv;
+   unsigned int rc = 0;
+   void __iomem *mmio;
+   u32 irq_stat, irq_masked;
+   unsigned int handled = 1;
+
+   VPRINTK("ENTER\n");
+
+   hpriv = host->private_data;
+   mmio = hpriv->mmio;
+
+   /* sigh.  0xffffffff is a valid return from h/w */
+   irq_stat = readl(mmio + HOST_IRQ_STAT);
+   if (!irq_stat)
+   return IRQ_NONE;
+redo:
+
+   irq_masked = irq_stat & hpriv->port_map;
+
+   spin_lock(&host->lock);
+
+   rc = ahci_handle_port_intr(host, irq_masked);
+
+   if (!rc)
+   handled = 0;
+
+   writel(irq_stat, mmio + HOST_IRQ_STAT);
+
+   /* Due to ERRATA#22536, ThunderX need to handle
+* HOST_IRQ_STAT differently.
+* Work around is to make sure all pending IRQs
+* are served before leaving handler
+*/
+   irq_stat = readl(mmio + HOST_IRQ_STAT);
+
+   spin_unlock(&host->lock);
+
+   if (irq_stat)
+   goto redo;
+
+   VPRINTK("EXIT\n");
+
+   return IRQ_RETVAL(handled);
+}
+
+void ahci_thunderx_init(struct device *dev, struct ahci_host_priv *hpriv)
+{
+   hpriv->irq_handler = ahci_thunderx_irq_intr;
+}
+EXPORT_SYMBOL_GPL(ahci_thunderx_init);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Cavium, Inc. ");
+MODULE_DESCRIPTION("Cavium Inc. ThunderX sata config.");
-- 
2.1.0
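
The "clear, then re-read until nothing is pending" pattern in the handler above can be modelled in plain userspace C. This is a simplified sketch under stated assumptions: struct fake_hba, handle_ports() and fake_irq_handler() are invented for illustration, the register is an ordinary variable rather than MMIO, there is no locking, and a do/while loop replaces the goto.

```c
#include <stdint.h>

/* Hypothetical model of a host interrupt status register. */
struct fake_hba {
	uint32_t irq_stat;   /* currently pending port bits */
	uint32_t late_bits;  /* bits raised while the handler is running */
	unsigned int served; /* number of port bits handled so far */
};

/* Count set bits; stands in for per-port interrupt handling. */
static unsigned int handle_ports(uint32_t mask)
{
	unsigned int n = 0;

	for (; mask; mask &= mask - 1)
		n++;
	return n;
}

/* Returns 1 if anything was handled, 0 if the interrupt was not ours. */
int fake_irq_handler(struct fake_hba *hba)
{
	uint32_t stat = hba->irq_stat;          /* models the first readl() */

	if (!stat)
		return 0;

	do {
		hba->served += handle_ports(stat);
		hba->irq_stat &= ~stat;         /* models write-1-to-clear */

		/* Simulate an interrupt arriving during handling. */
		hba->irq_stat |= hba->late_bits;
		hba->late_bits = 0;

		stat = hba->irq_stat;           /* re-read before leaving */
	} while (stat);

	return 1;
}
```

The point of the loop is that a bit raised between the clear and the return is picked up in the same invocation instead of being left pending when the handler exits.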

[PATCH] media: au0828 set ctrl_input in au0828_s_input()

2016-02-12 Thread Shuah Khan

dev->ctrl_input is set in vidioc_s_input() and
doesn't get set in au0828_s_input(). As a result,
dev->ctrl_input is left uninitialized until user
space calls s_input. It works correctly because
the default input value is 0, which is what
dev->ctrl_input is initialized to by kzalloc().

Change to set dev->ctrl_input in au0828_s_input().
Also optimize vidioc_s_input() to return if the
new input value is same as the current.

Signed-off-by: Shuah Khan 
---
 drivers/media/usb/au0828/au0828-video.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/media/usb/au0828/au0828-video.c 
b/drivers/media/usb/au0828/au0828-video.c
index 9304f96..20696a4 100644
--- a/drivers/media/usb/au0828/au0828-video.c
+++ b/drivers/media/usb/au0828/au0828-video.c
@@ -1345,9 +1345,11 @@ static void au0828_s_input(struct au0828_dev *dev, int 
index)
default:
dprintk(1, "unknown input type set [%d]\n",
AUVI_INPUT(index).type);
-   break;
+   return;
}
 
+   dev->ctrl_input = index;
+
v4l2_device_call_all(&dev->v4l2_dev, 0, video, s_routing,
AUVI_INPUT(index).vmux, 0, 0);
 
@@ -1386,7 +1388,10 @@ static int vidioc_s_input(struct file *file, void *priv, 
unsigned int index)
return -EINVAL;
if (AUVI_INPUT(index).type == 0)
return -EINVAL;
-   dev->ctrl_input = index;
+
+   if (dev->ctrl_input == index)
+   return 0;
+
au0828_s_input(dev, index);
return 0;
 }
@@ -1901,6 +1906,7 @@ int au0828_analog_register(struct au0828_dev *dev,
dev->ctrl_ainput = 0;
dev->ctrl_freq = 960;
dev->std = V4L2_STD_NTSC_M;
+   /* Default input is TV Tuner */
au0828_s_input(dev, 0);
 
mutex_init(&dev->vb_queue_lock);
-- 
2.5.0

Re: [PATCH 0/3] cpufreq: Replace timers with utilization update callbacks

2016-02-12 Thread Rafael J. Wysocki

On Fri, Feb 12, 2016 at 6:02 PM, Doug Smythies  wrote:
> On 2016.02.12 08:01 Rafael J. Wysocki wrote:
>> On Fri, Feb 12, 2016 at 3:10 PM, Peter Zijlstra  wrote:
>>> On Thu, Feb 11, 2016 at 10:52:20AM -0800, Steve Muckle wrote:
 On 02/11/2016 09:30 AM, Peter Zijlstra wrote:
>> My concern above is that pokes are guaranteed to keep occurring when
>> there is only RT or DL activity so nothing breaks.
>
> The hook in their respective tick handler should ensure stuff is called
> sporadically and isn't stalled.

 But that's only true if the RT/DL tasks happen to be running when the
 tick arrives right?

 Couldn't we have RT/DL activity which doesn't overlap with the tick? And
 if no CFS tasks happen to be executing on that CPU, we'll never trigger
 the cpufreq update. This could go on for an arbitrarily long time
 depending on the periodicity of the work.
>>>
>>> Possible yes, but why do we care? Such a CPU would be so much idle that
>>> cpufreq doesn't matter one way or another, right?
>
>> Well, in theory you can get 50% or so of the time active in bursts
>> that happen to fit between ticks.  If we happen to do those in the
>> lowest P-state, we may burn more energy than necessary on platforms
>> where more idle is preferred.
>
> I believe this happens considerably more often than is commonly thought,
> and is the exact reason I was opposed to the introduction of the
> "duration" method into the intel_pstate driver in the first
> place. The probability of occurrence (of a relatively busy CPU being idle
> on jiffy boundaries) is very use dependent, occurring more on desktops than
> servers, and sometimes more with video frame rate based tasks. Data to support
> my claim is a couple of years old and not very complete, but I see the issue
> often on trace data acquired from desktop users on bugzilla reports.

The approach with update callbacks from the scheduler should not be
affected by this, because it takes updates not only at the tick time,
but also on other scheduler events.

Thanks,
Rafael

Re: [BUG] random kernel crashes after THP rework on s390 (maybe also on PowerPC and ARM)

2016-02-12 Thread Kirill A. Shutemov

On Fri, Feb 12, 2016 at 06:16:40PM +0100, Gerald Schaefer wrote:
> On Fri, 12 Feb 2016 16:57:27 +0100
> Christian Borntraeger  wrote:
> 
> > On 02/12/2016 04:41 PM, Kirill A. Shutemov wrote:
> > > On Thu, Feb 11, 2016 at 08:57:02PM +0100, Gerald Schaefer wrote:
> > >> On Thu, 11 Feb 2016 21:09:42 +0200
> > >> "Kirill A. Shutemov"  wrote:
> > >>
> > >>> On Thu, Feb 11, 2016 at 07:22:23PM +0100, Gerald Schaefer wrote:
> >  Hi,
> > 
> >  Sebastian Ott reported random kernel crashes beginning with v4.5-rc1 
> >  and
> >  he also bisected this to commit 61f5d698 "mm: re-enable THP". Further
> >  review of the THP rework patches, which cannot be bisected, revealed
> >  commit fecffad "s390, thp: remove infrastructure for handling 
> >  splitting PMDs"
> >  (and also similar commits for other archs).
> > 
> >  This commit removes the THP splitting bit and also the architecture
> >  implementation of pmdp_splitting_flush(), which took care of the IPI 
> >  for
> >  fast_gup serialization. The commit message says
> > 
> >  pmdp_splitting_flush() is not needed too: on splitting PMD we will 
> >  do
> >  pmdp_clear_flush() + set_pte_at().  pmdp_clear_flush() will do IPI 
> >  as
> >  needed for fast_gup
> > 
> >  The assumption that a TLB flush will also produce an IPI is wrong on 
> >  s390,
> >  and maybe also on other architectures, and I thought that this was 
> >  actually
> >  the main reason for having an arch-specific pmdp_splitting_flush().
> > 
> >  At least PowerPC and ARM also had an individual implementation of
> >  pmdp_splitting_flush() that used kick_all_cpus_sync() instead of a TLB
> >  flush to send the IPI, and those were also removed. Putting the arch
> >  maintainers and mailing lists on cc to verify.
> > 
> >  On s390 this will break the IPI serialization against fast_gup, which
> >  would certainly explain the random kernel crashes, please revert or fix
> >  the pmdp_splitting_flush() removal.
> > >>>
> > >>> Sorry for that.
> > >>>
> > >>> I believe, the problem was already addressed for PowerPC:
> > >>>
> > >>> http://lkml.kernel.org/g/454980831-16631-1-git-send-email-aneesh.ku...@linux.vnet.ibm.com
> > >>>
> > >>> I think kick_all_cpus_sync() in arch-specific pmdp_invalidate() would do
> > >>> the trick, right?
> > >>
> > >> Hmm, not sure about that. After pmdp_invalidate(), a pmd_none() check in
> > >> fast_gup will still return false, because the pmd is not empty (at least
> > >> on s390). So I don't see spontaneously how it will help fast_gup to break
> > >> out to the slow path in case of THP splitting.
> > > 
> > > What pmdp_flush_direct() does in pmdp_invalidate()? It's hard to unwrap 
> > > for me :-/
> > > Does it make the pmd !pmd_present()?
> > 
> > It uses the idte instruction, which in an atomic fashion flushes the 
> > associated
> > TLB entry and changes the value of the pmd entry to invalid. This comes 
> > from the
> > HW requirement to not  change a PTE/PMD that might be still in use, other 
> > than 
> > with special instructions that does the tlb handling and the invalidation 
> > together.
> 
> Correct, and it does _not_ make the pmd !pmd_present(), that would only be the
> case after a _clear_flush(). It only marks the pmd as invalid and flushes,
> so that it cannot generate a new TLB entry before the following 
> pmd_populate(),
> but it keeps its other content. This is to fulfill the requirements outlined 
> in
> the comment in mm/huge_memory.c before the call to pmdp_invalidate(). And
> independent from that comment, we would need such an _invalidate() or
> _clear_flush() on s390 before the pmd_populate() because of the HW details
> that Christian described.
> 
> Reading the comment again, I do now notice that it also says "mark the current
> pmd notpresent", which we cannot do w/o losing the huge and (formerly) 
> splitting
> bits, but it also shouldn't be needed to provide the "single TLB guarantee" 
> that
> is required from the comment. So, a pmd_present() check on s390 in this state
> would still return true. Not sure yet if this is a problem, need more 
> thinking,
> this behavior was already present before the THP rework but maybe it was OK
> before and is not OK now.
> 
> At least for fast_gup this should not be a problem though.

I'm trying to wrap my head around the issue and I don't think missing
serialization with gup_fast is the cause -- we just don't need it
anymore.

Previously, __split_huge_page_splitting() required serialization against
gup_fast to make sure nobody can obtain new reference to the page after
__split_huge_page_splitting() returns. This was a way to stabilize page
references before starting to distribute them from head page to tail
pages.

With new refcounting, we don't care about this. Splitting PMD is now
decoupled from splitting underlying compound page. It's okay to get new
pins

[PATCH] lib/ucs2_string: Correct ucs2 -> utf8 conversion

2016-02-12 Thread Jason Andryuk

The comparisons should be >= since 0x800 and 0x80 require an additional bit
to store.

For the 3 byte case, the existing shift would drop off 2 more bits than
intended.

For the 2 byte case, there should be 5 bits in byte 1, and 6 bits in
byte 2.

Signed-off-by: Jason Andryuk 
---

Tested in user space, but not in the kernel.  Conversions now match
python's unicode conversions.

 lib/ucs2_string.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/lib/ucs2_string.c b/lib/ucs2_string.c
index 17dd74e..f0b323a 100644
--- a/lib/ucs2_string.c
+++ b/lib/ucs2_string.c
@@ -59,9 +59,9 @@ ucs2_utf8size(const ucs2_char_t *src)
for (i = 0; i < ucs2_strlen(src); i++) {
u16 c = src[i];
 
-   if (c > 0x800)
+   if (c >= 0x800)
j += 3;
-   else if (c > 0x80)
+   else if (c >= 0x80)
j += 2;
else
j += 1;
@@ -88,19 +88,19 @@ ucs2_as_utf8(u8 *dest, const ucs2_char_t *src, unsigned 
long maxlength)
for (i = 0; maxlength && i < limit; i++) {
u16 c = src[i];
 
-   if (c > 0x800) {
+   if (c >= 0x800) {
if (maxlength < 3)
break;
maxlength -= 3;
dest[j++] = 0xe0 | (c & 0xf000) >> 12;
-   dest[j++] = 0x80 | (c & 0x0fc0) >> 8;
+   dest[j++] = 0x80 | (c & 0x0fc0) >> 6;
dest[j++] = 0x80 | (c & 0x003f);
-   } else if (c > 0x80) {
+   } else if (c >= 0x80) {
if (maxlength < 2)
break;
maxlength -= 2;
-   dest[j++] = 0xc0 | (c & 0xfe0) >> 5;
-   dest[j++] = 0x80 | (c & 0x01f);
+   dest[j++] = 0xc0 | (c & 0x7c0) >> 6;
+   dest[j++] = 0x80 | (c & 0x03f);
} else {
maxlength -= 1;
dest[j++] = c & 0x7f;
-- 
2.4.3
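
The corrected bit layout can be tried out in userspace with a single-character encoder. This is a sketch that mirrors the masks and shifts from the patch; ucs2_char_to_utf8() is a hypothetical name, not a function in lib/ucs2_string.c.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Encode one UCS-2 code unit as UTF-8 (userspace sketch, not the kernel
 * function). Returns the number of bytes written to out (1 to 3).
 */
size_t ucs2_char_to_utf8(uint16_t c, uint8_t out[3])
{
	if (c >= 0x800) {
		/* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx (4 + 6 + 6 bits) */
		out[0] = (uint8_t)(0xe0 | ((c & 0xf000) >> 12));
		out[1] = (uint8_t)(0x80 | ((c & 0x0fc0) >> 6));
		out[2] = (uint8_t)(0x80 | (c & 0x003f));
		return 3;
	}
	if (c >= 0x80) {
		/* 2 bytes: 110xxxxx 10xxxxxx (5 + 6 bits) */
		out[0] = (uint8_t)(0xc0 | ((c & 0x07c0) >> 6));
		out[1] = (uint8_t)(0x80 | (c & 0x003f));
		return 2;
	}
	/* 1 byte: plain ASCII */
	out[0] = (uint8_t)(c & 0x7f);
	return 1;
}
```

The boundary values 0x80 and 0x800 are the interesting cases: each is the first code point that no longer fits in the shorter encoding, which is why the comparisons need to be >= rather than >.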

Re: [PATCH 0/3] cpufreq: Replace timers with utilization update callbacks

2016-02-12 Thread Rafael J. Wysocki

On Fri, Feb 12, 2016 at 5:53 PM, Ashwin Chaugule
 wrote:
> On 12 February 2016 at 11:15, Rafael J. Wysocki  wrote:
>> On Fri, Feb 12, 2016 at 5:01 PM, Rafael J. Wysocki  wrote:
>>> On Fri, Feb 12, 2016 at 3:10 PM, Peter Zijlstra  
>>> wrote:
 On Thu, Feb 11, 2016 at 10:52:20AM -0800, Steve Muckle wrote:
> On 02/11/2016 09:30 AM, Peter Zijlstra wrote:
> >> My concern above is that pokes are guaranteed to keep occurring when
> >> > there is only RT or DL activity so nothing breaks.
> >
> > The hook in their respective tick handler should ensure stuff is called
> > sporadically and isn't stalled.
>
> But that's only true if the RT/DL tasks happen to be running when the
> tick arrives right?
>
> Couldn't we have RT/DL activity which doesn't overlap with the tick? And
> if no CFS tasks happen to be executing on that CPU, we'll never trigger
> the cpufreq update. This could go on for an arbitrarily long time
> depending on the periodicity of the work.

 Possible yes, but why do we care? Such a CPU would be so much idle that
 cpufreq doesn't matter one way or another, right?
>>>
>>> Well, in theory you can get 50% or so of the time active in bursts
>>> that happen to fit between ticks.  If we happen to do those in the
>>> lowest P-state, we may burn more energy than necessary on platforms
>>> where more idle is preferred.
>>
>> At least intel_pstate should be able to figure out which P-state to
>> use then on the APERF/MPERF basis.
>
> Speaking for the generic case, it would be great to make use of such
> feedback counters for selecting the next freq request. Use (num of
> cycles used/total cycles) to figure out %ON time for the CPU. I
> understand its not the goal for this patch series, but in the future
> if we can do this in your callbacks where possible, then I think we
> will do better than Ondemand.

Yes, we can do that at least in principle.  intel_pstate is a proof of that.

Thanks,
Rafael
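
The "(num of cycles used/total cycles)" idea mentioned above could be sketched as follows. This is a generic illustration, not how intel_pstate or any existing governor computes load; on_time_percent() and both counter parameters are hypothetical.

```c
#include <stdint.h>

/*
 * Percentage of a sampling window during which a CPU was busy, from two
 * hypothetical counters: cycles_used (cycles spent executing) and
 * cycles_total (all cycles in the window). Returns 0 for an empty window.
 * Note: the multiplication can overflow for extremely large counts.
 */
unsigned int on_time_percent(uint64_t cycles_used, uint64_t cycles_total)
{
	if (cycles_total == 0)
		return 0;
	if (cycles_used > cycles_total)
		cycles_used = cycles_total;   /* clamp inconsistent samples */
	return (unsigned int)(cycles_used * 100 / cycles_total);
}
```

Because the counters keep running between ticks, a ratio like this would still reflect bursts of activity that never overlap with a tick.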

[PATCH] sched/deadline: Always calculate end of period on sched_yield()

2016-02-12 Thread Steven Rostedt

I'm writing a test case for SCHED_DEADLINE, and noticed a strange
anomaly. Every so often, a deadline is missed and when I looked into
it, it happened because the sched_yield() had no effect (it didn't end
the previous period and let the start of the next runtime happen on the
end of the old period).

deadline-22287...1   116.778420: sys_enter_sched_yield: 
deadline-22287d..3   116.778421: hrtimer_cancel:   
hrtimer=0x88011ebd79a0
deadline-22287d..2   116.778422: rcu_utilization:  Start context switch
deadline-22287d..2   116.778423: rcu_utilization:  End context switch
deadline-22287d..4   116.778423: hrtimer_start:
hrtimer=0x88011ebd79a0 function=hrtick/0x0 expires=116124420428 
softexpires=116124420428
deadline-22287...1   116.778425: sys_exit_sched_yield: 0x0


Schedule was never called. I added some trace_printks() and discovered
that this happens when sched_yield() is called right after a tick that
updates its current bandwidth.

When the schedule tick happens that updates the current bandwidth,
update_curr_dl() is called, where it updates curr->se.exec_start to
rq_clock_task(rq).

The rq_clock_task(rq) value gets updated by update_rq_clock_task(), which
is called from various points in the scheduler.

Now, if the user task calls sched_yield() just after a bandwidth update
synced curr->se.exec_start to rq_clock_task(rq), when sched_yield()
calls into update_curr_dl() we have:

	delta_exec = rq_clock_task(rq) - curr->se.exec_start;
	if (unlikely((s64)delta_exec <= 0))
		return;

Coming in here from a sched_yield() will have delta_exec == 0 if the
sched_yield() was called after a DL tick and before another
update_rq_clock_task() is called.

This means that the task will not release its remaining runtime, and
it will start off in the current period when it expected to be in the
next period.

The fix that appears to work for me is to add a test in
update_curr_dl() to not exit if delta_exec is zero and
dl_se->dl_yielded is true.
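
The early return described above can be modeled in user space. This is
only a minimal sketch under stated assumptions: struct model_dl_entity
and model_update_curr_dl() are hypothetical names, not kernel API, and
the runtime accounting is reduced to the bare minimum needed to show
why a zero delta_exec makes the yield a no-op.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the deadline entity state. */
struct model_dl_entity {
	uint64_t exec_start;	/* last time runtime was accounted */
	int64_t runtime;	/* remaining runtime in this period */
	bool dl_yielded;	/* set by the yield path before the update */
};

/*
 * Model of the check at the top of update_curr_dl(). Returns true if
 * accounting proceeded past the early return. With honor_yield == false
 * the model behaves like the code before the patch.
 */
bool model_update_curr_dl(struct model_dl_entity *dl_se,
			  uint64_t clock_task, bool honor_yield)
{
	int64_t delta_exec;

	assert(dl_se != NULL);

	delta_exec = (int64_t)(clock_task - dl_se->exec_start);
	if (delta_exec <= 0 && !(honor_yield && dl_se->dl_yielded))
		return false;	/* nothing accounted, the yield is ignored */

	dl_se->exec_start = clock_task;
	dl_se->runtime -= delta_exec;
	if (dl_se->dl_yielded)
		dl_se->runtime = 0;	/* in this model: drop the rest of the period */
	return true;
}
```

Calling the model with clock_task equal to exec_start and dl_yielded
set returns false when honor_yield is false (the reported behavior) and
true when it is set (the behavior with the extra test).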

Signed-off-by: Steven Rostedt 
---
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index cd64c979d0e1..1dd180cda574 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -735,7 +735,7 @@ static void update_curr_dl(struct rq *rq)
 * approach need further study.
 */
delta_exec = rq_clock_task(rq) - curr->se.exec_start;
-   if (unlikely((s64)delta_exec <= 0))
+   if (unlikely((s64)delta_exec <= 0 && !dl_se->dl_yielded))
return;
 
schedstat_set(curr->se.statistics.exec_max,

Re: [PATCH v2 3/9] ACPI: introduce acpi_table_parse2()

2016-02-12 Thread Rafael J. Wysocki

On Fri, Feb 12, 2016 at 7:51 PM, Greg Kroah-Hartman
 wrote:
> On Fri, Feb 12, 2016 at 08:43:34PM +0300, Aleksey Makarov wrote:
>> The function acpi_table_parse() has some problems:
>> 1 It can be called only from __init code
>> 2 It does not pass any data to the handler
>> 3 It just throws out the value returned from the handler
>>
>> These issues are addressed in this patch
>
> Why not just fix acpi_table_parse(), like you have, and not add a new
> API call with a "2" at the end of it.  That seems crazy to try to
> maintain that level of apis.
>
> But I'm not the acpi maintainer(s), so it's their call...

The ACPI maintainer agrees.

Thanks,
Rafael

Re: [PATCH v2 3/9] ACPI: introduce acpi_table_parse2()

2016-02-12 Thread Rafael J. Wysocki

On Fri, Feb 12, 2016 at 6:43 PM, Aleksey Makarov
 wrote:
> The function acpi_table_parse() has some problems:
> 1 It can be called only from __init code
> 2 It does not pass any data to the handler
> 3 It just throws out the value returned from the handler

So why are those problems?

> These issues are addressed in this patch

How are they addressed?

Thanks,
Rafael

Applied "regulator: ltc3589: Make IRQ optional" to the regulator tree

2016-02-12 Thread Mark Brown

The patch

   regulator: ltc3589: Make IRQ optional

has been applied to the regulator tree at

   git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator.git 

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.  

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark

From d4930cf0ae33e944427f974f33bc43b8e6c56456 Mon Sep 17 00:00:00 2001
From: Bernhard Walle 
Date: Wed, 10 Feb 2016 21:37:30 +0100
Subject: [PATCH] regulator: ltc3589: Make IRQ optional

It's perfectly valid to use the LTC3589 without an interrupt pin
connected to it. Currently, the driver probing fails when client->irq
is 0 (which means "no interrupt"). Don't register the interrupt
handler in that case but successfully finish the device probing instead.
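
The probe logic described above can be sketched outside the kernel.
This is a minimal illustration, assuming a hypothetical struct
model_client and a hypothetical registration hook in place of the real
i2c client and devm_request_threaded_irq(); it only shows the control
flow where an IRQ of 0 skips registration and probing still succeeds.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the i2c client; irq == 0 means "no interrupt". */
struct model_client {
	int irq;
};

/* Hypothetical registration hook: 0 on success, negative errno on failure. */
typedef int (*model_request_irq_fn)(int irq);

/* Model of the probe tail: register the handler only if an IRQ is wired up. */
int model_probe(const struct model_client *client,
		model_request_irq_fn request_irq)
{
	int ret;

	assert(client != NULL && request_irq != NULL);

	if (client->irq) {
		ret = request_irq(client->irq);
		if (ret)
			return ret;	/* a wired but failing IRQ is still an error */
	}
	return 0;
}

/* Example hook for this model only: rejects non-positive IRQ numbers. */
int model_request_irq(int irq)
{
	return irq > 0 ? 0 : -EINVAL;
}
```

With this model, a client with irq == 0 probes successfully without
the hook ever being called, while a registration failure for a nonzero
IRQ is still propagated to the caller.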

Signed-off-by: Bernhard Walle 
Signed-off-by: Mark Brown 
---
 drivers/regulator/ltc3589.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/regulator/ltc3589.c b/drivers/regulator/ltc3589.c
index 972c386b2690..47bef328fb58 100644
--- a/drivers/regulator/ltc3589.c
+++ b/drivers/regulator/ltc3589.c
@@ -520,12 +520,15 @@ static int ltc3589_probe(struct i2c_client *client,
}
}
 
-   ret = devm_request_threaded_irq(dev, client->irq, NULL, ltc3589_isr,
-   IRQF_TRIGGER_LOW | IRQF_ONESHOT,
-   client->name, ltc3589);
-   if (ret) {
-   dev_err(dev, "Failed to request IRQ: %d\n", ret);
-   return ret;
+   if (client->irq) {
+   ret = devm_request_threaded_irq(dev, client->irq, NULL,
+   ltc3589_isr,
+   IRQF_TRIGGER_LOW | IRQF_ONESHOT,
+   client->name, ltc3589);
+   if (ret) {
+   dev_err(dev, "Failed to request IRQ: %d\n", ret);
+   return ret;
+   }
}
 
return 0;
-- 
2.7.0

[PATCH RT 2/6] kernel: migrate_disable() do fastpath in atomic & irqs-off

2016-02-12 Thread Sebastian Andrzej Siewior

With interrupts off it makes no sense to do the long path since we can't
leave the CPU anyway. Also we might end up in a recursion with lockdep.
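
A minimal user-space sketch of the condition, assuming a hypothetical
struct model_task in which a nonzero preempt count stands in for
in_atomic() and a flag stands in for irqs_disabled(). It is not the
kernel implementation, only an illustration of which path is taken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-task state; the field names only loosely follow the patch. */
struct model_task {
	int preempt_count;		/* nonzero stands in for in_atomic() */
	bool irqs_off;			/* stands in for irqs_disabled() */
	int migrate_disable_atomic;	/* debug counter for the fast path */
	int migrate_disable;		/* nesting depth taken on the long path */
};

/* Returns true if the fast path was taken. */
bool model_migrate_disable(struct model_task *p)
{
	assert(p != NULL);

	if (p->preempt_count != 0 || p->irqs_off) {
		p->migrate_disable_atomic++;
		return true;
	}
	p->migrate_disable++;	/* long path: pin the task to its CPU */
	return false;
}
```

With interrupts off and a preempt count of zero, the model takes the
fast path, which is the case the patch adds.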

Signed-off-by: Sebastian Andrzej Siewior 
---
 kernel/sched/core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 1217926b500d..40c1e29416c0 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3157,7 +3157,7 @@ void migrate_disable(void)
 {
struct task_struct *p = current;
 
-   if (in_atomic()) {
+   if (in_atomic() || irqs_disabled()) {
 #ifdef CONFIG_SCHED_DEBUG
p->migrate_disable_atomic++;
 #endif
@@ -3188,7 +3188,7 @@ void migrate_enable(void)
 {
struct task_struct *p = current;
 
-   if (in_atomic()) {
+   if (in_atomic() || irqs_disabled()) {
 #ifdef CONFIG_SCHED_DEBUG
p->migrate_disable_atomic--;
 #endif
-- 
2.7.0

[PATCH RT 1/6] kernel: softirq: unlock with irqs on

2016-02-12 Thread Sebastian Andrzej Siewior

We unlock the lock while the interrupts are off. This isn't a problem
now but will become one because the migrate_disable() + enable are not
symmetrical in regard to the status of interrupts.

Signed-off-by: Sebastian Andrzej Siewior 
---
 kernel/softirq.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index d1e999e74d23..2ca63cc1469e 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -563,8 +563,10 @@ static void do_current_softirqs(void)
do_single_softirq(i);
}
softirq_clr_runner(i);
-   unlock_softirq(i);
WARN_ON(current->softirq_nestcnt != 1);
+   local_irq_enable();
+   unlock_softirq(i);
+   local_irq_disable();
}
 }
 
-- 
2.7.0

[PATCH RT 6/6] rcu: disable more spots of rcu_bh

2016-02-12 Thread Sebastian Andrzej Siewior

We don't use rcu_bh on -RT but we still fork a thread for it and keep it
as a flavour. No more.

Signed-off-by: Sebastian Andrzej Siewior 
---
 kernel/rcu/tree.c | 6 ++
 kernel/rcu/tree.h | 2 ++
 2 files changed, 8 insertions(+)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 5359091fecaa..64098d35de19 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -454,11 +454,13 @@ EXPORT_SYMBOL_GPL(rcu_batches_started_sched);
 /*
  * Return the number of RCU BH batches started thus far for debug & stats.
  */
+#ifndef CONFIG_PREEMPT_RT_FULL
 unsigned long rcu_batches_started_bh(void)
 {
return rcu_bh_state.gpnum;
 }
 EXPORT_SYMBOL_GPL(rcu_batches_started_bh);
+#endif
 
 /*
  * Return the number of RCU batches completed thus far for debug & stats.
@@ -563,9 +565,11 @@ void rcutorture_get_gp_data(enum rcutorture_type test_type, int *flags,
 	case RCU_FLAVOR:
 		rsp = rcu_state_p;
 		break;
+#ifndef CONFIG_PREEMPT_RT_FULL
 	case RCU_BH_FLAVOR:
 		rsp = &rcu_bh_state;
 		break;
+#endif
 	case RCU_SCHED_FLAVOR:
 		rsp = &rcu_sched_state;
 		break;
@@ -4695,7 +4699,9 @@ void __init rcu_init(void)
 
 	rcu_bootup_announce();
 	rcu_init_geometry();
+#ifndef CONFIG_PREEMPT_RT_FULL
 	rcu_init_one(&rcu_bh_state, &rcu_bh_data);
+#endif
 	rcu_init_one(&rcu_sched_state, &rcu_sched_data);
 	if (dump_tree)
 		rcu_dump_rcu_node_tree(&rcu_sched_state);
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 588509d94bbd..2ba8f6c2e81e 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -557,7 +557,9 @@ extern struct list_head rcu_struct_flavors;
  */
 extern struct rcu_state rcu_sched_state;
 
+#ifndef CONFIG_PREEMPT_RT_FULL
 extern struct rcu_state rcu_bh_state;
+#endif
 
 #ifdef CONFIG_PREEMPT_RCU
 extern struct rcu_state rcu_preempt_state;
-- 
2.7.0

[PATCH RT 5/6] kernel/stop_machine: partly revert "stop_machine: Use raw spinlocks"

2016-02-12 Thread Sebastian Andrzej Siewior

With completions using swait, and therefore raw locks, we don't need this anymore.
Further, bisect thinks this patch is responsible for:

|BUG: unable to handle kernel NULL pointer dereference at   (null)
|IP: [] sched_cpu_active+0x53/0x70
|PGD 0
|Oops:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
|Dumping ftrace buffer:
|   (ftrace buffer empty)
|Modules linked in:
|CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.4.1+ #330
|Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Debian-1.8.2-1 
04/01/2014
|task: 88013ae64b00 ti: 88013ae74000 task.ti: 88013ae74000
|RIP: 0010:[]  [] sched_cpu_active+0x53/0x70
|RSP: :88013ae77eb8  EFLAGS: 00010082
|RAX: 0001 RBX: 81c2cf20 RCX: 001050fb52fb
|RDX: 001050fb52fb RSI: 00105117ca1e RDI: 001c7723
|RBP:  R08:  R09: 0001
|R10:  R11: 0001 R12: 
|R13: 81c2cee0 R14:  R15: 0001
|FS:  () GS:88013b20() knlGS:
|CS:  0010 DS:  ES:  CR0: 8005003b
|CR2:  CR3: 01c09000 CR4: 06e0
|Stack:
| 810c446d 88013ae77f00 8107d8dd 000a
| 0001   
|  88013ae77f10 8107d90e 88013ae77f20
|Call Trace:
| [] ? debug_lockdep_rcu_enabled+0x1d/0x20
| [] ? notifier_call_chain+0x5d/0x80
| [] ? __raw_notifier_call_chain+0xe/0x10
| [] ? cpu_notify+0x23/0x40
| [] ? notify_cpu_starting+0x28/0x30

during hotplug. The rawlocks need to remain however.
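
The counting scheme that the restored completion relies on can be
sketched in user space. This is a single-threaded illustration under
stated assumptions: struct model_stop_done and both helpers are
hypothetical names, a plain flag stands in for struct completion, and
no waiter is actually blocked or woken.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for struct cpu_stop_done. */
struct model_stop_done {
	atomic_int nr_todo;	/* stoppers that still have to report */
	bool executed;		/* did at least one callback run? */
	bool completed;		/* stands in for the struct completion */
};

void model_init_done(struct model_stop_done *done, int nr_todo)
{
	assert(done != NULL);
	atomic_init(&done->nr_todo, nr_todo);
	done->executed = false;
	done->completed = false;
}

/* Returns true if this call was the one that fired the completion. */
bool model_signal_done(struct model_stop_done *done, bool executed)
{
	if (!done)
		return false;
	if (executed)
		done->executed = true;
	/* atomic_fetch_sub() returns the old value, so 1 means "now zero". */
	if (atomic_fetch_sub(&done->nr_todo, 1) == 1) {
		done->completed = true;
		return true;
	}
	return false;
}
```

Only the last reporter fires the completion, so the waiter does not
need to poll a waiter pointer as the removed wait_for_stop_done() did.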

Signed-off-by: Sebastian Andrzej Siewior 
---
 kernel/stop_machine.c | 40 ++++++++--------------------------------
 1 file changed, 8 insertions(+), 32 deletions(-)

diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 2c5acc882bad..f84d3b45cda7 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -30,7 +30,7 @@ struct cpu_stop_done {
 	atomic_t		nr_todo;	/* nr left to execute */
 	bool			executed;	/* actually executed? */
 	int			ret;		/* collected return value */
-	struct task_struct	*waiter;	/* woken when nr_todo reaches 0 */
+	struct completion	completion;	/* fired if nr_todo reaches 0 */
 };
 
 /* the actual stopper, one per every possible cpu, enabled on online cpus */
@@ -59,7 +59,7 @@ static void cpu_stop_init_done(struct cpu_stop_done *done, unsigned int nr_todo)
 {
 	memset(done, 0, sizeof(*done));
 	atomic_set(&done->nr_todo, nr_todo);
-	done->waiter = current;
+	init_completion(&done->completion);
 }
 
 /* signal completion unless @done is NULL */
@@ -68,10 +68,8 @@ static void cpu_stop_signal_done(struct cpu_stop_done *done, bool executed)
 	if (done) {
 		if (executed)
 			done->executed = true;
-		if (atomic_dec_and_test(&done->nr_todo)) {
-			wake_up_process(done->waiter);
-			done->waiter = NULL;
-		}
+		if (atomic_dec_and_test(&done->nr_todo))
+			complete(&done->completion);
 	}
 }
 
@@ -96,22 +94,6 @@ static void cpu_stop_queue_work(unsigned int cpu, struct 
cpu_stop_work *work)
raw_spin_unlock_irqrestore(>lock, flags);
 }
 
-static void wait_for_stop_done(struct cpu_stop_done *done)
-{
-	set_current_state(TASK_UNINTERRUPTIBLE);
-	while (atomic_read(&done->nr_todo)) {
-		schedule();
-		set_current_state(TASK_UNINTERRUPTIBLE);
-	}
-	/*
-	 * We need to wait until cpu_stop_signal_done() has cleared
-	 * done->waiter.
-	 */
-	while (done->waiter)
-		cpu_relax();
-	set_current_state(TASK_RUNNING);
-}
-
 /**
  * stop_one_cpu - stop a cpu
  * @cpu: cpu to stop
@@ -143,7 +125,7 @@ int stop_one_cpu(unsigned int cpu, cpu_stop_fn_t fn, void *arg)
 
 	cpu_stop_init_done(&done, 1);
 	cpu_stop_queue_work(cpu, &work);
-	wait_for_stop_done(&done);
+	wait_for_completion(&done.completion);
 	return done.executed ? done.ret : -ENOENT;
 }
 
@@ -302,7 +284,7 @@ int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void *
 
 	preempt_enable_nort();
 
-	wait_for_stop_done(&done);
+	wait_for_completion(&done.completion);
 
return done.executed ? done.ret : -ENOENT;
 }
@@ -364,7 +346,7 @@ static int __stop_cpus(const struct cpumask *cpumask,
 
 	cpu_stop_init_done(&done, cpumask_weight(cpumask));
 	queue_stop_cpus_work(cpumask, fn, arg, &done, false);
-	wait_for_stop_done(&done);
+	wait_for_completion(&done.completion);
return done.executed ? done.ret : -ENOENT;
 }
 
@@ -495,13 +477,7 @@ static void cpu_stopper_thread(unsigned int cpu)
  kallsyms_lookup((unsigned long)fn, NULL, NULL, NULL,
  ksym_buf), arg);

[PATCH RT 3/6] rtmutex: push down migrate_disable() into rt_spin_lock()

2016-02-12 Thread Sebastian Andrzej Siewior

No point in having the migrate disable/enable invocations in all the
macro/inlines. That's just more code for no win as we do a function
call anyway. Move it to the core code and save quite some text size.

    text      data       bss       dec  filename
11034127   3676912  14901248  29612287  vmlinux.before
10990437   3676848  14901248  29568533  vmlinux.after

~-40KiB

Signed-off-by: Sebastian Andrzej Siewior 
---
 include/linux/locallock.h   |  6 +++---
 include/linux/spinlock_rt.h | 25 +++-
 kernel/cpu.c|  4 ++--
 kernel/locking/lglock.c |  2 +-
 kernel/locking/rt.c |  2 --
 kernel/locking/rtmutex.c| 46 +
 6 files changed, 55 insertions(+), 30 deletions(-)

diff --git a/include/linux/locallock.h b/include/linux/locallock.h
index 339ba00adb9a..6fe5928fc2ab 100644
--- a/include/linux/locallock.h
+++ b/include/linux/locallock.h
@@ -43,9 +43,9 @@ struct local_irq_lock {
  * for CONFIG_PREEMPT_BASE map to the normal spin_* calls.
  */
 #ifdef CONFIG_PREEMPT_RT_FULL
-# define spin_lock_local(lock) rt_spin_lock(lock)
-# define spin_trylock_local(lock)  rt_spin_trylock(lock)
-# define spin_unlock_local(lock)   rt_spin_unlock(lock)
+# define spin_lock_local(lock) rt_spin_lock__no_mg(lock)
+# define spin_trylock_local(lock)  rt_spin_trylock__no_mg(lock)
+# define spin_unlock_local(lock)   rt_spin_unlock__no_mg(lock)
 #else
 # define spin_lock_local(lock) spin_lock(lock)
 # define spin_trylock_local(lock)  spin_trylock(lock)
diff --git a/include/linux/spinlock_rt.h b/include/linux/spinlock_rt.h
index f757096b230c..3b2825537531 100644
--- a/include/linux/spinlock_rt.h
+++ b/include/linux/spinlock_rt.h
@@ -18,6 +18,10 @@ do { \
__rt_spin_lock_init(slock, #slock, &__key); \
 } while (0)
 
+void __lockfunc rt_spin_lock__no_mg(spinlock_t *lock);
+void __lockfunc rt_spin_unlock__no_mg(spinlock_t *lock);
+int __lockfunc rt_spin_trylock__no_mg(spinlock_t *lock);
+
 extern void __lockfunc rt_spin_lock(spinlock_t *lock);
 extern unsigned long __lockfunc rt_spin_lock_trace_flags(spinlock_t *lock);
 extern void __lockfunc rt_spin_lock_nested(spinlock_t *lock, int subclass);
@@ -32,20 +36,16 @@ extern int atomic_dec_and_spin_lock(atomic_t *atomic, 
spinlock_t *lock);
  * lockdep-less calls, for derived types like rwlock:
  * (for trylock they can use rt_mutex_trylock() directly.
  */
+extern void __lockfunc __rt_spin_lock__no_mg(struct rt_mutex *lock);
 extern void __lockfunc __rt_spin_lock(struct rt_mutex *lock);
 extern void __lockfunc __rt_spin_unlock(struct rt_mutex *lock);
 extern int __lockfunc __rt_spin_trylock(struct rt_mutex *lock);
 
-#define spin_lock(lock)\
-   do {\
-   migrate_disable();  \
-   rt_spin_lock(lock); \
-   } while (0)
+#define spin_lock(lock)rt_spin_lock(lock)
 
 #define spin_lock_bh(lock) \
do {\
local_bh_disable(); \
-   migrate_disable();  \
rt_spin_lock(lock); \
} while (0)
 
@@ -56,24 +56,19 @@ extern int __lockfunc __rt_spin_trylock(struct rt_mutex 
*lock);
 #define spin_trylock(lock) \
 ({ \
int __locked;   \
-   migrate_disable();  \
__locked = spin_do_trylock(lock);   \
-   if (!__locked)  \
-   migrate_enable();   \
__locked;   \
 })
 
 #ifdef CONFIG_LOCKDEP
 # define spin_lock_nested(lock, subclass)  \
do {\
-   migrate_disable();  \
rt_spin_lock_nested(lock, subclass);\
} while (0)
 
 #define spin_lock_bh_nested(lock, subclass)\
do {\
local_bh_disable(); \
-   migrate_disable();  \
rt_spin_lock_nested(lock, subclass);\
} while (0)
 
@@ -81,7 +76,6 @@ extern int __lockfunc __rt_spin_trylock(struct rt_mutex 
*lock);
do { \
typecheck(unsigned long, flags); \
flags = 0;   \
-   migrate_disable();   \
rt_spin_lock_nested(lock, subclass); \
} while (0)
 #else
@@ -117,16 +111,11 @@ static inline unsigned long 
spin_lock_trace_flags(spinlock_t *lock)
 /*

Re: [PATCH v4] regulator: qcom-saw: Add support for SAW regulators

2016-02-12 Thread Mark Brown

On Thu, Feb 11, 2016 at 04:17:55PM -0800, Stephen Boyd wrote:
> On 02/11, Georgi Djakov wrote:

> > 8064 uses SSBI instead of SPMI and we currently do not have any
> > existing regulator support upstream yet. So this driver is not
> > duplicating any existing regulator. We should decide whether to
> > keep this driver or to replace it with a new ssbi-regulator driver
> > and bindings instead, where we can avoid the split-bus fun at
> > least to some extent. Maybe the latter is the better option?

> Yes I think having an ssbi/spmi regulator driver may be a better
> approach. The SAW code can monitor the regulator for voltage
> changes with a notifier and then stick the restore voltage into
> the SAW registers. There's only one sticking point below.

So this is sounding like we want to drop this driver?

> modifies the SAW registers to set the voltage on the CPU that is
> using the regulator, thereby preventing the CPU from going idle
> or hitting suspend when the voltage is changed. If we were to use
> the SSBI/SPMI regulator driver we would need to do something
> similar so that the SPM is guaranteed to not be running during
> the voltage switch. So I guess schedule a work on the CPU that's
> affected by the voltage switch and hope that the CPU doesn't go
> offline during that time?

"Hope" doesn't sound like this is going to be a safe long term solution,
either something more integrated with the offline code that explicitly
blocks offlining (which I don't off the top of my head know if we can do
easily) or something where we start something on the CPU and then
explicitly handshake with it (but then what if the CPU has real work to
do?) seems better.



[PATCH RT 4/6] rt/locking: Reenable migration accross schedule

2016-02-12 Thread Sebastian Andrzej Siewior

From: Thomas Gleixner 

We currently disable migration across lock acquisition. That includes the part
where we block on the lock and schedule out. We cannot disable migration after
taking the lock as that would cause a possible lock inversion.

But we can be smart and enable migration when we block and schedule out. That
allows the scheduler to place the task freely at least if this is the first
migrate disable level. For nested locking this does not help at all.
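
The effect on nesting can be shown with a small user-space model. This
is a sketch under stated assumptions: struct model_rt_task and the
helpers are hypothetical, the task is reduced to a migrate-disable
nesting depth, and a simple depth check stands in for schedule().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical task model: only the migrate-disable nesting depth. */
struct model_rt_task {
	int migrate_disable_depth;
};

void model_mg_disable(struct model_rt_task *t)
{
	assert(t != NULL);
	t->migrate_disable_depth++;
}

void model_mg_enable(struct model_rt_task *t)
{
	assert(t != NULL && t->migrate_disable_depth > 0);
	t->migrate_disable_depth--;
}

/*
 * Model of the blocking step in the slow path. Returns true if the
 * scheduler would have been free to move the task while it was blocked.
 */
bool model_block_on_lock(struct model_rt_task *t, bool mg_off)
{
	bool migratable;

	if (mg_off)
		model_mg_enable(t);
	migratable = (t->migrate_disable_depth == 0);	/* stands in for schedule() */
	if (mg_off)
		model_mg_disable(t);
	return migratable;
}
```

At the first migrate-disable level the task is migratable while it is
blocked; with one more level of nesting the depth never reaches zero,
which matches the note that nested locking does not benefit.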

Signed-off-by: Thomas Gleixner 
Signed-off-by: Sebastian Andrzej Siewior 
---
 kernel/locking/rtmutex.c | 32 ++++++++++++++++++++------------
 1 file changed, 20 insertions(+), 12 deletions(-)

diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 913aa40f3b5e..66971005cc12 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -924,14 +924,19 @@ static int __try_to_take_rt_mutex(struct rt_mutex *lock,
  * preemptible spin_lock functions:
  */
 static inline void rt_spin_lock_fastlock(struct rt_mutex *lock,
-void  (*slowfn)(struct rt_mutex *lock))
+void  (*slowfn)(struct rt_mutex *lock,
+bool mg_off),
+bool do_mig_dis)
 {
might_sleep_no_state_check();
 
+   if (do_mig_dis)
+   migrate_disable();
+
if (likely(rt_mutex_cmpxchg_acquire(lock, NULL, current)))
rt_mutex_deadlock_account_lock(lock, current);
else
-   slowfn(lock);
+   slowfn(lock, do_mig_dis);
 }
 
 static inline void rt_spin_lock_fastunlock(struct rt_mutex *lock,
@@ -989,7 +994,8 @@ static int task_blocks_on_rt_mutex(struct rt_mutex *lock,
  * We store the current state under p->pi_lock in p->saved_state and
  * the try_to_wake_up() code handles this accordingly.
  */
-static void  noinline __sched rt_spin_lock_slowlock(struct rt_mutex *lock)
+static void  noinline __sched rt_spin_lock_slowlock(struct rt_mutex *lock,
+   bool mg_off)
 {
struct task_struct *lock_owner, *self = current;
struct rt_mutex_waiter waiter, *top_waiter;
@@ -1033,8 +1039,13 @@ static void  noinline __sched rt_spin_lock_slowlock(struct rt_mutex *lock)
 
 		debug_rt_mutex_print_deadlock(&waiter);
 
-		if (top_waiter != &waiter || adaptive_wait(lock, lock_owner))
+		if (top_waiter != &waiter || adaptive_wait(lock, lock_owner)) {
+			if (mg_off)
+				migrate_enable();
 			schedule();
+			if (mg_off)
+				migrate_disable();
+		}
 
 		raw_spin_lock_irqsave(&lock->wait_lock, flags);
 
@@ -1105,38 +1116,35 @@ static void  noinline __sched rt_spin_lock_slowunlock(struct rt_mutex *lock)
 
 void __lockfunc rt_spin_lock__no_mg(spinlock_t *lock)
 {
-	rt_spin_lock_fastlock(&lock->lock, rt_spin_lock_slowlock);
+	rt_spin_lock_fastlock(&lock->lock, rt_spin_lock_slowlock, false);
 	spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
 }
 EXPORT_SYMBOL(rt_spin_lock__no_mg);
 
 void __lockfunc rt_spin_lock(spinlock_t *lock)
 {
-	migrate_disable();
-	rt_spin_lock_fastlock(&lock->lock, rt_spin_lock_slowlock);
+	rt_spin_lock_fastlock(&lock->lock, rt_spin_lock_slowlock, true);
 	spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
 }
 EXPORT_SYMBOL(rt_spin_lock);
 
 void __lockfunc __rt_spin_lock(struct rt_mutex *lock)
 {
-	migrate_disable();
-	rt_spin_lock_fastlock(lock, rt_spin_lock_slowlock);
+	rt_spin_lock_fastlock(lock, rt_spin_lock_slowlock, true);
 }
 EXPORT_SYMBOL(__rt_spin_lock);
 
 void __lockfunc __rt_spin_lock__no_mg(struct rt_mutex *lock)
 {
-	rt_spin_lock_fastlock(lock, rt_spin_lock_slowlock);
+	rt_spin_lock_fastlock(lock, rt_spin_lock_slowlock, false);
 }
 EXPORT_SYMBOL(__rt_spin_lock__no_mg);
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 void __lockfunc rt_spin_lock_nested(spinlock_t *lock, int subclass)
 {
-	migrate_disable();
-	rt_spin_lock_fastlock(&lock->lock, rt_spin_lock_slowlock);
 	spin_acquire(&lock->dep_map, subclass, 0, _RET_IP_);
+	rt_spin_lock_fastlock(&lock->lock, rt_spin_lock_slowlock, true);
 }
 EXPORT_SYMBOL(rt_spin_lock_nested);
 #endif
-- 
2.7.0
