Re: [PATCH 2/2] staging/media: Fix trailing statements should be on next line in go7007/go7007-fw.c

2012-12-21 Thread Joe Perches
On Fri, 2012-12-21 at 18:43 -0200, Mauro Carvalho Chehab wrote:
> On Mon, 5 Nov 2012 20:39:33 +0900, YAMANE Toshiaki wrote:
> 
> > Fix the following checkpatch error:
> > - ERROR: trailing statements should be on next line
> > 
> > Signed-off-by: YAMANE Toshiaki 
> > ---
> >  drivers/staging/media/go7007/go7007-fw.c |3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/staging/media/go7007/go7007-fw.c b/drivers/staging/media/go7007/go7007-fw.c
> > index f99c05b..cfce760 100644
> > --- a/drivers/staging/media/go7007/go7007-fw.c
> > +++ b/drivers/staging/media/go7007/go7007-fw.c
> > @@ -725,7 +725,8 @@ static int vti_bitlen(struct go7007 *go)
> >  {
> > unsigned int i, max_time_incr = go->sensor_framerate / go->fps_scale;
> >  
> > -   for (i = 31; (max_time_incr & ((1 << i) - 1)) == max_time_incr; --i);
> > +   for (i = 31; (max_time_incr & ((1 << i) - 1)) == max_time_incr; --i)
> > +   ;
> 
> Nah, this doesn't sound right to me. IMO, in this specific case,
> checkpatch.pl did a bad job.

Is this even guaranteed to exit the loop?
Maybe using ffs would be more sensible.
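Joe's suggestion can be sketched as follows, with a portable stand-in for the kernel's fls() (find last set bit), which computes exactly the bit length the original loop searches for and trivially terminates even for an input of 0, where the original `for (;;);` form never exits. All names here are illustrative, not the actual driver code.

```c
#include <assert.h>

/* Portable stand-in for the kernel's fls(): the 1-based position of the
 * most significant set bit, or 0 when x == 0. */
static int fls_sketch(unsigned int x)
{
	int r = 0;

	while (x) {
		r++;
		x >>= 1;
	}
	return r;
}

/* The value vti_bitlen()'s loop computes: the smallest number of bits
 * that can represent max_time_incr.  Unlike the original loop, this
 * cannot spin forever when max_time_incr == 0. */
static unsigned int vti_bitlen_sketch(unsigned int max_time_incr)
{
	return fls_sketch(max_time_incr);
}
```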


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv4 5/8] drm: tegra: Remove redundant host1x

2012-12-21 Thread Terje Bergström
On 21.12.2012 16:36, Thierry Reding wrote:
> On Fri, Dec 21, 2012 at 01:39:21PM +0200, Terje Bergstrom wrote:
>> +static struct platform_driver tegra_drm_platform_driver = {
>> +.driver = {
>> +.name = "tegradrm",
> 
> This should be "tegra-drm" to match the module name.

We've actually created two problems.

The first is that the device name should match the driver name, which
should match the module name. But host1x doesn't know the module name of
tegradrm.

The second is that the host1x driver creates the tegradrm device even if
tegradrm isn't loaded into the system.

Together these mean that the device has to be created in the tegra-drm
module, so it has access to the module name. So instead of just a
getter, we need a getter and a setter.

Terje


Re: [PATCH 2/3] arch/tile: Implement user_stack_pointer

2012-12-21 Thread Al Viro
On Sat, Dec 22, 2012 at 12:21:11AM -0500, Simon Marchi wrote:
> It is needed when we turn on HAVE_ARCH_TRACEHOOK.

... and if you check the mainline, you'll see it (and other missing
user_stack_pointer() instances) already in place, as part of infrastructure
for sigaltstack series.


[PATCH] dma: intel_mid: Remove return statements at the end of all void functions

2012-12-21 Thread Axel Lin
Signed-off-by: Axel Lin 
---
 drivers/dma/intel_mid_dma.c |9 +
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/drivers/dma/intel_mid_dma.c b/drivers/dma/intel_mid_dma.c
index bc764af..1a83d1d 100644
--- a/drivers/dma/intel_mid_dma.c
+++ b/drivers/dma/intel_mid_dma.c
@@ -133,7 +133,6 @@ static void dmac1_mask_periphral_intr(struct middma_device *mid)
pimr |= mid->pimr_mask;
writel(pimr, mid->mask_reg + LNW_PERIPHRAL_MASK);
}
-   return;
 }
 
 /**
@@ -154,7 +153,6 @@ static void dmac1_unmask_periphral_intr(struct intel_mid_dma_chan *midc)
pimr &= ~mid->pimr_mask;
writel(pimr, mid->mask_reg + LNW_PERIPHRAL_MASK);
}
-   return;
 }
 
 /**
@@ -172,7 +170,6 @@ static void enable_dma_interrupt(struct intel_mid_dma_chan *midc)
/*en ch interrupts*/
iowrite32(UNMASK_INTR_REG(midc->ch_id), midc->dma_base + MASK_TFR);
iowrite32(UNMASK_INTR_REG(midc->ch_id), midc->dma_base + MASK_ERR);
-   return;
 }
 
 /**
@@ -189,7 +186,6 @@ static void disable_dma_interrupt(struct intel_mid_dma_chan *midc)
iowrite32(MASK_INTR_REG(midc->ch_id), midc->dma_base + MASK_BLOCK);
iowrite32(MASK_INTR_REG(midc->ch_id), midc->dma_base + MASK_TFR);
iowrite32(MASK_INTR_REG(midc->ch_id), midc->dma_base + MASK_ERR);
-   return;
 }
 
 /*
@@ -341,8 +337,7 @@ static void midc_scan_descriptors(struct middma_device *mid,
if (desc->status == DMA_IN_PROGRESS)
midc_descriptor_complete(midc, desc);
}
-   return;
-   }
+}
 /**
  * midc_lli_fill_sg -  Helper function to convert
  * SG list to Linked List Items.
@@ -992,7 +987,6 @@ static void dma_tasklet(unsigned long data)
spin_unlock_bh(>lock);
}
pr_debug("MDMA:Exiting takslet...\n");
-   return;
 }
 
 static void dma_tasklet1(unsigned long data)
@@ -1214,7 +1208,6 @@ static void middma_shutdown(struct pci_dev *pdev)
if (device->dma_base)
iounmap(device->dma_base);
free_irq(pdev->irq, device);
-   return;
 }
 
 /**
-- 
1.7.9.5





Re: [RFC PATCH 3/3] x86,smp: auto tune spinlock backoff delay factor

2012-12-21 Thread Michel Lespinasse
On Fri, Dec 21, 2012 at 3:51 PM, Rik van Riel  wrote:
> Subject: x86,smp: auto tune spinlock backoff delay factor
>
> Many spinlocks are embedded in data structures; having many CPUs
> pounce on the cache line the lock is in will slow down the lock
> holder, and can cause system performance to fall off a cliff.
>
> The paper "Non-scalable locks are dangerous" is a good reference:
>
> http://pdos.csail.mit.edu/papers/linux:lock.pdf
>
> In the Linux kernel, spinlocks are optimized for the case of
> there not being contention. After all, if there is contention,
> the data structure can be improved to reduce or eliminate
> lock contention.
>
> Likewise, the spinlock API should remain simple, and the
> common case of the lock not being contended should remain
> as fast as ever.
>
> However, since spinlock contention should be fairly uncommon,
> we can add functionality into the spinlock slow path that keeps
> system performance from falling off a cliff when there is lock
> contention.
>
> Proportional delay in ticket locks means scaling the time between
> checks of the ticket by a delay factor times the number of
> CPUs ahead of us in the queue for this lock. Checking the lock
> less often allows the lock holder to continue running, resulting
> in better throughput and preventing performance from dropping
> off a cliff.
>
> Proportional spinlock delay with a high delay factor works well
> when there is lots of contention on a lock. Likewise, a smaller
> delay factor works well when a lock is lightly contended.
>
> Making the code auto-tune the delay factor results in a system
> that performs well with both light and heavy lock contention.
>
> Signed-off-by: Rik van Riel 

So I like the idea a lot, and I had never seen the auto-tuning as you
propose it. Your implementation looks simple enough and doesn't slow
the uncontended case, so props for that.

However, I have a few concerns about the behavior of this, which I
think deserve more experimentation (I may try helping with it after
new years).

One thing you mentioned in 0/3 is that the best value varies depending
on the number of CPUs contending. This is somewhat surprising to me; I
would have guessed/hoped that the (inc.tail - inc.head) multiplicative
factor would account for that already. I wonder if we can somehow
adjust the code so that a same constant would work no matter how many
threads are contending for the lock (note that one single read to the
spinlock word gives us both the current number of waiters and our
position among them).
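For reference, the computation being discussed can be sketched as follows. The struct and function names are invented for illustration; delay_factor stands for the per-system value the patch auto-tunes, and the unsigned-short subtraction models how one read of the lock word yields the number of waiters ahead of us.

```c
#include <assert.h>

/* Minimal model of the two ticket-lock fields packed in one word. */
struct tickets_sketch {
	unsigned short head;	/* ticket currently being served */
	unsigned short tail;	/* the ticket we drew */
};

/* Proportional backoff: wait roughly delay_factor iterations per CPU
 * ahead of us before re-reading the lock word.  The subtraction is
 * truncated to unsigned short so ticket wrap-around is handled
 * naturally. */
static unsigned int backoff_sketch(struct tickets_sketch t,
				   unsigned int delay_factor)
{
	unsigned short waiters_ahead = (unsigned short)(t.tail - t.head);

	return delay_factor * waiters_ahead;
}
```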

The other factor that might change your auto-tuned value is the amount
of time that each thread holds the spinlock. I wonder what would
happen if the constant was tuned for spinlocks that have a low hold
time, and then used on spinlocks that might have a higher hold time.
Obviously this would result in accessing the spinlock word more often
than necessary, but it shouldn't be very bad since the accesses
wouldn't be any more frequent than in the low hold time case, where
throughput is good. So maybe this would work acceptably well.

What I'm getting at is that I would be more confident that the
autotune algorithm will work well in all cases if the value only
depended on the system parameters such as CPU type and frequency,
rather than per-spinlock parameters such as number of waiters and hold
time.

I feel this review is too high-level to be really helpful, so I'll
stop until I can find time to experiment :)

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.


[PATCH] regulator: twl: Convert twl[6030|4030]fixed_ops to regulator_list_voltage_linear

2012-12-21 Thread Axel Lin
Signed-off-by: Axel Lin 
---
 drivers/regulator/twl-regulator.c |   15 +++
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/drivers/regulator/twl-regulator.c b/drivers/regulator/twl-regulator.c
index 493c8c6..87c55d0 100644
--- a/drivers/regulator/twl-regulator.c
+++ b/drivers/regulator/twl-regulator.c
@@ -616,18 +616,8 @@ static struct regulator_ops twl6030ldo_ops = {
 
 /*--*/
 
-/*
- * Fixed voltage LDOs don't have a VSEL field to update.
- */
-static int twlfixed_list_voltage(struct regulator_dev *rdev, unsigned index)
-{
-   struct twlreg_info  *info = rdev_get_drvdata(rdev);
-
-   return info->min_mV * 1000;
-}
-
 static struct regulator_ops twl4030fixed_ops = {
-   .list_voltage   = twlfixed_list_voltage,
+   .list_voltage   = regulator_list_voltage_linear,
 
.enable = twl4030reg_enable,
.disable= twl4030reg_disable,
@@ -639,7 +629,7 @@ static struct regulator_ops twl4030fixed_ops = {
 };
 
 static struct regulator_ops twl6030fixed_ops = {
-   .list_voltage   = twlfixed_list_voltage,
+   .list_voltage   = regulator_list_voltage_linear,
 
.enable = twl6030reg_enable,
.disable= twl6030reg_disable,
@@ -945,6 +935,7 @@ static const struct twlreg_info TWLFIXED_INFO_##label = { \
	.ops = &twl##family##fixed_ops, \
.type = REGULATOR_VOLTAGE, \
.owner = THIS_MODULE, \
+   .min_uV = mVolts * 1000, \
.enable_time = turnon_delay, \
}, \
}
-- 
1.7.9.5
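For context, a sketch of what the generic helper is assumed to compute for a linear mapping: min_uV plus selector steps of uV_step. With a fixed-voltage LDO there is a single selector and a zero step, so the result collapses to min_uV, which is why the macro change above has to fill in .min_uV. The function name here is invented; see the regulator core for the real helper.

```c
#include <assert.h>

/* Sketch of a linear list_voltage computation (assumed semantics of
 * regulator_list_voltage_linear): voltage = min_uV + selector * uV_step. */
static int list_voltage_linear_sketch(int min_uV, int uV_step,
				      unsigned int selector)
{
	return min_uV + (int)selector * uV_step;
}
```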





Re: [PATCH] nconf: add j, k and l keys for menu navigation

2012-12-21 Thread Dmitry Voytik
On Fri, Dec 21, 2012 at 12:23:41PM -0800, Stephen Boyd wrote:
> On 12/21/12 11:12, Dmitry Voytik wrote:
> > Add vi-style keys for menu navigation: press j/k for down/up navigation
> > and l for entering a submenu. Unfortunately the h key is reserved for
> > the item help.
> 
> Maybe you can just add j/k for up/down and then use enter and backspace
> for right/left?

IMHO an alternative l key for entering a submenu will not do any harm.



[PATCH 3/3] arch/tile: Enable HAVE_ARCH_TRACEHOOK

2012-12-21 Thread Simon Marchi
Looks like we have everything needed for that.

Signed-off-by: Simon Marchi 
---
 arch/tile/Kconfig |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/tile/Kconfig b/arch/tile/Kconfig
index 875d008..8cab409 100644
--- a/arch/tile/Kconfig
+++ b/arch/tile/Kconfig
@@ -21,6 +21,7 @@ config TILE
select ARCH_HAVE_NMI_SAFE_CMPXCHG
select GENERIC_CLOCKEVENTS
select MODULES_USE_ELF_RELA
+   select HAVE_ARCH_TRACEHOOK
 
 # FIXME: investigate whether we need/want these options.
 #  select HAVE_IOREMAP_PROT
-- 
1.7.1



[PATCH 1/3] arch/tile: Call tracehook_report_syscall_{entry,exit} in syscall trace

2012-12-21 Thread Simon Marchi
Call tracehook functions for syscall tracing.

The check for TIF_SYSCALL_TRACE was removed, because the same check is
done right before in the assembly file.

Signed-off-by: Simon Marchi 
---
 arch/tile/kernel/intvec_32.S |6 --
 arch/tile/kernel/intvec_64.S |6 --
 arch/tile/kernel/ptrace.c|   30 ++
 3 files changed, 18 insertions(+), 24 deletions(-)

diff --git a/arch/tile/kernel/intvec_32.S b/arch/tile/kernel/intvec_32.S
index 6943515..6c3f597 100644
--- a/arch/tile/kernel/intvec_32.S
+++ b/arch/tile/kernel/intvec_32.S
@@ -1201,7 +1201,8 @@ handle_syscall:
lw  r30, r31
andir30, r30, _TIF_SYSCALL_TRACE
bzt r30, .Lrestore_syscall_regs
-   jal do_syscall_trace
+   PTREGS_PTR(r0, PTREGS_OFFSET_BASE)
+   jal do_syscall_trace_enter
FEEDBACK_REENTER(handle_syscall)
 
/*
@@ -1252,7 +1253,8 @@ handle_syscall:
lw  r30, r31
andir30, r30, _TIF_SYSCALL_TRACE
bzt r30, 1f
-   jal do_syscall_trace
+   PTREGS_PTR(r0, PTREGS_OFFSET_BASE)
+   jal do_syscall_trace_exit
FEEDBACK_REENTER(handle_syscall)
 1: {
 movei  r30, 0   /* not an NMI */
diff --git a/arch/tile/kernel/intvec_64.S b/arch/tile/kernel/intvec_64.S
index 7c06d59..a717279 100644
--- a/arch/tile/kernel/intvec_64.S
+++ b/arch/tile/kernel/intvec_64.S
@@ -1006,7 +1006,8 @@ handle_syscall:
 addi   r30, r31, THREAD_INFO_STATUS_OFFSET - THREAD_INFO_FLAGS_OFFSET
 beqzt  r30, .Lrestore_syscall_regs
}
-   jal do_syscall_trace
+   PTREGS_PTR(r0, PTREGS_OFFSET_BASE)
+   jal do_syscall_trace_enter
FEEDBACK_REENTER(handle_syscall)
 
/*
@@ -1075,7 +1076,8 @@ handle_syscall:
 andir0, r30, _TIF_SINGLESTEP
 beqzt   r0, 1f
}
-   jal do_syscall_trace
+   PTREGS_PTR(r0, PTREGS_OFFSET_BASE)
+   jal do_syscall_trace_exit
FEEDBACK_REENTER(handle_syscall)
andir0, r30, _TIF_SINGLESTEP
 
diff --git a/arch/tile/kernel/ptrace.c b/arch/tile/kernel/ptrace.c
index 9835312..0ab8b76 100644
--- a/arch/tile/kernel/ptrace.c
+++ b/arch/tile/kernel/ptrace.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -246,29 +247,18 @@ long compat_arch_ptrace(struct task_struct *child, compat_long_t request,
 }
 #endif
 
-void do_syscall_trace(void)
+int do_syscall_trace_enter(struct pt_regs *regs)
 {
-   if (!test_thread_flag(TIF_SYSCALL_TRACE))
-   return;
-
-   if (!(current->ptrace & PT_PTRACED))
-   return;
+   if (tracehook_report_syscall_entry(regs)) {
+   regs->regs[TREG_SYSCALL_NR] = -1;
+   }
 
-   /*
-* The 0x80 provides a way for the tracing parent to distinguish
-* between a syscall stop and SIGTRAP delivery
-*/
-   ptrace_notify(SIGTRAP|((current->ptrace & PT_TRACESYSGOOD) ? 0x80 : 0));
+   return regs->regs[TREG_SYSCALL_NR];
+}
 
-   /*
-* this isn't the same as continuing with a signal, but it will do
-* for normal use.  strace only continues with a signal if the
-* stopping signal is not SIGTRAP.  -brl
-*/
-   if (current->exit_code) {
-   send_sig(current->exit_code, current, 1);
-   current->exit_code = 0;
-   }
+void do_syscall_trace_exit(struct pt_regs *regs)
+{
+   tracehook_report_syscall_exit(regs, 0);
 }
 
void send_sigtrap(struct task_struct *tsk, struct pt_regs *regs, int error_code)
-- 
1.7.1



[PATCH 2/3] arch/tile: Implement user_stack_pointer

2012-12-21 Thread Simon Marchi
It is needed when we turn on HAVE_ARCH_TRACEHOOK.

Signed-off-by: Simon Marchi 
---
 arch/tile/include/asm/ptrace.h |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/arch/tile/include/asm/ptrace.h b/arch/tile/include/asm/ptrace.h
index 5ce052e..4be42fb 100644
--- a/arch/tile/include/asm/ptrace.h
+++ b/arch/tile/include/asm/ptrace.h
@@ -80,6 +80,11 @@ struct task_struct;
 extern void send_sigtrap(struct task_struct *tsk, struct pt_regs *regs,
 int error_code);
 
+static inline unsigned long user_stack_pointer(struct pt_regs *regs)
+{
+   return regs->sp;
+}
+
 #ifdef __tilegx__
 /* We need this since sigval_t has a user pointer in it, for GETSIGINFO etc. */
 #define __ARCH_WANT_COMPAT_SYS_PTRACE
-- 
1.7.1



Re: [PATCH v7 00/27] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G

2012-12-21 Thread Yinghai Lu
On Fri, Dec 21, 2012 at 6:42 PM, Konrad Rzeszutek Wilk
 wrote:
> On Mon, Dec 17, 2012 at 11:15:32PM -0800, Yinghai Lu wrote:
>> Right now we limit the kdump reserved region to below 896M, because kexec
>> has that limitation, and the bzImage also needs to stay under 4G.
>>
>> To let kexec/kdump use ranges above 4G, we need to make the bzImage and
>> ramdisk loadable above 4G.
>> During boot the bzImage will be unpacked at the same position and stay high.
>>
>> The patches add fields in setup_header and boot_params to
>> 1. get info about the ramdisk position above 4G from the bootloader/kexec
>> 2. get info about cmd_line_ptr above 4G from the bootloader/kexec
>> 3. set xloadflags bit0 in the header for the bzImage, so the bootloader/kexec
>>    can check it to decide whether it may put the bzImage high.
>> 4. use a sentinel to make sure the ext_* fields in boot_params can be used.
>>
>> These patches are tested with kexec-tools with local changes, which will be
>> sent to the kexec list later.
>>
>> They can be found at:
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git for-x86-boot

That points to -v8.

>
> Did a light test and it looks to work under Xen - thought I had not tested
> any various configuration of memory layouts.
>
> More worryingly, it blew up running native on a Dell T105 AMD box with 4GB
> of memory.
> I can't get it even to print anything on the serial log:

Can you post a boot log with a working kernel?

>
> (this is an excerpt from pxelinux.cfg/C0A8 file)
> LABEL BAREMETAL
>KERNEL vmlinuz
>APPEND initrd=initramfs.cpio.gz debug selinux=0  loglevel=10 apic=debug 
> console=uart8250,115200n8
>
>
> PXELINUX 3.82 2009-06-09  Copyright (C) 1994-2009 H. Peter Anvin et al
> Loading vmlinuz...
> Loading initramfs.cpio.gz... ready.
>


Re: [PATCH v7 24/27] x86: Add swiotlb force off support

2012-12-21 Thread Yinghai Lu
On Fri, Dec 21, 2012 at 7:25 PM, H. Peter Anvin  wrote:
> On 12/21/2012 07:23 PM, Eric W. Biederman wrote:
>>
>> In this case YH has been working on the case of loading a kernel
>> completely above 4G, and apparently he has also been testing the case of
>> running a kernel with no memory below 4G.
>>
>
> It is worth noting that we cannot run with *no* memory below 4G -- it is
> not possible to run SMP at least without some memory below the 1M mark.

Yes, we need to keep 8k or so for the trampoline.

But kdump only uses one CPU.

Yinghai


Re: [PATCH v7 24/27] x86: Add swiotlb force off support

2012-12-21 Thread Yinghai Lu
On Fri, Dec 21, 2012 at 6:42 PM, Eric W. Biederman
 wrote:
> Konrad Rzeszutek Wilk  writes:
>
>> On Mon, Dec 17, 2012 at 11:15:56PM -0800, Yinghai Lu wrote:
>> So users could disable swiotlb from the command line, even when swiotlb
>> support is compiled in.  Just like we have intel_iommu=on and intel_iommu=off.
>>
>> You really need to spell out why this is useful.
>
> YH why can't we safely autodetect that the swiotlb is unusable when
> there is no memory below 4G free?

Good point.

Will give it a try.

Yinghai


Re: [PATCH v6 24/27] x86: Add swiotlb force off support

2012-12-21 Thread Yinghai Lu
On Fri, Dec 21, 2012 at 6:18 PM, Konrad Rzeszutek Wilk
 wrote:
> On Thu, Dec 13, 2012 at 02:02:18PM -0800, Yinghai Lu wrote:
>> So users could disable swiotlb from the command line, even when swiotlb
>> support is compiled in.  Just like we have intel_iommu=on and intel_iommu=off.
>
> Does this have any usage besides testing?

For kdump.

>
> And also, please use scripts/get_maintainer.pl in the future, so that
> you can extract the email of the maintainer (which would be me).
OK.


[PATCH review 3/3] proc: Allow proc_free_inum to be called from any context

2012-12-21 Thread Eric W. Biederman

While testing the pid namespace code I hit this nasty warning.

[  176.262617] [ cut here ]
[  176.263388] WARNING: at /home/eric/projects/linux/linux-userns-devel/kernel/softirq.c:160 local_bh_enable_ip+0x7a/0xa0()
[  176.265145] Hardware name: Bochs
[  176.265677] Modules linked in:
[  176.266341] Pid: 742, comm: bash Not tainted 3.7.0userns+ #18
[  176.266564] Call Trace:
[  176.266564]  [] warn_slowpath_common+0x7f/0xc0
[  176.266564]  [] warn_slowpath_null+0x1a/0x20
[  176.266564]  [] local_bh_enable_ip+0x7a/0xa0
[  176.266564]  [] _raw_spin_unlock_bh+0x19/0x20
[  176.266564]  [] proc_free_inum+0x3a/0x50
[  176.266564]  [] free_pid_ns+0x1c/0x80
[  176.266564]  [] put_pid_ns+0x35/0x50
[  176.266564]  [] put_pid+0x4a/0x60
[  176.266564]  [] tty_ioctl+0x717/0xc10
[  176.266564]  [] ? wait_consider_task+0x855/0xb90
[  176.266564]  [] ? default_spin_lock_flags+0x9/0x10
[  176.266564]  [] ? remove_wait_queue+0x5a/0x70
[  176.266564]  [] do_vfs_ioctl+0x98/0x550
[  176.266564]  [] ? recalc_sigpending+0x1f/0x60
[  176.266564]  [] ? __set_task_blocked+0x37/0x80
[  176.266564]  [] ? sys_wait4+0xab/0xf0
[  176.266564]  [] sys_ioctl+0x91/0xb0
[  176.266564]  [] ? task_stopped_code+0x50/0x50
[  176.266564]  [] system_call_fastpath+0x16/0x1b
[  176.266564] ---[ end trace 387af88219ad6143 ]---

It turns out that spin_unlock_bh(proc_inum_lock) is not safe when
put_pid is called with another spinlock held and irqs disabled.

For now take the easy path and use spin_lock_irqsave(proc_inum_lock)
in proc_free_inum and spin_lock_irq(proc_inum_lock) in proc_alloc_inum.
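A toy model (all names invented) of why the irqsave variant nests safely where the _bh variant does not: irqsave records the caller's interrupt state at lock time and restores exactly that state at unlock, instead of unconditionally re-enabling the way spin_unlock_bh re-enables bottom halves.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns the interrupt state after a lock/unlock pair modelled on
 * spin_lock_irqsave()/spin_unlock_irqrestore(), entered with the given
 * interrupt state.  The key property: entering with interrupts already
 * disabled leaves them disabled afterwards. */
static bool irqs_on_after_irqsave_pair(bool irqs_on_at_entry)
{
	bool irqs_on = irqs_on_at_entry;
	bool flags = irqs_on;	/* spin_lock_irqsave(): save caller state */

	irqs_on = false;	/* interrupts off in the critical section */
	/* ... critical section ... */
	irqs_on = flags;	/* spin_unlock_irqrestore(): restore, not enable */
	return irqs_on;
}
```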

Signed-off-by: "Eric W. Biederman" 
---
 fs/proc/generic.c |   13 +++--
 1 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/fs/proc/generic.c b/fs/proc/generic.c
index e064f56..76ddae8 100644
--- a/fs/proc/generic.c
+++ b/fs/proc/generic.c
@@ -352,18 +352,18 @@ retry:
	if (!ida_pre_get(&proc_inum_ida, GFP_KERNEL))
return -ENOMEM;
 
-	spin_lock_bh(&proc_inum_lock);
+	spin_lock_irq(&proc_inum_lock);
	error = ida_get_new(&proc_inum_ida, &i);
-	spin_unlock_bh(&proc_inum_lock);
+	spin_unlock_irq(&proc_inum_lock);
if (error == -EAGAIN)
goto retry;
else if (error)
return error;
 
if (i > UINT_MAX - PROC_DYNAMIC_FIRST) {
-	spin_lock_bh(&proc_inum_lock);
+	spin_lock_irq(&proc_inum_lock);
	ida_remove(&proc_inum_ida, i);
-	spin_unlock_bh(&proc_inum_lock);
+	spin_unlock_irq(&proc_inum_lock);
return -ENOSPC;
}
*inum = PROC_DYNAMIC_FIRST + i;
@@ -372,9 +372,10 @@ retry:
 
 void proc_free_inum(unsigned int inum)
 {
-	spin_lock_bh(&proc_inum_lock);
+	unsigned long flags;
+	spin_lock_irqsave(&proc_inum_lock, flags);
	ida_remove(&proc_inum_ida, inum - PROC_DYNAMIC_FIRST);
-	spin_unlock_bh(&proc_inum_lock);
+	spin_unlock_irqrestore(&proc_inum_lock, flags);
 }
 
 static void *proc_follow_link(struct dentry *dentry, struct nameidata *nd)
-- 
1.7.5.4



[PATCH review 2/3] pidns: Stop pid allocation when init dies

2012-12-21 Thread Eric W. Biederman

Oleg pointed out that in a pid namespace the sequence:
- pid 1 becomes a zombie
- setns(thepidns), fork, ...
- reaping pid 1.
- The injected processes exiting.

can lead to processes attempting to access their child reaper and
instead following a stale pointer.

That waitpid for init can return before all of the processes in
the pid namespace have exited is also unfortunate.

Avoid these problems by disabling the allocation of new pids in a pid
namespace when init dies, instead of when the last process in a pid
namespace is reaped.
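The mechanism in this series can be sketched as follows. PIDNS_HASH_ADDING is the constant from the patch; the helper names are invented. The top bit of nr_hashed doubles as an "allocation allowed" flag, so clearing it once (under the lock) forbids new pids while the low bits continue to count hashed pids.

```c
#include <assert.h>

#define PIDNS_HASH_ADDING (1U << 31)

/* Model of disable_pid_allocation(): clear the flag bit at most once. */
static unsigned int nr_hashed_after_disable(unsigned int nr_hashed)
{
	if (nr_hashed >= PIDNS_HASH_ADDING)
		nr_hashed -= PIDNS_HASH_ADDING;
	return nr_hashed;
}

/* Model of the alloc_pid() gate: allocation is allowed only while the
 * flag bit is still set. */
static int alloc_allowed(unsigned int nr_hashed)
{
	return nr_hashed >= PIDNS_HASH_ADDING;
}
```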

Pointed-out-by:  Oleg Nesterov 
Signed-off-by: "Eric W. Biederman" 
---
 include/linux/pid.h   |1 +
 include/linux/pid_namespace.h |4 +++-
 kernel/pid.c  |   13 ++---
 kernel/pid_namespace.c|4 
 4 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/include/linux/pid.h b/include/linux/pid.h
index b152d44..2381c97 100644
--- a/include/linux/pid.h
+++ b/include/linux/pid.h
@@ -121,6 +121,7 @@ int next_pidmap(struct pid_namespace *pid_ns, unsigned int 
last);
 
 extern struct pid *alloc_pid(struct pid_namespace *ns);
 extern void free_pid(struct pid *pid);
+extern void disable_pid_allocation(struct pid_namespace *ns);
 
 /*
  * ns_of_pid() returns the pid namespace in which the specified pid was
diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h
index bf28599..215e5e3 100644
--- a/include/linux/pid_namespace.h
+++ b/include/linux/pid_namespace.h
@@ -21,7 +21,7 @@ struct pid_namespace {
struct kref kref;
struct pidmap pidmap[PIDMAP_ENTRIES];
int last_pid;
-   int nr_hashed;
+   unsigned int nr_hashed;
struct task_struct *child_reaper;
struct kmem_cache *pid_cachep;
unsigned int level;
@@ -42,6 +42,8 @@ struct pid_namespace {
 
 extern struct pid_namespace init_pid_ns;
 
+#define PIDNS_HASH_ADDING (1U << 31)
+
 #ifdef CONFIG_PID_NS
 static inline struct pid_namespace *get_pid_ns(struct pid_namespace *ns)
 {
diff --git a/kernel/pid.c b/kernel/pid.c
index 36aa02f..3a5c872 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -270,7 +270,6 @@ void free_pid(struct pid *pid)
wake_up_process(ns->child_reaper);
break;
case 0:
-   ns->nr_hashed = -1;
	schedule_work(&ns->proc_work);
break;
}
@@ -319,7 +318,7 @@ struct pid *alloc_pid(struct pid_namespace *ns)
 
upid = pid->numbers + ns->level;
	spin_lock_irq(&pidmap_lock);
-   if (ns->nr_hashed < 0)
+   if (ns->nr_hashed < PIDNS_HASH_ADDING)
goto out_unlock;
for ( ; upid >= pid->numbers; --upid) {
	hlist_add_head_rcu(&upid->pid_chain,
@@ -342,6 +341,14 @@ out_free:
goto out;
 }
 
+void disable_pid_allocation(struct pid_namespace *ns)
+{
+	spin_lock_irq(&pidmap_lock);
+   if (ns->nr_hashed >= PIDNS_HASH_ADDING)
+   ns->nr_hashed -= PIDNS_HASH_ADDING;
+	spin_unlock_irq(&pidmap_lock);
+}
+
 struct pid *find_pid_ns(int nr, struct pid_namespace *ns)
 {
struct hlist_node *elem;
@@ -584,7 +591,7 @@ void __init pidmap_init(void)
/* Reserve PID 0. We never call free_pidmap(0) */
set_bit(0, init_pid_ns.pidmap[0].page);
	atomic_dec(&init_pid_ns.pidmap[0].nr_free);
-   init_pid_ns.nr_hashed = 1;
+   init_pid_ns.nr_hashed = 1 + PIDNS_HASH_ADDING;
 
init_pid_ns.pid_cachep = KMEM_CACHE(pid,
SLAB_HWCACHE_ALIGN | SLAB_PANIC);
diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index fdbd0cd..c1c3dc1 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -115,6 +115,7 @@ static struct pid_namespace *create_pid_namespace(struct user_namespace *user_ns
ns->level = level;
ns->parent = get_pid_ns(parent_pid_ns);
ns->user_ns = get_user_ns(user_ns);
+   ns->nr_hashed = PIDNS_HASH_ADDING;
	INIT_WORK(&ns->proc_work, proc_cleanup_work);
 
set_bit(0, ns->pidmap[0].page);
@@ -181,6 +182,9 @@ void zap_pid_ns_processes(struct pid_namespace *pid_ns)
int rc;
struct task_struct *task, *me = current;
 
+   /* Don't allow any more processes into the pid namespace */
+   disable_pid_allocation(pid_ns);
+
/* Ignore SIGCHLD causing any terminated children to autoreap */
	spin_lock_irq(&me->sighand->siglock);
me->sighand->action[SIGCHLD - 1].sa.sa_handler = SIG_IGN;
-- 
1.7.5.4



[PATCH review 1/3] pidns: Outlaw thread creation after unshare(CLONE_NEWPID)

2012-12-21 Thread Eric W. Biederman

The sequence:
unshare(CLONE_NEWPID)
clone(CLONE_THREAD|CLONE_SIGHAND|CLONE_VM)

Creates a new process in the new pid namespace without setting
pid_ns->child_reaper.  After forking this results in a NULL
pointer dereference.

Avoid this and other nonsense scenarios that can show up after
creating a new pid namespace with unshare by adding a new
check in copy_process.

Pointed-out-by:  Oleg Nesterov 
Signed-off-by: "Eric W. Biederman" 
---
 kernel/fork.c |8 
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index a31b823..65ca6d2 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1166,6 +1166,14 @@ static struct task_struct *copy_process(unsigned long clone_flags,
current->signal->flags & SIGNAL_UNKILLABLE)
return ERR_PTR(-EINVAL);
 
+   /*
+* If the new process will be in a different pid namespace
+* don't allow the creation of threads.
+*/
+   if ((clone_flags & (CLONE_VM|CLONE_NEWPID)) &&
+   (task_active_pid_ns(current) != current->nsproxy->pid_ns))
+   return ERR_PTR(-EINVAL);
+
retval = security_task_create(clone_flags);
if (retval)
goto fork_out;
-- 
1.7.5.4



[PATCH review 0/3] pid namespaces fixes

2012-12-21 Thread Eric W. Biederman

Oleg, assuming I am not blind, these patches should fix the issues you
spotted in the pid namespace, as well as one additional one that I found
during testing.

Anyone with an extra set of eyeballs that wants to look over this code
and double check to make certain I am not doing something stupid would
be welcome.

These patches are against 3.8-rc1, and I hope to get them into linux-next
and on to Linus shortly.

Eric W. Biederman (3):
  pidns: Outlaw thread creation after unshare(CLONE_NEWPID)
  pidns: Stop pid allocation when init dies
  proc: Allow proc_free_inum to be called from any context

 fs/proc/generic.c |   13 +++--
 include/linux/pid.h   |1 +
 include/linux/pid_namespace.h |4 +++-
 kernel/fork.c |8 
 kernel/pid.c  |   13 ++---
 kernel/pid_namespace.c|4 
 6 files changed, 33 insertions(+), 10 deletions(-)


Re: [RFC PATCH 2/3] x86,smp: proportional backoff for ticket spinlocks

2012-12-21 Thread Michel Lespinasse
On Fri, Dec 21, 2012 at 3:51 PM, Rik van Riel  wrote:
> Subject: x86,smp: proportional backoff for ticket spinlocks
>
> Simple fixed value proportional backoff for ticket spinlocks.
> By pounding on the cacheline with the spin lock less often,
> bus traffic is reduced. In cases of a data structure with
> embedded spinlock, the lock holder has a better chance of
> making progress.
>
> Signed-off-by: Rik van Riel 

Looks fine to me other than the arbitrary-ness of 50

Reviewed-by: Michel Lespinasse 

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.


Re: [RFC PATCH 1/3] x86,smp: move waiting on contended lock out of line

2012-12-21 Thread Rik van Riel

On 12/21/2012 11:40 PM, Michel Lespinasse wrote:

On Fri, Dec 21, 2012 at 3:50 PM, Rik van Riel  wrote:



@@ -53,12 +55,11 @@ static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)

 inc = xadd(&lock->tickets, inc);



+   if (inc.head == inc.tail)
+   goto out;
+
+   ticket_spin_lock_wait(lock, inc);
+ out:


why not just:

if (inc.head != inc.tail)
   ticket_spin_lock_wait(lock, inc)


That makes the code nicer, thank you. Applied.


+++ b/arch/x86/kernel/smp.c
@@ -113,6 +113,20 @@ static atomic_t stopping_cpu = ATOMIC_INIT(-1);
  static bool smp_no_nmi_ipi = false;

  /*
+ * Wait on a congested ticket spinlock.
+ */
+void ticket_spin_lock_wait(arch_spinlock_t *lock, struct __raw_tickets inc)
+{
+   for (;;) {
+   cpu_relax();
+   inc.head = ACCESS_ONCE(lock->tickets.head);
+
+   if (inc.head == inc.tail)
+   break;
+   }


Why not just:

do {
   cpu_relax()
   inc.head = ...
} while (inc.head != inc.tail);


Other than that, no problems with the principle of it.


In patch #3 I do something else inside the head == tail
conditional block, so this one is best left alone.

Thank you for the comments.



Re: [RFC PATCH 1/3] x86,smp: move waiting on contended lock out of line

2012-12-21 Thread Michel Lespinasse
On Fri, Dec 21, 2012 at 3:50 PM, Rik van Riel  wrote:
> diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
> index 33692ea..2a45eb0 100644
> --- a/arch/x86/include/asm/spinlock.h
> +++ b/arch/x86/include/asm/spinlock.h
> @@ -34,6 +34,8 @@
>  # define UNLOCK_LOCK_PREFIX
>  #endif
>
> +extern void ticket_spin_lock_wait(arch_spinlock_t *, struct __raw_tickets);
> +
>  /*
>   * Ticket locks are conceptually two parts, one indicating the current head 
> of
>   * the queue, and the other indicating the current tail. The lock is acquired
> @@ -53,12 +55,11 @@ static __always_inline void __ticket_spin_lock(arch_spinlock_t *lock)
>
> inc = xadd(&lock->tickets, inc);
>
> -   for (;;) {
> -   if (inc.head == inc.tail)
> -   break;
> -   cpu_relax();
> -   inc.head = ACCESS_ONCE(lock->tickets.head);
> -   }
> +   if (inc.head == inc.tail)
> +   goto out;
> +
> +   ticket_spin_lock_wait(lock, inc);
> + out:

why not just:

if (inc.head != inc.tail)
  ticket_spin_lock_wait(lock, inc)

> barrier();  /* make sure nothing creeps before the lock is taken */
>  }
>
> diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
> index 48d2b7d..20da354 100644
> --- a/arch/x86/kernel/smp.c
> +++ b/arch/x86/kernel/smp.c
> @@ -113,6 +113,20 @@ static atomic_t stopping_cpu = ATOMIC_INIT(-1);
>  static bool smp_no_nmi_ipi = false;
>
>  /*
> + * Wait on a congested ticket spinlock.
> + */
> +void ticket_spin_lock_wait(arch_spinlock_t *lock, struct __raw_tickets inc)
> +{
> +   for (;;) {
> +   cpu_relax();
> +   inc.head = ACCESS_ONCE(lock->tickets.head);
> +
> +   if (inc.head == inc.tail)
> +   break;
> +   }

Why not just:

do {
  cpu_relax()
  inc.head = ...
} while (inc.head != inc.tail);


Other than that, no problems with the principle of it.

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.


[GIT PULL] dmaengine updates

2012-12-21 Thread Vinod Koul
Hi Linus,

This is the pull request for dmaengine. I just saw that you have declared rc1 a
couple of hours ago, so I missed the window narrowly... I can try to make the
excuse that I am on vacation and was traveling, so this got delayed. And I was
counting on your Christmas deadline :(

Please do consider merging this for rc2, as it brings in the much-awaited DT
support for dmaengine, which a lot of folks care about and plan to build on for
the next release. Along with this it adds a few other odd fixes, including some
on async_tx. These changes

  git://git.infradead.org/users/vkoul/slave-dma.git next

Akinobu Mita (4):
  dmaengine: use for_each_set_bit
  dma: amba-pl08x: use vchan_dma_desc_free_list
  dmatest: adjust invalid module parameters for number of source buffers
  async_tx: use memchr_inv

Andy Shevchenko (4):
  dw_dmac: change dev_printk() to corresponding macros
  dw_dmac: don't call platform_get_drvdata twice
  dw_dmac: change dev_crit to dev_WARN in dwc_handle_error
  dw_dmac: introduce to_dw_desc() macro

Barry Song (2):
  dmaengine: sirf: enable the driver support new SiRFmarco SoC
  DMAEngine: add dmaengine_prep_interleaved_dma wrapper for interleaved api

Bartlomiej Zolnierkiewicz (10):
  async_tx: add missing DMA unmap to async_memcpy()
  ioat: add missing DMA unmap to ioat_dma_self_test()
  mtd: fsmc_nand: add missing DMA unmap to dma_xfer()
  carma-fpga: pass correct flags to ->device_prep_dma_memcpy()
  ioat3: add missing DMA unmap to ioat_xor_val_self_test()
  async_tx: fix build for async_memset
  dmaengine: remove dma_async_memcpy_pending() macro
  dmaengine: remove dma_async_memcpy_complete() macro
  dmaengine: add cpu_relax() to busy-loop in dma_sync_wait()
  async_tx: fix checking of dma_wait_for_async_tx() return value

Dave Jiang (2):
  ioat: Add alignment workaround for IVB platforms
  ioat: remove chanerr mask setting for IOAT v3.x

Guennadi Liakhovetski (1):
  dma: sh: Don't use ENODEV for failing slave lookup

Heikki Krogerus (2):
  dmaengine: dw_dmac: remove CLK dependency
  dmaengine: dw_dmac: amend description and indentation

Jean Delvare (1):
  dma: ipu: Drop unused spinlock

Joe Perches (1):
  dma: Convert dev_printk(KERN_ to dev_(

Jon Hunter (4):
  dmaengine: add helper function to request a slave DMA channel
  of: Add generic device tree DMA helpers
  of: dma: fix potential deadlock when requesting a slave channel
  of: dma: fix protection of DMA controller data stored by DMA helpers

Jon Mason (1):
  dmatest: Fix NULL pointer dereference on ioat

Kees Cook (1):
  drivers/dma: remove CONFIG_EXPERIMENTAL

Maciej Sosnowski (1):
  dca: check against empty dca_domains list before unregister provider

Matt Porter (1):
  of: dma: fix typos in generic dma binding definition

Sachin Kamat (1):
  DMA: PL330: Use devm_* functions

Shiraz Hashim (1):
  dmaengine/dmatest: terminate transfers only in case of errors

Vinod Koul (5):
  of: dma- fix build break for !CONFIG_OF
  dmaengine: fix build failure due to missing semi-colon
  Merge branch 'topic/dmaengine_dt' into next
  dmaengine: fix !of_dma compilation warning
  Merge git://git.kernel.org/.../djbw/dmaengine.git/next

Viresh Kumar (3):
  dmaengine: dw_dmac: Update documentation style comments for dw_dma_platform_data
  dmaengine: dw_dmac: Enhance device tree support
  ARM: SPEAr13xx: Pass DW DMAC platform data from DT

Wei Yongjun (1):
  pch_dma: use module_pci_driver to simplify the code

 Documentation/devicetree/bindings/dma/dma.txt  |   81 ++
 Documentation/devicetree/bindings/dma/snps-dma.txt |   44 
 arch/arm/boot/dts/spear1340.dtsi   |   19 ++
 arch/arm/boot/dts/spear13xx.dtsi   |   38 +++
 arch/arm/mach-spear13xx/include/mach/spear.h   |2 -
 arch/arm/mach-spear13xx/spear1310.c|4 +-
 arch/arm/mach-spear13xx/spear1340.c|   27 +--
 arch/arm/mach-spear13xx/spear13xx.c|   54 +
 crypto/async_tx/async_memcpy.c |6 +
 crypto/async_tx/async_memset.c |1 +
 crypto/async_tx/async_tx.c |9 +-
 crypto/async_tx/async_xor.c|4 +-
 drivers/dca/dca-core.c |5 +
 drivers/dma/Kconfig|7 +-
 drivers/dma/amba-pl08x.c   |8 +-
 drivers/dma/at_hdmac_regs.h|8 +-
 drivers/dma/dmaengine.c|   21 ++-
 drivers/dma/dmatest.c  |   22 ++-
 drivers/dma/dw_dmac.c  |  167 +++--
 drivers/dma/dw_dmac_regs.h |6 +
 drivers/dma/ioat/dma.c |   11 +-
 drivers/dma/ioat/dma_v3.c

[PATCH RT 1/5] sched: Adjust sched_reset_on_fork when nothing else changes

2012-12-21 Thread Steven Rostedt
From: Thomas Gleixner 

If the policy and priority remain unchanged a possible modification of
sched_reset_on_fork gets lost in the early exit path.

Signed-off-by: Thomas Gleixner 
Cc: sta...@vger.kernel.org
Cc: stable...@vger.kernel.org
---
 kernel/sched/core.c |6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 1f9d6f5..3753bda 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4441,11 +4441,13 @@ recheck:
}
 
/*
-* If not changing anything there's no need to proceed further:
+* If not changing anything there's no need to proceed
+* further, but store a possible modification of
+* reset_on_fork.
 */
if (unlikely(policy == p->policy && (!rt_policy(policy) ||
param->sched_priority == p->rt_priority))) {
-
+   p->sched_reset_on_fork = reset_on_fork;
__task_rq_unlock(rq);
 raw_spin_unlock_irqrestore(&p->pi_lock, flags);
return 0;
-- 
1.7.10.4




[PATCH RT 4/5] block: Use cpu_chill() for retry loops

2012-12-21 Thread Steven Rostedt
From: Thomas Gleixner 

Retry loops on RT might loop forever when the modifying side was
preempted. Steven also observed a live lock when there was a
concurrent priority boosting going on.

Use cpu_chill() instead of cpu_relax() to let the system
make progress.

Signed-off-by: Thomas Gleixner 
Cc: stable...@vger.kernel.org
---
 block/blk-ioc.c |5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/block/blk-ioc.c b/block/blk-ioc.c
index fb95dd2..6b54201 100644
--- a/block/blk-ioc.c
+++ b/block/blk-ioc.c
@@ -8,6 +8,7 @@
 #include 
 #include  /* for max_pfn/max_low_pfn */
 #include 
+#include 
 
 #include "blk.h"
 
@@ -110,7 +111,7 @@ static void ioc_release_fn(struct work_struct *work)
spin_unlock(q->queue_lock);
} else {
 spin_unlock_irqrestore(&ioc->lock, flags);
-   cpu_relax();
+   cpu_chill();
 spin_lock_irqsave_nested(&ioc->lock, flags, 1);
}
}
@@ -188,7 +189,7 @@ retry:
spin_unlock(icq->q->queue_lock);
} else {
 spin_unlock_irqrestore(&ioc->lock, flags);
-   cpu_relax();
+   cpu_chill();
goto retry;
}
}
-- 
1.7.10.4




[PATCH RT 0/5] [ANNOUNCE] 3.4.24-rt36-rc1 stable review

2012-12-21 Thread Steven Rostedt

Dear RT Folks,

This is the RT stable review cycle of patch 3.4.24-rt36-rc1.

Please scream at me if I messed something up. Please test the patches too.

The -rc release will be uploaded to kernel.org and will be deleted when
the final release is out. This is just a review release (or release candidate).

The pre-releases will not be pushed to the git repository, only the
final release is.

If all goes well, this patch will be converted to the next main release
on 12/26/2012.

Enjoy,

-- Steve


To build 3.4.24-rt36-rc1 directly, the following patches should be applied:

  http://www.kernel.org/pub/linux/kernel/v3.x/linux-3.4.tar.xz

  http://www.kernel.org/pub/linux/kernel/v3.x/patch-3.4.24.xz

  
http://www.kernel.org/pub/linux/kernel/projects/rt/3.4/patch-3.4.24-rt36-rc1.patch.xz

You can also build from 3.4.24-rt35 by applying the incremental patch:

http://www.kernel.org/pub/linux/kernel/projects/rt/3.4/incr/patch-3.4.24-rt35-rt36-rc1.patch.xz


Changes from 3.4.24-rt35:

---


Steven Rostedt (1):
  Linux 3.4.24-rt36-rc1

Thomas Gleixner (4):
  sched: Adjust sched_reset_on_fork when nothing else changes
  sched: Queue RT tasks to head when prio drops
  sched: Consider pi boosting in setscheduler
  block: Use cpu_chill() for retry loops


 block/blk-ioc.c   |5 +++--
 include/linux/sched.h |5 +
 kernel/rtmutex.c  |   12 +++
 kernel/sched/core.c   |   55 +
 localversion-rt   |2 +-
 5 files changed, 63 insertions(+), 16 deletions(-)


[PATCH RT 2/5] sched: Queue RT tasks to head when prio drops

2012-12-21 Thread Steven Rostedt
From: Thomas Gleixner 

The following scenario does not work correctly:

Runqueue of CPUx contains two runnable and pinned tasks:
 T1: SCHED_FIFO, prio 80
 T2: SCHED_FIFO, prio 80

T1 is on the cpu and executes the following syscalls (classic priority
ceiling scenario):

 sys_sched_setscheduler(pid(T1), SCHED_FIFO, .prio = 90);
 ...
 sys_sched_setscheduler(pid(T1), SCHED_FIFO, .prio = 80);
 ...

Now T1 gets preempted by T3 (SCHED_FIFO, prio 95). After T3 goes back
to sleep the scheduler picks T2. Surprise!

The same happens w/o actual preemption when T1 is forced into the
scheduler due to a sporadic NEED_RESCHED event. The scheduler invokes
pick_next_task() which returns T2. So T1 gets preempted and scheduled
out.

This happens because sched_setscheduler() dequeues T1 from the prio 90
list and then enqueues it on the tail of the prio 80 list behind T2.
This violates the POSIX spec and surprises user space which relies on
the guarantee that SCHED_FIFO tasks are not scheduled out unless they
give the CPU up voluntarily or are preempted by a higher priority
task. In the latter case the preempted task must get back on the CPU
after the preempting task schedules out again.

We fixed a similar issue already in commit 60db48c (sched: Queue a
deboosted task to the head of the RT prio queue). The same treatment
is necessary for sched_setscheduler(). So enqueue to head of the prio
bucket list if the priority of the task is lowered.

It might be possible that existing user space relies on the current
behaviour, but it can be considered highly unlikely due to the corner
case nature of the application scenario.

Signed-off-by: Thomas Gleixner 
Cc: sta...@vger.kernel.org
Cc: stable...@vger.kernel.org
---
 kernel/sched/core.c |9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 3753bda..b02f995 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4489,8 +4489,13 @@ recheck:
 
if (running)
p->sched_class->set_curr_task(rq);
-   if (on_rq)
-   enqueue_task(rq, p, 0);
+   if (on_rq) {
+   /*
+* We enqueue to tail when the priority of a task is
+* increased (user space view).
+*/
+   enqueue_task(rq, p, oldprio <= p->prio ? ENQUEUE_HEAD : 0);
+   }
 
check_class_changed(rq, p, prev_class, oldprio);
 task_rq_unlock(rq, p, &flags);
-- 
1.7.10.4




[PATCH RT 5/5] Linux 3.4.24-rt36-rc1

2012-12-21 Thread Steven Rostedt
From: Steven Rostedt 

---
 localversion-rt |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/localversion-rt b/localversion-rt
index 366440d..a7827dc 100644
--- a/localversion-rt
+++ b/localversion-rt
@@ -1 +1 @@
--rt35
+-rt36-rc1
-- 
1.7.10.4




[PATCH RT 3/5] sched: Consider pi boosting in setscheduler

2012-12-21 Thread Steven Rostedt
From: Thomas Gleixner 

If a PI boosted task policy/priority is modified by a setscheduler()
call we unconditionally dequeue and requeue the task if it is on the
runqueue even if the new priority is lower than the current effective
boosted priority. This can result in undesired reordering of the
priority bucket list.

If the new priority is less or equal than the current effective we
just store the new parameters in the task struct and leave the
scheduler class and the runqueue untouched. This is handled when the
task deboosts itself. Only if the new priority is higher than the
effective boosted priority we apply the change immediately.

Signed-off-by: Thomas Gleixner 
Cc: sta...@vger.kernel.org
Cc: stable...@vger.kernel.org
---
 include/linux/sched.h |5 +
 kernel/rtmutex.c  |   12 
 kernel/sched/core.c   |   40 +++-
 3 files changed, 48 insertions(+), 9 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index f291347..b0448fa 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2166,6 +2166,7 @@ extern unsigned int sysctl_sched_cfs_bandwidth_slice;
 #ifdef CONFIG_RT_MUTEXES
 extern int rt_mutex_getprio(struct task_struct *p);
 extern void rt_mutex_setprio(struct task_struct *p, int prio);
+extern int rt_mutex_check_prio(struct task_struct *task, int newprio);
 extern void rt_mutex_adjust_pi(struct task_struct *p);
 static inline bool tsk_is_pi_blocked(struct task_struct *tsk)
 {
@@ -2176,6 +2177,10 @@ static inline int rt_mutex_getprio(struct task_struct *p)
 {
return p->normal_prio;
 }
+static inline int rt_mutex_check_prio(struct task_struct *task, int newprio)
+{
+   return 0;
+}
 # define rt_mutex_adjust_pi(p) do { } while (0)
 static inline bool tsk_is_pi_blocked(struct task_struct *tsk)
 {
diff --git a/kernel/rtmutex.c b/kernel/rtmutex.c
index 3bff726..20742e7 100644
--- a/kernel/rtmutex.c
+++ b/kernel/rtmutex.c
@@ -124,6 +124,18 @@ int rt_mutex_getprio(struct task_struct *task)
 }
 
 /*
+ * Called by sched_setscheduler() to check whether the priority change
+ * is overruled by a possible priority boosting.
+ */
+int rt_mutex_check_prio(struct task_struct *task, int newprio)
+{
+   if (!task_has_pi_waiters(task))
+   return 0;
+
+   return task_top_pi_waiter(task)->pi_list_entry.prio <= newprio;
+}
+
+/*
  * Adjust the priority of a task, after its pi_waiters got modified.
  *
  * This can be both boosting and unboosting. task->pi_lock must be held.
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b02f995..7b501a3 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4085,7 +4085,8 @@ EXPORT_SYMBOL(sleep_on_timeout);
  * This function changes the 'effective' priority of a task. It does
  * not touch ->normal_prio like __setscheduler().
  *
- * Used by the rt_mutex code to implement priority inheritance logic.
+ * Used by the rt_mutex code to implement priority inheritance
+ * logic. Call site only calls if the priority of the task changed.
  */
 void rt_mutex_setprio(struct task_struct *p, int prio)
 {
@@ -4308,20 +4309,25 @@ static struct task_struct *find_process_by_pid(pid_t pid)
return pid ? find_task_by_vpid(pid) : current;
 }
 
-/* Actually do priority change: must hold rq lock. */
-static void
-__setscheduler(struct rq *rq, struct task_struct *p, int policy, int prio)
+static void __setscheduler_params(struct task_struct *p, int policy, int prio)
 {
p->policy = policy;
p->rt_priority = prio;
p->normal_prio = normal_prio(p);
+   set_load_weight(p);
+}
+
+/* Actually do priority change: must hold rq lock. */
+static void
+__setscheduler(struct rq *rq, struct task_struct *p, int policy, int prio)
+{
+   __setscheduler_params(p, policy, prio);
/* we are holding p->pi_lock already */
p->prio = rt_mutex_getprio(p);
if (rt_prio(p->prio))
 p->sched_class = &rt_sched_class;
else
 p->sched_class = &fair_sched_class;
-   set_load_weight(p);
 }
 
 /*
@@ -4346,6 +4352,7 @@ static bool check_same_owner(struct task_struct *p)
 static int __sched_setscheduler(struct task_struct *p, int policy,
const struct sched_param *param, bool user)
 {
+   int newprio = MAX_RT_PRIO - 1 - param->sched_priority;
int retval, oldprio, oldpolicy = -1, on_rq, running;
unsigned long flags;
const struct sched_class *prev_class;
@@ -4474,6 +4481,25 @@ recheck:
 task_rq_unlock(rq, p, &flags);
goto recheck;
}
+
+   p->sched_reset_on_fork = reset_on_fork;
+   oldprio = p->prio;
+
+   /*
+* Special case for priority boosted tasks.
+*
+* If the new priority is lower or equal (user space view)
+* than the current (boosted) priority, we just store the new
+* normal parameters and do not touch the scheduler class and
+* the runqueue. 

Re: [PATCH 1/9] mm: make mlockall preserve flags other than VM_LOCKED in def_flags

2012-12-21 Thread Rik van Riel

On 12/20/2012 07:49 PM, Michel Lespinasse wrote:

On most architectures, def_flags is either 0 or VM_LOCKED depending on
whether mlockall(MCL_FUTURE) was called. However, this is not an absolute
rule as kvm support on s390 may set the VM_NOHUGEPAGE flag in def_flags.
We don't want mlockall to clear that.

Signed-off-by: Michel Lespinasse 


Reviewed-by: Rik van Riel 



Re: [PATCH 21/25] tty/max3100: don't use [delayed_]work_pending()

2012-12-21 Thread Greg Kroah-Hartman
On Fri, Dec 21, 2012 at 05:57:11PM -0800, Tejun Heo wrote:
> There's no need to test whether a (delayed) work item in pending
> before queueing, flushing or cancelling it.  Most uses are unnecessary
> and quite a few of them are buggy.
> 
> Remove unnecessary pending tests from max3100.  Only compile tested.
> 
> Signed-off-by: Tejun Heo 
> Cc: Greg Kroah-Hartman 
> Cc: Jiri Slaby 
> ---
> Please let me know how this patch should be routed.  I can take it
> through the workqueue tree if necessary.

Please, feel free to take it through your tree:

Acked-by: Greg Kroah-Hartman 


Re: [PATCHv4 3/8] gpu: host1x: Add channel support

2012-12-21 Thread Steven Rostedt
On Fri, 2012-12-21 at 13:39 +0200, Terje Bergstrom wrote:

> diff --git a/include/trace/events/host1x.h b/include/trace/events/host1x.h
> index d98d74c..e087910 100644
> --- a/include/trace/events/host1x.h
> +++ b/include/trace/events/host1x.h
> @@ -37,6 +37,214 @@ DECLARE_EVENT_CLASS(host1x,
>   TP_printk("name=%s", __entry->name)
>  );
>  
> +DEFINE_EVENT(host1x, host1x_channel_open,
> + TP_PROTO(const char *name),
> + TP_ARGS(name)
> +);
> +
> +DEFINE_EVENT(host1x, host1x_channel_release,
> + TP_PROTO(const char *name),
> + TP_ARGS(name)
> +);
> +
> +TRACE_EVENT(host1x_cdma_begin,
> + TP_PROTO(const char *name),
> +
> + TP_ARGS(name),
> +
> + TP_STRUCT__entry(
> + __field(const char *, name)
> + ),
> +
> + TP_fast_assign(
> + __entry->name = name;
> + ),
> +
> + TP_printk("name=%s",
> + __entry->name)
> +);
> +
> +TRACE_EVENT(host1x_cdma_end,
> + TP_PROTO(const char *name),
> +
> + TP_ARGS(name),
> +
> + TP_STRUCT__entry(
> + __field(const char *, name)
> + ),
> +
> + TP_fast_assign(
> + __entry->name = name;
> + ),
> +
> + TP_printk("name=%s",
> + __entry->name)
> +);

The above two should be combined into a DECLARE_EVENT_CLASS() and
DEFINE_EVENT()s. Saves text and data space that way.
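Since the file already declares a `host1x` class with exactly this shape (a single `name` field printed as `name=%s`), one possible form of the consolidation — assuming the existing class's printk format is acceptable for these two events — would be:

```c
DEFINE_EVENT(host1x, host1x_cdma_begin,
	TP_PROTO(const char *name),
	TP_ARGS(name)
);

DEFINE_EVENT(host1x, host1x_cdma_end,
	TP_PROTO(const char *name),
	TP_ARGS(name)
);
```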

> +
> +TRACE_EVENT(host1x_cdma_flush,
> + TP_PROTO(const char *name, int timeout),
> +
> + TP_ARGS(name, timeout),
> +
> + TP_STRUCT__entry(
> + __field(const char *, name)
> + __field(int, timeout)
> + ),
> +
> + TP_fast_assign(
> + __entry->name = name;
> + __entry->timeout = timeout;
> + ),
> +
> + TP_printk("name=%s, timeout=%d",
> + __entry->name, __entry->timeout)
> +);
> +
> +TRACE_EVENT(host1x_cdma_push,
> + TP_PROTO(const char *name, u32 op1, u32 op2),
> +
> + TP_ARGS(name, op1, op2),
> +
> + TP_STRUCT__entry(
> + __field(const char *, name)
> + __field(u32, op1)
> + __field(u32, op2)
> + ),
> +
> + TP_fast_assign(
> + __entry->name = name;
> + __entry->op1 = op1;
> + __entry->op2 = op2;
> + ),
> +
> + TP_printk("name=%s, op1=%08x, op2=%08x",
> + __entry->name, __entry->op1, __entry->op2)
> +);
> +
> +TRACE_EVENT(host1x_cdma_push_gather,
> + TP_PROTO(const char *name, u32 mem_id,
> + u32 words, u32 offset, void *cmdbuf),
> +
> + TP_ARGS(name, mem_id, words, offset, cmdbuf),
> +
> + TP_STRUCT__entry(
> + __field(const char *, name)
> + __field(u32, mem_id)
> + __field(u32, words)
> + __field(u32, offset)
> + __field(bool, cmdbuf)
> + __dynamic_array(u32, cmdbuf, words)
> + ),
> +
> + TP_fast_assign(
> + if (cmdbuf) {
> + memcpy(__get_dynamic_array(cmdbuf), cmdbuf+offset,
> + words * sizeof(u32));
> + }
> + __entry->cmdbuf = cmdbuf;
> + __entry->name = name;
> + __entry->mem_id = mem_id;
> + __entry->words = words;
> + __entry->offset = offset;
> + ),
> +
> + TP_printk("name=%s, mem_id=%08x, words=%u, offset=%d, contents=[%s]",
> +   __entry->name, __entry->mem_id,
> +   __entry->words, __entry->offset,
> +   __print_hex(__get_dynamic_array(cmdbuf),
> +   __entry->cmdbuf ? __entry->words * 4 : 0))
> +);
> +
> +TRACE_EVENT(host1x_channel_submit,
> + TP_PROTO(const char *name, u32 cmdbufs, u32 relocs, u32 waitchks,
> + u32 syncpt_id, u32 syncpt_incrs),
> +
> + TP_ARGS(name, cmdbufs, relocs, waitchks, syncpt_id, syncpt_incrs),
> +
> + TP_STRUCT__entry(
> + __field(const char *, name)
> + __field(u32, cmdbufs)
> + __field(u32, relocs)
> + __field(u32, waitchks)
> + __field(u32, syncpt_id)
> + __field(u32, syncpt_incrs)
> + ),
> +
> + TP_fast_assign(
> + __entry->name = name;
> + __entry->cmdbufs = cmdbufs;
> + __entry->relocs = relocs;
> + __entry->waitchks = waitchks;
> + __entry->syncpt_id = syncpt_id;
> + __entry->syncpt_incrs = syncpt_incrs;
> + ),
> +
> + TP_printk("name=%s, cmdbufs=%u, relocs=%u, waitchks=%d,"
> + "syncpt_id=%u, syncpt_incrs=%u",
> +   __entry->name, __entry->cmdbufs, __entry->relocs, __entry->waitchks,
> +   __entry->syncpt_id, __entry->syncpt_incrs)
> +);
> +
> +TRACE_EVENT(host1x_channel_submitted,
> + TP_PROTO(const char *name, u32 syncpt_base, u32 syncpt_max),
> +
> + TP_ARGS(name, syncpt_base, syncpt_max),
> +
> + TP_STRUCT__entry(
> + __field(const char *, name)
> + __field(u32, syncpt_base)
> + __field(u32, 

Re: [RFC PATCH 3/3] x86,smp: auto tune spinlock backoff delay factor

2012-12-21 Thread Rik van Riel

On 12/21/2012 10:49 PM, Steven Rostedt wrote:

On Fri, Dec 21, 2012 at 09:51:35PM -0500, Rik van Riel wrote:



However, since spinlock contention should not be the
usual state, and all a scalable lock does is make sure
that N+1 CPUs does not perform worse than N CPUs, using
scalable locks is a stop-gap measure.

I believe a stop-gap measure should be kept as simple as
we can. I am willing to consider moving to a per-lock
delay factor if we can figure out an easy way to do it,
but I would like to avoid too much extra complexity...


Rik,

I like your solution. It's rather simple and simple solutions tend to
end up being the closest to optimal. The more complex a solution gets,
the more it starts chasing fireflies.



Anyway, I'd like to see this code tested, and more benchmarks run
against it.


Absolutely.  I would love to see if this code actually
causes regressions anywhere.

It is simple enough that I suspect it will not, but there
really is only one way to find out.

The more people test this with different workloads on
different SMP systems, the better.



Re: [PATCH 2/2] drivers: infiniband: hw: cxgb4: fix cast warning

2012-12-21 Thread Steve Wise

Acked-by: Steve Wise 


The relationship between flag: BH_Uptodate & BH_Dirty in buffer_head

2012-12-21 Thread cyhung . cs00g
Hi, all

I want to ask a question.

As we know, structure buffer_head has a member called b_state.
And this state could be defined by many flags.

My question is: what is the relationship between BH_Uptodate
and BH_Dirty?

If a buffer's data is modified, BH_Dirty will be set; what about BH_Uptodate?

Under what circumstances would BH_Uptodate be set?


Thanks. :)



Re: [RFC PATCH 3/3] x86,smp: auto tune spinlock backoff delay factor

2012-12-21 Thread Steven Rostedt
On Fri, Dec 21, 2012 at 09:51:35PM -0500, Rik van Riel wrote:
> On 12/21/2012 07:47 PM, David Daney wrote:
> 
> >>+#define MIN_SPINLOCK_DELAY 1
> >>+#define MAX_SPINLOCK_DELAY 1000
> >>+DEFINE_PER_CPU(int, spinlock_delay) = { MIN_SPINLOCK_DELAY };
> >
> >
> >This gives the same delay for all locks in the system, but the amount of
> >work done under each lock is different.  So, for any given lock, the
> >delay is not optimal.
> >
> >This is an untested idea that came to me after looking at this:
> >
> >o Assume that for any given lock, the optimal delay is the same for all
> >CPUs in the system.
> >
> >o Store a per-lock delay value in arch_spinlock_t.

This can bloat the data structures. I would like to avoid that.

> >
> >o Once a CPU owns the lock it can update the delay as you do for the
> >per_cpu version.  Tuning the delay on fewer of the locking operations
> >reduces bus traffic, but makes it converge more slowly.
> >
> >o Bonus points if you can update the delay as part of the releasing store.
> 
> It would absolutely have to be part of the same load and
> store cycle, otherwise we would increase bus traffic and
> defeat the purpose.
> 
> However, since spinlock contention should not be the
> usual state, and all a scalable lock does is make sure
> that N+1 CPUs does not perform worse than N CPUs, using
> scalable locks is a stop-gap measure.
> 
> I believe a stop-gap measure should be kept as simple as
> we can. I am willing to consider moving to a per-lock
> delay factor if we can figure out an easy way to do it,
> but I would like to avoid too much extra complexity...

Rik,

I like your solution. It's rather simple and simple solutions tend to
end up being the closest to optimal. The more complex a solution gets,
the more it starts chasing fireflies.

Locks that are not likely to be contended will most likely not hit this
code, as it only gets triggered when contended. Now, really the wait
will be the size of the critical section. If quick locks gets hit a lot,
the auto delay will shink. If long critical sections start getting
contention, the auto delay will grow. But the general delay should
become a balance in the system that should be ideal. Kind of like the
NUMA balancing ;-)

Anyway, I'd like to see this code tested, and more benchmarks run
against it.

Thanks,

-- Steve


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH 3/3 -v2] x86,smp: auto tune spinlock backoff delay factor

2012-12-21 Thread Rik van Riel

On 12/21/2012 10:33 PM, Steven Rostedt wrote:

On Fri, Dec 21, 2012 at 06:56:13PM -0500, Rik van Riel wrote:

diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index 4e44840..e44c56f 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -113,19 +113,62 @@ static atomic_t stopping_cpu = ATOMIC_INIT(-1);
  static bool smp_no_nmi_ipi = false;

  /*
- * Wait on a congested ticket spinlock.
+ * Wait on a congested ticket spinlock. Many spinlocks are embedded in
+ * data structures; having many CPUs pounce on the cache line with the
+ * spinlock simultaneously can slow down the lock holder, and the system
+ * as a whole.
+ *
+ * To prevent total performance collapse in case of bad spinlock contention,
+ * perform proportional backoff. The per-cpu value of delay is automatically
+ * tuned to limit the number of times spinning CPUs poll the lock before
+ * obtaining it. This limits the amount of cross-CPU traffic required to obtain
+ * a spinlock, and keeps system performance from dropping off a cliff.
+ *
+ * There is a tradeoff. If we poll too often, the whole system is slowed
+ * down. If we sleep too long, the lock will go unused for a period of
+ * time. Adjusting "delay" to poll, on average, 2.7 times before the
+ * lock is obtained seems to result in low bus traffic. The combination
+ * of aiming for a non-integer amount of average polls, and scaling the
+ * sleep period proportionally to how many CPUs are ahead of us in the
+ * queue for this ticket lock seems to reduce the amount of time spent
+ * "oversleeping" the release of the lock.
   */
+#define MIN_SPINLOCK_DELAY 1
+#define MAX_SPINLOCK_DELAY 1000
+DEFINE_PER_CPU(int, spinlock_delay) = { MIN_SPINLOCK_DELAY };
  void ticket_spin_lock_wait(arch_spinlock_t *lock, struct __raw_tickets inc)
  {
+   /*
+* Use the raw per-cpu pointer; preemption is disabled in the
+* spinlock code. This avoids put_cpu_var once we have the lock.
+*/
+   int *delay_ptr = &per_cpu(spinlock_delay, smp_processor_id());
+   int delay = *delay_ptr;


I'm confused by the above comment. Why not just:

int delay = this_cpu_read(spinlock_delay);
?


Eric Dumazet pointed out the same thing.  My code now
uses __this_cpu_read and __this_cpu_write.


Too bad you posted this just before break. I currently have access to a
40 core box, and I would have loved to test this. But right now I have
it testing other things, and hopefully I'll still have access to it
after the break.


I will try to run this test on a really large SMP system
in the lab during the break.

Ideally, the auto-tuning will keep the delay value large
enough that performance will stay flat even when there are
100 CPUs contending over the same lock.

Maybe it turns out that the maximum allowed delay value
needs to be larger.  Only one way to find out...



Re: [RFC PATCH 2/3] x86,smp: proportional backoff for ticket spinlocks

2012-12-21 Thread Rik van Riel

On 12/21/2012 10:14 PM, Steven Rostedt wrote:


OK, I replied here before reading patch 3 (still reviewing it). Why have
this patch at all? Just to test if you broke something between this and
patch 3? Or perhaps patch 3 may not get accepted? In that case, you
would still need a comment.

Either explicitly state that this patch is just a stepping stone for
patch 3, and will either be accepted or rejected along with patch 3. Or
keep it as a stand alone patch and add comments as such. Or just get rid
of it all together.


I will document this patch better, explaining that it is
a stepping stone, that the number 50 is likely to be
wrong for many systems, and that the next patch fixes
things, using this text in the changelog:



The number 50 is likely to be wrong for many setups, and
this patch is mostly to illustrate the concept of proportional
backoff. The next patch automatically tunes the delay value.



Re: [RFC PATCH 3/3 -v2] x86,smp: auto tune spinlock backoff delay factor

2012-12-21 Thread Rik van Riel

On 12/21/2012 10:29 PM, Eric Dumazet wrote:

On Fri, 2012-12-21 at 18:56 -0500, Rik van Riel wrote:

+   int *delay_ptr = &per_cpu(spinlock_delay, smp_processor_id());
+   int delay = *delay_ptr;


int delay = __this_cpu_read(spinlock_delay);


}
+   *delay_ptr = delay;


__this_cpu_write(spinlock_delay, delay);


Thanks for that cleanup. I have applied it to my code.



Re: [RFC PATCH 3/3 -v2] x86,smp: auto tune spinlock backoff delay factor

2012-12-21 Thread Steven Rostedt
On Fri, Dec 21, 2012 at 06:56:13PM -0500, Rik van Riel wrote:
> diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
> index 4e44840..e44c56f 100644
> --- a/arch/x86/kernel/smp.c
> +++ b/arch/x86/kernel/smp.c
> @@ -113,19 +113,62 @@ static atomic_t stopping_cpu = ATOMIC_INIT(-1);
>  static bool smp_no_nmi_ipi = false;
>  
>  /*
> - * Wait on a congested ticket spinlock.
> + * Wait on a congested ticket spinlock. Many spinlocks are embedded in
> + * data structures; having many CPUs pounce on the cache line with the
> + * spinlock simultaneously can slow down the lock holder, and the system
> + * as a whole.
> + *
> + * To prevent total performance collapse in case of bad spinlock contention,
> + * perform proportional backoff. The per-cpu value of delay is automatically
> + * tuned to limit the number of times spinning CPUs poll the lock before
> + * a spinlock, and keeps system performance from dropping off a cliff.
> + *
> + * There is a tradeoff. If we poll too often, the whole system is slowed
> + * down. If we sleep too long, the lock will go unused for a period of
> + * time. Adjusting "delay" to poll, on average, 2.7 times before the
> + * lock is obtained seems to result in low bus traffic. The combination
> + * of aiming for a non-integer amount of average polls, and scaling the
> + * sleep period proportionally to how many CPUs are ahead of us in the
> + * queue for this ticket lock seems to reduce the amount of time spent
> + * "oversleeping" the release of the lock.
>   */
> +#define MIN_SPINLOCK_DELAY 1
> +#define MAX_SPINLOCK_DELAY 1000
> +DEFINE_PER_CPU(int, spinlock_delay) = { MIN_SPINLOCK_DELAY };
>  void ticket_spin_lock_wait(arch_spinlock_t *lock, struct __raw_tickets inc)
>  {
> + /*
> +  * Use the raw per-cpu pointer; preemption is disabled in the
> +  * spinlock code. This avoids put_cpu_var once we have the lock.
> +  */
> + int *delay_ptr = &per_cpu(spinlock_delay, smp_processor_id());
> + int delay = *delay_ptr;

I'm confused by the above comment. Why not just:

int delay = this_cpu_read(spinlock_delay);
?

> +
>   for (;;) {
> - int loops = 50 * (__ticket_t)(inc.tail - inc.head);
> + int loops = delay * (__ticket_t)(inc.tail - inc.head);
>   while (loops--)
>   cpu_relax();
>  
>   inc.head = ACCESS_ONCE(lock->tickets.head);
> - if (inc.head == inc.tail)
> + if (inc.head == inc.tail) {
> + /* Decrease the delay, since we may have overslept. */
> + if (delay > MIN_SPINLOCK_DELAY)
> + delay--;
>   break;
> + }
> +
> + /*
> +  * The lock is still busy, the delay was not long enough.
> +  * Going through here 2.7 times will, on average, cancel
> +  * out the decrement above. Using a non-integer number
> +  * gets rid of performance artifacts and reduces oversleeping.
> +  */
> + if (delay < MAX_SPINLOCK_DELAY &&
> + ((inc.head & 3) == 0 || (inc.head & 7) == 1))
> + delay++;
>   }
> + *delay_ptr = delay;

this_cpu_write(spinlock_delay, delay);

Too bad you posted this just before break. I currently have access to a
40 core box, and I would have loved to test this. But right now I have
it testing other things, and hopefully I'll still have access to it
after the break.

-- Steve

>  }
>  
>  /*
> 


Re: [RFC PATCH 3/3 -v2] x86,smp: auto tune spinlock backoff delay factor

2012-12-21 Thread Eric Dumazet
On Fri, 2012-12-21 at 18:56 -0500, Rik van Riel wrote:
> + int *delay_ptr = &per_cpu(spinlock_delay, smp_processor_id());
> + int delay = *delay_ptr;

int delay = __this_cpu_read(spinlock_delay);

>   }
> + *delay_ptr = delay;

__this_cpu_write(spinlock_delay, delay);




Re: [PATCH v7 24/27] x86: Add swiotlb force off support

2012-12-21 Thread H. Peter Anvin
On 12/21/2012 07:23 PM, Eric W. Biederman wrote:
> 
> In this case YH has been working on the case of loading a kernel
> completely above 4G, and apparently he has also been testing the case of
> running a kernel with no memory below 4G.
> 

It is worth noting that we cannot run with *no* memory below 4G -- it is
not possible to run SMP at least without some memory below the 1M mark.

-hpa




Re: [PATCH v7 24/27] x86: Add swiotlb force off support

2012-12-21 Thread Eric W. Biederman
Konrad Rzeszutek Wilk  writes:

> On Fri, Dec 21, 2012 at 06:42:47PM -0800, Eric W. Biederman wrote:
>> Konrad Rzeszutek Wilk  writes:
>> 
>> > On Mon, Dec 17, 2012 at 11:15:56PM -0800, Yinghai Lu wrote:
>> >> So users could disable swiotlb from the command line, even when swiotlb support
>> >> is compiled in.  Just like we have intel_iommu=on and intel_iommu=off.
>> >
>> > You really need to spell out why this is useful.
>> 
>> YH why can't we safely autodetect that the swiotlb is unusable when
>> there is no memory below 4G free?
>
> I am not sure what 'YH' stands for (Yeah?).  

Yinghai Lu's nickname.

> However we could turn SWIOTLB off altogether if it cannot allocate
> _some_ memory. It could try first 64MB, then 32MB, lastly 16MB. And
> if all that fails - print a nice warning and continue on.
>
> Later in the late initialization phase, when pci_swiotlb_late_init
> is called - it can then figure out whether 'iommu' has been set
> and it itself was never able to allocate. At that point it can try
> the dynamic allocation (swiotlb_late_init_with_default_size)
> ... and if that fails give up and panic.

As far as I can tell panics should be avoided unless there is something
that actually needs an iommu, and the swiotlb is the only option, and
the swiotlb cannot fulfill that request.

In this case YH has been working on the case of loading a kernel
completely above 4G, and apparently he has also been testing the case of
running a kernel with no memory below 4G.

Eric



[PATCH 3/5] f2fs: remove unneeded initialization of nr_dirty in dirty_seglist_info

2012-12-21 Thread Namjae Jeon
From: Namjae Jeon 

The memory for the dirty_seglist_info object is allocated using kzalloc,
which returns zeroed-out memory, so there is no need to initialize the
nr_dirty values with zeroes.

Signed-off-by: Namjae Jeon 
Signed-off-by: Amit Sahrawat 
---
 fs/f2fs/segment.c |1 -
 1 file changed, 1 deletion(-)

diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index ffacee4..b0ec996 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -1573,7 +1573,6 @@ static int build_dirty_segmap(struct f2fs_sb_info *sbi)
 
for (i = 0; i < NR_DIRTY_TYPE; i++) {
dirty_i->dirty_segmap[i] = kzalloc(bitmap_size, GFP_KERNEL);
-   dirty_i->nr_dirty[i] = 0;
if (!dirty_i->dirty_segmap[i])
return -ENOMEM;
}
-- 
1.7.9.5



[PATCH 5/5] f2fs: remove unneeded variable from f2fs_sync_fs

2012-12-21 Thread Namjae Jeon
From: Namjae Jeon 

We can directly return '0' from the function, instead of introducing a
'ret' variable.

Signed-off-by: Namjae Jeon 
Signed-off-by: Amit Sahrawat 
---
 fs/f2fs/super.c |3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 86f8549..39ba11a 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -119,7 +119,6 @@ static void f2fs_put_super(struct super_block *sb)
 int f2fs_sync_fs(struct super_block *sb, int sync)
 {
struct f2fs_sb_info *sbi = F2FS_SB(sb);
-   int ret = 0;
 
if (!sbi->s_dirty && !get_pages(sbi, F2FS_DIRTY_NODES))
return 0;
@@ -127,7 +126,7 @@ int f2fs_sync_fs(struct super_block *sb, int sync)
if (sync)
write_checkpoint(sbi, false, false);
 
-   return ret;
+   return 0;
 }
 
 static int f2fs_statfs(struct dentry *dentry, struct kstatfs *buf)
-- 
1.7.9.5



[PATCH 4/5] f2fs: fix fsync_inode list addition logic and avoid invalid access to memory

2012-12-21 Thread Namjae Jeon
From: Namjae Jeon 

In find_fsync_dnodes(), fsync inodes get added to the list; but in one path,
if f2fs_iget results in an error, that error value gets added to the fsync
inode list.
In next call to recover_data()->get_fsync_inode()
entry = list_entry(this, struct fsync_inode_entry, list);
if (entry->inode->i_ino == ino)
This can result in an "invalid access to memory" when it encounters the
'error' value as an entry in the fsync inode list.
So, add the fsync inode entry to the list only when there are no errors,
and free the object right there in case of failure.

Signed-off-by: Namjae Jeon 
Signed-off-by: Amit Sahrawat 
---
 fs/f2fs/recovery.c |7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
index 632e679..e602bfa 100644
--- a/fs/f2fs/recovery.c
+++ b/fs/f2fs/recovery.c
@@ -144,14 +144,15 @@ static int find_fsync_dnodes(struct f2fs_sb_info *sbi, struct list_head *head)
goto out;
}
 
-   INIT_LIST_HEAD(&entry->list);
-   list_add_tail(&entry->list, head);
-
entry->inode = f2fs_iget(sbi->sb, ino_of_node(page));
if (IS_ERR(entry->inode)) {
err = PTR_ERR(entry->inode);
+   kmem_cache_free(fsync_entry_slab, entry);
goto out;
}
+
+   INIT_LIST_HEAD(&entry->list);
+   list_add_tail(&entry->list, head);
entry->blkaddr = blkaddr;
}
if (IS_INODE(page)) {
-- 
1.7.9.5



[PATCH 2/5] f2fs: handle error from f2fs_iget_nowait

2012-12-21 Thread Namjae Jeon
From: Namjae Jeon 

In case f2fs_iget_nowait returns an error, truncate_hole gets called with
the error value as the inode pointer. There is no check in truncate_hole
for a valid inode, so it could result in a crash due to an "invalid access
to memory". Avoid this by handling the error condition properly.

Signed-off-by: Namjae Jeon 
Signed-off-by: Amit Sahrawat 
---
 fs/f2fs/recovery.c |3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
index b07e9b6..632e679 100644
--- a/fs/f2fs/recovery.c
+++ b/fs/f2fs/recovery.c
@@ -228,6 +228,9 @@ static void check_index_in_prev_nodes(struct f2fs_sb_info *sbi,
 
/* Deallocate previous index in the node page */
inode = f2fs_iget_nowait(sbi->sb, ino);
+   if (IS_ERR(inode))
+   return;
+
truncate_hole(inode, bidx, bidx + 1);
iput(inode);
 }
-- 
1.7.9.5



Re: [RFC PATCH 2/3] x86,smp: proportional backoff for ticket spinlocks

2012-12-21 Thread Steven Rostedt
On Fri, Dec 21, 2012 at 10:07:56PM -0500, Steven Rostedt wrote:
> > diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
> > index 20da354..4e44840 100644
> > --- a/arch/x86/kernel/smp.c
> > +++ b/arch/x86/kernel/smp.c
> > @@ -118,9 +118,11 @@ static bool smp_no_nmi_ipi = false;
> >  void ticket_spin_lock_wait(arch_spinlock_t *lock, struct __raw_tickets inc)
> >  {
> > for (;;) {
> > -   cpu_relax();
> > -   inc.head = ACCESS_ONCE(lock->tickets.head);
> > +   int loops = 50 * (__ticket_t)(inc.tail - inc.head);
> > +   while (loops--)
> > +   cpu_relax();
> 
> -ENOCOMMENT
> 
> Please add a comment above to explain what it's doing. Don't expect
> people to check change logs. Also, explain why you picked 50.
> 

OK, I replied here before reading patch 3 (still reviewing it). Why have
this patch at all? Just to test if you broke something between this and
patch 3? Or perhaps patch 3 may not get accepted? In that case, you
would still need a comment.

Either explicitly state that this patch is just a stepping stone for
patch 3, and will either be accepted or rejected along with patch 3. Or
keep it as a stand alone patch and add comments as such. Or just get rid
of it all together.

Thanks,

-- Steve



Re: [PATCH v7 24/27] x86: Add swiotlb force off support

2012-12-21 Thread Konrad Rzeszutek Wilk
On Fri, Dec 21, 2012 at 06:42:47PM -0800, Eric W. Biederman wrote:
> Konrad Rzeszutek Wilk  writes:
> 
> > On Mon, Dec 17, 2012 at 11:15:56PM -0800, Yinghai Lu wrote:
> >> So users could disable swiotlb from the command line, even when swiotlb support
> >> is compiled in.  Just like we have intel_iommu=on and intel_iommu=off.
> >
> > You really need to spell out why this is useful.
> 
> YH why can't we safely autodetect that the swiotlb is unusable when
> there is no memory below 4G free?

I am not sure what 'YH' stands for (Yeah?).

However we could turn SWIOTLB off altogether if it cannot allocate
_some_ memory. It could try first 64MB, then 32MB, lastly 16MB. And
if all that fails - print a nice warning and continue on.

Later in the late initialization phase, when pci_swiotlb_late_init
is called - it can then figure out whether 'iommu' has been set
and it itself was never able to allocate. At that point it can try
the dynamic allocation (swiotlb_late_init_with_default_size)
... and if that fails give up and panic.


[PATCH 1/5] f2fs: Introduce some information prints in the mount path

2012-12-21 Thread Namjae Jeon
From: Namjae Jeon 

Added a few informative prints in the mount path, to convey a proper error
in case of mount failure.

Signed-off-by: Namjae Jeon 
Signed-off-by: Amit Sahrawat 
---
 fs/f2fs/super.c |   46 --
 1 file changed, 36 insertions(+), 10 deletions(-)

diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index cf0ffb8..86f8549 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -342,19 +342,29 @@ static int sanity_check_raw_super(struct f2fs_super_block *raw_super)
 {
unsigned int blocksize;
 
-   if (F2FS_SUPER_MAGIC != le32_to_cpu(raw_super->magic))
+   if (F2FS_SUPER_MAGIC != le32_to_cpu(raw_super->magic)) {
+   pr_info("Magic Mismatch, valid(0x%x) - read(0x%x)\n",
+   F2FS_SUPER_MAGIC, le32_to_cpu(raw_super->magic));
return 1;
+   }
 
/* Currently, support only 4KB block size */
blocksize = 1 << le32_to_cpu(raw_super->log_blocksize);
-   if (blocksize != PAGE_CACHE_SIZE)
+   if (blocksize != PAGE_CACHE_SIZE) {
+   pr_info("Not valid blocksize (%u), supports only 4KB\n",
+   blocksize);
return 1;
+   }
if (le32_to_cpu(raw_super->log_sectorsize) !=
-   F2FS_LOG_SECTOR_SIZE)
+   F2FS_LOG_SECTOR_SIZE) {
+   pr_info("Not valid log sectorsize\n");
return 1;
+   }
if (le32_to_cpu(raw_super->log_sectors_per_block) !=
-   F2FS_LOG_SECTORS_PER_BLOCK)
+   F2FS_LOG_SECTORS_PER_BLOCK) {
+   pr_info("Not valid log sectors per block\n");
return 1;
+   }
return 0;
 }
 
@@ -415,13 +425,16 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
return -ENOMEM;
 
/* set a temporary block size */
-   if (!sb_set_blocksize(sb, F2FS_BLKSIZE))
+   if (!sb_set_blocksize(sb, F2FS_BLKSIZE)) {
+   pr_err("unable to set blocksize\n");
goto free_sbi;
+   }
 
/* read f2fs raw super block */
raw_super_buf = sb_bread(sb, 0);
if (!raw_super_buf) {
err = -EIO;
+   pr_err("unable to read superblock\n");
goto free_sbi;
}
raw_super = (struct f2fs_super_block *)
@@ -443,8 +456,10 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
goto free_sb_buf;
 
/* sanity checking of raw super */
-   if (sanity_check_raw_super(raw_super))
+   if (sanity_check_raw_super(raw_super)) {
+   pr_err("Not a valid F2FS filesystem\n");
goto free_sb_buf;
+   }
 
sb->s_maxbytes = max_file_size(le32_to_cpu(raw_super->log_blocksize));
sb->s_max_links = F2FS_LINK_MAX;
@@ -478,18 +493,23 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
/* get an inode for meta space */
sbi->meta_inode = f2fs_iget(sb, F2FS_META_INO(sbi));
if (IS_ERR(sbi->meta_inode)) {
+   pr_err("Failed to read F2FS meta data inode\n");
err = PTR_ERR(sbi->meta_inode);
goto free_sb_buf;
}
 
err = get_valid_checkpoint(sbi);
-   if (err)
+   if (err) {
+   pr_err("Failed to get valid F2FS checkpoint\n");
goto free_meta_inode;
+   }
 
/* sanity checking of checkpoint */
err = -EINVAL;
-   if (sanity_check_ckpt(raw_super, sbi->ckpt))
+   if (sanity_check_ckpt(raw_super, sbi->ckpt)) {
+   pr_err("Not a valid F2FS checkpoint\n");
goto free_cp;
+   }
 
sbi->total_valid_node_count =
le32_to_cpu(sbi->ckpt->valid_node_count);
@@ -511,17 +531,22 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
 
/* setup f2fs internal modules */
err = build_segment_manager(sbi);
-   if (err)
+   if (err) {
+   pr_err("Failed to initialize F2FS segment manager\n");
goto free_sm;
+   }
err = build_node_manager(sbi);
-   if (err)
+   if (err) {
+   pr_err("Failed to initialize F2FS node manager\n");
goto free_nm;
+   }
 
build_gc_manager(sbi);
 
/* get an inode for node space */
sbi->node_inode = f2fs_iget(sb, F2FS_NODE_INO(sbi));
if (IS_ERR(sbi->node_inode)) {
+   pr_err("Failed to read node inode\n");
err = PTR_ERR(sbi->node_inode);
goto free_nm;
}
@@ -534,6 +559,7 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
/* read root inode and dentry */
root = f2fs_iget(sb, F2FS_ROOT_INO(sbi));
if (IS_ERR(root)) {
+  

Re: [PATCH v2 1/4] ODROID-X: dts: Add board dts file for ODROID-X

2012-12-21 Thread Dongjin Kim
Hi Kukjin,

Thank you for your review.

I was a little confused about bootargs and whether it is better to remove
it from or keep it in the board file; I do see an error message about
failing to mount the root file system if I remove bootargs from the board
file and rely on the default bootargs from u-boot.

What I thought is that passing bootargs from the boot-loader would be
better, since it gives developers more options such as boot device and
partition setup. If the board file has fixed bootargs, then the DTS file
has to be updated whenever the boot partition or the arguments change.
The bootargs and other boot configuration can also be set up from u-boot
itself with a script, which I believe is much easier and more comfortable
than changing the DTS file. We can also start the hardware with an
initramfs and default bootargs like "boot=/dev/ram0 rw ramdisk=8192..."
and then mount the expected partition for rootfs; the real boot media has
to be specified in bootargs anyhow.

Thanks, and any advice is very welcome.

Best regards,
Dongjin.

On Sat, Dec 22, 2012 at 3:13 AM, Kukjin Kim  wrote:
> Dongjin Kim wrote:
>>
>> Add initial dtb file for Hardkernel's ODROID-X board based on EXYNOS4412
>> SoC.
>>
>> Signed-off-by: Dongjin Kim 
>> ---
>>  arch/arm/boot/dts/Makefile   |1 +
>>  arch/arm/boot/dts/exynos4412-odroidx.dts |   47 ++
>>  2 files changed, 48 insertions(+)
>>  create mode 100644 arch/arm/boot/dts/exynos4412-odroidx.dts
>>
>> diff --git a/arch/arm/boot/dts/Makefile b/arch/arm/boot/dts/Makefile
>> index d077ef8..364d67b 100644
>> --- a/arch/arm/boot/dts/Makefile
>> +++ b/arch/arm/boot/dts/Makefile
>> @@ -44,6 +44,7 @@ dtb-$(CONFIG_ARCH_EXYNOS) += exynos4210-origen.dtb \
>>   exynos4210-trats.dtb \
>>   exynos5250-smdk5250.dtb \
>>   exynos5440-ssdk5440.dtb \
>> + exynos4412-odroidx.dtb \
>>   exynos4412-smdk4412.dtb \
>>   exynos5250-smdk5250.dtb \
>>   exynos5250-snow.dtb
>
> Just now, I sorted out the ordering alphabetically and submitted.
>
>> diff --git a/arch/arm/boot/dts/exynos4412-odroidx.dts b/arch/arm/boot/dts/exynos4412-odroidx.dts
>> new file mode 100644
>> index 000..323ed177
>> --- /dev/null
>> +++ b/arch/arm/boot/dts/exynos4412-odroidx.dts
>> @@ -0,0 +1,47 @@
>> +/*
>> + * Hardkernel's Exynos4412 based ODROID-X board device tree source
>> + *
>> + * Copyright (c) 2012 Dongjin Kim 
>> + *
>> + * Device tree source file for Hardkernel's ODROID-X board which is based on
>> + * Samsung's Exynos4412 SoC.
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> +*/
>> +
>> +/dts-v1/;
>> +/include/ "exynos4412.dtsi"
>> +
>> +/ {
>> + model = "Hardkernel ODROID-X board based on Exynos4412";
>> + compatible = "hardkernel,exynos4412", "samsung,exynos4412";
>
> Probably,
>
> +   compatible = "hardkernel,odroid-x", "samsung,exynos4412";
>
>> +
>> + memory {
>> + reg = <0x4000 0x4000>;
>
> If you can't see any error message in kernel boot, boot-loader should inform
> the memory size and bank information to the kernel. So should be separated
> with bank size, 256MiB. But I need to think again as Olof said in other
> thread.
>
>> + };
>> +
>> + serial@1380 {
>> + status = "okay";
>> + };
>> +
>> + serial@1381 {
>> + status = "okay";
>> + };
>> +
>> + serial@1382 {
>> + status = "okay";
>> + };
>> +
>> + serial@1383 {
>> + status = "okay";
>> + };
>> +
>> + sdhci@1253 {
>> + bus-width = <4>;
>> + pinctrl-0 = <_clk _cmd _cd _bus4>;
>> + pinctrl-names = "default";
>> + status = "okay";
>> + };
>> +};
>> --
>> 1.7.9.5
>
> BTW, you don't need any bootargs as default?
>
> - Kukjin
>


Re: [RFC PATCH 2/3] x86,smp: proportional backoff for ticket spinlocks

2012-12-21 Thread Steven Rostedt
On Fri, Dec 21, 2012 at 06:51:15PM -0500, Rik van Riel wrote:
> Subject: x86,smp: proportional backoff for ticket spinlocks
> 
> Simple fixed value proportional backoff for ticket spinlocks.
> By pounding on the cacheline with the spin lock less often,
> bus traffic is reduced. In cases of a data structure with
> embedded spinlock, the lock holder has a better chance of
> making progress.
> 
> Signed-off-by: Rik van Riel 
> ---
>  arch/x86/kernel/smp.c |6 --
>  1 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
> index 20da354..4e44840 100644
> --- a/arch/x86/kernel/smp.c
> +++ b/arch/x86/kernel/smp.c
> @@ -118,9 +118,11 @@ static bool smp_no_nmi_ipi = false;
>  void ticket_spin_lock_wait(arch_spinlock_t *lock, struct __raw_tickets inc)
>  {
>   for (;;) {
> - cpu_relax();
> - inc.head = ACCESS_ONCE(lock->tickets.head);
> + int loops = 50 * (__ticket_t)(inc.tail - inc.head);
> + while (loops--)
> + cpu_relax();

-ENOCOMMENT

Please add a comment above to explain what it's doing. Don't expect
people to check change logs. Also, explain why you picked 50.

-- Steve

>  
> + inc.head = ACCESS_ONCE(lock->tickets.head);
>   if (inc.head == inc.tail)
>   break;
>   }
> 


RE: [PATCH 23/25] video/exynos: don't use [delayed_]work_pending()

2012-12-21 Thread Kukjin Kim
Tejun Heo wrote:
> 
> There's no need to test whether a (delayed) work item in pending
> before queueing, flushing or cancelling it.  Most uses are unnecessary
> and quite a few of them are buggy.
> 
> Remove unnecessary pending tests from exynos_dp_core.  Only compile
> tested.
> 
> Signed-off-by: Tejun Heo 
> Cc: Kukjin Kim 

Acked-by: Kukjin Kim 

> ---
> Please let me know how this patch should be routed.  I can take it
> through the workqueue tree if necessary.
> 
Cc'ed Jingoo and Florian.

The exynos dp driver is now being handled by Jingoo, so let's wait for his
opinion on whether to take this himself or not.

- Kukjin Kim

> Thanks.
> 
>  drivers/video/exynos/exynos_dp_core.c | 6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/video/exynos/exynos_dp_core.c b/drivers/video/exynos/exynos_dp_core.c
> index 28fd686..3002a6a 100644
> --- a/drivers/video/exynos/exynos_dp_core.c
> +++ b/drivers/video/exynos/exynos_dp_core.c
> @@ -1121,8 +1121,7 @@ static int __devexit exynos_dp_remove(struct platform_device *pdev)
> 
>   disable_irq(dp->irq);
> 
> - if (work_pending(>hotplug_work))
> - flush_work(>hotplug_work);
> + flush_work(>hotplug_work);
> 
>   if (pdev->dev.of_node) {
>   if (dp->phy_addr)
> @@ -1144,8 +1143,7 @@ static int exynos_dp_suspend(struct device *dev)
>   struct exynos_dp_platdata *pdata = dev->platform_data;
>   struct exynos_dp_device *dp = dev_get_drvdata(dev);
> 
> - if (work_pending(>hotplug_work))
> - flush_work(>hotplug_work);
> + flush_work(>hotplug_work);
> 
>   if (dev->of_node) {
>   if (dp->phy_addr)
> --
> 1.8.0.2



Re: [RFC PATCH 1/3] x86,smp: move waiting on contended lock out of line

2012-12-21 Thread Steven Rostedt
On Fri, Dec 21, 2012 at 06:50:38PM -0500, Rik van Riel wrote:
> Subject: x86,smp: move waiting on contended ticket lock out of line
> 
> Moving the wait loop for contended locks to its own function allows
> us to add things to that wait loop, without growing the size of the
> kernel text appreciably.
> 
> Signed-off-by: Rik van Riel 

Reviewed-by: Steven Rostedt 

-- Steve



Re: [PATCH v7 00/27] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G

2012-12-21 Thread H. Peter Anvin
Earlyprintk please?

Konrad Rzeszutek Wilk  wrote:

>On Mon, Dec 17, 2012 at 11:15:32PM -0800, Yinghai Lu wrote:
>> Now we have to limit kdump reserved memory to under 896M, because kexec has
>> that limitation, and bzImage also needs to stay under 4G.
>> 
>> To let kexec/kdump use ranges above 4G, we need to make bzImage and the
>> ramdisk loadable above 4G.
>> During boot, bzImage will be unpacked at the same position and stay high.
>> 
>> The patches add fields in setup_header and boot_params to
>> 1. get info about the ramdisk position above 4G from bootloader/kexec
>> 2. get info about cmd_line_ptr above 4G from bootloader/kexec
>> 3. set xloadflags bit0 in the header for bzImage, so bootloader/kexec load
>>could check it to decide whether to put bzImage high.
>> 4. use a sentinel to make sure ext_* fields in boot_params could be used.
>> 
>> These patches are tested with kexec-tools with local changes, which will be
>> sent to the kexec list later.
>> 
>> could be found at:
>> 
>> git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git
>> for-x86-boot
>
>Did a light test and it looks to work under Xen - though I had not tested
>various configurations of memory layouts.
>
>More worryingly, it blew up running natively on a Dell T105 AMD box with
>4GB of memory. I can't even get it to print anything on the serial log:
>
>(this is an excerpt from pxelinux.cfg/C0A8 file)
>LABEL BAREMETAL
>   KERNEL vmlinuz
>   APPEND initrd=initramfs.cpio.gz debug selinux=0 loglevel=10 apic=debug
>   console=uart8250,115200n8
>
>
>PXELINUX 3.82 2009-06-09  Copyright (C) 1994-2009 H. Peter Anvin et al
>Loading vmlinuz...
>Loading initramfs.cpio.gz... ready.

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.


Re: Linux 3.8-rc1

2012-12-21 Thread Steven Rostedt
On Fri, Dec 21, 2012 at 06:00:58PM -0800, Linus Torvalds wrote:
> The longest night of the year is upon us (*), and what better thing to
> do than get yourself some nice mulled wine, sit back, relax, and play
> with the most recent rc kernel?
> 
> This has been a big merge window: we've got more commits than any
> other kernel in the v3.x kernel series (although v3.2-rc1 was *almost*
> as big). It's been a rather busy merge window, in other words.
> 
> The diffstat looks normal: about 63% of the patch being to drivers
> (staging, networking, scsi, gpu, sound, drbd etc) , 18% architecture
> updates (with various ARM platform things being the bulk of it as
> usual, sigh), and the rest being "various", like core networking,
> filesystems (new f2fs flash-optimized filesystem) and include files
> etc.
> 
> I'm appending the "merge shortlog" which is about the only half-way
> readable automated data I can give you. There's a *ton* of stuff here.
> Go out and test it,

Hi Linus,

Was there anything wrong with Frederic's printk patches? They are needed
as one of the steps to achieve tickless operation for a CPU running a single task.

 https://lkml.org/lkml/2012/12/17/177

Maybe his "RFC" threw you for a loop. It was RFC as, if you're OK with
it, please pull it, otherwise please comment on what you think is
wrong with it.

It's been in linux-next for a while without anyone complaining about it.
I replied to his patch with the diff against your tree as Frederic left
that out.

Thanks,

-- Steve



Re: [RFC PATCH 3/3 -v2] x86,smp: auto tune spinlock backoff delay factor

2012-12-21 Thread Rik van Riel

On 12/21/2012 07:18 PM, Eric Dumazet wrote:

On Fri, 2012-12-21 at 18:56 -0500, Rik van Riel wrote:

Argh, the first one had a typo in it that did not influence
performance with fewer threads running, but that made things
worse with more than a dozen threads...



+
+   /*
+* The lock is still busy, the delay was not long enough.
+* Going through here 2.7 times will, on average, cancel
+* out the decrement above. Using a non-integer number
+* gets rid of performance artifacts and reduces oversleeping.
+*/
+   if (delay < MAX_SPINLOCK_DELAY &&
+   ((inc.head & 3) == 0 || (inc.head & 7) == 1))
+   delay++;


((inc.head & 3) == 0 || (inc.head & 7) == 1)) seems a strange condition
to me...


It is. It turned out that doing the increment
every 4 times (just the first check) resulted
in odd performance artifacts when running with
4, 8, 12 or 16 CPUs.

Moving to the above got rid of the performance
artifact.

It also results in aiming for a sleep period
that is not an exact multiple of the lock
acquiring period, which results in less
"oversleeping", and measurably better
performance.



Re: [RFC PATCH 3/3 -v2] x86,smp: auto tune spinlock backoff delay factor

2012-12-21 Thread Rik van Riel

On 12/21/2012 07:48 PM, Eric Dumazet wrote:

On Fri, 2012-12-21 at 18:56 -0500, Rik van Riel wrote:

Argh, the first one had a typo in it that did not influence
performance with fewer threads running, but that made things
worse with more than a dozen threads...

Please let me know if you can break these patches.
---8<---
Subject: x86,smp: auto tune spinlock backoff delay factor



+#define MIN_SPINLOCK_DELAY 1
+#define MAX_SPINLOCK_DELAY 1000
+DEFINE_PER_CPU(int, spinlock_delay) = { MIN_SPINLOCK_DELAY };


Using a single spinlock_delay per cpu assumes there is a single
contended spinlock on the machine, or that contended
spinlocks protect the same critical section.


The goal is to reduce bus traffic, and keep total
system performance from falling through the floor.

If we have one lock that takes N cycles to acquire,
and a second contended lock that takes N*2 cycles
to acquire, checking the first lock fewer times
before acquisition, and the second lock more times,
should still result in similar average system
throughput.

I suspect this approach should work well if we have
multiple contended locks in the system.


Given that we probably know where the contended spinlocks are, couldn't
we use a real scalable implementation for them?


The scalable locks tend to have a slightly more
complex locking API, resulting in a slightly
higher overhead in the non-contended (normal)
case.  That means we cannot use them everywhere.

Also, scalable locks merely make sure that N+1
CPUs perform the same as N CPUs when there is
lock contention.  They do not cause the system
to actually scale.

For actual scalability, the data structure would
need to be changed, so locking requirements are
better.


A known contended one is the Qdisc lock in network layer. We added a
second lock (busylock) to lower a bit the pressure on a separate cache
line, but a scalable lock would be much better...


My locking patches are meant for dealing with the
offenders we do not know about, to make sure that
system performance does not fall off a cliff when
we run into a surprise.

Known scalability bugs we can fix.

Unknown ones should not cause somebody's system
to fail.


I guess there are patent issues...


At least one of the scalable lock implementations has been
known since 1991, so there should not be any patent issues
with that one.



Re: epoll with ONESHOT possibly fails to deliver events

2012-12-21 Thread Eric Wong
"Junchang(Jason) Wang"  wrote:
> We still believe this is a bug in epoll system even though we can't
> prove that so far. Both Andi and I are very interested in this problem
> and helping you experts solve this it. Just let us know if we can
> help.

I'm just another epoll user, definitely not an expert.  Hopefully
somebody else can figure this out, because I'm unable to reproduce the
problem with your code and I haven't spotted any bugs from reading
through the kernel.

Curious, I also have a multi-threaded HTTP server which is a little
similar (multi-threaded, 2 epoll descriptors (only one epoll is heavily
used).  I run it on 2/4-core systems and haven't hit issues with epoll.

If you want to test, it should be easy to build from tarball:

  http://bogomips.org/cmogstored/files/cmogstored-1.0.0.tar.gz
  configure && make
  ./cmogstored --httplisten=8080 --docroot=/path/to/whatever

More info here: http://bogomips.org/cmogstored/README
git clone http://bogomips.org/cmogstored.git


Re: [RFC PATCH 3/3] x86,smp: auto tune spinlock backoff delay factor

2012-12-21 Thread Rik van Riel

On 12/21/2012 07:47 PM, David Daney wrote:


+#define MIN_SPINLOCK_DELAY 1
+#define MAX_SPINLOCK_DELAY 1000
+DEFINE_PER_CPU(int, spinlock_delay) = { MIN_SPINLOCK_DELAY };



This gives the same delay for all locks in the system, but the amount of
work done under each lock is different.  So, for any given lock, the
delay is not optimal.

This is an untested idea that came to me after looking at this:

o Assume that for any given lock, the optimal delay is the same for all
CPUs in the system.


o Store a per-lock delay value in arch_spinlock_t.

o Once a CPU owns the lock it can update the delay as you do for the
per_cpu version.  Tuning the delay on fewer of the locking operations
reduces bus traffic, but makes it converge more slowly.

o Bonus points if you can update the delay as part of the releasing store.


It would absolutely have to be part of the same load and
store cycle, otherwise we would increase bus traffic and
defeat the purpose.

However, since spinlock contention should not be the
usual state, and all a scalable lock does is make sure
that N+1 CPUs does not perform worse than N CPUs, using
scalable locks is a stop-gap measure.

I believe a stop-gap measure should be kept as simple as
we can. I am willing to consider moving to a per-lock
delay factor if we can figure out an easy way to do it,
but I would like to avoid too much extra complexity...


Re: [PATCH v7 24/27] x86: Add swiotlb force off support

2012-12-21 Thread Eric W. Biederman
Konrad Rzeszutek Wilk  writes:

> On Mon, Dec 17, 2012 at 11:15:56PM -0800, Yinghai Lu wrote:
> So users could disable swiotlb from the command line, even when swiotlb
> support is compiled in.  Just like we have intel_iommu=on and intel_iommu=off.
>
> You really need to spell out why this is useful.

YH why can't we safely autodetect that the swiotlb is unusable when
there is no memory below 4G free?

Eric


Re: [PATCH v7 00/27] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G

2012-12-21 Thread Konrad Rzeszutek Wilk
On Mon, Dec 17, 2012 at 11:15:32PM -0800, Yinghai Lu wrote:
> Now we have to limit kdump reserved memory to under 896M, because kexec has
> that limitation, and bzImage also needs to stay under 4G.
> 
> To let kexec/kdump use ranges above 4G, we need to make bzImage and the
> ramdisk loadable above 4G.
> During boot, bzImage will be unpacked at the same position and stay high.
> 
> The patches add fields in setup_header and boot_params to
> 1. get info about ramdisk position info above 4g from bootloader/kexec
> 2. get info about cmd_line_ptr info above 4g from bootloader/kexec
> 3. set xloadflags bit0 in header for bzImage and bootloader/kexec load
>could check that to decide if it could to put bzImage high.
> 4. use sentinel to make sure ext_* fields in boot_params could be used.
> 
> These patches are tested with kexec-tools with local changes, which will be
> sent to the kexec list later.
> 
> could be found at:
> 
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git 
> for-x86-boot

Did a light test and it looks to work under Xen - though I had not tested
various configurations of memory layouts.

More worryingly, it blew up running natively on a Dell T105 AMD box with 4GB
of memory. I can't even get it to print anything on the serial log:

(this is an excerpt from pxelinux.cfg/C0A8 file)
LABEL BAREMETAL
   KERNEL vmlinuz
   APPEND initrd=initramfs.cpio.gz debug selinux=0  loglevel=10 apic=debug 
console=uart8250,115200n8


PXELINUX 3.82 2009-06-09  Copyright (C) 1994-2009 H. Peter Anvin et al
Loading vmlinuz...
Loading initramfs.cpio.gz... ready.



Re: [PATCH v7 24/27] x86: Add swiotlb force off support

2012-12-21 Thread Konrad Rzeszutek Wilk
On Mon, Dec 17, 2012 at 11:15:56PM -0800, Yinghai Lu wrote:
> So users could disable swiotlb from the command line, even when swiotlb
> support is compiled in.  Just like we have intel_iommu=on and intel_iommu=off.

You really need to spell out why this is useful.

> 
> Signed-off-by: Yinghai Lu 
> ---
>  Documentation/kernel-parameters.txt |7 +++
>  arch/x86/kernel/pci-swiotlb.c   |   10 +-
>  drivers/iommu/amd_iommu.c   |1 +
>  include/linux/swiotlb.h |1 +
>  lib/swiotlb.c   |5 -
>  5 files changed, 18 insertions(+), 6 deletions(-)
> 
> diff --git a/Documentation/kernel-parameters.txt 
> b/Documentation/kernel-parameters.txt
> index ea8e5b4..2b37020 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -2835,6 +2835,13 @@ bytes respectively. Such letter suffixes can also be 
> entirely omitted.
>  
>   swiotlb=[IA-64] Number of I/O TLB slabs
>  
> + swiotlb=[force|off|on] [KNL] disable or enable swiotlb.
> + force
> + on
> + Enable swiotlb.
> + off
> + Disable swiotlb.
> +
>   switches=   [HW,M68k]
>  
>   sysfs.deprecated=0|1 [KNL]
> diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c
> index 6f93eb7..80afd3b 100644
> --- a/arch/x86/kernel/pci-swiotlb.c
> +++ b/arch/x86/kernel/pci-swiotlb.c
> @@ -58,12 +58,12 @@ static struct dma_map_ops swiotlb_dma_ops = {
>   */
>  int __init pci_swiotlb_detect_override(void)
>  {
> - int use_swiotlb = swiotlb | swiotlb_force;
> -
>   if (swiotlb_force)
>   swiotlb = 1;
> + else if (swiotlb_force_off)
> + swiotlb = 0;
>  
> - return use_swiotlb;
> + return swiotlb;
>  }
>  IOMMU_INIT_FINISH(pci_swiotlb_detect_override,
> pci_xen_swiotlb_detect,
> @@ -76,9 +76,9 @@ IOMMU_INIT_FINISH(pci_swiotlb_detect_override,
>   */
>  int __init pci_swiotlb_detect_4gb(void)
>  {
> - /* don't initialize swiotlb if iommu=off (no_iommu=1) */
> + /* don't initialize swiotlb if iommu=off (no_iommu=1) or force off */
>  #ifdef CONFIG_X86_64
> - if (!no_iommu && max_pfn > MAX_DMA32_PFN)
> + if (!no_iommu && !swiotlb_force_off && max_pfn > MAX_DMA32_PFN)
>   swiotlb = 1;
>  #endif
>   return swiotlb;
> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> index 55074cb..4f370d3 100644
> --- a/drivers/iommu/amd_iommu.c
> +++ b/drivers/iommu/amd_iommu.c
> @@ -3082,6 +3082,7 @@ int __init amd_iommu_init_dma_ops(void)
>   unhandled = device_dma_ops_init();
>   if (unhandled && max_pfn > MAX_DMA32_PFN) {
>   /* There are unhandled devices - initialize swiotlb for them */
> + WARN(swiotlb_force_off, "Please remove swiotlb=off\n");
>   swiotlb = 1;
>   }
>  
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index 1d2506f..dc43968 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -8,6 +8,7 @@ struct dma_attrs;
>  struct scatterlist;
>  
>  extern int swiotlb_force;
> +extern int swiotlb_force_off;

I think reusing swiotlb_force and making it a flag is the better way.
>  
>  /*
>   * Maximum allowable number of contiguous slabs to map,
> diff --git a/lib/swiotlb.c b/lib/swiotlb.c
> index 958322e..3a0ec46 100644
> --- a/lib/swiotlb.c
> +++ b/lib/swiotlb.c
> @@ -51,6 +51,7 @@
>  #define IO_TLB_MIN_SLABS ((1<<20) >> IO_TLB_SHIFT)
>  
>  int swiotlb_force;
> +int swiotlb_force_off;
>  
>  /*
>   * Used to do a quick range check in swiotlb_tbl_unmap_single and
> @@ -102,8 +103,10 @@ setup_io_tlb_npages(char *str)
>   }
>   if (*str == ',')
>   ++str;
> - if (!strcmp(str, "force"))
> + if (!strcmp(str, "force") || !strcmp(str, "on"))
>   swiotlb_force = 1;
> + if (!strcmp(str, "off"))
> + swiotlb_force_off = 1;
>  
>   return 1;
>  }
> -- 
> 1.7.10.4
> 


[PATCH 2/2] drivers: infiniband: hw: cxgb4: fix cast warning

2012-12-21 Thread Stefan Hasko
Fixed compile warning cast to pointer from integer of different size

Signed-off-by: Stefan Hasko 
---
 drivers/infiniband/hw/cxgb4/device.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/cxgb4/device.c 
b/drivers/infiniband/hw/cxgb4/device.c
index cb4ecd7..314ec4a 100644
--- a/drivers/infiniband/hw/cxgb4/device.c
+++ b/drivers/infiniband/hw/cxgb4/device.c
@@ -413,7 +413,7 @@ static int c4iw_rdev_open(struct c4iw_rdev *rdev)
PDBG("udb len 0x%x udb base %p db_reg %p gts_reg %p qpshift %lu "
 "qpmask 0x%x cqshift %lu cqmask 0x%x\n",
 (unsigned)pci_resource_len(rdev->lldi.pdev, 2),
-(void *)pci_resource_start(rdev->lldi.pdev, 2),
+(void *)(unsigned long)pci_resource_start(rdev->lldi.pdev, 2),
 rdev->lldi.db_reg,
 rdev->lldi.gts_reg,
 rdev->qpshift, rdev->qpmask,
-- 
1.7.10.4



Re: [PATCH v6 03/27] x86, boot: move verify_cpu.S and no_longmode after 0x200

2012-12-21 Thread Konrad Rzeszutek Wilk
On Wed, Dec 19, 2012 at 01:58:57PM -0800, Yinghai Lu wrote:
> On Wed, Dec 19, 2012 at 12:57 PM, Borislav Petkov  wrote:
> > On Tue, Dec 18, 2012 at 07:44:55PM -0800, Yinghai Lu wrote:
> >
> > So this explains what you're doing but I'd like to know why?
> >
> > Why do you need to free some more room between startup_32 and
> > startup_64? Do you need this room in another patch, maybe the next one:
> >
> > "[PATCH v7 14/27] x86, boot: Move lldt/ltr out of 64bit code section"
> >
> > Is that so? If yes, please write that in the commit message so that we
> > know why you're doing that change.
> 
Duplicate the next patch's commit log here? No, that's too long.

Why is that a problem? Long patch commit logs are OK.


Re: [PATCH 25/25] ipc: don't use [delayed_]work_pending()

2012-12-21 Thread Tejun Heo
Hello, Andrew.

On Fri, Dec 21, 2012 at 06:15:23PM -0800, Andrew Morton wrote:
> On Fri, 21 Dec 2012 17:57:15 -0800 Tejun Heo  wrote:
> 
> > There's no need to test whether a (delayed) work item is pending
> > before queueing, flushing or cancelling it.  Most uses are unnecessary
> > and quite a few of them are buggy.
> 
> > -   if (!work_pending(_memory_wq))
> > -   schedule_work(_memory_wq);
> > +   schedule_work(_memory_wq);
> 
> Well, the new code is a ton slower than the old code if the work is
> frequently pending, so some care is needed with such a conversion.

Yeah, I mentioned it in the head message.  it comes down to
test_and_set_bit() vs. test_bit() and none of the current users seems
to be hot enough for that to matter at all.

In very hot paths, such optimization *could* be valid.  The problem is
that [delayed_]work_pending() seem to be abused much more than they
are put to any actual usefulness.  Maybe we should rename them to
something really ugly.  I don't know.

> That's not an issue for the IPC callsite - memory offlining isn't
> frequent.
> 
> > ...
> >
> > Please let me know how this patch should be routed.  I can take it
> > through the workqueue tree if necessary.
> > 
> 
> Please merge this one yourself.

Can I add your acked-by?

Thanks.

-- 
tejun


Re: [PATCH v6 23/27] x86: Don't panic if can not alloc buffer for swiotlb

2012-12-21 Thread Konrad Rzeszutek Wilk
On Thu, Dec 13, 2012 at 02:02:17PM -0800, Yinghai Lu wrote:
> Normal boot path on a system with iommu support:
> the swiotlb buffer will be allocated early at first, and then we try to
> initialize the iommu; if the intel or amd iommu can be set up properly,
> the swiotlb buffer will be freed.
> 
> The early allocation is done with bootmem, and could panic when we try to
> use kdump with the buffer above 4G only.
> 
> Replace the panic with a WARN, so the kernel can go on without swiotlb
> and could set up the IOMMU later.


What if SWIOTLB is the only option? Meaning there are no other IOMMUs?


> 
> Signed-off-by: Yinghai Lu 
> ---
>  arch/x86/kernel/pci-swiotlb.c |5 -
>  include/linux/swiotlb.h   |2 +-
>  lib/swiotlb.c |   17 +++--
>  3 files changed, 16 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c
> index 6c483ba..6f93eb7 100644
> --- a/arch/x86/kernel/pci-swiotlb.c
> +++ b/arch/x86/kernel/pci-swiotlb.c
> @@ -91,7 +91,10 @@ IOMMU_INIT(pci_swiotlb_detect_4gb,
>  void __init pci_swiotlb_init(void)
>  {
>   if (swiotlb) {
> - swiotlb_init(0);
> + if (swiotlb_init(0)) {
> + swiotlb = 0;
> + return;
> + }
>   dma_ops = _dma_ops;
>   }
>  }
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index 8d08b3e..f7535d1 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -22,7 +22,7 @@ extern int swiotlb_force;
>   */
>  #define IO_TLB_SHIFT 11
>  
> -extern void swiotlb_init(int verbose);
> +int swiotlb_init(int verbose);
>  extern void swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int 
> verbose);
>  extern unsigned long swiotlb_nr_tbl(void);
>  extern int swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs);
> diff --git a/lib/swiotlb.c b/lib/swiotlb.c
> index f114bf6..6b99ea7 100644
> --- a/lib/swiotlb.c
> +++ b/lib/swiotlb.c
> @@ -170,7 +170,7 @@ void __init swiotlb_init_with_tbl(char *tlb, unsigned 
> long nslabs, int verbose)
>   * Statically reserve bounce buffer space and initialize bounce buffer data
>   * structures for the software IO TLB used to implement the DMA API.
>   */
> -static void __init
> +static int __init
>  swiotlb_init_with_default_size(size_t default_size, int verbose)
>  {
>   unsigned long bytes;
> @@ -185,17 +185,22 @@ swiotlb_init_with_default_size(size_t default_size, int 
> verbose)
>   /*
>* Get IO TLB memory from the low pages
>*/
> - io_tlb_start = alloc_bootmem_low_pages(PAGE_ALIGN(bytes));
> - if (!io_tlb_start)
> - panic("Cannot allocate SWIOTLB buffer");
> + io_tlb_start = alloc_bootmem_low_pages_nopanic(PAGE_ALIGN(bytes));
> + if (!io_tlb_start) {
> + WARN(1, "Cannot allocate SWIOTLB buffer");
> + return -1;
> + }
>  
>   swiotlb_init_with_tbl(io_tlb_start, io_tlb_nslabs, verbose);
> +
> + return 0;
>  }
>  
> -void __init
> +int __init
>  swiotlb_init(int verbose)
>  {
> - swiotlb_init_with_default_size(64 * (1<<20), verbose);  /* default to 
> 64MB */
> + /* default to 64MB */
> + return swiotlb_init_with_default_size(64 * (1<<20), verbose);
>  }
>  
>  /*
> -- 
> 1.7.10.4
> 


Re: [PATCH v6 24/27] x86: Add swiotlb force off support

2012-12-21 Thread Konrad Rzeszutek Wilk
On Thu, Dec 13, 2012 at 02:02:18PM -0800, Yinghai Lu wrote:
> So users could disable swiotlb from the command line, even when swiotlb
> support is compiled in.  Just like we have intel_iommu=on and intel_iommu=off.

Does this have any usage besides testing?

And also pls in the future use scripts/get_maintainer.pl so
that you can extract from the email of the maintainer (which would be me).

> 
> Signed-off-by: Yinghai Lu 
> ---
>  Documentation/kernel-parameters.txt |7 +++
>  arch/x86/kernel/pci-swiotlb.c   |   10 +-
>  drivers/iommu/amd_iommu.c   |1 +
>  include/linux/swiotlb.h |1 +
>  lib/swiotlb.c   |5 -
>  5 files changed, 18 insertions(+), 6 deletions(-)
> 
> diff --git a/Documentation/kernel-parameters.txt 
> b/Documentation/kernel-parameters.txt
> index 20e248c..08b4c9d 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -2832,6 +2832,13 @@ bytes respectively. Such letter suffixes can also be 
> entirely omitted.
>  
>   swiotlb=[IA-64] Number of I/O TLB slabs
>  
> + swiotlb=[force|off|on] [KNL] disable or enable swiotlb.
> + force
> + on
> + Enable swiotlb.
> + off
> + Disable swiotlb.
> +
>   switches=   [HW,M68k]
>  
>   sysfs.deprecated=0|1 [KNL]
> diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c
> index 6f93eb7..80afd3b 100644
> --- a/arch/x86/kernel/pci-swiotlb.c
> +++ b/arch/x86/kernel/pci-swiotlb.c
> @@ -58,12 +58,12 @@ static struct dma_map_ops swiotlb_dma_ops = {
>   */
>  int __init pci_swiotlb_detect_override(void)
>  {
> - int use_swiotlb = swiotlb | swiotlb_force;
> -
>   if (swiotlb_force)
>   swiotlb = 1;
> + else if (swiotlb_force_off)
> + swiotlb = 0;
>  
> - return use_swiotlb;
> + return swiotlb;
>  }
>  IOMMU_INIT_FINISH(pci_swiotlb_detect_override,
> pci_xen_swiotlb_detect,
> @@ -76,9 +76,9 @@ IOMMU_INIT_FINISH(pci_swiotlb_detect_override,
>   */
>  int __init pci_swiotlb_detect_4gb(void)
>  {
> - /* don't initialize swiotlb if iommu=off (no_iommu=1) */
> + /* don't initialize swiotlb if iommu=off (no_iommu=1) or force off */
>  #ifdef CONFIG_X86_64
> - if (!no_iommu && max_pfn > MAX_DMA32_PFN)
> + if (!no_iommu && !swiotlb_force_off && max_pfn > MAX_DMA32_PFN)
>   swiotlb = 1;
>  #endif
>   return swiotlb;
> diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
> index 55074cb..4f370d3 100644
> --- a/drivers/iommu/amd_iommu.c
> +++ b/drivers/iommu/amd_iommu.c
> @@ -3082,6 +3082,7 @@ int __init amd_iommu_init_dma_ops(void)
>   unhandled = device_dma_ops_init();
>   if (unhandled && max_pfn > MAX_DMA32_PFN) {
>   /* There are unhandled devices - initialize swiotlb for them */
> + WARN(swiotlb_force_off, "Please remove swiotlb=off\n");
>   swiotlb = 1;
>   }
>  
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index f7535d1..dd7cf65 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -8,6 +8,7 @@ struct dma_attrs;
>  struct scatterlist;
>  
>  extern int swiotlb_force;
> +extern int swiotlb_force_off;
>  
>  /*
>   * Maximum allowable number of contiguous slabs to map,
> diff --git a/lib/swiotlb.c b/lib/swiotlb.c
> index 6b99ea7..3f51b2c 100644
> --- a/lib/swiotlb.c
> +++ b/lib/swiotlb.c
> @@ -51,6 +51,7 @@
>  #define IO_TLB_MIN_SLABS ((1<<20) >> IO_TLB_SHIFT)
>  
>  int swiotlb_force;
> +int swiotlb_force_off;
>  
>  /*
>   * Used to do a quick range check in swiotlb_tbl_unmap_single and
> @@ -102,8 +103,10 @@ setup_io_tlb_npages(char *str)
>   }
>   if (*str == ',')
>   ++str;
> - if (!strcmp(str, "force"))
> + if (!strcmp(str, "force") || !strcmp(str, "on"))
>   swiotlb_force = 1;
> + if (!strcmp(str, "off"))
> + swiotlb_force_off = 1;
>  
>   return 1;
>  }
> -- 
> 1.7.10.4
> 


Re: [PATCH 0/9] Avoid populating unbounded num of ptes with mmap_sem held

2012-12-21 Thread Andy Lutomirski
On Fri, Dec 21, 2012 at 5:59 PM, Michel Lespinasse  wrote:
> On Fri, Dec 21, 2012 at 5:09 PM, Andy Lutomirski  wrote:
>> On Fri, Dec 21, 2012 at 4:59 PM, Michel Lespinasse  wrote:
>>> On Fri, Dec 21, 2012 at 4:36 PM, Andy Lutomirski  
>>> wrote:
 Something's buggy here.  My evil test case is stuck with lots of
 threads spinning at 100% system time.

 The tasks in question use MCL_FUTURE but not MAP_POPULATE.  These
 tasks are immune to SIGKILL.
>>>
>>> Looking into it.
>>>
>>> There seems to be a problem with mlockall - the following program
>>> fails in an unkillable way even before my changes:
>>>
>>> #include 
>>> #include 
>>> #include 
>>>
>>> int main(void) {
>>>   void *p = mmap(NULL, 0x1000,
>>>  PROT_READ | PROT_WRITE,
>>>  MAP_PRIVATE | MAP_ANON | MAP_NORESERVE,
>>>  -1, 0);
>>>   printf("p: %p\n", p);
>>>   mlockall(MCL_CURRENT);
>>>   return 0;
>>> }
>>>
>>> I think my changes propagate this existing problem so it now shows up
>>> in more places :/
>
> So in my test case, the issue was caused by the mapping being 2^32
> pages, which overflowed the integer 'nr_pages' argument to
> __get_user_pages, which caused an infinite loop as __get_user_pages()
> would return 0 so __mm_populate() would make no progress.
>
> When dropping one zero from that humongous size in the test case, the
> test case becomes at least killable.
>
>> Hmm.  I'm using MCL_FUTURE with MAP_NORESERVE, but those mappings are
>> not insanely large.  Should MAP_NORESERVE would negate MCL_FUTURE?
>> I'm doing MAP_NORESERVE, PROT_NONE to prevent pages from being
>> allocated in the future -- I have no intention of ever using them.
>
> MAP_NORESERVE doesn't prevent page allocation, but PROT_NONE does
> (precisely because people use it the same way as you do :)
>
>> The other odd thing I do is use MAP_FIXED to replace MAP_NORESERVE pages.
> Yes, I've seen people do that here too.
>
> Could you share your test case so I can try reproducing the issue
> you're seeing ?

Not so easy.  My test case is a large chunk of a high-frequency
trading system :)

I just tried it again.  Now I have a task stuck in
mlockall(MCL_CURRENT|MCL_FUTURE).  The stack is:

[<>] flush_work+0x1c2/0x280
[<>] schedule_on_each_cpu+0xe3/0x130
[<>] lru_add_drain_all+0x15/0x20
[<>] sys_mlockall+0x125/0x1a0
[<>] tracesys+0xd0/0xd5
[<>] 0x

The sequence of mmap and munmap calls, according to strace, is:

6084  mmap(NULL, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f550a0e4000
6084  mmap(NULL, 8388744, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f55096c3000
6084  mmap(0x7f5509ec2000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3ff000) = 0x7f5509ec2000
6084  mmap(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f550a0e3000
6084  mmap(NULL, 2413688, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f5509475000
6084  mmap(0x7f550969c000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x27000) = 0x7f550969c000
6084  mmap(0x7f550969e000, 148600, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f550969e000
6084  mmap(NULL, 12636304, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f5508867000
6084  mmap(0x7f550942, 327680, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x7b9000) = 0x7f550942
6084  mmap(0x7f550947, 16528, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f550947
6084  mmap(NULL, 8409224, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f5508061000
6084  mmap(0x7f550885b000, 36864, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3fa000) = 0x7f550885b000
6084  mmap(0x7f5508864000, 8328, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f5508864000
6084  mmap(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f550806
6084  mmap(NULL, 8404144, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f550785c000
6084  mmap(0x7f5508054000, 45056, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3f8000) = 0x7f5508054000
6084  mmap(0x7f550805f000, 3248, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f550805f000
6084  mmap(NULL, 8390584, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f550705b000
6084  mmap(0x7f5507859000, 12288, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3fe000) = 0x7f5507859000
6084  mmap(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f550705a000
6084  mmap(NULL, 8393296, PROT_READ|PROT_EXEC,
MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f5506858000
6084  mmap(0x7f5507055000, 16384, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3fd000) = 0x7f5507055000
6084  mmap(0x7f5507059000, 592, 

Re: [PATCH 25/25] ipc: don't use [delayed_]work_pending()

2012-12-21 Thread Andrew Morton
On Fri, 21 Dec 2012 17:57:15 -0800 Tejun Heo  wrote:

> There's no need to test whether a (delayed) work item is pending
> before queueing, flushing or cancelling it.  Most uses are unnecessary
> and quite a few of them are buggy.

> - if (!work_pending(&ipc_memory_wq))
> - schedule_work(&ipc_memory_wq);
> + schedule_work(&ipc_memory_wq);

Well, the new code is a ton slower than the old code if the work is
frequently pending, so some care is needed with such a conversion.

That's not an issue for the IPC callsite - memory offlining isn't
frequent.

> ...
>
> Please let me know how this patch should be routed.  I can take it
> through the workqueue tree if necessary.
> 

Please merge this one yourself.



Re: [PATCH v5 12/13] x86, 64bit: Print init kernel lowmap correctly

2012-12-21 Thread Konrad Rzeszutek Wilk
On Fri, Dec 21, 2012 at 03:52:53PM -0800, Yinghai Lu wrote:
> On Fri, Dec 21, 2012 at 3:39 PM, Konrad Rzeszutek Wilk
>  wrote:
> > On Fri, Dec 21, 2012 at 02:44:39PM -0800, Yinghai Lu wrote:
> >>
> >> maybe we can change the subject of this patch to:
> >>
> >> Subject: [PATCH] x86, 64bit: Don't set max_pfn_mapped wrong on native boot 
> >> path
> >
> > Or the inverse.
> >
> > Set max_pfn_mapped correctly on non-native boot path?
> >
> > But this patch is not actually touching max_pfn_mapped - it is vaddr_end?
> 
> No,
> 
> it is 0 for native path
> 
> 
> > So maybe:
> >
> > Subject: For platforms to set max_pfn_mapped, take that under advisement 
> > when blowing away __ka page entries.
   ^^ that

> 
> hard to understand.


[PATCH 03/25] sja1000: don't use [delayed_]work_pending()

2012-12-21 Thread Tejun Heo
There's no need to test whether a (delayed) work item is pending
before queueing, flushing or cancelling it.  Most uses are unnecessary
and quite a few of them are buggy.

Remove unnecessary pending tests from sja1000.  Only compile tested.

Signed-off-by: Tejun Heo 
Cc: Wolfgang Grandegger 
Cc: "David S. Miller" 
Cc: net...@vger.kernel.org
---
Please let me know how this patch should be routed.  I can take it
through the workqueue tree if necessary.

Thanks.

 drivers/net/can/sja1000/peak_pci.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/can/sja1000/peak_pci.c 
b/drivers/net/can/sja1000/peak_pci.c
index d84888f..600ac72 100644
--- a/drivers/net/can/sja1000/peak_pci.c
+++ b/drivers/net/can/sja1000/peak_pci.c
@@ -339,8 +339,7 @@ static void peak_pciec_set_leds(struct peak_pciec_card 
*card, u8 led_mask, u8 s)
  */
 static void peak_pciec_start_led_work(struct peak_pciec_card *card)
 {
-   if (!delayed_work_pending(&card->led_work))
-   schedule_delayed_work(&card->led_work, HZ);
+   schedule_delayed_work(&card->led_work, HZ);
 }
 
 /*
-- 
1.8.0.2



[PATCH 05/25] devfreq: don't use [delayed_]work_pending()

2012-12-21 Thread Tejun Heo
There's no need to test whether a (delayed) work item is pending
before queueing, flushing or cancelling it.  Most uses are unnecessary
and quite a few of them are buggy.

Remove unnecessary pending tests from devfreq.  Only compile tested.

Signed-off-by: Tejun Heo 
Cc: MyungJoo Ham 
Cc: Kyungmin Park 
---
Please let me know how this patch should be routed.  I can take it
through the workqueue tree if necessary.

Thanks.

 drivers/devfreq/devfreq.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
index 53766f3..fb4695e 100644
--- a/drivers/devfreq/devfreq.c
+++ b/drivers/devfreq/devfreq.c
@@ -291,8 +291,7 @@ void devfreq_monitor_resume(struct devfreq *devfreq)
if (!devfreq->stop_polling)
goto out;
 
-   if (!delayed_work_pending(&devfreq->work) &&
-   devfreq->profile->polling_ms)
+   if (devfreq->profile->polling_ms)
queue_delayed_work(devfreq_wq, &devfreq->work,
msecs_to_jiffies(devfreq->profile->polling_ms));
devfreq->stop_polling = false;
-- 
1.8.0.2



[PATCH 01/25] charger_manager: don't use [delayed_]work_pending()

2012-12-21 Thread Tejun Heo
There's no need to test whether a (delayed) work item is pending
before queueing, flushing or cancelling it.  Most uses are unnecessary
and quite a few of them are buggy.

Remove unnecessary pending tests and rewrite _setup_polling() so that
it uses mod_delayed_work() if the next polling interval is sooner than
currently scheduled.  queue_delayed_work() is used otherwise.

Only compile tested.  I noticed that two work items - setup_polling
and cm_monitor_work - schedule each other.  It's a very unusual
construct and I'm fairly sure it's racy.  You can't break such
circular dependency by calling cancel on each.  I strongly recommend
revising the mechanism.

Signed-off-by: Tejun Heo 
Cc: Anton Vorontsov 
Cc: David Woodhouse 
Cc: Donggeun Kim 
Cc: MyungJoo Ham 
---
Please let me know how this patch should be routed.  I can take it
through the workqueue tree if necessary.

Thanks.

 drivers/power/charger-manager.c | 31 ---
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/drivers/power/charger-manager.c b/drivers/power/charger-manager.c
index adb3a4b..9fd9776 100644
--- a/drivers/power/charger-manager.c
+++ b/drivers/power/charger-manager.c
@@ -675,15 +675,21 @@ static void _setup_polling(struct work_struct *work)
WARN(cm_wq == NULL, "charger-manager: workqueue not initialized"
". try it later. %s\n", __func__);
 
+   /*
+* Use mod_delayed_work() iff the next polling interval should
+* occur before the currently scheduled one.  If @cm_monitor_work
+* isn't active, the end result is the same, so no need to worry
+* about stale @next_polling.
+*/
_next_polling = jiffies + polling_jiffy;
 
-   if (!delayed_work_pending(&cm_monitor_work) ||
-   (delayed_work_pending(&cm_monitor_work) &&
-time_after(next_polling, _next_polling))) {
-   next_polling = jiffies + polling_jiffy;
+   if (time_before(_next_polling, next_polling)) {
mod_delayed_work(cm_wq, &cm_monitor_work, polling_jiffy);
+   next_polling = _next_polling;
+   } else {
+   if (queue_delayed_work(cm_wq, &cm_monitor_work, polling_jiffy))
+   next_polling = _next_polling;
}
-
 out:
mutex_unlock(&cm_list_mtx);
 }
@@ -757,8 +763,7 @@ static void misc_event_handler(struct charger_manager *cm,
if (cm_suspended)
device_set_wakeup_capable(cm->dev, true);
 
-   if (!delayed_work_pending(&cm_monitor_work) &&
-   is_polling_required(cm) && cm->desc->polling_interval_ms)
+   if (is_polling_required(cm) && cm->desc->polling_interval_ms)
schedule_work(&setup_polling);
uevent_notify(cm, default_event_names[type]);
 }
@@ -1176,8 +1181,7 @@ static int charger_extcon_notifier(struct notifier_block 
*self,
 * when charger cable is attached.
 */
if (cable->attached && is_polling_required(cable->cm)) {
-   if (work_pending(&setup_polling))
-   cancel_work_sync(&setup_polling);
+   cancel_work_sync(&setup_polling);
schedule_work(&setup_polling);
}
 
@@ -1667,10 +1671,8 @@ static int charger_manager_remove(struct platform_device 
*pdev)
list_del(&cm->entry);
mutex_unlock(&cm_list_mtx);
 
-   if (work_pending(&setup_polling))
-   cancel_work_sync(&setup_polling);
-   if (delayed_work_pending(&cm_monitor_work))
-   cancel_delayed_work_sync(&cm_monitor_work);
+   cancel_work_sync(&setup_polling);
+   cancel_delayed_work_sync(&cm_monitor_work);
 
for (i = 0 ; i < desc->num_charger_regulators ; i++) {
struct charger_regulator *charger
@@ -1739,8 +1741,7 @@ static int cm_suspend_prepare(struct device *dev)
cm_suspended = true;
}
 
-   if (delayed_work_pending(&cm->fullbatt_vchk_work))
-   cancel_delayed_work(&cm->fullbatt_vchk_work);
+   cancel_delayed_work(&cm->fullbatt_vchk_work);
cm->status_save_ext_pwr_inserted = is_ext_pwr_online(cm);
cm->status_save_batt = is_batt_present(cm);
 
-- 
1.8.0.2



[PATCH 06/25] libertas: don't use [delayed_]work_pending()

2012-12-21 Thread Tejun Heo
* delayed_work_pending() test in lbs_cfg_scan() is spurious as
  priv->scan_req can't be NULL w/ scan_work pending; otherwise,
  lbs_scan_worker() will segfault.  Drop it.  BTW, the synchronization
  around scan_work seems racy.  There's nothing synchronizing accesses
  to scan related fields in lbs_private.

* Drop work_pending() test from if_sdio_reset_card().  As
  work_pending() becomes %false before if_sdio_reset_card_worker()
  starts executing, it doesn't really protect anything.  reset_host
  may change between mmc_remove_host() and mmc_add_host().  Make
  if_sdio_reset_card_worker() cache the target mmc_host so that it
  isn't affected by if_sdio_reset_card() racing with it.

Only compile tested.

Signed-off-by: Tejun Heo 
Cc: Dan Williams 
Cc: libertas-...@lists.infradead.org
Cc: linux-wirel...@vger.kernel.org
---
Please let me know how this patch should be routed.  I can take it
through the workqueue tree if necessary.

Thanks.

 drivers/net/wireless/libertas/cfg.c | 2 +-
 drivers/net/wireless/libertas/if_sdio.c | 9 -
 2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/net/wireless/libertas/cfg.c 
b/drivers/net/wireless/libertas/cfg.c
index ec6d5d6..ec30cd1 100644
--- a/drivers/net/wireless/libertas/cfg.c
+++ b/drivers/net/wireless/libertas/cfg.c
@@ -814,7 +814,7 @@ static int lbs_cfg_scan(struct wiphy *wiphy,
 
lbs_deb_enter(LBS_DEB_CFG80211);
 
-   if (priv->scan_req || delayed_work_pending(&priv->scan_work)) {
+   if (priv->scan_req) {
/* old scan request not yet processed */
ret = -EAGAIN;
goto out;
diff --git a/drivers/net/wireless/libertas/if_sdio.c 
b/drivers/net/wireless/libertas/if_sdio.c
index 739309e..8c53c17 100644
--- a/drivers/net/wireless/libertas/if_sdio.c
+++ b/drivers/net/wireless/libertas/if_sdio.c
@@ -1074,6 +1074,8 @@ static struct mmc_host *reset_host;
 
 static void if_sdio_reset_card_worker(struct work_struct *work)
 {
+   struct mmc_host *target = reset_host;
+
/*
 * The actual reset operation must be run outside of lbs_thread. This
 * is because mmc_remove_host() will cause the device to be instantly
@@ -1085,8 +1087,8 @@ static void if_sdio_reset_card_worker(struct work_struct 
*work)
 */
 
pr_info("Resetting card...");
-   mmc_remove_host(reset_host);
-   mmc_add_host(reset_host);
+   mmc_remove_host(target);
+   mmc_add_host(target);
 }
 static DECLARE_WORK(card_reset_work, if_sdio_reset_card_worker);
 
@@ -1094,9 +1096,6 @@ static void if_sdio_reset_card(struct lbs_private *priv)
 {
struct if_sdio_card *card = priv->card;
 
-   if (work_pending(&card_reset_work))
-   return;
-
reset_host = card->func->card->host;
schedule_work(&card_reset_work);
 }
-- 
1.8.0.2



[PATCH 15/25] x86/mce: don't use [delayed_]work_pending()

2012-12-21 Thread Tejun Heo
There's no need to test whether a (delayed) work item is pending
before queueing, flushing or cancelling it.  Most uses are unnecessary
and quite a few of them are buggy.

Remove unnecessary pending tests from x86/mce.  Only compile tested.

Signed-off-by: Tejun Heo 
Cc: Tony Luck 
Cc: Borislav Petkov 
Cc: linux-e...@vger.kernel.org
---
Please let me know how this patch should be routed.  I can take it
through the workqueue tree if necessary.

Thanks.

 arch/x86/kernel/cpu/mcheck/mce.c | 10 ++
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 80dbda8..c06a736 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -514,8 +514,7 @@ static void mce_schedule_work(void)
 {
if (!mce_ring_empty()) {
struct work_struct *work = &__get_cpu_var(mce_work);
-   if (!work_pending(work))
-   schedule_work(work);
+   schedule_work(work);
}
 }
 
@@ -1351,12 +1350,7 @@ int mce_notify_irq(void)
/* wake processes polling /dev/mcelog */
wake_up_interruptible(&mce_chrdev_wait);
 
-   /*
-* There is no risk of missing notifications because
-* work_pending is always cleared before the function is
-* executed.
-*/
-   if (mce_helper[0] && !work_pending(&mce_trigger_work))
+   if (mce_helper[0])
schedule_work(&mce_trigger_work);
 
if (__ratelimit(&ratelimit))
-- 
1.8.0.2



[PATCH 14/25] rfkill: don't use [delayed_]work_pending()

2012-12-21 Thread Tejun Heo
There's no need to test whether a (delayed) work item is pending
before queueing, flushing or cancelling it.  Most uses are unnecessary
and quite a few of them are buggy.

Remove unnecessary pending tests from rfkill.  Only compile
tested.

Signed-off-by: Tejun Heo 
Cc: "John W. Linville" 
Cc: linux-wirel...@vger.kernel.org
---
Please let me know how this patch should be routed.  I can take it
through the workqueue tree if necessary.

Thanks.

 net/rfkill/input.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/net/rfkill/input.c b/net/rfkill/input.c
index c9d931e..b85107b 100644
--- a/net/rfkill/input.c
+++ b/net/rfkill/input.c
@@ -148,11 +148,9 @@ static unsigned long rfkill_ratelimit(const unsigned long 
last)
 
 static void rfkill_schedule_ratelimited(void)
 {
-   if (delayed_work_pending(&rfkill_op_work))
-   return;
-   schedule_delayed_work(&rfkill_op_work,
- rfkill_ratelimit(rfkill_last_scheduled));
-   rfkill_last_scheduled = jiffies;
+   if (schedule_delayed_work(&rfkill_op_work,
+ rfkill_ratelimit(rfkill_last_scheduled)))
+   rfkill_last_scheduled = jiffies;
 }
 
 static void rfkill_schedule_global_op(enum rfkill_sched_op op)
-- 
1.8.0.2



[PATCH 11/25] pm: don't use [delayed_]work_pending()

2012-12-21 Thread Tejun Heo
There's no need to test whether a (delayed) work item is pending
before queueing, flushing or cancelling it.  Most uses are unnecessary
and quite a few of them are buggy.

Remove unnecessary pending tests from pm autosleep and qos.  Only
compile tested.

Signed-off-by: Tejun Heo 
Cc: "Rafael J. Wysocki" 
Cc: linux...@vger.kernel.org
---
Please let me know how this patch should be routed.  I can take it
through the workqueue tree if necessary.

Thanks.

 kernel/power/autosleep.c | 2 +-
 kernel/power/qos.c   | 9 +++--
 2 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/kernel/power/autosleep.c b/kernel/power/autosleep.c
index ca304046..c6422ff 100644
--- a/kernel/power/autosleep.c
+++ b/kernel/power/autosleep.c
@@ -66,7 +66,7 @@ static DECLARE_WORK(suspend_work, try_to_suspend);
 
 void queue_up_suspend_work(void)
 {
-   if (!work_pending(&suspend_work) && autosleep_state > PM_SUSPEND_ON)
+   if (autosleep_state > PM_SUSPEND_ON)
queue_work(autosleep_wq, &suspend_work);
 }
 
diff --git a/kernel/power/qos.c b/kernel/power/qos.c
index 9322ff7..587ddde 100644
--- a/kernel/power/qos.c
+++ b/kernel/power/qos.c
@@ -359,8 +359,7 @@ void pm_qos_update_request(struct pm_qos_request *req,
return;
}
 
-   if (delayed_work_pending(&req->work))
-   cancel_delayed_work_sync(&req->work);
+   cancel_delayed_work_sync(&req->work);
 
if (new_value != req->node.prio)
pm_qos_update_target(
@@ -386,8 +385,7 @@ void pm_qos_update_request_timeout(struct pm_qos_request 
*req, s32 new_value,
 "%s called for unknown object.", __func__))
return;
 
-   if (delayed_work_pending(&req->work))
-   cancel_delayed_work_sync(&req->work);
+   cancel_delayed_work_sync(&req->work);
 
if (new_value != req->node.prio)
pm_qos_update_target(
@@ -416,8 +414,7 @@ void pm_qos_remove_request(struct pm_qos_request *req)
return;
}
 
-   if (delayed_work_pending(&req->work))
-   cancel_delayed_work_sync(&req->work);
+   cancel_delayed_work_sync(&req->work);
 
pm_qos_update_target(pm_qos_array[req->pm_qos_class]->constraints,
 &req->node, PM_QOS_REMOVE_REQ,
-- 
1.8.0.2



[PATCH 08/25] thinkpad_acpi: don't use [delayed_]work_pending()

2012-12-21 Thread Tejun Heo
There's no need to test whether a (delayed) work item is pending
before queueing, flushing or cancelling it.  Most uses are unnecessary
and quite a few of them are buggy.

Remove unnecessary pending tests from thinkpad_acpi.  Only compile
tested.

Signed-off-by: Tejun Heo 
Cc: Henrique de Moraes Holschuh 
Cc: ibm-acpi-de...@lists.sourceforge.net
Cc: platform-driver-...@vger.kernel.org
---
Please let me know how this patch should be routed.  I can take it
through the workqueue tree if necessary.

Thanks.

 drivers/platform/x86/thinkpad_acpi.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/platform/x86/thinkpad_acpi.c 
b/drivers/platform/x86/thinkpad_acpi.c
index 75dd651..8421d1e 100644
--- a/drivers/platform/x86/thinkpad_acpi.c
+++ b/drivers/platform/x86/thinkpad_acpi.c
@@ -4877,8 +4877,7 @@ static int __init light_init(struct ibm_init_struct *iibm)
 static void light_exit(void)
 {
led_classdev_unregister(&tpacpi_led_thinklight.led_classdev);
-   if (work_pending(&tpacpi_led_thinklight.work))
-   flush_workqueue(tpacpi_wq);
+   flush_workqueue(tpacpi_wq);
 }
 
 static int light_read(struct seq_file *m)
-- 
1.8.0.2



[PATCH 10/25] kprobes: fix wait_for_kprobe_optimizer()

2012-12-21 Thread Tejun Heo
wait_for_kprobe_optimizer() seems largely broken.  It uses
optimizer_comp which is never re-initialized, so
wait_for_kprobe_optimizer() will never wait for anything once
kprobe_optimizer() finishes all pending jobs for the first time.

Also, aside from completion, delayed_work_pending() is %false once
kprobe_optimizer() starts execution and wait_for_kprobe_optimizer()
won't wait for it.

Reimplement it so that it flushes optimizing_work until
[un]optimizing_lists are empty.  Note that this also makes
optimizing_work execute immediately if someone's waiting for it, which
is the nicer behavior.

Only compile tested.

Signed-off-by: Tejun Heo 
Cc: Ananth N Mavinakayanahalli 
Cc: Anil S Keshavamurthy 
Cc: "David S. Miller" 
Cc: Masami Hiramatsu 
---
Please let me know how this patch should be routed.  I can take it
through the workqueue tree if necessary.

Thanks.

 kernel/kprobes.c | 23 +++
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 098f396..f230e81 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -471,7 +471,6 @@ static LIST_HEAD(unoptimizing_list);
 
 static void kprobe_optimizer(struct work_struct *work);
 static DECLARE_DELAYED_WORK(optimizing_work, kprobe_optimizer);
-static DECLARE_COMPLETION(optimizer_comp);
 #define OPTIMIZE_DELAY 5
 
 /*
@@ -552,8 +551,7 @@ static __kprobes void do_free_cleaned_kprobes(struct 
list_head *free_list)
 /* Start optimizer after OPTIMIZE_DELAY passed */
 static __kprobes void kick_kprobe_optimizer(void)
 {
-   if (!delayed_work_pending(&optimizing_work))
-   schedule_delayed_work(&optimizing_work, OPTIMIZE_DELAY);
+   schedule_delayed_work(&optimizing_work, OPTIMIZE_DELAY);
 }
 
 /* Kprobe jump optimizer */
@@ -592,16 +590,25 @@ static __kprobes void kprobe_optimizer(struct work_struct 
*work)
/* Step 5: Kick optimizer again if needed */
if (!list_empty(&optimizing_list) || !list_empty(&unoptimizing_list))
kick_kprobe_optimizer();
-   else
-   /* Wake up all waiters */
-   complete_all(&optimizer_comp);
 }
 
 /* Wait for completing optimization and unoptimization */
 static __kprobes void wait_for_kprobe_optimizer(void)
 {
-   if (delayed_work_pending(&optimizing_work))
-   wait_for_completion(&optimizer_comp);
+   mutex_lock(&kprobe_mutex);
+
+   while (!list_empty(&optimizing_list) || !list_empty(&unoptimizing_list)) {
+   mutex_unlock(&kprobe_mutex);
+
+   /* this will also make optimizing_work execute immediately */
+   flush_delayed_work(&optimizing_work);
+   /* @optimizing_work might not have been queued yet, relax */
+   cpu_relax();
+
+   mutex_lock(&kprobe_mutex);
+   }
+
+   mutex_unlock(&kprobe_mutex);
 }
 
 /* Optimize kprobe if p is ready to be optimized */
-- 
1.8.0.2



[PATCH 09/25] wl1251: don't use [delayed_]work_pending()

2012-12-21 Thread Tejun Heo
There's no need to test whether a (delayed) work item is pending
before queueing, flushing or cancelling it.  Most uses are unnecessary
and quite a few of them are buggy.

Remove unnecessary pending tests from wl1251.  Only compile tested.

Signed-off-by: Tejun Heo 
Cc: Luciano Coelho 
Cc: linux-wirel...@vger.kernel.org
---
Please let me know how this patch should be routed.  I can take it
through the workqueue tree if necessary.

Thanks.

 drivers/net/wireless/ti/wl1251/ps.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/wireless/ti/wl1251/ps.c 
b/drivers/net/wireless/ti/wl1251/ps.c
index db719f7..b9e27b9 100644
--- a/drivers/net/wireless/ti/wl1251/ps.c
+++ b/drivers/net/wireless/ti/wl1251/ps.c
@@ -68,8 +68,7 @@ int wl1251_ps_elp_wakeup(struct wl1251 *wl)
unsigned long timeout, start;
u32 elp_reg;
 
-   if (delayed_work_pending(&wl->elp_work))
-   cancel_delayed_work(&wl->elp_work);
+   cancel_delayed_work(&wl->elp_work);
 
if (!wl->elp)
return 0;
-- 
1.8.0.2



Linux 3.8-rc1

2012-12-21 Thread Linus Torvalds
The longest night of the year is upon us (*), and what better thing to
do than get yourself some nice mulled wine, sit back, relax, and play
with the most recent rc kernel?

This has been a big merge window: we've got more commits than any
other kernel in the v3.x kernel series (although v3.2-rc1 was *almost*
as big). It's been a rather busy merge window, in other words.

The diffstat looks normal: about 63% of the patch being to drivers
(staging, networking, scsi, gpu, sound, drbd etc), 18% architecture
updates (with various ARM platform things being the bulk of it as
usual, sigh), and the rest being "various", like core networking,
filesystems (new f2fs flash-optimized filesystem) and include files
etc.

I'm appending the "merge shortlog" which is about the only half-way
readable automated data I can give you. There's a *ton* of stuff here.
Go out and test it,

  Linus

(*) And by "us" I mean mainly people in the same timezone and
hemisphere as I am. Because I'm too self-centered to care about
anybody else.

---

Alasdair G Kergon
 - dm update

Alex Williamson
 - vfio update

Al Viro
 - big execve/kernel_thread/fork unification series
 - signal handling cleanups
 - VFS update

Andrew Morton
 - misc patches
 - misc updates
 - misc VM changes
 - patches

Anton Vorontsov
 - battery subsystem updates
 - battery update, part 2
 - pstore update

Arnd Bergmann
 - asm-generic cleanup

Artem Bityutskiy
 - UBI update

Benjamin Herrenschmidt
 - powerpc update

Ben Myers
 - xfs update

Bjorn Helgaas
 - PCI update

Boaz Harrosh
 - exofs changes

Bob Liu
 - blackfin update

Borislav Petkov
 - EDAC fixes

Bruce Fields
 - nfsd update

Bryan Wu
 - LED subsystem update

Catalin Marinas
 - ARM64 updates

Chris Ball
 - MMC updates

Chris Mason
 - btrfs update
 - two btrfs reverts

Chris Metcalf
 - tile updates

Chris Zankel
 - Xtensa patchset

Dave Airlie
 - drm bugfix
 - DRM updates

David Howells
 - MN10300 changes
 - UAPI disintegration for Alpha
 - UAPI disintegration for H8/300, M32R and Score
 - x86 UAPI disintegration

David Miller
 - networking changes
 - networking fixes
 - networking fixes
 - sparc fixes
 - tiny sparc update

David Teigland
 - dlm updates

David Woodhouse
 - MTD updates
 - preparatory gcc intrisics bswap patch

Dmitry Torokhov
 - second round of input updates

Eric Biederman
 - user namespace changes
 - (again) user namespace infrastructure changes

Eric Paris
 - filesystem notification updates

Geert Uytterhoeven
 - m68k updates

Grant Likely
 - another devicetree update
 - device tree changes
 - devicetree, gpio and spi bugfixes
 - GPIO updates
 - irqdomain changes
 - SPI updates

Greg Kroah-Hartman
 - Char/Misc driver merge
 - driver core updates
 - EXTCON patches
 - staging driver tree merge
 - TTY/Serial merge
 - USB patches

Greg Ungerer
 - m68knommu updates

Guenter Roeck
 - hwmon fixlet
 - hwmon updates

Herbert Xu
 - crypto update

Ian Kent
 - emailed autofs cleanup/fix patches

Ingo Molnar
 - core timer changes
 - irq fixes
 - "Nuke 386-DX/SX support"
 - perf fixes
 - perf updates
 - RCU update
 - scheduler updates
 - trivial fix branches
 - x86 asm changes
 - x86 boot changes
 - x86 BSP hotplug changes
 - x86 cleanups
 - x86 RAS update
 - x86 timer update
 - x86 topology discovery improvements

Jaegeuk Kim
 - new F2FS filesystem

James Bottomley
 - first round of SCSI updates

James Morris
 - security subsystem updates

Jan Kara
 - ext3, udf, quota fixes

Jean Delvare
 - hwmon subsystem update
 - i2c update

Jeff Garzik
 - libata updates

Jens Axboe
 - block driver update
 - block layer core updates

Jesper Nilsson
 - CRIS changes

Jiri Kosina
 - HID subsystem updates
 - trivial branch

Joerg Roedel
 - IOMMU updates

Jonas Bonn
 - OpenRISC update

Konrad Rzeszutek Wilk
 - swiotlb update
 - Xen bugfixes
 - Xen updates

Len Brown
 - powertool update

Linus Walleij
 - pinctrl changes

Marcelo Tosatti
 - KVM updates

Marek Szyprowski
 - CMA and DMA-mapping update

Mark Brown
 - regmap updates
 - regulator updates

Martin Schwidefsky
 - s390 update
 - s390 update #2

Mauro Carvalho Chehab
 - media updates

Mel Gorman
 - Automatic NUMA Balancing bare-bones

Michal Marek
 - kbuild changes
 - kbuild misc changes

Michal Simek
 - microblaze update

Mike Turquette
 - clock framework changes

Neil Brown
 - md update

Nicholas Bellinger
 - target updates

Olof Johansson
 - ARM SoC board updates
 - ARM SoC board updates, take 2
 - ARM SoC cleanups on various subarchitectures
 - ARM SoC device tree conversions and enablement
 - ARM SoC device-tree updates, take 2,
 - ARM SoC driver specific changes
 - ARM SoC fixes
 - ARM SoC fixes part 2
 - ARM SoC Header cleanups
 - ARM SoC multiplatform conversion patches
 - ARM SoC Non-critical bug fixes
 - ARM SoC power management and clock changes
 - ARM SoC updates
 - ARM SoC updates for Marvell mvebu/kirkwood
 - ARM Soc updates, take 2,

Pekka Enberg
 - SLAB changes

Peter Anvin
 - one final 386 removal patch
 - small x86 fixes
 - x86 ACPI 

[PATCH 17/25] wm97xx: don't use [delayed_]work_pending()

2012-12-21 Thread Tejun Heo
There's no need to test whether a (delayed) work item is pending
before queueing, flushing or cancelling it.  Most uses are unnecessary
and quite a few of them are buggy.

Remove unnecessary pending tests from wm97xx.  Instead of testing
work_pending(), use the return value of queue_work() to decide whether
to disable IRQ or not.

Only compile tested.

Signed-off-by: Tejun Heo 
Cc: Mark Brown 
Cc: Liam Girdwood 
Cc: linux-in...@vger.kernel.org
Cc: Dmitry Torokhov 
---
Please let me know how this patch should be routed.  I can take it
through the workqueue tree if necessary.

Thanks.

 drivers/input/touchscreen/wm97xx-core.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/input/touchscreen/wm97xx-core.c 
b/drivers/input/touchscreen/wm97xx-core.c
index 5dbe73a..fd16c63 100644
--- a/drivers/input/touchscreen/wm97xx-core.c
+++ b/drivers/input/touchscreen/wm97xx-core.c
@@ -363,10 +363,8 @@ static irqreturn_t wm97xx_pen_interrupt(int irq, void 
*dev_id)
 {
struct wm97xx *wm = dev_id;
 
-   if (!work_pending(&wm->pen_event_work)) {
+   if (queue_work(wm->ts_workq, &wm->pen_event_work))
wm->mach_ops->irq_enable(wm, 0);
-   queue_work(wm->ts_workq, &wm->pen_event_work);
-   }
 
return IRQ_HANDLED;
 }
-- 
1.8.0.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 16/25] PM / Domains: don't use [delayed_]work_pending()

2012-12-21 Thread Tejun Heo
There's no need to test whether a (delayed) work item is pending
before queueing, flushing or cancelling it.  Most uses are unnecessary
and quite a few of them are buggy.

Remove unnecessary pending tests from power domains.  Only compile
tested.

Signed-off-by: Tejun Heo 
Cc: Rafael J. Wysocki 
Cc: linux...@vger.kernel.org
---
Please let me know how this patch should be routed.  I can take it
through the workqueue tree if necessary.

Thanks.

 drivers/base/power/domain.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
index acc3a8d..9a6b05a 100644
--- a/drivers/base/power/domain.c
+++ b/drivers/base/power/domain.c
@@ -433,8 +433,7 @@ static bool genpd_abort_poweroff(struct generic_pm_domain 
*genpd)
  */
 void genpd_queue_power_off_work(struct generic_pm_domain *genpd)
 {
-   if (!work_pending(&genpd->power_off_work))
-   queue_work(pm_wq, &genpd->power_off_work);
+   queue_work(pm_wq, &genpd->power_off_work);
 }
 
 /**
-- 
1.8.0.2



[PATCH 19/25] net/caif: don't use [delayed_]work_pending()

2012-12-21 Thread Tejun Heo
There's no need to test whether a (delayed) work item is pending
before queueing, flushing or cancelling it.  Most uses are unnecessary
and quite a few of them are buggy.

Remove unnecessary pending tests from caif.  Only compile tested.

Signed-off-by: Tejun Heo 
---
Please let me know how this patch should be routed.  I can take it
through the workqueue tree if necessary.

Thanks.

 drivers/net/caif/caif_shmcore.c | 15 +--
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/drivers/net/caif/caif_shmcore.c b/drivers/net/caif/caif_shmcore.c
index bc497d7..16bd654 100644
--- a/drivers/net/caif/caif_shmcore.c
+++ b/drivers/net/caif/caif_shmcore.c
@@ -183,9 +183,7 @@ int caif_shmdrv_rx_cb(u32 mbx_msg, void *priv)
spin_unlock_irqrestore(&pshm_drv->lock, flags);
 
/* Schedule RX work queue. */
-   if (!work_pending(&pshm_drv->shm_rx_work))
-   queue_work(pshm_drv->pshm_rx_workqueue,
-   &pshm_drv->shm_rx_work);
+   queue_work(pshm_drv->pshm_rx_workqueue, &pshm_drv->shm_rx_work);
}
 
/* Check for emptied buffers. */
@@ -246,9 +244,8 @@ int caif_shmdrv_rx_cb(u32 mbx_msg, void *priv)
 
 
/* Schedule the work queue. if required */
-   if (!work_pending(&pshm_drv->shm_tx_work))
-   queue_work(pshm_drv->pshm_tx_workqueue,
-   &pshm_drv->shm_tx_work);
+   queue_work(pshm_drv->pshm_tx_workqueue,
+  &pshm_drv->shm_tx_work);
} else
spin_unlock_irqrestore(&pshm_drv->lock, flags);
}
@@ -374,8 +371,7 @@ static void shm_rx_work_func(struct work_struct *rx_work)
}
 
/* Schedule the work queue. if required */
-   if (!work_pending(&pshm_drv->shm_tx_work))
-   queue_work(pshm_drv->pshm_tx_workqueue, &pshm_drv->shm_tx_work);
+   queue_work(pshm_drv->pshm_tx_workqueue, &pshm_drv->shm_tx_work);
 
 }
 
@@ -528,8 +524,7 @@ static int shm_netdev_tx(struct sk_buff *skb, struct 
net_device *shm_netdev)
skb_queue_tail(&pshm_drv->sk_qhead, skb);
 
/* Schedule Tx work queue. for deferred processing of skbs*/
-   if (!work_pending(&pshm_drv->shm_tx_work))
-   queue_work(pshm_drv->pshm_tx_workqueue, &pshm_drv->shm_tx_work);
+   queue_work(pshm_drv->pshm_tx_workqueue, &pshm_drv->shm_tx_work);
 
return 0;
 }
-- 
1.8.0.2



[PATCH 20/25] wimax/i2400m: fix i2400m->wake_tx_skb handling

2012-12-21 Thread Tejun Heo
i2400m_net_wake_tx() sets ->wake_tx_skb with the given skb if
->wake_tx_ws is not pending; however, i2400m_wake_tx_work() could have
just started execution and haven't fetched ->wake_tx_skb handling.

* i2400m_net_wake_tx() now tests whether the previous ->wake_tx_skb
  has been consumed by ->wake_tx_ws instead of testing work_pending().

* i2400m_net_wake_stop() is simplified similarly.  It always puts
  ->wake_tx_skb if non-NULL.

* Spurious ->wake_tx_skb dereference outside critical section dropped
  from i2400m_wake_tx_work().

Only compile tested.

Signed-off-by: Tejun Heo 
Cc: Inaky Perez-Gonzalez 
Cc: linux-wi...@intel.com
Cc: wi...@linuxwimax.org
---
Please let me know how this patch should be routed.  I can take it
through the workqueue tree if necessary.

Thanks.

 drivers/net/wimax/i2400m/netdev.c | 31 +--
 1 file changed, 17 insertions(+), 14 deletions(-)

diff --git a/drivers/net/wimax/i2400m/netdev.c 
b/drivers/net/wimax/i2400m/netdev.c
index 1d76ae8..530581c 100644
--- a/drivers/net/wimax/i2400m/netdev.c
+++ b/drivers/net/wimax/i2400m/netdev.c
@@ -156,7 +156,7 @@ void i2400m_wake_tx_work(struct work_struct *ws)
struct i2400m *i2400m = container_of(ws, struct i2400m, wake_tx_ws);
struct net_device *net_dev = i2400m->wimax_dev.net_dev;
struct device *dev = i2400m_dev(i2400m);
-   struct sk_buff *skb = i2400m->wake_tx_skb;
+   struct sk_buff *skb;
unsigned long flags;
 
spin_lock_irqsave(&i2400m->tx_lock, flags);
@@ -236,23 +236,26 @@ void i2400m_tx_prep_header(struct sk_buff *skb)
 void i2400m_net_wake_stop(struct i2400m *i2400m)
 {
struct device *dev = i2400m_dev(i2400m);
+   struct sk_buff *wake_tx_skb;
+   unsigned long flags;
 
d_fnstart(3, dev, "(i2400m %p)\n", i2400m);
-   /* See i2400m_hard_start_xmit(), references are taken there
-* and here we release them if the work was still
-* pending. Note we can't differentiate work not pending vs
-* never scheduled, so the NULL check does that. */
-   if (cancel_work_sync(&i2400m->wake_tx_ws) == 0
-   && i2400m->wake_tx_skb != NULL) {
-   unsigned long flags;
-   struct sk_buff *wake_tx_skb;
-   spin_lock_irqsave(&i2400m->tx_lock, flags);
-   wake_tx_skb = i2400m->wake_tx_skb;  /* compat help */
-   i2400m->wake_tx_skb = NULL; /* compat help */
-   spin_unlock_irqrestore(&i2400m->tx_lock, flags);
+   /*
+* See i2400m_hard_start_xmit(), references are taken there and
+* here we release them if the packet was still pending.
+*/
+   cancel_work_sync(&i2400m->wake_tx_ws);
+
+   spin_lock_irqsave(&i2400m->tx_lock, flags);
+   wake_tx_skb = i2400m->wake_tx_skb;
+   i2400m->wake_tx_skb = NULL;
+   spin_unlock_irqrestore(&i2400m->tx_lock, flags);
+
+   if (wake_tx_skb) {
i2400m_put(i2400m);
kfree_skb(wake_tx_skb);
}
+
d_fnend(3, dev, "(i2400m %p) = void\n", i2400m);
 }
 
@@ -288,7 +291,7 @@ int i2400m_net_wake_tx(struct i2400m *i2400m, struct 
net_device *net_dev,
 * and if pending, release those resources. */
result = 0;
spin_lock_irqsave(&i2400m->tx_lock, flags);
-   if (!work_pending(&i2400m->wake_tx_ws)) {
+   if (!i2400m->wake_tx_skb) {
netif_stop_queue(net_dev);
i2400m_get(i2400m);
i2400m->wake_tx_skb = skb_get(skb); /* transfer ref count */
-- 
1.8.0.2



Re: [PATCH 0/9] Avoid populating unbounded num of ptes with mmap_sem held

2012-12-21 Thread Michel Lespinasse
On Fri, Dec 21, 2012 at 5:09 PM, Andy Lutomirski  wrote:
> On Fri, Dec 21, 2012 at 4:59 PM, Michel Lespinasse  wrote:
>> On Fri, Dec 21, 2012 at 4:36 PM, Andy Lutomirski  wrote:
>>> Something's buggy here.  My evil test case is stuck with lots of
>>> threads spinning at 100% system time.
>>>
>>> The tasks in question use MCL_FUTURE but not MAP_POPULATE.  These
>>> tasks are immune to SIGKILL.
>>
>> Looking into it.
>>
>> There seems to be a problem with mlockall - the following program
>> fails in an unkillable way even before my changes:
>>
>> #include <sys/mman.h>
>> #include <stdio.h>
>> #include <stdlib.h>
>>
>> int main(void) {
>>   void *p = mmap(NULL, 0x1000,
>>  PROT_READ | PROT_WRITE,
>>  MAP_PRIVATE | MAP_ANON | MAP_NORESERVE,
>>  -1, 0);
>>   printf("p: %p\n", p);
>>   mlockall(MCL_CURRENT);
>>   return 0;
>> }
>>
>> I think my changes propagate this existing problem so it now shows up
>> in more places :/

So in my test case, the issue was caused by the mapping being 2^32
pages, which overflowed the integer 'nr_pages' argument to
__get_user_pages, which caused an infinite loop as __get_user_pages()
would return 0 so __mm_populate() would make no progress.

When dropping one zero from that humongous size in the test case, the
test case becomes at least killable.

> Hmm.  I'm using MCL_FUTURE with MAP_NORESERVE, but those mappings are
> not insanely large.  Should MAP_NORESERVE would negate MCL_FUTURE?
> I'm doing MAP_NORESERVE, PROT_NONE to prevent pages from being
> allocated in the future -- I have no intention of ever using them.

MAP_NORESERVE doesn't prevent page allocation, but PROT_NONE does
(precisely because people use it the same way as you do :)

> The other odd thing I do is use MAP_FIXED to replace MAP_NORESERVE pages.
Yes, I've seen people do that here too.

Could you share your test case so I can try reproducing the issue
you're seeing ?

Thanks,

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.


[PATCH 12/25] bluetooth/l2cap: don't use [delayed_]work_pending()

2012-12-21 Thread Tejun Heo
There's no need to test whether a (delayed) work item is pending
before queueing, flushing or cancelling it.  Most uses are unnecessary
and quite a few of them are buggy.

Reimplement l2cap_set_timer() such that it uses mod_delayed_work() or
schedule_delayed_work() depending on a new param @override and let the
users specify whether to override or not instead of using
delayed_work_pending().

Only compile tested.

Signed-off-by: Tejun Heo 
Cc: Marcel Holtmann 
Cc: Gustavo Padovan 
Cc: Johan Hedberg 
Cc: linux-blueto...@vger.kernel.org
---
Please let me know how this patch should be routed.  I can take it
through the workqueue tree if necessary.

Thanks.

 include/net/bluetooth/l2cap.h | 24 
 net/bluetooth/l2cap_core.c|  7 +++
 2 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/include/net/bluetooth/l2cap.h b/include/net/bluetooth/l2cap.h
index 7588ef4..f12cbeb 100644
--- a/include/net/bluetooth/l2cap.h
+++ b/include/net/bluetooth/l2cap.h
@@ -718,17 +718,25 @@ static inline void l2cap_chan_unlock(struct l2cap_chan 
*chan)
 }
 
 static inline void l2cap_set_timer(struct l2cap_chan *chan,
-  struct delayed_work *work, long timeout)
+  struct delayed_work *work, long timeout,
+  bool override)
 {
+   bool was_pending;
+
BT_DBG("chan %p state %s timeout %ld", chan,
   state_to_string(chan->state), timeout);
 
-   /* If delayed work cancelled do not hold(chan)
-  since it is already done with previous set_timer */
-   if (!cancel_delayed_work(work))
-   l2cap_chan_hold(chan);
+   /* @work should hold a reference to @chan */
+   l2cap_chan_hold(chan);
+
+   if (override)
+   was_pending = mod_delayed_work(system_wq, work, timeout);
+   else
+   was_pending = !schedule_delayed_work(work, timeout);
 
-   schedule_delayed_work(work, timeout);
+   /* if @work was already pending, lose the extra ref */
+   if (was_pending)
+   l2cap_chan_put(chan);
 }
 
 static inline bool l2cap_clear_timer(struct l2cap_chan *chan,
@@ -745,12 +753,12 @@ static inline bool l2cap_clear_timer(struct l2cap_chan 
*chan,
return ret;
 }
 
-#define __set_chan_timer(c, t) l2cap_set_timer(c, &c->chan_timer, (t))
+#define __set_chan_timer(c, t) l2cap_set_timer(c, &c->chan_timer, (t), true)
#define __clear_chan_timer(c) l2cap_clear_timer(c, &c->chan_timer)
#define __clear_retrans_timer(c) l2cap_clear_timer(c, &c->retrans_timer)
#define __clear_monitor_timer(c) l2cap_clear_timer(c, &c->monitor_timer)
#define __set_ack_timer(c) l2cap_set_timer(c, &c->ack_timer, \
-   msecs_to_jiffies(L2CAP_DEFAULT_ACK_TO));
+   msecs_to_jiffies(L2CAP_DEFAULT_ACK_TO), true);
#define __clear_ack_timer(c) l2cap_clear_timer(c, &c->ack_timer)
 
 static inline int __seq_offset(struct l2cap_chan *chan, __u16 seq1, __u16 seq2)
diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c
index 2c78208..91db91c 100644
--- a/net/bluetooth/l2cap_core.c
+++ b/net/bluetooth/l2cap_core.c
@@ -246,10 +246,9 @@ static inline void l2cap_chan_set_err(struct l2cap_chan 
*chan, int err)
 
 static void __set_retrans_timer(struct l2cap_chan *chan)
 {
-   if (!delayed_work_pending(&chan->monitor_timer) &&
-   chan->retrans_timeout) {
+   if (chan->retrans_timeout) {
l2cap_set_timer(chan, &chan->retrans_timer,
-   msecs_to_jiffies(chan->retrans_timeout));
+   msecs_to_jiffies(chan->retrans_timeout), false);
}
 }
 
@@ -258,7 +257,7 @@ static void __set_monitor_timer(struct l2cap_chan *chan)
__clear_retrans_timer(chan);
if (chan->monitor_timeout) {
l2cap_set_timer(chan, &chan->monitor_timer,
-   msecs_to_jiffies(chan->monitor_timeout));
+   msecs_to_jiffies(chan->monitor_timeout), true);
}
 }
 
-- 
1.8.0.2



[PATCH 23/25] video/exynos: don't use [delayed_]work_pending()

2012-12-21 Thread Tejun Heo
There's no need to test whether a (delayed) work item is pending
before queueing, flushing or cancelling it.  Most uses are unnecessary
and quite a few of them are buggy.

Remove unnecessary pending tests from exynos_dp_core.  Only compile
tested.

Signed-off-by: Tejun Heo 
Cc: Kukjin Kim 
---
Please let me know how this patch should be routed.  I can take it
through the workqueue tree if necessary.

Thanks.

 drivers/video/exynos/exynos_dp_core.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/video/exynos/exynos_dp_core.c 
b/drivers/video/exynos/exynos_dp_core.c
index 28fd686..3002a6a 100644
--- a/drivers/video/exynos/exynos_dp_core.c
+++ b/drivers/video/exynos/exynos_dp_core.c
@@ -1121,8 +1121,7 @@ static int __devexit exynos_dp_remove(struct 
platform_device *pdev)
 
disable_irq(dp->irq);
 
-   if (work_pending(&dp->hotplug_work))
-   flush_work(&dp->hotplug_work);
+   flush_work(&dp->hotplug_work);
 
if (pdev->dev.of_node) {
if (dp->phy_addr)
@@ -1144,8 +1143,7 @@ static int exynos_dp_suspend(struct device *dev)
struct exynos_dp_platdata *pdata = dev->platform_data;
struct exynos_dp_device *dp = dev_get_drvdata(dev);
 
-   if (work_pending(&dp->hotplug_work))
-   flush_work(&dp->hotplug_work);
+   flush_work(&dp->hotplug_work);
 
if (dev->of_node) {
if (dp->phy_addr)
-- 
1.8.0.2



[PATCH 24/25] debugobjects: don't use [delayed_]work_pending()

2012-12-21 Thread Tejun Heo
There's no need to test whether a (delayed) work item is pending
before queueing, flushing or cancelling it.  Most uses are unnecessary
and quite a few of them are buggy.

Remove unnecessary pending tests from debugobjects.  While at it,
change @sched to bool and move the keventd_up() test later for brevity.

Only compile tested.

Signed-off-by: Tejun Heo 
Cc: Thomas Gleixner 
---
Please let me know how this patch should be routed.  I can take it
through the workqueue tree if necessary.

Thanks.

 lib/debugobjects.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/lib/debugobjects.c b/lib/debugobjects.c
index d11808c..b9dbfdf 100644
--- a/lib/debugobjects.c
+++ b/lib/debugobjects.c
@@ -189,20 +189,19 @@ static void free_obj_work(struct work_struct *work)
 static void free_object(struct debug_obj *obj)
 {
unsigned long flags;
-   int sched = 0;
+   bool sched;
 
raw_spin_lock_irqsave(&pool_lock, flags);
/*
 * schedule work when the pool is filled and the cache is
 * initialized:
 */
-   if (obj_pool_free > ODEBUG_POOL_SIZE && obj_cache)
-   sched = keventd_up() && !work_pending(&debug_obj_work);
+   sched = obj_pool_free > ODEBUG_POOL_SIZE && obj_cache;
hlist_add_head(&obj->node, &obj_pool);
obj_pool_free++;
obj_pool_used--;
raw_spin_unlock_irqrestore(&pool_lock, flags);
-   if (sched)
+   if (sched && keventd_up())
schedule_work(&debug_obj_work);
 }
 
-- 
1.8.0.2



[PATCH 25/25] ipc: don't use [delayed_]work_pending()

2012-12-21 Thread Tejun Heo
There's no need to test whether a (delayed) work item is pending
before queueing, flushing or cancelling it.  Most uses are unnecessary
and quite a few of them are buggy.

Remove unnecessary pending tests from ipc.  Only compile tested.

Signed-off-by: Tejun Heo 
Cc: Andrew Morton 
---
Please let me know how this patch should be routed.  I can take it
through the workqueue tree if necessary.

Thanks.

 ipc/util.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/ipc/util.c b/ipc/util.c
index 72fd078..add2776 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -71,8 +71,7 @@ static int ipc_memory_callback(struct notifier_block *self,
 * activate the ipcns notification chain.
 * No need to keep several ipc work items on the queue.
 */
-   if (!work_pending(&ipc_memory_wq))
-   schedule_work(&ipc_memory_wq);
+   schedule_work(&ipc_memory_wq);
break;
case MEM_GOING_ONLINE:
case MEM_GOING_OFFLINE:
-- 
1.8.0.2



[PATCH] Makefile: Make checkstack work with O= builds

2012-12-21 Thread Stephen Boyd
The vmlinux doesn't always live in the same directory as the
source files and so 'make O=obj checkstack' fails with a missing
vmlinux file. Fix checkstack so that this is possible.

Signed-off-by: Stephen Boyd 
---

It would also be nice if this depended on vmlinux and modules being built
already but I couldn't figure that part out.

 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index 4fe0559..54a386b 100644
--- a/Makefile
+++ b/Makefile
@@ -1318,7 +1318,7 @@ else
 CHECKSTACK_ARCH := $(ARCH)
 endif
 checkstack:
-   $(OBJDUMP) -d vmlinux $$(find . -name '*.ko') | \
+   $(OBJDUMP) -d $(objtree)/vmlinux $$(find $(objtree) -name '*.ko') | \
$(PERL) $(src)/scripts/checkstack.pl $(CHECKSTACK_ARCH)
 
 kernelrelease:
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation



[PATCH 22/25] usb/at91_udc: don't use [delayed_]work_pending()

2012-12-21 Thread Tejun Heo
There's no need to test whether a (delayed) work item is pending
before queueing, flushing or cancelling it.  Most uses are unnecessary
and quite a few of them are buggy.

Remove unnecessary pending tests from at91_udc.  Only compile tested.

Signed-off-by: Tejun Heo 
Cc: Andrew Victor 
Cc: Nicolas Ferre 
Cc: Jean-Christophe Plagniol-Villard 
Cc: Felipe Balbi 
Cc: linux-...@vger.kernel.org
---
Please let me know how this patch should be routed.  I can take it
through the workqueue tree if necessary.

Thanks.

 drivers/usb/gadget/at91_udc.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/usb/gadget/at91_udc.c b/drivers/usb/gadget/at91_udc.c
index f4a21f6..e81d8a2 100644
--- a/drivers/usb/gadget/at91_udc.c
+++ b/drivers/usb/gadget/at91_udc.c
@@ -1621,8 +1621,7 @@ static void at91_vbus_timer(unsigned long data)
 * bus such as i2c or spi which may sleep, so schedule some work
 * to read the vbus gpio
 */
-   if (!work_pending(&udc->vbus_timer_work))
-   schedule_work(&udc->vbus_timer_work);
+   schedule_work(&udc->vbus_timer_work);
 }
 
 static int at91_start(struct usb_gadget *gadget,
-- 
1.8.0.2



[PATCH 21/25] tty/max3100: don't use [delayed_]work_pending()

2012-12-21 Thread Tejun Heo
There's no need to test whether a (delayed) work item is pending
before queueing, flushing or cancelling it.  Most uses are unnecessary
and quite a few of them are buggy.

Remove unnecessary pending tests from max3100.  Only compile tested.

Signed-off-by: Tejun Heo 
Cc: Greg Kroah-Hartman 
Cc: Jiri Slaby 
---
Please let me know how this patch should be routed.  I can take it
through the workqueue tree if necessary.

Thanks.

 drivers/tty/serial/max3100.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/tty/serial/max3100.c b/drivers/tty/serial/max3100.c
index 7ce3197..dd6277e 100644
--- a/drivers/tty/serial/max3100.c
+++ b/drivers/tty/serial/max3100.c
@@ -179,8 +179,7 @@ static void max3100_work(struct work_struct *w);
 
 static void max3100_dowork(struct max3100_port *s)
 {
-   if (!s->force_end_work && !work_pending(&s->work) &&
-   !freezing(current) && !s->suspending)
+   if (!s->force_end_work && !freezing(current) && !s->suspending)
queue_work(s->workqueue, &s->work);
 }
 
-- 
1.8.0.2



[PATCH 18/25] TMIO MMC: don't use [delayed_]work_pending()

2012-12-21 Thread Tejun Heo
There's no need to test whether a (delayed) work item is pending
before queueing, flushing or cancelling it.  Most uses are unnecessary
and quite a few of them are buggy.

Remove unnecessary pending tests from tmio mmc.  Only compile tested.

Signed-off-by: Tejun Heo 
Cc: Guennadi Liakhovetski 
Cc: Ian Molton 
Cc: linux-...@vger.kernel.org
---
Please let me know how this patch should be routed.  I can take it
through the workqueue tree if necessary.

Thanks.

 drivers/mmc/host/tmio_mmc_pio.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/mmc/host/tmio_mmc_pio.c b/drivers/mmc/host/tmio_mmc_pio.c
index 50bf495..f4f18b3 100644
--- a/drivers/mmc/host/tmio_mmc_pio.c
+++ b/drivers/mmc/host/tmio_mmc_pio.c
@@ -573,8 +573,7 @@ static bool __tmio_mmc_card_detect_irq(struct tmio_mmc_host 
*host,
tmio_mmc_ack_mmc_irqs(host, TMIO_STAT_CARD_INSERT |
TMIO_STAT_CARD_REMOVE);
if ((((ireg & TMIO_STAT_CARD_REMOVE) && mmc->card) ||
-((ireg & TMIO_STAT_CARD_INSERT) && !mmc->card)) &&
-   !work_pending(&mmc->detect.work))
+((ireg & TMIO_STAT_CARD_INSERT) && !mmc->card)))
mmc_detect_change(host->mmc, msecs_to_jiffies(100));
return true;
}
-- 
1.8.0.2



[PATCH 13/25] sound/wm8350: don't use [delayed_]work_pending()

2012-12-21 Thread Tejun Heo
There's no need to test whether a (delayed) work item is pending
before queueing, flushing or cancelling it.  Most uses are unnecessary
and quite a few of them are buggy.

Remove unnecessary pending tests from wm8350.  Only compile tested.

Signed-off-by: Tejun Heo 
Cc: Mark Brown 
Cc: patc...@opensource.wolfsonmicro.com
---
Please let me know how this patch should be routed.  I can take it
through the workqueue tree if necessary.

Thanks.

 sound/soc/codecs/wm8350.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/sound/soc/codecs/wm8350.c b/sound/soc/codecs/wm8350.c
index fb92fb4..ec0efc1 100644
--- a/sound/soc/codecs/wm8350.c
+++ b/sound/soc/codecs/wm8350.c
@@ -283,18 +283,16 @@ static int pga_event(struct snd_soc_dapm_widget *w,
out->ramp = WM8350_RAMP_UP;
out->active = 1;
 
-   if (!delayed_work_pending(&codec->dapm.delayed_work))
-   schedule_delayed_work(&codec->dapm.delayed_work,
- msecs_to_jiffies(1));
+   schedule_delayed_work(&codec->dapm.delayed_work,
+ msecs_to_jiffies(1));
break;
 
case SND_SOC_DAPM_PRE_PMD:
out->ramp = WM8350_RAMP_DOWN;
out->active = 0;
 
-   if (!delayed_work_pending(&codec->dapm.delayed_work))
-   schedule_delayed_work(&codec->dapm.delayed_work,
- msecs_to_jiffies(1));
+   schedule_delayed_work(&codec->dapm.delayed_work,
+ msecs_to_jiffies(1));
break;
}
 
-- 
1.8.0.2



[PATCH 04/25] ipw2x00: simplify scan_event handling

2012-12-21 Thread Tejun Heo
* Drop unnecessary delayed_work_pending() tests.

* Unify scan_event_{now|later} by using mod_delayed_work() w/ 0 delay
  for scan_event_now.

* Make ipw2200 scan_event handling match ipw2100 - use
  mod_delayed_work() w/ 0 delay for immediate scanning.

Only compile tested.

Signed-off-by: Tejun Heo 
Cc: Stanislav Yakovlev 
Cc: linux-wirel...@vger.kernel.org
---
Please let me know how this patch should be routed.  I can take it
through the workqueue tree if necessary.

Thanks.

 drivers/net/wireless/ipw2x00/ipw2100.c | 31 ---
 drivers/net/wireless/ipw2x00/ipw2100.h |  3 +--
 drivers/net/wireless/ipw2x00/ipw2200.c | 13 +++--
 3 files changed, 12 insertions(+), 35 deletions(-)

diff --git a/drivers/net/wireless/ipw2x00/ipw2100.c 
b/drivers/net/wireless/ipw2x00/ipw2100.c
index d92b21a..b3ab7b7 100644
--- a/drivers/net/wireless/ipw2x00/ipw2100.c
+++ b/drivers/net/wireless/ipw2x00/ipw2100.c
@@ -2181,9 +2181,10 @@ static void isr_indicate_rf_kill(struct ipw2100_priv 
*priv, u32 status)
mod_delayed_work(system_wq, &priv->rf_kill, round_jiffies_relative(HZ));
 }
 
-static void send_scan_event(void *data)
+static void ipw2100_scan_event(struct work_struct *work)
 {
-   struct ipw2100_priv *priv = data;
+   struct ipw2100_priv *priv = container_of(work, struct ipw2100_priv,
+scan_event.work);
union iwreq_data wrqu;
 
wrqu.data.length = 0;
@@ -2191,18 +2192,6 @@ static void send_scan_event(void *data)
wireless_send_event(priv->net_dev, SIOCGIWSCAN, &wrqu, NULL);
 }
 
-static void ipw2100_scan_event_later(struct work_struct *work)
-{
-   send_scan_event(container_of(work, struct ipw2100_priv,
-   scan_event_later.work));
-}
-
-static void ipw2100_scan_event_now(struct work_struct *work)
-{
-   send_scan_event(container_of(work, struct ipw2100_priv,
-   scan_event_now));
-}
-
 static void isr_scan_complete(struct ipw2100_priv *priv, u32 status)
 {
IPW_DEBUG_SCAN("scan complete\n");
@@ -2212,13 +2201,11 @@ static void isr_scan_complete(struct ipw2100_priv 
*priv, u32 status)
 
/* Only userspace-requested scan completion events go out immediately */
if (!priv->user_requested_scan) {
-   if (!delayed_work_pending(&priv->scan_event_later))
-   schedule_delayed_work(&priv->scan_event_later,
- round_jiffies_relative(msecs_to_jiffies(4000)));
+   schedule_delayed_work(&priv->scan_event,
+ round_jiffies_relative(msecs_to_jiffies(4000)));
} else {
priv->user_requested_scan = 0;
-   cancel_delayed_work(&priv->scan_event_later);
-   schedule_work(&priv->scan_event_now);
+   mod_delayed_work(system_wq, &priv->scan_event, 0);
}
 }
 
@@ -4459,8 +4446,7 @@ static void ipw2100_kill_works(struct ipw2100_priv *priv)
cancel_delayed_work_sync(&priv->wx_event_work);
cancel_delayed_work_sync(&priv->hang_check);
cancel_delayed_work_sync(&priv->rf_kill);
-   cancel_work_sync(&priv->scan_event_now);
-   cancel_delayed_work_sync(&priv->scan_event_later);
+   cancel_delayed_work_sync(&priv->scan_event);
 }
 
 static int ipw2100_tx_allocate(struct ipw2100_priv *priv)
@@ -6195,8 +6181,7 @@ static struct net_device *ipw2100_alloc_device(struct 
pci_dev *pci_dev,
INIT_DELAYED_WORK(&priv->wx_event_work, ipw2100_wx_event_work);
INIT_DELAYED_WORK(&priv->hang_check, ipw2100_hang_check);
INIT_DELAYED_WORK(&priv->rf_kill, ipw2100_rf_kill);
-   INIT_WORK(&priv->scan_event_now, ipw2100_scan_event_now);
-   INIT_DELAYED_WORK(&priv->scan_event_later, ipw2100_scan_event_later);
+   INIT_DELAYED_WORK(&priv->scan_event, ipw2100_scan_event);
 
tasklet_init(&priv->irq_tasklet, (void (*)(unsigned long))
 ipw2100_irq_tasklet, (unsigned long)priv);
diff --git a/drivers/net/wireless/ipw2x00/ipw2100.h 
b/drivers/net/wireless/ipw2x00/ipw2100.h
index 5fe17cb..c6d7879 100644
--- a/drivers/net/wireless/ipw2x00/ipw2100.h
+++ b/drivers/net/wireless/ipw2x00/ipw2100.h
@@ -577,8 +577,7 @@ struct ipw2100_priv {
struct delayed_work wx_event_work;
struct delayed_work hang_check;
struct delayed_work rf_kill;
-   struct work_struct scan_event_now;
-   struct delayed_work scan_event_later;
+   struct delayed_work scan_event;
 
int user_requested_scan;
 
diff --git a/drivers/net/wireless/ipw2x00/ipw2200.c 
b/drivers/net/wireless/ipw2x00/ipw2200.c
index 844f201..2c2d6db 100644
--- a/drivers/net/wireless/ipw2x00/ipw2200.c
+++ b/drivers/net/wireless/ipw2x00/ipw2200.c
@@ -4480,18 +4480,11 @@ static void handle_scan_event(struct ipw_priv *priv)
 {
/* Only userspace-requested scan completion events go out immediately */
if (!priv->user_requested_scan) {
-   if (!delayed_work_pending(&priv->scan_event))
-   

[PATCH 07/25] mwifiex: don't use [delayed_]work_pending()

2012-12-21 Thread Tejun Heo
Drop work_pending() test from mwifiex_sdio_card_reset().  As
work_pending() becomes %false before sdio_card_reset_worker() starts
executing, it doesn't really protect anything.  reset_host may change
between mmc_remove_host() and mmc_add_host().  Make
sdio_card_reset_worker() cache the target mmc_host so that it isn't
affected by mwifiex_sdio_card_reset() racing with it.

Only compile tested.

Signed-off-by: Tejun Heo 
Cc: Bing Zhao 
Cc: linux-wirel...@vger.kernel.org
---
Please let me know how this patch should be routed.  I can take it
through the workqueue tree if necessary.

Thanks.

 drivers/net/wireless/mwifiex/sdio.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/net/wireless/mwifiex/sdio.c b/drivers/net/wireless/mwifiex/sdio.c
index 5a1c1d0..f2874c3 100644
--- a/drivers/net/wireless/mwifiex/sdio.c
+++ b/drivers/net/wireless/mwifiex/sdio.c
@@ -1752,6 +1752,8 @@ mwifiex_update_mp_end_port(struct mwifiex_adapter *adapter, u16 port)
 static struct mmc_host *reset_host;
 static void sdio_card_reset_worker(struct work_struct *work)
 {
+   struct mmc_host *target = reset_host;
+
/* The actual reset operation must be run outside of driver thread.
 * This is because mmc_remove_host() will cause the device to be
 * instantly destroyed, and the driver then needs to end its thread,
@@ -1761,10 +1763,10 @@ static void sdio_card_reset_worker(struct work_struct *work)
 */
 
pr_err("Resetting card...\n");
-   mmc_remove_host(reset_host);
+   mmc_remove_host(target);
/* 20ms delay is based on experiment with sdhci controller */
mdelay(20);
-   mmc_add_host(reset_host);
+   mmc_add_host(target);
 }
 static DECLARE_WORK(card_reset_work, sdio_card_reset_worker);
 
@@ -1773,9 +1775,6 @@ static void mwifiex_sdio_card_reset(struct mwifiex_adapter *adapter)
 {
struct sdio_mmc_card *card = adapter->card;
 
-	if (work_pending(&card_reset_work))
-		return;
-
	reset_host = card->func->card->host;
	schedule_work(&card_reset_work);
 }
-- 
1.8.0.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 02/25] ab8500_charger: don't use [delayed_]work_pending()

2012-12-21 Thread Tejun Heo
There's no need to test whether a (delayed) work item is pending
before queueing, flushing or cancelling it.  Most uses are unnecessary
and quite a few of them are buggy.

Remove unnecessary pending tests from ab8500_charger.  Only compile
tested.

Signed-off-by: Tejun Heo 
Cc: Srinidhi Kasagar 
Cc: Linus Walleij 
---
Please let me know how this patch should be routed.  I can take it
through the workqueue tree if necessary.

Thanks.

 drivers/power/ab8500_charger.c | 13 -
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/drivers/power/ab8500_charger.c b/drivers/power/ab8500_charger.c
index 3be9c0e..40de240 100644
--- a/drivers/power/ab8500_charger.c
+++ b/drivers/power/ab8500_charger.c
@@ -1319,8 +1319,7 @@ static int ab8500_charger_usb_en(struct ux500_charger *charger,
dev_dbg(di->dev, "%s Disabled USB charging\n", __func__);
 
/* Cancel any pending Vbat check work */
-	if (delayed_work_pending(&di->check_vbat_work))
-		cancel_delayed_work(&di->check_vbat_work);
+	cancel_delayed_work(&di->check_vbat_work);
 
}
	ab8500_power_supply_changed(di, &di->usb_chg.psy);
@@ -2460,11 +2459,8 @@ static int ab8500_charger_resume(struct platform_device *pdev)
dev_err(di->dev, "Failed to kick WD!\n");
 
/* If not already pending start a new timer */
-	if (!delayed_work_pending(
-		&di->kick_wd_work)) {
-		queue_delayed_work(di->charger_wq, &di->kick_wd_work,
-			round_jiffies(WD_KICK_INTERVAL));
-	}
+	queue_delayed_work(di->charger_wq, &di->kick_wd_work,
+			   round_jiffies(WD_KICK_INTERVAL));
}
 
/* If we still have a HW failure, schedule a new check */
@@ -2482,8 +2478,7 @@ static int ab8500_charger_suspend(struct platform_device *pdev,
struct ab8500_charger *di = platform_get_drvdata(pdev);
 
/* Cancel any pending HW failure check */
-	if (delayed_work_pending(&di->check_hw_failure_work))
-		cancel_delayed_work(&di->check_hw_failure_work);
+	cancel_delayed_work(&di->check_hw_failure_work);
 
return 0;
 }
-- 
1.8.0.2



[PATCHSET] workqueue: don't use [delayed_]work_pending()

2012-12-21 Thread Tejun Heo
Hello,

Given the current set of workqueue APIs, there are very few cases
where [delayed_]work_pending() are actually necessary; however, it's
seemingly somewhat popular for a few purposes including skipping
queue/flush/cancel depending on the current state for optimization.

work_pending() could be slightly cheaper than performing the actual
operation because it can skip atomic bitops assuming that the user is
synchronizing against other workqueue operations properly; however,
most paths with this type of optimization are siberia-cold for this
level of optimization to matter - e.g. driver detach path or parameter
update via sysfs - and it's easy to get it subtly wrong and introduce
difficult-to-trigger race conditions.  It just isn't worth it.

Other use cases include using work_pending() state to decide the state
of a previously scheduled async action.  This too, unfortunately, seems
easy to get wrong.  Several users forgot that work_pending() becomes
false *before* the work item starts execution and failed to synchronize
with on-going execution.  Unless one is specifically looking for
those, they can be tricky to spot.

Overall, [delayed_]work_pending() seem to bring more troubles than
benefits and not using them usually results in better code.  This
patchset removes [delayed_]work_pending() usages from various
subsystems.  A lot are straightforward removals of unnecessary
optimizations.  Some fix bugs around work item handling.  Others
restructure code so that [delayed_]work_pending() isn't necessary.

After this patchset, there remain a handful of
[delayed_]work_pending() users.  Some of them legit.  Some quite
broken.  Hopefully, they can be converted too and we can unexport
these easy-to-misuse interfaces.

This patchset contains the following 25 patches.

 0001-charger_manager-don-t-use-delayed_-work_pending.patch
 0002-ab8500_charger-don-t-use-delayed_-work_pending.patch
 0003-sja1000-don-t-use-delayed_-work_pending.patch
 0004-ipw2x00-simplify-scan_event-handling.patch
 0005-devfreq-don-t-use-delayed_-work_pending.patch
 0006-libertas-don-t-use-delayed_-work_pending.patch
 0007-mwifiex-don-t-use-delayed_-work_pending.patch
 0008-thinkpad_acpi-don-t-use-delayed_-work_pending.patch
 0009-wl1251-don-t-use-delayed_-work_pending.patch
 0010-kprobes-fix-wait_for_kprobe_optimizer.patch
 0011-pm-don-t-use-delayed_-work_pending.patch
 0012-bluetooth-l2cap-don-t-use-delayed_-work_pending.patch
 0013-sound-wm8350-don-t-use-delayed_-work_pending.patch
 0014-rfkill-don-t-use-delayed_-work_pending.patch
 0015-x86-mce-don-t-use-delayed_-work_pending.patch
 0016-PM-Domains-don-t-use-delayed_-work_pending.patch
 0017-wm97xx-don-t-use-delayed_-work_pending.patch
 0018-TMIO-MMC-don-t-use-delayed_-work_pending.patch
 0019-net-caif-don-t-use-delayed_-work_pending.patch
 0020-wimax-i2400m-fix-i2400m-wake_tx_skb-handling.patch
 0021-tty-max3100-don-t-use-delayed_-work_pending.patch
 0022-usb-at91_udc-don-t-use-delayed_-work_pending.patch
 0023-video-exynos-don-t-use-delayed_-work_pending.patch
 0024-debugobjects-don-t-use-delayed_-work_pending.patch
 0025-ipc-don-t-use-delayed_-work_pending.patch

And available in the following git branch.

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git 
review-work_pending-cleanup

diffstat follows.

 arch/x86/kernel/cpu/mcheck/mce.c|   10 ++
 drivers/base/power/domain.c |3 +--
 drivers/devfreq/devfreq.c   |3 +--
 drivers/input/touchscreen/wm97xx-core.c |4 +---
 drivers/mmc/host/tmio_mmc_pio.c |3 +--
 drivers/net/caif/caif_shmcore.c |   15 +--
 drivers/net/can/sja1000/peak_pci.c  |3 +--
 drivers/net/wimax/i2400m/netdev.c   |   31 +--
 drivers/net/wireless/ipw2x00/ipw2100.c  |   31 ---
 drivers/net/wireless/ipw2x00/ipw2100.h  |3 +--
 drivers/net/wireless/ipw2x00/ipw2200.c  |   13 +++--
 drivers/net/wireless/libertas/cfg.c |2 +-
 drivers/net/wireless/libertas/if_sdio.c |9 -
 drivers/net/wireless/mwifiex/sdio.c |9 -
 drivers/net/wireless/ti/wl1251/ps.c |3 +--
 drivers/platform/x86/thinkpad_acpi.c|3 +--
 drivers/power/ab8500_charger.c  |   13 -
 drivers/power/charger-manager.c |   31 ---
 drivers/tty/serial/max3100.c|3 +--
 drivers/usb/gadget/at91_udc.c   |3 +--
 drivers/video/exynos/exynos_dp_core.c   |6 ++
 include/net/bluetooth/l2cap.h   |   24 
 ipc/util.c  |3 +--
 kernel/kprobes.c|   23 +++
 kernel/power/autosleep.c|2 +-
 kernel/power/qos.c  |9 +++--
 lib/debugobjects.c  |7 +++
 net/bluetooth/l2cap_core.c  |7 +++
 net/rfkill/input.c  |8 +++-
 

Re: [PATCH 3/3] perf tool: Add non arch events for SandyBridge microarchitecture

2012-12-21 Thread Vince Weaver
On Fri, 21 Dec 2012, Andi Kleen wrote:

> > I hate to sound like a broken record here, but, again, what's the 
> > rationalization for not using libpfm4 here?
> 
> Personally I always hated the libpfm4 syntax. It's even worse than
> oprofile.

how so?  The libpfm4 event names are more or less the same as those from
the manuals, with the exception that . is replaced with : in some cases.

> I'm probably biased, but it's usually best to use the format the CPU
> vendor releases the original event tables in. That gives you
> the fastest access with the minimum amount of hazzle.

Which vendors provide "original tables"?  None as far as I know; you are
stuck digging through PDFs.

Unless you mean the tables that come with Vtune... are those licensed in a 
way that is useful?

libpfm4 also provides at least minimal event descriptions along with each 
event, will the kernel/perf provide those too?

Vince

