Re: [PATCH 1/3] input: do not use tasklet_disable before tasklet_kill

2012-11-24 Thread Dmitry Torokhov
Hi Xiaotian,

On Wed, Oct 31, 2012 at 04:05:59PM +0800, Xiaotian Feng wrote:
> If tasklet_disable() is called before related tasklet handled,
> tasklet_kill will never be finished. tasklet_kill is enough.
> 

Could you please elaborate on this? Disabling a tasklet before killing
it is quite often needed when dealing with self-rescheduling tasklets,
so tasklet_disable() followed by tasklet_kill() must work. If it does
not, we need to take care of it in the softirq code instead of in
individual drivers.

> Signed-off-by: Xiaotian Feng 
> Cc: Dmitry Torokhov  
> Cc: Tony Lindgren 
> Cc: Sourav Poddar 
> Cc: Josh 
> Cc: Greg Kroah-Hartman 
> Cc: linux-in...@vger.kernel.org
> ---
>  drivers/input/keyboard/omap-keypad.c |3 +--
>  drivers/input/serio/hil_mlc.c|1 -
>  2 files changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/drivers/input/keyboard/omap-keypad.c 
> b/drivers/input/keyboard/omap-keypad.c
> index 4a5fcc8..6c52447 100644
> --- a/drivers/input/keyboard/omap-keypad.c
> +++ b/drivers/input/keyboard/omap-keypad.c
> @@ -362,12 +362,11 @@ static int __devexit omap_kp_remove(struct 
> platform_device *pdev)
>   struct omap_kp *omap_kp = platform_get_drvdata(pdev);
>  
>   /* disable keypad interrupt handling */
> - tasklet_disable(_tasklet);
> + tasklet_kill(_tasklet);
>   omap_writew(1, OMAP1_MPUIO_BASE + OMAP_MPUIO_KBD_MASKIT);
>   free_irq(omap_kp->irq, omap_kp);
>  
>   del_timer_sync(_kp->timer);
> - tasklet_kill(_tasklet);

Exactly like here. If we do not disable the tasklet before disabling the
IRQ and freeing the timer, we may get into a scenario where the timer
schedules the tasklet and the tasklet schedules the timer again after we
canceled it.
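
To make the ordering problem concrete, here is a minimal userspace model of
a timer and a tasklet that re-arm each other. It is only a sketch: the
struct and the model_* functions are hypothetical names invented for
illustration, not kernel APIs, and the model ignores concurrency entirely.
It only shows why a tasklet that was already scheduled can resurrect the
timer after it has been deleted, unless the tasklet was disabled first.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model; none of these names exist in the kernel. */
struct kp_model {
    bool timer_pending;     /* the timer is armed */
    bool tasklet_pending;   /* the tasklet is scheduled */
    bool tasklet_disabled;  /* models the effect of tasklet_disable() */
};

/* Timer expiry: the timer handler schedules the tasklet. */
void model_timer_fire(struct kp_model *m)
{
    assert(m);
    if (m->timer_pending) {
        m->timer_pending = false;
        m->tasklet_pending = true;
    }
}

/* Tasklet body: re-arms the timer, unless the tasklet is disabled. */
void model_tasklet_run(struct kp_model *m)
{
    assert(m);
    if (m->tasklet_pending && !m->tasklet_disabled) {
        m->tasklet_pending = false;
        m->timer_pending = true;
    }
}

/* Teardown. Returns true if the timer is armed again afterwards (the bug). */
bool model_teardown(struct kp_model *m, bool disable_first)
{
    assert(m);
    if (disable_first)
        m->tasklet_disabled = true;
    m->timer_pending = false;   /* models del_timer_sync() */
    model_tasklet_run(m);       /* a previously scheduled tasklet gets to run */
    return m->timer_pending;
}
```

In this model, tearing down without disabling first leaves the timer armed
whenever the tasklet was pending at the time of teardown. Disabling first
keeps the timer dead, but it also leaves the tasklet pending, which is the
state the original patch description is worried about.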

>  
>   /* unregister everything */
>   input_unregister_device(omap_kp->input);
> diff --git a/drivers/input/serio/hil_mlc.c b/drivers/input/serio/hil_mlc.c
> index bfd3865..7fc1700 100644
> --- a/drivers/input/serio/hil_mlc.c
> +++ b/drivers/input/serio/hil_mlc.c
> @@ -1011,7 +1011,6 @@ static void __exit hil_mlc_exit(void)
>  {
>   del_timer_sync(_mlcs_kicker);
>  
> - tasklet_disable(_mlcs_tasklet);
>   tasklet_kill(_mlcs_tasklet);

This seems like a safe change.

Thanks.

-- 
Dmitry
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 1/3] balancenuma: add stats for huge pmd numa faults

2012-11-24 Thread Hillf Danton
On 11/24/12, Mel Gorman  wrote:
> On Sat, Nov 24, 2012 at 12:17:03PM +0800, Hillf Danton wrote:
>> A thp contributes 512 times more than a regular page to numa fault stats,
>> so deserves its own vm event counter. THP migration is also accounted.
>>
>
> I agree and mentioned it needed fixing. I did not create a new counter
> but I properly account for PGMIGRATE_SUCCESS and PGMIGRATE_FAIL now. I
> did not create a new NUMA_PAGE_MIGRATE counter because I didn't feel it
> was necessary. Instead I just do this
>
> count_vm_events(PGMIGRATE_SUCCESS, HPAGE_PMD_NR);
>
It could be read as: 512 pages are successfully migrated (though at the
cost of actually one page).

> count_vm_numa_events(NUMA_PAGE_MIGRATE, HPAGE_PMD_NR);
>
Ditto, 512 pages go through migration (though actually only one page
takes the hard journey).

That said, in short, the new counters are different and clearer.
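
For reference, the factor of 512 and the difference between the two ways of
counting can be sketched as follows. This assumes x86-64 with 4 KiB base
pages and 2 MiB PMD-mapped huge pages; the struct and function names are
hypothetical and only illustrate the accounting, they are not the kernel's
vm event code.

```c
#include <assert.h>

/* Assumed x86-64 values: 4 KiB base pages, 2 MiB PMD-sized huge pages. */
enum { PAGE_SHIFT_ASSUMED = 12, PMD_SHIFT_ASSUMED = 21 };

/* Number of base pages covered by one PMD-sized huge page. */
unsigned long hpage_pmd_nr(unsigned int pmd_shift, unsigned int page_shift)
{
    assert(pmd_shift >= page_shift);
    return 1UL << (pmd_shift - page_shift);
}

/* Hypothetical counters, kept separate so each THP event counts once. */
struct numa_fault_stats {
    unsigned long base_page_faults;
    unsigned long huge_pmd_faults;
};

void account_fault(struct numa_fault_stats *s, int is_huge_pmd)
{
    assert(s);
    if (is_huge_pmd)
        s->huge_pmd_faults++;
    else
        s->base_page_faults++;
}

/* The same events folded into one base-page total, i.e. the single-counter view. */
unsigned long faults_in_base_pages(const struct numa_fault_stats *s)
{
    assert(s);
    return s->base_page_faults +
           s->huge_pmd_faults *
               hpage_pmd_nr(PMD_SHIFT_ASSUMED, PAGE_SHIFT_ASSUMED);
}
```

With separate counters the number of huge-page events stays visible; folded
into one total, a single THP event is indistinguishable from 512 base-page
events.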

Hillf


Re: [PATCH v3 11/12] x86, boot: add fields to support load bzImage and ramdisk high

2012-11-24 Thread Yinghai Lu
On Sat, Nov 24, 2012 at 9:52 PM, H. Peter Anvin  wrote:
> But it doesn't solve the bigger problem, and it is just begging to be gotten 
> wrong.
>>
>>later all new kernel need to check USE_EXT_BOOT_PARAMS bit for all new
>>added field in boot_params.

Do you mean that
later someone would forget to check USE_EXT_BOOT_PARAMS when accessing
newly added fields in boot_params?


Re: [PATCH 04/19] sched, numa, mm: Describe the NUMA scheduling problem formally

2012-11-24 Thread abhishek agarwal
As per 4), move towards where "most" memory is. If we have more shared
memory than private memory, why not just move the process towards
the memory, instead of moving the memory towards the node? I guess this
would be less cumbersome than moving all the shared memory.
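
A minimal sketch of that idea, under simplifying assumptions: the per-node
page counts (the 'p_i,k' and 's_i,k' of the problem statement quoted below)
are given as plain arrays, and the task is simply placed on the node that
holds most of its memory. The function name is hypothetical, and the sketch
deliberately ignores access frequency and the fairness goal, so it is not
the scheduler's actual policy.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Pick the node that holds the most of a task's memory, counting private
 * and shared pages together. Ties go to the lowest node index.
 */
size_t preferred_node(const unsigned long *private_pages,
                      const unsigned long *shared_pages, size_t nr_nodes)
{
    assert(private_pages && shared_pages && nr_nodes > 0);

    size_t best = 0;
    unsigned long best_pages = private_pages[0] + shared_pages[0];

    for (size_t k = 1; k < nr_nodes; k++) {
        unsigned long pages = private_pages[k] + shared_pages[k];

        if (pages > best_pages) {
            best = k;
            best_pages = pages;
        }
    }
    return best;
}
```

For a task with 100 private pages on node 0 and 500 shared pages on node 1,
this picks node 1, so the task moves and the shared memory stays put.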

On Fri, Nov 16, 2012 at 9:55 PM, Ingo Molnar  wrote:
> From: Peter Zijlstra 
>
> This is probably a first: formal description of a complex high-level
> computing problem, within the kernel source.
>
> Signed-off-by: Peter Zijlstra 
> Cc: Linus Torvalds 
> Cc: Andrew Morton 
> Cc: Peter Zijlstra 
> Cc: "H. Peter Anvin" 
> Cc: Mike Galbraith 
> Rik van Riel 
> Link: http://lkml.kernel.org/n/tip-mmnlpupoetcatimvjeld1...@git.kernel.org
> [ Next step: generate the kernel source from such formal descriptions and 
> retire to a tropical island! ]
> Signed-off-by: Ingo Molnar 
> ---
>  Documentation/scheduler/numa-problem.txt | 230 
> +++
>  1 file changed, 230 insertions(+)
>  create mode 100644 Documentation/scheduler/numa-problem.txt
>
> diff --git a/Documentation/scheduler/numa-problem.txt 
> b/Documentation/scheduler/numa-problem.txt
> new file mode 100644
> index 000..a5d2fee
> --- /dev/null
> +++ b/Documentation/scheduler/numa-problem.txt
> @@ -0,0 +1,230 @@
> +
> +
> +Effective NUMA scheduling problem statement, described formally:
> +
> + * minimize interconnect traffic
> +
> +For each task 't_i' we have memory, this memory can be spread over multiple
> +physical nodes, let us denote this as: 'p_i,k', the memory task 't_i' has on
> +node 'k' in [pages].
> +
> +If a task shares memory with another task let us denote this as:
> +'s_i,k', the memory shared between tasks including 't_i' residing on node
> +'k'.
> +
> +Let 'M' be the distribution that governs all 'p' and 's', ie. the page 
> placement.
> +
> +Similarly, lets define 'fp_i,k' and 'fs_i,k' resp. as the (average) usage
> +frequency over those memory regions [1/s] such that the product gives an
> +(average) bandwidth 'bp' and 'bs' in [pages/s].
> +
> +(note: multiple tasks sharing memory naturally avoid duplicat accounting
> +   because each task will have its own access frequency 'fs')
> +
> +(pjt: I think this frequency is more numerically consistent if you explicitly
> +  restrict p/s above to be the working-set. (It also makes explicit the
> +  requirement for  to change about a change in the working set.)
> +
> +  Doing this does have the nice property that it lets you use your 
> frequency
> +  measurement as a weak-ordering for the benefit a task would receive 
> when
> +  we can't fit everything.
> +
> +  e.g. task1 has working set 10mb, f=90%
> +   task2 has working set 90mb, f=10%
> +
> +  Both are using 9mb/s of bandwidth, but we'd expect a much larger 
> benefit
> +  from task1 being on the right node than task2. )
> +
> +Let 'C' map every task 't_i' to a cpu 'c_i' and its corresponding node 'n_i':
> +
> +  C: t_i -> {c_i, n_i}
> +
> +This gives us the total interconnect traffic between nodes 'k' and 'l',
> +'T_k,l', as:
> +
> +  T_k,l = \Sum_i bp_i,l + bs_i,l + \Sum bp_j,k + bs_j,k where n_i == k, n_j 
> == l
> +
> +And our goal is to obtain C0 and M0 such that:
> +
> +  T_k,l(C0, M0) =< T_k,l(C, M) for all C, M where k != l
> +
> +(note: we could introduce 'nc(k,l)' as the cost function of accessing memory
> +   on node 'l' from node 'k', this would be useful for bigger NUMA 
> systems
> +
> + pjt: I agree nice to have, but intuition suggests diminishing returns on 
> more
> +  usual systems given factors like things like Haswell's enormous 35mb l3
> +  cache and QPI being able to do a direct fetch.)
> +
> +(note: do we need a limit on the total memory per node?)
> +
> +
> + * fairness
> +
> +For each task 't_i' we have a weight 'w_i' (related to nice), and each cpu
> +'c_n' has a compute capacity 'P_n', again, using our map 'C' we can 
> formulate a
> +load 'L_n':
> +
> +  L_n = 1/P_n * \Sum_i w_i for all c_i = n
> +
> +using that we can formulate a load difference between CPUs
> +
> +  L_n,m = | L_n - L_m |
> +
> +Which allows us to state the fairness goal like:
> +
> +  L_n,m(C0) =< L_n,m(C) for all C, n != m
> +
> +(pjt: It can also be usefully stated that, having converged at C0:
> +
> +   | L_n(C0) - L_m(C0) | <= 4/3 * | G_n( U(t_i, t_j) ) - G_m( U(t_i, t_j) ) |
> +
> +  Where G_n,m is the greedy partition of tasks between L_n and L_m. This 
> is
> +  the "worst" partition we should accept; but having it gives us a useful
> +  bound on how much we can reasonably adjust L_n/L_m at a Pareto point to
> +  favor T_n,m. )
> +
> +Together they give us the complete multi-objective optimization problem:
> +
> +  min_C,M [ L_n,m(C), T_k,l(C,M) ]
> +
> +
> +
> +Notes:
> +
> + - the memory bandwidth problem is very much an inter-process problem, in
> +   particular there is no such concept as a process in the above problem.
> +
> + - the naive solution would completely 

Re: [PATCH v3 11/12] x86, boot: add fields to support load bzImage and ramdisk high

2012-11-24 Thread H. Peter Anvin
But it doesn't solve the bigger problem, and it is just begging to be gotten 
wrong.

Yinghai Lu  wrote:

>On Sat, Nov 24, 2012 at 4:11 PM, Yinghai Lu  wrote:
>> On Sat, Nov 24, 2012 at 4:04 PM, H. Peter Anvin 
>wrote:
>>>
>>> It sounds like we are leaning toward some form of the sentinel hack,
>which
>>> means we need an enumerated list of things that should *not* be
>zeroed if
>>> the sentinel is present.
>>>
>>> The option of declaring the list frozen makes me a bit nervous,
>because it
>>> isn't clear that we don't already have fields that will be
>misinterpreted by
>>> the kernel if filled in from the file.
>>
>> USE_EXT_BOOT_PARAMS bit in xloadflags should work.
>
>new kexec will clean around bit around setup head, and set that bit,
>if it is not with real_mode entry.
>
>32bit and 64bit entry:
>old kernel has no idea of this bit, and still use old ramdisk_image,
>cmd_line_ptr in setup header.
>new kernel will check that bit before it use ext_ramdisk_image, and
>ext_cmd_line_ptr.
>
>old kexec and new kernel is safe too, because that bit is not set, new
>kernel will not use ex_...
>
>later all new kernel need to check USE_EXT_BOOT_PARAMS bit for all new
>added field in boot_params.

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.


Re: [PATCH v3 11/12] x86, boot: add fields to support load bzImage and ramdisk high

2012-11-24 Thread Yinghai Lu
On Sat, Nov 24, 2012 at 4:11 PM, Yinghai Lu  wrote:
> On Sat, Nov 24, 2012 at 4:04 PM, H. Peter Anvin  wrote:
>>
>> It sounds like we are leaning toward some form of the sentinel hack, which
>> means we need an enumerated list of things that should *not* be zeroed if
>> the sentinel is present.
>>
>> The option of declaring the list frozen makes me a bit nervous, because it
>> isn't clear that we don't already have fields that will be misinterpreted by
>> the kernel if filled in from the file.
>
> USE_EXT_BOOT_PARAMS bit in xloadflags should work.

New kexec will clear the area around the setup header and set that bit,
if it is not using the real_mode entry.

32bit and 64bit entry:
An old kernel has no idea of this bit, and still uses the old ramdisk_image
and cmd_line_ptr in the setup header.
A new kernel will check that bit before it uses ext_ramdisk_image and
ext_cmd_line_ptr.

Old kexec with a new kernel is safe too: because that bit is not set, the
new kernel will not use ex_...

Later, every new kernel needs to check the USE_EXT_BOOT_PARAMS bit for all
newly added fields in boot_params.
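
The check described above could look roughly like this. It is a sketch
only: the struct is a reduced stand-in for the real boot parameter layout,
and the bit position given to USE_EXT_BOOT_PARAMS here is an assumption,
since the thread does not state the actual value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit position; the real value is not given in this thread. */
#define USE_EXT_BOOT_PARAMS_ASSUMED (1u << 1)

/* Reduced stand-in for the relevant boot parameter fields. */
struct boot_fields {
    uint16_t xloadflags;
    uint32_t ramdisk_image;      /* low 32 bits, known to old kernels */
    uint32_t ext_ramdisk_image;  /* high 32 bits, may be garbage from old loaders */
};

/* Only trust the extended field when the loader has set the flag. */
uint64_t get_ramdisk_image(const struct boot_fields *bp)
{
    assert(bp);

    uint64_t addr = bp->ramdisk_image;

    if (bp->xloadflags & USE_EXT_BOOT_PARAMS_ASSUMED)
        addr |= (uint64_t)bp->ext_ramdisk_image << 32;
    return addr;
}
```

An old loader that leaves stale data in the extended field never sets the
flag, so the stale high bits are ignored. The weak point raised in the
thread remains: every future accessor of a new field has to remember the
same check.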


[PATCH] staging: gdm72xx: protect access of rx / tx structs

2012-11-24 Thread Ben Chan
This patch applies spinlock to protect access to rx / tx structs in
certain call sites, which fixes the following crash in gdm_suspend.
It also fixes usb_set_intfdata() in gdm_usb_probe to avoid setting an
already freed phy_dev.
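
The locking pattern itself, reduced to a userspace sketch: every access to
the shared free list goes through one lock. This uses a C11 atomic_flag as
a stand-in for spinlock_t and does not model the interrupt disabling that
spin_lock_irqsave() adds; all names are hypothetical and unrelated to the
driver's real structures.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct buf {
    struct buf *next;
};

struct buf_pool {
    atomic_flag lock;       /* userspace stand-in for spinlock_t */
    struct buf *free_list;  /* only touched with the lock held */
};

void pool_init(struct buf_pool *p)
{
    assert(p);
    atomic_flag_clear(&p->lock);
    p->free_list = NULL;
}

static void pool_lock(struct buf_pool *p)
{
    while (atomic_flag_test_and_set_explicit(&p->lock, memory_order_acquire))
        ;  /* spin until the holder releases the lock */
}

static void pool_unlock(struct buf_pool *p)
{
    atomic_flag_clear_explicit(&p->lock, memory_order_release);
}

/* Return a buffer to the free list. */
void pool_put(struct buf_pool *p, struct buf *b)
{
    assert(p && b);
    pool_lock(p);
    b->next = p->free_list;
    p->free_list = b;
    pool_unlock(p);
}

/* Take a buffer from the free list, or NULL if the list is empty. */
struct buf *pool_get(struct buf_pool *p)
{
    assert(p);
    pool_lock(p);
    struct buf *b = p->free_list;
    if (b)
        p->free_list = b->next;
    pool_unlock(p);
    return b;
}
```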

<5>[ 4996.815018] [<7f0074b0>] (gdm_suspend+0x1c/0x2b4 [gdmwm]) from 
[<803020a4>] (usb_suspend_both+0x80/0x1a0)
<5>[ 4996.815055] [<803020a4>] (usb_suspend_both+0x80/0x1a0) from [<80302c84>] 
(usb_runtime_suspend+0x38/0x64)
<5>[ 4996.815089] [<80302c84>] (usb_runtime_suspend+0x38/0x64) from 
[<802becc0>] (__rpm_callback+0x48/0x78)
<5>[ 4996.815118] [<802becc0>] (__rpm_callback+0x48/0x78) from [<802bf8dc>] 
(rpm_suspend+0x394/0x5ec)
<5>[ 4996.815145] [<802bf8dc>] (rpm_suspend+0x394/0x5ec) from [<802c0550>] 
(pm_runtime_work+0x8c/0xa4)
<5>[ 4996.815177] [<802c0550>] (pm_runtime_work+0x8c/0xa4) from [<800456cc>] 
(process_one_work+0x264/0x438)
<5>[ 4996.815209] [<800456cc>] (process_one_work+0x264/0x438) from [<80045acc>] 
(worker_thread+0x22c/0x3b8)
<5>[ 4996.815239] [<80045acc>] (worker_thread+0x22c/0x3b8) from [<8004a43c>] 
(kthread+0x9c/0xa8)
<5>[ 4996.815270] [<8004a43c>] (kthread+0x9c/0xa8) from [<8000f160>] 
(kernel_thread_exit+0x0/0x8)
<0>[ 4996.815295] Code: e92d4000 e8bd4000 e2800020 eb4ab9a1 (e5905000)

Signed-off-by: Ben Chan 
Signed-off-by: Sameer Nanda 
---
 drivers/staging/gdm72xx/gdm_usb.c |   52 -
 1 files changed, 45 insertions(+), 7 deletions(-)

diff --git a/drivers/staging/gdm72xx/gdm_usb.c 
b/drivers/staging/gdm72xx/gdm_usb.c
index 0cc6317..4426941 100644
--- a/drivers/staging/gdm72xx/gdm_usb.c
+++ b/drivers/staging/gdm72xx/gdm_usb.c
@@ -186,6 +186,7 @@ static int init_usb(struct usbwm_dev *udev)
struct rx_cxt   *rx = >rx;
struct usb_tx   *t;
struct usb_rx   *r;
+   unsigned long flags;
 
INIT_LIST_HEAD(>free_list);
INIT_LIST_HEAD(>sdu_list);
@@ -200,6 +201,7 @@ static int init_usb(struct usbwm_dev *udev)
spin_lock_init(>lock);
spin_lock_init(>lock);
 
+   spin_lock_irqsave(>lock, flags);
for (i = 0; i < MAX_NR_SDU_BUF; i++) {
t = alloc_tx_struct(tx);
if (t == NULL) {
@@ -208,6 +210,7 @@ static int init_usb(struct usbwm_dev *udev)
}
list_add(>list, >free_list);
}
+   spin_unlock_irqrestore(>lock, flags);
 
r = alloc_rx_struct(rx);
if (r == NULL) {
@@ -215,7 +218,9 @@ static int init_usb(struct usbwm_dev *udev)
goto fail;
}
 
+   spin_lock_irqsave(>lock, flags);
list_add(>list, >free_list);
+   spin_unlock_irqrestore(>lock, flags);
return ret;
 
 fail:
@@ -229,6 +234,9 @@ static void release_usb(struct usbwm_dev *udev)
struct rx_cxt   *rx = >rx;
struct usb_tx   *t, *t_next;
struct usb_rx   *r, *r_next;
+   unsigned long flags;
+
+   spin_lock_irqsave(>lock, flags);
 
list_for_each_entry_safe(t, t_next, >sdu_list, list) {
list_del(>list);
@@ -245,6 +253,10 @@ static void release_usb(struct usbwm_dev *udev)
free_tx_struct(t);
}
 
+   spin_unlock_irqrestore(>lock, flags);
+
+   spin_lock_irqsave(>lock, flags);
+
list_for_each_entry_safe(r, r_next, >free_list, list) {
list_del(>list);
free_rx_struct(r);
@@ -254,6 +266,8 @@ static void release_usb(struct usbwm_dev *udev)
list_del(>list);
free_rx_struct(r);
}
+
+   spin_unlock_irqrestore(>lock, flags);
 }
 
 static void __gdm_usb_send_complete(struct urb *urb)
@@ -302,7 +316,7 @@ static int gdm_usb_send(void *priv_dev, void *data, int len,
int no_spc = 0, ret;
u8 *pkt = data;
u16 cmd_evt;
-   unsigned long flags;
+   unsigned long flags, flags2;
 
if (!udev->usbdev) {
dev_err(>dev, "%s: No such device\n", __func__);
@@ -371,13 +385,16 @@ static int gdm_usb_send(void *priv_dev, void *data, int 
len,
 
rx = >rx;
 
+   spin_lock_irqsave(>lock, flags2);
list_for_each_entry(r, >used_list, list)
usb_unlink_urb(r->urb);
+   spin_unlock_irqrestore(>lock, flags2);
+
udev->bw_switch = 1;
 
-   spin_lock(_lock);
+   spin_lock_irqsave(_lock, flags2);
list_add_tail(>list, _list);
-   spin_unlock(_lock);
+   spin_unlock_irqrestore(_lock, flags2);
 
wake_up(_wait);
}
@@ -416,7 +433,7 @@ static void gdm_usb_rcv_complete(struct urb *urb)
struct tx_cxt *tx = >tx;
struct usb_tx *t;
u16 cmd_evt;
-   unsigned long flags;
+   unsigned long flags, flags2;
 
 #ifdef CONFIG_WIMAX_GDM72XX_USB_PM
struct usb_device *dev = urb->dev;
@@ -462,9 +479,9 @@ static void gdm_usb_rcv_complete(struct urb *urb)
if (!urb->status && r->callback)

Re: [patch 6/8] kcmp selftests: build fix

2012-11-24 Thread Dave Young
On Sat, Nov 24, 2012 at 11:41:23AM +0200, Pekka Enberg wrote:
> On Sat, Nov 24, 2012 at 10:29 AM,   wrote:
> > For old glibc there's no the syscall number this tests will cause
> > make run_tests fail.
> > Add a macro to define the number. This should be ok because it will be
> > built in latest kernel source.
> >
> > Signed-off-by: Dave Young 
> > ---
> >  tools/testing/selftests/kcmp/kcmp_test.c |3 +++
> >  1 file changed, 3 insertions(+)
> >
> > --- linux-2.6.orig/tools/testing/selftests/kcmp/kcmp_test.c 2012-11-23 
> > 22:37:04.789058192 +0800
> > +++ linux-2.6/tools/testing/selftests/kcmp/kcmp_test.c  2012-11-23 
> > 22:38:43.195191747 +0800
> > @@ -17,6 +17,9 @@
> >  #include 
> >  #include 
> >
> > +#ifndef __NR_kcmp
> > +#define __NR_kcmp 272
> > +#endif
> 
> Is the syscall number really going to be the same across all architectures?

Oh, they are different. self NACK. Please ignore this patch.
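
One way to avoid hard-coding a number, sketched under the assumption that
the test is built against headers which may or may not know about kcmp:
take the per-architecture value from <sys/syscall.h> when it is defined and
report "unknown" otherwise. The function names are hypothetical; this is
not what the selftest ended up doing.

```c
#include <assert.h>
#include <sys/syscall.h>   /* provides the per-architecture __NR_* values, if any */

/*
 * Returns the kcmp syscall number for the build architecture, or -1 when
 * the installed headers do not define it. No number is hard-coded.
 */
long kcmp_syscall_nr(void)
{
#ifdef __NR_kcmp
    return __NR_kcmp;
#else
    return -1;
#endif
}

/* Lets a test skip itself instead of failing the build. */
int kcmp_known_to_headers(void)
{
    long nr = kcmp_syscall_nr();

    assert(nr == -1 || nr >= 0);
    return nr >= 0;
}
```

A test could call kcmp_known_to_headers() first and skip itself when it
returns 0, rather than guessing a number that is only right on one
architecture.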

> 
> >  static long sys_kcmp(int pid1, int pid2, int type, int fd1, int fd2)
> >  {
> > return syscall(__NR_kcmp, pid1, pid2, type, fd1, fd2);
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 3/3] kvm: remove max_high field in rb_int_node structure

2012-11-24 Thread Michel Lespinasse
Since nothing depends on the max_high field values anymore, we can just
remove the field and the code that was used to maintain it.

Signed-off-by: Michel Lespinasse 

---
 tools/kvm/include/kvm/rbtree-interval.h |   13 ---
 tools/kvm/util/rbtree-interval.c|   58 +--
 2 files changed, 8 insertions(+), 63 deletions(-)

diff --git a/tools/kvm/include/kvm/rbtree-interval.h 
b/tools/kvm/include/kvm/rbtree-interval.h
index fb2102ab33a6..730eb5e8551d 100644
--- a/tools/kvm/include/kvm/rbtree-interval.h
+++ b/tools/kvm/include/kvm/rbtree-interval.h
@@ -1,20 +1,17 @@
 #ifndef KVM__INTERVAL_RBTREE_H
 #define KVM__INTERVAL_RBTREE_H
 
-#include 
+#include 
 #include 
 
 #define RB_INT_INIT(l, h) \
-   (struct rb_int_node){.low = l, .high = h, .max_high = h}
+   (struct rb_int_node){.low = l, .high = h}
 #define rb_int(n) rb_entry(n, struct rb_int_node, node)
 
 struct rb_int_node {
struct rb_node  node;
u64 low;
u64 high;
-
-   /* max_high will store the highest high of it's 2 children. */
-   u64 max_high;
 };
 
 /* Return the rb_int_node interval in which 'point' is located. */
@@ -24,6 +21,10 @@ struct rb_int_node *rb_int_search_single(struct rb_root 
*root, u64 point);
 struct rb_int_node *rb_int_search_range(struct rb_root *root, u64 low, u64 
high);
 
 int rb_int_insert(struct rb_root *root, struct rb_int_node *data);
-void rb_int_erase(struct rb_root *root, struct rb_int_node *node);
+
+static inline void rb_int_erase(struct rb_root *root, struct rb_int_node *node)
+{
+   rb_erase(>node, root);
+}
 
 #endif
diff --git a/tools/kvm/util/rbtree-interval.c b/tools/kvm/util/rbtree-interval.c
index 740ff0d87536..3630a6d80d6e 100644
--- a/tools/kvm/util/rbtree-interval.c
+++ b/tools/kvm/util/rbtree-interval.c
@@ -35,57 +35,6 @@ struct rb_int_node *rb_int_search_range(struct rb_root 
*root, u64 low, u64 high)
return range;
 }
 
-/*
- * Update a node after it has been linked into the tree:
- */
-static void propagate_callback(struct rb_node *node, struct rb_node *stop)
-{
-   struct rb_int_node *i_node;
-
-   if (node == stop)
-   return;
-
-   i_node = rb_int(node);
-   i_node->max_high = i_node->high;
-
-   if (node->rb_left)
-   i_node->max_high = max(i_node->max_high, 
rb_int(node->rb_left)->max_high);
-   if (node->rb_right)
-   i_node->max_high = max(i_node->max_high, 
rb_int(node->rb_right)->max_high);
-}
-
-/*
- * Copy the extra data to a new node:
- */
-static void copy_callback(struct rb_node *node_old, struct rb_node *node_new)
-{
-   struct rb_int_node *i_node_old = rb_int(node_old);
-   struct rb_int_node *i_node_new = rb_int(node_new);
-
-   i_node_new->low = i_node_old->low;
-   i_node_new->high= i_node_old->high;
-
-   i_node_new->max_high= i_node_old->max_high;
-}
-
-/*
- * Update after tree rotation:
- */
-static void rotate_callback(struct rb_node *node_old, struct rb_node *node_new)
-{
-   propagate_callback(node_old, NULL);
-   propagate_callback(node_new, NULL);
-}
-
-/*
- * All augmented rbtree callbacks:
- */
-struct rb_augment_callbacks callbacks = {
-   .propagate  = propagate_callback,
-   .copy   = copy_callback,
-   .rotate = rotate_callback,
-};
-
 int rb_int_insert(struct rb_root *root, struct rb_int_node *i_node)
 {
struct rb_node **node = >rb_node, *parent = NULL;
@@ -103,12 +52,7 @@ int rb_int_insert(struct rb_root *root, struct rb_int_node 
*i_node)
}
 
rb_link_node(_node->node, parent, node);
-   rb_insert_augmented(_node->node, root, );
+   rb_insert_color(_node->node, root);
 
return 0;
 }
-
-void rb_int_erase(struct rb_root *root, struct rb_int_node *node)
-{
-   rb_erase_augmented(>node, root, );
-}
-- 
1.7.7.3


[PATCH 2/3] kvm: rb_int_search_single simplification

2012-11-24 Thread Michel Lespinasse
As the rbtree intervals are not overlapping, rb_int_search_single can
trivially be implemented without making use of the max_high field.
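
The same three-way decision can be shown outside the rbtree. The sketch
below runs it as a binary search over a sorted array of non-overlapping,
half-open intervals [low, high); the array is only a stand-in for the tree,
and the names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct interval {
    uint64_t low;   /* inclusive */
    uint64_t high;  /* exclusive */
};

/*
 * iv must be sorted by low and must not contain overlapping intervals.
 * Returns the interval containing point, or NULL if there is none.
 */
const struct interval *interval_search_single(const struct interval *iv,
                                              size_t n, uint64_t point)
{
    assert(iv || n == 0);

    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (point < iv[mid].low)
            hi = mid;           /* like descending into rb_left */
        else if (iv[mid].high <= point)
            lo = mid + 1;       /* like descending into rb_right */
        else
            return &iv[mid];    /* low <= point < high */
    }
    return NULL;
}
```

Because the intervals never overlap, at most one of them can contain the
point, so no per-subtree maximum is needed to decide which way to go.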

Signed-off-by: Michel Lespinasse 

---
 tools/kvm/util/rbtree-interval.c |   18 +-
 1 files changed, 5 insertions(+), 13 deletions(-)

diff --git a/tools/kvm/util/rbtree-interval.c b/tools/kvm/util/rbtree-interval.c
index fd69252bea02..740ff0d87536 100644
--- a/tools/kvm/util/rbtree-interval.c
+++ b/tools/kvm/util/rbtree-interval.c
@@ -5,27 +5,19 @@
 struct rb_int_node *rb_int_search_single(struct rb_root *root, u64 point)
 {
struct rb_node *node = root->rb_node;
-   struct rb_node *lowest = NULL;
 
while (node) {
struct rb_int_node *cur = rb_int(node);
 
-   if (node->rb_left && (rb_int(node->rb_left)->max_high > point)) 
{
+   if (point < cur->low)
node = node->rb_left;
-   } else if (cur->low <= point && cur->high > point) {
-   lowest = node;
-   break;
-   } else if (point > cur->low) {
+   else if (cur->high <= point)
node = node->rb_right;
-   } else {
-   break;
-   }
+   else
+   return cur;
}
 
-   if (lowest == NULL)
-   return NULL;
-
-   return rb_int(lowest);
+   return NULL;
 }
 
 struct rb_int_node *rb_int_search_range(struct rb_root *root, u64 low, u64 
high)
-- 
1.7.7.3


[PATCH 1/3] kvm: ensure non-overlapping intervals in rb_int_insert()

2012-11-24 Thread Michel Lespinasse
The rbtree interval API is designed for handling non-overlapping intervals;
modify rb_int_insert() to guarantee this property is maintained by
returning -EEXIST when attempting to insert a new interval that overlaps
an existing interval.

Also fix an issue where the computation of 'result' could trigger an
integer overflow which would break the rbtree ordering.
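
Both points can be illustrated in isolation. The sketch below uses
hypothetical helper names and assumes the usual behaviour of GCC and Clang,
where converting an out-of-range 64-bit value to int keeps only the low 32
bits.

```c
#include <assert.h>
#include <stdint.h>

/* Old style: the 64-bit difference is squeezed into an int first. */
int cmp_truncating(uint64_t a, uint64_t b)
{
    int result = (int)(a - b);  /* high bits are lost here */

    return (result > 0) - (result < 0);
}

/* Three-way comparison that cannot overflow. */
int cmp_safe(uint64_t a, uint64_t b)
{
    return (a > b) - (a < b);
}

/*
 * Where does the new half-open interval [low, high) go relative to the
 * current node [cur_low, cur_high)?
 * -1: left subtree, 1: right subtree, 0: overlap (caller returns -EEXIST).
 */
int interval_place(uint64_t low, uint64_t high,
                   uint64_t cur_low, uint64_t cur_high)
{
    assert(low < high && cur_low < cur_high);
    if (high <= cur_low)
        return -1;
    if (cur_high <= low)
        return 1;
    return 0;
}
```

With a = 1 << 32 and b = 0, the truncating version sees a difference of 0
and reports the keys as equal, while the safe version reports a > b.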

Signed-off-by: Michel Lespinasse 

---
 tools/kvm/util/rbtree-interval.c |   10 +-
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/tools/kvm/util/rbtree-interval.c b/tools/kvm/util/rbtree-interval.c
index d7fa96a06a92..fd69252bea02 100644
--- a/tools/kvm/util/rbtree-interval.c
+++ b/tools/kvm/util/rbtree-interval.c
@@ -99,13 +99,13 @@ int rb_int_insert(struct rb_root *root, struct rb_int_node 
*i_node)
struct rb_node **node = >rb_node, *parent = NULL;
 
while (*node) {
-   int result = i_node->low - rb_int(*node)->low;
+   struct rb_int_node *cur = rb_int(*node);
 
parent = *node;
-   if (result < 0)
-   node= &((*node)->rb_left);
-   else if (result > 0)
-   node= &((*node)->rb_right);
+   if (i_node->high <= cur->low)
+   node = >node.rb_left;
+   else if (cur->high <= i_node->low)
+   node = >node.rb_right;
else
return -EEXIST;
}
-- 
1.7.7.3


[PATCH 0/3] remove kvm's use of augmented rbtree

2012-11-24 Thread Michel Lespinasse
On Thu, Nov 22, 2012 at 9:49 PM, Michel Lespinasse  wrote:
> On Thu, Nov 22, 2012 at 9:14 AM, Sasha Levin  wrote:
>> The following patch fixed the problem for me:
>>
>> diff --git a/include/linux/rbtree_augmented.h 
>> b/include/linux/rbtree_augmented.h
>> index 214caa3..5cfdca6 100644
>> --- a/include/linux/rbtree_augmented.h
>> +++ b/include/linux/rbtree_augmented.h
>> @@ -47,6 +47,7 @@ rb_insert_augmented(struct rb_node *node, struct rb_root 
>> *root,
>> const struct rb_augment_callbacks *augment)
>>  {
>> __rb_insert_augmented(node, root, augment->rotate);
>> +   augment->propagate(node, NULL);
>>  }
>
> This would work, but would slow down all sites which already take care
> of updating the augmented information before calling
> rb_insert_augmented, so please don't do that.
>
> The simplest fix would be to add the propagate call where your
> rb_insert_augmented() call site is; the better fix would be to do the
> update incrementally as you search down the tree for the insertion
> point; and the best fix may be to just avoid duplicating that code and
> use interval_tree.h (if your keys are longs) or
> interval_tree_generic.h to generate the proper insert / remove
> functions.

So I had a quick look at linux-next, and my understanding is that the
rbtree-interval API in kvm always stores non-overlapping intervals.
Based on this, the use of augmented rbtrees isn't really justified; it
is just as easy to use a simple rbtree of intervals sorted by the
addresses they cover.

This patchset was generated against the current linux-next. I only
verified that kvm still compiled; obviously this would need more
testing. On the other hand, there are currently some correctness
issues in kvm's implementation of rbtree intervals, so I think this
simplification should be beneficial.

Michel Lespinasse (3):
  kvm: ensure non-overlapping intervals in rb_int_insert()
  kvm: rb_int_search_single simplification
  kvm: remove max_high field in rb_int_node structure

 tools/kvm/include/kvm/rbtree-interval.h |   13 +++--
 tools/kvm/util/rbtree-interval.c|   86 ---
 2 files changed, 18 insertions(+), 81 deletions(-)

Sasha, could you please check my logic and apply this to the kvm tree?

Thanks,

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.


[PATCH v2] UVC: use GFP_ATOMIC under spin lock.

2012-11-24 Thread Cyril Roelandt
Found using the following semantic patch:

@@
@@
spin_lock_irqsave(...);
... when != spin_unlock_irqrestore(...);
* GFP_KERNEL


Signed-off-by: Cyril Roelandt 
---
 drivers/usb/gadget/uvc_video.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/usb/gadget/uvc_video.c b/drivers/usb/gadget/uvc_video.c
index b0e53a8..cd067a6 100644
--- a/drivers/usb/gadget/uvc_video.c
+++ b/drivers/usb/gadget/uvc_video.c
@@ -309,7 +309,8 @@ uvc_video_pump(struct uvc_video *video)
video->encode(req, video, buf);
 
/* Queue the USB request */
-   if ((ret = usb_ep_queue(video->ep, req, GFP_KERNEL)) < 0) {
+   ret = usb_ep_queue(video->ep, req, GFP_ATOMIC);
+   if (ret < 0) {
printk(KERN_INFO "Failed to queue request (%d)\n", ret);
usb_ep_set_halt(video->ep);
spin_unlock_irqrestore(>queue.irqlock, flags);
-- 
1.7.10.4



Re: [PATCH v4 10/11] x86, boot: add fields to support load bzImage and ramdisk above 4G

2012-11-24 Thread Yinghai Lu
On Sat, Nov 24, 2012 at 12:35 PM, Yinghai Lu  wrote:
> ext_ramdisk_image/size will record high 32bits for ramdisk info.
>
> xloadflags bit0 will be set if relocatable with 64bit.
>
> Let get_ramdisk_image/size use ext_ramdisk_image/size to get the
> right position for the ramdisk.
>
> The bootloader will fill in ext_ramdisk_image/size when it loads the
> ramdisk above 4G.
>
> Also, the bootloader will check if xloadflags bit0 is set to decide if
> it could load the ramdisk above 4G.
>
> Update header version to 2.12.
>
> -v2: add ext_cmd_line_ptr for above 4G support.
> -v3: update to xloadflags from HPA.
> -v4: use fields from bootparam instead of setup_header according to HPA.

-v5 attached...


ext_ramdisk_image_v5.patch
Description: Binary data


Re: [PATCH v3 11/12] x86, boot: add fields to support load bzImage and ramdisk high

2012-11-24 Thread Yinghai Lu
On Sat, Nov 24, 2012 at 4:04 PM, H. Peter Anvin  wrote:
>
> It sounds like we are leaning toward some form of the sentinel hack, which
> means we need an enumerated list of things that should *not* be zeroed if
> the sentinel is present.
>
> The option of declaring the list frozen makes me a bit nervous, because it
> isn't clear that we don't already have fields that will be misinterpreted by
> the kernel if filled in from the file.

USE_EXT_BOOT_PARAMS bit in xloadflags should work.


ext_ramdisk_image.patch
Description: Binary data


Re: memory-cgroup bug

2012-11-24 Thread azurIt
>Could you take few snapshots over time?


Here it is, now from a different server. A snapshot was taken every second
for 10 minutes (hope it's enough):
www.watchdog.sk/lkml/memcg-bug-2.tar.gz


Re: [PATCH v3 11/12] x86, boot: add fields to support load bzImage and ramdisk high

2012-11-24 Thread H. Peter Anvin

On 11/24/2012 04:04 PM, Yinghai Lu wrote:
> On Sat, Nov 24, 2012 at 3:50 PM, Eric W. Biederman wrote:
>>
>> I believe all added variables between the last version of the boot
>> protocol /sbin/kexec knows about and the current time were added in the
>> initialized data section.  Certainly we can check and that will tell us
>> how likely changes in arch/x86/boot/ have been regressions in the 32bit
>> entry point support.
>>
>> As for solving this there is a simple solution.  Add a second jump
>> right after the first jump.   The variables after the second jump can
>> all be zero initialized.
>
> could use .org to force start_of_setup to start from 0x1000
>
> but how about the area before setup_header? now it is full of EFI_STUB
> stuff there.

Yes, it doesn't really solve the problem I fear.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



Re: [PATCH v3 11/12] x86, boot: add fields to support load bzImage and ramdisk high

2012-11-24 Thread H. Peter Anvin

On 11/24/2012 03:50 PM, Eric W. Biederman wrote:


> It was conservative at the time the code was introduced and it most
> definitely is not wrong.  The code predates the verbiage in boot.txt.
> Apparently no one bothered to see what /sbin/kexec was actually doing
> when they documented the 32bit boot loader interface.  I was under the
> impression that it was actual practice that was documented but in this
> particular case something else was documented instead.  Since /sbin/kexec did
> not need any of the more recent features we simply have not noticed it
> until now.



The problem is that kexec and others didn't follow any protocol at all, 
but rather did something that happened to work... but could trivially be 
shown had no way of being forward compatible.
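
For readers following along, the loader-side procedure being argued about can be sketched roughly as below. This is a minimal userspace illustration, not the kexec-tools code: it assumes the layout given in Documentation/x86/boot.txt (setup header starting at offset 0x1f1 and ending at 0x202 plus the byte value at offset 0x201), and the function name and size constant are made up for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BOOT_PARAMS_SIZE 4096   /* one "zero page" */
#define SETUP_HDR_START  0x1f1  /* first byte of the setup header */

/*
 * Hypothetical helper: build a boot_params page from the start of a bzImage.
 * The page is zeroed first and only the setup header is copied, so every
 * field outside the header has a predictable value (zero).
 * Returns the number of header bytes copied, or 0 on a malformed image.
 */
size_t init_boot_params(uint8_t *bp, const uint8_t *image, size_t image_len)
{
	size_t hdr_end;

	assert(bp != NULL && image != NULL);
	memset(bp, 0, BOOT_PARAMS_SIZE);

	if (image_len < 0x202)
		return 0;

	/* End of the setup header = 0x202 + byte value at offset 0x201. */
	hdr_end = 0x202 + image[0x201];
	if (hdr_end > image_len || hdr_end > BOOT_PARAMS_SIZE)
		return 0;

	memcpy(bp + SETUP_HDR_START, image + SETUP_HDR_START,
	       hdr_end - SETUP_HDR_START);
	return hdr_end - SETUP_HDR_START;
}
```

A loader that instead copies the whole real-mode block leaves whatever bytes the image file happened to contain in the fields outside the header.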



>>> We could work around it with a sentinel hack... except you *also*
>>> probably modify *some* fields and now we have a horrid mix of
>>> initialized and uninitialized fields to sort out... and there really
>>> isn't any sane way for the kernel to sort that out.
>>>
>>> We have a huge problem on our hands now because of it.


>> So, given the mess we now have on our hands... any suggestions how to best
>> solve it?  There is the option of simply declaring old kexec binaries
>> broken; they will then not work reliably with newer kernels, if they even
>> work reliably now -- it is hard to know for certain.


> I believe all added variables between the last version of the boot
> protocol /sbin/kexec knows about and the current time were added in the
> initialized data section.  Certainly we can check and that will tell us
> how likely changes in arch/x86/boot/ have been regressions in the 32bit
> entry point support.
>
> As for solving this there is a simple solution.  Add a second jump
> right after the first jump.   The variables after the second jump can
> all be zero initialized.


It doesn't work for the variables *before* the initialized section, and 
that is actually where we have most problems... there really are only 
very few bytes left after the initialized section.  The reason we can't 
do anything about the area before it is because that has to have stuff 
in it, like the EFI header, to work.



> And if we really care about breaking other boot loaders we can take a
> survey and actually look and see what they do.  There really aren't that
> many x86 boot loaders.


There are more than you think... a lot of them are hiding in grotty 
corners.  However, they are minority users.


It sounds like we are leaning toward some form of the sentinel hack, 
which means we need an enumerated list of things that should *not* be 
zeroed if the sentinel is present.


The option of declaring the list frozen makes me a bit nervous, because 
it isn't clear that we don't already have fields that will be 
misinterpreted by the kernel if filled in from the file.
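
To make the sentinel idea concrete, here is a rough userspace sketch under stated assumptions: the struct is a heavily simplified, hypothetical stand-in for struct boot_params, the sentinel is a byte that is nonzero in the image file and zero whenever the loader cleared the page, and the keep list contains a single made-up range. None of these names or offsets come from the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for struct boot_params. */
struct demo_params {
	uint32_t ext_ramdisk_image; /* new field, must default to zero */
	uint32_t ext_cmd_line_ptr;  /* new field, must default to zero */
	uint8_t  sentinel;          /* nonzero in the file, zero if the page was cleared */
	uint8_t  hdr[32];           /* stands in for the setup header, always kept */
};

/* One entry of the "do not zero" list: a byte range inside the struct. */
struct keep_range {
	size_t off;
	size_t len;
};

static const struct keep_range keep_list[] = {
	{ offsetof(struct demo_params, hdr),
	  sizeof(((struct demo_params *)0)->hdr) },
};

/*
 * If the sentinel is set, the loader copied file contents over fields it
 * did not know about, so clear everything that is not on the keep list.
 * Returns 1 if the structure had to be sanitized, 0 otherwise.
 */
int sanitize_demo_params(struct demo_params *p)
{
	struct demo_params clean;
	size_t i;

	assert(p != NULL);
	if (!p->sentinel)
		return 0;

	memset(&clean, 0, sizeof(clean));
	for (i = 0; i < sizeof(keep_list) / sizeof(keep_list[0]); i++)
		memcpy((uint8_t *)&clean + keep_list[i].off,
		       (const uint8_t *)p + keep_list[i].off,
		       keep_list[i].len);
	*p = clean;
	return 1;
}
```

The hard part raised above is not this loop but agreeing on what belongs in keep_list, since an old loader may have deliberately written some of the fields that would be cleared.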


-hpa


--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



Re: [PATCH v3 11/12] x86, boot: add fields to support load bzImage and ramdisk high

2012-11-24 Thread Yinghai Lu
On Sat, Nov 24, 2012 at 3:50 PM, Eric W. Biederman
 wrote:
>
> I believe all added variables between the last version of the boot
> protocol /sbin/kexec knows about and the current time were added in the
> initialized data section.  Certainly we can check and that will tell us
> how likely changes in arch/x86/boot/ have been regressions in the 32bit
> entry point support.
>
> As for solving this there is a simple solution.  Add a second jump
> right after the first jump.   The variables after the second jump can
> all be zero initialized.

could use .org to force start_of_setup start from 0x1000

but how about area before setup_header ? now it is full of EFI_STUB stuff there.

Yinghai


Re: [PATCH v3 11/12] x86, boot: add fields to support load bzImage and ramdisk high

2012-11-24 Thread Eric W. Biederman
"H. Peter Anvin"  writes:

> On 11/24/2012 09:32 AM, H. Peter Anvin wrote:
>> On 11/24/2012 04:37 AM, Eric W. Biederman wrote:
>>>
>>> Certainly /sbin/kexec isn't bothering to calculate the end of the setup
>>> header and just being far more conservative and using all of the 16bit
>>> real mode code as its initializer.
>>>
>>
>> That's not conservative... that's just plain wrong.  It means you're
>> initializing the fields in struct boot_params with garbage instead of a
>> predictable value (zero).

It was conservative at the time the code was introduced and it most
definitely is not wrong.  The code predates the verbiage in boot.txt.
Apparently no one bothered to see what /sbin/kexec was actually doing
when they documented the 32bit boot loader interface.  I was under the
impression that it was actual practice that was documented but in this
particular case something else was documented instead.  Since /sbin/kexec did
not need any of the more recent features we simply have not noticed it
until now.

>> We could work around it with a sentinel hack... except you *also*
>> probably modify *some* fields and now we have a horrid mix of
>> initialized and uninitialized fields to sort out... and there really
>> isn't any sane way for the kernel to sort that out.
>>
>> We have a huge problem on our hands now because of it.
>>
>
> So, given the mess we now have on our hands... any suggestions how to best 
> solve
> it?  There is the option of simply declaring old kexec binaries broken; they
> will then not work reliably with newer kernels, if they even work reliably now
> -- it is hard to know for certain.

I believe all added variables between the last version of the boot
protocol /sbin/kexec knows about and the current time were added in the
initialized data section.  Certainly we can check and that will tell us
how likely changes in arch/x86/boot/ have been regressions in the 32bit
entry point support.

As for solving this there is a simple solution.  Add a second jump
right after the first jump.   The variables after the second jump can
all be zero initialized.

And if we really care about breaking other boot loaders we can take a
survey and actually look and see what they do.  There really aren't that
many x86 boot loaders.

Eric


Re: [PATCH] cpuidle: add Calxeda SOC idle support

2012-11-24 Thread Rafael J. Wysocki
On Saturday, November 24, 2012 03:21:49 PM Olof Johansson wrote:
> Rafael,
> 
> On Sat, Nov 24, 2012 at 2:03 AM, Rafael J. Wysocki  wrote:
> > On Monday, November 12, 2012 10:00:07 PM Arnd Bergmann wrote:
> >> On Wednesday 07 November 2012, Rob Herring wrote:
> >> > From: Rob Herring 
> >> >
> >> > Add support for core powergating on Calxeda platforms. Initially, this
> >> > supports ECX-1000 (highbank), but support will be added for ECX-2000
> >> > later.
> >> >
> >> > Signed-off-by: Rob Herring 
> >> > Cc: Len Brown 
> >> > Cc: "Rafael J. Wysocki" 
> >>
> >> Acked-by: Arnd Bergmann 
> >>
> >> > It's not really clear where we want ARM cpuidle drivers. We're moving
> >> > everything else out of arch/arm, and my understanding is Len doesn't want
> >> > them in drivers/idle. It seems kind of silly to me to have the framework
> >> > and drivers in 2 places. I've put this in drivers/cpuidle, but it doesn't
> >> > make any difference to me.
> >>
> >> Fine with me. I just don't want it in arch/arm because I'm guessing this
> >> will be shared with arm64, which in turn shares idle (and other) drivers
> >> with various powerpc, mips and x86 socs.
> >
> > Applied to linux-pm.git/linux-next as v3.8 material.
> 
> We already have it in arm-soc. Since there's no maintainer listed for
> the directory, it didn't seem like there were any better merge paths.

Cool, I'll drop it then.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


Re: [PATCH v3 11/12] x86, boot: add fields to support load bzImage and ramdisk high

2012-11-24 Thread Yinghai Lu
On Sat, Nov 24, 2012 at 2:32 PM, H. Peter Anvin  wrote:
> On 11/24/2012 02:18 PM, Yinghai Lu wrote:

> Well, that solves the problem for *this specific instance* but I fear
> therein lies madness in the general case.
>

use

   Bit 0 (read): CAN_BE_LOADED_ABOVE_4G
     - If 1, kernel/boot_params/cmdline/ramdisk can be above 4g,
       set by kernel.

   Bit 1 (write): USE_EXT_BOOT_PARAMS
     - If 1, set by bootloader, and kernel could check new fields
       in boot_params that are added from 2.12 safely.
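
A rough sketch of how the kernel side might consume such a bit, assuming the bit numbering proposed in this message. The macro names, the struct and the helper are hypothetical and only cover the ramdisk address; this is not the patch under review.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit assignments as proposed in this message; not a released ABI. */
#define XLF_CAN_BE_LOADED_ABOVE_4G (1u << 0) /* read: set by the kernel */
#define XLF_USE_EXT_BOOT_PARAMS    (1u << 1) /* write: set by the bootloader */

/* Hypothetical subset of the fields under discussion. */
struct demo_boot_info {
	uint16_t xloadflags;
	uint32_t ramdisk_image;     /* low 32 bits, existing field */
	uint32_t ext_ramdisk_image; /* high 32 bits, new in 2.12 */
};

/*
 * Return the 64-bit ramdisk address.  The high half is trusted only when
 * the bootloader has set USE_EXT_BOOT_PARAMS; otherwise it may hold
 * garbage left by an old loader and is ignored.
 */
uint64_t demo_ramdisk_addr(const struct demo_boot_info *bi)
{
	uint64_t addr;

	assert(bi != NULL);
	addr = bi->ramdisk_image;
	if (bi->xloadflags & XLF_USE_EXT_BOOT_PARAMS)
		addr |= (uint64_t)bi->ext_ramdisk_image << 32;
	return addr;
}
```

This only helps if old loaders never set bit 1 by accident, which is the same uninitialized-field concern raised elsewhere in the thread.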


Re: Recent kernel "mount" slow

2012-11-24 Thread Jeff Chua
On Sun, Nov 25, 2012 at 5:09 AM, Mikulas Patocka  wrote:
> So it's better to slow down mount.

I am quite proud of the linux boot time pitting against other OS. Even
with 10 partitions. Linux can boot up in just a few seconds, but now
you're saying that we need to do this semaphore check at boot up. By
doing so, it's inducing additional 4 seconds during boot up.

What about moving the locking mechanism to the "mount" program itself?
Won't that be more feasible?

As for the cases of simultaneous mounts, it's usually administrator
that's doing something bad. I would say this is not a kernel issue.

Jeff


Re: [PATCH] cpuidle: add Calxeda SOC idle support

2012-11-24 Thread Olof Johansson
Rafael,

On Sat, Nov 24, 2012 at 2:03 AM, Rafael J. Wysocki  wrote:
> On Monday, November 12, 2012 10:00:07 PM Arnd Bergmann wrote:
>> On Wednesday 07 November 2012, Rob Herring wrote:
>> > From: Rob Herring 
>> >
>> > Add support for core powergating on Calxeda platforms. Initially, this
>> > supports ECX-1000 (highbank), but support will be added for ECX-2000
>> > later.
>> >
>> > Signed-off-by: Rob Herring 
>> > Cc: Len Brown 
>> > Cc: "Rafael J. Wysocki" 
>>
>> Acked-by: Arnd Bergmann 
>>
>> > It's not really clear where we want ARM cpuidle drivers. We're moving
>> > everything else out of arch/arm, and my understanding is Len doesn't want
>> > them in drivers/idle. It seems kind of silly to me to have the framework
>> > and drivers in 2 places. I've put this in drivers/cpuidle, but it doesn't
>> > make any difference to me.
>>
>> Fine with me. I just don't want it in arch/arm because I'm guessing this will
>> be shared with arm64, which in turn shares idle (and other) drivers with
>> various powerpc, mips and x86 socs.
>
> Applied to linux-pm.git/linux-next as v3.8 material.

We already have it in arm-soc. Since there's no maintainer listed for
the directory, it didn't seem like there were any better merge paths.



-Olof


Re: [PATCH 1/2] autofs4: allow autofs to work outside the initial PID namespace

2012-11-24 Thread Eric W. Biederman
Miklos Szeredi  writes:

> On Sat, Nov 24, 2012 at 1:07 PM, Eric W. Biederman
>  wrote:
>> Ian Kent  writes:
>>
>>> On Sat, 2012-11-24 at 10:23 +0800, Ian Kent wrote:
>>>> On Fri, 2012-11-23 at 15:30 +0100, Miklos Szeredi wrote:
>
>>>> AFAICS autofs mounts mounted with MS_PRIVATE in the initial namespace do
>>>> propagate to the clone when it's created so I'm assuming subsequent
>>>> mounts would also. If these mounts are busy in some way they can't be
>>>> umounted in the clone unless "/" is marked private before attempting the
>>>> umount.

Subsequent mounts after the clone do not have a mechanism to propagate
with MS_PRIVATE, as creating a new mount namespace is essentially
an instance of mount --bind.  Those semantics are a little unintuitive,
I have to admit.

>>> This may sound stupid but if there something like, say, MS_NOPROPAGATE
>>> then the problem I see would pretty much just go away. No more need to
>>> umount existing mounts and container instances would be isolated. But, I
>>> guess, I'm not considering the possibility of cloned of processes as
>>> well  if that makes sense, ;)
>>
>> Something very weird is going on.  MS_PRIVATE should be the
>> MS_NOPROPAGATE you are looking for.  There is also MS_UNBINDABLE,
>> which is a stronger form of MS_PRIVATE and probably worth playing with.
>>
>
> MS_UNBINDABLE says:  skip this mount when copying a mount tree, such
> as when the mount namespace is cloned.
>
> If you set MS_UNBINDABLE on autofs mounts then they will simply not
> appear in a cloned namespace.  Which sounds like a good idea,  no?

Good point.  If the desire is for a mount to be managed by autofs
setting MS_UNBINDABLE seems required.
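
As a rough userspace sketch of what setting the flag looks like: mount(2) changes the propagation type of an existing mount when MS_UNBINDABLE is passed in the flags, and ignores the source, type and data arguments in that case. The wrapper name is made up, the call needs CAP_SYS_ADMIN, and whether the automount daemon or the autofs kernel code should be the one to do this is not settled in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/mount.h>

/*
 * Hypothetical wrapper: mark an existing mount point unbindable so that it
 * is skipped when a mount tree is copied.
 * Returns 0 on success, -1 on error (errno is set by mount()).
 */
int demo_make_unbindable(const char *mountpoint)
{
	assert(mountpoint != NULL);
	return mount("none", mountpoint, NULL, MS_UNBINDABLE, NULL);
}
```

From the shell the equivalent would be mount --make-unbindable on the mount point.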

Eric



Re: [PATCH v3 1/4] ACPI: Support system notify handler via .sys_notify

2012-11-24 Thread Rafael J. Wysocki
On Saturday, November 24, 2012 11:01:56 PM Rafael J. Wysocki wrote:
> On Thursday, November 08, 2012 01:23:44 PM Toshi Kani wrote:
> > Added a new .sys_notify interface, which allows ACPI drivers to
> > register their system-level (ex. hotplug) notify handlers through
> > their acpi_driver table.  This removes redundant ACPI namespace
> > walks from ACPI drivers for faster booting.
> > 
> > The global notify handler acpi_bus_notify() is called for all
> > system-level ACPI notifications, which then calls an appropriate
> > driver's handler if any.  ACPI drivers no longer need to register
> > or unregister driver's handler to each ACPI device object.  It also
> > supports dynamic ACPI namespace with LoadTable & Unload opcode
> > without any modification in ACPI drivers.
> > 
> > Added a common system notify handler acpi_bus_sys_notify(), which
> > allows ACPI drivers to set it to .sys_notify when this function is
> > fully implemented.
> 
> I don't really understand this.
> 
> > It removes functional conflict between driver's
> > notify handler and the global notify handler acpi_bus_notify().
> > 
> > Note that the changes maintain backward compatibility for ACPI
> > drivers.  Any drivers registered their hotplug handler through the
> > existing interfaces, such as acpi_install_notify_handler() and
> > register_acpi_bus_notifier(), will continue to work as before.
> 
> I really wouldn't like to add new callbacks to struct acpi_device_ops, because
> I'd like that whole thing to go away entirely eventually, along with struct
> acpi_driver.
> 
> Moreover, in this particular case, it really is not useful to have to define
> a struct acpi_driver so that one can register for receiving system
> notifications from ACPI.  It would be really nice if non-ACPI drivers, such
> as PCI or platform, could do that too.

Which they do by using acpi_install_notify_handler() directly.

> Besides, acpi_os_execute_deferred() is always run on CPU0, because of some
> SMI-related peculiarity, which is not very efficient as far as the events
> handling is concerned, but we can improve the situation a bit by queuing the
> execution of the registered handlers in a different workqueue.  Maybe it's
> worth considering if we're going to change this code anyway?

Well, perhaps we really don't need to change it after all?  Maybe we can just
switch everyone to using acpi_install_notify_handler() and then we can just
drop that code entirely?

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


Re: [PATCH v3 11/12] x86, boot: add fields to support load bzImage and ramdisk high

2012-11-24 Thread H. Peter Anvin

On 11/24/2012 02:18 PM, Yinghai Lu wrote:


>> Careful... consider the people who use a kexec-based solution as
>> bootloaders.
>
> yes, those may not update kexec in the flash...
>
> then, may need to use another bit in xloadflags to tell new kernel if
> need to check ext_...
>
> Field name:     xloadflags
> Type:           modify (obligatory)
> Offset/size:    0x236/2
> Protocol:       2.12+
>
>   This field is a bitmask.
>
>   Bit 0 (read): CAN_BE_LOADED_ABOVE_4G
>     - If 1, kernel/boot_params/cmdline/ramdisk can be above 4g,
>       set by kernel.
>
>   Bit 1 (write): LOADED_ABOVE_4G
>     - If 1, kernel/boot_params/cmdline/ramdisk is loaded above 4g,
>       set by bootloader, and kernel will check ext_ramdisk_image,
>       ext_ramdisk_size and ext_cmd_line_ptr.



Well, that solves the problem for *this specific instance* but I fear 
therein lies madness in the general case.


-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



Re: [PATCH v3 11/12] x86, boot: add fields to support load bzImage and ramdisk high

2012-11-24 Thread Yinghai Lu
On Sat, Nov 24, 2012 at 1:38 PM, H. Peter Anvin  wrote:
> On 11/24/2012 01:30 PM, Yinghai Lu wrote:
>>>
>>>
>>> So, given the mess we now have on our hands... any suggestions how to
>>> best
>>> solve it?  There is the option of simply declaring old kexec binaries
>>> broken; they will then not work reliably with newer kernels, if they even
>>> work reliably now -- it is hard to know for certain.
>>
>>
>> yes, if the user updates kernel to be kexeced, then would be
>> reasonable to ask them to
>> update kexec-tools.
>>
>
> Careful... consider the people who use a kexec-based solution as
> bootloaders.

yes, those may not update kexec in the flash...

then, may need to use another bit in xloadflags to tell new kernel if
need to check ext_...

Field name:     xloadflags
Type:           modify (obligatory)
Offset/size:    0x236/2
Protocol:       2.12+

  This field is a bitmask.

  Bit 0 (read): CAN_BE_LOADED_ABOVE_4G
    - If 1, kernel/boot_params/cmdline/ramdisk can be above 4g,
      set by kernel.

  Bit 1 (write): LOADED_ABOVE_4G
    - If 1, kernel/boot_params/cmdline/ramdisk is loaded above 4g,
      set by bootloader, and kernel will check ext_ramdisk_image,
      ext_ramdisk_size and ext_cmd_line_ptr.


Re: [PATCH v3 1/4] ACPI: Support system notify handler via .sys_notify

2012-11-24 Thread Rafael J. Wysocki
On Saturday, November 24, 2012 11:01:56 PM Rafael J. Wysocki wrote:
> On Thursday, November 08, 2012 01:23:44 PM Toshi Kani wrote:
> > Added a new .sys_notify interface, which allows ACPI drivers to
> > register their system-level (ex. hotplug) notify handlers through
> > their acpi_driver table.  This removes redundant ACPI namespace
> > walks from ACPI drivers for faster booting.
> > 
> > The global notify handler acpi_bus_notify() is called for all
> > system-level ACPI notifications, which then calls an appropriate
> > driver's handler if any.  ACPI drivers no longer need to register
> > or unregister driver's handler to each ACPI device object.  It also
> > supports dynamic ACPI namespace with LoadTable & Unload opcode
> > without any modification in ACPI drivers.
> > 
> > Added a common system notify handler acpi_bus_sys_notify(), which
> > allows ACPI drivers to set it to .sys_notify when this function is
> > fully implemented.
> 
> I don't really understand this.
> 
> > It removes functional conflict between driver's
> > notify handler and the global notify handler acpi_bus_notify().
> > 
> > Note that the changes maintain backward compatibility for ACPI
> > drivers.  Any drivers registered their hotplug handler through the
> > existing interfaces, such as acpi_install_notify_handler() and
> > register_acpi_bus_notifier(), will continue to work as before.
> 
> I really wouldn't like to add new callbacks to struct acpi_device_ops, because
> I'd like that whole thing to go away entirely eventually, along with struct
> acpi_driver.
> 
> Moreover, in this particular case, it really is not useful to have to define
> a struct acpi_driver so that one can register for receiving system
> notifications from ACPI.  It would be really nice if non-ACPI drivers, such
> as PCI or platform, could do that too.
> 
> Besides, acpi_os_execute_deferred() is always run on CPU0, because of some
> SMI-related peculiarity, which is not very efficient as far as the events
> handling is concerned, but we can improve the situation a bit by queuing the
> execution of the registered handlers in a different workqueue.  Maybe it's
> worth considering if we're going to change this code anyway?
> 
> > Signed-off-by: Toshi Kani 
> > ---
> >  drivers/acpi/bus.c  | 64 --
> >  drivers/acpi/scan.c | 83 +
> >  include/acpi/acpi_bus.h |  6 
> >  3 files changed, 137 insertions(+), 16 deletions(-)
> > 
> > diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
> > index 07a20ee..b256bcf2 100644
> > --- a/drivers/acpi/bus.c
> > +++ b/drivers/acpi/bus.c
> > @@ -779,21 +779,16 @@ void unregister_acpi_bus_notifier(struct notifier_block *nb)
> >  EXPORT_SYMBOL_GPL(unregister_acpi_bus_notifier);
> >  
> >  /**
> > - * acpi_bus_notify
> > - * ---
> > - * Callback for all 'system-level' device notifications (values 0x00-0x7F).
> > + * acpi_bus_sys_notify: Common system notify handler
> > + *
> > + * ACPI drivers may specify this common handler to its sys_notify entry.
> > + * TBD: This handler is not implemented yet.
> >   */
> > -static void acpi_bus_notify(acpi_handle handle, u32 type, void *data)
> > +void acpi_bus_sys_notify(acpi_handle handle, u32 type, void *data)
> 
> This isn't used anywhere.  Are drivers supposed to use it?  If so, what about
> the BUS_CHECK and DEVICE_CHECK notifications?
> 
> >  {
> > -   struct acpi_device *device = NULL;
> > -   struct acpi_driver *driver;
> > -
> > ACPI_DEBUG_PRINT((ACPI_DB_INFO, "Notification %#02x to handle %p\n",
> >   type, handle));
> >  
> > -   blocking_notifier_call_chain(_bus_notify_list,
> > -   type, (void *)handle);

By the way, there is exactly one user of this chain, which is dock.c.

What about converting that to something different and dropping the chain to
start with?

Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


Re: [PATCH v3 1/4] ACPI: Support system notify handler via .sys_notify

2012-11-24 Thread Rafael J. Wysocki
On Thursday, November 08, 2012 01:23:44 PM Toshi Kani wrote:
> Added a new .sys_notify interface, which allows ACPI drivers to
> register their system-level (ex. hotplug) notify handlers through
> their acpi_driver table.  This removes redundant ACPI namespace
> walks from ACPI drivers for faster booting.
> 
> The global notify handler acpi_bus_notify() is called for all
> system-level ACPI notifications, which then calls an appropriate
> driver's handler if any.  ACPI drivers no longer need to register
> or unregister driver's handler to each ACPI device object.  It also
> supports dynamic ACPI namespace with LoadTable & Unload opcode
> without any modification in ACPI drivers.
> 
> Added a common system notify handler acpi_bus_sys_notify(), which
> allows ACPI drivers to set it to .sys_notify when this function is
> fully implemented.

I don't really understand this.

> It removes functional conflict between driver's
> notify handler and the global notify handler acpi_bus_notify().
> 
> Note that the changes maintain backward compatibility for ACPI
> drivers.  Any drivers registered their hotplug handler through the
> existing interfaces, such as acpi_install_notify_handler() and
> register_acpi_bus_notifier(), will continue to work as before.

I really wouldn't like to add new callbacks to struct acpi_device_ops, because
I'd like that whole thing to go away entirely eventually, along with struct
acpi_driver.

Moreover, in this particular case, it really is not useful to have to define
a struct acpi_driver so that one can register for receiving system
notifications from ACPI.  It would be really nice if non-ACPI drivers, such
as PCI or platform, could do that too.

Besides, acpi_os_execute_deferred() is always run on CPU0, because of some
SMI-related peculiarity, which is not very efficient as far as the events
handling is concerned, but we can improve the situation a bit by queuing the
execution of the registered handlers in a different workqueue.  Maybe it's
worth considering if we're going to change this code anyway?

> Signed-off-by: Toshi Kani 
> ---
>  drivers/acpi/bus.c  | 64 --
>  drivers/acpi/scan.c | 83 +
>  include/acpi/acpi_bus.h |  6 
>  3 files changed, 137 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
> index 07a20ee..b256bcf2 100644
> --- a/drivers/acpi/bus.c
> +++ b/drivers/acpi/bus.c
> @@ -779,21 +779,16 @@ void unregister_acpi_bus_notifier(struct notifier_block *nb)
>  EXPORT_SYMBOL_GPL(unregister_acpi_bus_notifier);
>  
>  /**
> - * acpi_bus_notify
> - * ---
> - * Callback for all 'system-level' device notifications (values 0x00-0x7F).
> + * acpi_bus_sys_notify: Common system notify handler
> + *
> + * ACPI drivers may specify this common handler to its sys_notify entry.
> + * TBD: This handler is not implemented yet.
>   */
> -static void acpi_bus_notify(acpi_handle handle, u32 type, void *data)
> +void acpi_bus_sys_notify(acpi_handle handle, u32 type, void *data)

This isn't used anywhere.  Are drivers supposed to use it?  If so, what about
the BUS_CHECK and DEVICE_CHECK notifications?

>  {
> - struct acpi_device *device = NULL;
> - struct acpi_driver *driver;
> -
>   ACPI_DEBUG_PRINT((ACPI_DB_INFO, "Notification %#02x to handle %p\n",
> type, handle));
>  
> - blocking_notifier_call_chain(_bus_notify_list,
> - type, (void *)handle);
> -
>   switch (type) {
>  
>   case ACPI_NOTIFY_BUS_CHECK:
> @@ -842,14 +837,51 @@ static void acpi_bus_notify(acpi_handle handle, u32 type, void *data)
> type));
>   break;
>   }
> +}
> +
> +/**
> + * acpi_bus_drv_notify: Call driver's system-level notify handler
> + */
> +void acpi_bus_drv_notify(struct acpi_driver *driver,
> + struct acpi_device *device, acpi_handle handle,
> + u32 type, void *data)
> +{
> + BUG_ON(!driver);

Rule: Don't crash the kernel if you don't have to.  Try to recover instead.

It seems that

	if (WARN_ON(!driver))
		return;

would be sufficient in this particular case, wouldn't it?
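
To illustrate the difference in behaviour outside the kernel, here is a small userspace model. DEMO_WARN_ON is a made-up stand-in for WARN_ON() (it reports the condition and evaluates to it), assert() plays the role of BUG_ON(), and the driver struct is a hypothetical reduction of struct acpi_driver to a single callback.

```c
#include <assert.h>
#include <stdio.h>

/* Userspace stand-in for WARN_ON(): report, then evaluate to the condition. */
#define DEMO_WARN_ON(cond) \
	((cond) ? (fprintf(stderr, "WARNING: %s at %s:%d\n", \
			   #cond, __FILE__, __LINE__), 1) : 0)

struct demo_driver {
	void (*sys_notify)(unsigned int type);
};

unsigned int demo_last_type;

void demo_handler(unsigned int type)
{
	demo_last_type = type;
}

/* WARN_ON() style: complain, drop the event, keep running. */
int demo_drv_notify(const struct demo_driver *driver, unsigned int type)
{
	if (DEMO_WARN_ON(!driver))
		return -1;

	if (driver->sys_notify)
		driver->sys_notify(type);
	return 0;
}

/* BUG_ON() style: a NULL driver takes the whole program down. */
void demo_drv_notify_strict(const struct demo_driver *driver, unsigned int type)
{
	assert(driver != NULL);
	if (driver->sys_notify)
		driver->sys_notify(type);
}
```

With the first variant a missing driver costs one lost notification and a log line; with the second it costs the machine.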

> +
> + if (driver->ops.sys_notify)
> + driver->ops.sys_notify(handle, type, data);
> + else if (device && driver->ops.notify &&

Why "else if"?  The existing code does this unconditionally.  Is that incorrect?

> +  (driver->flags & ACPI_DRIVER_ALL_NOTIFY_EVENTS))
> + driver->ops.notify(device, type);
> +
> + return;
> +}
> +
> +/**
> + * acpi_bus_notify: The system-level global notify handler
> + *
> + * The global notify handler for all 'system-level' device notifications
> + * (values 0x00-0x7F).  This handler calls a driver's notify handler for
> + * the notified ACPI device.
> + */
> +static void acpi_bus_notify(acpi_handle handle, u32 type, void *data)
> +{
> + struct acpi_device *device = 

Re: [PATCH v3 11/12] x86, boot: add fields to support load bzImage and ramdisk high

2012-11-24 Thread H. Peter Anvin

On 11/24/2012 01:30 PM, Yinghai Lu wrote:


>> So, given the mess we now have on our hands... any suggestions how to best
>> solve it?  There is the option of simply declaring old kexec binaries
>> broken; they will then not work reliably with newer kernels, if they even
>> work reliably now -- it is hard to know for certain.
>
> yes, if the user updates kernel to be kexeced, then would be
> reasonable to ask them to
> update kexec-tools.



Careful... consider the people who use a kexec-based solution as 
bootloaders.


-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



Re: [PATCH v3 11/12] x86, boot: add fields to support load bzImage and ramdisk high

2012-11-24 Thread Yinghai Lu
On Sat, Nov 24, 2012 at 11:50 AM, H. Peter Anvin  wrote:
> On 11/24/2012 09:32 AM, H. Peter Anvin wrote:
>>
>> On 11/24/2012 04:37 AM, Eric W. Biederman wrote:
>>>
>>>
>>> Certainly /sbin/kexec isn't bothering to calculate the end of the setup
>>> header and just being far more conservative and using all of the 16bit
>>> real mode code as its initializer.
>>>
>>
>> That's not conservative... that's just plain wrong.  It means you're
>> initializing the fields in struct boot_params with garbage instead of a
>> predictable value (zero).
>>
>> We could work around it with a sentinel hack... except you *also*
>> probably modify *some* fields and now we have a horrid mix of
>> initialized and uninitialized fields to sort out... and there really
>> isn't any sane way for the kernel to sort that out.
>>
>> We have a huge problem on our hands now because of it.
>>
>
> So, given the mess we now have on our hands... any suggestions how to best
> solve it?  There is the option of simply declaring old kexec binaries
> broken; they will then not work reliably with newer kernels, if they even
> work reliably now -- it is hard to know for certain.

yes, if the user updates the kernel to be kexeced, then it would be
reasonable to ask them to update kexec-tools.


[PATCH] mm, percpu: Make sure percpu_alloc early parameter has an argument

2012-11-24 Thread Cyrill Gorcunov
Otherwise we get a NULL pointer dereference:

 | [0.00] BUG: unable to handle kernel NULL pointer dereference at (null)
 | [0.00] IP: [] strcmp+0x10/0x30

Signed-off-by: Cyrill Gorcunov 
---
 mm/percpu.c |3 +++
 1 file changed, 3 insertions(+)
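
Note for readers of the archive: an early parameter given without "=value" is handed to its setup function as a NULL string, which is why the first strcmp() faults. A minimal user-space sketch of the guard follows. It is not the kernel function: percpu_alloc_setup_sketch(), the enum and the getter are hypothetical names, and rejecting unknown values with -EINVAL is a choice made for this sketch only.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the kernel's enum pcpu_fc; the values are illustrative. */
enum pcpu_fc_sketch { FC_AUTO, FC_EMBED, FC_PAGE };

static enum pcpu_fc_sketch chosen_fc = FC_AUTO;

/*
 * Model of an early-parameter handler. "str" is NULL when the option
 * is passed without "=value", so it must be checked before strcmp().
 */
int percpu_alloc_setup_sketch(const char *str)
{
    if (!str)
        return -EINVAL;

    if (strcmp(str, "embed") == 0)
        chosen_fc = FC_EMBED;
    else if (strcmp(str, "page") == 0)
        chosen_fc = FC_PAGE;
    else
        return -EINVAL; /* sketch-only policy for unknown values */

    return 0;
}

/* Accessor so callers can observe which allocator was selected. */
enum pcpu_fc_sketch percpu_chosen_fc_sketch(void)
{
    return chosen_fc;
}
```

Calling percpu_alloc_setup_sketch(NULL) returns -EINVAL instead of crashing, which is the behaviour the three added lines in the patch aim for.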

Index: linux-2.6.git/mm/percpu.c
===
--- linux-2.6.git.orig/mm/percpu.c
+++ linux-2.6.git/mm/percpu.c
@@ -1380,6 +1380,9 @@ enum pcpu_fc pcpu_chosen_fc __initdata =
 
 static int __init percpu_alloc_setup(char *str)
 {
+   if (!str)
+   return -EINVAL;
+
if (0)
/* nada */;
 #ifdef CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK


Re: [PATCH 1/2] autofs4: allow autofs to work outside the initial PID namespace

2012-11-24 Thread Miklos Szeredi
On Sat, Nov 24, 2012 at 1:07 PM, Eric W. Biederman
 wrote:
> Ian Kent  writes:
>
>> On Sat, 2012-11-24 at 10:23 +0800, Ian Kent wrote:
>>> On Fri, 2012-11-23 at 15:30 +0100, Miklos Szeredi wrote:

>>> AFAICS autofs mounts mounted with MS_PRIVATE in the initial namespace do
>>> propagate to the clone when it's created so I'm assuming subsequent
>>> mounts would also. If these mounts are busy in some way they can't be
>>> umounted in the clone unless "/" is marked private before attempting the
>>> umount.
>>
>> This may sound stupid but if there something like, say, MS_NOPROPAGATE
>> then the problem I see would pretty much just go away. No more need to
>> umount existing mounts and container instances would be isolated. But, I
>> guess, I'm not considering the possibility of cloned of processes as
>> well  if that makes sense, ;)
>
> Something is very weird is going on.  MS_PRIVATE should be the
> MS_NOPROPOGATE you are looking for.  There is also MS_UNBINDABLE.
> which is a stronger form of MS_PRIVATE and probably worth play with.
>

MS_UNBINDABLE says:  skip this mount when copying a mount tree, such
as when the mount namespace is cloned.

If you set MS_UNBINDABLE on autofs mounts then they will simply not
appear in a cloned namespace.  Which sounds like a good idea,  no?

Thanks,
Miklos


Re: Recent kernel "mount" slow

2012-11-24 Thread Mikulas Patocka


On Sat, 24 Nov 2012, Jeff Chua wrote:

> On Fri, Nov 23, 2012 at 9:24 PM, Jens Axboe  wrote:
> > On 2012-11-22 20:21, Linus Torvalds wrote:
> >> Doesn't sound like a fsdevel issue since it seems to be independent of
> >> filesystems. More like some generic block layer thing. Adding Jens
> >> (and quoting the whole thing)
> >>
> >> Jens, any ideas? Most of your stuff came in after -rc2, which would
> >> fit with the fact that most of the slowdown seems to be after -rc2
> >> according to Jeff.
> >
> > No ideas. Looking at what went in from my side, only the rq plug sorting
> > is a core change, and that should not cause any change in behaviour for
> > a single device. That's commit 975927b9.
> >
> >> Jeff, more bisecting would be good, though.
> >
> > Probably required, yes...
> 
> 
> This one slows mount from 0.012s to 0.168s.
> 
> commit 62ac665ff9fc07497ca524bd20d6a96893d11071
> Author: Mikulas Patocka 
> Date:   Wed Sep 26 07:46:43 2012 +0200
> 
> blockdev: turn a rw semaphore into a percpu rw semaphore
> 
> 
> There were couple of more changes to percpu-rw-semaphores after
> 3.7.0-rc2 and those slows mount further from 0.168s to 0.500s. I don't
> really know, but I'm suspecting these. Still bisecting.

The problem there is that you either use normal semaphores and slow down 
I/O or you use percpu-semaphores, you don't slow down I/O, but you slow 
down mount.

So it's better to slow down mount.

(if you don't use any semaphore at all, as it was in 3.6 kernel and 
before, there is a race condition that can crash the kernel if someone 
does mount and direct I/O read on the same device at the same time)

You can improve mount time if you change all occurrences of 
synchronize_sched() in include/linux/percpu-rwsem.h to 
synchronize_sched_expedited().

But some people say that synchronize_sched_expedited() is bad for real 
time latency. (can there be something like: if (realtime) 
synchronize_sched(); else synchronize_sched_expedited(); ?)

Mikulas
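
Archive note: the parenthetical suggestion above could look roughly like the sketch below. It is user-space C with stub functions standing in for the two kernel primitives; the "realtime" flag, the stub names and the call counters are assumptions made for illustration, and the sketch does not claim that the kernel offers such a switch.

```c
#include <stdbool.h>

/* Call counters so the choice made by the policy can be observed. */
static int normal_calls;
static int expedited_calls;

/* Stub standing in for synchronize_sched(). */
static void synchronize_sched_stub(void)
{
    normal_calls++;
}

/*
 * Stub standing in for synchronize_sched_expedited(), which the mail
 * above reports as faster but possibly bad for real-time latency.
 */
static void synchronize_sched_expedited_stub(void)
{
    expedited_calls++;
}

/* Hypothetical policy: choose the grace-period primitive from a flag. */
void percpu_rwsem_sync_sketch(bool realtime)
{
    if (realtime)
        synchronize_sched_stub();
    else
        synchronize_sched_expedited_stub();
}

int normal_call_count(void)
{
    return normal_calls;
}

int expedited_call_count(void)
{
    return expedited_calls;
}
```

The point of the design is that the slow path (mount) pays for the choice, while the I/O fast path stays untouched either way.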


Re: [PATCH v2] iio: adc: Add Texas Instruments ADC081C021/027 support

2012-11-24 Thread Thierry Reding
On Sat, Nov 24, 2012 at 10:54:03AM +, Jonathan Cameron wrote:
> On 11/23/2012 03:13 PM, Thierry Reding wrote:
> > Add support for reading conversion results from the ADC and provide them
> > through a single IIO channel. A proper scaling factor is also exported
> > based on the reference voltage provided by a regulator.
> >
> > Signed-off-by: Thierry Reding 
> 
> Looks good to me.  I think timing is against you (depending on what Linus
> says with his next rc).  IIO patches are routed through Greg KH. His cut
> off is 1 week before the merge window opens.  Mine tends to as a result
> be a few days before that.  Linus stated in last rc message that the one he'll
> do today or tomorrow will be the last for this cycle (and hence merge window
> will open in a week from now). Hence this will probably hit linux next after
> the merge window closes and merge in the 3.9 cycle.

Not a problem. I wasn't relying on getting this merged for 3.8. Official
support for the board that uses this driver is still some way out, so no
need to hurry in any way.

Thanks for merging and thanks for reviewing, Lars-Peter.

Thierry




[PATCH v4 09/11] x86: use io_remap to access real_mode_data

2012-11-24 Thread Yinghai Lu
When a 64-bit bootloader puts the real mode data above 4G, we cannot
access it directly yet, because arch/x86/kernel/head_64.S only sets up
identity mapping for 0-1G and for the kernel code/data/bss.

So move the early_ioremap_init() call earlier, from setup_arch()
to x86_64_start_kernel().

Also use rsi/rdi instead of esi/edi to pass the real mode data pointer
between the asm code and the C code.

Signed-off-by: Yinghai Lu 
---
 arch/x86/kernel/head64.c  |   17 ++---
 arch/x86/kernel/head_64.S |4 ++--
 arch/x86/kernel/setup.c   |2 ++
 3 files changed, 18 insertions(+), 5 deletions(-)
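
Archive note on the esi/edi to rsi/rdi change: a 32-bit register move keeps only the low 32 bits of the pointer, so an address above 4G would arrive in C code truncated. The sketch below models that in plain C; the two function names are hypothetical and simply mirror the two instructions in the head_64.S hunk.

```c
#include <stdint.h>

/* Models "movl %esi, %edi": only the low 32 bits survive. */
uint64_t pass_pointer_32(uint64_t real_mode_data)
{
    return (uint32_t)real_mode_data;
}

/* Models "movq %rsi, %rdi": the full 64-bit value survives. */
uint64_t pass_pointer_64(uint64_t real_mode_data)
{
    return real_mode_data;
}
```

With real mode data at 4G + 4K, the 32-bit variant yields 0x1000, which points at unrelated low memory, while the 64-bit variant preserves the address.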

diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 3ac6cad..735cd47 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -52,12 +52,21 @@ static void __init copy_bootdata(char *real_mode_data)
 {
char * command_line;
unsigned long cmd_line_ptr;
+   char *p;
 
-   memcpy(&boot_params, real_mode_data, sizeof boot_params);
+   /*
+* for 64bit bootload path, those data could be above 4G,
+* and we do set ident mapping for them in head_64.S.
+* So need to ioremap to access them.
+*/
+   p = early_memremap((unsigned long)real_mode_data, sizeof(boot_params));
+   memcpy(&boot_params, p, sizeof(boot_params));
+   early_iounmap(p, sizeof(boot_params));
cmd_line_ptr = get_cmd_line_ptr();
if (cmd_line_ptr) {
-   command_line = __va(cmd_line_ptr);
+   command_line = early_memremap(cmd_line_ptr, COMMAND_LINE_SIZE);
memcpy(boot_command_line, command_line, COMMAND_LINE_SIZE);
+   early_iounmap(command_line, COMMAND_LINE_SIZE);
}
 }
 
@@ -104,7 +113,9 @@ void __init x86_64_start_kernel(char * real_mode_data)
 
 void __init x86_64_start_reservations(char *real_mode_data)
 {
-   copy_bootdata(__va(real_mode_data));
+   early_ioremap_init();
+
+   copy_bootdata(real_mode_data);
 
memblock_reserve(__pa_symbol(&_text),
 __pa_symbol(&__bss_stop) - __pa_symbol(&_text));
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 036dd0e..4e90af7 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -358,9 +358,9 @@ ENTRY(secondary_startup_64)
movlinitial_gs+4(%rip),%edx
wrmsr   
 
-   /* esi is pointer to real mode structure with interesting info.
+   /* rsi is pointer to real mode structure with interesting info.
   pass it to C */
-   movl%esi, %edi
+   movq%rsi, %rdi

/* Finally jump to run C code and to be on real kernel address
 * Since we are running on identity-mapped space we have to jump
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 194e151..573fa7d7 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -718,7 +718,9 @@ void __init setup_arch(char **cmdline_p)
 
early_trap_init();
early_cpu_init();
+#ifdef CONFIG_X86_32
early_ioremap_init();
+#endif
 
setup_olpc_ofw_pgd();
 
-- 
1.7.7



[PATCH v4 06/11] x86, boot: add get_cmd_line_ptr()

2012-11-24 Thread Yinghai Lu
Add a helper for reading cmd_line_ptr; a later patch will make it
check ext_cmd_line_ptr at the same time.

Signed-off-by: Yinghai Lu 
---
 arch/x86/boot/compressed/cmdline.c |   10 --
 arch/x86/kernel/head64.c   |   13 +++--
 2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/arch/x86/boot/compressed/cmdline.c b/arch/x86/boot/compressed/cmdline.c
index 10f6b11..b4c913c 100644
--- a/arch/x86/boot/compressed/cmdline.c
+++ b/arch/x86/boot/compressed/cmdline.c
@@ -13,13 +13,19 @@ static inline char rdfs8(addr_t addr)
return *((char *)(fs + addr));
 }
 #include "../cmdline.c"
+static unsigned long get_cmd_line_ptr(void)
+{
+   unsigned long cmd_line_ptr = real_mode->hdr.cmd_line_ptr;
+
+   return cmd_line_ptr;
+}
 int cmdline_find_option(const char *option, char *buffer, int bufsize)
 {
-   return __cmdline_find_option(real_mode->hdr.cmd_line_ptr, option, buffer, bufsize);
+   return __cmdline_find_option(get_cmd_line_ptr(), option, buffer, bufsize);
 }
 int cmdline_find_option_bool(const char *option)
 {
-   return __cmdline_find_option_bool(real_mode->hdr.cmd_line_ptr, option);
+   return __cmdline_find_option_bool(get_cmd_line_ptr(), option);
 }
 
 #endif
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 00e612a..3ac6cad 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -41,13 +41,22 @@ static void __init clear_bss(void)
   (unsigned long) __bss_stop - (unsigned long) __bss_start);
 }
 
+static unsigned long get_cmd_line_ptr(void)
+{
+   unsigned long cmd_line_ptr = boot_params.hdr.cmd_line_ptr;
+
+   return cmd_line_ptr;
+}
+
 static void __init copy_bootdata(char *real_mode_data)
 {
char * command_line;
+   unsigned long cmd_line_ptr;
 
memcpy(&boot_params, real_mode_data, sizeof boot_params);
-   if (boot_params.hdr.cmd_line_ptr) {
-   command_line = __va(boot_params.hdr.cmd_line_ptr);
+   cmd_line_ptr = get_cmd_line_ptr();
+   if (cmd_line_ptr) {
+   command_line = __va(cmd_line_ptr);
memcpy(boot_command_line, command_line, COMMAND_LINE_SIZE);
}
 }
-- 
1.7.7



[PATCH v4 05/11] x86: add get_ramdisk_image/size()

2012-11-24 Thread Yinghai Lu
There are several places that look up ramdisk information early, for
reserving and relocating.

Use helper functions to make the code more readable and consistent.

A later patch will add ext_ramdisk_image/size to those functions to
support loading the ramdisk above 4G.

Signed-off-by: Yinghai Lu 
---
 arch/x86/kernel/setup.c |   29 +
 1 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index ee6d267..194e151 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -298,12 +298,25 @@ static void __init reserve_brk(void)
 
 #ifdef CONFIG_BLK_DEV_INITRD
 
+static u64 __init get_ramdisk_image(void)
+{
+   u64 ramdisk_image = boot_params.hdr.ramdisk_image;
+
+   return ramdisk_image;
+}
+static u64 __init get_ramdisk_size(void)
+{
+   u64 ramdisk_size = boot_params.hdr.ramdisk_size;
+
+   return ramdisk_size;
+}
+
 #define MAX_MAP_CHUNK  (NR_FIX_BTMAPS << PAGE_SHIFT)
 static void __init relocate_initrd(void)
 {
/* Assume only end is not page aligned */
-   u64 ramdisk_image = boot_params.hdr.ramdisk_image;
-   u64 ramdisk_size  = boot_params.hdr.ramdisk_size;
+   u64 ramdisk_image = get_ramdisk_image();
+   u64 ramdisk_size  = get_ramdisk_size();
u64 area_size = PAGE_ALIGN(ramdisk_size);
u64 ramdisk_here;
unsigned long slop, clen, mapaddr;
@@ -342,8 +355,8 @@ static void __init relocate_initrd(void)
ramdisk_size  -= clen;
}
 
-   ramdisk_image = boot_params.hdr.ramdisk_image;
-   ramdisk_size  = boot_params.hdr.ramdisk_size;
+   ramdisk_image = get_ramdisk_image();
+   ramdisk_size  = get_ramdisk_size();
printk(KERN_INFO "Move RAMDISK from [mem %#010llx-%#010llx] to"
" [mem %#010llx-%#010llx]\n",
ramdisk_image, ramdisk_image + ramdisk_size - 1,
@@ -367,8 +380,8 @@ static u64 __init get_mem_size(unsigned long limit_pfn)
 static void __init early_reserve_initrd(void)
 {
/* Assume only end is not page aligned */
-   u64 ramdisk_image = boot_params.hdr.ramdisk_image;
-   u64 ramdisk_size  = boot_params.hdr.ramdisk_size;
+   u64 ramdisk_image = get_ramdisk_image();
+   u64 ramdisk_size  = get_ramdisk_size();
u64 ramdisk_end   = PAGE_ALIGN(ramdisk_image + ramdisk_size);
 
if (!boot_params.hdr.type_of_loader ||
@@ -380,8 +393,8 @@ static void __init early_reserve_initrd(void)
 static void __init reserve_initrd(void)
 {
/* Assume only end is not page aligned */
-   u64 ramdisk_image = boot_params.hdr.ramdisk_image;
-   u64 ramdisk_size  = boot_params.hdr.ramdisk_size;
+   u64 ramdisk_image = get_ramdisk_image();
+   u64 ramdisk_size  = get_ramdisk_size();
u64 ramdisk_end   = PAGE_ALIGN(ramdisk_image + ramdisk_size);
u64 mapped_size;
 
-- 
1.7.7



[PATCH v4 11/11] x86: remove 1024G limitation for kexec buffer on 64bit

2012-11-24 Thread Yinghai Lu
The 64-bit kernel now supports more than 1T of RAM, and kexec-tools
can find a buffer above 1T, so remove that obsolete limitation
and use MAXMEM instead.

Tested on a system with more than 1024G of RAM.

Signed-off-by: Yinghai Lu 
Cc: "Eric W. Biederman" 
---
 arch/x86/include/asm/kexec.h |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 317ff17..11bfdc5 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -48,11 +48,11 @@
 # define vmcore_elf_check_arch_cross(x) ((x)->e_machine == EM_X86_64)
 #else
 /* Maximum physical address we can use pages from */
-# define KEXEC_SOURCE_MEMORY_LIMIT  (0xFFUL)
+# define KEXEC_SOURCE_MEMORY_LIMIT  (MAXMEM-1)
 /* Maximum address we can reach in physical address mode */
-# define KEXEC_DESTINATION_MEMORY_LIMIT (0xFFUL)
+# define KEXEC_DESTINATION_MEMORY_LIMIT (MAXMEM-1)
 /* Maximum address we can use for the control pages */
-# define KEXEC_CONTROL_MEMORY_LIMIT (0xFFUL)
+# define KEXEC_CONTROL_MEMORY_LIMIT (MAXMEM-1)
 
 /* Allocate one page for the pdp and the second for the code */
 # define KEXEC_CONTROL_PAGE_SIZE  (4096UL + 4096UL)
-- 
1.7.7



[PATCH v4 08/11] x86, boot: update cmd_line_ptr to unsigned long

2012-11-24 Thread Yinghai Lu
boot/compressed/misc.c can be built as 64-bit code, and cmd_line_ptr
could be above 4G.

So change the type to unsigned long, which is 64 bits wide on the 64-bit
path and 32 bits wide on the 32-bit path.

Signed-off-by: Yinghai Lu 
---
 arch/x86/boot/boot.h|8 
 arch/x86/boot/cmdline.c |4 ++--
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/boot/boot.h b/arch/x86/boot/boot.h
index 7fadf80..5b75319 100644
--- a/arch/x86/boot/boot.h
+++ b/arch/x86/boot/boot.h
@@ -285,11 +285,11 @@ struct biosregs {
 void intcall(u8 int_no, const struct biosregs *ireg, struct biosregs *oreg);
 
 /* cmdline.c */
-int __cmdline_find_option(u32 cmdline_ptr, const char *option, char *buffer, int bufsize);
-int __cmdline_find_option_bool(u32 cmdline_ptr, const char *option);
+int __cmdline_find_option(unsigned long cmdline_ptr, const char *option, char *buffer, int bufsize);
+int __cmdline_find_option_bool(unsigned long cmdline_ptr, const char *option);
 static inline int cmdline_find_option(const char *option, char *buffer, int bufsize)
 {
-   u32 cmd_line_ptr = boot_params.hdr.cmd_line_ptr;
+   unsigned long cmd_line_ptr = boot_params.hdr.cmd_line_ptr;
 
if (cmd_line_ptr >= 0x100000)
return -1;  /* inaccessible */
@@ -299,7 +299,7 @@ static inline int cmdline_find_option(const char *option, char *buffer, int bufs
 
 static inline int cmdline_find_option_bool(const char *option)
 {
-   u32 cmd_line_ptr = boot_params.hdr.cmd_line_ptr;
+   unsigned long cmd_line_ptr = boot_params.hdr.cmd_line_ptr;
 
if (cmd_line_ptr >= 0x100000)
return -1;  /* inaccessible */
diff --git a/arch/x86/boot/cmdline.c b/arch/x86/boot/cmdline.c
index 768f00f..625d21b 100644
--- a/arch/x86/boot/cmdline.c
+++ b/arch/x86/boot/cmdline.c
@@ -27,7 +27,7 @@ static inline int myisspace(u8 c)
  * Returns the length of the argument (regardless of if it was
  * truncated to fit in the buffer), or -1 on not found.
  */
-int __cmdline_find_option(u32 cmdline_ptr, const char *option, char *buffer, int bufsize)
+int __cmdline_find_option(unsigned long cmdline_ptr, const char *option, char *buffer, int bufsize)
 {
addr_t cptr;
char c;
@@ -99,7 +99,7 @@ int __cmdline_find_option(u32 cmdline_ptr, const char *option, char *buffer, int
  * Returns the position of that option (starts counting with 1)
  * or 0 on not found
  */
-int __cmdline_find_option_bool(u32 cmdline_ptr, const char *option)
+int __cmdline_find_option_bool(unsigned long cmdline_ptr, const char *option)
 {
addr_t cptr;
char c;
-- 
1.7.7



[PATCH v4 01/11] x86, boot: move verify_cpu.S after 0x200

2012-11-24 Thread Yinghai Lu
We are short of space before 0x200, which is the entry point for startup_64.

According to hpa, we cannot move startup_64 to another offset, because
that offset has become ABI.

Instead, move the verify_cpu function down. That avoids the extra
jumps back and forth that moving other lines would require.

Signed-off-by: Yinghai Lu 
Cc: Matt Fleming 
---
 arch/x86/boot/compressed/head_64.S |5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 2c4b171..2c3cee4 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -182,8 +182,6 @@ no_longmode:
hlt
jmp 1b
 
-#include "../../kernel/verify_cpu.S"
-
/*
 * Be careful here startup_64 needs to be at a predictable
 * address so I can export it in an ELF header.  Bootloaders
@@ -349,6 +347,9 @@ relocated:
  */
jmp *%rbp
 
+   .code32
+#include "../../kernel/verify_cpu.S"
+
.data
 gdt:
.word   gdt_end - gdt
-- 
1.7.7



[PATCH v4 04/11] x86: Merge early_reserve_initrd for 32bit and 64bit

2012-11-24 Thread Yinghai Lu
They are the same, so move them out of head32/64.c into setup.c.

We are using memblock, and it handles overlapping ranges properly, so
we don't need to reserve the range early just to hold the location; we only
need to make sure it is reserved before memblock is used to find
free memory.

Signed-off-by: Yinghai Lu 
Reviewed-by: Pekka Enberg 
---
 arch/x86/kernel/head32.c |   11 ---
 arch/x86/kernel/head64.c |   11 ---
 arch/x86/kernel/setup.c  |   22 ++
 3 files changed, 18 insertions(+), 26 deletions(-)
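
Archive note: the range that early_reserve_initrd() reserves starts at ramdisk_image and ends at the page-aligned end of the image. The user-space sketch below reproduces that arithmetic. The 4096-byte page size, the struct and all "_sketch" names are assumptions for illustration; only the field names and the three "no initrd" conditions come from the patch.

```c
#include <stdint.h>

#define PAGE_SIZE_SKETCH 4096ULL

/* Round x up to the next page boundary (x itself if already aligned). */
static uint64_t page_align_sketch(uint64_t x)
{
    return (x + PAGE_SIZE_SKETCH - 1) & ~(PAGE_SIZE_SKETCH - 1);
}

struct range_sketch {
    uint64_t base;
    uint64_t size;
};

/*
 * Compute the range that would be passed to memblock_reserve().
 * Returns 0 and fills *out, or -1 when the bootloader gave no initrd.
 * As in the patch, only the end is assumed to be unaligned.
 */
int initrd_reserve_range_sketch(uint8_t type_of_loader,
                                uint64_t ramdisk_image,
                                uint64_t ramdisk_size,
                                struct range_sketch *out)
{
    uint64_t ramdisk_end = page_align_sketch(ramdisk_image + ramdisk_size);

    if (!type_of_loader || !ramdisk_image || !ramdisk_size)
        return -1; /* no initrd provided by bootloader */

    out->base = ramdisk_image;
    out->size = ramdisk_end - ramdisk_image;
    return 0;
}
```

For an image at 0x1000000 of size 0x1234, the reserved size comes out as 0x2000, that is, two whole pages.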

diff --git a/arch/x86/kernel/head32.c b/arch/x86/kernel/head32.c
index c18f59d..4c52efc 100644
--- a/arch/x86/kernel/head32.c
+++ b/arch/x86/kernel/head32.c
@@ -33,17 +33,6 @@ void __init i386_start_kernel(void)
memblock_reserve(__pa_symbol(&_text),
 __pa_symbol(&__bss_stop) - __pa_symbol(&_text));
 
-#ifdef CONFIG_BLK_DEV_INITRD
-   /* Reserve INITRD */
-   if (boot_params.hdr.type_of_loader && boot_params.hdr.ramdisk_image) {
-   /* Assume only end is not page aligned */
-   u64 ramdisk_image = boot_params.hdr.ramdisk_image;
-   u64 ramdisk_size  = boot_params.hdr.ramdisk_size;
-   u64 ramdisk_end   = PAGE_ALIGN(ramdisk_image + ramdisk_size);
-   memblock_reserve(ramdisk_image, ramdisk_end - ramdisk_image);
-   }
-#endif
-
/* Call the subarch specific early setup function */
switch (boot_params.hdr.hardware_subarch) {
case X86_SUBARCH_MRST:
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 037df57..00e612a 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -100,17 +100,6 @@ void __init x86_64_start_reservations(char *real_mode_data)
memblock_reserve(__pa_symbol(&_text),
 __pa_symbol(&__bss_stop) - __pa_symbol(&_text));
 
-#ifdef CONFIG_BLK_DEV_INITRD
-   /* Reserve INITRD */
-   if (boot_params.hdr.type_of_loader && boot_params.hdr.ramdisk_image) {
-   /* Assume only end is not page aligned */
-   unsigned long ramdisk_image = boot_params.hdr.ramdisk_image;
-   unsigned long ramdisk_size  = boot_params.hdr.ramdisk_size;
-   unsigned long ramdisk_end   = PAGE_ALIGN(ramdisk_image + ramdisk_size);
-   memblock_reserve(ramdisk_image, ramdisk_end - ramdisk_image);
-   }
-#endif
-
reserve_ebda_region();
 
/*
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 6d29d1f..ee6d267 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -364,6 +364,19 @@ static u64 __init get_mem_size(unsigned long limit_pfn)
 
return mapped_pages << PAGE_SHIFT;
 }
+static void __init early_reserve_initrd(void)
+{
+   /* Assume only end is not page aligned */
+   u64 ramdisk_image = boot_params.hdr.ramdisk_image;
+   u64 ramdisk_size  = boot_params.hdr.ramdisk_size;
+   u64 ramdisk_end   = PAGE_ALIGN(ramdisk_image + ramdisk_size);
+
+   if (!boot_params.hdr.type_of_loader ||
+   !ramdisk_image || !ramdisk_size)
+   return; /* No initrd provided by bootloader */
+
+   memblock_reserve(ramdisk_image, ramdisk_end - ramdisk_image);
+}
 static void __init reserve_initrd(void)
 {
/* Assume only end is not page aligned */
@@ -390,10 +403,6 @@ static void __init reserve_initrd(void)
if (pfn_range_is_mapped(PFN_DOWN(ramdisk_image),
PFN_DOWN(ramdisk_end))) {
/* All are mapped, easy case */
-   /*
-* don't need to reserve again, already reserved early
-* in i386_start_kernel
-*/
initrd_start = ramdisk_image + PAGE_OFFSET;
initrd_end = initrd_start + ramdisk_size;
return;
@@ -404,6 +413,9 @@ static void __init reserve_initrd(void)
memblock_free(ramdisk_image, ramdisk_end - ramdisk_image);
 }
 #else
+static void __init early_reserve_initrd(void)
+{
+}
 static void __init reserve_initrd(void)
 {
 }
@@ -665,6 +677,8 @@ early_param("reservelow", parse_reservelow);
 
 void __init setup_arch(char **cmdline_p)
 {
+   early_reserve_initrd();
+
 #ifdef CONFIG_X86_32
memcpy(&boot_cpu_data, &new_cpu_data, sizeof(new_cpu_data));
visws_early_detect();
-- 
1.7.7



[PATCH v4 07/11] x86, boot: move checking of cmd_line_ptr out of common path

2012-11-24 Thread Yinghai Lu
cmdline.c::__cmdline_find_option... are shared between the
16-bit setup code and the 32/64-bit decompressor code.

For the 32/64-bit-only path via kexec, we should not check whether the
pointer is below 1M, as the command line could be placed above 1M, or
even above 4G.

Move the accessibility check out of __cmdline_find_option()
so that the decompressor in misc.c can parse the command line correctly.

Signed-off-by: Yinghai Lu 
---
 arch/x86/boot/boot.h|   14 --
 arch/x86/boot/cmdline.c |8 
 2 files changed, 16 insertions(+), 6 deletions(-)
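
Archive note: the 1M check exists because the 16-bit setup code reads the command line through a real-mode segment:offset pair, using the same split as "cptr = cmdline_ptr & 0xf; set_fs(cmdline_ptr >> 4);" in cmdline.c. The sketch below shows that split in user-space C. The struct, the function names and the constant name are hypothetical; the 0x100000 bound is taken from the "1M" in the commit message.

```c
#include <stdint.h>

#define REAL_MODE_LIMIT_SKETCH 0x100000UL /* 1M, per the commit message */

struct seg_off_sketch {
    uint16_t seg; /* value that would be loaded into %fs */
    uint16_t off; /* offset within that segment */
};

/*
 * Split a linear address the way the 16-bit code does.
 * Returns 0 on success, -1 if there is no command line (0) or the
 * address cannot be reached from real mode (>= 1M).
 */
int linear_to_seg_off_sketch(unsigned long cmdline_ptr,
                             struct seg_off_sketch *out)
{
    if (!cmdline_ptr || cmdline_ptr >= REAL_MODE_LIMIT_SKETCH)
        return -1;

    out->off = (uint16_t)(cmdline_ptr & 0xf);
    out->seg = (uint16_t)(cmdline_ptr >> 4);
    return 0;
}

/* Inverse mapping: segment * 16 + offset. */
unsigned long seg_off_to_linear_sketch(struct seg_off_sketch so)
{
    return ((unsigned long)so.seg << 4) + so.off;
}
```

A 32/64-bit caller has no such addressing limit, which is why the patch keeps only the NULL check in the shared function and leaves the range check to the 16-bit wrappers.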

diff --git a/arch/x86/boot/boot.h b/arch/x86/boot/boot.h
index 18997e5..7fadf80 100644
--- a/arch/x86/boot/boot.h
+++ b/arch/x86/boot/boot.h
@@ -289,12 +289,22 @@ int __cmdline_find_option(u32 cmdline_ptr, const char *option, char *buffer, int
 int __cmdline_find_option_bool(u32 cmdline_ptr, const char *option);
 static inline int cmdline_find_option(const char *option, char *buffer, int bufsize)
 {
-   return __cmdline_find_option(boot_params.hdr.cmd_line_ptr, option, buffer, bufsize);
+   u32 cmd_line_ptr = boot_params.hdr.cmd_line_ptr;
+
+   if (cmd_line_ptr >= 0x100000)
+   return -1;  /* inaccessible */
+
+   return __cmdline_find_option(cmd_line_ptr, option, buffer, bufsize);
 }
 
 static inline int cmdline_find_option_bool(const char *option)
 {
-   return __cmdline_find_option_bool(boot_params.hdr.cmd_line_ptr, option);
+   u32 cmd_line_ptr = boot_params.hdr.cmd_line_ptr;
+
+   if (cmd_line_ptr >= 0x100000)
+   return -1;  /* inaccessible */
+
+   return __cmdline_find_option_bool(cmd_line_ptr, option);
 }
 
 
diff --git a/arch/x86/boot/cmdline.c b/arch/x86/boot/cmdline.c
index 6b3b6f7..768f00f 100644
--- a/arch/x86/boot/cmdline.c
+++ b/arch/x86/boot/cmdline.c
@@ -41,8 +41,8 @@ int __cmdline_find_option(u32 cmdline_ptr, const char *option, char *buffer, int
st_bufcpy   /* Copying this to buffer */
} state = st_wordstart;
 
-   if (!cmdline_ptr || cmdline_ptr >= 0x100000)
-   return -1;  /* No command line, or inaccessible */
+   if (!cmdline_ptr)
+   return -1;  /* No command line */
 
cptr = cmdline_ptr & 0xf;
set_fs(cmdline_ptr >> 4);
@@ -111,8 +111,8 @@ int __cmdline_find_option_bool(u32 cmdline_ptr, const char *option)
st_wordskip,/* Miscompare, skip */
} state = st_wordstart;
 
-   if (!cmdline_ptr || cmdline_ptr >= 0x100000)
-   return -1;  /* No command line, or inaccessible */
+   if (!cmdline_ptr)
+   return -1;  /* No command line */
 
cptr = cmdline_ptr & 0xf;
set_fs(cmdline_ptr >> 4);
-- 
1.7.7



[PATCH v4 10/11] x86, boot: add fields to support load bzImage and ramdisk above 4G

2012-11-24 Thread Yinghai Lu
ext_ramdisk_image/size record the high 32 bits of the ramdisk info.

xloadflags bit 0 is set if the kernel is relocatable with 64-bit.

Make get_ramdisk_image/size use ext_ramdisk_image/size to get the
right position for the ramdisk.

The bootloader fills in ext_ramdisk_image/size when it loads the
ramdisk above 4G.

The bootloader also checks whether xloadflags bit 0 is set to decide if
it can load the ramdisk above 4G.

Update the header version to 2.12.

-v2: add ext_cmd_line_ptr for above 4G support.
-v3: update to xloadflags from HPA.
-v4: use fields from bootparam instead of setup_header, according to HPA.

Signed-off-by: Yinghai Lu 
Cc: Rob Landley 
Cc: Matt Fleming 
---
 Documentation/x86/boot.txt |   15 ++-
 Documentation/x86/zero-page.txt|3 +++
 arch/x86/boot/compressed/cmdline.c |2 ++
 arch/x86/boot/header.S |   12 ++--
 arch/x86/include/asm/bootparam.h   |8 ++--
 arch/x86/kernel/head64.c   |2 ++
 arch/x86/kernel/setup.c|4 
 7 files changed, 41 insertions(+), 5 deletions(-)
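
Archive note: the ext_* fields carry the high 32 bits, and the existing fields carry the low 32 bits, of each 64-bit value. The sketch below shows how the two halves combine and how a loader might test xloadflags bit 0. The struct is a cut-down stand-in and does not match the real struct boot_params layout; every "_sketch" name is hypothetical. The 0x020c version and the bit position are the ones stated in the patch.

```c
#include <stdint.h>

/* Only the fields this sketch needs; NOT the real boot_params layout. */
struct boot_params_sketch {
    uint16_t version;           /* header version, 0x020c for 2.12 */
    uint16_t xloadflags;
    uint32_t ramdisk_image;     /* low 32 bits */
    uint32_t ramdisk_size;      /* low 32 bits */
    uint32_t cmd_line_ptr;      /* low 32 bits */
    uint32_t ext_ramdisk_image; /* high 32 bits */
    uint32_t ext_ramdisk_size;  /* high 32 bits */
    uint32_t ext_cmd_line_ptr;  /* high 32 bits */
};

#define XLF_ABOVE_4G_SKETCH 0x1 /* xloadflags bit 0 */

/* Combine a low and a high 32-bit half into one 64-bit value. */
static uint64_t join32_sketch(uint32_t low, uint32_t high)
{
    return ((uint64_t)high << 32) | low;
}

uint64_t get_ramdisk_image_sketch(const struct boot_params_sketch *bp)
{
    return join32_sketch(bp->ramdisk_image, bp->ext_ramdisk_image);
}

uint64_t get_ramdisk_size_sketch(const struct boot_params_sketch *bp)
{
    return join32_sketch(bp->ramdisk_size, bp->ext_ramdisk_size);
}

uint64_t get_cmd_line_ptr_sketch(const struct boot_params_sketch *bp)
{
    return join32_sketch(bp->cmd_line_ptr, bp->ext_cmd_line_ptr);
}

/* Loader-side check: protocol 2.12+ and bit 0 of xloadflags set. */
int may_load_above_4g_sketch(const struct boot_params_sketch *bp)
{
    return bp->version >= 0x020c &&
           (bp->xloadflags & XLF_ABOVE_4G_SKETCH) != 0;
}
```

The cast to a 64-bit type before the shift matters: shifting a 32-bit value left by 32 would be undefined in C, which is why the patch also casts with (u64) before "<< 32".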

diff --git a/Documentation/x86/boot.txt b/Documentation/x86/boot.txt
index 9efceff..b2d95ae 100644
--- a/Documentation/x86/boot.txt
+++ b/Documentation/x86/boot.txt
@@ -57,6 +57,9 @@ Protocol 2.10:(Kernel 2.6.31) Added a protocol for relaxed alignment
 Protocol 2.11: (Kernel 3.6) Added a field for offset of EFI handover
protocol entry point.
 
+Protocol 2.12: (Kernel 3.9) Added three fields for loading bzImage and
+ramdisk above 4G with 64bit in bootparam.
+
  MEMORY LAYOUT
 
 The traditional memory map for the kernel loader, used for Image or
@@ -182,7 +185,7 @@ Offset  Proto   NameMeaning
 0230/4 2.05+   kernel_alignment Physical addr alignment required for kernel
 0234/1 2.05+   relocatable_kernel Whether kernel is relocatable or not
 0235/1 2.10+   min_alignment   Minimum alignment, as a power of two
-0236/2 N/A pad3Unused
+0236/2 2.12+   xloadflags  Boot protocol option flags
 0238/4 2.06+   cmdline_sizeMaximum size of the kernel command line
 023C/4 2.07+   hardware_subarch Hardware subarchitecture
 0240/8 2.07+   hardware_subarch_data Subarchitecture-specific data
@@ -581,6 +584,16 @@ Protocol:  2.10+
   misaligned kernel.  Therefore, a loader should typically try each
   power-of-two alignment from kernel_alignment down to this alignment.
 
+Field name: xloadflags
+Type:   modify (obligatory)
+Offset/size:0x236/2
+Protocol:   2.12+
+
+  This field is a bitmask.
+
+  Bit 0 (read): LOADED_ABOVE_4G
+- If 1, kernel/boot_params/cmdline/ramdisk could be above 4g
+
 Field name:cmdline_size
 Type:  read
 Offset/size:   0x238/4
diff --git a/Documentation/x86/zero-page.txt b/Documentation/x86/zero-page.txt
index cf5437d..0e19657 100644
--- a/Documentation/x86/zero-page.txt
+++ b/Documentation/x86/zero-page.txt
@@ -19,6 +19,9 @@ OffsetProto   NameMeaning
 090/010ALL hd1_infohd1 disk parameter, OBSOLETE!!
 0A0/010ALL sys_desc_table  System description table (struct sys_desc_table)
 0B0/010ALL olpc_ofw_header OLPC's OpenFirmware CIF and friends
+0C0/004 ALLext_ramdisk_image ramdisk_image high 32bits
+0C4/004 ALLext_ramdisk_size  ramdisk_size high 32bits
+0C8/004 ALLext_cmd_line_ptr  cmd_line_ptr high 32bits
 140/080ALL edid_info   Video mode setup (struct edid_info)
 1C0/020ALL efi_infoEFI 32 information (struct efi_info)
 1E0/004ALL alk_mem_k   Alternative mem check, in KB
diff --git a/arch/x86/boot/compressed/cmdline.c b/arch/x86/boot/compressed/cmdline.c
index b4c913c..bffd73b 100644
--- a/arch/x86/boot/compressed/cmdline.c
+++ b/arch/x86/boot/compressed/cmdline.c
@@ -17,6 +17,8 @@ static unsigned long get_cmd_line_ptr(void)
 {
unsigned long cmd_line_ptr = real_mode->hdr.cmd_line_ptr;
 
+   cmd_line_ptr |= (u64)real_mode->ext_cmd_line_ptr << 32;
+
return cmd_line_ptr;
 }
 int cmdline_find_option(const char *option, char *buffer, int bufsize)
diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
index 2a01744..ae5b00d 100644
--- a/arch/x86/boot/header.S
+++ b/arch/x86/boot/header.S
@@ -279,7 +279,7 @@ _start:
# Part 2 of the header, from the old setup.S
 
.ascii  "HdrS"  # header signature
-   .word   0x020b  # header version number (>= 0x0105)
+   .word   0x020c  # header version number (>= 0x0105)
# or else old loadlin-1.5 will fail)
.globl realmode_swtch
 realmode_swtch:.word   0, 0# default_switch, SETUPSEG
@@ -369,7 +369,15 @@ relocatable_kernel:.byte 1
 relocatable_kernel:.byte 0
 #endif
 min_alignment: .byte MIN_KERNEL_ALIGN_LG2  # minimum alignment
-pad3:  .word 0
+
+xloadflags:
+LOADED_ABOVE_4G

[PATCH v4 03/11] x86, 64bit: Set extra ident mapping for whole kernel range

2012-11-24 Thread Yinghai Lu
Currently, when the kernel is loaded above 1G, only [_text, _text+2M] is
set up with an extra ident page table.
That is not enough: some variables that could be used early are outside
that range, like BRK for the early page table.
We need to map [_text, _end], which includes text/data/bss/brk...

Also, the current kernel is not allowed to be loaded above 512G; it
thinks that address is too big.
We need to add one extra spare page for level3 to point to that 512G range.
We need to check the _text range, set level4 pg with that spare level3
page, and set level3 with a level2 page to cover [_text, _end] with the
extra mapping.

Finally, to handle crossing a GB boundary, we need to add another
level2 spare page. To handle crossing a 512GB boundary, we need to
add another level3 spare page for the next 512G range.

Tested with kexec-tools plus local test code that forces loading the
kernel across 1G, 5G, 512G and 513G.

We need this to put a relocatable 64bit bzImage high above 1G.
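
The boundary checks described above can be sketched in C. This is only an
illustration of the index arithmetic, not code from the patch: the shift
values are the usual x86-64 4-level paging constants, while
struct ident_need and ident_need_for() are hypothetical names introduced
here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86-64 4-level paging constants (4 KiB pages). */
#define PUD_SHIFT     30    /* one level3 entry maps 1 GiB   */
#define PGDIR_SHIFT   39    /* one level4 entry maps 512 GiB */
#define PTRS_PER_PUD  512
#define PTRS_PER_PGD  512

/* Index of the level4 (512 GiB) slot that covers an address. */
static unsigned int pgd_index_of(uint64_t addr)
{
    return (unsigned int)((addr >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1));
}

/* Index of the level3 (1 GiB) slot within its 512 GiB range. */
static unsigned int pud_index_of(uint64_t addr)
{
    return (unsigned int)((addr >> PUD_SHIFT) & (PTRS_PER_PUD - 1));
}

/* Hypothetical summary of which extra spare tables a load range needs. */
struct ident_need {
    bool crosses_512g;  /* a second spare level3 page is needed */
    bool crosses_1g;    /* a second spare level2 page is needed */
};

/* start and end are the inclusive bounds of [_text, _end - 1]. */
static struct ident_need ident_need_for(uint64_t start, uint64_t end)
{
    struct ident_need n;

    assert(start <= end);
    n.crosses_512g = pgd_index_of(start) != pgd_index_of(end);
    /* Crossing a 512 GiB boundary always crosses a 1 GiB boundary too. */
    n.crosses_1g = n.crosses_512g ||
                   pud_index_of(start) != pud_index_of(end);
    return n;
}
```

For a kernel image that straddles the 1G mark, ident_need_for() reports
crosses_1g only; an image that straddles 512G reports both.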

-v4: add crossing GB boundary handling.

Signed-off-by: Yinghai Lu 
Cc: "Eric W. Biederman" 
---
 arch/x86/kernel/head_64.S |  149 ++---
 1 files changed, 139 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 94bf9cc..036dd0e 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -78,12 +78,6 @@ startup_64:
     testl   %eax, %eax
     jnz bad_address
 
-   /* Is the address too large? */
-   leaq    _text(%rip), %rdx
-   movq    $PGDIR_SIZE, %rax
-   cmpq    %rax, %rdx
-   jae bad_address
-
     /* Fixup the physical addresses in the page table
      */
     addq    %rbp, init_level4_pgt + 0(%rip)
@@ -97,25 +91,153 @@ startup_64:
 
     addq    %rbp, level2_fixmap_pgt + (506*8)(%rip)
 
-   /* Add an Identity mapping if I am above 1G */
+   /* Add an Identity mapping if _end is above 1G */
+   leaq    _end(%rip), %r9
+   decq    %r9
+   cmp $PUD_SIZE, %r9
+   jl  ident_complete
+
+   /* get end */
+   andq    $PMD_PAGE_MASK, %r9
+   /* round start to 1G if it is below 1G */
     leaq    _text(%rip), %rdi
     andq    $PMD_PAGE_MASK, %rdi
+   cmp $PUD_SIZE, %rdi
+   jg  1f
+   movq    $PUD_SIZE, %rdi
+1:
+   /* get 512G index */
+   movq    %r9, %r8
+   shrq    $PGDIR_SHIFT, %r8
+   andq    $(PTRS_PER_PGD - 1), %r8
+   movq    %rdi, %rax
+   shrq    $PGDIR_SHIFT, %rax
+   andq    $(PTRS_PER_PGD - 1), %rax
+
+   /* cross two 512G ? */
+   cmp %r8, %rax
+   jne set_level3_other_512g
+
+   /* all in first 512G ? */
+   cmp $0, %rax
+   je  skip_level3_spare
+
+   /* same 512G other than first 512g */
+   leaq    (level3_spare_pgt - __START_KERNEL_map + _KERNPG_TABLE)(%rbp), %rdx
+   leaq    init_level4_pgt(%rip), %rbx
+   movq    %rdx, 0(%rbx, %rax, 8)
+   addq    $L4_PAGE_OFFSET, %rax
+   movq    %rdx, 0(%rbx, %rax, 8)
+
+   /* get 1G index */
+   movq    %r9, %r8
+   shrq    $PUD_SHIFT, %r8
+   andq    $(PTRS_PER_PUD - 1), %r8
+   movq    %rdi, %rax
+   shrq    $PUD_SHIFT, %rax
+   andq    $(PTRS_PER_PUD - 1), %rax
+
+   /* same 1G ? */
+   cmp %r8, %rax
+   je  set_level2_start_only_not_first_512g
+
+   /* set level2 for end */
+   leaq    level3_spare_pgt(%rip), %rbx
+   leaq    (level2_spare2_pgt - __START_KERNEL_map + _KERNPG_TABLE)(%rbp), %rdx
+   movq    %rdx, 0(%rbx, %r8, 8)
 
+set_level2_start_only_not_first_512g:
+   leaq    level3_spare_pgt(%rip), %rbx
+   leaq    (level2_spare_pgt - __START_KERNEL_map + _KERNPG_TABLE)(%rbp), %rdx
+   movq    %rdx, 0(%rbx, %rax, 8)
+
+   jmp set_level2_spare
+
+set_level3_other_512g:
+   /* for level2 last on first 512g */
+   leaq    level3_ident_pgt(%rip), %rcx
+   /* start is in first 512G ? */
+   cmp $0, %rax
+   je  set_level2_start_other_512g
+
+   /* Set level3 for _text */
+   leaq    (level3_spare_pgt - __START_KERNEL_map + _KERNPG_TABLE)(%rbp), %rdx
+   leaq    init_level4_pgt(%rip), %rbx
+   movq    %rdx, 0(%rbx, %rax, 8)
+   addq    $L4_PAGE_OFFSET, %rax
+   movq    %rdx, 0(%rbx, %rax, 8)
+
+   /* for level2 last not on first 512G */
+   leaq    level3_spare_pgt(%rip), %rcx
+
+set_level2_start_other_512g:
+   /* always need to set level2 */
     movq    %rdi, %rax
     shrq    $PUD_SHIFT, %rax
     andq    $(PTRS_PER_PUD - 1), %rax
-   jz  ident_complete
-
+   movq    %rcx, %rbx  /* %rcx has level3_spare_pgt or level3_ident_pgt */
     leaq    (level2_spare_pgt - __START_KERNEL_map + _KERNPG_TABLE)(%rbp), %rdx
+   movq    %rdx, 0(%rbx, %rax, 8)
+
+set_level3_end_other_512g:
+   leaq    (level3_spare2_pgt - __START_KERNEL_map + _KERNPG_TABLE)(%rbp), %rdx
+   leaq    init_level4_pgt(%rip), %rbx
+   movq    %rdx, 0(%rbx, %r8, 

[PATCH v4 02/11] x86, boot: Move lldt/ltr out of 64bit code section

2012-11-24 Thread Yinghai Lu
commit 08da5a2ca

    x86_64: Early segment setup for VT

added lldt/ltr to clean more segments.

That code was put in code64, but it uses a gdt that is only loaded in
the code32 path.

That breaks booting with a 64bit bootloader that does not go through the
code32 path. It enters at startup_64 directly, and it has a different
gdt.

Move those lines into code32, after their gdt is loaded.

Signed-off-by: Yinghai Lu 
Cc: Zachary Amsden 
Cc: Matt Fleming 
---
 arch/x86/boot/compressed/head_64.S |    9 ++---
 1 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 2c3cee4..375af23 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -154,6 +154,12 @@ ENTRY(startup_32)
     btsl    $_EFER_LME, %eax
     wrmsr
 
+   /* After gdt is loaded */
+   xorl    %eax, %eax
+   lldt    %ax
+   movl    $0x20, %eax
+   ltr %ax
+
     /*
      * Setup for the jump to 64bit mode
      *
@@ -245,9 +251,6 @@ preferred_addr:
     movl    %eax, %ss
     movl    %eax, %fs
     movl    %eax, %gs
-   lldt    %ax
-   movl    $0x20, %eax
-   ltr %ax
 
     /*
      * Compute the decompressed kernel start address.  It is where
-- 
1.7.7

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v4 00/11] x86, boot, 64bit: Add support for loading ramdisk and bzImage above 4G

2012-11-24 Thread Yinghai Lu
Currently the kdump reserved region is limited to below 896M, because
kexec has that limitation, and bzImage also needs to stay under 4G.

To let kexec/kdump use ranges above 4G, we need to make it possible to
load bzImage and the ramdisk above 4G.
During boot, bzImage is unpacked at the same position and stays high.

The patches add fields in setup_header and boot_params to
1. get the ramdisk position above 4G from the bootloader/kexec
2. get the cmd_line_ptr above 4G from the bootloader/kexec
3. set xloadflags bit0 in the header for bzImage, so that the
   bootloader/kexec can check it to decide whether it can put bzImage high.
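
As a rough illustration of how a 64-bit address can travel through a pair
of 32-bit fields such as cmd_line_ptr/ext_cmd_line_ptr, here is a minimal
C sketch. struct split_addr and both helper names are hypothetical and
are not part of the patches; only the low/high split itself comes from
the description above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a low/high pair of 32-bit boot fields. */
struct split_addr {
    uint32_t lo;    /* e.g. the existing cmd_line_ptr or ramdisk_image */
    uint32_t hi;    /* e.g. the new ext_cmd_line_ptr or ext_ramdisk_image */
};

/* What a boot loader would do: store the two halves separately. */
static struct split_addr split_addr64(uint64_t addr)
{
    struct split_addr s;

    s.lo = (uint32_t)addr;
    s.hi = (uint32_t)(addr >> 32);
    return s;
}

/* What the kernel side would do: rebuild the full address. */
static uint64_t join_addr64(struct split_addr s)
{
    /* Widen before shifting; shifting a 32-bit value by 32 is undefined. */
    return (uint64_t)s.lo | ((uint64_t)s.hi << 32);
}
```

A loader that knows nothing about the high field and leaves it zero still
yields the same address as before, which is why the split keeps old
loaders working as long as they really do zero the unused fields.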

This patchset was tested with kexec-tools with local changes, which have
been sent to the kexec list.

It can be found at:

git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git for-x86-boot

and it is on top of for-x86-mm.

-v2: add ext_cmd_line_ptr support, and handle boot_param/cmd_line is above
 4G case.
-v3: according to hpa, use xloadflags instead of code32_start_offset.
 0x200 will not be changed...
-v4: move ext_ramdisk_image/ext_ramdisk_size/ext_cmd_line_ptr to boot_params.
 add handling cross GB boundary case.

Thanks

Yinghai

Yinghai Lu (11):
  x86, boot: move verify_cpu.S after 0x200
  x86, boot: Move lldt/ltr out of 64bit code section
  x86, 64bit: Set extra ident mapping for whole kernel range
  x86: Merge early_reserve_initrd for 32bit and 64bit
  x86: add get_ramdisk_image/size()
  x86, boot: add get_cmd_line_ptr()
  x86, boot: move checking of cmd_line_ptr out of common path
  x86, boot: update cmd_line_ptr to unsigned long
  x86: use io_remap to access real_mode_data
  x86, boot: add fields to support load bzImage and ramdisk above 4G
  x86: remove 1024G limitation for kexec buffer on 64bit

 Documentation/x86/boot.txt         |   15 +++-
 Documentation/x86/zero-page.txt    |    3 +
 arch/x86/boot/boot.h               |   18 +++-
 arch/x86/boot/cmdline.c            |   12 ++--
 arch/x86/boot/compressed/cmdline.c |   12 +++-
 arch/x86/boot/compressed/head_64.S |   14 ++-
 arch/x86/boot/header.S             |   12 +++-
 arch/x86/include/asm/bootparam.h   |    8 ++-
 arch/x86/include/asm/kexec.h       |    6 +-
 arch/x86/kernel/head32.c           |   11 ---
 arch/x86/kernel/head64.c           |   41 ++
 arch/x86/kernel/head_64.S          |  153 +---
 arch/x86/kernel/setup.c            |   53 ++---
 13 files changed, 285 insertions(+), 73 deletions(-)

-- 
1.7.7



Re: [PATCH v4 05/11] pwm: pwm-tiecap: pinctrl support

2012-11-24 Thread Thierry Reding
On Fri, Nov 23, 2012 at 01:48:51PM +0100, Peter Korsgaard wrote:
> > "Thierry" == Thierry Reding  writes:
> 
> Hi,
> 
>  Thierry> Everybody seems to be doing it with a warning, so I guess
>  Thierry> that's fine for now. I just find it strange that if you
>  Thierry> request the default pin group to be selected when in fact the
>  Thierry> hardware doesn't support pinctrl at all you shouldn't be
>  Thierry> getting an error either.
> 
> There's several different situations:
> 
> - Platform without pinctrl support
> - Platform with pinctrl support but no pinmux specified in dt for device
>   (E.G. pinmux setup in bootloader)
> - Pinmux specified in dt
> - Some kind of misconfiguration in dt
> 
> You could argue that devm_pinctrl_get_select_default() shouldn't return
> an error for the first situation, but how should it be able to know the
> difference between 2 and 4?

In case where the platform supports pinctrl but no pinmux is specified
for the device it should just assume that no pinmuxing is needed. That
sounds like the most logical behaviour to me. In those cases pinctrl
could just assume that the default has already been selected and not
return an error.

But you can't reasonably expect to cope with misconfigured DT content.
Heck, there's no way for you to even know if it is misconfigured.

That said, I'm not sure how much of an issue this really is. Pinmuxing
is only used for functions local to a given chip, right? So if an SoC
supports pinctrl and a given peripheral needs pinmuxing then we can
reasonably assume that your second case can't happen, can't we?

Thierry




Re: [PATCH 22/24] MAINTAINERS: fix BAST

2012-11-24 Thread Paul Bolle
On Sat, 2012-11-24 at 21:28 +0100, Paul Bolle wrote:
> I submitted an identical patch in https://lkml.org/lkml/2012/6/25/154 .
> In a reaction on that patch Kukjin Kim wondered whether "Simtec Linux
> Team and Vincent are still supporting BAST". No-one bothered to reply on
> Kukjin's message, which suggests BAST is unsupported and this entry
> might as well be dropped entirely.

And this autoreply to my above message also suggests BAST is
unsupported:
> This message was created automatically by mail delivery software.
> 
> A message that you sent could not be delivered to one or more of its
> recipients. This is a permanent error. The following address(es)
> failed:
> 
>   vi...@simtec.co.uk
> retry time not reached for any host after a long failure period
> 
> -- This is a copy of the message, including all the headers. 
>[...]


Paul Bolle



Re: [PATCH 22/24] MAINTAINERS: fix BAST

2012-11-24 Thread Paul Bolle
On Fri, 2012-11-23 at 22:26 -0200, Cesar Eduardo Barros wrote:
> These files were renamed by commit 85fd6d6 (ARM: S3C2410: move
> mach-s3c2410/* into mach-s3c24xx/).

I submitted an identical patch in https://lkml.org/lkml/2012/6/25/154 .
In a reaction on that patch Kukjin Kim wondered whether "Simtec Linux
Team and Vincent are still supporting BAST". No-one bothered to reply on
Kukjin's message, which suggests BAST is unsupported and this entry
might as well be dropped entirely.


Paul Bolle



Re: [PATCH v3 11/12] x86, boot: add fields to support load bzImage and ramdisk high

2012-11-24 Thread H. Peter Anvin

On 11/24/2012 09:32 AM, H. Peter Anvin wrote:
> On 11/24/2012 04:37 AM, Eric W. Biederman wrote:
>>
>> Certainly /sbin/kexec isn't bothering to calculate the end of the setup
>> header and just being far more conservative and using all of the 16bit
>> real mode code as it's initializer.
>>
>
> That's not conservative... that's just plain wrong.  It means you're
> initializing the fields in struct boot_params with garbage instead of a
> predictable value (zero).
>
> We could work around it with a sentinel hack... except you *also*
> probably modify *some* fields and now we have a horrid mix of
> initialized and uninitialized fields to sort out... and there really
> isn't any sane way for the kernel to sort that out.
>
> We have a huge problem on our hands now because of it.
>


So, given the mess we now have on our hands... any suggestions how to 
best solve it?  There is the option of simply declaring old kexec 
binaries broken; they will then not work reliably with newer kernels, if 
they even work reliably now -- it is hard to know for certain.


Another option is the sentinel hack I mentioned... permanently reserve a 
field that if it is nonzero we will have the kernel erase the remainder 
of struct boot_params... except for *some fields* to be defined.  This 
is a total hack workaround and will not work if we have the same class 
of problems in another bootloader which initializes different fields, 
but, well, it might provide some value and might solve problems with 
other bootloaders which have similar enough misbehavior.
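
To make the sentinel idea concrete, here is a minimal C sketch. It is an
assumption-laden illustration only: struct fake_boot_params, its field
names and sanitize_params() are hypothetical and do not reflect the real
struct boot_params layout or any existing kernel function.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily simplified parameter block; not the real layout. */
struct fake_boot_params {
    uint8_t  sentinel;      /* stays zero if the loader zeroed the block */
    uint32_t known_field;   /* a field old loaders are known to set */
    uint32_t new_field_a;   /* newer fields that must default to zero */
    uint32_t new_field_b;
};

/*
 * If the sentinel is nonzero, the loader copied garbage over the block.
 * Keep only the field we trust and zero everything else.
 */
static void sanitize_params(struct fake_boot_params *p)
{
    if (p->sentinel) {
        uint32_t keep = p->known_field;

        memset(p, 0, sizeof(*p));
        p->known_field = keep;
    }
}
```

The weakness described above shows up directly in the sketch: the list of
fields to keep is hard-coded, so a loader that legitimately initializes a
different set of fields would have them wiped.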


The final idea would be to declare the current struct boot_params frozen 
indefinitely, and instead create a whole new set of data structures 
going forward, perhaps inserting them into the linked list.


-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



Supermicro X9SRL-F - channel enumeration error & ACPI/firmware bug question

2012-11-24 Thread Justin Piszcz
Hi,

Is the following normal on an X9SRL-F board (bios 1.0a)?

In the manual it states:

Data Direct I/O
Select Enabled to enable Intel I/OAT (I/O Acceleration Technology), which
significantly reduces CPU overhead by leveraging CPU architectural
improvements and freeing the system resource for other tasks. The options
are Disabled and Enabled.

Default is Enabled.

When enabled in the kernel, I see the following:

[0.696357] ioatdma: Intel(R) QuickData Technology Driver 4.00
[0.696487] ioatdma :00:04.0: channel error register unreachable
[0.696546] ioatdma :00:04.0: channel enumeration error
[0.696604] ioatdma :00:04.0: Intel(R) I/OAT DMA Engine init failed
[0.696721] ioatdma :00:04.1: channel error register unreachable
[0.696779] ioatdma :00:04.1: channel enumeration error
[0.697522] ioatdma :00:04.1: Intel(R) I/OAT DMA Engine init failed
[0.697617] ioatdma :00:04.2: channel error register unreachable
[0.697681] ioatdma :00:04.2: channel enumeration error
[0.697739] ioatdma :00:04.2: Intel(R) I/OAT DMA Engine init failed
[0.697831] ioatdma :00:04.3: channel error register unreachable
[0.697890] ioatdma :00:04.3: channel enumeration error
[0.697948] ioatdma :00:04.3: Intel(R) I/OAT DMA Engine init failed
[0.698037] ioatdma :00:04.4: channel error register unreachable
[0.698095] ioatdma :00:04.4: channel enumeration error
[0.698153] ioatdma :00:04.4: Intel(R) I/OAT DMA Engine init failed
[0.698245] ioatdma :00:04.5: channel error register unreachable
[0.698303] ioatdma :00:04.5: channel enumeration error
[0.698360] ioatdma :00:04.5: Intel(R) I/OAT DMA Engine init failed
[0.698449] ioatdma :00:04.6: channel error register unreachable
[0.698508] ioatdma :00:04.6: channel enumeration error
[0.698565] ioatdma :00:04.6: Intel(R) I/OAT DMA Engine init failed
[0.698676] ioatdma :00:04.7: channel error register unreachable
[0.698735] ioatdma :00:04.7: channel enumeration error
[0.698792] ioatdma :00:04.7: Intel(R) I/OAT DMA Engine init failed

--

Also, I tried using ASPM (enabled in BIOS), but since ACPI Linux query is
ignored, it fails to work:
[0.562229] [Firmware Bug]: ACPI: BIOS _OSI(Linux) query ignored

I assume this is something Supermicro has to fix?

Justin.



Re: possible regression in kernel 3.6: system hangs during nightly tape backup

2012-11-24 Thread Tilman Schmidt
Any ideas on that? I'm currently avoiding the 3.6 series because
of that problem but would be willing to reproduce the hang if I
knew what to do once it happens, i.e. what kind of information to
collect in order to identify the cause of the problem.
I may also try 3.7-rc if there's any interest.

On 20.11.2012 01:14, Tilman Schmidt wrote:
> For the 4th time now after switching to kernel 3.6, my system became
> unresponsive during the nightly Bacula backup run. It looks as if
> all disk accesses are suddenly blocked:
> - Desktop apps stop responding one after another, starting with
>   Firefox followed by other "heavy" apps, while Konsole windows
>   continue being usable for a while.
> - "top" shows the load average steadily increasing with no process
>   actually consuming relevant quantities of CPU.
> - I can do "dmesg > /root/dmesg.out" followed by "less /root/dmesg.out"
>   in a Konsole window just fine, but after the inevitable hard reset
>   the file /root/dmesg.out isn't there.
> - The "sync" command hangs indefinitely.
> - The "shutdown" command and ctrl/alt/Del emit "system going down"
>   broadcast messages but never get anywhere.
> - Killing processes manually works for some (bacula-sd even ejects
>   the tape before exiting) but most remain in state D or Z.
> - Eventually, all text consoles are blocked and a hardware reset is
>   the only remaining option.
> - After the reboot, a Bacula spool file is left behind in
>   /var/spool/bacula, proof that the hang happened during the backup.
> 
> This does not happen during every backup run, but frequently enough
> to be annoying. (About once per week.) It never happened with kernel
> 3.5. For comparison went back to kernel 3.5.7 for a week and it
> never happened during that time. Last night I booted 3.6.7 and the
> very next backup caused the hang again. The last kernel message that
> made it to the syslog on disk was
> 
> Nov 19 23:05:04 xenon kernel: [73877.128546] st0: Block limits 256 -
> 524288 bytes.
> 
> triggered by the start of the backup. In dmesg the next message was
> 
> [74401.249091] INFO: task flush-253:2:1320 blocked for more than 120
> seconds.
> 
> followed by a backtrace. I have photos of the remaining dmesg output
> which I'll try to upload somewhere accessible tomorrow.
> 
> Hardware configuration:
> Intel Pentium D, Intel DQ965GF mainboard, 6 GB RAM
> onboard S-ATA controller driving two 500 GB S-ATA disks
> and a Pioneer DVR-216D DVD-RW drive
> Adaptec 29160B Ultra160 SCSI adapter driving a
> Tandberg TS400 LTO-2 tape drive
> 
> Disk configuration: md RAID1, LVM, ext3 and ext4 volumes
> 
> Software: Opensuse 11.4 64 bit, vanilla kernel 3.5.7 and 3.6.7,
> Bacula 5.2.12
> 
> HTH
> T.
> 


-- 
Tilman Schmidt                    E-Mail: til...@imap.cc
Bonn, Germany
This message consists of 100% recycled bits.
Best before, if unopened: (see reverse side)






Re: [PATCH 3/4] ARM: Dove: Convert to DT GPIO and pinctrl

2012-11-24 Thread Jason Cooper
On Sat, Nov 24, 2012 at 07:10:24PM +0100, Thomas Petazzoni wrote:
> Dear Jason Cooper,
> 
> On Sat, 24 Nov 2012 10:00:04 -0500, Jason Cooper wrote:
> 
> > Yes, so that's what I thought happened.  This would have made orion/dt
> > depend upon mvebu/everything.  It already had two other dependencies.
> > Not ideal.
> > 
> > The good thing is, the build is not broken.  Once v3.8-rc1 drops with
> > all of our stuff merged, I'll post a fixup patch adding this back in.
> 
> It unfortunately means that Dove will be basically unbootable in
> 3.8-rc1, as the driver will not be clk_get()ing its gatable clock, and
> the clock driver will disable it. Maybe we can just live with it, I
> don't know.

Yes, I thought as much after I sent this reply.  Definitely a choice of
the lesser of two evils.  As long as we don't break the build or have
horrendous merge conflicts, I think it's tolerable.

Anyone who is booting -rc1's is typically bug hunting.  This means
Sebastian, who has been CC'd on all of this.  I don't want to rely on
this in the future, but doing it once due to the circumstances is
something I'm comfortable answering to.

thx,

Jason.


Re: [PATCH 20/24] MAINTAINERS: remove include/linux/of_pwm.h

2012-11-24 Thread Thierry Reding
On Fri, Nov 23, 2012 at 10:26:44PM -0200, Cesar Eduardo Barros wrote:
> Added by commit 200efed (pwm: Take over maintainership of the PWM
> subsystem), but I could not find any trace of that file being ever added
> to the repository.
> 
> Cc: Thierry Reding 
> Signed-off-by: Cesar Eduardo Barros 
> ---
>  MAINTAINERS | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 3a3a57a..331b3a2 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -5946,7 +5946,6 @@ T:  git git://gitorious.org/linux-pwm/linux-pwm.git
>  F:   Documentation/pwm.txt
>  F:   Documentation/devicetree/bindings/pwm/
>  F:   include/linux/pwm.h
> -F:   include/linux/of_pwm.h
>  F:   drivers/pwm/
>  F:   drivers/video/backlight/pwm_bl.c
>  F:   include/linux/pwm_backlight.h

Yes, that file never made it into the final patch series and I must have
forgotten to remove the entry from MAINTAINERS. Thanks for spotting:

Acked-by: Thierry Reding 




Re: [PATCH v3 11/12] x86, boot: add fields to support load bzImage and ramdisk high

2012-11-24 Thread H. Peter Anvin

On 11/24/2012 10:12 AM, Yinghai Lu wrote:
>
> Now I have a fix ready, also found fix for kexec real mode path working
> with recently kernel by settin heap end ptr correctly.
>
> Please decide if we need to add 64 bit entry offset in setup header,
> Or just stick to 0x200.
>
> I check grub2 and gujin and qemu , looks like they are all using bzimage
> 16 bit entry.
>
> Do you have pointer for any boot loader that is using 64 bit entry in
> bzimage?
>


I'm fairly certain Grub2 does *not* use the 16-bit entry point by 
default even on BIOS platforms, needing the "linux16" directive to 
behave sanely (this is one of many complete facepalms in Grub2).


efilinux or elilo compiled for a 64-bit EFI platform would be a good 
example, but even if we can't find a 64-bit boot loader example I don't 
think we can rule one out, so let's just define 0x200 as an ABI constant 
and be done with it.  The cost is minimal and the consequences of 
changing it are potentially severe.


-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



Re: [PATCH 3/3] regulator: add device tree support for max8997

2012-11-24 Thread Thomas Abraham
On 24 November 2012 23:29, Mark Brown wrote:
> On Fri, Nov 23, 2012 at 01:33:15PM +0530, Thomas Abraham wrote:
>
>> This v6 patch is rebased to the latest max8997 driver code and there are no
>> functional changes from v5.
>
> That doesn't seem to be in mainline yet so the patch won't apply.

Hi Mark,

The max8997 driver is mainlined. What I actually meant to say is that
this v6 version of the dt support patch for max8997 is similar in
functionality to the v5 version of this patch. I did prepare this patch
based on your latest for-next branch.

Thanks,
Thomas.


Re: [PATCH 3/4] ARM: Dove: Convert to DT GPIO and pinctrl

2012-11-24 Thread Thomas Petazzoni
Dear Jason Cooper,

On Sat, 24 Nov 2012 10:00:04 -0500, Jason Cooper wrote:

> Yes, so that's what I thought happened.  This would have made orion/dt
> depend upon mvebu/everything.  It already had two other dependencies.
> Not ideal.
> 
> The good thing is, the build is not broken.  Once v3.8-rc1 drops with
> all of our stuff merged, I'll post a fixup patch adding this back in.

It unfortunately means that Dove will be basically unbootable in
3.8-rc1, as the driver will not be clk_get()ing its gatable clock, and
the clock driver will disable it. Maybe we can just live with it, I
don't know.

Thomas
-- 
Thomas Petazzoni, Free Electrons
Kernel, drivers, real-time and embedded Linux
development, consulting, training and support.
http://free-electrons.com


[PATCH 2/4] uprobes: Change filter_chain() to iterate ->consumers list

2012-11-24 Thread Oleg Nesterov
Now that it is safe to use ->consumer_rwsem under ->mmap_sem we can
almost finish the implementation of filter_chain(). It still lacks
the actual uc->filter(...) call but otherwise it is ready; it just
pretends that ->filter() always returns true.
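
For readers who want to see where this is heading, the loop with a real
filter call might look like the userspace C sketch below. It is only an
assumption about the eventual shape: struct consumer, its filter
signature and the two example filters are hypothetical, and the locking
that the patch does around the loop is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct uprobe_consumer. */
struct consumer {
    /* NULL means "no filter", i.e. always interested. */
    bool (*filter)(const struct consumer *self);
    struct consumer *next;
};

/* True as soon as one consumer on the list wants the breakpoint. */
static bool filter_chain_sketch(const struct consumer *head)
{
    const struct consumer *c;

    for (c = head; c; c = c->next) {
        if (!c->filter || c->filter(c))
            return true;
    }
    return false;
}

/* Example filters used to exercise the loop. */
static bool filter_never(const struct consumer *self)
{
    (void)self;
    return false;
}

static bool filter_always(const struct consumer *self)
{
    (void)self;
    return true;
}
```

An empty list yields false, which matches the old
"uprobe->consumers != NULL" check, and a list where every filter declines
also yields false.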

Signed-off-by: Oleg Nesterov 
---
 kernel/events/uprobes.c |   21 +
 1 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 03ffbb5..873c993 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -614,14 +614,19 @@ static int prepare_uprobe(struct uprobe *uprobe, struct file *file,
 
 static bool filter_chain(struct uprobe *uprobe)
 {
-   /*
-    * TODO:
-    *  for_each_consumer(uc)
-    *      if (uc->filter(...))
-    *          return true;
-    *  return false;
-    */
-   return uprobe->consumers != NULL;
+   struct uprobe_consumer *uc;
+   bool ret = false;
+
+   down_read(&uprobe->consumer_rwsem);
+   for (uc = uprobe->consumers; uc; uc = uc->next) {
+       /* TODO: ret = uc->filter(...) */
+       ret = true;
+       if (ret)
+           break;
+   }
+   up_read(&uprobe->consumer_rwsem);
+
+   return ret;
 }
 
 static int
-- 
1.5.5.1



[PATCH 4/4] uprobes: Kill uprobe->copy_mutex

2012-11-24 Thread Oleg Nesterov
Now that ->register_rwsem is safe under ->mmap_sem we can kill
->copy_mutex and abuse down_write(&uprobe->consumer_rwsem).

This makes prepare_uprobe() even more ugly, but we should kill
it anyway.
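
The pattern prepare_uprobe() relies on (test the flag, take the lock,
test again, do the one-time work) can be sketched in portable C11. This
is a userspace illustration under stated assumptions: struct probe, its
members and the spinning atomic_flag are hypothetical stand-ins for the
uprobe, UPROBE_COPY_INSN and the rw_semaphore.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a uprobe whose instruction is copied once. */
struct probe {
    atomic_bool copied;     /* plays the role of UPROBE_COPY_INSN */
    atomic_flag lock;       /* crude spinlock standing in for the rwsem */
    int copy_count;         /* how many times the one-time work ran */
};

static void probe_init(struct probe *p)
{
    atomic_init(&p->copied, false);
    atomic_flag_clear(&p->lock);
    p->copy_count = 0;
}

static void prepare_probe(struct probe *p)
{
    /* Fast path: already prepared, no lock needed. */
    if (atomic_load(&p->copied))
        return;

    while (atomic_flag_test_and_set(&p->lock))
        ;   /* spin until the lock is free */

    /* Re-check under the lock: another caller may have won the race. */
    if (!atomic_load(&p->copied)) {
        p->copy_count++;                    /* the one-time work */
        atomic_store(&p->copied, true);
    }

    atomic_flag_clear(&p->lock);
}
```

Calling prepare_probe() repeatedly performs the work once; which lock
guards the second test is an implementation detail, which is what lets
the patch swap the mutex for the existing semaphore.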

Signed-off-by: Oleg Nesterov 
---
 kernel/events/uprobes.c |    7 +++
 1 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 97c3874..1e047f8 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -91,7 +91,6 @@ struct uprobe {
     atomic_t                ref;
     struct rw_semaphore     register_rwsem;
     struct rw_semaphore     consumer_rwsem;
-   struct mutex            copy_mutex; /* TODO: kill me and UPROBE_COPY_INSN */
     struct list_head        pending_list;
     struct uprobe_consumer  *consumers;
     struct inode            *inode; /* Also hold a ref to inode */
@@ -450,7 +449,6 @@ static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset)
     uprobe->offset = offset;
     init_rwsem(&uprobe->register_rwsem);
     init_rwsem(&uprobe->consumer_rwsem);
-   mutex_init(&uprobe->copy_mutex);
     /* For now assume that the instruction need not be single-stepped */
     __set_bit(UPROBE_SKIP_SSTEP, &uprobe->flags);
 
@@ -578,7 +576,8 @@ static int prepare_uprobe(struct uprobe *uprobe, struct file *file,
     if (test_bit(UPROBE_COPY_INSN, &uprobe->flags))
         return ret;
 
-   mutex_lock(&uprobe->copy_mutex);
+   /* TODO: move this into _register, until then we abuse this sem. */
+   down_write(&uprobe->consumer_rwsem);
     if (test_bit(UPROBE_COPY_INSN, &uprobe->flags))
         goto out;
 
@@ -602,7 +601,7 @@ static int prepare_uprobe(struct uprobe *uprobe, struct file *file,
     set_bit(UPROBE_COPY_INSN, &uprobe->flags);
 
  out:
-   mutex_unlock(&uprobe->copy_mutex);
+   up_write(&uprobe->consumer_rwsem);
 
     return ret;
 }
-- 
1.5.5.1



[PATCH 3/4] uprobes: Kill UPROBE_RUN_HANDLER flag

2012-11-24 Thread Oleg Nesterov
Simply remove UPROBE_RUN_HANDLER and the corresponding code.

It can only help if uprobe has a single consumer, and in fact
it is no longer needed after handler_chain() was changed to use
->register_rwsem, we simply can not race with uprobe_register().

Signed-off-by: Oleg Nesterov 
---
 kernel/events/uprobes.c |   23 +--
 1 files changed, 5 insertions(+), 18 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 873c993..97c3874 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -83,10 +83,8 @@ static atomic_t uprobe_events = ATOMIC_INIT(0);
 
 /* Have a copy of original instruction */
 #define UPROBE_COPY_INSN   0
-/* Dont run handlers when first register/ last unregister in progress*/
-#define UPROBE_RUN_HANDLER 1
 /* Can skip singlestep */
-#define UPROBE_SKIP_SSTEP  2
+#define UPROBE_SKIP_SSTEP  1
 
 struct uprobe {
     struct rb_node          rb_node;    /* node in the rb tree */
@@ -475,9 +473,6 @@ static void handler_chain(struct uprobe *uprobe, struct pt_regs *regs)
 {
     struct uprobe_consumer *uc;
 
-   if (!test_bit(UPROBE_RUN_HANDLER, &uprobe->flags))
-       return;
-
     down_read(&uprobe->register_rwsem);
     for (uc = uprobe->consumers; uc; uc = uc->next)
         uc->handler(uc, regs);
@@ -825,13 +820,8 @@ static int register_for_each_vma(struct uprobe *uprobe, bool is_register)
 
 static int __uprobe_register(struct uprobe *uprobe, struct uprobe_consumer *uc)
 {
-   int err;
-
     consumer_add(uprobe, uc);
-   err = register_for_each_vma(uprobe, true);
-   if (!err) /* TODO: pointless unless the first consumer */
-       set_bit(UPROBE_RUN_HANDLER, &uprobe->flags);
-   return err;
+   return register_for_each_vma(uprobe, true);
 }
 
 static void __uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *uc)
@@ -842,12 +832,9 @@ static void __uprobe_unregister(struct uprobe *uprobe, struct uprobe_consumer *u
         return;
 
     err = register_for_each_vma(uprobe, false);
-   if (!uprobe->consumers) {
-       clear_bit(UPROBE_RUN_HANDLER, &uprobe->flags);
-       /* TODO : cant unregister? schedule a worker thread */
-       if (!err)
-           delete_uprobe(uprobe);
-   }
+   /* TODO : cant unregister? schedule a worker thread */
+   if (!uprobe->consumers && !err)
+       delete_uprobe(uprobe);
 }
 
 /*
-- 
1.5.5.1



[PATCH 1/4] uprobes: Introduce uprobe->register_rwsem

2012-11-24 Thread Oleg Nesterov
Introduce uprobe->register_rwsem. It is taken for writing around
__uprobe_register/unregister.

Change handler_chain() to use this sem rather than consumer_rwsem.

The main reason for this change is that we have the nasty problem
with mmap_sem/consumer_rwsem dependency. filter_chain() needs to
protect uprobe->consumers like handler_chain(), but they can not
use the same lock. filter_chain() can be called under ->mmap_sem
(currently this is always true), but we want to allow ->handler()
to play with the probed task's memory, and this needs ->mmap_sem.

Alternatively we could use srcu, but synchronize_srcu() is very
slow and ->register_rwsem allows us to do more. In particular, we
can teach handler_chain() to do remove_breakpoint() if this bp is
"nacked" by all consumers, we know that we can't race with the
new consumer which does uprobe_register().

See also the next patches. uprobes_mutex[] is almost ready to die.

Signed-off-by: Oleg Nesterov 
---
 kernel/events/uprobes.c |   10 --
 1 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index c80507d..03ffbb5 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -91,6 +91,7 @@ static atomic_t uprobe_events = ATOMIC_INIT(0);
 struct uprobe {
struct rb_node          rb_node;        /* node in the rb tree */
atomic_t                ref;
+   struct rw_semaphore register_rwsem;
struct rw_semaphore consumer_rwsem;
struct mutex            copy_mutex;     /* TODO: kill me and UPROBE_COPY_INSN */
struct list_head        pending_list;
@@ -449,6 +450,7 @@ static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset)
 
uprobe->inode = igrab(inode);
uprobe->offset = offset;
+   init_rwsem(&uprobe->register_rwsem);
init_rwsem(&uprobe->consumer_rwsem);
mutex_init(&uprobe->copy_mutex);
/* For now assume that the instruction need not be single-stepped */
@@ -476,10 +478,10 @@ static void handler_chain(struct uprobe *uprobe, struct pt_regs *regs)
if (!test_bit(UPROBE_RUN_HANDLER, &uprobe->flags))
return;
 
-   down_read(&uprobe->consumer_rwsem);
+   down_read(&uprobe->register_rwsem);
for (uc = uprobe->consumers; uc; uc = uc->next)
uc->handler(uc, regs);
-   up_read(&uprobe->consumer_rwsem);
+   up_read(&uprobe->register_rwsem);
 }
 
 static void consumer_add(struct uprobe *uprobe, struct uprobe_consumer *uc)
@@ -873,9 +875,11 @@ int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *
mutex_lock(uprobes_hash(inode));
uprobe = alloc_uprobe(inode, offset);
if (uprobe) {
+   down_write(&uprobe->register_rwsem);
ret = __uprobe_register(uprobe, uc);
if (ret)
__uprobe_unregister(uprobe, uc);
+   up_write(&uprobe->register_rwsem);
}
mutex_unlock(uprobes_hash(inode));
if (uprobe)
@@ -899,7 +903,9 @@ void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consume
return;
 
mutex_lock(uprobes_hash(inode));
+   down_write(&uprobe->register_rwsem);
__uprobe_unregister(uprobe, uc);
+   up_write(&uprobe->register_rwsem);
mutex_unlock(uprobes_hash(inode));
put_uprobe(uprobe);
 }
-- 
1.5.5.1



[PATCH 0/4] uprobes: locking changes for filtering

2012-11-24 Thread Oleg Nesterov
Hello.

On top of
"[PATCH 0/7] uprobes: register/unregister preparations for filtering"

4/4 is not really needed and I won't insist if you dislike it.
Just this ->copy_mutex annoys me ;)

Please review. filter_chain() is almost ready, just we need to
discuss (again) its arguments/etc and reintroduce uc->filter().

Oleg.



Re: [PATCH 3/3] regulator: add device tree support for max8997

2012-11-24 Thread Mark Brown
On Fri, Nov 23, 2012 at 01:33:15PM +0530, Thomas Abraham wrote:

> This v6 patch is rebased to the latest max8997 driver code and there are no
> functional changes from v5.

That doesn't seem to be in mainline yet so the patch won't apply.




Re: [PATCH 2/3] regulator: max8997: limit the number of dvs registers programmed in non-dvs mode

2012-11-24 Thread Mark Brown
On Fri, Nov 23, 2012 at 01:33:14PM +0530, Thomas Abraham wrote:
> In case the gpio based voltage selection mode is not used for any of
> buck 1/2/5, then only the BUCKxDVS1 register needs to be programmed. So
> determine whether dvs mode is used and limit the loop count appropriately.

Applied, thanks.




Re: [PATCH 1/3] regulator: max8997: reorder buck1/2/5 dvs setup code

2012-11-24 Thread Mark Brown
On Fri, Nov 23, 2012 at 01:33:13PM +0530, Thomas Abraham wrote:
> The BUCKxDVSx register programming is now moved prior to setting up of the
> gpio based dvs mode. This will ensure that all the BUCKxDVSx registers
> are programmed with appropriate voltage values before the gpio based dvs
> mode is selected for buck1/2/5.

Applied, thanks.




Re: [PATCH] regulator: max8925: fix compiler warnings

2012-11-24 Thread Mark Brown
On Fri, Nov 23, 2012 at 10:27:12AM +0800, Qing Xu wrote:

> But, in fact, it is not necessary to initialize regulator_idx.

> for (i = 0; i < ARRAY_SIZE(max8925_regulator_info); i++) {
> ri = &max8925_regulator_info[i];
> if (ri->vol_reg == res->start) {

> ** if regulator_idx can not get a match "i" here, it will return
> -EINVAL in below code

> regulator_idx = i;
> break;
> }
> }

> if (i == ARRAY_SIZE(max8925_regulator_info)) {
> dev_err(&pdev->dev, "Failed to find regulator %llu\n",
> (unsigned long long)res->start);
> return -EINVAL;
> }

> How to solve such compiler warning?

Typically by reporting a compiler bug, though sometimes in the process
of doing that one finds out that there's some non-obvious way in which
the code can break.
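
As an aside, one common way to sidestep this kind of false positive, not
something proposed in this thread, is to move the lookup into a helper that
returns either the index or a negative errno, so there is no variable left
that the compiler could consider uninitialized. A minimal userspace sketch,
where struct regulator_info and find_regulator_idx() are hypothetical names
used only for illustration:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for the driver's regulator table entry. */
struct regulator_info {
	unsigned long vol_reg;
};

/*
 * Return the index of the entry whose vol_reg matches start,
 * or -EINVAL if no entry matches.
 */
static int find_regulator_idx(const struct regulator_info *tbl, size_t n,
			      unsigned long start)
{
	size_t i;

	assert(tbl != NULL);
	for (i = 0; i < n; i++)
		if (tbl[i].vol_reg == start)
			return (int)i;
	return -EINVAL;
}
```

The caller then checks for a negative return value instead of comparing the
loop counter against the array size.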




Re: [PATCH] regulator: tps65090: Add MODULE_ALIAS

2012-11-24 Thread Mark Brown
On Fri, Nov 23, 2012 at 11:47:16PM +0800, Axel Lin wrote:
> This driver can be built as a module, add MODULE_ALIAS for it.

Applied, thanks.




Re: [PATCH] gpiolib: rename pin range arguments

2012-11-24 Thread Stephen Warren
On 11/21/2012 12:50 AM, Linus Walleij wrote:
> To be crystal clear on what the arguments mean in this
> function dealing with both GPIO and PIN ranges with confusing
> naming, we now have gpio_offset and pin_offset and we are
> on the clear that these are offsets into the specific GPIO
> and pin controller respectively. The GPIO chip itself will
> of course keep track of the base offset into the global
> GPIO number space.

Reviewed-by: Stephen Warren 



Re: [PATCH v3 11/12] x86, boot: add fields to support load bzImage and ramdisk high

2012-11-24 Thread H. Peter Anvin

On 11/24/2012 04:37 AM, Eric W. Biederman wrote:


Certainly /sbin/kexec isn't bothering to calculate the end of the setup
header and just being far more conservative and using all of the 16bit
real mode code as its initializer.



That's not conservative... that's just plain wrong.  It means you're 
initializing the fields in struct boot_params with garbage instead of a 
predictable value (zero).


We could work around it with a sentinel hack... except you *also* 
probably modify *some* fields and now we have a horrid mix of 
initialized and uninitialized fields to sort out... and there really 
isn't any sane way for the kernel to sort that out.


We have a huge problem on our hands now because of it.

-hpa


--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.
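
To make the point above concrete, here is a minimal userspace sketch of the
initialization order being argued for: zero the whole structure first, then
copy only the setup header. It assumes, based on my reading of the x86 boot
protocol documentation, that the setup header starts at offset 0x1f1 and ends
at 0x202 plus the byte value at offset 0x201. The function name
init_boot_params() and the flat byte-array view of struct boot_params are
hypothetical simplifications and are not taken from /sbin/kexec.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BOOT_PARAMS_SIZE 4096	/* size of the zero page */
#define SETUP_HDR_START  0x1f1	/* first byte of the setup header */

/*
 * Fill params (BOOT_PARAMS_SIZE bytes) from a kernel image:
 * every field is zeroed, then only the setup header is copied in.
 * Returns 0 on success, -1 if the image is too short.
 */
static int init_boot_params(uint8_t *params, const uint8_t *image,
			    size_t image_len)
{
	size_t hdr_end;

	assert(params != NULL && image != NULL);

	if (image_len < 0x202)
		return -1;

	/* End of setup header: 0x202 + byte value at offset 0x201. */
	hdr_end = 0x202 + (size_t)image[0x201];
	if (hdr_end > image_len || hdr_end > BOOT_PARAMS_SIZE)
		return -1;

	/* Predictable value (zero) for every field outside the header. */
	memset(params, 0, BOOT_PARAMS_SIZE);
	memcpy(params + SETUP_HDR_START, image + SETUP_HDR_START,
	       hdr_end - SETUP_HDR_START);
	return 0;
}
```

With this order, any field the boot loader does not explicitly set reads as
zero rather than as leftover real-mode code.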



[PATCH RFC] spidev.c: add sysfs attributes for SPI configuration

2012-11-24 Thread Federico Vaga
This patch introduces sysfs attributes for spidev configuration, so the
user no longer needs a dedicated program that calls ioctl() on spidev.
Instead, the user can simply use cat (to read) and echo (to write) on the
sysfs files to configure SPI.

The patch exports the following attributes: bits-per-word, lsb-first,
mode and speed-hz.

Example:
# cat /sys/bus/spi/devices/spi1.0/speed-hz
50
# echo 45 > /sys/bus/spi/devices/spi1.0/speed-hz
# dmesg | tail -n 4
spidev spi1.0: DEactivate 60, mr 000f0011
spidev spi1.0: setup: 449447 Hz bpw 8 mode 0x0 -> csr0 dd02
spidev spi1.0: setup mode 0, 8 bits/w, 45 Hz max --> 0
spidev spi1.0: 45 Hz (max)

Signed-off-by: Federico Vaga 
---
 drivers/spi/spidev.c | 258 +--
 1 file changed, 208 insertions(+), 50 deletions(-)

diff --git a/drivers/spi/spidev.c b/drivers/spi/spidev.c
index 830adbe..4aa0832 100644
--- a/drivers/spi/spidev.c
+++ b/drivers/spi/spidev.c
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -92,6 +93,201 @@ static unsigned bufsiz = 4096;
 module_param(bufsiz, uint, S_IRUGO);
 MODULE_PARM_DESC(bufsiz, "data bytes in biggest supported SPI message");
 
+
+/*-*/
+
+/* SYSFS */
+enum spidev_config_enum {
+   SPIDEV_SPEED_HZ,
+   SPIDEV_BIT_PER_WORD,
+   SPIDEV_LSB_FIRST,
+   SPIDEV_MODE,
+};
+struct spidev_config_attr {
+   struct device_attribute attr;
+   enum spidev_config_enum cmd;
+};
+#define to_spidev_attr(_attr) \
+   container_of(_attr, struct spidev_config_attr, attr)
+
+static int spidev_conf_mode(struct spi_device *spi, u32 tmp)
+{
+   u8 save = spi->mode;
+   int err = 0;
+
+   if (tmp & ~SPI_MODE_MASK)
+   return -EINVAL;
+
+   tmp |= spi->mode & ~SPI_MODE_MASK;
+   spi->mode = (u8)tmp;
+   err = spi_setup(spi);
+   if (err < 0)
+   spi->mode = save;
+   else
+   dev_dbg(&spi->dev, "spi mode %02x\n", tmp);
+
+   return err;
+}
+static int spidev_conf_lsb(struct spi_device *spi, u32 tmp)
+{
+   u8 save = spi->mode;
+   int err = 0;
+
+   if (tmp)
+   spi->mode |= SPI_LSB_FIRST;
+   else
+   spi->mode &= ~SPI_LSB_FIRST;
+   err = spi_setup(spi);
+   if (err < 0)
+   spi->mode = save;
+   else
+   dev_dbg(&spi->dev, "%csb first\n", (tmp ? 'l' : 'm'));
+
+   return err;
+}
+static int spidev_conf_bpw(struct spi_device *spi, u32 tmp)
+{
+   u8 save = spi->bits_per_word;
+   int err = 0;
+
+   spi->bits_per_word = tmp;
+   err = spi_setup(spi);
+   if (err < 0)
+   spi->bits_per_word = save;
+   else
+   dev_dbg(&spi->dev, "%d bits per word\n", tmp);
+
+   return err;
+}
+static int spidev_conf_speedhz(struct spi_device *spi, u32 tmp)
+{
+   u32 save = spi->max_speed_hz;
+   int err = 0;
+
+   spi->max_speed_hz = tmp;
+   err = spi_setup(spi);
+   if (err < 0)
+   spi->max_speed_hz = save;
+   else
+   dev_dbg(&spi->dev, "%d Hz (max)\n", tmp);
+
+   return err;
+}
+
+/* Return to user space the current SPI configuration */
+static ssize_t spidev_show(struct device *dev, struct device_attribute *attr,
+   char *buf)
+{
+   struct spidev_config_attr *sattr = to_spidev_attr(attr);
+   struct spidev_data *spidev;
+   struct spi_device *spi;
+   ssize_t count = 0;
+
+   spidev = spi_get_drvdata(to_spi_device(dev));
+
+   spin_lock_irq(&spidev->spi_lock);
+   spi = spi_dev_get(spidev->spi);
+   spin_unlock_irq(&spidev->spi_lock);
+
+   mutex_lock(&spidev->buf_lock);
+   switch (sattr->cmd) {
+   case SPIDEV_MODE:
+   count = sprintf(buf, "%d\n", (spi->mode & SPI_MODE_MASK));
+   break;
+   case SPIDEV_LSB_FIRST:
+   count = sprintf(buf, "%d\n",
+   ((spi->mode & SPI_LSB_FIRST) ?  1 : 0));
+   break;
+   case SPIDEV_BIT_PER_WORD:
+   count = sprintf(buf, "%d\n", spi->bits_per_word);
+   break;
+   case SPIDEV_SPEED_HZ:
+   count = sprintf(buf, "%d\n", spi->max_speed_hz);
+   break;
+   }
+   mutex_unlock(&spidev->buf_lock);
+   spi_dev_put(spi);
+
+   return count;
+}
+/* Configure the SPI from userspace */
+static ssize_t spidev_store(struct device *dev, struct device_attribute *attr,
+   const char *buf, size_t count)
+{
+   struct spidev_config_attr *sattr = to_spidev_attr(attr);
+   struct spidev_data *spidev;
+   struct spi_device *spi;
+   int err = 0;
+   u32 tmp;
+
+   spidev = spi_get_drvdata(to_spi_device(dev));
+
+   spin_lock_irq(&spidev->spi_lock);
+   spi = spi_dev_get(spidev->spi);
+   spin_unlock_irq(&spidev->spi_lock);
+
+   
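
The cat/echo usage shown in the commit message above could equally be driven
from a small C program. The following is a sketch only: the helpers
sysfs_read_u32() and sysfs_write_u32() are hypothetical names, and the
attribute path (for example /sys/bus/spi/devices/spi1.0/speed-hz) exists only
if this RFC patch is applied.

```c
#include <assert.h>
#include <stdio.h>

/* Read one unsigned decimal value from a sysfs-style text file. */
static int sysfs_read_u32(const char *path, unsigned int *val)
{
	FILE *f;
	int ok;

	assert(path != NULL && val != NULL);
	f = fopen(path, "r");
	if (!f)
		return -1;
	ok = (fscanf(f, "%u", val) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/* Write one unsigned decimal value, followed by a newline, like echo does. */
static int sysfs_write_u32(const char *path, unsigned int val)
{
	FILE *f;
	int ok;

	assert(path != NULL);
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = (fprintf(f, "%u\n", val) > 0);
	/* Buffered data is flushed on close, so check that as well. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

A caller would pass the attribute path and check the return value, since a
store the driver rejects is expected to surface as a failed write.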

[PATCH] Remove unnecessary declarations from Documentation/accounting/getdelays.c

2012-11-24 Thread Anthony G. Basile
From: "Anthony G. Basile" 

stime and utime are declared __u64 but are never used.  On a glibc system
this is harmless lint, but on a uClibc system, because of the difference
in the way header files stack, including stdio.h brings in time.h and
this causes a name collision with stime.  Since these are useless anyhow,
we remove them.

Signed-off-by: Anthony G. Basile 
---
 Documentation/accounting/getdelays.c |1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/Documentation/accounting/getdelays.c 
b/Documentation/accounting/getdelays.c
index 6f706ac..f8ebcde 100644
--- a/Documentation/accounting/getdelays.c
+++ b/Documentation/accounting/getdelays.c
@@ -51,7 +51,6 @@ int dbg;
 int print_delays;
 int print_io_accounting;
 int print_task_context_switch_counts;
-__u64 stime, utime;
 
 #define PRINTF(fmt, arg...) {  \
if (dbg) {  \
-- 
1.7.6.1



Re: [RFC PATCH v3 2/3] acpi_memhotplug: Add prepare_remove operation

2012-11-24 Thread Wen Congyang
At 2012/11/24 1:50, Vasilis Liaskovitis Wrote:
> Offlining and removal of memory is now done in the prepare_remove callback,
> not in the remove callback.
> 
> The prepare_remove callback will be called when trying to remove a memory 
> device
> with the following ways:
> 
> 1. send eject request by SCI
> 2. echo 1>/sys/bus/pci/devices/PNP0C80:XX/eject
> 
> Note that unbinding the acpi driver from a memory device with:
> echo "PNP0C80:XX">  /sys/bus/acpi/drivers/acpi_memhotplug/unbind
> 
> will no longer try to remove the memory. This is in compliance with normal
> unbind driver core semantics, see the discussion in v2 of this patchset:
> https://lkml.org/lkml/2012/11/16/649

If we don't remove it when unbinding, it may cause a kernel panic.

I have explained this in another mail.

Thanks
Wen Congyang

> 
> After a successful unbind of the driver:
> - OSPM ejects of the memory device cannot proceed, as acpi_eject_store will
> return -ENODEV on missing driver.
> - SCI ejects of the memory device also cannot proceed, as they will also get
> a "driver data is NULL" error.
> So the memory can continue to be used safely after unbind.
> 
> Signed-off-by: Vasilis Liaskovitis
> ---
>   drivers/acpi/acpi_memhotplug.c |   18 --
>   1 files changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
> index eb30e5a..d0cfbd9 100644
> --- a/drivers/acpi/acpi_memhotplug.c
> +++ b/drivers/acpi/acpi_memhotplug.c
> @@ -55,6 +55,7 @@ MODULE_LICENSE("GPL");
> 
>   static int acpi_memory_device_add(struct acpi_device *device);
>   static int acpi_memory_device_remove(struct acpi_device *device, int type);
> +static int acpi_memory_device_prepare_remove(struct acpi_device *device);
> 
>   static const struct acpi_device_id memory_device_ids[] = {
>   {ACPI_MEMORY_DEVICE_HID, 0},
> @@ -69,6 +70,7 @@ static struct acpi_driver acpi_memory_device_driver = {
>   .ops = {
>   .add = acpi_memory_device_add,
>   .remove = acpi_memory_device_remove,
> + .prepare_remove = acpi_memory_device_prepare_remove,
>   },
>   };
> 
> @@ -448,6 +450,20 @@ static int acpi_memory_device_add(struct acpi_device 
> *device)
>   static int acpi_memory_device_remove(struct acpi_device *device, int type)
>   {
>   struct acpi_memory_device *mem_device = NULL;
> +
> + if (!device || !acpi_driver_data(device))
> + return -EINVAL;
> +
> + mem_device = acpi_driver_data(device);
> +
> + acpi_memory_device_free(mem_device);
> +
> + return 0;
> +}
> +
> +static int acpi_memory_device_prepare_remove(struct acpi_device *device)
> +{
> + struct acpi_memory_device *mem_device = NULL;
>   int result;
> 
>   if (!device || !acpi_driver_data(device))
> @@ -459,8 +475,6 @@ static int acpi_memory_device_remove(struct acpi_device 
> *device, int type)
>   if (result)
>   return result;
> 
> - acpi_memory_device_free(mem_device);
> -
>   return 0;
>   }
> 


Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario

2012-11-24 Thread Wen Congyang
At 2012/11/24 1:50, Vasilis Liaskovitis Wrote:
> Consider the following sequence of operations for a hotplugged memory device:
> 
> 1. echo "PNP0C80:XX">  /sys/bus/acpi/drivers/acpi_memhotplug/unbind
> 2. echo "PNP0C80:XX">  /sys/bus/acpi/drivers/acpi_memhotplug/bind
> 3. echo 1>/sys/bus/pci/devices/PNP0C80:XX/eject
> 
> The driver is successfully re-bound to the device in step 2. However step 3 
> will
> not attempt to remove the memory. This is because the acpi_memory_info enabled
> bit for the newly bound driver has not been set to 1. This bit needs to be set
> in the case where the memory is already used by the kernel (add_memory returns
> -EEXIST)

Hmm, I think the reason is that we don't offline/remove memory when
unbinding it from the driver. I have sent a patch to fix this problem,
and this patch is in the pm tree now. With this patch, we will
offline/remove memory when unbinding it from the driver.

Consider the following sequence of operations for a hotplugged memory device:

1. echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind
2. echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject

If we don't offline/remove the memory, we have no chance to do it in
step 2. After step 2, the memory is still used by the kernel, but we
have powered it off. This is very dangerous.

So this patch is unnecessary now.

Thanks
Wen Congyang

> 
> Setting the enabled bit in this case (in acpi_memory_enable_device) makes the
> driver function properly after a rebind of the driver i.e. eject operation
> attempts to remove memory after a successful rebind.
> 
> I am not sure if this breaks some other usage of the enabled bit (see commit
> 65479472). When is it possible for the memory to be in use by the kernel but
> not managed by the acpi driver, apart from a driver unbind scenario?
> 
> Perhaps the patch is not needed, depending on expected semantics of 
> re-binding.
> Is the newly bound driver supposed to manage the device, if it was earlier
> managed by the same driver?
> 
> This patch is only specific to this scenario, and can be dropped from the 
> patch
> series if needed.
> 
> Signed-off-by: Vasilis Liaskovitis
> ---
>   drivers/acpi/acpi_memhotplug.c |3 +--
>   1 files changed, 1 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
> index d0cfbd9..0562cb4 100644
> --- a/drivers/acpi/acpi_memhotplug.c
> +++ b/drivers/acpi/acpi_memhotplug.c
> @@ -271,12 +271,11 @@ static int acpi_memory_enable_device(struct 
> acpi_memory_device *mem_device)
>   continue;
>   }
> 
> - if (!result)
> - info->enabled = 1;
>   /*
>* Add num_enable even if add_memory() returns -EEXIST, so the
>* device is bound to this driver.
>*/
> + info->enabled = 1;
>   num_enabled++;
>   }
>   if (!num_enabled) {


Re: [PATCH v2] iio: adc: Add Texas Instruments ADC081C021/027 support

2012-11-24 Thread Jonathan Cameron
On 11/24/2012 03:54 PM, Lars-Peter Clausen wrote:
> On 11/24/2012 11:54 AM, Jonathan Cameron wrote:
>> On 11/23/2012 03:13 PM, Thierry Reding wrote:
>>> Add support for reading conversion results from the ADC and provide them
>>> through a single IIO channel. A proper scaling factor is also exported
>>> based on the reference voltage provided by a regulator.
>>>
>>> Signed-off-by: Thierry Reding 
>>
>> Looks good to me.  I think timing is against you (depending on what Linus
>> says with his next rc).  IIO patches are routed through Greg KH. His cut
>> off is 1 week before the merge window opens.  Mine tends to as a result
>> be a few days before that.  Linus stated in last rc message that the one 
>> he'll
>> do today or tomorrow will be the last for this cycle (and hence merge window
>> will open in a week from now). Hence this will probably hit linux next after
>> the merge window closes and merge in the 3.9 cycle.
>>
>> Shall I add a reviewed by from you Lars? Looks like Thierry has directly
>> addressed all your comments (thanks for doing the review by the way!)
> 
> Yes, Reviewed-by: Lars-Peter Clausen 
Added to togreg branch of iio.git.

Note this was a little interesting as the patch was (I guess) generated against
linux next.  The am35xx driver is coming in via a different tree so the context
was wrong.  I've fixed up for iio.git but it'll cause trouble the other way
around at some point...
> 
>>
>>> ---
>>> Changes in v2:
>>> - use the more common IIO_VAL_FRACTIONAL_LOG2 instead of 
>>> IIO_VAL_INT_PLUS_MICRO
>>>   for the ADC scale factor
>>> - make the channel specification static const since it is the same for
>>>   all devices
>>> - convert the scale factor such that the result of multiplying the raw
>>>   value with the scale factor yields a voltage in millivolts
>>>
>>>  drivers/iio/adc/Kconfig  |  10 +++
>>>  drivers/iio/adc/Makefile |   1 +
>>>  drivers/iio/adc/ti-adc081c.c | 161 
>>> +++
>>>  3 files changed, 172 insertions(+)
>>>  create mode 100644 drivers/iio/adc/ti-adc081c.c
>>>
>>> diff --git a/drivers/iio/adc/Kconfig b/drivers/iio/adc/Kconfig
>>> index b719f3b..e8be025 100644
>>> --- a/drivers/iio/adc/Kconfig
>>> +++ b/drivers/iio/adc/Kconfig
>>> @@ -91,6 +91,16 @@ config MAX1363
>>>   max11646, max11647) Provides direct access via sysfs and buffered
>>>   data via the iio dev interface.
>>>
>>> +config TI_ADC081C
>>> +   tristate "Texas Instruments ADC081C021/027"
>>> +   depends on I2C
>>> +   help
>>> + If you say yes here you get support for Texas Instruments ADC081C021
>>> + and ADC081C027 ADC chips.
>>> +
>>> + This driver can also be built as a module. If so, the module will be
>>> + called ti-adc081c.
>>> +
>>>  config TI_AM335X_ADC
>>> tristate "TI's ADC driver"
>>> depends on MFD_TI_AM335X_TSCADC
>>> diff --git a/drivers/iio/adc/Makefile b/drivers/iio/adc/Makefile
>>> index 19d709c..6ad20aa 100644
>>> --- a/drivers/iio/adc/Makefile
>>> +++ b/drivers/iio/adc/Makefile
>>> @@ -10,4 +10,5 @@ obj-$(CONFIG_AD7887) += ad7887.o
>>>  obj-$(CONFIG_AT91_ADC) += at91_adc.o
>>>  obj-$(CONFIG_LP8788_ADC) += lp8788_adc.o
>>>  obj-$(CONFIG_MAX1363) += max1363.o
>>> +obj-$(CONFIG_TI_ADC081C) += ti-adc081c.o
>>>  obj-$(CONFIG_TI_AM335X_ADC) += ti_am335x_adc.o
>>> diff --git a/drivers/iio/adc/ti-adc081c.c b/drivers/iio/adc/ti-adc081c.c
>>> new file mode 100644
>>> index 000..f4a46dd
>>> --- /dev/null
>>> +++ b/drivers/iio/adc/ti-adc081c.c
>>> @@ -0,0 +1,161 @@
>>> +/*
>>> + * Copyright (C) 2012 Avionic Design GmbH
>>> + *
>>> + * This program is free software; you can redistribute it and/or modify
>>> + * it under the terms of the GNU General Public License version 2 as
>>> + * published by the Free Software Foundation.
>>> + */
>>> +
>>> +#include 
>>> +#include 
>>> +#include 
>>> +
>>> +#include 
>>> +#include 
>>> +
>>> +struct adc081c {
>>> +   struct i2c_client *i2c;
>>> +   struct regulator *ref;
>>> +};
>>> +
>>> +#define REG_CONV_RES 0x00
>>> +
>>> +static int adc081c_read_raw(struct iio_dev *iio,
>>> +   struct iio_chan_spec const *channel, int *value,
>>> +   int *shift, long mask)
>>> +{
>>> +   struct adc081c *adc = iio_priv(iio);
>>> +   int err;
>>> +
>>> +   switch (mask) {
>>> +   case IIO_CHAN_INFO_RAW:
>>> +   err = i2c_smbus_read_word_swapped(adc->i2c, REG_CONV_RES);
>>> +   if (err < 0)
>>> +   return err;
>>> +
>>> +   *value = (err >> 4) & 0xff;
>>> +   return IIO_VAL_INT;
>>> +
>>> +   case IIO_CHAN_INFO_SCALE:
>>> +   err = regulator_get_voltage(adc->ref);
>>> +   if (err < 0)
>>> +   return err;
>>> +
>>> +   *value = err / 1000;
>>> +   *shift = 8;
>>> +
>>> +   return IIO_VAL_FRACTIONAL_LOG2;
>>> +
>>> +   default:
>>> +   break;
>>> +   }
>>> +
>>> +   return -EINVAL;
>>> +}
>>> +
>>> +static const struct iio_chan_spec 

Re: [PATCH 5/7] uprobes: Introduce filter_chain()

2012-11-24 Thread Oleg Nesterov
On 11/23, Oleg Nesterov wrote:
>
> Change install_breakpoint() to call filter_chain() instead of checking
> uprobe->consumers != NULL. We obviously need this, and this equally
> closes the race with _unregister().
>
> Change remove_breakpoint() to call this helper too. Currently this is
> pointless because remove_breakpoint() is only called when the last
> consumer goes away, but we will change this.

Just in case...

This is only to make the initial change as simple as possible. filter_chain()
will have more arguments and more callers, say, perhaps build_map_info().
And perhaps these 2 callsites should be moved from install/remove to the
caller later.

Oleg.



Re: [PATCH v2] iio: adc: Add Texas Instruments ADC081C021/027 support

2012-11-24 Thread Lars-Peter Clausen
On 11/24/2012 11:54 AM, Jonathan Cameron wrote:
> On 11/23/2012 03:13 PM, Thierry Reding wrote:
>> Add support for reading conversion results from the ADC and provide them
>> through a single IIO channel. A proper scaling factor is also exported
>> based on the reference voltage provided by a regulator.
>>
>> Signed-off-by: Thierry Reding 
> 
> Looks good to me.  I think timing is against you (depending on what Linus
> says with his next rc).  IIO patches are routed through Greg KH. His cut
> off is 1 week before the merge window opens.  Mine tends to as a result
> be a few days before that.  Linus stated in last rc message that the one he'll
> do today or tomorrow will be the last for this cycle (and hence merge window
> will open in a week from now). Hence this will probably hit linux next after
> the merge window closes and merge in the 3.9 cycle.
> 
> Shall I add a reviewed by from you Lars? Looks like Thierry has directly
> addressed all your comments (thanks for doing the review by the way!)

Yes, Reviewed-by: Lars-Peter Clausen 

> 
>> ---
>> Changes in v2:
>> - use the more common IIO_VAL_FRACTIONAL_LOG2 instead of 
>> IIO_VAL_INT_PLUS_MICRO
>>   for the ADC scale factor
>> - make the channel specification static const since it is the same for
>>   all devices
>> - convert the scale factor such that the result of multiplying the raw
>>   value with the scale factor yields a voltage in millivolts
>>
>>  drivers/iio/adc/Kconfig  |  10 +++
>>  drivers/iio/adc/Makefile |   1 +
>>  drivers/iio/adc/ti-adc081c.c | 161 
>> +++
>>  3 files changed, 172 insertions(+)
>>  create mode 100644 drivers/iio/adc/ti-adc081c.c
>>
>> diff --git a/drivers/iio/adc/Kconfig b/drivers/iio/adc/Kconfig
>> index b719f3b..e8be025 100644
>> --- a/drivers/iio/adc/Kconfig
>> +++ b/drivers/iio/adc/Kconfig
>> @@ -91,6 +91,16 @@ config MAX1363
>>max11646, max11647) Provides direct access via sysfs and buffered
>>data via the iio dev interface.
>>
>> +config TI_ADC081C
>> +tristate "Texas Instruments ADC081C021/027"
>> +depends on I2C
>> +help
>> +  If you say yes here you get support for Texas Instruments ADC081C021
>> +  and ADC081C027 ADC chips.
>> +
>> +  This driver can also be built as a module. If so, the module will be
>> +  called ti-adc081c.
>> +
>>  config TI_AM335X_ADC
>>  tristate "TI's ADC driver"
>>  depends on MFD_TI_AM335X_TSCADC
>> diff --git a/drivers/iio/adc/Makefile b/drivers/iio/adc/Makefile
>> index 19d709c..6ad20aa 100644
>> --- a/drivers/iio/adc/Makefile
>> +++ b/drivers/iio/adc/Makefile
>> @@ -10,4 +10,5 @@ obj-$(CONFIG_AD7887) += ad7887.o
>>  obj-$(CONFIG_AT91_ADC) += at91_adc.o
>>  obj-$(CONFIG_LP8788_ADC) += lp8788_adc.o
>>  obj-$(CONFIG_MAX1363) += max1363.o
>> +obj-$(CONFIG_TI_ADC081C) += ti-adc081c.o
>>  obj-$(CONFIG_TI_AM335X_ADC) += ti_am335x_adc.o
>> diff --git a/drivers/iio/adc/ti-adc081c.c b/drivers/iio/adc/ti-adc081c.c
>> new file mode 100644
>> index 000..f4a46dd
>> --- /dev/null
>> +++ b/drivers/iio/adc/ti-adc081c.c
>> @@ -0,0 +1,161 @@
>> +/*
>> + * Copyright (C) 2012 Avionic Design GmbH
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + */
>> +
>> +#include 
>> +#include 
>> +#include 
>> +
>> +#include 
>> +#include 
>> +
>> +struct adc081c {
>> +struct i2c_client *i2c;
>> +struct regulator *ref;
>> +};
>> +
>> +#define REG_CONV_RES 0x00
>> +
>> +static int adc081c_read_raw(struct iio_dev *iio,
>> +struct iio_chan_spec const *channel, int *value,
>> +int *shift, long mask)
>> +{
>> +struct adc081c *adc = iio_priv(iio);
>> +int err;
>> +
>> +switch (mask) {
>> +case IIO_CHAN_INFO_RAW:
>> +err = i2c_smbus_read_word_swapped(adc->i2c, REG_CONV_RES);
>> +if (err < 0)
>> +return err;
>> +
>> +*value = (err >> 4) & 0xff;
>> +return IIO_VAL_INT;
>> +
>> +case IIO_CHAN_INFO_SCALE:
>> +err = regulator_get_voltage(adc->ref);
>> +if (err < 0)
>> +return err;
>> +
>> +*value = err / 1000;
>> +*shift = 8;
>> +
>> +return IIO_VAL_FRACTIONAL_LOG2;
>> +
>> +default:
>> +break;
>> +}
>> +
>> +return -EINVAL;
>> +}
>> +
>> +static const struct iio_chan_spec adc081c_channel = {
>> +.type = IIO_VOLTAGE,
>> +.info_mask = IIO_CHAN_INFO_SCALE_SHARED_BIT |
>> + IIO_CHAN_INFO_RAW_SEPARATE_BIT,
>> +};
>> +
>> +static const struct iio_info adc081c_info = {
>> +.read_raw = adc081c_read_raw,
>> +.driver_module = THIS_MODULE,
>> +};
>> +
>> +static int adc081c_probe(struct i2c_client *client,
>> + const struct i2c_device_id *id)
>> +{
>> +struct iio_dev *iio;
>> + 
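
As a worked example of the scale handling in the quoted adc081c_read_raw():
with IIO_VAL_FRACTIONAL_LOG2 the scale is value / 2^shift, so a reference
voltage in millivolts with shift = 8 turns an 8-bit raw reading into
millivolts. The sketch below is a userspace illustration only; the helper
names adc081c_raw_from_word() and adc081c_raw_to_mv() are hypothetical and are
not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Extract the 8-bit conversion result from the 16-bit register word,
 * mirroring the "(err >> 4) & 0xff" expression in the quoted patch.
 */
static unsigned int adc081c_raw_from_word(uint16_t word)
{
	return (word >> 4) & 0xff;
}

/*
 * Apply the exported scale to a raw reading:
 *   mV = raw * (vref_uv / 1000) / 2^8
 * vref_uv is the reference voltage in microvolts, which is the unit
 * returned by regulator_get_voltage().
 */
static unsigned int adc081c_raw_to_mv(unsigned int raw, unsigned int vref_uv)
{
	unsigned int vref_mv = vref_uv / 1000;

	assert(raw <= 0xff);
	return (raw * vref_mv) >> 8;
}
```

For a 3.3 V reference, a mid-scale reading of 0x80 comes out as 1650 mV.
Userspace normally performs this multiplication itself from the raw and scale
attributes; the integer shift here truncates instead of rounding.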

Re: Streamlining Developer's Certificate of Origin, Signed-off-by tag

2012-11-24 Thread W. Trevor King
On Wed, Nov 21, 2012 at 12:10:43AM +, Alan Cox wrote:
> > Not just a separate document but project / github / whatever given
> > that other projects are referring to it now, and we stand to gain more
> > in the community by streamlining it more and making it ubiquitous.
> 
> Cutting and pasting it somewhere works (subject to whatever licensing
> it may have itself), as does having a list and a location for a copy, but
> you still want it in the tree proper.

I'm transitioning a project to the DCO-1.1 and trying to get its
licensing straightened out.  The DCO-1.0 [1] and DCO-1.1 [2] commits
were both by Linus, but lacked SOB lines in the commit.  The initial
proposal was also by Linus [3] (and this initial DCO version is what
was committed in [1]).  However, the OSDL (where Linus was working at
the time) seems to claim copyright for itself and claims
CC-BY-SA-2.5-generic [4].  Strangely, the DCO-1.1 text listed on the
archived OSDL page does not match the DCO-1.1 text in the
kernel tree (the differences look minor to me, but I'm not a lawyer).

So.  What license is the DCO distributed under and who holds
copyright?

Cheers,
Trevor

[1]: From: Linus Torvalds
 Subject: Start documenting the sign-off procedure in SubmittingPatches
 Date: 2004-06-01 19:13:52 GMT
 Gmane: http://permalink.gmane.org/gmane.linux.kernel.commits.head/33254

 ChangeSet 1.1726.1.148, 2004/06/01 12:13:52-07:00, torvalds…

[2]: commit cbd83da82b15292337ff2b71e619c9a3a95f6d80
 Author: Linus Torvalds 
 Date:   Mon Jun 13 17:51:55 2005 -0700

   Update DCO ("signoff") rules to 1.1

[3]: From: Linus Torvalds  osdl.org>
 Subject: [RFD] Explicitly documenting patch submission
 Date: 2004-05-23 06:46:29 GMT
 Gmane: http://article.gmane.org/gmane.linux.kernel/205867

[4]: 
http://web.archive.org/web/20070306195036/http://osdlab.org/newsroom/press_releases/2004/2004_05_24_dco.html

-- 
This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy




Re: Problem in Page Cache Replacement

2012-11-24 Thread Metin Döşlü
On Thu, Nov 22, 2012 at 5:41 PM, Fengguang Wu  wrote:
> On Wed, Nov 21, 2012 at 12:07:22PM +0200, Metin Döşlü wrote:
>> On Wed, Nov 21, 2012 at 12:00 PM, Jaegeuk Hanse  
>> wrote:
>> >
>> > On 11/21/2012 05:58 PM, metin d wrote:
>> >
>> > Hi Fengguang,
>> >
>> > I ran the tests and attached the results. I guess the line below shows the
>> > data-1 page caches.
>> >
>> > 0x0008006c  6584051  25718  __RU_lA___P  referenced,uptodate,lru,active,private
>> >
>> >
>> > I think this is just one state of page cache pages.
>>
>> But why are these page caches in this state, as opposed to the other page
>> caches? From the results I conclude that:
>>
>> data-1 pages are in state : referenced,uptodate,lru,active,private
>
> I wonder if it's this code that stops data-1 pages from being
> reclaimed:
>
> shrink_page_list():
>
> if (page_has_private(page)) {
>         if (!try_to_release_page(page, sc->gfp_mask))
>                 goto activate_locked;
>
> What's the filesystem used?

It was ext3.

>> data-2 pages are in state : referenced,uptodate,lru,mappedtodisk
>
> Thanks,
> Fengguang



-- 
Metin Döşlü
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 23/24] MAINTAINERS: fix drivers/staging/sm7xx/

2012-11-24 Thread Javier Muñoz
Acked-by: Javier Muñoz 

Thanks Cesar!

Javier

On 11/24/2012 01:26 AM, Cesar Eduardo Barros wrote:
> This directory was moved to drivers/staging/sm7xxfb/ by commit 925aa66
> (staging: sm7xxfb: sm7xx becomes sm7xxfb).
> 
> Cc: Teddy Wang 
> Cc: Javier M. Mellid 
> Signed-off-by: Cesar Eduardo Barros 
> ---
>  MAINTAINERS | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 328ba4f..e8990d2 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -7073,7 +7073,7 @@ F:  drivers/staging/rtl8712/
>  STAGING - SILICON MOTION SM7XX FRAME BUFFER DRIVER
>  M:   Teddy Wang 
>  S:   Odd Fixes
> -F:   drivers/staging/sm7xx/
> +F:   drivers/staging/sm7xxfb/
>  
>  STAGING - SOFTLOGIC 6x10 MPEG CODEC
>  M:   Ben Collins 



Re: [PATCH 3/4] ARM: Dove: Convert to DT GPIO and pinctrl

2012-11-24 Thread Jason Cooper
On Sat, Nov 24, 2012 at 08:02:40AM +0100, Thomas Petazzoni wrote:
> Dear Jason Cooper,
> 
> On Fri, 23 Nov 2012 21:39:42 -0500, Jason Cooper wrote:
> > > + pinctrl: pinctrl@d0200 {
> > > + compatible = "marvell,dove-pinctrl";
> > > + reg = <0xd0200 0x10>;
> > > + clocks = <&gate_clk 22>;
> > 
> > The above line broke the dtbs build target for dove_defconfig.  I have
> > removed it.  Please let me know if that is not the correct answer.  This
> > was the only occurrence of 'clk' in arch/arm/boot/dts/dove* .
> 
> Are you sure you merged
> 
>  [PATCH 7/8] ARM: dove: switch to DT clock providers ?
> 
> This one clearly adds gate_clk in dove.dtsi. This patch was part of the
> pull request:
> 
> Subject: [GIT PULL v3] core, cpu and gated clocks for mvebu
> Date: Tue, 20 Nov 2012 15:31:08 +0100

Yes, so that's what I thought happened.  This would have made orion/dt
depend upon mvebu/everything.  It already had two other dependencies.
Not ideal.

The good thing is, the build is not broken.  Once v3.8-rc1 drops with
all of our stuff merged, I'll post a fixup patch adding this back in.

thx,

Jason.


Re: arch_check_bp_in_kernelspace: fix the range check

2012-11-24 Thread Amnon Shiloh
Hi Oleg,

This patch may look ugly, but it is one way to solve my problem.

This way, "strace" too, which is broken since the introduction of
the vsyscall page, will again be able to report when the program
calls "time()" or "gettimeofday()" - currently it cannot!

I think that allowing the x86 debug registers to be set on the
vsyscall page is more elegant - but do whatever you prefer.

Best Regards,
Amnon.



> forgot to mention...
> 
> On 11/23, Oleg Nesterov wrote:
> >
> > On 11/23, Amnon Shiloh wrote:
> > >
> > > Or, there is an alternative: if only I (the ptracer or the traced process)
> > > was allowed to munmap the vsyscall page,
> >
> > It is not possible to unmap it. The kernel (swapper_pg_dir) has this
> > mapping, not the process. Unlike vdso. IOW, you can only "unmap" it
> > globally and obviously you can't do this from the userspace.
> 
> And even if this were possible, this can't help. Please look at
> __bad_area_nosemaphore()->emulate_vsyscall(), the process won't get
> SIGSEGV. IOW, in fact EMULATE already "unmaps" this page (sets _NX)
> to trigger the fault.
> 
> Sure, we can do something like below, but it doesn't look very nice
> too.
> 
> Oleg.
> 
> --- x/arch/x86/mm/fault.c
> +++ x/arch/x86/mm/fault.c
> @@ -744,7 +744,8 @@ __bad_area_nosemaphore(struct pt_regs *r
>*/
>   if (unlikely((error_code & PF_INSTR) &&
>((address & ~0xfff) == VSYSCALL_START))) {
> - if (emulate_vsyscall(regs, address))
> + if (!(tsk->ptrace & PTRACE_O_DONTEMULATE) &&
> + emulate_vsyscall(regs, address))
>   return;
>   }
>  #endif
> 
> 
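The address test in the quoted fault.c hunk, (address & ~0xfff) == VSYSCALL_START, can be tried out in userspace. The sketch below assumes the legacy x86-64 vsyscall page at 0xffffffffff600000 with its three entry points spaced 0x400 bytes apart (gettimeofday, time, getcpu). The function names are hypothetical, and this is only a model of the check, not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fixed address of the legacy x86-64 vsyscall page (assumed here). */
#define VSYSCALL_START UINT64_C(0xffffffffff600000)

/* Same page test as the quoted hunk: mask off the low 12 bits. */
static bool addr_in_vsyscall_page(uint64_t address)
{
	return (address & ~UINT64_C(0xfff)) == VSYSCALL_START;
}

/* Map an entry address to its slot (0..2), or -1 if it is not an entry. */
static int vsyscall_slot(uint64_t address)
{
	int nr;

	if ((address & ~UINT64_C(0xc00)) != VSYSCALL_START)
		return -1;
	nr = (int)((address & UINT64_C(0xc00)) >> 10);
	return nr < 3 ? nr : -1;
}

/* Name of the call a tracer would have to emulate for a given slot. */
static const char *vsyscall_name(int slot)
{
	static const char *const names[] = { "gettimeofday", "time", "getcpu" };

	assert(slot >= 0 && slot < 3);
	return names[slot];
}
```

A tracer that stops its tracee on entry to the page could use such a lookup to decide which call to emulate before adjusting the program counter.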



[GIT PULL] sound fix for 3.7-rc7

2012-11-24 Thread Takashi Iwai
Linus,

The following changes since commit 947d299686aa9cc8aecf749d54e8475c6e498956:

  ALSA: snd-usb: properly initialize the sync endpoint (2012-11-22 21:22:33 +0100)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git tags/sound-3.7

for you to fetch changes up to d846b17475d52f037437d125cd19c28f1d36e4f0:

  ALSA: hda - Fix build without CONFIG_PM (2012-11-24 12:00:43 +0100)


Sound fix #2 for 3.7-rc7

Only a single commit for fixing the build error without CONFIG_PM
in hda driver.


Takashi Iwai (1):
  ALSA: hda - Fix build without CONFIG_PM

 sound/pci/hda/hda_codec.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)


Re: arch_check_bp_in_kernelspace: fix the range check

2012-11-24 Thread Amnon Shiloh
Hi Oleg,

> Hello Amnon,
> 
> I am a bit confused,

So let's get things in order.

1) I asked for the ability to set hardware breakpoints on the vsyscall
   page (x86 debug registers), so that the ptracer can stop the process
   whenever it attempts to jump there, then the ptracer can emulate those
   system calls instead (gettimeofday, time, getcpu).

   That would solve all my problems, because the traced process will
   never even enter the vsyscall page (the ptracer will adjust its
   program-counter).

2) I was then told (in my own words): "oh, don't worry, the vsyscall page
   has now been minimized, all it contains now is *real* system calls,
   and it always calls them".

   [as a side-issue I was introduced to the new VDSO, had some issues there
and solved them separately, so we are back on the original topic]

3) I was thinking to myself - well, that's fine, if the vsyscall now
   always invokes a *real* system-call (and nothing else), then the
   ptracer can catch it just like any other system-call using
   PTRACE_SYSCALL (or PTRACE_SYSEMU), and emulate it as usual,
   vsyscall-or-no-vsyscall.

4) I made some tests and found that I was wrong in my assumption:
   PTRACE_SYSCALL does NOT work within the vsyscall page (nor does
   PTRACE_SINGLESTEP).  Ptracers are not even aware that their tracee
   ever issued a system call there (despite using PTRACE_SYSCALL or
   PTRACE_SYSEMU), so they are unable to emulate it (or even to report
   it, in the case of "strace").

5) Therefore, I still need the original feature - to relax
   "arch_check_bp_in_kernelspace()", or whatever else will allow me
   to set the x86 debug-registers to trap all attempts to enter the
   vsyscall page.

6) I just suggested an alternative: to have the whole vsyscall page
   removed on a per-process basis. I accept your reply that this is
   not possible.

7) I suggested a third alternative: to have the vsyscall page be
   unexecutable on a per-process basis, so attempts to use it will
   incur SIGSEGV.   I understand that this option is still under
   discussion.

8) Any solution that allows a ptracer to prevent its traced process
   from entering the vsyscall page and executing system calls there
   unchecked (thus in effect escaping its jailer) would do for me.
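For readers unfamiliar with the debug registers mentioned in points 1) and 5): an execute breakpoint needs the target address in one of DR0..DR3 and matching enable/type bits in DR7. The sketch below only computes the DR7 value. It assumes the usual x86 layout (local-enable bit at 2*slot, R/W field at 16+4*slot, LEN field at 18+4*slot), and the names are hypothetical. A tracer would normally write the result with PTRACE_POKEUSER at the u_debugreg offsets of struct user; as the thread explains, the kernel currently rejects vsyscall addresses there.

```c
#include <assert.h>
#include <stdint.h>

enum dr7_rw  { DR7_RW_EXECUTE = 0, DR7_RW_WRITE = 1, DR7_RW_READWRITE = 3 };
enum dr7_len { DR7_LEN_1 = 0, DR7_LEN_2 = 1, DR7_LEN_8 = 2, DR7_LEN_4 = 3 };

/* Build the DR7 bits that arm one breakpoint slot (0..3). */
static uint32_t dr7_encode(unsigned int slot, enum dr7_rw rw, enum dr7_len len)
{
	assert(slot < 4);
	/* Execute breakpoints must use the 1-byte length encoding. */
	assert(rw != DR7_RW_EXECUTE || len == DR7_LEN_1);

	return (UINT32_C(1) << (2 * slot)) |            /* local enable  */
	       ((uint32_t)rw << (16 + 4 * slot)) |      /* access type   */
	       ((uint32_t)len << (18 + 4 * slot));      /* access length */
}
```

An execute breakpoint in slot 0, which is what trapping entry to the vsyscall page would need, encodes to just the local-enable bit.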

Best Regards,
Amnon.


> 
> On 11/23, Amnon Shiloh wrote:
> >
> > What I discovered now, is that PTRACE_SYSCALL (also PTRACE_SINGLESTEP)
> > does not work within the vsyscall page, so I cannot trap the kernel-calls
> > there (this is very simple to verify using "gdb" or "strace").
> 
> Sure, but we already discussed this?
> 
> Once again, PTRACE_SYSCALL should work in the NATIVE mode. Obviously it
> won't work in EMULATE mode but we can change emulate_vsyscall() to report
> TRAP_VSYSCALL or even introduce PTRACE_EVENT_VSYSCALL.
> 
> > The necessary patch was already discussed and is very simple.
> 
> Do you mean TRAP_VSYSCALL/PTRACE_EVENT_VSYSCALL above or additional
> in_gate_area_no_mm() check to allow the hw bp?
> 
> > Or, there is an alternative: if only I (the ptracer or the traced process)
> > was allowed to munmap the vsyscall page,
> 
> It is not possible to unmap it. The kernel (swapper_pg_dir) has this
> mapping, not the process. Unlike vdso. IOW, you can only "unmap" it
> globally and obviously you can't do this from the userspace.
> 
> Oleg.
> 
> 



Re: [PATCH 3/3] balancenuma: fix page locking in migrating thp

2012-11-24 Thread Mel Gorman
On Sat, Nov 24, 2012 at 12:27:39PM +0800, Hillf Danton wrote:
> If migrating a THP fails (due to unsuccessful isolation), and the original
> pmd entry has changed after reacquiring the page table lock, it is unsafe to
> release the page lock, as the page may be unstable.
> 
> It is fixed by raising extra page count before trying migration.
> 

Less sure of this one but the locking here is a mess. A reference count
is already taken. Isolating the page takes another reference count so the
first one can be dropped without the page being freed underneath us. The
migration function takes care of unlocking the page on a successful
migration. On unsuccessful migration, the PMD is rechecked under the
lock before clearing the pmd_numa. I'm not sure where you are seeing
the instability but I haven't fired up my brain either (it's the weekend
:)). I'll take a fresh look at this Monday.

Thanks!

-- 
Mel Gorman
SUSE Labs


Re: [PATCH 2/3] balancenuma: free new thp if fail to isolate the old

2012-11-24 Thread Mel Gorman
On Sat, Nov 24, 2012 at 12:18:51PM +0800, Hillf Danton wrote:
> Free the newly allocated THP if isolating the old one fails.
> 
> Signed-off-by: Hillf Danton 

Thanks!

-- 
Mel Gorman
SUSE Labs


Re: [PATCH 1/3] balancenuma: add stats for huge pmd numa faults

2012-11-24 Thread Mel Gorman
On Sat, Nov 24, 2012 at 12:17:03PM +0800, Hillf Danton wrote:
> A thp contributes 512 times more than a regular page to numa fault stats,
> so deserves its own vm event counter. THP migration is also accounted.
> 

I agree and mentioned it needed fixing. I did not create a new counter
but I properly account for PGMIGRATE_SUCCESS and PGMIGRATE_FAIL now. I
did not create a new NUMA_PAGE_MIGRATE counter because I didn't feel it
was necessary. Instead I just do this

count_vm_events(PGMIGRATE_SUCCESS, HPAGE_PMD_NR);
count_vm_numa_events(NUMA_PAGE_MIGRATE, HPAGE_PMD_NR);
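The accounting described above can be modeled in a few lines. This is a userspace sketch, not the kernel implementation: the counter array and account_migration() are hypothetical, and HPAGE_PMD_NR is assumed to be 512 (2 MiB huge pages over 4 KiB base pages), matching the "512 times" figure in the quoted text.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 2 MiB huge pages over 4 KiB base pages, hence 512. */
#define HPAGE_PMD_NR 512UL

enum vm_event {
	PGMIGRATE_SUCCESS,
	PGMIGRATE_FAIL,
	NUMA_PAGE_MIGRATE,
	NR_VM_EVENTS
};

/* Stand-in for the per-CPU VM event counters. */
static unsigned long vm_events[NR_VM_EVENTS];

static void count_vm_events(enum vm_event ev, unsigned long delta)
{
	assert(ev < NR_VM_EVENTS);
	vm_events[ev] += delta;
}

/* Account one migration attempt in units of base pages. */
static void account_migration(bool is_thp, bool success)
{
	unsigned long nr = is_thp ? HPAGE_PMD_NR : 1;

	if (success) {
		count_vm_events(PGMIGRATE_SUCCESS, nr);
		count_vm_events(NUMA_PAGE_MIGRATE, nr);
	} else {
		count_vm_events(PGMIGRATE_FAIL, nr);
	}
}
```

Counting in base pages keeps the existing counters comparable between regular and huge page migrations, which is why no separate THP counter is needed in this scheme.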

> [A duplicated computation of page node idx is cleaned up]
> 

Got it. Thanks

-- 
Mel Gorman
SUSE Labs


[PATCH v6 6/6] USB: forbid memory allocation with I/O during bus reset

2012-11-24 Thread Ming Lei
If a storage interface or USB network interface (the iSCSI case)
exists in the current configuration, memory allocation with
GFP_KERNEL during usb_reset_device() might trigger an I/O transfer
on the storage interface itself and cause a deadlock: the
'us->dev_mutex' is held in .pre_reset(), so the storage
interface can't do I/O transfers when the reset is triggered
by another interface, and the error handling can't complete
if the reset is triggered by the storage interface itself (error handling path).

Cc: Alan Stern 
Cc: Oliver Neukum 
Signed-off-by: Ming Lei 
---
v5:
- use inline memalloc_noio_save()
v4:
- mark current memalloc_noio for every usb device reset
---
 drivers/usb/core/hub.c |   13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index 90accde..2d5cc1c 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -5040,6 +5040,7 @@ int usb_reset_device(struct usb_device *udev)
 {
int ret;
int i;
+   unsigned int noio_flag;
struct usb_host_config *config = udev->actconfig;
 
if (udev->state == USB_STATE_NOTATTACHED ||
@@ -5049,6 +5050,17 @@ int usb_reset_device(struct usb_device *udev)
return -EINVAL;
}
 
+   /*
+    * Don't allocate memory with GFP_KERNEL in current
+    * context to avoid possible deadlock if usb mass
+    * storage interface or usbnet interface(iSCSI case)
+    * is included in current configuration. The easiest
+    * approach is to do it for every device reset,
+    * because the device 'memalloc_noio' flag may not
+    * have been set before resetting the usb device.
+    */
+   noio_flag = memalloc_noio_save();
+
/* Prevent autosuspend during the reset */
usb_autoresume_device(udev);
 
@@ -5093,6 +5105,7 @@ int usb_reset_device(struct usb_device *udev)
}
 
usb_autosuspend_device(udev);
+   memalloc_noio_restore(noio_flag);
return ret;
 }
 EXPORT_SYMBOL_GPL(usb_reset_device);
-- 
1.7.9.5



[PATCH v6 5/6] PM / Runtime: force memory allocation with no I/O during Runtime PM callback

2012-11-24 Thread Ming Lei
This patch applies the introduced memalloc_noio_save() and
memalloc_noio_restore() to force memory allocation with no I/O
during the runtime_resume/runtime_suspend callbacks on devices with
the 'memalloc_noio' flag set.
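The save/restore pattern used around rpm_callback() can be modeled in userspace as follows. This is a sketch under stated assumptions: task_flags stands in for current->flags, the bit value is illustrative, and the model_* names and struct are hypothetical. The point it shows is why the save helper returns the previous state: nested sections then restore correctly.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit; the kernel's real PF_MEMALLOC_NOIO value is not assumed. */
#define PF_MEMALLOC_NOIO 0x1u

/* Stand-in for current->flags of the running task. */
static unsigned int task_flags;

/* Enter a no-I/O allocation section, returning the previous flag state. */
static unsigned int model_noio_save(void)
{
	unsigned int old = task_flags & PF_MEMALLOC_NOIO;

	task_flags |= PF_MEMALLOC_NOIO;
	return old;
}

/* Leave the section, putting the flag back to what it was before. */
static void model_noio_restore(unsigned int old)
{
	assert((old & ~PF_MEMALLOC_NOIO) == 0);
	task_flags = (task_flags & ~PF_MEMALLOC_NOIO) | old;
}

/* Hypothetical device model carrying only the flag the patch tests. */
struct model_dev {
	bool memalloc_noio;
};

typedef int (*pm_callback_t)(struct model_dev *dev);

/* Run a PM callback, in no-I/O context only for flagged devices. */
static int run_pm_callback(pm_callback_t cb, struct model_dev *dev)
{
	unsigned int noio_flag;
	int ret;

	if (!dev->memalloc_noio)
		return cb(dev);

	noio_flag = model_noio_save();
	ret = cb(dev);
	model_noio_restore(noio_flag);
	return ret;
}

/* Example callback: reports whether it ran in no-I/O context. */
static int probe_context(struct model_dev *dev)
{
	(void)dev;
	return (task_flags & PF_MEMALLOC_NOIO) ? 1 : 0;
}
```

Because restore writes back the saved bit rather than clearing it unconditionally, a callback invoked from an outer no-I/O section leaves that outer section intact.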

Cc: Alan Stern 
Cc: Oliver Neukum 
Cc: Rafael J. Wysocki 
Signed-off-by: Ming Lei 
---
v5:
- use inline memalloc_noio_save()
v4:
- runtime_suspend need this too because rpm_resume may wait for
completion of concurrent runtime_suspend, so deadlock still may
be triggered in runtime_suspend path.
---
 drivers/base/power/runtime.c |   32 ++++++++++++++++++++++++++++++--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
index 3e198a0..96d99ea 100644
--- a/drivers/base/power/runtime.c
+++ b/drivers/base/power/runtime.c
@@ -371,6 +371,7 @@ static int rpm_suspend(struct device *dev, int rpmflags)
int (*callback)(struct device *);
struct device *parent = NULL;
int retval;
+   unsigned int noio_flag;
 
trace_rpm_suspend(dev, rpmflags);
 
@@ -480,7 +481,20 @@ static int rpm_suspend(struct device *dev, int rpmflags)
if (!callback && dev->driver && dev->driver->pm)
callback = dev->driver->pm->runtime_suspend;
 
-   retval = rpm_callback(callback, dev);
+   /*
+* Deadlock might be caused if memory allocation with GFP_KERNEL
+* happens inside runtime_suspend callback of one block device's
+* ancestor or the block device itself. Network device might be
+* thought as part of iSCSI block device, so network device and
+* its ancestor should be marked as memalloc_noio.
+*/
+   if (dev->power.memalloc_noio) {
+   noio_flag = memalloc_noio_save();
+   retval = rpm_callback(callback, dev);
+   memalloc_noio_restore(noio_flag);
+   } else {
+   retval = rpm_callback(callback, dev);
+   }
if (retval)
goto fail;
 
@@ -563,6 +577,7 @@ static int rpm_resume(struct device *dev, int rpmflags)
int (*callback)(struct device *);
struct device *parent = NULL;
int retval = 0;
+   unsigned int noio_flag;
 
trace_rpm_resume(dev, rpmflags);
 
@@ -712,7 +727,20 @@ static int rpm_resume(struct device *dev, int rpmflags)
if (!callback && dev->driver && dev->driver->pm)
callback = dev->driver->pm->runtime_resume;
 
-   retval = rpm_callback(callback, dev);
+   /*
+* Deadlock might be caused if memory allocation with GFP_KERNEL
+* happens inside runtime_resume callback of one block device's
+* ancestor or the block device itself. Network device might be
+* thought as part of iSCSI block device, so network device and
+* its ancestor should be marked as memalloc_noio.
+*/
+   if (dev->power.memalloc_noio) {
+   noio_flag = memalloc_noio_save();
+   retval = rpm_callback(callback, dev);
+   memalloc_noio_restore(noio_flag);
+   } else {
+   retval = rpm_callback(callback, dev);
+   }
if (retval) {
__update_runtime_status(dev, RPM_SUSPENDED);
pm_runtime_cancel_pending(dev);
-- 
1.7.9.5



[PATCH v6 4/6] net/core: apply pm_runtime_set_memalloc_noio on network devices

2012-11-24 Thread Ming Lei
Deadlock might be caused by allocating memory with GFP_KERNEL in the
runtime_resume and runtime_suspend callbacks of network devices in the
iSCSI situation, so mark network devices and their ancestors as
'memalloc_noio' with the introduced pm_runtime_set_memalloc_noio().

Cc: "David S. Miller" 
Cc: Eric Dumazet 
Cc: David Decotigny 
Cc: Tom Herbert 
Cc: Ingo Molnar 
Signed-off-by: Ming Lei 
---
v4:
 - call pm_runtime_set_memalloc_noio(ddev, true) after
   device_add
---
 net/core/net-sysfs.c |    5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index bcf02f6..a55d255 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "net-sysfs.h"
@@ -1386,6 +1387,8 @@ void netdev_unregister_kobject(struct net_device * net)
 
remove_queue_kobjects(net);
 
+   pm_runtime_set_memalloc_noio(dev, false);
+
device_del(dev);
 }
 
@@ -1421,6 +1424,8 @@ int netdev_register_kobject(struct net_device *net)
return error;
}
 
+   pm_runtime_set_memalloc_noio(dev, true);
+
return error;
 }
 
-- 
1.7.9.5



[PATCH v6 3/6] block/genhd.c: apply pm_runtime_set_memalloc_noio on block devices

2012-11-24 Thread Ming Lei
This patch applies the introduced pm_runtime_set_memalloc_noio() to
block devices so that the PM core will teach mm not to allocate memory with
GFP_IOFS when calling the runtime_resume and runtime_suspend callbacks
for block devices and their ancestors.

Cc: Jens Axboe 
Signed-off-by: Ming Lei 
---
v5:
- fix code style and one typo
v4:
- call pm_runtime_set_memalloc_noio(ddev, true) after device_add
---
 block/genhd.c |   10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/block/genhd.c b/block/genhd.c
index 9a289d7..1905966 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "blk.h"
 
@@ -532,6 +533,14 @@ static void register_disk(struct gendisk *disk)
return;
}
}
+
+   /*
+    * avoid probable deadlock caused by allocating memory with
+    * GFP_KERNEL in runtime_resume callback of all its ancestor
+    * devices
+    */
+   pm_runtime_set_memalloc_noio(ddev, true);
+
disk->part0.holder_dir = kobject_create_and_add("holders", &ddev->kobj);
disk->slave_dir = kobject_create_and_add("slaves", &ddev->kobj);
 
@@ -661,6 +670,7 @@ void del_gendisk(struct gendisk *disk)
disk->driverfs_dev = NULL;
if (!sysfs_deprecated)
sysfs_remove_link(block_depr, dev_name(disk_to_dev(disk)));
+   pm_runtime_set_memalloc_noio(disk_to_dev(disk), false);
device_del(disk_to_dev(disk));
 }
 EXPORT_SYMBOL(del_gendisk);
-- 
1.7.9.5



[PATCH v6 2/6] PM / Runtime: introduce pm_runtime_set_memalloc_noio()

2012-11-24 Thread Ming Lei
The patch introduces the memalloc_noio flag in 'struct dev_pm_info'
to help the PM core teach mm not to allocate memory with the GFP_KERNEL
flag, in order to avoid a probable deadlock.

As explained in the comment, any GFP_KERNEL allocation inside
runtime_resume() or runtime_suspend() on any device in
the path from a block or network device to the root device
in the device tree may cause a deadlock. The introduced
pm_runtime_set_memalloc_noio() sets or clears the flag on
every device in that path.
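The set/clear walk can be illustrated with a small userspace model. It is only a sketch of the loop in this patch: locking is omitted, the fixed-size child array replaces device_for_each_child(), and struct model_node and the model_* helpers are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHILDREN 4

/* Hypothetical stand-in for struct device with only what the walk needs. */
struct model_node {
	struct model_node *parent;
	struct model_node *children[MAX_CHILDREN];
	size_t nr_children;
	bool memalloc_noio;
};

static void model_add_child(struct model_node *parent, struct model_node *child)
{
	assert(parent->nr_children < MAX_CHILDREN);
	parent->children[parent->nr_children++] = child;
	child->parent = parent;
}

/* True if any direct child still has the flag set. */
static bool model_any_child_noio(const struct model_node *n)
{
	for (size_t i = 0; i < n->nr_children; i++)
		if (n->children[i]->memalloc_noio)
			return true;
	return false;
}

/*
 * Set: mark every node from n up to the root.
 * Clear: walk up, but stop at the first ancestor that still has a
 * flagged child, since that ancestor's flag is owned by the other subtree.
 */
static void model_set_memalloc_noio(struct model_node *n, bool enable)
{
	for (;;) {
		n->memalloc_noio = enable;
		n = n->parent;
		if (!n || (!enable && model_any_child_noio(n)))
			break;
	}
}
```

With two disks behind one hub, clearing the flag for the first disk leaves the hub and root flagged, and clearing it for the second disk then clears the whole path.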

Cc: Alan Stern 
Cc: "Rafael J. Wysocki" 
Signed-off-by: Ming Lei 
---
v5:
- fix code style error
- add comment on clear the device memalloc_noio flag
v4:
- rename memalloc_noio_resume as memalloc_noio
- remove pm_runtime_get_memalloc_noio()
- add comments on pm_runtime_set_memalloc_noio
v3:
- introduce pm_runtime_get_memalloc_noio()
- hold one global lock on pm_runtime_set_memalloc_noio
- hold device power lock when accessing memalloc_noio_resume
  flag suggested by Alan Stern
- implement pm_runtime_set_memalloc_noio without recursion
  suggested by Alan Stern
v2:
- introduce pm_runtime_set_memalloc_noio()
---
 drivers/base/power/runtime.c |   60 ++
 include/linux/pm.h   |1 +
 include/linux/pm_runtime.h   |3 +++
 3 files changed, 64 insertions(+)

diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c
index 3148b10..3e198a0 100644
--- a/drivers/base/power/runtime.c
+++ b/drivers/base/power/runtime.c
@@ -124,6 +124,66 @@ unsigned long pm_runtime_autosuspend_expiration(struct device *dev)
 }
 EXPORT_SYMBOL_GPL(pm_runtime_autosuspend_expiration);
 
+static int dev_memalloc_noio(struct device *dev, void *data)
+{
+   return dev->power.memalloc_noio;
+}
+
+/*
+ * pm_runtime_set_memalloc_noio - Set a device's memalloc_noio flag.
+ * @dev: Device to handle.
+ * @enable: True for setting the flag and False for clearing the flag.
+ *
+ * Set the flag for all devices in the path from the device to the
+ * root device in the device tree if @enable is true, otherwise clear
+ * the flag for devices in the path whose siblings don't set the flag.
+ *
+ * The function should only be called by block device, or network
+ * device driver for solving the deadlock problem during runtime
+ * resume/suspend:
+ *
+ * If memory allocation with GFP_KERNEL is called inside runtime
+ * resume/suspend callback of any one of its ancestors(or the
+ * block device itself), the deadlock may be triggered inside the
+ * memory allocation since it might not complete until the block
+ * device becomes active and the involved page I/O finishes. The
+ * situation is pointed out first by Alan Stern. Network devices
+ * are involved in iSCSI kind of situation.
+ *
+ * The lock of dev_hotplug_mutex is held in the function for handling
+ * hotplug race because pm_runtime_set_memalloc_noio() may be called
+ * in async probe().
+ *
+ * The function should be called between device_add() and device_del()
+ * on the affected device(block/network device).
+ */
+void pm_runtime_set_memalloc_noio(struct device *dev, bool enable)
+{
+   static DEFINE_MUTEX(dev_hotplug_mutex);
+
+   mutex_lock(&dev_hotplug_mutex);
+   for (;;) {
+   /* hold power lock since bitfield is not SMP-safe. */
+   spin_lock_irq(&dev->power.lock);
+   dev->power.memalloc_noio = enable;
+   spin_unlock_irq(&dev->power.lock);
+
+   dev = dev->parent;
+
+   /*
+* clear flag of the parent device only if all the
+* children don't set the flag because ancestor's
+* flag was set by any one of the descendants.
+*/
+   if (!dev || (!enable &&
+device_for_each_child(dev, NULL,
+  dev_memalloc_noio)))
+   break;
+   }
+   mutex_unlock(&dev_hotplug_mutex);
+}
+EXPORT_SYMBOL_GPL(pm_runtime_set_memalloc_noio);
+
 /**
  * rpm_check_suspend_allowed - Test whether a device may be suspended.
  * @dev: Device to test.
diff --git a/include/linux/pm.h b/include/linux/pm.h
index 03d7bb1..1a8a69d 100644
--- a/include/linux/pm.h
+++ b/include/linux/pm.h
@@ -538,6 +538,7 @@ struct dev_pm_info {
unsigned int irq_safe:1;
unsigned int use_autosuspend:1;
unsigned int timer_autosuspends:1;
+   unsigned int memalloc_noio:1;
enum rpm_request request;
enum rpm_status runtime_status;
int runtime_error;
diff --git a/include/linux/pm_runtime.h b/include/linux/pm_runtime.h
index f271860..775e063 100644
--- a/include/linux/pm_runtime.h
+++ b/include/linux/pm_runtime.h
@@ -47,6 +47,7 @@ extern void pm_runtime_set_autosuspend_delay(struct device 
*dev, int 

[PATCH v6 1/6] mm: teach mm by current context info to not do I/O during memory allocation

2012-11-24 Thread Ming Lei
This patch introduces the PF_MEMALLOC_NOIO process flag (in the 'flags'
field of 'struct task_struct'), so that a task can set the flag
to avoid doing I/O inside memory allocation in the task's context.

The patch tries to solve a deadlock problem caused by block devices,
which may happen in at least the following situations:

- during block device runtime resume, if memory allocation with
GFP_KERNEL is called inside the runtime resume callback of any one
of its ancestors (or the block device itself), a deadlock may be
triggered inside the memory allocation, since it might not complete
until the block device becomes active and the involved page I/O finishes.
This situation was first pointed out by Alan Stern. Converting all
GFP_KERNEL[1] allocations in the path into GFP_NOIO is not a good
approach, because several subsystems may be involved (for example, PCI,
USB and SCSI may be involved for a USB mass storage device, and network
devices are involved too in the iSCSI case).

- during block device runtime suspend, because runtime resume needs
to wait for completion of a concurrent runtime suspend.

- during error handling of a USB mass storage device, a USB bus reset
will be issued on the device, so there should not be any
memory allocation with GFP_KERNEL during the USB bus reset; otherwise
a deadlock similar to the above may be triggered. Unfortunately, any
USB device may in theory include a mass storage interface, so this
would require all USB interface drivers to handle the situation. In fact,
most USB drivers don't know how to handle a bus reset on the device
and don't provide .pre_reset() and .post_reset() callbacks at all, so
the USB core has to unbind and rebind the driver for these devices. So it
is still not practical to resort to GFP_NOIO for solving the problem.

The introduced solution can also be used by the block subsystem or block
drivers, for example by setting the PF_MEMALLOC_NOIO flag before doing
the actual I/O transfer.

It is not a good idea to convert all these GFP_KERNEL allocations in the
affected path into GFP_NOIO, because the functions doing them may be
implemented as library code and are called in many other contexts.

In fact, memalloc_noio_flags() allows some of the current static GFP_NOIO
allocations to be converted back into GFP_KERNEL in other non-affected
contexts. At least almost all GFP_NOIO allocations in the USB subsystem
can be converted into GFP_KERNEL after applying this approach, so that
allocation with GFP_NOIO generally happens only in runtime resume, bus
reset and block I/O transfer contexts.
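The effect of memalloc_noio_flags() on an allocation mask can be sketched as follows. This is a userspace model: the bit values and MODEL_* names are illustrative, not taken from the kernel headers, and a plain boolean stands in for PF_MEMALLOC_NOIO in the task flags. It shows the property the v5 changelog describes: only the I/O bit is masked, and the FS bit is left alone.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int gfp_t;

/* Illustrative bit values for this model only. */
#define MODEL_GFP_WAIT   0x10u
#define MODEL_GFP_IO     0x40u
#define MODEL_GFP_FS     0x80u
#define MODEL_GFP_KERNEL (MODEL_GFP_WAIT | MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_GFP_NOIO   (MODEL_GFP_WAIT)

/* Stand-in for PF_MEMALLOC_NOIO being set on the current task. */
static bool task_noio;

/* Strip the I/O permission from the mask when the task is in no-I/O context. */
static gfp_t model_noio_flags(gfp_t flags)
{
	gfp_t out = flags;

	if (task_noio)
		out &= ~MODEL_GFP_IO;

	/* Masking may only remove bits, never add them. */
	assert((out & ~flags) == 0);
	return out;
}
```

Library code can therefore keep requesting the GFP_KERNEL-style mask everywhere; the restriction follows the calling task's context instead of being hard-coded at each call site.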

[1], several GFP_KERNEL allocation examples in runtime resume path

- pci subsystem
acpi_os_allocate
<-acpi_ut_allocate
<-ACPI_ALLOCATE_ZEROED
<-acpi_evaluate_object
<-__acpi_bus_set_power
<-acpi_bus_set_power
<-acpi_pci_set_power_state
<-platform_pci_set_power_state
<-pci_platform_power_transition
<-__pci_complete_power_transition
<-pci_set_power_state
<-pci_restore_standard_config
<-pci_pm_runtime_resume

- usb subsystem
usb_get_status
<-finish_port_resume
<-usb_port_resume
<-generic_resume
<-usb_resume_device
<-usb_resume_both
<-usb_runtime_resume

- some individual usb drivers
usblp, uvc, gspca, most of dvb-usb-v2 media drivers, cpia2, az6007, 

That is just what I have found.  Unfortunately, such allocations can
currently only be found by manual inspection, and there are probably many
more that have not been found, since any function in the resume path (call
tree) may allocate memory with GFP_KERNEL.

Cc: Alan Stern 
Cc: Oliver Neukum 
Cc: Jiri Kosina 
Cc: Andrew Morton 
Cc: Mel Gorman 
Cc: KAMEZAWA Hiroyuki 
Cc: Michal Hocko 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: "Rafael J. Wysocki" 
Signed-off-by: Minchan Kim 
Signed-off-by: Ming Lei 
---
v6:
- replace GFP_IO with __GFP_IO to fix compile failure

v5:
- use inline instead of macro to define memalloc_noio_*
- replace memalloc_noio() with memalloc_noio_flags() to
make code neater
- don't clear GFP_FS because no GFP_IO means
that allocation won't enter device driver as pointed by
Andrew Morton

v4:
- fix comment
v3:
- no change
v2:
- remove changes on 'may_writepage' and 'may_swap' because that
  isn't related with the patchset, and can't introduce I/O in
  allocation path if GFP_IOFS is 

[PATCH v6 0/6] solve deadlock caused by memory allocation with I/O

2012-11-24 Thread Ming Lei
Hi,

This patchset tries to solve a deadlock problem which might be caused
by memory allocation with block I/O during runtime PM and in the block
device error handling path. Traditionally, the problem is addressed by
passing GFP_NOIO statically to mm, but that is not an effective solution;
see the detailed description in patch 1's commit log.

This patch set introduces one process flag and tries to fix the deadlock
problem on block devices/network devices during runtime PM or USB bus reset.

The 1st patch is the change to include/linux/sched.h and mm.

The 2nd patch introduces the memalloc_noio flag in 'dev_pm_info',
and pm_runtime_set_memalloc_noio(), so that the PM core can teach mm not to
allocate memory with GFP_IO during the runtime_resume callback, and only on
devices with the flag set.

The following 2 patches apply the introduced pm_runtime_set_memalloc_noio()
to mark all devices as memalloc_noio_resume in the path from the block or
network device to the root device in the device tree.

The last 2 patches are applied against the PM and USB subsystems to
demonstrate how to use the introduced mechanism to fix the deadlock problem.
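
For readers who want a picture of the mechanism, here is a minimal userspace
sketch of the per-task flag idea. It is not the patch itself: the bit values
are made up for illustration, task_flags is a hypothetical stand-in for
current->flags, and the save/restore helper names are assumed from the
"memalloc_noio_*" and "save/restore" wording in the change logs.

```c
#include <stdint.h>

/* Illustrative bit values only; the kernel's real constants differ. */
typedef uint32_t gfp_t;
#define __GFP_WAIT        0x10u
#define __GFP_IO          0x40u
#define __GFP_FS          0x80u
#define GFP_KERNEL        (__GFP_WAIT | __GFP_IO | __GFP_FS)
#define PF_MEMALLOC_NOIO  0x00080000u

/* Hypothetical stand-in for current->flags of the running task. */
static unsigned int task_flags;

/* Strip __GFP_IO from a mask when the task is marked no-I/O. */
static inline gfp_t memalloc_noio_flags(gfp_t flags)
{
	if (task_flags & PF_MEMALLOC_NOIO)
		flags &= ~__GFP_IO;
	return flags;
}

/* Set the flag and return its previous state so that calls can nest. */
static inline unsigned int memalloc_noio_save(void)
{
	unsigned int old = task_flags & PF_MEMALLOC_NOIO;

	task_flags |= PF_MEMALLOC_NOIO;
	return old;
}

/* Restore the state returned by the matching memalloc_noio_save(). */
static inline void memalloc_noio_restore(unsigned int old)
{
	task_flags = (task_flags & ~PF_MEMALLOC_NOIO) | old;
}
```

In this model a resume path would bracket its work with the save and restore
helpers, and the allocator would pass every mask through
memalloc_noio_flags(), so callees can keep asking for GFP_KERNEL. Note that
only __GFP_IO is dropped; __GFP_FS is left alone, as in the V5 change log.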

Andrew, could you queue these patches into your tree, since V6 addresses all
your concerns and it looks like no one objects to these patches?

Change logs:
V6:
- fix one compile failure(1/6), and only one line change

V5:
- don't clear GFP_FS
- coding style fix
- add comments
- see details in individual change logs

V4:
- patches from the 2nd to the 6th changed
- call pm_runtime_set_memalloc_noio() after device_add() as pointed
  out by Alan
- set PF_MEMALLOC_NOIO during runtime_suspend()

V3:
- patch 2/6 and 5/6 changed, see their commit log
- remove RFC from title since several guys have expressed that
  it is a reasonable solution

V2:
- remove changes on 'may_writepage' and 'may_swap'(1/6)
- unset GFP_IOFS in try_to_free_pages() path(1/6)
- introduce pm_runtime_set_memalloc_noio()
- only apply the mechanism on block/network device and its ancestors
  for runtime resume context

V1:
- take Minchan's change to avoid the check in alloc_page hot path
- change the helpers' style into save/restore as suggested by Alan
- memory allocation with no io in usb bus reset path for all devices
  as suggested by Greg and Oliver

 block/genhd.c                |   10 +
 drivers/base/power/runtime.c |   92 +-
 drivers/usb/core/hub.c       |   13 ++
 include/linux/pm.h           |    1 +
 include/linux/pm_runtime.h   |    3 ++
 include/linux/sched.h        |   22 ++
 mm/page_alloc.c              |    9 -
 mm/vmscan.c                  |    4 +-
 net/core/net-sysfs.c         |    5 +++
 9 files changed, 154 insertions(+), 5 deletions(-)

Thanks,
--
Ming Lei

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: vdso && cr (Was: arch_check_bp_in_kernelspace: fix the range

2012-11-24 Thread Amnon Shiloh
Hi Oleg,

> Amnon,
> 
> I am going to "ignore" this thread because this is not my area and
> I can't help anyway. Just one note:
> 
> On 11/23, Amnon Shiloh wrote:
> >
> > The solution can be to hold all catched signals while in the VDSO page.
> > ...
> >
> > 1) + introduce a kernel feature to prevent
> >catching signals within the VDSO page (probably a new prctl,
> >or make it the default)
> 
> Sorry, never ;)
> 
> Oleg.

It's OK with me because I have already found a workaround that works for
me, but I suspect that other people who write checkpoint/restore packages
may not be able to use my solution, and so they will have a problem with
interrupts occurring within the VDSO page.

I therefore suggested an alternate solution for all systems where
applications can be checkpointed on one kernel and restarted on another:
allow the user to ask for an ultra-compatible VDSO version, which would
be exactly the same on all kernels (from a given point in time) and all
kernel configurations, even if it means a loss of performance.

It could even be a kernel configuration option: CONFIG_ULTRA_COMPAT_VDSO,
but ideally it should be the user's choice.

Best Regards,
Amnon.


Re: [PATCH v3 11/12] x86, boot: add fields to support load bzImage and ramdisk high

2012-11-24 Thread Eric W. Biederman
"H. Peter Anvin"  writes:

> On 11/22/2012 10:28 AM, Yinghai Lu wrote:
>>
>> has problem with old kexec, it only copy header from bzImage include
>> setup_header as boot_param.
>>
>
> How old are we talking here? This is a clear and blatant bug, and it would
> affect a whole bunch of things, not just this.  In fact, one really has to
> wonder how it can work at all.
>
> One option I guess would be to have a sentinel field which, if it is not zero,
> causes the kernel to zero all of struct setup_info outside of
> setup_header... however, I have a nasty suspicion that this kexec botch might 
> be
> initializing some fields and leaving others unmodified, which basically means
> "there is no hope for sanity and it is just working by pure accident."
>
> Eric, do you have any insight here?

I seem to be missing something.

With respect to boot parameters, when we are booting a bzImage
/sbin/kexec initializes the boot parameters with all of the 16-bit real
mode code, aka (setup_sects + 1) * 512 bytes.

I remember adding that as soon as we started having to deal with
pre-initialized fields in boot_params.

I don't have a clue what you folks are referring to as a bug.  

Looking, I see this verbiage in boot.txt:

> For machine with some new BIOS other than legacy BIOS, such as EFI,
> LinuxBIOS, etc, and kexec, the 16-bit real mode setup code in kernel
> based on legacy BIOS can not be used, so a 32-bit boot protocol needs
> to be defined.
> 
> In 32-bit boot protocol, the first step in loading a Linux kernel
> should be to setup the boot parameters (struct boot_params,
> traditionally known as "zero page"). The memory for struct boot_params
> should be allocated and initialized to all zero. Then the setup header
> from offset 0x01f1 of kernel image on should be loaded into struct
> boot_params and examined. The end of setup header can be calculated as
> follow:
> 
>   0x0202 + byte value at offset 0x0201
> 
> In addition to read/modify/write the setup header of the struct
> boot_params as that of 16-bit boot protocol, the boot loader should
> also fill the additional fields of the struct boot_params as that
> described in zero-page.txt.

Certainly /sbin/kexec isn't bothering to calculate the end of the setup
header and is just being far more conservative, using all of the 16-bit
real mode code as its initializer.
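
To make the two sizes under discussion concrete, here is a rough sketch that
computes both from an in-memory copy of a bzImage. The function names are
hypothetical and this is not /sbin/kexec's code; the offsets come from the
boot.txt text quoted above, plus boot.txt's rule that a setup_sects value of
0 means 4.

```c
#include <stddef.h>
#include <stdint.h>

#define SETUP_SECTS_OFFSET  0x01f1  /* setup_sects: first byte of the setup header */
#define JUMP_LEN_OFFSET     0x0201  /* byte added to 0x0202 to find the header end */

/* End of the setup header as boot.txt computes it; 0 if the image is too short. */
static size_t setup_header_end(const uint8_t *image, size_t len)
{
	if (len <= JUMP_LEN_OFFSET)
		return 0;
	return 0x0202 + (size_t)image[JUMP_LEN_OFFSET];
}

/* Size of the 16-bit real mode code: (setup_sects + 1) * 512 bytes. */
static size_t real_mode_size(const uint8_t *image, size_t len)
{
	unsigned int setup_sects;

	if (len <= SETUP_SECTS_OFFSET)
		return 0;
	setup_sects = image[SETUP_SECTS_OFFSET];
	if (setup_sects == 0)
		setup_sects = 4;  /* boot.txt: a value of 0 means 4 */
	return ((size_t)setup_sects + 1) * 512;
}
```

The first value is where a 32-bit boot protocol loader would stop copying the
setup header; the second is the larger region used as the initializer in the
conservative approach described above.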

Eric


[PATCH] ewrk3: silence GCC warning

2012-11-24 Thread Paul Bolle
Building ewrk3.o triggers this GCC warning:
drivers/net/ethernet/dec/ewrk3.c: In function '__check_irq':
drivers/net/ethernet/dec/ewrk3.c:1915:1: warning: return from incompatible pointer type [enabled by default]

This can be trivially fixed by changing the 'irq' parameter from int to
byte (which is the alias for unsigned char for module parameters).

While we're touching this code also drop an outdated comment, that
should have been dropped with the patch named "MODULE_PARM conversions"
from early 2005.

Signed-off-by: Paul Bolle 
---
Compile tested only.

 drivers/net/ethernet/dec/ewrk3.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/dec/ewrk3.c b/drivers/net/ethernet/dec/ewrk3.c
index 17ae8c6..9f992b9 100644
--- a/drivers/net/ethernet/dec/ewrk3.c
+++ b/drivers/net/ethernet/dec/ewrk3.c
@@ -1910,9 +1910,8 @@ static struct net_device *ewrk3_devs[MAX_NUM_EWRK3S];
 static int ndevs;
 static int io[MAX_NUM_EWRK3S+1] = { 0x300, 0, };
 
-/* '21' below should really be 'MAX_NUM_EWRK3S' */
 module_param_array(io, int, NULL, 0);
-module_param_array(irq, int, NULL, 0);
+module_param_array(irq, byte, NULL, 0);
 MODULE_PARM_DESC(io, "EtherWORKS 3 I/O base address(es)");
 MODULE_PARM_DESC(irq, "EtherWORKS 3 IRQ number(s)");
 
-- 
1.7.7.6



Re: [PATCH 1/2] autofs4: allow autofs to work outside the initial PID namespace

2012-11-24 Thread Eric W. Biederman
Ian Kent  writes:

> On Sat, 2012-11-24 at 10:23 +0800, Ian Kent wrote:
>> On Fri, 2012-11-23 at 15:30 +0100, Miklos Szeredi wrote:
>> > Ian Kent  writes:
>> > 
>> > > On Fri, 2012-11-23 at 11:45 +0800, Ian Kent wrote:
>> > >> On Thu, 2012-11-22 at 17:24 +0100, Miklos Szeredi wrote:
>> > >> > Patches were tested by the customer.
>> > >> > 
>> > >> > Ian, Eric, do these patches look OK?

My apologies for the delay.  I have been swamped with the holidays and
the impending 3.8 merge window.   I will take a good hard look at your
patches shortly.

>> AFAICS autofs mounts mounted with MS_PRIVATE in the initial namespace do
>> propagate to the clone when it's created so I'm assuming subsequent
>> mounts would also. If these mounts are busy in some way they can't be
>> umounted in the clone unless "/" is marked private before attempting the
>> umount.
>
> This may sound stupid but if there something like, say, MS_NOPROPAGATE
> then the problem I see would pretty much just go away. No more need to
> umount existing mounts and container instances would be isolated. But, I
> guess, I'm not considering the possibility of cloned of processes as
> well  if that makes sense, ;)

Something very weird is going on.  MS_PRIVATE should be the
MS_NOPROPAGATE you are looking for.  There is also MS_UNBINDABLE,
which is a stronger form of MS_PRIVATE and probably worth playing with.

I would love to advertise my user namespace changes (queued for 3.8)
that reduce shared subtrees to slave subtrees as the solution to this,
and that does address part of the issue, but that does not really seem
like the fix.

I expect what we need to avoid unwanted mount propagation is an idiom
something like:

unshare -n
mount --private /mnt
pivot_root /mnt /
umount /mnt

Something like that is present in the startup of most containers
already.  So figuring out where to sprinkle MS_PRIVATE or MS_UNBINDABLE
so that mounts we do not want to propagate do not propagate looks like a
good idea.

Eric


[PATCH] mfd: jz4740-adc: use devm_kzalloc

2012-11-24 Thread Devendra Naga
Use devm_kzalloc() and remove the kfree() calls in the probe error path and
in the remove path, since devm-managed allocations are freed automatically.

Signed-off-by: Devendra Naga 
---
 drivers/mfd/jz4740-adc.c | 23 +++
 1 file changed, 7 insertions(+), 16 deletions(-)

diff --git a/drivers/mfd/jz4740-adc.c b/drivers/mfd/jz4740-adc.c
index c6b6d7d..3efdb65 100644
--- a/drivers/mfd/jz4740-adc.c
+++ b/drivers/mfd/jz4740-adc.c
@@ -211,7 +211,7 @@ static int __devinit jz4740_adc_probe(struct platform_device *pdev)
 	int ret;
 	int irq_base;
 
-	adc = kmalloc(sizeof(*adc), GFP_KERNEL);
+	adc = devm_kzalloc(&pdev->dev, sizeof(*adc), GFP_KERNEL);
 	if (!adc) {
 		dev_err(&pdev->dev, "Failed to allocate driver structure\n");
 		return -ENOMEM;
@@ -219,32 +219,28 @@ static int __devinit jz4740_adc_probe(struct platform_device *pdev)
 
 	adc->irq = platform_get_irq(pdev, 0);
 	if (adc->irq < 0) {
-		ret = adc->irq;
-		dev_err(&pdev->dev, "Failed to get platform irq: %d\n", ret);
-		goto err_free;
+		dev_err(&pdev->dev, "Failed to get platform irq: %d\n", adc->irq);
+		return adc->irq;
 	}
 
 	irq_base = platform_get_irq(pdev, 1);
 	if (irq_base < 0) {
-		ret = irq_base;
-		dev_err(&pdev->dev, "Failed to get irq base: %d\n", ret);
-		goto err_free;
+		dev_err(&pdev->dev, "Failed to get irq base: %d\n", irq_base);
+		return irq_base;
 	}
 
 	mem_base = platform_get_resource(pdev, IORESOURCE_MEM, 0);
 	if (!mem_base) {
-		ret = -ENOENT;
 		dev_err(&pdev->dev, "Failed to get platform mmio resource\n");
-		goto err_free;
+		return -ENOENT;
 	}
 
 	/* Only request the shared registers for the MFD driver */
 	adc->mem = request_mem_region(mem_base->start, JZ_REG_ADC_STATUS,
 					pdev->name);
 	if (!adc->mem) {
-		ret = -EBUSY;
 		dev_err(&pdev->dev, "Failed to request mmio memory region\n");
-		goto err_free;
+		return -EBUSY;
 	}
 
 	adc->base = ioremap_nocache(adc->mem->start, resource_size(adc->mem));
@@ -301,9 +297,6 @@ err_iounmap:
 	iounmap(adc->base);
 err_release_mem_region:
 	release_mem_region(adc->mem->start, resource_size(adc->mem));
-err_free:
-	kfree(adc);
-
 	return ret;
 }
 
@@ -325,8 +318,6 @@ static int __devexit jz4740_adc_remove(struct platform_device *pdev)
 
 	platform_set_drvdata(pdev, NULL);
 
-	kfree(adc);
-
 	return 0;
 }
 
-- 
1.7.11.7
