Re: [PATCH 4/4] perf tools: Run dynamic loaded GTK browser

2013-08-05 Thread Pekka Enberg
On Tue, Aug 6, 2013 at 8:14 AM, Namhyung Kim  wrote:
> Run GTK hist and annotation browser using libdl.
>
> Cc: Andi Kleen 
> Cc: Pekka Enberg 
> Signed-off-by: Namhyung Kim 

Reviewed-by: Pekka Enberg 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/4] perf tools: Setup GTK browser dynamically

2013-08-05 Thread Pekka Enberg
On Tue, Aug 6, 2013 at 8:14 AM, Namhyung Kim  wrote:
> Call setup/exit GTK browser function using libdl.
>
> Cc: Andi Kleen 
> Cc: Pekka Enberg 
> Signed-off-by: Namhyung Kim 

Reviewed-by: Pekka Enberg 


Re: [PATCH 2/4] perf tools: Separate out GTK codes to libperf-gtk.so

2013-08-05 Thread Pekka Enberg
On Tue, Aug 6, 2013 at 8:14 AM, Namhyung Kim  wrote:
> Separate out GTK codes to a shared object called libperf-gtk.so.  This
> time only GTK codes are built with -fPIC and libperf remains as is.
>
> Cc: Andi Kleen 
> Cc: Pekka Enberg 
> Signed-off-by: Namhyung Kim 

Reviewed-by: Pekka Enberg 


Re: [PATCH 1/4] perf ui/gtk: Fix segmentation fault on perf_hpp__for_each_format loop

2013-08-05 Thread Pekka Enberg
On Tue, Aug 6, 2013 at 8:14 AM, Namhyung Kim  wrote:
> From: Namhyung Kim 
>
> Commit 2b8bfa6bb8a7 ("perf tools: Centralize default columns init
> in perf_hpp__init") moved initialization of the common overhead column
> to perf_hpp__init() but forgot about the gtk code.
>
> So the gtk code added the same column to the list twice, causing an
> infinite loop when iterating over it with perf_hpp__for_each_format.
> When I run perf report --gtk, I see the following messages
> indefinitely.
>
>   (perf:11687): Gtk-CRITICAL **: IA__gtk_main_quit: assertion 'main_loops != NULL' failed
>   perf: Segmentation fault
>
> Cc: Jiri Olsa 
> Cc: Pekka Enberg 
> Cc: Andi Kleen 
> Signed-off-by: Namhyung Kim 

Reviewed-by: Pekka Enberg 
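
For readers following the bug described in the quoted changelog, here is a minimal sketch of why queueing one node twice makes a for-each walk spin. It assumes a circular doubly linked list that uses the same pointer updates as the kernel's list_add_tail(); the struct and function names are illustrative and are not taken from perf.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative circular doubly linked list; not the perf or kernel code. */
struct node {
	struct node *prev, *next;
};

static void list_init(struct node *head)
{
	head->prev = head->next = head;
}

/* Same pointer updates, in the same order, as the kernel's list_add_tail(). */
static void list_add_tail(struct node *new_node, struct node *head)
{
	struct node *prev;

	assert(new_node != NULL && head != NULL);
	prev = head->prev;
	head->prev = new_node;
	new_node->next = head;
	new_node->prev = prev;
	prev->next = new_node;
}

/*
 * Walk the list the way a for-each macro would, but give up after
 * `limit` steps. Returns the number of entries, or -1 if the walk
 * never got back to the head.
 */
static int bounded_walk(const struct node *head, int limit)
{
	const struct node *pos;
	int n = 0;

	for (pos = head->next; pos != head; pos = pos->next)
		if (++n > limit)
			return -1;
	return n;
}
```

Adding a node once and walking the list yields one entry. Adding the same node a second time leaves its next pointer aimed at itself, so a walk without the step limit would never terminate.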


linux-next: manual merge of the akpm tree with the drm-intel tree

2013-08-05 Thread Stephen Rothwell
Hi Andrew,

Today's linux-next merge of the akpm tree got a conflict in
drivers/gpu/drm/i915/i915_gem.c between commit a70a3148b0c6 ("drm/i915:
Make proper functions for VMs") from the drm-intel tree and commit
e6950216e0af ("drivers: convert shrinkers to new count/scan API") from
the akpm tree.

I fixed it up (see below) and can carry the fix as necessary (no action
is required).

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au

diff --cc drivers/gpu/drm/i915/i915_gem.c
index d31e15d,49db617..000
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@@ -4642,10 -4626,10 +4650,9 @@@ i915_gem_inactive_count(struct shrinke
 struct drm_i915_private,
 mm.inactive_shrinker);
struct drm_device *dev = dev_priv->dev;
 -  struct i915_address_space *vm = &dev_priv->gtt.base;
struct drm_i915_gem_object *obj;
-   int nr_to_scan = sc->nr_to_scan;
bool unlock = true;
-   int cnt;
+   long cnt;
  
if (!mutex_trylock(&dev->struct_mutex)) {
if (!mutex_is_locked_by(&dev->struct_mutex, current))
@@@ -4683,75 -4653,36 +4681,109 @@@
mutex_unlock(&dev->struct_mutex);
return cnt;
  }
 +
+ static long
+ i915_gem_inactive_scan(struct shrinker *shrinker, struct shrink_control *sc)
+ {
+   struct drm_i915_private *dev_priv =
+   container_of(shrinker,
+struct drm_i915_private,
+mm.inactive_shrinker);
+   struct drm_device *dev = dev_priv->dev;
+   int nr_to_scan = sc->nr_to_scan;
+   long freed;
+   bool unlock = true;
+ 
+   if (!mutex_trylock(&dev->struct_mutex)) {
+   if (!mutex_is_locked_by(&dev->struct_mutex, current))
+   return 0;
+ 
+   if (dev_priv->mm.shrinker_no_lock_stealing)
+   return 0;
+ 
+   unlock = false;
+   }
+ 
+   freed = i915_gem_purge(dev_priv, nr_to_scan);
+   if (freed < nr_to_scan)
+   freed += __i915_gem_shrink(dev_priv, nr_to_scan,
+   false);
+   if (freed < nr_to_scan)
+   freed += i915_gem_shrink_all(dev_priv);
+ 
+   if (unlock)
+   mutex_unlock(&dev->struct_mutex);
+   return freed;
+ }
++
 +/* All the new VM stuff */
 +unsigned long i915_gem_obj_offset(struct drm_i915_gem_object *o,
 +struct i915_address_space *vm)
 +{
 +  struct drm_i915_private *dev_priv = o->base.dev->dev_private;
 +  struct i915_vma *vma;
 +
 +  if (vm == &dev_priv->mm.aliasing_ppgtt->base)
 +  vm = &dev_priv->gtt.base;
 +
 +  BUG_ON(list_empty(&o->vma_list));
 +  list_for_each_entry(vma, &o->vma_list, vma_link) {
 +  if (vma->vm == vm)
 +  return vma->node.start;
 +
 +  }
 +  return -1;
 +}
 +
 +bool i915_gem_obj_bound(struct drm_i915_gem_object *o,
 +  struct i915_address_space *vm)
 +{
 +  struct i915_vma *vma;
 +
 +  list_for_each_entry(vma, &o->vma_list, vma_link)
 +  if (vma->vm == vm)
 +  return true;
 +
 +  return false;
 +}
 +
 +bool i915_gem_obj_bound_any(struct drm_i915_gem_object *o)
 +{
 +  struct drm_i915_private *dev_priv = o->base.dev->dev_private;
 +  struct i915_address_space *vm;
 +
 +  list_for_each_entry(vm, &dev_priv->vm_list, global_link)
 +  if (i915_gem_obj_bound(o, vm))
 +  return true;
 +
 +  return false;
 +}
 +
 +unsigned long i915_gem_obj_size(struct drm_i915_gem_object *o,
 +  struct i915_address_space *vm)
 +{
 +  struct drm_i915_private *dev_priv = o->base.dev->dev_private;
 +  struct i915_vma *vma;
 +
 +  if (vm == &dev_priv->mm.aliasing_ppgtt->base)
 +  vm = &dev_priv->gtt.base;
 +
 +  BUG_ON(list_empty(&o->vma_list));
 +
 +  list_for_each_entry(vma, &o->vma_list, vma_link)
 +  if (vma->vm == vm)
 +  return vma->node.size;
 +
 +  return 0;
 +}
 +
 +struct i915_vma *i915_gem_obj_to_vma(struct drm_i915_gem_object *obj,
 +   struct i915_address_space *vm)
 +{
 +  struct i915_vma *vma;
 +  list_for_each_entry(vma, &obj->vma_list, vma_link)
 +  if (vma->vm == vm)
 +  return vma;
 +
 +  return NULL;
 +}




Re: [PATCH 0/5] perf kvm live - latest round take 4

2013-08-05 Thread Xiao Guangrong
On 08/06/2013 09:41 AM, David Ahern wrote:
> Hi Arnaldo:
> 
> This round addresses all of Xiao's comments. It also includes a small
> change in the live mode introduction to improve ordered samples
> processing. For that a change in perf-session functions is needed.

Reviewed-by: Xiao Guangrong 

David, could you please update the documentation? It can be a separate
patch.



[PATCH 7/8] partitions/efi: delete annoying emacs style comments

2013-08-05 Thread Davidlohr Bueso
I love emacs, but these settings for coding style are
annoying when trying to open the efi.h file. More important,
we already have checkpatch for that.

Signed-off-by: Davidlohr Bueso 
---
 block/partitions/efi.h | 19 ---
 1 file changed, 19 deletions(-)

diff --git a/block/partitions/efi.h b/block/partitions/efi.h
index e9741de..9ab8ee9 100644
--- a/block/partitions/efi.h
+++ b/block/partitions/efi.h
@@ -130,22 +130,3 @@ typedef struct _legacy_mbr {
 extern int efi_partition(struct parsed_partitions *state);
 
 #endif
-
-/*
- * Overrides for Emacs so that we follow Linus's tabbing style.
- * Emacs will notice this stuff at the end of the file and automatically
- * adjust the settings for this buffer only.  This must remain at the end
- * of the file.
- * --
- * Local variables:
- * c-indent-level: 4 
- * c-brace-imaginary-offset: 0
- * c-brace-offset: -4
- * c-argdecl-indent: 4
- * c-label-offset: -4
- * c-continued-statement-offset: 4
- * c-continued-brace-offset: 0
- * indent-tabs-mode: nil
- * tab-width: 8
- * End:
- */
-- 
1.7.11.7



Re: 3.11.0-rc4 (Linus GIT) -- WARNING: CPU: 1 PID: 1 at kernel/time/tick-sched.c:185 can_stop_full_tick+0x7e/0x89()

2013-08-05 Thread yun wang

Hi, Miles

On 08/06/2013 12:30 PM, Miles Lane wrote:

I am not seeing any problems in the behavior of the computer, but
wonder if this indicates something that needs fixing.

[1.969109] WARNING: CPU: 1 PID: 1 at kernel/time/tick-sched.c:185
can_stop_full_tick+0x7e/0x89()


According to the comments:

/*
 * Don't allow the user to think they can get
 * full NO_HZ with this machine.
 */

I guess this WARN is just supposed to notify the user that the two features
conflict.


I'm not sure whether this is the right way; maybe reporting it at config
time would be better. But considering that nohz can also be enabled by a
boot option, maybe this is a good way...


Regards,
Michael Wang



[1.969121] NO_HZ FULL will not work with unstable sched clock
[1.969129] Modules linked in:
[1.969142] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 3.11.0-rc4 #150
[1.969152] Hardware name: ASUSTeK Computer Inc. UL50VT
  /UL50VT, BIOS 217 03/01/2010
[1.969166]   88013f303eb0 8138347b
88013f303ef8
[1.969183]  88013f303ee8 8103454f 81079bbf
88013f30d7b0
[1.969199]  0001 88013a801dc0 88013a802240
88013f303f48
[1.969216] Call Trace:
[1.969222][] dump_stack+0x4e/0x82
[1.969241]  [] warn_slowpath_common+0x75/0x8e
[1.969253]  [] ? can_stop_full_tick+0x7e/0x89
[1.969265]  [] warn_slowpath_fmt+0x47/0x49
[1.969278]  [] can_stop_full_tick+0x7e/0x89
[1.969290]  [] tick_nohz_irq_exit+0x63/0x7f
[1.969302]  [] irq_exit+0xa4/0xac
[1.969314]  [] smp_apic_timer_interrupt+0x30/0x3c
[1.969327]  [] apic_timer_interrupt+0x6f/0x80
[1.969336][] ? save_stack_trace+0x26/0x41
[1.969355]  [] ? _raw_spin_unlock_irqrestore+0x3c/0x69
[1.969369]  [] __slab_free+0x53/0x317
[1.969382]  [] ? debug_check_no_obj_freed+0x103/0x153
[1.969397]  [] kfree+0x102/0x111
[1.969410]  [] ? acpi_ns_get_node+0xb6/0xc6
[1.969422]  [] acpi_ns_get_node+0xb6/0xc6
[1.969434]  [] ? _raw_spin_unlock_irqrestore+0x5b/0x69
[1.969447]  [] ? up+0x34/0x39
[1.969459]  [] ? acpi_os_signal_semaphore+0x1c/0x28
[1.969472]  [] acpi_get_handle+0x7e/0x92
[1.969486]  [] pnpacpi_add_device_handler+0x57/0x217
[1.969499]  [] acpi_ns_get_device_callback+0x135/0x14b
[1.969511]  [] ? up+0x34/0x39
[1.969523]  [] acpi_ns_walk_namespace+0xc3/0x17a
[1.969535]  [] ? acpi_walk_namespace+0xc0/0xc0
[1.969547]  [] acpi_get_devices+0x5d/0x72
[1.969560]  [] ? ispnpidacpi+0x84/0x84
[1.969571]  [] ? pnpacpi_add_device_handler+0x217/0x217
[1.969584]  [] pnpacpi_init+0x5e/0x8c
[1.969596]  [] do_one_initcall+0x8e/0x12b
[1.969608]  [] ? parameq+0x1d/0x1f
[1.969619]  [] ? parse_args+0x18c/0x23f
[1.969632]  [] kernel_init_freeable+0x115/0x196
[1.969643]  [] ? do_early_param+0x88/0x88
[1.969654]  [] ? rest_init+0x131/0x131
[1.969665]  [] kernel_init+0x9/0xd1
[1.969676]  [] ret_from_fork+0x7c/0xb0
[1.969687]  [] ? rest_init+0x131/0x131


[PATCH 8/8] partitions/efi: some style cleanups

2013-08-05 Thread Davidlohr Bueso
Trivial coding style cleanups - still plenty left.

Signed-off-by: Davidlohr Bueso 
---
 block/partitions/efi.c | 19 ---
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/block/partitions/efi.c b/block/partitions/efi.c
index 9a81c3b..8e6d77e 100644
--- a/block/partitions/efi.c
+++ b/block/partitions/efi.c
@@ -25,6 +25,9 @@
  * TODO:
  *
  * Changelog:
+ * Mon August 5th, 2013 Davidlohr Bueso 
+ * - detect hybrid MBRs, tighter pMBR checking & cleanups.
+ *
  * Mon Nov 09 2004 Matt Domsch 
  * - test for valid PMBR and valid PGPT before ever reading
  *   AGPT, allow override with 'gpt' kernel command line option.
@@ -288,8 +291,7 @@ static gpt_entry *alloc_read_gpt_entries(struct parsed_partitions *state,
return NULL;
 
if (read_lba(state, le64_to_cpu(gpt->partition_entry_lba),
- (u8 *) pte,
-count) < count) {
+ (u8 *) pte, count) < count) {
kfree(pte);
 pte=NULL;
return NULL;
@@ -613,8 +615,7 @@ static int find_valid_gpt(struct parsed_partitions *state, gpt_header **gpt,
good_pgpt = is_gpt_valid(state, GPT_PRIMARY_PARTITION_TABLE_LBA,
 &pgpt, &pptes);
 if (good_pgpt)
-   good_agpt = is_gpt_valid(state,
-le64_to_cpu(pgpt->alternate_lba),
+   good_agpt = is_gpt_valid(state, le64_to_cpu(pgpt->alternate_lba),
 &agpt, &aptes);
 if (!good_agpt && force_gpt)
 good_agpt = is_gpt_valid(state, lastlba, &agpt, &aptes);
@@ -632,9 +633,7 @@ static int find_valid_gpt(struct parsed_partitions *state, gpt_header **gpt,
 kfree(agpt);
 kfree(aptes);
 if (!good_agpt) {
-printk(KERN_WARNING 
-  "Alternate GPT is invalid, "
-   "using primary GPT.\n");
+printk(KERN_WARNING "Alternate GPT is invalid, using primary GPT.\n");
 }
 return 1;
 }
@@ -643,8 +642,7 @@ static int find_valid_gpt(struct parsed_partitions *state, gpt_header **gpt,
 *ptes = aptes;
 kfree(pgpt);
 kfree(pptes);
-printk(KERN_WARNING 
-   "Primary GPT is invalid, using alternate GPT.\n");
+printk(KERN_WARNING "Primary GPT is invalid, using alternate GPT.\n");
 return 1;
 }
 
@@ -706,8 +704,7 @@ int efi_partition(struct parsed_partitions *state)
put_partition(state, i+1, start * ssz, size * ssz);
 
/* If this is a RAID volume, tell md */
-   if (!efi_guidcmp(ptes[i].partition_type_guid,
-PARTITION_LINUX_RAID_GUID))
+   if (!efi_guidcmp(ptes[i].partition_type_guid, PARTITION_LINUX_RAID_GUID))
state->parts[i + 1].flags = ADDPART_FLAG_RAID;
 
info = &state->parts[i + 1].info;
-- 
1.7.11.7



[PATCH 4/8] partitions/efi: detect hybrid MBRs

2013-08-05 Thread Davidlohr Bueso
One of the biggest problems with GPT is compatibility with older,
non-GPT systems. The problem is addressed by creating hybrid MBRs,
an extension, or variant, of the traditional protective MBR. This
contains, apart from the 0xEE partition, up to three additional
primary partitions that point to the same space marked by up to
three GPT partitions. The result is that legacy OSs can see the
three required MBR partitions and at the same time ignore the
GPT-aware partitions that protect the GPT structures.

While hybrid MBRs are hacks, workarounds and simply not part of the
GPT standard, they do exist and we have no way around them. For instance,
by default, OSX creates a hybrid scheme when using multi-OS booting.

In order for Linux to properly discover protective MBRs, it must be
made aware of devices that have hybrid MBRs. No functionality is
changed by this patch, just a debug message informing the user of the
MBR scheme that is being used.
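
A minimal sketch of the classification described above, assuming only the OS type byte of each of the four primary partition records is inspected. The enum and macro names are hypothetical (the patch defines its own constants in efi.h), and the signature and starting LBA checks that the real code performs are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; the patch uses its own constants from efi.h. */
enum mbr_kind { MBR_INVALID = 0, MBR_PROTECTIVE, MBR_HYBRID };

#define OSTYPE_EMPTY   0x00	/* unused partition slot */
#define OSTYPE_EFI_GPT 0xEE	/* GPT protective partition */

/* Classify an MBR from the OS type bytes of its four primary partitions. */
static enum mbr_kind classify_mbr(const uint8_t os_type[4])
{
	int i, have_protective = 0, have_other = 0;

	assert(os_type != NULL);
	for (i = 0; i < 4; i++) {
		if (os_type[i] == OSTYPE_EFI_GPT)
			have_protective = 1;
		else if (os_type[i] != OSTYPE_EMPTY)
			have_other = 1;
	}

	/* Without a 0xEE record this is a plain legacy MBR, not a GPT one. */
	if (!have_protective)
		return MBR_INVALID;
	return have_other ? MBR_HYBRID : MBR_PROTECTIVE;
}
```

With this sketch, a table holding only a 0xEE record is reported as protective, while 0xEE next to any other non-empty type is reported as hybrid.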

Signed-off-by: Davidlohr Bueso 
---
 block/partitions/efi.c | 72 +++---
 block/partitions/efi.h |  3 +++
 2 files changed, 54 insertions(+), 21 deletions(-)

diff --git a/block/partitions/efi.c b/block/partitions/efi.c
index 331cd1c..4bf8165 100644
--- a/block/partitions/efi.c
+++ b/block/partitions/efi.c
@@ -158,7 +158,7 @@ static inline int pmbr_part_valid(gpt_record *part)
 if (le32_to_cpu(part->starting_lba) != GPT_PRIMARY_PARTITION_TABLE_LBA)
 goto invalid;
 
-   return 1;
+   return GPT_MBR_PROTECTIVE;
 invalid:
return 0;
 }
@@ -167,21 +167,47 @@ invalid:
  * is_pmbr_valid(): test Protective MBR for validity
  * @mbr: pointer to a legacy mbr structure
  *
- * Description: Returns 1 if PMBR is valid, 0 otherwise.
- * Validity depends on two things:
+ * Description: Checks for a valid protective or hybrid
+ * master boot record (MBR). The validity of a pMBR depends
+ * on all of the following properties:
  *  1) MSDOS signature is in the last two bytes of the MBR
  *  2) One partition of type 0xEE is found
+ *
+ * In addition, a hybrid MBR will have up to three additional
+ * primary partitions, which point to the same space that's
+ * marked out by up to three GPT partitions.
+ *
+ * Returns 0 upon invalid MBR, or GPT_MBR_PROTECTIVE or
+ * GPT_MBR_HYBRID depending on the device layout.
  */
-static int
-is_pmbr_valid(legacy_mbr *mbr)
+static int is_pmbr_valid(legacy_mbr *mbr)
 {
-   int i;
+   int i, ret = 0; /* invalid by default */
+
if (!mbr || le16_to_cpu(mbr->signature) != MSDOS_MBR_SIGNATURE)
-return 0;
+   goto done;
+
+   for (i = 0; i < 4; i++) {
+   ret = pmbr_part_valid(&mbr->partition_record[i]);
+   if (ret == GPT_MBR_PROTECTIVE) {
+   /*
+* Ok, we at least know that there's a protective MBR,
+* now check if there are other partition types for
+* hybrid MBR.
+*/
+   goto check_hybrid;
+   }
+   }
+
+   if (ret != GPT_MBR_PROTECTIVE)
+   goto done;
+check_hybrid:
for (i = 0; i < 4; i++)
-   if (pmbr_part_valid(&mbr->partition_record[i]))
-return 1;
-   return 0;
+   if ((mbr->partition_record[i].os_type != EFI_PMBR_OSTYPE_EFI_GPT) &&
+   (mbr->partition_record[i].os_type != 0x00))
+   ret = GPT_MBR_HYBRID;
+done:
+   return ret;
 }
 
 /**
@@ -548,17 +574,21 @@ static int find_valid_gpt(struct parsed_partitions *state, gpt_header **gpt,
 
lastlba = last_lba(state->bdev);
 if (!force_gpt) {
-/* This will be added to the EFI Spec. per Intel after v1.02. */
-legacymbr = kzalloc(sizeof (*legacymbr), GFP_KERNEL);
-if (legacymbr) {
-read_lba(state, 0, (u8 *) legacymbr,
-sizeof (*legacymbr));
-good_pmbr = is_pmbr_valid(legacymbr);
-kfree(legacymbr);
-}
-if (!good_pmbr)
-goto fail;
-}
+   /* This will be added to the EFI Spec. per Intel after v1.02. */
+   legacymbr = kzalloc(sizeof (*legacymbr), GFP_KERNEL);
+   if (!legacymbr)
+   goto fail;
+
+   read_lba(state, 0, (u8 *) legacymbr, sizeof (*legacymbr));
+   good_pmbr = is_pmbr_valid(legacymbr);
+   kfree(legacymbr);
+
+   if (!good_pmbr)
+   goto fail;
+
+   pr_debug("Device has a %s MBR\n",
+good_pmbr == GPT_MBR_PROTECTIVE ? "protective" : "hybrid");
+   }
 
good_pgpt = is_gpt_valid(state, GPT_PRIMARY_PARTITION_TABLE_LBA,
 &pgpt, &pptes);
diff --git a/block/partitions/efi.h 

[PATCH 1/8] partitions/efi: use lba-aware partition records

2013-08-05 Thread Davidlohr Bueso
The kernel's GPT implementation currently uses the generic
'struct partition' type for dealing with legacy MBR partition
records. While this is useful for disklabels that were designed
for CHS addressing, such as msdos, it doesn't adapt well to newer
standards that use LBA instead, such as GUID partition tables.
Furthermore, these generic partition structures do not have all the
required fields to properly follow the UEFI specs.

While a CHS address can be translated to LBA, it's much simpler and
cleaner to just replace the partition type. This patch adds a new
'gpt_record' type that is fully compliant with EFI and will allow
the next patches to add more checks to properly verify a protective
MBR, which is paramount to probing a device that makes use of GPT.
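
For reference, the CHS to LBA translation mentioned above can be sketched as follows. This is the standard conversion formula rather than code from the patch, and the function name is hypothetical. Note that it needs the disk geometry (heads per cylinder, sectors per track), which the partition record itself does not carry.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Standard CHS to LBA conversion:
 *   LBA = (cylinder * heads_per_cylinder + head) * sectors_per_track
 *         + (sector - 1)
 * CHS sector numbers start at 1; LBA numbers start at 0.
 */
static uint64_t chs_to_lba(uint32_t cylinder, uint32_t head, uint32_t sector,
			   uint32_t heads_per_cylinder,
			   uint32_t sectors_per_track)
{
	assert(sector >= 1 && sector <= sectors_per_track);
	assert(head < heads_per_cylinder);

	return ((uint64_t)cylinder * heads_per_cylinder + head) *
	       sectors_per_track + (sector - 1);
}
```

With the common 255 heads by 63 sectors geometry, CHS (0, 0, 2) maps to LBA 1, the block where the primary GPT header lives. Reading an LBA field directly avoids both the arithmetic and the dependence on geometry.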

Signed-off-by: Davidlohr Bueso 
---
 block/partitions/efi.c |  7 +++
 block/partitions/efi.h | 16 +++-
 2 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/block/partitions/efi.c b/block/partitions/efi.c
index c85fc89..3ebd3d8 100644
--- a/block/partitions/efi.c
+++ b/block/partitions/efi.c
@@ -149,11 +149,10 @@ static u64 last_lba(struct block_device *bdev)
   bdev_logical_block_size(bdev)) - 1ULL;
 }
 
-static inline int
-pmbr_part_valid(struct partition *part)
+static inline int pmbr_part_valid(gpt_record *part)
 {
-if (part->sys_ind == EFI_PMBR_OSTYPE_EFI_GPT &&
-le32_to_cpu(part->start_sect) == 1UL)
+if (part->os_type == EFI_PMBR_OSTYPE_EFI_GPT &&
+le32_to_cpu(part->start_sector) == 1UL)
 return 1;
 return 0;
 }
diff --git a/block/partitions/efi.h b/block/partitions/efi.h
index b69ab72..46cf1a4 100644
--- a/block/partitions/efi.h
+++ b/block/partitions/efi.h
@@ -101,11 +101,25 @@ typedef struct _gpt_entry {
efi_char16_t partition_name[72 / sizeof (efi_char16_t)];
 } __attribute__ ((packed)) gpt_entry;
 
+typedef struct _gpt_record {
+u8  boot_indicator; /* unused by EFI, set to 0x80 for bootable */
+u8  start_head; /* unused by EFI, pt start in CHS */
+u8  start_sector;   /* unused by EFI, pt start in CHS */
+u8  start_track;
+u8  os_type;/* EFI and legacy non-EFI OS types */
+u8  end_head;   /* unused by EFI, pt end in CHS */
+u8  end_sector; /* unused by EFI, pt end in CHS */
+u8  end_track;  /* unused by EFI, pt end in CHS */
+__le32  starting_lba;   /* used by EFI - start addr of the on disk pt */
+__le32  size_in_lba;/* used by EFI - size of pt in LBA */
+} __attribute__ ((packed)) gpt_record;
+
+
 typedef struct _legacy_mbr {
u8 boot_code[440];
__le32 unique_mbr_signature;
__le16 unknown;
-   struct partition partition_record[4];
+   gpt_record partition_record[4];
__le16 signature;
 } __attribute__ ((packed)) legacy_mbr;
 
-- 
1.7.11.7



[PATCH 3/8] partitions/efi: do not require gpt partition to begin at sector 1

2013-08-05 Thread Davidlohr Bueso
When detecting a valid protective MBR, the Linux kernel isn't picky about
which partition (1-4) the 0xEE type is in but, unlike other operating systems,
it does require that partition to begin at the second sector (sector 1). This
check is not enforced by UEFI, can cause Linux to fail to detect any *valid*
partitions on the disk, and can present problems when dealing with hybrid
MBRs[1].

For compatibility reasons, if the first partition is hybridized, the 0xEE
partition must be small enough to ensure that it only protects the GPT data
structures - as opposed to the whole disk in a protective MBR.
This problem is very well described by Rod Smith[1]: MBR-only partitioning
programs (such as older versions of fdisk) can see some of the disk space as
unallocated, thus defeating the purpose of the 0xEE partition's protection of
GPT data structures.

By dropping this check, this patch enables Linux to be more flexible when
probing for GPT disklabels.

[1] http://www.rodsbooks.com/gdisk/hybrid.html#reactions

Signed-off-by: Davidlohr Bueso 
---
 block/partitions/efi.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/block/partitions/efi.c b/block/partitions/efi.c
index 6a997b1..331cd1c 100644
--- a/block/partitions/efi.c
+++ b/block/partitions/efi.c
@@ -158,12 +158,9 @@ static inline int pmbr_part_valid(gpt_record *part)
 if (le32_to_cpu(part->starting_lba) != GPT_PRIMARY_PARTITION_TABLE_LBA)
 goto invalid;
 
-if (le32_to_cpu(part->start_sector) != 1UL)
-goto invalid;
-
-return 1;
+   return 1;
 invalid:
-return 0;
+   return 0;
 }
 
 /**
-- 
1.7.11.7



[PATCH 0/8] partitions/efi: detect hybrid mbrs

2013-08-05 Thread Davidlohr Bueso
This patchset teaches the kernel about hybrid master boot records (MBRs), one of
the most common issues with GUID partition tables. They are a workaround that
lays out disk partitions to be compatible with both EFI and legacy MBR based
systems. Except for adding more pmbr checks, to better comply with the UEFI/GPT
specs, the functionality is left unchanged - we only inform the user (through
debug) about the MBR scheme in use. While it is true that these restrictions can
be bypassed when forcing gpt, this is not the correct or default way of doing
things, and it further complicates matters for users. More details are in the
individual patches.

Patches 1-5 enable the kernel to inform the user about the MBR scheme being used.
They also include more protective MBR checks to be more UEFI compliant - we
currently have a very open and generic gpt implementation that can cause non-GPT
disks to be recognized/probed as GPT.

Patch 6 adds a missing check when verifying the header integrity.

Patches 7 & 8 are trivial cleanups.

All changes were tested on a MacBook Pro containing a hybrid MBR and a large
EFI based HP server with a standard protective MBR.

Thanks!

Davidlohr Bueso (8):
  partitions/efi: use lba-aware partition records
  partitions/efi: check pmbr record's starting lba
  partitions/efi: do not require gpt partition to begin at sector 1
  partitions/efi: detect hybrid MBRs
  partitions/efi: account for pmbr size in lba
  partitions/efi: compare first and last usable LBAs
  partitions/efi: delete annoying emacs style comments
  partitions/efi: some style cleanups

 block/partitions/efi.c | 128 ++---
 block/partitions/efi.h |  38 +++
 2 files changed, 108 insertions(+), 58 deletions(-)

-- 
1.7.11.7



[PATCH 2/8] partitions/efi: check pmbr record's starting lba

2013-08-05 Thread Davidlohr Bueso
Per the UEFI Specification 2.4, June 2013, the starting LBA of the partition
that has the EFI GPT type (0xEE) must be set to 0x00000001 - this is
the LBA of the GPT Partition Header.
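
A small sketch of that check against a raw 16-byte MBR partition record, assuming the conventional on-disk layout (OS type at byte 4, little-endian starting LBA at bytes 8 to 11). The helper and macro names are hypothetical; the patch itself works on the gpt_record structure with le32_to_cpu().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PMBR_OSTYPE_EFI_GPT 0xEE	/* hypothetical name for the 0xEE type */

/* Decode a little-endian 32-bit value regardless of host byte order. */
static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/*
 * rec points at one raw 16-byte MBR partition record:
 * byte 4 is the OS type, bytes 8..11 hold the starting LBA.
 * Returns 1 if the record looks like a GPT protective partition.
 */
static int pmbr_record_ok(const uint8_t rec[16])
{
	assert(rec != NULL);
	return rec[4] == PMBR_OSTYPE_EFI_GPT && get_le32(rec + 8) == 1;
}
```

A record of type 0xEE that starts at LBA 63, as a legacy partitioning tool might have written it, would be rejected by this check.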

Signed-off-by: Davidlohr Bueso 
---
 block/partitions/efi.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/block/partitions/efi.c b/block/partitions/efi.c
index 3ebd3d8..6a997b1 100644
--- a/block/partitions/efi.c
+++ b/block/partitions/efi.c
@@ -151,9 +151,18 @@ static u64 last_lba(struct block_device *bdev)
 
 static inline int pmbr_part_valid(gpt_record *part)
 {
-if (part->os_type == EFI_PMBR_OSTYPE_EFI_GPT &&
-le32_to_cpu(part->start_sector) == 1UL)
-return 1;
+if (part->os_type != EFI_PMBR_OSTYPE_EFI_GPT)
+goto invalid;
+
+/* set to 0x00000001 (i.e., the LBA of the GPT Partition Header) */
+if (le32_to_cpu(part->starting_lba) != GPT_PRIMARY_PARTITION_TABLE_LBA)
+goto invalid;
+
+if (le32_to_cpu(part->start_sector) != 1UL)
+goto invalid;
+
+return 1;
+invalid:
 return 0;
 }
 
-- 
1.7.11.7



[PATCH 5/8] partitions/efi: account for pmbr size in lba

2013-08-05 Thread Davidlohr Bueso
The partition that has the 0xEE (GPT protective) type must
have its size in LBA field set to the lesser of the size
of the disk minus one, or 0xFFFFFFFF for larger disks.
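
The rule above can be sketched as a small helper, assuming the device size is known in sectors. The function name is hypothetical and this is not the code from the patch. The comparison is done in 64 bits so that a large sector count is capped at the largest value a 32-bit field can hold (0xFFFFFFFF) rather than truncated.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Expected size_in_lba for the 0xEE record on a disk of total_sectors
 * sectors: the disk size minus one, capped at what a 32-bit field can
 * hold. Narrowing to 32 bits happens only after the cap is applied.
 */
static uint32_t expected_pmbr_size(uint64_t total_sectors)
{
	uint64_t size;

	assert(total_sectors > 0);
	size = total_sectors - 1;
	return size > UINT32_MAX ? UINT32_MAX : (uint32_t)size;
}
```

For a 1000-sector device the expected value is 999; for any device of 2^32 sectors or more (2 TiB with 512-byte sectors) it is 0xFFFFFFFF.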

Signed-off-by: Davidlohr Bueso 
---
 block/partitions/efi.c | 21 ++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/block/partitions/efi.c b/block/partitions/efi.c
index 4bf8165..ab6cd08 100644
--- a/block/partitions/efi.c
+++ b/block/partitions/efi.c
@@ -166,6 +166,7 @@ invalid:
 /**
  * is_pmbr_valid(): test Protective MBR for validity
  * @mbr: pointer to a legacy mbr structure
+ * @total_sectors: amount of sectors in the device
  *
  * Description: Checks for a valid protective or hybrid
  * master boot record (MBR). The validity of a pMBR depends
@@ -180,9 +181,9 @@ invalid:
  * Returns 0 upon invalid MBR, or GPT_MBR_PROTECTIVE or
  * GPT_MBR_HYBRID depending on the device layout.
  */
-static int is_pmbr_valid(legacy_mbr *mbr)
+static int is_pmbr_valid(legacy_mbr *mbr, sector_t total_sectors)
 {
-   int i, ret = 0; /* invalid by default */
+   int i, part = 0, ret = 0; /* invalid by default */
 
if (!mbr || le16_to_cpu(mbr->signature) != MSDOS_MBR_SIGNATURE)
goto done;
@@ -190,6 +191,7 @@ static int is_pmbr_valid(legacy_mbr *mbr)
for (i = 0; i < 4; i++) {
ret = pmbr_part_valid(&mbr->partition_record[i]);
if (ret == GPT_MBR_PROTECTIVE) {
+   part = i;
/*
 * Ok, we at least know that there's a protective MBR,
 * now check if there are other partition types for
@@ -206,6 +208,18 @@ check_hybrid:
if ((mbr->partition_record[i].os_type != EFI_PMBR_OSTYPE_EFI_GPT) &&
(mbr->partition_record[i].os_type != 0x00))
ret = GPT_MBR_HYBRID;
+
+   /*
+* Protective MBRs take up the lesser of the whole disk
+* or 2 TiB (32bit LBA), ignoring the rest of the disk.
+*
+* Hybrid MBRs do not necessarily comply with this.
+*/
+   if (ret == GPT_MBR_PROTECTIVE) {
+   if (le32_to_cpu(mbr->partition_record[part].size_in_lba) !=
+   min((uint32_t) total_sectors - 1, 0xFFFFFFFF))
+   ret = 0;
+   }
 done:
return ret;
 }
@@ -567,6 +581,7 @@ static int find_valid_gpt(struct parsed_partitions *state, gpt_header **gpt,
gpt_header *pgpt = NULL, *agpt = NULL;
gpt_entry *pptes = NULL, *aptes = NULL;
legacy_mbr *legacymbr;
+   sector_t total_sectors = i_size_read(state->bdev->bd_inode) >> 9;
u64 lastlba;
 
if (!ptes)
@@ -580,7 +595,7 @@ static int find_valid_gpt(struct parsed_partitions *state, gpt_header **gpt,
goto fail;
 
read_lba(state, 0, (u8 *) legacymbr, sizeof (*legacymbr));
-   good_pmbr = is_pmbr_valid(legacymbr);
+   good_pmbr = is_pmbr_valid(legacymbr, total_sectors);
kfree(legacymbr);
 
if (!good_pmbr)
-- 
1.7.11.7



[PATCH 6/8] partitions/efi: compare first and last usable LBAs

2013-08-05 Thread Davidlohr Bueso
When verifying GPT header integrity, make sure that
first usable LBA is smaller than last usable LBA.

Signed-off-by: Davidlohr Bueso 
---
 block/partitions/efi.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/block/partitions/efi.c b/block/partitions/efi.c
index ab6cd08..9a81c3b 100644
--- a/block/partitions/efi.c
+++ b/block/partitions/efi.c
@@ -409,7 +409,12 @@ static int is_gpt_valid(struct parsed_partitions *state, u64 lba,
 (unsigned long long)lastlba);
goto fail;
}
-
+   if (le64_to_cpu((*gpt)->last_usable_lba) < le64_to_cpu((*gpt)->first_usable_lba)) {
+   pr_debug("GPT: last_usable_lba incorrect: %lld > %lld\n",
+(unsigned long long)le64_to_cpu((*gpt)->last_usable_lba),
+(unsigned long long)le64_to_cpu((*gpt)->first_usable_lba));
+   goto fail;
+   }
/* Check that sizeof_partition_entry has the correct value */
if (le32_to_cpu((*gpt)->sizeof_partition_entry) != sizeof(gpt_entry)) {
pr_debug("GUID Partitition Entry Size check failed.\n");
-- 
1.7.11.7



RE: [PATCH v2 3/3] dma: Add Freescale eDMA engine driver support

2013-08-05 Thread Lu Jingchang-B35083


> -Original Message-
> From: Vinod Koul [mailto:vinod.k...@intel.com]
> Sent: Tuesday, August 06, 2013 12:25 PM
> To: Lu Jingchang-B35083
> Cc: d...@fb.com; shawn@linaro.org; linux-kernel@vger.kernel.org;
> linux-arm-ker...@lists.infradead.org; Wang Huan-B18965; Li Xiaochun-
> B41219
> Subject: Re: [PATCH v2 3/3] dma: Add Freescale eDMA engine driver support
> 
> On Tue, Aug 06, 2013 at 01:24:31AM +, Lu Jingchang-B35083 wrote:
> > > -Original Message-
> > > From: Vinod Koul [mailto:vinod.k...@intel.com]
> > > Sent: Tuesday, August 06, 2013 12:35 AM
> > > To: Lu Jingchang-B35083
> > > Cc: d...@fb.com; shawn@linaro.org; linux-kernel@vger.kernel.org;
> > > linux-arm-ker...@lists.infradead.org; Wang Huan-B18965; Li Xiaochun-
> > > B41219
> > > Subject: Re: [PATCH v2 3/3] dma: Add Freescale eDMA engine driver
> > > support
> > > > +
> > > > +static void fsl_edma_free_desc(struct virt_dma_desc *vdesc) {
> > > > +   struct fsl_edma_desc *fsl_desc;
> > > > +   int i;
> > > > +
> > > > +   fsl_desc = to_fsl_edma_desc(vdesc);
> > > > +   for (i = 0; i < fsl_desc->n_tcds; i++)
> > > > +   dma_pool_free(fsl_desc->echan->tcd_pool,
> > > > +   fsl_desc->tcd[i].vtcd,
> > > > +   fsl_desc->tcd[i].ptcd);
> > > > +   kfree(fsl_desc);
> > > should this be called with lock held or not?
> > [Lu Jingchang-B35083]
> > The desc list to be freed is got with lock held, and the free for each
> desc is independent, and the lock is not needed. Thanks.
> Would be apt to add this comment in the code, so that people know this
> function needs to be always called with lock held!
[Lu Jingchang-B35083] 
Sorry, this function is called without the lock held. I meant that the free()
doesn't need the lock held, just as in other drivers.
It is called from vchan_free_chan_resources().
Thanks!


Best Regards,
Jingchang
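
[Editorial sketch, not part of the thread.] The pattern described above (the descriptor list is detached with the lock held, then each descriptor is freed without it) can be illustrated in userspace. This is only an analogue under stated assumptions: struct chan, struct desc, chan_take_all() and free_desc_list() are hypothetical names, and a pthread mutex stands in for the channel spinlock.

```c
#include <pthread.h>
#include <stdlib.h>

struct desc {
	struct desc *next;
};

struct chan {
	pthread_mutex_t lock;
	struct desc *pending;	/* protected by lock */
};

/* Detach the whole pending list while holding the lock. */
static struct desc *chan_take_all(struct chan *c)
{
	struct desc *head;

	pthread_mutex_lock(&c->lock);
	head = c->pending;
	c->pending = NULL;
	pthread_mutex_unlock(&c->lock);
	return head;
}

/*
 * Free a detached list. No lock is needed here: nobody else can reach
 * these descriptors any more. Returns the number of descriptors freed.
 */
static int free_desc_list(struct desc *head)
{
	int n = 0;

	while (head != NULL) {
		struct desc *next = head->next;

		free(head);
		head = next;
		n++;
	}
	return n;
}
```

The design point is that the lock protects the shared list, not the memory of descriptors that have already been removed from it.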




[PATCH 2/4] perf tools: Separate out GTK codes to libperf-gtk.so

2013-08-05 Thread Namhyung Kim
Separate out GTK codes to a shared object called libperf-gtk.so.  This
time only GTK codes are built with -fPIC and libperf remains as is.

Cc: Andi Kleen 
Cc: Pekka Enberg 
Signed-off-by: Namhyung Kim 
---
 tools/perf/Makefile| 39 ---
 tools/perf/config/Makefile | 14 ++
 2 files changed, 38 insertions(+), 15 deletions(-)

diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index bfd12d02a304..17f0509e0eb0 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -113,6 +113,7 @@ SPARSE_FLAGS = -D__BIG_ENDIAN__ -D__powerpc__
 BUILTIN_OBJS =
 LIB_H =
 LIB_OBJS =
+GTK_OBJS =
 PYRF_OBJS =
 SCRIPT_SH =
 
@@ -484,13 +485,19 @@ ifndef NO_SLANG
 endif
 
 ifndef NO_GTK2
-  LIB_OBJS += $(OUTPUT)ui/gtk/browser.o
-  LIB_OBJS += $(OUTPUT)ui/gtk/hists.o
-  LIB_OBJS += $(OUTPUT)ui/gtk/setup.o
-  LIB_OBJS += $(OUTPUT)ui/gtk/util.o
-  LIB_OBJS += $(OUTPUT)ui/gtk/helpline.o
-  LIB_OBJS += $(OUTPUT)ui/gtk/progress.o
-  LIB_OBJS += $(OUTPUT)ui/gtk/annotate.o
+  ALL_PROGRAMS += $(OUTPUT)libperf-gtk.so
+
+  GTK_OBJS += $(OUTPUT)ui/gtk/browser.o
+  GTK_OBJS += $(OUTPUT)ui/gtk/hists.o
+  GTK_OBJS += $(OUTPUT)ui/gtk/setup.o
+  GTK_OBJS += $(OUTPUT)ui/gtk/util.o
+  GTK_OBJS += $(OUTPUT)ui/gtk/helpline.o
+  GTK_OBJS += $(OUTPUT)ui/gtk/progress.o
+  GTK_OBJS += $(OUTPUT)ui/gtk/annotate.o
+
+install-gtk: $(OUTPUT)libperf-gtk.so
+   $(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(libdir_SQ)'
+   $(INSTALL) $(OUTPUT)libperf-gtk.so '$(DESTDIR_SQ)$(libdir_SQ)'
 endif
 
 ifndef NO_LIBPERL
@@ -544,6 +551,12 @@ $(OUTPUT)perf: $(OUTPUT)perf.o $(BUILTIN_OBJS) $(PERFLIBS)
$(QUIET_LINK)$(CC) $(CFLAGS) $(LDFLAGS) $(OUTPUT)perf.o \
$(BUILTIN_OBJS) $(LIBS) -o $@
 
+$(GTK_OBJS): %.o: %.c $(LIB_H)
+   $(QUIET_CC)$(CC) -o $@ -c -fPIC $(CFLAGS) $(GTK_CFLAGS) $<
+
+$(OUTPUT)libperf-gtk.so: $(GTK_OBJS) $(PERFLIBS)
+   $(QUIET_LINK)$(CC) -o $@ -shared $(ALL_LDFLAGS) $(filter %.o,$^) $(GTK_LIBS)
+
 $(OUTPUT)builtin-help.o: builtin-help.c $(OUTPUT)common-cmds.h $(OUTPUT)PERF-CFLAGS
$(QUIET_CC)$(CC) -o $@ -c $(CFLAGS) \
'-DPERF_HTML_PATH="$(htmldir_SQ)"' \
@@ -762,7 +775,9 @@ check: $(OUTPUT)common-cmds.h
 
 ### Installation rules
 
-install-bin: all
+install-gtk:
+
+install-bin: all install-gtk
$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(bindir_SQ)'
$(INSTALL) $(OUTPUT)perf '$(DESTDIR_SQ)$(bindir_SQ)'
$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(perfexec_instdir_SQ)/scripts/perl/Perf-Trace-Util/lib/Perf/Trace'
@@ -795,15 +810,17 @@ $(INSTALL_DOC_TARGETS):
 ### Cleaning rules
 
 clean: $(LIBTRACEEVENT)-clean $(LIBLK)-clean
-   $(RM) $(LIB_OBJS) $(BUILTIN_OBJS) $(LIB_FILE) $(OUTPUT)perf-archive $(OUTPUT)perf.o $(LANG_BINDINGS)
+   $(RM) $(LIB_OBJS) $(BUILTIN_OBJS) $(LIB_FILE) $(GTK_OBJS)
+   $(RM) $(OUTPUT)perf-archive $(OUTPUT)perf.o $(LANG_BINDINGS)
$(RM) $(ALL_PROGRAMS) perf
-   $(RM) *.spec *.pyc *.pyo */*.pyc */*.pyo $(OUTPUT)common-cmds.h TAGS tags cscope*
+   $(RM) *.spec *.pyc *.pyo */*.pyc */*.pyo
+   $(RM) $(OUTPUT)common-cmds.h TAGS tags cscope*
$(QUIET_SUBDIR0)Documentation $(QUIET_SUBDIR1) clean
$(RM) $(OUTPUT)PERF-VERSION-FILE $(OUTPUT)PERF-CFLAGS
$(RM) $(OUTPUT)util/*-bison*
$(RM) $(OUTPUT)util/*-flex*
$(python-clean)
 
-.PHONY: all install clean strip $(LIBTRACEEVENT) $(LIBLK)
+.PHONY: all install clean strip $(LIBTRACEEVENT) $(LIBLK) install-gtk
 .PHONY: shell_compatibility_test please_set_SHELL_PATH_to_a_more_modern_shell
 .PHONY: .FORCE-PERF-VERSION-FILE TAGS tags cscope .FORCE-PERF-CFLAGS
diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile
index 214e17e97e5c..6bdfd0302c4e 100644
--- a/tools/perf/config/Makefile
+++ b/tools/perf/config/Makefile
@@ -267,11 +267,11 @@ ifndef NO_GTK2
 NO_GTK2 := 1
   else
 ifeq ($(call try-cc,$(SOURCE_GTK2_INFOBAR),$(FLAGS_GTK2),-DHAVE_GTK_INFO_BAR),y)
-  CFLAGS += -DHAVE_GTK_INFO_BAR
+  GTK_CFLAGS := -DHAVE_GTK_INFO_BAR
 endif
-CFLAGS += -DGTK2_SUPPORT
-CFLAGS += $(shell pkg-config --cflags gtk+-2.0 2>/dev/null)
-EXTLIBS += $(shell pkg-config --libs gtk+-2.0 2>/dev/null)
+GTK_CFLAGS += -DGTK2_SUPPORT
+GTK_CFLAGS += $(shell pkg-config --cflags gtk+-2.0 2>/dev/null)
+GTK_LIBS := $(shell pkg-config --libs gtk+-2.0 2>/dev/null)
   endif
 endif
 
@@ -456,7 +456,12 @@ else
 sysconfdir = $(prefix)/etc
 ETC_PERFCONFIG = etc/perfconfig
 endif
+ifeq ($(IS_X86_64),1)
+lib = lib64
+else
 lib = lib
+endif
+libdir = $(prefix)/$(lib)
 
 # Shell quote (do not use $(call) to accommodate ancient setups);
 ETC_PERFCONFIG_SQ = $(subst ','\'',$(ETC_PERFCONFIG))
@@ -469,6 +474,7 @@ template_dir_SQ = $(subst ','\'',$(template_dir))
 htmldir_SQ = $(subst ','\'',$(htmldir))
 prefix_SQ = $(subst ','\'',$(prefix))
 sysconfdir_SQ = $(subst ','\'',$(sysconfdir))
+libdir_SQ = $(subst ','\'',$(libdir))
 
 ifneq ($(filter /%,$(firstword $(perfexecdir))),)
 perfexec_instdir = 

[PATCH 4/4] perf tools: Run dynamic loaded GTK browser

2013-08-05 Thread Namhyung Kim
Run GTK hist and annotation browser using libdl.

Cc: Andi Kleen 
Cc: Pekka Enberg 
Signed-off-by: Namhyung Kim 
---
 tools/perf/builtin-annotate.c | 26 +++---
 tools/perf/builtin-report.c   | 16 ++--
 tools/perf/config/Makefile|  2 +-
 tools/perf/ui/gtk/annotate.c  | 13 ++---
 tools/perf/ui/gtk/gtk.h   | 13 +
 tools/perf/util/annotate.h| 24 
 tools/perf/util/hist.h| 15 ---
 7 files changed, 61 insertions(+), 48 deletions(-)

diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index db491e9a812b..82469b3ead07 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -30,6 +30,7 @@
 #include "util/tool.h"
 #include "arch/common.h"
 
+#include 
 #include 
 
 struct perf_annotate {
@@ -143,8 +144,18 @@ find_next:
 
if (use_browser == 2) {
int ret;
+   int (*annotate)(struct hist_entry *he,
+   struct perf_evsel *evsel,
+   struct hist_browser_timer *hbt);
+
+   annotate = dlsym(perf_gtk_handle,
+"hist_entry__gtk_annotate");
+   if (annotate == NULL) {
+   ui__error("GTK browser not found!\n");
+   return;
+   }
 
-   ret = hist_entry__gtk_annotate(he, evsel, NULL);
+   ret = annotate(he, evsel, NULL);
if (!ret || !ann->skip_missing)
return;
 
@@ -246,8 +257,17 @@ static int __cmd_annotate(struct perf_annotate *ann)
goto out_delete;
}
 
-   if (use_browser == 2)
-   perf_gtk__show_annotations();
+   if (use_browser == 2) {
+   void (*show_annotations)(void);
+
+   show_annotations = dlsym(perf_gtk_handle,
+"perf_gtk__show_annotations");
+   if (show_annotations == NULL) {
+   ui__error("GTK browser not found!\n");
+   goto out_delete;
+   }
+   show_annotations();
+   }
 
 out_delete:
/*
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index d785d89ed226..05c0e80c8ae4 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -35,6 +35,7 @@
 #include "util/hist.h"
 #include "arch/common.h"
 
+#include 
 #include 
 
 struct perf_report {
@@ -592,8 +593,19 @@ static int __cmd_report(struct perf_report *rep)
ret = 0;
 
} else if (use_browser == 2) {
-   perf_evlist__gtk_browse_hists(session->evlist, help,
- NULL, rep->min_percent);
+   int (*hist_browser)(struct perf_evlist *,
+   const char *,
+   struct hist_browser_timer *,
+   float min_pcnt);
+
+   hist_browser = dlsym(perf_gtk_handle,
+"perf_evlist__gtk_browse_hists");
+   if (hist_browser == NULL) {
+   ui__error("GTK browser not found!\n");
+   return ret;
+   }
+   hist_browser(session->evlist, help, NULL,
+rep->min_percent);
}
} else
perf_evlist__tty_browse_hists(session->evlist, rep, help);
diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile
index 6bdfd0302c4e..1b6ccb242609 100644
--- a/tools/perf/config/Makefile
+++ b/tools/perf/config/Makefile
@@ -269,7 +269,7 @@ ifndef NO_GTK2
 ifeq ($(call try-cc,$(SOURCE_GTK2_INFOBAR),$(FLAGS_GTK2),-DHAVE_GTK_INFO_BAR),y)
   GTK_CFLAGS := -DHAVE_GTK_INFO_BAR
 endif
-GTK_CFLAGS += -DGTK2_SUPPORT
+CFLAGS += -DGTK2_SUPPORT
 GTK_CFLAGS += $(shell pkg-config --cflags gtk+-2.0 2>/dev/null)
 GTK_LIBS := $(shell pkg-config --libs gtk+-2.0 2>/dev/null)
   endif
diff --git a/tools/perf/ui/gtk/annotate.c b/tools/perf/ui/gtk/annotate.c
index f538794615db..9c7ff8d31b27 100644
--- a/tools/perf/ui/gtk/annotate.c
+++ b/tools/perf/ui/gtk/annotate.c
@@ -154,9 +154,9 @@ static int perf_gtk__annotate_symbol(GtkWidget *window, struct symbol *sym,
return 0;
 }
 
-int symbol__gtk_annotate(struct symbol *sym, struct map *map,
-struct perf_evsel *evsel,
-struct hist_browser_timer *hbt)
+static int symbol__gtk_annotate(struct symbol *sym, struct map *map,
+   struct perf_evsel *evsel,
+   struct hist_browser_timer 

[PATCH/RFC 0/4] perf ui/gtk: Separate out GTK code to a shared object (v2)

2013-08-05 Thread Namhyung Kim
Hi,

This is v2 of gtk code separation patchset to reduce library
dependencies of the perf executable.

I only built libperf-gtk.so with -fPIC, and it's not linked to libperf
at build time.  All unresolved symbols used for perf should be
resolved at runtime via perf executable (so libperf.a) - I didn't know
that the linker permits unresolved symbols in a shared library at
build time.

Tested on my x86-64 machine only.  It seems to work well for me.

The patch 1 is a bug fix and can be applied independently.

You can find it on my 'perf/separate-v2' branch in my tree at:

  git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git


Any comments are welcome, thanks
Namhyung


Cc: Pekka Enberg 
Cc: Andi Kleen 

Namhyung Kim (4):
  perf ui/gtk: Fix segmentation fault on perf_hpp__for_each_format loop
  perf tools: Separate out GTK codes to libperf-gtk.so
  perf tools: Setup GTK browser dynamically
  perf tools: Run dynamic loaded GTK browser

 tools/perf/Makefile   | 39 +++--
 tools/perf/builtin-annotate.c | 26 +++---
 tools/perf/builtin-report.c   | 16 --
 tools/perf/config/Makefile| 12 +++---
 tools/perf/ui/gtk/annotate.c  | 13 ---
 tools/perf/ui/gtk/gtk.h   | 16 ++
 tools/perf/ui/gtk/hists.c |  2 --
 tools/perf/ui/setup.c | 51 +--
 tools/perf/ui/ui.h| 12 +-
 tools/perf/util/annotate.h| 24 
 tools/perf/util/hist.h| 15 -
 11 files changed, 150 insertions(+), 76 deletions(-)

-- 
1.7.11.7



[PATCH 3/4] perf tools: Setup GTK browser dynamically

2013-08-05 Thread Namhyung Kim
Call setup/exit GTK browser function using libdl.

Cc: Andi Kleen 
Cc: Pekka Enberg 
Signed-off-by: Namhyung Kim 
---
 tools/perf/ui/gtk/gtk.h |  3 +++
 tools/perf/ui/setup.c   | 51 +++--
 tools/perf/ui/ui.h  | 12 +---
 3 files changed, 53 insertions(+), 13 deletions(-)

diff --git a/tools/perf/ui/gtk/gtk.h b/tools/perf/ui/gtk/gtk.h
index 3d96785ef155..09b7a062fd48 100644
--- a/tools/perf/ui/gtk/gtk.h
+++ b/tools/perf/ui/gtk/gtk.h
@@ -20,6 +20,9 @@ struct perf_gtk_context {
guint statbar_ctx_id;
 };
 
+int perf_gtk__init(void);
+void perf_gtk__exit(bool wait_for_ok);
+
 extern struct perf_gtk_context *pgctx;
 
 static inline bool perf_gtk__is_active_context(struct perf_gtk_context *ctx)
diff --git a/tools/perf/ui/setup.c b/tools/perf/ui/setup.c
index 47d9a571f261..51a7f7357371 100644
--- a/tools/perf/ui/setup.c
+++ b/tools/perf/ui/setup.c
@@ -1,4 +1,5 @@
 #include 
+#include 
 
 #include "../util/cache.h"
 #include "../util/debug.h"
@@ -6,6 +7,52 @@
 
 pthread_mutex_t ui__lock = PTHREAD_MUTEX_INITIALIZER;
 
+#ifdef GTK2_SUPPORT
+void *perf_gtk_handle;
+
+static int setup_gtk_browser(void)
+{
+   int (*perf_ui_init)(void);
+
+   perf_gtk_handle = dlopen("libperf-gtk.so", RTLD_LAZY);
+   if (perf_gtk_handle == NULL)
+   return -1;
+
+   perf_ui_init = dlsym(perf_gtk_handle, "perf_gtk__init");
+   if (perf_ui_init == NULL)
+   goto out_close;
+
+   if (perf_ui_init() == 0)
+   return 0;
+
+out_close:
+   dlclose(perf_gtk_handle);
+   return -1;
+}
+
+static void exit_gtk_browser(bool wait_for_ok)
+{
+   void (*perf_ui_exit)(bool);
+
+   if (perf_gtk_handle == NULL)
+   return;
+
+   perf_ui_exit = dlsym(perf_gtk_handle, "perf_gtk__exit");
+   if (perf_ui_exit == NULL)
+   goto out_close;
+
+   perf_ui_exit(wait_for_ok);
+
+out_close:
+   dlclose(perf_gtk_handle);
+
+   perf_gtk_handle = NULL;
+}
+#else
+static inline int setup_gtk_browser(void) { return -1; }
+static inline void exit_gtk_browser(bool wait_for_ok __maybe_unused) {}
+#endif
+
 void setup_browser(bool fallback_to_pager)
 {
if (use_browser < 2 && (!isatty(1) || dump_trace))
@@ -17,7 +64,7 @@ void setup_browser(bool fallback_to_pager)
 
switch (use_browser) {
case 2:
-   if (perf_gtk__init() == 0)
+   if (setup_gtk_browser() == 0)
break;
/* fall through */
case 1:
@@ -39,7 +86,7 @@ void exit_browser(bool wait_for_ok)
 {
switch (use_browser) {
case 2:
-   perf_gtk__exit(wait_for_ok);
+   exit_gtk_browser(wait_for_ok);
break;
 
case 1:
diff --git a/tools/perf/ui/ui.h b/tools/perf/ui/ui.h
index 70cb0d4eb8aa..4f7cbe6a2608 100644
--- a/tools/perf/ui/ui.h
+++ b/tools/perf/ui/ui.h
@@ -6,6 +6,7 @@
 #include 
 
 extern pthread_mutex_t ui__lock;
+extern void *perf_gtk_handle;
 
 extern int use_browser;
 
@@ -23,17 +24,6 @@ static inline int ui__init(void)
 static inline void ui__exit(bool wait_for_ok __maybe_unused) {}
 #endif
 
-#ifdef GTK2_SUPPORT
-int perf_gtk__init(void);
-void perf_gtk__exit(bool wait_for_ok);
-#else
-static inline int perf_gtk__init(void)
-{
-   return -1;
-}
-static inline void perf_gtk__exit(bool wait_for_ok __maybe_unused) {}
-#endif
-
 void ui__refresh_dimensions(bool force);
 
 #endif /* _PERF_UI_H_ */
-- 
1.7.11.7
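
[Editorial sketch, not part of the patch.] The load-then-initialize flow that setup_gtk_browser() follows can be reduced to a small standalone helper. This is a minimal sketch, assuming a POSIX system with <dlfcn.h> (older glibc versions also need -ldl at link time); plugin_setup() and plugin_handle are hypothetical names. Unlike the patch, the sketch also resets the handle to NULL on the failure path so that a stale handle is never reused.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Handle of the loaded plugin, or NULL when nothing is loaded. */
static void *plugin_handle;

/*
 * Load the shared object `lib`, look up an init function named
 * `init_sym` with the signature int (*)(void), and call it.
 * Returns 0 on success and -1 on any failure.
 */
static int plugin_setup(const char *lib, const char *init_sym)
{
	int (*init)(void);

	plugin_handle = dlopen(lib, RTLD_LAZY);
	if (plugin_handle == NULL)
		return -1;

	/* POSIX-recommended way to store a dlsym() result in a function pointer. */
	*(void **)(&init) = dlsym(plugin_handle, init_sym);
	if (init != NULL && init() == 0)
		return 0;

	dlclose(plugin_handle);
	plugin_handle = NULL;
	return -1;
}
```

A caller would fall back to another UI when plugin_setup() returns -1, which is what the fall-through in setup_browser() does.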



[PATCH 1/4] perf ui/gtk: Fix segmentation fault on perf_hpp__for_each_format loop

2013-08-05 Thread Namhyung Kim
From: Namhyung Kim 

The commit 2b8bfa6bb8a7 ("perf tools: Centralize default columns init
in perf_hpp__init") moves initialization of common overhead column to
perf_hpp__init() but forgot about the gtk code.

So the gtk code added the same column to the list twice, causing an
infinite loop when iterating over it with the perf_hpp__for_each_format loop.
When I run perf report --gtk, I see the following messages
indefinitely.

  (perf:11687): Gtk-CRITICAL **: IA__gtk_main_quit: assertion 'main_loops != NULL' failed
  perf: Segmentation fault

Cc: Jiri Olsa 
Cc: Pekka Enberg 
Cc: Andi Kleen 
Signed-off-by: Namhyung Kim 
---
 tools/perf/ui/gtk/hists.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/tools/perf/ui/gtk/hists.c b/tools/perf/ui/gtk/hists.c
index cb2ed1980147..2ca66cc1160f 100644
--- a/tools/perf/ui/gtk/hists.c
+++ b/tools/perf/ui/gtk/hists.c
@@ -109,8 +109,6 @@ __HPP_COLOR_PERCENT_FN(overhead_guest_us, period_guest_us)
 
 void perf_gtk__init_hpp(void)
 {
-   perf_hpp__column_enable(PERF_HPP__OVERHEAD);
-
perf_hpp__init();
 
perf_hpp__format[PERF_HPP__OVERHEAD].color =
-- 
1.7.11.7
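
[Editorial sketch, not part of the patch.] Why adding the same entry twice makes the iteration spin can be shown with a tiny circular doubly-linked list. This is a minimal sketch that assumes the format list behaves like the kernel's struct list_head; the helpers below are simplified re-implementations written for illustration, and list_count_bounded() is a hypothetical helper that gives up after a fixed number of steps.

```c
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Append `n` before `h`, i.e. at the tail of the list headed by `h`. */
static void list_add_tail(struct list_head *n, struct list_head *h)
{
	struct list_head *prev = h->prev;

	n->next = h;
	n->prev = prev;
	prev->next = n;
	h->prev = n;
}

/*
 * Walk the list and return the number of entries, or -1 if the walk
 * takes more than `limit` steps without getting back to the head.
 */
static int list_count_bounded(const struct list_head *h, int limit)
{
	const struct list_head *pos;
	int n = 0;

	for (pos = h->next; pos != h; pos = pos->next)
		if (++n > limit)
			return -1;
	return n;
}
```

When the entry that is already the tail is added again, its next pointer ends up pointing at itself, so a plain for-each walk never reaches the head again.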



Re: [PATCH] cpumask: fix cpumask leak in partition_sched_domains

2013-08-05 Thread Xiaotian Feng
On Tue, Aug 6, 2013 at 12:37 PM, Rusty Russell  wrote:
> Xiaotian Feng  writes:
>> On Sat, Jul 27, 2013 at 3:26 PM, Xiaotian Feng  wrote:
>>> If doms_new is NULL, partition_sched_domains() will reset ndoms_cur
>>> to 0, and free old sched domains with free_sched_domains(doms_cur, ndoms_cur).
>>> As ndoms_cur is 0, the cpumask will not be freed.
>>>
>>> Signed-off-by: Xiaotian Feng 
>>> Cc: Ingo Molnar 
>>> Cc: Peter Zijlstra 
>>> Cc: linux-kernel@vger.kernel.org
>>
>> Any comments? Cc'ed Rusty.
>
> The code is a little convoluted, but your fix is logical.
>

Yes, it's quite convoluted :(

>>> ---
>>>  kernel/sched/core.c |5 +++--
>>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>>> index b7c32cb..3d6c57b 100644
>>> --- a/kernel/sched/core.c
>>> +++ b/kernel/sched/core.c
>>> @@ -6184,8 +6184,9 @@ match1:
>>> ;
>>> }
>>>
>>> +   n= ndoms_cur;
>
> You're missing a ' ' here:
> n = ndoms_cur;
>

I'll update this, thanks :)

>>> if (doms_new == NULL) {
>>> -   ndoms_cur = 0;
>>> +   n = 0;
>>> doms_new = _doms;
>>> cpumask_andnot(doms_new[0], cpu_active_mask, cpu_isolated_map);
>>> WARN_ON_ONCE(dattr_new);
>>> @@ -6193,7 +6194,7 @@ match1:
>>>
>>> /* Build new domains */
>>> for (i = 0; i < ndoms_new; i++) {
>>> -   for (j = 0; j < ndoms_cur && !new_topology; j++) {
>>> +   for (j = 0; j < n && !new_topology; j++) {
>>> if (cpumask_equal(doms_new[i], doms_cur[j])
>>> && dattrs_equal(dattr_new, i, dattr_cur, j))
>>> goto match2;
>>> --
>>> 1.7.9.6 (Apple Git-31.1)
>>>
>
> Cheers,
> Rusty.
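
[Editorial sketch, not part of the thread.] The leak described in the patch description comes from zeroing the element count before it is used as the bound of the free loop. A userspace analogue is shown below, as a minimal sketch only: alloc_domains(), free_domains(), teardown() and the live_masks counter are hypothetical names, and plain malloc() stands in for the cpumask allocations.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Number of per-domain masks currently allocated. */
static int live_masks;

static void **alloc_domains(int n)
{
	void **doms = calloc((size_t)n, sizeof(*doms));

	for (int i = 0; doms != NULL && i < n; i++) {
		doms[i] = malloc(16);
		if (doms[i] != NULL)
			live_masks++;
	}
	return doms;
}

/* Frees the first `n` masks and then the array itself. */
static void free_domains(void **doms, int n)
{
	for (int i = 0; i < n; i++) {
		if (doms[i] != NULL)
			live_masks--;
		free(doms[i]);
	}
	free(doms);
}

/*
 * Allocate two domains and free them again. With clobber_count set,
 * the count is reset to 0 first (the pre-patch pattern), so the masks
 * are never freed. Returns the number of masks leaked by this call.
 */
static int teardown(bool clobber_count)
{
	int before = live_masks;
	int ndoms_cur = 2;
	void **doms_cur = alloc_domains(ndoms_cur);

	if (doms_cur == NULL)
		return 0;
	if (clobber_count)
		ndoms_cur = 0;	/* the real count is lost here */
	free_domains(doms_cur, ndoms_cur);
	return live_masks - before;
}
```

The patch avoids this by zeroing a separate local (n) that is used only as the loop bound, leaving ndoms_cur intact for the later free.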


Re: [PATCH v2 3/3] dma: Add Freescale eDMA engine driver support

2013-08-05 Thread Vinod Koul
On Tue, Aug 06, 2013 at 01:24:31AM +, Lu Jingchang-B35083 wrote:
> > -Original Message-
> > From: Vinod Koul [mailto:vinod.k...@intel.com]
> > Sent: Tuesday, August 06, 2013 12:35 AM
> > To: Lu Jingchang-B35083
> > Cc: d...@fb.com; shawn@linaro.org; linux-kernel@vger.kernel.org;
> > linux-arm-ker...@lists.infradead.org; Wang Huan-B18965; Li Xiaochun-
> > B41219
> > Subject: Re: [PATCH v2 3/3] dma: Add Freescale eDMA engine driver support
> > > +
> > > +static void fsl_edma_free_desc(struct virt_dma_desc *vdesc) {
> > > + struct fsl_edma_desc *fsl_desc;
> > > + int i;
> > > +
> > > + fsl_desc = to_fsl_edma_desc(vdesc);
> > > + for (i = 0; i < fsl_desc->n_tcds; i++)
> > > + dma_pool_free(fsl_desc->echan->tcd_pool,
> > > + fsl_desc->tcd[i].vtcd,
> > > + fsl_desc->tcd[i].ptcd);
> > > + kfree(fsl_desc);
> > should this be called with lock held or not?
> [Lu Jingchang-B35083] 
> The desc list to be freed is got with lock held, and the free for each desc 
> is independent, and the lock is not needed. Thanks.
Would be apt to add this comment in the code, so that people know this function
needs to be always called with lock held!

Pls add this stuff in next rev of the patch

~Vinod


Why name length of an array can hit case of "Inconsistent kallsyms data" fail

2013-08-05 Thread vincent Ho
While porting a new mach for ARM, I hit a case that produces
"Inconsistent kallsyms data".
While looking for a solution, I found an interesting case related to
"array name length".

The kernel is 3.4.56 and gcc is version 4.3.5 (Buildroot 2010.11).

The code that registers the serial8250 platform device is:

>#include 
>#include 
>#include 
>#include 

>static struct plat_serial8250_port abcdef_serial_platform_data[] = {
>{
>.membase= (void *)UARTC_0_MMR_BASE_VRT,
>.mapbase= VPL_UARTC_0_MMR_BASE_PHY,
>.irq= UARTC0_IRQ_NUM,
>.uartclk= 2400,
>.flags= UPF_BOOT_AUTOCONF | UPF_SKIP_TEST | UPF_FIXED_TYPE,
>.iotype= UPIO_MEM32,
>.regshift= 2,
>.type= PORT_16550A,
>},
>{},
>};
> static struct platform_device board_serial_device = {
>.name= "serial8250",
>.id= PLAT8250_DEV_PLATFORM,
>.dev= {
>.platform_data  = abcdef_serial_platform_data,
>},
>};
>void __init board_serial_init(void)
>{
>platform_device_register(&board_serial_device);
>}

The kernel build fails with the original code, reporting the "Inconsistent kallsyms data" error.
BUT once I changed the array name from "abcdef_serial_platform_data" to
"abcde_serial_platform_data", i.e. shortened it by one letter,
the kernel build completes without the "Inconsistent kallsyms data" error.
I have no idea why the array name length makes a difference.
There is no other change between these two builds, only the
array name.
Does this mean that some "Inconsistent kallsyms data" errors are caused by symbol
length and its effect on the final two KSYM passes?


[PATCH 2/2] xen/trace: replace old code with __print_symbolic

2013-08-05 Thread kpark3469
From: Sahara 

The advantage of using __print_symbolic() is that it allows both perf
and trace-cmd to read this event properly. Their parsers are not full C
parsers, and when you open code the processing, they both will fail
to parse how to read the output, and will just default to printing the
fields via their raw numbers.
Another advantage is if the __entry->action is not one of the defined
fields, instead of outputting "??" it will output the number in hex. Say
if __entry->action is 0x123, the __print_symbolic will return "0x123" as
a string and that will be shown to the user, letting you know the actual
value of the field that was unknown.

Signed-off-by: Sahara 
---
 include/trace/events/xen.h |   33 +++--
 1 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/include/trace/events/xen.h b/include/trace/events/xen.h
index d06b6da..8c6f945 100644
--- a/include/trace/events/xen.h
+++ b/include/trace/events/xen.h
@@ -10,6 +10,25 @@
 
 struct multicall_entry;
 
+#define show_paravirt_lazy_mode(val)   \
+   __print_symbolic(val,   \
+   { PARAVIRT_LAZY_NONE,   "LAZY_NONE" },  \
+   { PARAVIRT_LAZY_MMU,"LAZY_MMU" },   \
+   { PARAVIRT_LAZY_CPU,"LAZY_CPU" })
+
+#define show_xen_mc_flush_reason(val)  \
+   __print_symbolic(val,   \
+   { XEN_MC_FL_NONE,   "NONE" },   \
+   { XEN_MC_FL_BATCH,  "BATCH" },  \
+   { XEN_MC_FL_ARGS,   "ARGS" },   \
+   { XEN_MC_FL_CALLBACK,   "CALLBACK" })
+
+#define show_xen_mc_extend_args(val)   \
+   __print_symbolic(val,   \
+   { XEN_MC_XE_OK, "OK" }, \
+   { XEN_MC_XE_BAD_OP, "BAD_OP" }, \
+   { XEN_MC_XE_NO_SPACE,   "NO_SPACE" })
+
 /* Multicalls */
 DECLARE_EVENT_CLASS(xen_mc__batch,
TP_PROTO(enum paravirt_lazy_mode mode),
@@ -18,9 +37,8 @@ DECLARE_EVENT_CLASS(xen_mc__batch,
__field(enum paravirt_lazy_mode, mode)
),
TP_fast_assign(__entry->mode = mode),
-   TP_printk("start batch LAZY_%s",
- (__entry->mode == PARAVIRT_LAZY_MMU) ? "MMU" :
- (__entry->mode == PARAVIRT_LAZY_CPU) ? "CPU" : "NONE")
+   TP_printk("start batch %s",
+ show_paravirt_lazy_mode(__entry->mode)
);
 #define DEFINE_XEN_MC_BATCH(name)  \
DEFINE_EVENT(xen_mc__batch, name,   \
@@ -82,10 +100,7 @@ TRACE_EVENT(xen_mc_flush_reason,
),
TP_fast_assign(__entry->reason = reason),
TP_printk("flush reason %s",
- (__entry->reason == XEN_MC_FL_NONE) ? "NONE" :
- (__entry->reason == XEN_MC_FL_BATCH) ? "BATCH" :
- (__entry->reason == XEN_MC_FL_ARGS) ? "ARGS" :
- (__entry->reason == XEN_MC_FL_CALLBACK) ? "CALLBACK" : "??")
+ show_xen_mc_flush_reason(__entry->reason)
);
 
 TRACE_EVENT(xen_mc_flush,
@@ -117,9 +132,7 @@ TRACE_EVENT(xen_mc_extend_args,
TP_printk("extending op %u%s by %zu bytes res %s",
  __entry->op, xen_hypercall_name(__entry->op),
  __entry->args,
- __entry->res == XEN_MC_XE_OK ? "OK" :
- __entry->res == XEN_MC_XE_BAD_OP ? "BAD_OP" :
- __entry->res == XEN_MC_XE_NO_SPACE ? "NO_SPACE" : "???")
+ show_xen_mc_extend_args(__entry->res)
);
 
 /* mmu */
-- 
1.7.1
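
[Editorial sketch, not part of the patch.] The behaviour the commit message describes (a known value is printed by name, an unknown value such as 0x123 is printed as a hex string) can be modelled with a small lookup helper. This is a userspace sketch only, not the kernel's __print_symbolic() implementation; struct sym_entry and symbolic_name() are hypothetical names.

```c
#include <stddef.h>
#include <stdio.h>

struct sym_entry {
	unsigned long value;
	const char *name;
};

/*
 * Return the name registered for `val` in `tbl`. If there is none,
 * format the value as hex into `buf` and return `buf` instead.
 */
static const char *symbolic_name(unsigned long val,
				 const struct sym_entry *tbl, size_t n,
				 char *buf, size_t buflen)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].value == val)
			return tbl[i].name;

	snprintf(buf, buflen, "0x%lx", val);
	return buf;
}
```

Keeping the value-to-name pairs in one table is also what lets tools such as perf and trace-cmd decode the event without a full C parser, as the commit message notes.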



[PATCH 1/2] PM / QoS: use __print_symbolic with more convenient way

2013-08-05 Thread kpark3469
From: Sahara 

This patch is to prevent the same __print_symbolic functions from
being used repeatedly.

Signed-off-by: Sahara 
---
 include/trace/events/power.h |   43 +
 1 files changed, 22 insertions(+), 21 deletions(-)

diff --git a/include/trace/events/power.h b/include/trace/events/power.h
index 8e42410..34efacc 100644
--- a/include/trace/events/power.h
+++ b/include/trace/events/power.h
@@ -182,6 +182,23 @@ DEFINE_EVENT(power_domain, power_domain_target,
 /*
  * The pm qos events are used for pm qos update
  */
+#define show_pm_qos_class(val) \
+   __print_symbolic(val,   \
+   { PM_QOS_CPU_DMA_LATENCY,   "CPU_DMA_LATENCY" },\
+   { PM_QOS_NETWORK_LATENCY,   "NETWORK_LATENCY" },\
+   { PM_QOS_NETWORK_THROUGHPUT,"NETWORK_THROUGHPUT" })
+
+#define show_pm_qos_req_action(val)\
+   __print_symbolic(val,   \
+   { PM_QOS_ADD_REQ,   "ADD_REQ" },\
+   { PM_QOS_UPDATE_REQ,"UPDATE_REQ" }, \
+   { PM_QOS_REMOVE_REQ,"REMOVE_REQ" })
+
+#define show_dev_pm_qos_req_type(val)  \
+   __print_symbolic(val,   \
+   { DEV_PM_QOS_LATENCY,   "DEV_PM_QOS_LATENCY" }, \
+   { DEV_PM_QOS_FLAGS, "DEV_PM_QOS_FLAGS" })
+
 DECLARE_EVENT_CLASS(pm_qos_request,
 
TP_PROTO(int pm_qos_class, s32 value),
@@ -199,11 +216,7 @@ DECLARE_EVENT_CLASS(pm_qos_request,
),
 
TP_printk("pm_qos_class=%s value=%d",
- __print_symbolic(__entry->pm_qos_class,
-   { PM_QOS_CPU_DMA_LATENCY,   "CPU_DMA_LATENCY" },
-   { PM_QOS_NETWORK_LATENCY,   "NETWORK_LATENCY" },
-   { PM_QOS_NETWORK_THROUGHPUT,"NETWORK_THROUGHPUT" }),
- __entry->value)
+ show_pm_qos_class(__entry->pm_qos_class), __entry->value)
 );
 
 DEFINE_EVENT(pm_qos_request, pm_qos_add_request,
@@ -246,10 +259,7 @@ TRACE_EVENT(pm_qos_update_request_timeout,
),
 
TP_printk("pm_qos_class=%s value=%d, timeout_us=%ld",
- __print_symbolic(__entry->pm_qos_class,
-   { PM_QOS_CPU_DMA_LATENCY,   "CPU_DMA_LATENCY" },
-   { PM_QOS_NETWORK_LATENCY,   "NETWORK_LATENCY" },
-   { PM_QOS_NETWORK_THROUGHPUT,"NETWORK_THROUGHPUT" }),
+ show_pm_qos_class(__entry->pm_qos_class),
  __entry->value, __entry->timeout_us)
 );
 
@@ -272,10 +282,7 @@ DECLARE_EVENT_CLASS(pm_qos_update,
),
 
TP_printk("action=%s prev_value=%d curr_value=%d",
- __print_symbolic(__entry->action,
-   { PM_QOS_ADD_REQ,   "ADD_REQ" },
-   { PM_QOS_UPDATE_REQ,"UPDATE_REQ" },
-   { PM_QOS_REMOVE_REQ,"REMOVE_REQ" }),
+ show_pm_qos_req_action(__entry->action),
  __entry->prev_value, __entry->curr_value)
 );
 
@@ -293,10 +300,7 @@ DEFINE_EVENT_PRINT(pm_qos_update, pm_qos_update_flags,
TP_ARGS(action, prev_value, curr_value),
 
TP_printk("action=%s prev_value=0x%x curr_value=0x%x",
- __print_symbolic(__entry->action,
-   { PM_QOS_ADD_REQ,   "ADD_REQ" },
-   { PM_QOS_UPDATE_REQ,"UPDATE_REQ" },
-   { PM_QOS_REMOVE_REQ,"REMOVE_REQ" }),
+ show_pm_qos_req_action(__entry->action),
  __entry->prev_value, __entry->curr_value)
 );
 
@@ -321,10 +325,7 @@ DECLARE_EVENT_CLASS(dev_pm_qos_request,
 
TP_printk("device=%s type=%s new_value=%d",
  __get_str(name),
- __print_symbolic(__entry->type,
-   { DEV_PM_QOS_LATENCY,   "DEV_PM_QOS_LATENCY" },
-   { DEV_PM_QOS_FLAGS, "DEV_PM_QOS_FLAGS" }),
- __entry->new_value)
+ show_dev_pm_qos_req_type(__entry->type), __entry->new_value)
 );
 
 DEFINE_EVENT(dev_pm_qos_request, dev_pm_qos_add_request,
-- 
1.7.1



Re: [PATCH 9/8] hugetlb: add pmd_huge_support() to migrate only pmd-based hugepage

2013-08-05 Thread Naoya Horiguchi
On Tue, Aug 06, 2013 at 07:26:10AM +0530, Aneesh Kumar K.V wrote:
> Naoya Horiguchi  writes:
> 
> > This patch is motivated by the discussion with Aneesh about "extend
> > hugepage migration" patchset.
> >   http://thread.gmane.org/gmane.linux.kernel.mm/103933/focus=104391
> > I'll append this to the patchset in the next post, but before that
> > I want this patch to be reviewed (I don't want to repeat posting the
> > whole set for just minor changes.)
> >
> > Any comments?
> >
> > Thanks,
> > Naoya Horiguchi
> > ---
> > From: Naoya Horiguchi 
> > Date: Mon, 5 Aug 2013 13:33:02 -0400
> > Subject: [PATCH] hugetlb: add pmd_huge_support() to migrate only pmd-based
> >  hugepage
> >
> > Currently hugepage migration works well only for pmd-based hugepages,
> > because core routines of hugepage migration use pmd specific internal
> > functions like huge_pte_offset(). So we should not enable the migration
> > of other levels of hugepages until we are ready for it.
> 
> I guess huge_pte_offset may not be the right reason because archs do
> implement huge_pte_offsets even if they are not pmd-based hugepages
> 
> pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
> {
>   /* Only called for hugetlbfs pages, hence can ignore THP */
>   return find_linux_pte_or_hugepte(mm->pgd, addr, NULL);
> }

You're right, sorry.
Honestly, I tested only on x86, and my testing of pud-based hugepages
is not sufficient (I ran into unresolved bugs), so I want to restrict the
target for now.

> >
> > Some users of hugepage migration (mbind, move_pages, and migrate_pages)
> > do page table walk and check pud/pmd_huge() there, so they are safe.
> > But the other users (softoffline and memory hotremove) don't do this,
> > so they can try to migrate unexpected types of hugepages.
> >
> > To prevent this, we introduce an architecture dependent check of whether
> > hugepage are implemented on a pmd basis or not. It returns 0 if pmd_huge()
> > returns always 0, and 1 otherwise.
> >
> 
> so why not #define pmd_huge_support pmd_huge or use pmd_huge directly ?

The caller (unmap_and_move_huge_page) doesn't have the pmd, so we would need to do
rmap to get the pmd associated with the source hugepage. The patch might
become smaller that way, but it might also be slower.

Thanks,
Naoya Horiguchi

> > Signed-off-by: Naoya Horiguchi 
> > ---
> >  arch/arm/mm/hugetlbpage.c |  5 +
> >  arch/arm64/mm/hugetlbpage.c   |  5 +
> >  arch/ia64/mm/hugetlbpage.c|  5 +
> >  arch/metag/mm/hugetlbpage.c   |  5 +
> >  arch/mips/mm/hugetlbpage.c|  5 +
> >  arch/powerpc/mm/hugetlbpage.c | 10 ++
> >  arch/s390/mm/hugetlbpage.c|  5 +
> >  arch/sh/mm/hugetlbpage.c  |  5 +
> >  arch/sparc/mm/hugetlbpage.c   |  5 +
> >  arch/tile/mm/hugetlbpage.c|  5 +
> >  arch/x86/mm/hugetlbpage.c |  8 
> >  include/linux/hugetlb.h   |  2 ++
> >  mm/migrate.c  | 11 +++
> >  13 files changed, 76 insertions(+)
> >
> > diff --git a/arch/arm/mm/hugetlbpage.c b/arch/arm/mm/hugetlbpage.c
> > index 3d1e4a2..3f3b6a7 100644
> > --- a/arch/arm/mm/hugetlbpage.c
> > +++ b/arch/arm/mm/hugetlbpage.c
> > @@ -99,3 +99,8 @@ int pmd_huge(pmd_t pmd)
> >  {
> > return pmd_val(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT);
> >  }
> > +
> > +int pmd_huge_support(void)
> > +{
> > +   return 1;
> > +}
> 
> -aneesh
> 


Re: [PATCH] OPP: Export opp_add()

2013-08-05 Thread Viresh Kumar
On 5 August 2013 18:17, Rafael J. Wysocki  wrote:
> On Monday, August 05, 2013 07:29:09 AM Nishanth Menon wrote:
>> minor nitpick.
>> $subject: PM / OPP:

Stupid mistake.

> Those are things I can easily fix up when I'm applying the patch.

Thanks :)


Re: [PATCH] cpumask: fix cpumask leak in partition_sched_domains

2013-08-05 Thread Rusty Russell
Xiaotian Feng  writes:
> On Sat, Jul 27, 2013 at 3:26 PM, Xiaotian Feng  wrote:
>> If doms_new is NULL, partition_sched_domains() will reset ndoms_cur
>> to 0, and free old sched domains with free_sched_domains(doms_cur, ndoms_cur).
>> As ndoms_cur is 0, the cpumask will not be freed.
>>
>> Signed-off-by: Xiaotian Feng 
>> Cc: Ingo Molnar 
>> Cc: Peter Zijlstra 
>> Cc: linux-kernel@vger.kernel.org
>
> Any comments? Cc'ed Rusty.

The code is a little convoluted, but your fix is logical.

>> ---
>>  kernel/sched/core.c |5 +++--
>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index b7c32cb..3d6c57b 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -6184,8 +6184,9 @@ match1:
>> ;
>> }
>>
>> +   n= ndoms_cur;

You're missing a ' ' here:
n = ndoms_cur;

>> if (doms_new == NULL) {
>> -   ndoms_cur = 0;
>> +   n = 0;
>> doms_new = &fallback_doms;
>> cpumask_andnot(doms_new[0], cpu_active_mask, cpu_isolated_map);
>> WARN_ON_ONCE(dattr_new);
>> @@ -6193,7 +6194,7 @@ match1:
>>
>> /* Build new domains */
>> for (i = 0; i < ndoms_new; i++) {
>> -   for (j = 0; j < ndoms_cur && !new_topology; j++) {
>> +   for (j = 0; j < n && !new_topology; j++) {
>> if (cpumask_equal(doms_new[i], doms_cur[j])
>> && dattrs_equal(dattr_new, i, dattr_cur, j))
>> goto match2;
>> --
>> 1.7.9.6 (Apple Git-31.1)
>>

Cheers,
Rusty.
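
For readers following the thread, the leak and the fix can be modelled in plain userspace C. This is only a minimal sketch under stated assumptions: free_domains() and masks_freed() are hypothetical stand-ins (free_domains() plays the role of free_sched_domains()), and the "masks" are just small heap allocations. The point it shows is that zeroing the shared count before the free leaves every old mask allocated, whereas resetting only a local copy keeps the real count available for the free.

```c
#include <stdlib.h>

/* Hypothetical stand-in for free_sched_domains(): releases the first
 * ndoms masks and reports how many it released. */
static int free_domains(int **doms, int ndoms)
{
	int i;

	for (i = 0; i < ndoms; i++) {
		free(doms[i]);
		doms[i] = NULL;
	}
	return ndoms;
}

/*
 * Models the doms_new == NULL path with ndoms_cur old masks.
 * use_local_n == 0: old code, ndoms_cur itself is reset to 0.
 * use_local_n == 1: patched code, only the local copy n is reset.
 * Returns how many old masks the free step actually released.
 */
static int masks_freed(int ndoms_cur, int use_local_n)
{
	int total = ndoms_cur;
	int **doms_cur = calloc((size_t)total, sizeof(*doms_cur));
	int i, n, freed;

	if (!doms_cur)
		return -1;
	for (i = 0; i < total; i++)
		doms_cur[i] = malloc(sizeof(int));

	n = ndoms_cur;
	if (use_local_n)
		n = 0;		/* only the match-loop bound changes */
	else
		ndoms_cur = 0;	/* clobbers the count used by the free */
	(void)n;		/* the match loop itself is not modelled */

	freed = free_domains(doms_cur, ndoms_cur);

	/* Model cleanup only: release whatever the old path leaked. */
	for (i = 0; i < total; i++)
		free(doms_cur[i]);	/* free(NULL) is a no-op */
	free(doms_cur);
	return freed;
}
```

With three old domains, masks_freed(3, 0) reports that nothing was released, while masks_freed(3, 1) reports all three.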


3.11.0-rc4 (Linus GIT) -- WARNING: CPU: 1 PID: 1 at kernel/time/tick-sched.c:185 can_stop_full_tick+0x7e/0x89()

2013-08-05 Thread Miles Lane
I am not seeing any problems in the behavior of the computer, but
wonder if this indicates something that needs fixing.

[1.969109] WARNING: CPU: 1 PID: 1 at kernel/time/tick-sched.c:185 can_stop_full_tick+0x7e/0x89()
[1.969121] NO_HZ FULL will not work with unstable sched clock
[1.969129] Modules linked in:
[1.969142] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 3.11.0-rc4 #150
[1.969152] Hardware name: ASUSTeK Computer Inc. UL50VT /UL50VT, BIOS 217 03/01/2010
[1.969166]   88013f303eb0 8138347b 88013f303ef8
[1.969183]  88013f303ee8 8103454f 81079bbf 88013f30d7b0
[1.969199]  0001 88013a801dc0 88013a802240 88013f303f48
[1.969216] Call Trace:
[1.969222][] dump_stack+0x4e/0x82
[1.969241]  [] warn_slowpath_common+0x75/0x8e
[1.969253]  [] ? can_stop_full_tick+0x7e/0x89
[1.969265]  [] warn_slowpath_fmt+0x47/0x49
[1.969278]  [] can_stop_full_tick+0x7e/0x89
[1.969290]  [] tick_nohz_irq_exit+0x63/0x7f
[1.969302]  [] irq_exit+0xa4/0xac
[1.969314]  [] smp_apic_timer_interrupt+0x30/0x3c
[1.969327]  [] apic_timer_interrupt+0x6f/0x80
[1.969336][] ? save_stack_trace+0x26/0x41
[1.969355]  [] ? _raw_spin_unlock_irqrestore+0x3c/0x69
[1.969369]  [] __slab_free+0x53/0x317
[1.969382]  [] ? debug_check_no_obj_freed+0x103/0x153
[1.969397]  [] kfree+0x102/0x111
[1.969410]  [] ? acpi_ns_get_node+0xb6/0xc6
[1.969422]  [] acpi_ns_get_node+0xb6/0xc6
[1.969434]  [] ? _raw_spin_unlock_irqrestore+0x5b/0x69
[1.969447]  [] ? up+0x34/0x39
[1.969459]  [] ? acpi_os_signal_semaphore+0x1c/0x28
[1.969472]  [] acpi_get_handle+0x7e/0x92
[1.969486]  [] pnpacpi_add_device_handler+0x57/0x217
[1.969499]  [] acpi_ns_get_device_callback+0x135/0x14b
[1.969511]  [] ? up+0x34/0x39
[1.969523]  [] acpi_ns_walk_namespace+0xc3/0x17a
[1.969535]  [] ? acpi_walk_namespace+0xc0/0xc0
[1.969547]  [] acpi_get_devices+0x5d/0x72
[1.969560]  [] ? ispnpidacpi+0x84/0x84
[1.969571]  [] ? pnpacpi_add_device_handler+0x217/0x217
[1.969584]  [] pnpacpi_init+0x5e/0x8c
[1.969596]  [] do_one_initcall+0x8e/0x12b
[1.969608]  [] ? parameq+0x1d/0x1f
[1.969619]  [] ? parse_args+0x18c/0x23f
[1.969632]  [] kernel_init_freeable+0x115/0x196
[1.969643]  [] ? do_early_param+0x88/0x88
[1.969654]  [] ? rest_init+0x131/0x131
[1.969665]  [] kernel_init+0x9/0xd1
[1.969676]  [] ret_from_fork+0x7c/0xb0
[1.969687]  [] ? rest_init+0x131/0x131


Re: [PATCH 8/8] prepare to remove /proc/sys/vm/hugepages_treat_as_movable

2013-08-05 Thread Naoya Horiguchi
On Tue, Aug 06, 2013 at 07:22:02AM +0530, Aneesh Kumar K.V wrote:
> Naoya Horiguchi  writes:
> >> 
> >> Considering that we have architectures that won't support migrating
> >> explicit hugepages with this patch series, is it ok to use
> >> GFP_HIGHUSER_MOVABLE for hugepage allocation ?
> >
> > Originally this parameter was introduced to place the hugepage pool in
> > ZONE_MOVABLE. The benefit is that we can extend the hugepage pool more
> > easily, because external fragmentation is less likely there than in other
> > zone types: fragmented pages can be rearranged by page migration/reclaim.
> >
> > So I think using ZONE_MOVABLE for hugepage allocation by default makes sense
> > even on the architectures which don't support hugepage migration.
> 
> But allocating hugepages from ZONE_MOVABLE means we have pages in that
> zone which we can't migrate. Doesn't that impact other features like
> hotplug ?

Memory blocks occupied by hugepages are not removable before this patchset,
whether they are from ZONE_MOVABLE or not, and the hugepage users accepted
it for now. So I think this change doesn't make things worse than now. 

It may be preferable to set or clear the __GFP_MOVABLE flag depending on
the arch, without using the tunable parameter. I'm OK with this direction,
but I want to do it as separate work.

Thanks,
Naoya Horiguchi
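
The direction mentioned above (choosing __GFP_MOVABLE per arch rather than through the tunable) might look roughly like the following userspace model. Everything in it is an assumption made for illustration: the flag values are invented, and hugepage_alloc_mask() and its arch_supports_migration argument are hypothetical names, not interfaces from this patchset.

```c
/* Invented flag values for the model; the real GFP bits differ. */
#define MODEL_GFP_HIGHUSER	0x1u
#define MODEL_GFP_MOVABLE	0x2u

/*
 * Hypothetical helper: builds the hugepage allocation mask.
 * The movable bit is added only when the arch can migrate hugepages,
 * so unmigratable hugepages are never requested as movable.
 */
static unsigned int hugepage_alloc_mask(int arch_supports_migration)
{
	unsigned int mask = MODEL_GFP_HIGHUSER;

	if (arch_supports_migration)
		mask |= MODEL_GFP_MOVABLE;
	return mask;
}
```

The design choice being modelled is that the decision moves from a sysctl set by the administrator to a property the arch reports.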


Re: [RFC] gcc feature request: Moving blocks into sections

2013-08-05 Thread H. Peter Anvin
On 08/05/2013 09:14 PM, Mathieu Desnoyers wrote:
>>
>> For unconditional jmp that should be pretty safe barring any fundamental
>> changes to the instruction set, in which case we can enable it as
>> needed, but for extra robustness it probably should skip prefix bytes.
> 
> On x86-32, some prefixes are actually meaningful. AFAIK, the 0x66 prefix
> is used for:
> 
> E9 cw   jmp rel16   relative jump, only in 32-bit
> 
> Other prefixes can probably be safely skipped.
> 

Yes.  Some of them are used as hints or for MPX.

> Another question is whether anything prevents the assembler from
> generating a jump near (absolute indirect), or far jump. The code above
> seems to assume that we have either a short or near relative jump.

Absolutely something prevents!  It would be a very serious error for the
assembler to generate such instructions.

-hpa






Re: [PATCH 1/3] clk: s2mps11: Add support for s2mps11

2013-08-05 Thread Yadwinder Singh Brar
Hi Mike,

On Tue, Aug 6, 2013 at 4:53 AM, Mike Turquette  wrote:
> Quoting Yadwinder Singh Brar (2013-07-07 04:44:20)
>> This patch adds support to register three(AP/CP/BT) buffered 32.768 KHz
>> outputs of mfd-s2mps11 with common clock framework.
>>
>> Signed-off-by: Yadwinder Singh Brar 
>
> Yadwinder,
>
> Looks good to me with the exception of a binding description document.
> Can you provide one and squash it into this commit?
>

The binding description is provided in the next patch:
[PATCH 2/3] mfd: sec: Add clock cell for s2mps11:
https://lkml.org/lkml/2013/7/16/228

Since it's an MFD, I preferred to add the documentation to the same file
under mfd; the same was done for the regulators.
If that's fine with you, I think that patch can go through the MFD tree,
as the same file is touched in the MFD tree's for-next.


Thanks,
Yadwinder


Re: [RFC] gcc feature request: Moving blocks into sections

2013-08-05 Thread Mathieu Desnoyers
* H. Peter Anvin (h...@linux.intel.com) wrote:
> On 08/05/2013 02:28 PM, Mathieu Desnoyers wrote:
> > * Linus Torvalds (torva...@linux-foundation.org) wrote:
> >> On Mon, Aug 5, 2013 at 12:54 PM, Mathieu Desnoyers
> >>  wrote:
> >>>
> >>> I remember that choosing between 2 and 5 bytes nop in the asm goto was
> >>> tricky: it had something to do with the fact that gcc doesn't know the
> >>> exact size of each instructions until further down within compilation
> >>
> >> Oh, you can't do it in the compiler, no. But you don't need to. The
> >> assembler will pick the right version if you just do "jmp target".
> > 
> > Yep.
> > 
> > Another thing that bothers me with Steven's approach is that decoding
> > jumps generated by the compiler seems fragile IMHO.
> > 
> > x86 decoding proposed by https://lkml.org/lkml/2012/3/8/464 :
> > 
> > +static int make_nop_x86(void *map, size_t const offset)
> > +{
> > +   unsigned char *op;
> > +   unsigned char *nop;
> > +   int size;
> > +
> > +   /* Determine which type of jmp this is 2 byte or 5. */
> > +   op = map + offset;
> > +   switch (*op) {
> > +   case 0xeb: /* 2 byte */
> > +   size = 2;
> > +   nop = ideal_nop2_x86;
> > +   break;
> > +   case 0xe9: /* 5 byte */
> > +   size = 5;
> > +   nop = ideal_nop;
> > +   break;
> > +   default:
> > +   die(NULL, "Bad jump label section (bad op %x)\n", *op);
> > +   __builtin_unreachable();
> > +   }
> > 
> > My thought is that the code above does not cover all jump encodings that
> > can be generated by past, current and future x86 assemblers.
> > 
> 
> For unconditional jmp that should be pretty safe barring any fundamental
> changes to the instruction set, in which case we can enable it as
> needed, but for extra robustness it probably should skip prefix bytes.

On x86-32, some prefixes are actually meaningful. AFAIK, the 0x66 prefix
is used for:

E9 cw   jmp rel16   relative jump, only in 32-bit

Other prefixes can probably be safely skipped.

Another question is whether anything prevents the assembler from
generating a jump near (absolute indirect), or far jump. The code above
seems to assume that we have either a short or near relative jump.
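
As a rough illustration of the prefix handling discussed above, the sketch below consumes 0x66 operand-size prefixes before classifying the jump. It is a userspace model under stated assumptions, not the recordmcount code: rel_jmp_len_x86_32() is a hypothetical helper, it covers 32-bit semantics only (where 0x66 E9 takes a 16-bit displacement), and it treats every other leading byte, including other prefixes, as unrecognized.

```c
#include <stddef.h>

/*
 * Hypothetical helper: returns the total length in bytes of a short or
 * near relative jmp starting at op, or 0 if the bytes are not one of
 * the recognized encodings or do not fit in avail bytes.
 */
static size_t rel_jmp_len_x86_32(const unsigned char *op, size_t avail)
{
	size_t i = 0, len;
	int opsize16 = 0;

	/* Consume operand-size prefixes; they change the E9 displacement. */
	while (i < avail && op[i] == 0x66) {
		opsize16 = 1;
		i++;
	}
	if (i == avail)
		return 0;

	switch (op[i]) {
	case 0xeb:	/* jmp rel8: opcode + 1 displacement byte */
		len = i + 2;
		break;
	case 0xe9:	/* jmp rel16 (with 0x66) or rel32 */
		len = i + 1 + (opsize16 ? 2 : 4);
		break;
	default:	/* indirect, far, or anything else */
		return 0;
	}
	return len <= avail ? len : 0;
}
```

A caller would pick the nop sequence from the returned length and reject the jump label entry when the result is 0, which matches the die() path in the quoted code.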

Thoughts ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


[PATCH] mm/mempolicy: return NULL if node is NUMA_NO_NODE in get_task_policy

2013-08-05 Thread Jianguo Wu
If node == NUMA_NO_NODE, pol is NULL, so we should return NULL instead of
dereferencing it in the "if (!pol->mode)" check.

Signed-off-by: Jianguo Wu 
---
 mm/mempolicy.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 4baf12e..e0e3398 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -129,6 +129,8 @@ static struct mempolicy *get_task_policy(struct task_struct *p)
node = numa_node_id();
if (node != NUMA_NO_NODE)
pol = &preferred_node_policy[node];
+   else
+   return NULL;
 
/* preferred_node_policy is not initialised early in boot */
if (!pol->mode)
-- 
1.8.2.2
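
The control flow this patch aims for can be modelled in userspace as below. This is a minimal sketch, not the mm/mempolicy.c code: struct model_policy, model_node_policy, MODEL_NO_NODE and model_task_policy() are hypothetical names, and the node argument is assumed to be either MODEL_NO_NODE or a valid index.

```c
#include <stddef.h>

#define MODEL_NO_NODE	(-1)
#define MODEL_MAX_NODES	4

struct model_policy {
	int mode;	/* 0 means "not initialised yet" */
};

/* Stand-in for the per-node policy table; zero-initialised. */
static struct model_policy model_node_policy[MODEL_MAX_NODES];

/*
 * Returns the per-node policy, or NULL when there is no node or the
 * table entry has not been initialised.
 */
static struct model_policy *model_task_policy(int node)
{
	struct model_policy *pol;

	if (node == MODEL_NO_NODE)
		return NULL;	/* pol is never dereferenced on this path */

	pol = &model_node_policy[node];
	if (!pol->mode)
		return NULL;
	return pol;
}
```

Without the early return, the mode check would read through a NULL pointer whenever no node is available.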





Re: [PATCH 3/5] Intel MIC Host Driver Changes for Virtio Devices.

2013-08-05 Thread Rusty Russell
Sudeep Dutt  writes:
> On Mon, 2013-07-29 at 10:05 +0300, Michael S. Tsirkin wrote: 
>> On Wed, Jul 24, 2013 at 08:31:34PM -0700, Sudeep Dutt wrote:
>> > From: Ashutosh Dixit 
>> > 
>> > This patch introduces the host "Virtio over PCIe" interface for
>> > Intel MIC. It allows creating user space backends on the host and
>> > instantiating virtio devices for them on the Intel MIC card. A character
>> > device per MIC is exposed with IOCTL, mmap and poll callbacks. This allows
>> > the user space backend to:
>> > (a) add/remove a virtio device via a device page.
>> > (b) map (R/O) virtio rings and device page to user space.
>> > (c) poll for availability of data.
>> > (d) copy a descriptor or entire descriptor chain to/from the card.
>> > (e) modify virtio configuration.
>> > (f) handle virtio device reset.
>> > The buffers are copied over using CPU copies for this initial patch
>> > and host initiated MIC DMA support is planned for future patches.
>> > The avail and desc virtio rings are in host memory and the used ring
>> > is in card memory to maximize writes across PCIe for performance.
>> > 
>> > Co-author: Sudeep Dutt 
>> > Signed-off-by: Ashutosh Dixit 
>> > Signed-off-by: Caz Yokoyama 
>> > Signed-off-by: Dasaratharaman Chandramouli 
>> > 
>> > Signed-off-by: Nikhil Rao 
>> > Signed-off-by: Harshavardhan R Kharche 
>> > Signed-off-by: Sudeep Dutt 
>> > Acked-by: Yaozu (Eddie) Dong 
>> > Reviewed-by: Peter P Waskiewicz Jr 
>> 
>> I decided to look at the security and ordering of ring accesses.
>> Doing a quick look, I think I found some issues, see comments below.
>> If it were possible to reuse existing ring handling code,
>> such issues would go away automatically.
>> Which brings me to the next question: have you looked at reusing
>> some code under drivers/vhost for host side processing?
>> If not, you probably should.
>> Is code in vringh.c generic enough to support your use-case,
>> and if not what exactly are the issues preventing this?
>> 
>> Thanks,
>> 
> We had implemented our custom MIC vring host access logic before the
> VRINGH infrastructure was merged to mainline in v3.10. Based on your
> feedback, we have a proof of concept implemented this week, by reusing
> the VRINGH infrastructure and it works nicely for us!

Nice!  Good suggestion MST, thanks for the plug :)

> One of our goals is to issue the buffer transfers using DMA with future
> patches. The CPU copy in our current patches is also slightly different
> compared to VRINGH since we are copying from card buffers to user space
> and vice versa. In order to meet these goals, we are obtaining the
> next available descriptor via vringh_getdesc_kern(..), then triggering
> the copy (CPU or eventually DMA) via a custom MIC API and then
> publishing the descriptor via vringh_complete_kern(..). Are there any
> plans of enhancing VRINGH to allow overriding the xfer mechanism in
> vringh_iov_xfer(..)? This will allow drivers with custom xfer routines
> to reuse APIs like vringh_iov_push_kern(..) and vringh_iov_pull_kern(..)
> as well. That said, the existing VRINGH infrastructure is generic enough
> for our use case as is today.

We'll have to look at exposing the internals.  vringh_iov_xfer() works
well because it's internal and inlined.  It'll be much easier to
evaluate when we're dealing with specific patches.

Cheers,
Rusty.


Re: [PATCH] [SCSI] sg: Fix user memory corruption when SG_IO is interrupted by a signal

2013-08-05 Thread Peter Chang
2013/8/5 Roland Dreier :
> From: Roland Dreier 
>
> There is a nasty bug in the SCSI SG_IO ioctl that in some circumstances
> leads to one process writing data into the address space of some other
> random unrelated process if the ioctl is interrupted by a signal.
> What happens is the following:
>
>  - A process issues an SG_IO ioctl with direction DXFER_FROM_DEV (ie the
>underlying SCSI command will transfer data from the SCSI device to
>the buffer provided in the ioctl)
>
>  - Before the command finishes, a signal is sent to the process waiting
>in the ioctl.  This will end up waking up the sg_ioctl() code:
>
> result = wait_event_interruptible(sfp->read_wait,
> (srp_done(sfp, srp) || sdp->detached));
>
>but neither srp_done() nor sdp->detached is true, so we end up just
>setting srp->orphan and returning to userspace:
>
> srp->orphan = 1;
> write_unlock_irq(&sfp->rq_list_lock);
> return result;  /* -ERESTARTSYS because signal hit process */
>
>At this point the original process is done with the ioctl and
>blithely goes ahead handling the signal, reissuing the ioctl, etc.

i think that an additional issue here is that part of reissuing the
ioctl is re-queueing the command. since the re-queue is at the front
of the block queue there are issues if the command is non-idempotent.

we have a local fix that gets rid of most of the orphan stuff and
re-waiting if a non-fatal signal was waiting. simpler than unmapping
but maybe we're missing some other interesting case?

\p


[PATCH] warning-elimination: android: binder

2013-08-05 Thread Andy Green
From: Andy Green 

warning-elimination: android: binder

This commit in mainline (now) causes a couple of warnings

commit 975a1ac9a9fe65d66ee1726c0db6dc58e53d232a
Author: Arve Hjønnevåg 
Date:   Tue Oct 16 15:29:53 2012 -0700

Staging: android: binder: Add some tracepoints

This patch fixes them

Signed-off-by: Andy Green 
---

diff --git a/drivers/staging/android/binder_trace.h b/drivers/staging/android/binder_trace.h
index 82a567c..c661e37 100644
--- a/drivers/staging/android/binder_trace.h
+++ b/drivers/staging/android/binder_trace.h
@@ -159,7 +159,7 @@ TRACE_EVENT(binder_transaction_node_to_ref,
  TP_fast_assign(
  __entry->debug_id = t->debug_id;
  __entry->node_debug_id = node->debug_id;
- __entry->node_ptr = node->ptr;
+ __entry->node_ptr = (void __user *)node->ptr;
  __entry->ref_debug_id = ref->debug_id;
  __entry->ref_desc = ref->desc;
  ),
@@ -184,7 +184,7 @@ TRACE_EVENT(binder_transaction_ref_to_node,
  __entry->ref_debug_id = ref->debug_id;
  __entry->ref_desc = ref->desc;
  __entry->node_debug_id = ref->node->debug_id;
- __entry->node_ptr = ref->node->ptr;
+ __entry->node_ptr = (void __user *)ref->node->ptr;
  ),
  TP_printk("transaction=%d node=%d src_ref=%d src_desc=%d ==> dest_ptr=0x%p",
   __entry->debug_id, __entry->node_debug_id,


Re: [PATCH 6/7] Add EFI stub for ARM

2013-08-05 Thread Roy Franz
>> diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
>> index 75189f1..4c70b9e 100644
>> --- a/arch/arm/boot/compressed/head.S
>> +++ b/arch/arm/boot/compressed/head.S
>> @@ -122,19 +122,106 @@
>>   .arm@ Always enter in ARM state
>>  start:
>>   .type   start,#function
>> - .rept   7
>> +#ifdef CONFIG_EFI_STUB
>> + @ Magic MSDOS signature for PE/COFF + ADD opcode
>> + .word   0x62805a4d
>
> What about BE32?
>
> In that case, the instruction is a coprocessor load, that loads from a
> random address to a coprocessor that almost certainly doesn't exist.
> This will probably fault.
>
> Since BE32 is only for older platforms ( solvable, it might be sensible to make the EFI stub support depend on
> !CPU_ENDIAN_BE32.
>
>> +#else
>> + mov r0, r0
>> +#endif
>> + .rept   5
>
> You reduced the .rept count by 2, but only inserted one extra word,
> perhaps because of the extra, but buggy, insertion below.
>
>>   mov r0, r0
>>   .endr
>> ARM(  mov r0, r0  )
>> ARM(  b   1f  )
>>   THUMB(  adr r12, BSYM(1f)   )
>>   THUMB(  bx  r12 )
>
> Can't you just replace 1f with zimage_continue directly in the above
> lines, instead of the subsequent extra branch:
>
>> + THUMB(  .thumb  )
>> +1:
>> + b   zimage_continue
>
> This also avoids having two labels both called '1'.
>
> I believe the magic word is expected to be in a predictable offset,
> but the size of the preceding branch is unpredictable in Thumb
> (you could use b.w, or possibly remove the branch altogether, as
> explained above).
>
>>   .word   0x016f2818  @ Magic numbers to help the loader
>>   .word   start   @ absolute load/run zImage address
>>   .word   _edata  @ zImage end address
>> +
>> +#ifdef CONFIG_EFI_STUB
>> + @ Portions of the MSDOS file header must be at offset
>> + @ 0x3c from the start of the file.  All PE/COFF headers
>> + @ are kept contiguous for simplicity.
>> +#include "efi-header.S"
>> +
>> +efi_stub_entry:
>> + .text
>> + @ The EFI stub entry point is not at a fixed address, however
>> + @ this address must be set in the PE/COFF header.
>> + @ EFI entry point is in A32 mode, switch to T32 if configured.
>> + .arm
>> +   ARM(  mov r0, r0  )
>> +   ARM(  b   1f  )
>
> Those above two instructions are effectively just no-op padding.  Do you
> need them at all?
>
>> + THUMB(  adr r12, BSYM(1f)   )
>> + THUMB(  bx  r12 )
>>   THUMB(  .thumb  )
>>  1:
>> + @ Save lr on stack for possible return to EFI firmware.
>> + @ Don't care about fp, but need 64 bit alignment
>> + stmfd   sp!, {fp, lr}
>> +
>> + @ Save args to EFI app across got fixup call
>> + stmfd   sp!, {r0, r1}
>> + ldmfd   sp!, {r0, r1}
>> +
>> + @ allocate space on stack for return of new entry point of
>> + @ zImage, as EFI stub may copy the kernel.  Pass address
>> + @ of space in r2 - EFI stub will fill in the pointer.
>> +
>> + sub sp, #8  @ we only need 4 bytes,
>> + @ but keep stack 8 byte aligned.
>> + mov r2, sp
>> + @ Pass our actual runtime start address in pointer data
>> + adr r11, LC0@ address of LC0 at run time
>> + ldr r12, [r11, #0]  @ address of LC0 at link time
>> +
>> + sub r3, r11, r12@ calculate the delta offset
>> + str r3, [r2, #0]
>> + bl  efi_entry
>> +
>> + @ get new zImage entry address from stack, put into r3
>> + ldr r3, [sp, #0]
>> + add sp, #8  @ restore stack
>> +
>> + @ Check for error return from EFI stub (0x)
>> + ldr r1, =0x
>> + cmp r0, r1
>> + beq efi_load_fail
>> +
>> +
>> + @ Save return values of efi_entry
>> + stmfd   sp!, {r0, r3}
>> + bl  cache_clean_flush
>> + bl  cache_off
>> + ldmfd   sp!, {r0, r3}
>> +
>> + @ put DTB address in r2, it was returned by EFI entry
>> + mov r2, r0
>> + ldr r1, =0x @ DTB machine type
>> + mov r0, #0  @ r0 is 0
>> +
>> + @ Branch to (possibly) relocated zImage entry that is in r3
>> +

Re: [PATCH] printk: Fix return of braille_register_console()

2013-08-05 Thread Joe Perches
On Mon, 2013-08-05 at 22:55 -0400, Steven Rostedt wrote:
> Some of my configs I test with have CONFIG_A11Y_BRAILLE_CONSOLE set.
> When I started testing against v3.11-rc4 my console went bonkers. Using
> ktest to bisect the issue, it came down to:
> 
> commit bbeddf52a "printk: move braille console support into separate
> braille.[ch] files"
> 
> Looking into the patch I found the problem. It's with the return of
> braille_register_console(). As anything other than NULL is considered a
> failure.
> 
> But for those of us that have CONFIG_A11Y_BRAILLE_CONSOLE set but do not
> define a "brl" or "brl=" on the command line, we still may want a
> console that those with sight can still use.
> 
> Return NULL (success) if "brl" or "brl=" is not on the console line.

Thanks Steven.

> Signed-off-by: Steven Rostedt 
> 
> diff --git a/kernel/printk/braille.c b/kernel/printk/braille.c
> index b51087f..276762f 100644
> --- a/kernel/printk/braille.c
> +++ b/kernel/printk/braille.c
> @@ -19,7 +19,8 @@ char *_braille_console_setup(char **str, char **brl_options)
>   pr_err("need port name after brl=\n");
>   else
>   *((*str)++) = 0;
> - }
> + } else
> + return NULL;
>  
>   return *str;
>  }
> 
> 
> 





Re: [PATCH v14 0/6] LSM: Multiple concurrent LSMs

2013-08-05 Thread Balbir Singh
On Thu, Aug 1, 2013 at 10:51 PM, Casey Schaufler  wrote:
> On 7/31/2013 7:48 PM, Balbir Singh wrote:
>> On Thu, Jul 25, 2013 at 11:52 PM, Casey Schaufler
>>  wrote:
>>> Subject: [PATCH v14 0/6] LSM: Multiple concurrent LSMs
>>>
>>> Version 14 of this patchset is based on v3.10.
>>> It required significant change from version 13 due to changes
>>> in the audit code. It came out cleaner, especially in the changes
>>> to NetLabel. This version supports all existing LSMs running
>>> together at the same time. The combinations tested most completely
>>> are:
>>>
>>> apparmor,tomoyo,smack,yama  - Ubuntu
>>> apparmor,selinux,smack,yama - Fedora
>>>
>> Does this change the way one would develop a new LSM module? I presume
>> it does not
>
> The change that LSM developers need to be aware of is the security blob
> abstraction. Instead of using cred->security, inode->i_security and the
> like the code needs to use lsm_get_cred() and lsm_set_cred() and similar
> functions.
>

OK

>>> I have been unable to figure out how to configure SELinux on
>>> Ubuntu and TOMOYO on Fedora. That's the only reason the list
>>> does not include all five LSMs at once. Combining LSMs that
>>> use networking is tricky, but can be done. There are changes
>>> coming from AppArmor that might make it even trickier, but
>>> that's a problem for the future.
>>>
>>>
>>> Change the infrastructure for Linux Security Modules (LSM)s from a
>>> single vector of hook handlers to a list based method for handling
>>> multiple concurrent modules. All combinations of existing LSMs
>>> are supported.
>>>
>>> The "security=" boot option takes a comma separated list of LSMs,
>>> registering them in the order presented. The LSM hooks will be
>>> executed in the order registered. Hooks that return errors are
>>> not short circuited. All hooks are called even if one of the LSM
>>> hooks fails. The result returned will be that of the last LSM
>>> hook that failed.
>>>
>> This is an important design trade-off. From my perspective I think you
>> might want to revisit this, today it sounds like effective security ==
>> all hooks process and allow the operation. In this world a lack of
>> proper policy/setting can make hooks fail. I've not yet looked at the
>> code, but you might want to revisit this.
>
> The result of an LSM hook will be failure if any of the LSMs
> indicates failure. The key here is that all of the LSM hooks
> get called even if it's known that the overall result is failure.
> This is done because many LSM hooks maintain internal state and
> shortcutting can disrupt that.
>

Thanks for clarifying

Balbir
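
The hook-calling rule clarified above (every hook runs, and the result is that of the last hook that failed) can be sketched in userspace as follows. This is an illustration only, not the patchset's code: call_all_hooks(), model_hook_fn, the sample hooks and the model_calls counter are all hypothetical names, and the error values are arbitrary negative numbers.

```c
#include <stddef.h>

typedef int (*model_hook_fn)(void);

/* Counts hook invocations so the "no short-circuit" rule is visible. */
static int model_calls;

/* Sample hooks: one allows, two deny with different error values. */
static int hook_allow(void)  { model_calls++; return 0; }
static int hook_deny_a(void) { model_calls++; return -13; }
static int hook_deny_b(void) { model_calls++; return -1; }

/*
 * Calls every hook in registration order and never stops early, so each
 * module can update its internal state. Returns 0 if all hooks succeed,
 * otherwise the result of the last hook that failed.
 */
static int call_all_hooks(const model_hook_fn *hooks, size_t nhooks)
{
	size_t i;
	int rc, result = 0;

	for (i = 0; i < nhooks; i++) {
		rc = hooks[i]();
		if (rc != 0)
			result = rc;
	}
	return result;
}
```

With the order deny_a, allow, deny_b, allow, all four hooks run and the caller sees the value from deny_b, so a later success does not clear an earlier failure.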


Re: [PATCH 3/5] cgroup, memcg: move cgroup_event implementation to memcg

2013-08-05 Thread Balbir Singh
On Sun, Aug 4, 2013 at 9:37 PM, Tejun Heo  wrote:
> cgroup_event is way over-designed and tries to build a generic
> flexible event mechanism into cgroup - fully customizable event
> specification for each user of the interface.  This is utterly
> unnecessary and overboard especially in the light of the planned
> unified hierarchy as there's gonna be single agent.  Simply generating

[off-topic] Has the unified hierarchy been agreed upon? I did not
follow that thread

> events at fixed points, or if that's too restrictive, configureable
> cadence or single set of configureable points should be enough.
>

Nit-pick: typo on the spelling of configurable

> Thankfully, memcg is the only user and gets to keep it.  Replacing it
> with something simpler on sane_behavior is strongly recommended.
>
> This patch moves cgroup_event and "cgroup.event_control"
> implementation to mm/memcontrol.c.  Clearing of events on cgroup
> destruction is moved from cgroup_destroy_locked() to
> mem_cgroup_css_offline(), which shouldn't make any noticeable
> difference.
>
> Note that "cgroup.event_control" will now exist only on the hierarchy
> with memcg attached to it.  While this change is visible to userland,
> it is unlikely to be noticeable as the file has never been meaningful
> outside memcg.
>

Tejun, I think the framework was designed to be flexible. Do you see
cgroup subsystems never using this?

> Signed-off-by: Tejun Heo 
> Cc: Johannes Weiner 
> Cc: Michal Hocko 
> Cc: Balbir Singh 
> ---
>  kernel/cgroup.c | 237 ---
>  mm/memcontrol.c | 238 
> 
>  2 files changed, 238 insertions(+), 237 deletions(-)
>
...
> +/*
> + * cgroup_event represents events which userspace want to receive.
> + */
> +struct cgroup_event {
> +   /*
> +* css which the event belongs to.
> +*/
> +   struct cgroup_subsys_state *css;
> +   /*
> +* Control file which the event associated.
> +*/
> +   struct cftype *cft;
> +   /*
> +* eventfd to signal userspace about the event.
> +*/
> +   struct eventfd_ctx *eventfd;
> +   /*
> +* Each of these stored in a list by the cgroup.
> +*/
> +   struct list_head list;
> +   /*
> +* All fields below needed to unregister event when
> +* userspace closes eventfd.
> +*/
> +   poll_table pt;
> +   wait_queue_head_t *wqh;
> +   wait_queue_t wait;
> +   struct work_struct remove;
> +};
> +
>  static void mem_cgroup_threshold(struct mem_cgroup *memcg);
>  static void mem_cgroup_oom_notify(struct mem_cgroup *memcg);
>
> @@ -5926,6 +5956,194 @@ static void kmem_cgroup_css_offline(struct mem_cgroup *memcg)
>  }
>  #endif
>
> +/*
> + * Unregister event and free resources.
> + *
> + * Gets called from workqueue.
> + */
> +static void cgroup_event_remove(struct work_struct *work)
> +{
> +   struct cgroup_event *event = container_of(work, struct cgroup_event,
> +   remove);
> +   struct cgroup_subsys_state *css = event->css;
> +   struct cgroup *cgrp = css->cgroup;
> +
> +   remove_wait_queue(event->wqh, &event->wait);
> +
> +   event->cft->unregister_event(css, event->cft, event->eventfd);
> +
> +   /* Notify userspace the event is going away. */
> +   eventfd_signal(event->eventfd, 1);
> +
> +   eventfd_ctx_put(event->eventfd);
> +   kfree(event);
> +   __cgroup_dput(cgrp);
> +}
> +
> +/*
> + * Gets called on POLLHUP on eventfd when user closes it.
> + *
> + * Called with wqh->lock held and interrupts disabled.
> + */
> +static int cgroup_event_wake(wait_queue_t *wait, unsigned mode,
> +   int sync, void *key)
> +{
> +   struct cgroup_event *event = container_of(wait,
> +   struct cgroup_event, wait);
> +   struct cgroup *cgrp = event->css->cgroup;
> +   unsigned long flags = (unsigned long)key;
> +
> +   if (flags & POLLHUP) {
> +   /*
> +* If the event has been detached at cgroup removal, we
> +* can simply return knowing the other side will cleanup
> +* for us.
> +*
> +* We can't race against event freeing since the other
> +* side will require wqh->lock via remove_wait_queue(),
> +* which we hold.
> +*/
> +   spin_lock(&cgrp->event_list_lock);
> +   if (!list_empty(&event->list)) {
> +   list_del_init(&event->list);
> +   /*
> +* We are in atomic context, but cgroup_event_remove()
> +* may sleep, so we have to call it in workqueue.
> +*/
> +   schedule_work(&event->remove);
> +   }
> +   spin_unlock(&cgrp->event_list_lock);
> +   }
> +
> +   return 0;
> +}
> +
> +static void 

Re: [PATCH RFC 51/51] ARM: 7805/1: mm: change max*pfn to include the physical offset of memory

2013-08-05 Thread Rob Herring
On Thu, Aug 1, 2013 at 5:25 PM, Santosh Shilimkar
 wrote:
> Most of the kernel code assumes that max*pfn is maximum pfns because
> the physical start of memory is expected to be PFN0. Since this
> assumption is not true on ARM architectures, the meaning of max*pfn
> is number of memory pages. This is done to keep drivers happy which
> are making use of these variables to calculate the dma bounce limit
> using dma_mask.
>
> Now since we have an architecture override possibility for DMAable
> maximum pfns, let's make the meaning of max*pfn the maximum pfn on ARM
> as well.
>
> Signed-off-by: Santosh Shilimkar 
> Signed-off-by: Russell King 
> ---
>  arch/arm/include/asm/dma-mapping.h |8 
>  arch/arm/mm/init.c |   10 --
>  2 files changed, 12 insertions(+), 6 deletions(-)
>
> diff --git a/arch/arm/include/asm/dma-mapping.h 
> b/arch/arm/include/asm/dma-mapping.h
> index 5b579b9..863cd84 100644
> --- a/arch/arm/include/asm/dma-mapping.h
> +++ b/arch/arm/include/asm/dma-mapping.h
> @@ -64,6 +64,7 @@ static inline dma_addr_t virt_to_dma(struct device *dev, 
> void *addr)
>  {
> return (dma_addr_t)__virt_to_bus((unsigned long)(addr));
>  }
> +
>  #else
>  static inline dma_addr_t pfn_to_dma(struct device *dev, unsigned long pfn)
>  {
> @@ -86,6 +87,13 @@ static inline dma_addr_t virt_to_dma(struct device *dev, 
> void *addr)
>  }
>  #endif
>
> +/* The ARM override for dma_max_pfn() */
> +static inline unsigned long dma_max_pfn(struct device *dev)
> +{
> +   return PHYS_PFN_OFFSET + dma_to_pfn(dev, *dev->dma_mask);

Do we need to handle the dev == NULL case?

Rob
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v7 2/4] spinlock: Enable x86 architecture to do lockless refcount update

2013-08-05 Thread Waiman Long
This patch enables the x86 architecture to do lockless reference
count update using the generic lockref implementation with default
parameters. Only the x86/Kconfig file needs to be changed.

Signed-off-by: Waiman Long 
---
 arch/x86/Kconfig |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index b32ebf9..79a9309 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -262,6 +262,9 @@ config ARCH_CPU_PROBE_RELEASE
 config ARCH_SUPPORTS_UPROBES
def_bool y
 
+config GENERIC_SPINLOCK_REFCOUNT
+   def_bool y
+
 source "init/Kconfig"
 source "kernel/Kconfig.freezer"
 
-- 
1.7.1



[PATCH v7 4/4] dcache: Enable lockless update of dentry's refcount

2013-08-05 Thread Waiman Long
The current code takes the dentry's d_lock lock whenever the refcnt
is being updated. In reality, nothing big really happens until refcnt
goes to 0 in dput(). So it is not necessary to take the lock if the
reference count won't go to 0. On the other hand, there are cases
where refcnt should not be updated or was not expected to be updated
while d_lock was acquired by another thread.

This patch changes the code in dput(), dget(), __dget() and
dget_parent() to use lockless reference count update function calls.
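As a rough illustration of the idea above (not the code in this patch),
here is a minimal userspace sketch using C11 atomics. The struct dref
type, the bit layout (bit 0 as a lock flag, upper 32 bits as the count)
and the dref_* names are assumptions made up for this example. The put
only succeeds locklessly while the lock is free and the count would stay
above 0; otherwise the caller has to fall back to taking the lock.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: bit 0 = lock held, bits 32..63 = reference count. */
#define DREF_LOCKED	UINT64_C(1)
#define DREF_ONE	(UINT64_C(1) << 32)

struct dref {
	_Atomic uint64_t word;
};

static inline uint32_t dref_count(struct dref *d)
{
	return (uint32_t)(atomic_load(&d->word) >> 32);
}

/*
 * Try to drop one reference without taking the lock.
 * Returns true on success. Returns false when the lock is held or when
 * the count would reach 0, in which case the caller must use the locked
 * slow path instead.
 */
static inline bool dref_put_lockless(struct dref *d)
{
	uint64_t old = atomic_load(&d->word);

	while (!(old & DREF_LOCKED) && (old >> 32) > 1) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&d->word, &old,
						 old - DREF_ONE))
			return true;
	}
	return false;
}
```

With a count of 3 and the lock free, two calls succeed and the third
returns false, leaving the last reference for the locked path.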

This patch has a particularly big impact on the short workload of the
AIM7 benchmark with ramdisk filesystem. The table below shows the
performance improvement to the JPM (jobs per minute) throughput due
to this patch on an 8-socket 80-core x86-64 system with a 3.11-rc3
kernel in a 1/2/4/8 node configuration by using numactl to restrict
the execution of the workload on certain nodes.

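+-----------------+----------------+-----------------+----------+
|  Configuration  |    Mean JPM    |    Mean JPM     | % Change |
|                 | Rate w/o patch | Rate with patch |          |
+-----------------+---------------------------------------------+
|                 |             User Range 10 - 100             |
+-----------------+---------------------------------------------+
| 8 nodes, HT off |    1760523     |     4225737     | +140.0%  |
| 4 nodes, HT off |    2020076     |     3206202     |  +58.7%  |
| 2 nodes, HT off |    2391359     |     2654701     |  +11.0%  |
| 1 node , HT off |    2302912     |     2302433     |    0.0%  |
+-----------------+---------------------------------------------+
|                 |            User Range 200 - 1000            |
+-----------------+---------------------------------------------+
| 8 nodes, HT off |    1078421     |     7380760     | +584.4%  |
| 4 nodes, HT off |    1371040     |     4212007     | +207.2%  |
| 2 nodes, HT off |    2844720     |     2783442     |   -2.2%  |
| 1 node , HT off |    2433443     |     2415590     |   -0.7%  |
+-----------------+---------------------------------------------+
|                 |           User Range 1100 - 2000            |
+-----------------+---------------------------------------------+
| 8 nodes, HT off |    1055626     |     7118985     | +574.4%  |
| 4 nodes, HT off |    1352329     |     4512914     | +233.7%  |
| 2 nodes, HT off |    2793037     |     2758652     |   -1.2%  |
| 1 node , HT off |    2458125     |     2445069     |   -0.5%  |
+-----------------+----------------+-----------------+----------+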

With 4 nodes and above, there is a significant performance improvement
with this patch. With only 1 or 2 nodes, the performance is very close.
Because of variability of the AIM7 benchmark, a few percent difference
may not indicate a real performance gain or loss.

A perf call-graph report of the short workload at 1500 users
without the patch on the same 8-node machine indicates that about
79% of the workload's total time was spent in the _raw_spin_lock()
function. Almost all of that can be attributed to the following 2
kernel functions:
 1. dget_parent (49.92%)
 2. dput (49.84%)

The relevant perf report lines are:
+  78.76%  reaim  [kernel.kallsyms]   [k] _raw_spin_lock
+   0.05%  reaim  [kernel.kallsyms]   [k] dput
+   0.01%  reaim  [kernel.kallsyms]   [k] dget_parent

With this patch installed, the new perf report lines are:
+  19.66%  reaim  [kernel.kallsyms]   [k] _raw_spin_lock_irqsave
+   2.46%  reaim  [kernel.kallsyms]   [k] _raw_spin_lock
+   2.23%  reaim  [kernel.kallsyms]   [k] lockref_get_not_zero
+   0.50%  reaim  [kernel.kallsyms]   [k] dput
+   0.32%  reaim  [kernel.kallsyms]   [k] lockref_put_or_lock
+   0.30%  reaim  [kernel.kallsyms]   [k] lockref_get
+   0.01%  reaim  [kernel.kallsyms]   [k] dget_parent

-   2.46%  reaim  [kernel.kallsyms]   [k] _raw_spin_lock
   - _raw_spin_lock
  + 23.89% sys_getcwd
  + 23.60% d_path
  + 8.01% prepend_path
  + 5.18% complete_walk
  + 4.21% __rcu_process_callbacks
  + 3.08% inet_twsk_schedule
  + 2.36% do_anonymous_page
  + 2.24% unlazy_walk
  + 2.02% sem_lock
  + 1.82% process_backlog
  + 1.62% selinux_inode_free_security
  + 1.54% task_rq_lock
  + 1.45% unix_dgram_sendmsg
  + 1.18% enqueue_to_backlog
  + 1.06% unix_stream_sendmsg
  + 0.94% tcp_v4_rcv
  + 0.87% unix_create1
  + 0.71% scheduler_tick
  + 0.60% unix_release_sock
  + 0.59% do_wp_page
  + 0.59% unix_stream_recvmsg
  + 0.58% handle_pte_fault
  + 0.57% __do_fault
  + 0.53% unix_peer_get

The dput() and dget_parent() functions didn't show up in the
_raw_spin_lock callers at all.

The impact of this patch on other AIM7 workloads was much more
modest. Besides short, the other AIM7 workload that showed consistent
improvement is the high_systime workload. For the other workloads,
the changes were so minor that there is no significant difference
with 

[PATCH v7 3/4] dcache: replace d_lock/d_count by d_lockcnt

2013-08-05 Thread Waiman Long
This patch replaces the d_lock and d_count fields of the dentry
data structure by the combined d_lockcnt structure. A d_lock macro
is defined to remap the old d_lock name to the new d_lockcnt.lock
name. This is needed as a lot of files use the d_lock spinlock.

Read accesses to d_count are replaced by the d_count() helper
function. Write accesses to d_count are replaced by the new
d_lockcnt.refcnt name. Other than that, there is no other functional
change in this patch.

The offsets of the new d_lockcnt field are at byte 72 and 88 for
32-bit and 64-bit SMP systems respectively. In both cases, they are
8-byte aligned and their combination into a single 8-byte word will
not introduce a hole that increases the size of the dentry structure.
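The alignment argument can be checked with a small standalone sketch.
This is only an illustration: struct lockcnt_sketch and struct
dentry_sketch are made-up stand-ins (a 4-byte lock word plus a 4-byte
count, preceded by 88 bytes of other fields as in the 64-bit case
above), not the real kernel structures.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the combined lock + refcount word. */
struct lockcnt_sketch {
	alignas(8) uint32_t lock;	/* stands in for a 4-byte spinlock */
	uint32_t refcnt;		/* reference count */
};

/* The pair fits in one naturally aligned 8-byte word. */
_Static_assert(sizeof(struct lockcnt_sketch) == 8, "must be 8 bytes");
_Static_assert(alignof(struct lockcnt_sketch) == 8, "must be 8-byte aligned");
_Static_assert(offsetof(struct lockcnt_sketch, refcnt) == 4, "no inner hole");

/* Assumed: 88 bytes of other fields come first, as on 64-bit SMP. */
struct dentry_sketch {
	uint8_t before[88];
	struct lockcnt_sketch d_lockcnt;
};

/* 88 is a multiple of 8, so the compiler inserts no padding before it. */
_Static_assert(offsetof(struct dentry_sketch, d_lockcnt) == 88, "no hole");
```

If the preceding fields ended at an offset that is not a multiple of 8,
the last assertion would fail, which is the hole the description refers
to.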

Signed-off-by: Waiman Long 
---
 fs/dcache.c|   54 
 fs/namei.c |6 ++--
 include/linux/dcache.h |   15 
 3 files changed, 40 insertions(+), 35 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 87bdb53..3adb6aa 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -54,7 +54,7 @@
  *   - d_flags
  *   - d_name
  *   - d_lru
- *   - d_count
+ *   - d_lockcnt.refcnt
  *   - d_unhashed()
  *   - d_parent and d_subdirs
  *   - childrens' d_child and d_parent
@@ -229,7 +229,7 @@ static void __d_free(struct rcu_head *head)
  */
 static void d_free(struct dentry *dentry)
 {
-   BUG_ON(dentry->d_count);
+   BUG_ON(d_count(dentry));
this_cpu_dec(nr_dentry);
if (dentry->d_op && dentry->d_op->d_release)
dentry->d_op->d_release(dentry);
@@ -467,7 +467,7 @@ relock:
}
 
if (ref)
-   dentry->d_count--;
+   dentry->d_lockcnt.refcnt--;
/*
 * inform the fs via d_prune that this dentry is about to be
 * unhashed and destroyed.
@@ -513,12 +513,12 @@ void dput(struct dentry *dentry)
return;
 
 repeat:
-   if (dentry->d_count == 1)
+   if (d_count(dentry) == 1)
might_sleep();
spin_lock(&dentry->d_lock);
-   BUG_ON(!dentry->d_count);
-   if (dentry->d_count > 1) {
-   dentry->d_count--;
+   BUG_ON(!d_count(dentry));
+   if (d_count(dentry) > 1) {
+   dentry->d_lockcnt.refcnt--;
spin_unlock(&dentry->d_lock);
return;
}
@@ -535,7 +535,7 @@ repeat:
dentry->d_flags |= DCACHE_REFERENCED;
dentry_lru_add(dentry);
 
-   dentry->d_count--;
+   dentry->d_lockcnt.refcnt--;
spin_unlock(&dentry->d_lock);
return;
 
@@ -590,7 +590,7 @@ int d_invalidate(struct dentry * dentry)
 * We also need to leave mountpoints alone,
 * directory or not.
 */
-   if (dentry->d_count > 1 && dentry->d_inode) {
+   if (d_count(dentry) > 1 && dentry->d_inode) {
if (S_ISDIR(dentry->d_inode->i_mode) || d_mountpoint(dentry)) {
spin_unlock(&dentry->d_lock);
return -EBUSY;
@@ -606,7 +606,7 @@ EXPORT_SYMBOL(d_invalidate);
 /* This must be called with d_lock held */
 static inline void __dget_dlock(struct dentry *dentry)
 {
-   dentry->d_count++;
+   dentry->d_lockcnt.refcnt++;
 }
 
 static inline void __dget(struct dentry *dentry)
@@ -634,8 +634,8 @@ repeat:
goto repeat;
}
rcu_read_unlock();
-   BUG_ON(!ret->d_count);
-   ret->d_count++;
+   BUG_ON(!d_count(ret));
+   ret->d_lockcnt.refcnt++;
spin_unlock(&ret->d_lock);
return ret;
 }
@@ -718,7 +718,7 @@ restart:
spin_lock(&inode->i_lock);
hlist_for_each_entry(dentry, &inode->i_dentry, d_alias) {
spin_lock(&dentry->d_lock);
-   if (!dentry->d_count) {
+   if (!d_count(dentry)) {
__dget_dlock(dentry);
__d_drop(dentry);
spin_unlock(&dentry->d_lock);
@@ -734,7 +734,7 @@ EXPORT_SYMBOL(d_prune_aliases);
 
 /*
  * Try to throw away a dentry - free the inode, dput the parent.
- * Requires dentry->d_lock is held, and dentry->d_count == 0.
+ * Requires dentry->d_lock is held, and dentry->d_lockcnt.refcnt == 0.
  * Releases dentry->d_lock.
  *
  * This may fail if locks cannot be acquired no problem, just try again.
@@ -764,8 +764,8 @@ static void try_prune_one_dentry(struct dentry *dentry)
dentry = parent;
while (dentry) {
spin_lock(&dentry->d_lock);
-   if (dentry->d_count > 1) {
-   dentry->d_count--;
+   if (d_count(dentry) > 1) {
+   dentry->d_lockcnt.refcnt--;
spin_unlock(&dentry->d_lock);
return;
}
@@ -793,7 +793,7 @@ static void shrink_dentry_list(struct list_head *list)
 * the LRU because of laziness during lookup.  Do not free
 * it - just keep it off the LRU list.
 */
-   if (dentry->d_count) {
+   

[PATCH v7 0/4] Lockless update of reference count protected by spinlock

2013-08-05 Thread Waiman Long
v6->v7:
 - Substantially reduce the number of patches from 14 to 4 because a
   lot of the minor filesystem related changes had been merged to
   v3.11-rc1.
 - Remove architecture specific customization (LOCKREF_WAIT_SHIFT &
   LOCKREF_RETRY_COUNT).
 - Tune single-thread performance of lockref_put/get to within 10%
   of old lock->update->unlock code.

v5->v6:
 - Add a new GENERIC_SPINLOCK_REFCOUNT config parameter for using the
   generic implementation.
 - Add two parameters LOCKREF_WAIT_SHIFT and LOCKREF_RETRY_COUNT which
   can be specified differently for each architecture.
 - Update various spinlock_refcount.* files to incorporate review
   comments.
 - Replace reference of d_refcount() macro in Lustre filesystem code in
   the staging tree to use the new d_count() helper function.

v4->v5:
 - Add a d_count() helper for readonly access of reference count and
   change all references to d_count outside of dcache.c, dcache.h
   and namei.c to use d_count().

v3->v4:
 - Replace helper function access to d_lock and d_count by using
   macros to redefine the old d_lock name to the spinlock and new
   d_refcount name to the reference count. This greatly reduces the
   size of this patchset from 25 to 12 and makes it easier to review.

v2->v3:
 - Completely revamp the packaging by adding a new lockref data
   structure that combines the spinlock with the reference
   count. Helper functions are also added to manipulate the new data
   structure. That results in modifying over 50 files, but the changes
   were trivial in most of them.
 - Change initial spinlock wait to use a timeout.
 - Force 64-bit alignment of the spinlock & reference count structure.
 - Add a new way to use the combo by using a new union and helper
   functions.

v1->v2:
 - Add one more layer of indirection to LOCK_WITH_REFCOUNT macro.
 - Add __LINUX_SPINLOCK_REFCOUNT_H protection to spinlock_refcount.h.
 - Add some generic get/put macros into spinlock_refcount.h.

This patchset supports a generic mechanism to atomically update
a reference count that is protected by a spinlock without actually
acquiring the lock itself. If the update doesn't succeed, the caller
will have to acquire the lock and update the reference count in
the old way.  This will help in situations where there is a lot of
spinlock contention because of frequent reference count updates.

The d_lock and d_count fields of the struct dentry in dcache.h were
modified to use the new lockref data structure and the d_lock name
is now a macro to the actual spinlock.

This patch set causes a significant performance improvement in the
short workload of the AIM7 benchmark on an 8-socket x86-64 machine
with 80 cores.

Thanks to Thomas Gleixner, Andi Kleen and Linus for their valuable
input in shaping this patchset.

Signed-off-by: Waiman Long 

Waiman Long (4):
  spinlock: A new lockref structure for lockless update of refcount
  spinlock: Enable x86 architecture to do lockless refcount update
  dcache: replace d_lock/d_count by d_lockcnt
  dcache: Enable lockless update of dentry's refcount

 arch/x86/Kconfig|3 +
 fs/dcache.c |   78 +++--
 fs/namei.c  |6 +-
 include/asm-generic/spinlock_refcount.h |   46 +++
 include/linux/dcache.h  |   22 ++--
 include/linux/spinlock_refcount.h   |  126 
 kernel/Kconfig.locks|   15 +++
 lib/Makefile|2 +
 lib/spinlock_refcount.c |  198 +++
 9 files changed, 449 insertions(+), 47 deletions(-)
 create mode 100644 include/asm-generic/spinlock_refcount.h
 create mode 100644 include/linux/spinlock_refcount.h
 create mode 100644 lib/spinlock_refcount.c



[PATCH v7 1/4] spinlock: A new lockref structure for lockless update of refcount

2013-08-05 Thread Waiman Long
This patch introduces a new set of spinlock_refcount.h header files to
be included by kernel code that wants to do a faster lockless update
of a reference count protected by a spinlock.

The new lockref structure consists of just the spinlock and the
reference count data. Helper functions are defined in the new
<linux/spinlock_refcount.h> header file to access the content of
the new structure. There is a generic structure defined for all
architectures, but each architecture can also optionally define its
own structure and use its own helper functions.

Three new config parameters are introduced:
1. SPINLOCK_REFCOUNT
2. GENERIC_SPINLOCK_REFCOUNT
3. ARCH_SPINLOCK_REFCOUNT

The first one is defined in kernel/Kconfig.locks and is used
to enable or disable the faster lockless reference count update
optimization. The second and third ones have to be defined in each
architecture's Kconfig file to enable the optimization for that
architecture. Therefore, each architecture has to opt in to this
optimization or it won't get it. This allows each architecture plenty
of time to test it out before deciding to use it or replace it with
a better architecture specific solution. The architecture should set
only GENERIC_SPINLOCK_REFCOUNT to use the generic implementation
without customization. By setting only ARCH_SPINLOCK_REFCOUNT,
the architecture will have to provide its own implementation.

This optimization won't work for non-SMP systems or when spinlock
debugging is turned on. As a result, it is turned off if either of
them is true. It also won't work for full preempt-RT and so should
be turned off in this case.

To maximize the chance of doing lockless atomic update, the new code
will wait until the lock is free before trying to do the update.
The code will also attempt to do lockless atomic update a few times
before falling back to the old code path of acquiring a lock before
doing the update.
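The wait-then-retry scheme described above can be sketched in userspace
with C11 atomics. This is a simplified model under stated assumptions,
not the implementation in lib/spinlock_refcount.c: the packed 64-bit
word (low 32 bits as lock word, high 32 bits as count), the bounded wait
of SKETCH_WAIT_SPINS iterations and all sketch_* names are invented for
the example. Only the retry count of 4 is taken from the text.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: low 32 bits = lock word (0 = free), high 32 = count. */
#define SKETCH_LOCK_MASK	UINT64_C(0xffffffff)
#define SKETCH_COUNT_ONE	(UINT64_C(1) << 32)
#define SKETCH_RETRIES		4	/* "4 update attempts" */
#define SKETCH_WAIT_SPINS	1024	/* assumed bound on the initial wait */

struct lockref_sketch {
	_Atomic uint64_t lock_count;
};

static inline void sketch_lock(struct lockref_sketch *lr)
{
	for (;;) {
		uint64_t old = atomic_load(&lr->lock_count);

		/* Only try to take the lock when it looks free. */
		if ((old & SKETCH_LOCK_MASK) == 0 &&
		    atomic_compare_exchange_weak(&lr->lock_count, &old,
						 old | UINT64_C(1)))
			return;
	}
}

static inline void sketch_unlock(struct lockref_sketch *lr)
{
	/* Clear the lock bits, keep the count. */
	atomic_fetch_and(&lr->lock_count, ~SKETCH_LOCK_MASK);
}

static inline uint32_t sketch_count(struct lockref_sketch *lr)
{
	return (uint32_t)(atomic_load(&lr->lock_count) >> 32);
}

/*
 * Take a reference. Returns true if the lockless fast path succeeded,
 * false if the code fell back to lock -> update -> unlock.
 */
static inline bool sketch_get(struct lockref_sketch *lr)
{
	for (int attempt = 0; attempt < SKETCH_RETRIES; attempt++) {
		uint64_t old = atomic_load(&lr->lock_count);
		int spins = 0;

		/* Wait (bounded) until the lock is free. */
		while ((old & SKETCH_LOCK_MASK) != 0 &&
		       spins++ < SKETCH_WAIT_SPINS)
			old = atomic_load(&lr->lock_count);
		if ((old & SKETCH_LOCK_MASK) != 0)
			break;	/* still locked: use the slow path */

		/* Succeeds only if neither lock nor count changed meanwhile. */
		if (atomic_compare_exchange_strong(&lr->lock_count, &old,
						   old + SKETCH_COUNT_ONE))
			return true;
	}

	/* Old code path: acquire the lock, then update the count. */
	sketch_lock(lr);
	atomic_fetch_add(&lr->lock_count, SKETCH_COUNT_ONE);
	sketch_unlock(lr);
	return false;
}
```

Because the lock and the count share one word, a compare-and-swap that
expects the lock bits to be clear cannot succeed while another thread
holds the lock, so the locked slow path and the lockless fast path never
update the count at the same time.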

The table below shows the average JPM (jobs/minute) number (out of
3 runs) of the AIM7's short workload at 1500 users for different
configurations on an 8-socket 80-core DL980 with HT off with kernel
based on 3.11-rc3.

Configuration                           JPM
-------------                           ---
Wait till lock free, 1 update attempt   5899907
Wait till lock free, 2 update attempts  6534958
Wait till lock free, 3 update attempts  6868170
Wait till lock free, 4 update attempts  6905332
No wait,  2 update attempts             1091273
No wait,  4 update attempts             1281867
No wait,  8 update attempts             5095203
No wait, 16 update attempts             6392709
No wait, 32 update attempts             6438080

The "no wait, 8 update attempts" test showed high variability in the
results.  One run can have 6M JPM whereas the other one is only 2M
JPM, for example. The "wait till lock free" tests, on the other hand,
are much more stable in their throughput numbers.

For this initial version, the code will wait until the lock is free
with 4 update attempts.

To evaluate the performance difference between doing a reference count
update using the old way (lock->update->unlock) and the new lockref
functions in the uncontended case, a 256K loop was run on a 2.4GHz
Westmere x86-64 CPU.  The following table shows the average time
(in ns) for a single update operation (including the looping and
timing overhead):

Update Type                 Time (ns)
-----------                 ---------
lock->update->unlock          14.7
lockref_get/lockref_put       16.0

The new lockref* functions are about 10% slower than the old way when
there is no contention. Since reference count update is usually a very
small part of a typical workload, the actual performance impact of this
change is negligible when there is no contention.

Signed-off-by: Waiman Long 
---
 include/asm-generic/spinlock_refcount.h |   46 +++
 include/linux/spinlock_refcount.h   |  126 
 kernel/Kconfig.locks|   15 +++
 lib/Makefile|2 +
 lib/spinlock_refcount.c |  198 +++
 5 files changed, 387 insertions(+), 0 deletions(-)
 create mode 100644 include/asm-generic/spinlock_refcount.h
 create mode 100644 include/linux/spinlock_refcount.h
 create mode 100644 lib/spinlock_refcount.c

diff --git a/include/asm-generic/spinlock_refcount.h 
b/include/asm-generic/spinlock_refcount.h
new file mode 100644
index 000..d3a4119
--- /dev/null
+++ b/include/asm-generic/spinlock_refcount.h
@@ -0,0 +1,46 @@
+/*
+ * Spinlock with reference count combo
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; 

acpi_bus_trim does not detach devices in post order

2013-08-05 Thread Yasuaki Ishimatsu

I acked the following commit, but I have since hit a problem caused by
it, so I am reporting it.

commit cecdb193c8d91a42d9489d00618cc3dfff92e55a
Author: Rafael J. Wysocki 
Date:   Tue Jan 15 13:24:02 2013 +0100

ACPI / scan: Change the implementation of acpi_bus_trim()

Before applying the commit, acpi_bus_trim() detached devices in post order.

When I hot add memory devices and processor devices by container device
in my x86 box, memory devices are added first and processor devices are added
second. So I expect that processor devices are removed first and memory
devices are removed second when I remove them. And before applying the
commit, acpi_bus_trim() did so.

But after applying the commit, acpi_bus_trim() does not detach devices in
post order. So when I remove them, memory devices are removed first and
processor devices are removed second.

This causes a problem.

In Linux on x86, a NUMA node depends on memory devices, so a new NUMA
node is created at memory hot add. Thus when we hot add memory devices and
processor devices, we must hot add the memory devices first. Otherwise, the
processor devices are not assigned the correct NUMA node number.

And Linux expects that when removing them, processor devices are removed
before memory devices. But acpi_bus_trim() does not do so.
As a result, the NUMA node is not cleared in my x86 box when hot removing
memory devices and processor devices. The NUMA node is cleared when memory
devices are removed, but if there are still processor devices associated with
the NUMA node, it is not cleared at memory hot remove.

So when I remove them, the NUMA node's sysfs files remain as follows:

# ls /sys/devices/system/node/node1/
compact  cpumap    meminfo   power                   subsystem  vmstat
cpulist  distance  numastat  scan_unevictable_pages  uevent

CPU and memory sysfs files are removed correctly. But the node1 sysfs
directory remains.

Thanks,
Yasuaki Ishimatsu








Re: [RFC] gcc feature request: Moving blocks into sections

2013-08-05 Thread Steven Rostedt
On Mon, 2013-08-05 at 22:26 -0400, Jason Baron wrote:

> I think if the 'cold' attribute on the default disabled static_key 
> branch moved the text completely out-of-line, it would satisfy your 
> requirement here?
> 
> If you like this approach, perhaps we can make something like this work 
> within gcc. As its already supported, but doesn't quite go far enough 
> for our purposes.

It may not be too bad to use.

> 
> Also, if we go down this path, it means the 2-byte jump sequence is 
> probably not going to be too useful.

Don't count us out yet :-)


static inline bool arch_static_branch(struct static_key *key)
{
asm goto("1:"
[...]
: : "i" (key) : : l_yes);
return false;
l_yes:
goto __l_yes;
__l_yes: __attribute__((cold));
return false;
}

Or put that logic in the caller of arch_static_branch(). Basically, we
may be able to do a short jump to the place that will do a long jump to
the real work.

I'll have to play with this and see what gcc does with the output.

-- Steve




[PATCH] printk: Fix return of braille_register_console()

2013-08-05 Thread Steven Rostedt
Some of my configs I test with have CONFIG_A11Y_BRAILLE_CONSOLE set.
When I started testing against v3.11-rc4 my console went bonkers. Using
ktest to bisect the issue, it came down to:

commit bbeddf52a "printk: move braille console support into separate
braille.[ch] files"

Looking into the patch I found the problem. It's with the return of
braille_register_console(), as anything other than NULL is considered a
failure.

But for those of us that have CONFIG_A11Y_BRAILLE_CONSOLE set but do not
define a "brl" or "brl=" on the command line, we still may want a
console that those with sight can still use.

Return NULL (success) if "brl" or "brl=" is not on the console line.

Signed-off-by: Steven Rostedt 

diff --git a/kernel/printk/braille.c b/kernel/printk/braille.c
index b51087f..276762f 100644
--- a/kernel/printk/braille.c
+++ b/kernel/printk/braille.c
@@ -19,7 +19,8 @@ char *_braille_console_setup(char **str, char **brl_options)
pr_err("need port name after brl=\n");
else
*((*str)++) = 0;
-   }
+   } else
+   return NULL;
 
return *str;
 }




[PATCH v2 RESEND 13/18] x86, numa, mem_hotplug: Skip all the regions the kernel resides in.

2013-08-05 Thread Tang Chen
Early in boot, memblock will reserve some memory for the kernel,
such as the kernel code and data segments, initrd file, and so on,
which means the kernel resides in these memory regions.

Even if these memory regions are hotpluggable, we should not
mark them as hotpluggable. Otherwise the kernel won't have enough
memory to boot.

This patch finds out which memory regions the kernel resides in,
and skips them when finding all hotpluggable memory regions.
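The core of the check is an overlap test between two half-open address
ranges. A minimal standalone sketch of that test follows; the
ranges_overlap() name and the plain uint64_t types are assumptions for
illustration, and address wrap-around is deliberately ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: do [a_base, a_base + a_len) and
 * [b_base, b_base + b_len) share at least one byte?
 * Both ranges are half-open, so ranges that merely touch do not overlap.
 * Overflow of base + len is not handled in this sketch.
 */
static inline bool ranges_overlap(uint64_t a_base, uint64_t a_len,
				  uint64_t b_base, uint64_t b_len)
{
	uint64_t a_end = a_base + a_len;
	uint64_t b_end = b_base + b_len;

	/* Disjoint when one range ends at or before the other starts. */
	if (a_end <= b_base || a_base >= b_end)
		return false;

	return true;
}
```

For example, a reserved region at [0x1000, 0x2000) does not overlap a
candidate range starting at 0x2000, but it does overlap one starting at
0x1fff.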

Signed-off-by: Tang Chen 
Reviewed-by: Zhang Yanfei 
---
 mm/memory_hotplug.c |   45 +
 1 files changed, 45 insertions(+), 0 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index ef9ccf8..10a30ef 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -31,6 +31,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -93,6 +94,40 @@ static void release_memory_resource(struct resource *res)
 
 #ifdef CONFIG_ACPI_NUMA
 /**
+ * kernel_resides_in_region - Check if kernel resides in a memory region.
+ * @base: The base address of the memory region.
+ * @length: The length of the memory region.
+ *
+ * This function is used at early time. It iterates memblock.reserved and checks
+ * if the kernel has used any memory in [@base, @base + @length).
+ *
+ * Return true if the kernel resides in the memory region, false otherwise.
+ */
+static bool __init kernel_resides_in_region(phys_addr_t base, u64 length)
+{
+   int i;
+   phys_addr_t start, end;
+   struct memblock_region *region;
+   struct memblock_type *reserved = &memblock.reserved;
+
+   for (i = 0; i < reserved->cnt; i++) {
+   region = &reserved->regions[i];
+
+   if (region->flags != MEMBLOCK_HOTPLUG)
+   continue;
+
+   start = region->base;
+   end = region->base + region->size;
+   if (end <= base || start >= base + length)
+   continue;
+
+   return true;
+   }
+
+   return false;
+}
+
+/**
  * find_hotpluggable_memory - Find out hotpluggable memory from ACPI SRAT.
  *
  * This function did the following:
@@ -129,6 +164,16 @@ void __init find_hotpluggable_memory(void)
 
while (ACPI_SUCCESS(acpi_hotplug_mem_affinity(srat_vaddr, ,
  , ))) {
+   /*
+* At early time, memblock will reserve some memory for the
+* kernel, such as the kernel code and data segments, initrd
+* file, and so on, which means the kernel resides in these
+* memory regions. These regions should not be hotpluggable.
+* So do not mark them as hotpluggable.
+*/
+   if (kernel_resides_in_region(base, size))
+   continue;
+
/* Will mark hotpluggable memory regions here */
}
 
-- 
1.7.1



Re: [PATCH RFC V11 0/18] Paravirtualized ticket spinlocks

2013-08-05 Thread Raghavendra K T

On 08/06/2013 04:20 AM, H. Peter Anvin wrote:
> So, having read through the entire thread I *think* this is what the
> status of this patchset is:
>
> 1. Patches 1-17 are noncontroversial, Raghavendra is going to send an
> update split into two patchsets;

Yes.  Only one patch would be common to both host and guest which will
be sent as a separate patch.
I'll rebase first patchset to -next and second patchset to kvm tree as
needed.

> 2. There are at least two versions of patch 15; I think the "PATCH
> RESEND RFC" is the right one.

True.

> 3. Patch 18 is controversial but there are performance numbers; these
> should be integrated in the patch description.

Current plan is to drop patch 18 for now.



Re: [PATCH] cpumask: fix cpumask leak in partition_sched_domains

2013-08-05 Thread Xiaotian Feng
On Sat, Jul 27, 2013 at 3:26 PM, Xiaotian Feng  wrote:
> If doms_new is NULL, partition_sched_domains() will reset ndoms_cur
> to 0, and free old sched domains with free_sched_domains(doms_cur, ndoms_cur).
> As ndoms_cur is 0, the cpumask will not be freed.
>
> Signed-off-by: Xiaotian Feng 
> Cc: Ingo Molnar 
> Cc: Peter Zijlstra 
> Cc: linux-kernel@vger.kernel.org

Any comments? Cc'ed Rusty.

> ---
>  kernel/sched/core.c |5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index b7c32cb..3d6c57b 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -6184,8 +6184,9 @@ match1:
> ;
> }
>
> +   n= ndoms_cur;
> if (doms_new == NULL) {
> -   ndoms_cur = 0;
> +   n = 0;
> doms_new = &fallback_doms;
> cpumask_andnot(doms_new[0], cpu_active_mask, 
> cpu_isolated_map);
> WARN_ON_ONCE(dattr_new);
> @@ -6193,7 +6194,7 @@ match1:
>
> /* Build new domains */
> for (i = 0; i < ndoms_new; i++) {
> -   for (j = 0; j < ndoms_cur && !new_topology; j++) {
> +   for (j = 0; j < n && !new_topology; j++) {
> if (cpumask_equal(doms_new[i], doms_cur[j])
> && dattrs_equal(dattr_new, i, dattr_cur, j))
> goto match2;
> --
> 1.7.9.6 (Apple Git-31.1)
>


Re: [PATCH v2 RESEND 13/18] x86, numa, mem_hotplug: Skip all the regions the kernel resides in.

2013-08-05 Thread Tang Chen

On 08/05/2013 10:52 PM, Tejun Heo wrote:
> On Mon, Aug 05, 2013 at 02:22:47PM +0800, Tang Chen wrote:
>> I have resent the v2 patch-set. Would you please give some more
>> comments about the memblock and x86 booting code modification ?
>
> Patch 13 still seems corrupt.  Is it a problem on my side maybe?
> Nope, gmane raw message is corrupt too.
>
>   http://article.gmane.org/gmane.linux.kernel.mm/104549/raw
>
> Can you please verify your mail setup?  It's not very nice to repeat
> the same problem.


Hi tj,

I'm sorry but seeing from lkml, it is OK. And the patch was formatted
by git and sent by git send-email.

  https://lkml.org/lkml/2013/8/2/135

I'll redo and resend this patch again.

Thanks.


Re: [RFC] gcc feature request: Moving blocks into sections

2013-08-05 Thread Jason Baron

On 08/05/2013 04:35 PM, Richard Henderson wrote:

On 08/05/2013 09:57 AM, Jason Baron wrote:

On 08/05/2013 03:40 PM, Marek Polacek wrote:

On Mon, Aug 05, 2013 at 11:34:55AM -0700, Linus Torvalds wrote:

On Mon, Aug 5, 2013 at 11:24 AM, Linus Torvalds
 wrote:

Ugh. I can see the attraction of your section thing for that case, I
just get the feeling that we should be able to do better somehow.

Hmm.. Quite frankly, Steven, for your use case I think you actually
want the C goto *labels* associated with a section. Which sounds like
it might be a cleaner syntax than making it about the basic block
anyway.

FWIW, we also support hot/cold attributes for labels, thus e.g.

if (bar ())
  goto A;
/* ... */
A: __attribute__((cold))
/* ... */

I don't know whether that might be useful for what you want or not though...

 Marek


It certainly would be.

That was how I wanted the 'static_key' stuff to work, but unfortunately the
last time I tried it, it didn't move the text out-of-line any further than it
was already doing. Would that be expected? The change for us, if it worked
would be quite simple. Something like:

It is expected.  One must use -freorder-blocks-and-partition, and use real
profile feedback to get blocks moved completely out-of-line.

Whether that's a sensible default or not is debatable.



Hi Steve,

I think if the 'cold' attribute on the default disabled static_key 
branch moved the text completely out-of-line, it would satisfy your 
requirement here?


If you like this approach, perhaps we can make something like this work
within gcc, as it's already supported but doesn't quite go far enough
for our purposes.


Also, if we go down this path, it means the 2-byte jump sequence is 
probably not going to be too useful.


Thanks,

-Jason
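
A compilable version of the label attribute Marek describes might look like
the sketch below. parse_digit and COLD_LABEL are hypothetical names chosen
for illustration. The example in the GCC documentation puts a semicolon
after a label attribute, so the sketch does the same. As Richard notes, the
attribute on its own is only a hint: moving the block completely out of line
needs -freorder-blocks-and-partition together with real profile feedback.

```c
#include <assert.h>
#include <stdio.h>

/* Label attributes are a GCC extension; fall back to nothing elsewhere. */
#if defined(__GNUC__) && !defined(__clang__)
# define COLD_LABEL __attribute__((cold))
#else
# define COLD_LABEL
#endif

/* Return the value of an ASCII digit, or -1 on the unlikely error path. */
static int parse_digit(int c)
{
	if (c < '0' || c > '9')
		goto bad;
	return c - '0';

bad: COLD_LABEL;	/* semicolon after the attribute, as in the GCC docs */
	fprintf(stderr, "not a digit: %d\n", c);
	return -1;
}
```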






Re: [PATCH v2 3/5] cgroup, memcg: move cgroup_event implementation to memcg

2013-08-05 Thread Li Zefan
On 2013/8/6 10:02, Li Zefan wrote:
>>  static struct cftype mem_cgroup_files[] = {
>>  {
>>  .name = "usage_in_bytes",
>> @@ -5973,6 +6192,12 @@ static struct cftype mem_cgroup_files[] = {
>>  .read_u64 = mem_cgroup_hierarchy_read,
>>  },
>>  {
>> +.name = "cgroup.event_control",
>> +.write_string = cgroup_write_event_control,
>> +.flags = CFTYPE_NO_PREFIX,
>> +.mode = S_IWUGO,
>> +},
> 
> One of the misdesigns of cgroup eventfd is that cgroup.event_control is
> totally redundant...
> 

OK, write_string() is needed to accept arguments and pass them to
the event register function. Still not good.



Re: [3.10.4] NFS locking panic, plus persisting NFS shutdown panic from 3.9.*

2013-08-05 Thread Myklebust, Trond
On Mon, 2013-08-05 at 14:33 -0400, Jeff Layton wrote:
> On Mon, 5 Aug 2013 18:18:03 +
> "Myklebust, Trond"  wrote:
> 
> > On Mon, 2013-08-05 at 13:37 -0400, Jeff Layton wrote:
> > > On Mon, 5 Aug 2013 16:15:01 +
> > > "Myklebust, Trond"  wrote:
> > > 
> > > > From 3c50ba80105464a28d456d9a1e0f1d81d4af92a8 Mon Sep 17 00:00:00 2001
> > > > From: Trond Myklebust 
> > > > Date: Mon, 5 Aug 2013 12:06:12 -0400
> > > > Subject: [PATCH] LOCKD: Don't call utsname()->nodename from
> > > >  nlmclnt_setlockargs
> > > > MIME-Version: 1.0
> > > > Content-Type: text/plain; charset=UTF-8
> > > > Content-Transfer-Encoding: 8bit
> > > > 
> > > > Firstly, nlmclnt_setlockargs can be called from a reclaimer thread, in
> > > > which case we're in entirely the wrong namespace.
> > > > Secondly, commit 8aac62706adaaf0fab02c4327761561c8bda9448 (move
> > > > exit_task_namespaces() outside of exit_notify()) now means that
> > > > exit_task_work() is called after exit_task_namespaces(), which
> > > > triggers an Oops when we're freeing up the locks.
> > > > 
> > > > Signed-off-by: Trond Myklebust 
> > > > Cc: Toralf Förster 
> > > > Cc: Oleg Nesterov 
> > > > Cc: Nix 
> > > > Cc: Jeff Layton 
> > > > ---
> > > >  fs/lockd/clntproc.c | 5 +++--
> > > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c
> > > > index 9760ecb..acd3947 100644
> > > > --- a/fs/lockd/clntproc.c
> > > > +++ b/fs/lockd/clntproc.c
> > > > @@ -125,14 +125,15 @@ static void nlmclnt_setlockargs(struct nlm_rqst 
> > > > *req, struct file_lock *fl)
> > > >  {
> > > > struct nlm_args *argp = >a_args;
> > > > struct nlm_lock *lock = >lock;
> > > > +   char *nodename = req->a_host->h_rpcclnt->cl_nodename;
> > > >  
> > > > nlmclnt_next_cookie(>cookie);
> > > > memcpy(>fh, NFS_FH(file_inode(fl->fl_file)), 
> > > > sizeof(struct nfs_fh));
> > > > -   lock->caller  = utsname()->nodename;
> > > > +   lock->caller  = nodename;
> > > > lock->oh.data = req->a_owner;
> > > > lock->oh.len  = snprintf(req->a_owner, sizeof(req->a_owner), 
> > > > "%u@%s",
> > > > (unsigned 
> > > > int)fl->fl_u.nfs_fl.owner->pid,
> > > > -   utsname()->nodename);
> > > > +   nodename);
> > > > lock->svid = fl->fl_u.nfs_fl.owner->pid;
> > > > lock->fl.fl_start = fl->fl_start;
> > > > lock->fl.fl_end = fl->fl_end;
> > > 
> > > Looks good to me...
> > > 
> > > Reviewed-by: Jeff Layton 
> > > 
> > > Trond, any thoughts on the other oops that Nix posted? The issue there
> > > seems to be that we're trying to do the pathwalk to the rpcbind unix
> > > socket from exit_task_work(), but that's happening after we've already
> > > called exit_fs().
> > > 
> > > The trivial answer seems to be to simply call exit_task_work() before
> > > exit_fs() there, but it seems like we ought to be doing the upcall to
> > > rpcbind in a mount namespace from which we know we can reach the
> > > socket...
> > 
> > Isn't it enough to just do the same thing as we did for gss proxy? i.e.
> > set the RPC_CLNT_CREATE_NO_IDLE_TIMEOUT flag.
> > 
> > See attachment.
> 
> Yeah, that looks like a reasonable thing to do...
> 
> OTOH, Is there any other way for a unix socket to end up disconnected
> other than if we were to close it? Maybe if rpcbind stopped, the socket
> unlinked and recreated and then started again?
> 
> If so then you still could potentially end up in this situation even if
> you didn't autoclose it.

True. How about something like the following instead. Note the change to
the original patch...
-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
trond.mykleb...@netapp.com
www.netapp.com
From 00326ed6442c66021cd4b5e19e80f3e2027d5d42 Mon Sep 17 00:00:00 2001
From: Trond Myklebust 
Date: Mon, 5 Aug 2013 14:10:43 -0400
Subject: [PATCH v2 1/2] SUNRPC: Don't auto-disconnect from the local rpcbind
 socket

There is no need for the kernel to time out the AF_LOCAL connection to
the rpcbind socket, and doing so is problematic because when it is
time to reconnect, our process may no longer be using the same mount
namespace.

Reported-by: Nix 
Signed-off-by: Trond Myklebust 
Cc: Jeff Layton 
Cc: sta...@vger.kernel.org # 3.9.x
---
 net/sunrpc/rpcb_clnt.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/net/sunrpc/rpcb_clnt.c b/net/sunrpc/rpcb_clnt.c
index 3df764d..b0f7232 100644
--- a/net/sunrpc/rpcb_clnt.c
+++ b/net/sunrpc/rpcb_clnt.c
@@ -238,6 +238,14 @@ static int rpcb_create_local_unix(struct net *net)
 		.program	= &rpcb_program,
 		.version	= RPCBVERS_2,
 		.authflavor	= RPC_AUTH_NULL,
+		/*
+		 * We turn off the idle timeout to prevent the kernel
+		 * from automatically disconnecting the socket.
+		 * Otherwise, we'd have to cache the mount namespace
+		 * of the caller and somehow pass that to the socket
+		 * reconnect code.
+		 */
+		.flags		= 

Re: Cannot hot remove a memory device (patch, updated)

2013-08-05 Thread Yasuaki Ishimatsu

(2013/08/06 9:15), Rafael J. Wysocki wrote:

On Monday, August 05, 2013 05:19:56 PM Toshi Kani wrote:

On Mon, 2013-08-05 at 15:14 +0200, Rafael J. Wysocki wrote:
   :

Can you please test the appended patch?  I tested it somewhat, but since the
greatest number of physical nodes per ACPI device object I can get on my test
machines is 2 (and even that after hacking the kernel somewhat), that was kind
of inconclusive.

Thanks,
Rafael


---
From: Rafael J. Wysocki 
Subject: ACPI: Drop physical_node_id_bitmap from struct acpi_device

The physical_node_id_bitmap in struct acpi_device is only used for
looking up the first currently unused physical dependent node ID
by acpi_bind_one().  It is not really necessary, however, because
acpi_bind_one() walks the entire physical_node_list of the given
device object for sanity checking anyway and if that list is always
sorted by node_id, it is straightforward to find the first gap
between the currently used node IDs and use that number as the ID
of the new list node.

This also removes the artificial limit of the maximum number of
dependent physical devices per ACPI device object, which now depends
only on the capacity of unsigned int.

Signed-off-by: Rafael J. Wysocki 


I like the change. Much better :-)

Acked-by: Toshi Kani 


However, it introduces a bug in acpi_unbind_one(), because the size of the name
array in there has to be increased too.  Updated patch follows.

Thanks,
Rafael


---
From: Rafael J. Wysocki 
Subject: ACPI: Drop physical_node_id_bitmap from struct acpi_device

The physical_node_id_bitmap in struct acpi_device is only used for
looking up the first currently unused dependent physical node ID
by acpi_bind_one().  It is not really necessary, however, because
acpi_bind_one() walks the entire physical_node_list of the given
device object for sanity checking anyway and if that list is always
sorted by node_id, it is straightforward to find the first gap
between the currently used node IDs and use that number as the ID
of the new list node.

This also removes the artificial limit of the maximum number of
dependent physical devices per ACPI device object, which now depends
only on the capacity of unsigned int.

Signed-off-by: Rafael J. Wysocki 


Reviewed-by: Yasuaki Ishimatsu 
Tested-by: Yasuaki Ishimatsu 

I confirmed that I can add and remove a memory device correctly.

Thanks,
Yasuaki Ishimatsu
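
The ID allocation described in the changelog above can be sketched in plain
C as follows. This is an illustration under stated assumptions, not the
kernel implementation: struct phys_node, bind_node and unbind_node are
hypothetical names, and a singly linked list stands in for the kernel's
list_head. The idea is the same, though: keep the list sorted by node_id,
walk it once, and take the first ID that is not in use.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list node; the kernel uses struct list_head instead. */
struct phys_node {
	unsigned int node_id;
	struct phys_node *next;
};

/*
 * Insert new_node into the list sorted by node_id, giving it the first
 * unused ID. Returns the ID that was assigned.
 */
static unsigned int bind_node(struct phys_node **head,
			      struct phys_node *new_node)
{
	struct phys_node **insert_at = head;
	struct phys_node *pn;
	unsigned int node_id = 0;

	assert(head != NULL && new_node != NULL);

	/* The whole list is walked, as the patch does for its sanity check. */
	for (pn = *head; pn; pn = pn->next) {
		if (pn->node_id == node_id) {
			/* ID taken: the new node must go after this one. */
			insert_at = &pn->next;
			node_id++;
		}
	}

	new_node->node_id = node_id;
	new_node->next = *insert_at;
	*insert_at = new_node;
	return node_id;
}

/* Unlink a node so that its ID can be recycled by a later bind. */
static void unbind_node(struct phys_node **head, struct phys_node *node)
{
	struct phys_node **link;

	for (link = head; *link; link = &(*link)->next) {
		if (*link == node) {
			*link = node->next;
			node->next = NULL;
			return;
		}
	}
}
```

Binding three nodes yields IDs 0, 1 and 2. After the node with ID 1 is
unbound, the next bind reuses ID 1 and lands between the other two, so the
list stays sorted without any bitmap.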


---
  drivers/acpi/glue.c |   34 +++---
  include/acpi/acpi_bus.h |8 ++--
  2 files changed, 21 insertions(+), 21 deletions(-)

Index: linux-pm/drivers/acpi/glue.c
===
--- linux-pm.orig/drivers/acpi/glue.c
+++ linux-pm/drivers/acpi/glue.c
@@ -31,6 +31,7 @@ static LIST_HEAD(bus_type_list);
  static DECLARE_RWSEM(bus_type_sem);

  #define PHYSICAL_NODE_STRING "physical_node"
+#define PHYSICAL_NODE_NAME_SIZE (sizeof(PHYSICAL_NODE_STRING) + 10)

  int register_acpi_bus_type(struct acpi_bus_type *type)
  {
@@ -112,7 +113,9 @@ int acpi_bind_one(struct device *dev, ac
struct acpi_device *acpi_dev;
acpi_status status;
struct acpi_device_physical_node *physical_node, *pn;
-   char physical_node_name[sizeof(PHYSICAL_NODE_STRING) + 2];
+   char physical_node_name[PHYSICAL_NODE_NAME_SIZE];
+   struct list_head *physnode_list;
+   unsigned int node_id;
int retval = -EINVAL;

if (ACPI_HANDLE(dev)) {
@@ -139,8 +142,14 @@ int acpi_bind_one(struct device *dev, ac

mutex_lock(&acpi_dev->physical_node_lock);

-   /* Sanity check. */
-   list_for_each_entry(pn, &acpi_dev->physical_node_list, node)
+   /*
+* Keep the list sorted by node_id so that the IDs of removed nodes can
+* be recycled.
+*/
+   physnode_list = &acpi_dev->physical_node_list;
+   node_id = 0;
+   list_for_each_entry(pn, &acpi_dev->physical_node_list, node) {
+   /* Sanity check. */
if (pn->dev == dev) {
dev_warn(dev, "Already associated with ACPI node\n");
if (ACPI_HANDLE(dev) == handle)
@@ -148,19 +157,15 @@ int acpi_bind_one(struct device *dev, ac

goto out_free;
}
-
-   /* allocate physical node id according to physical_node_id_bitmap */
-   physical_node->node_id =
-   find_first_zero_bit(acpi_dev->physical_node_id_bitmap,
-   ACPI_MAX_PHYSICAL_NODE);
-   if (physical_node->node_id >= ACPI_MAX_PHYSICAL_NODE) {
-   retval = -ENOSPC;
-   goto out_free;
+   if (pn->node_id == node_id) {
+   physnode_list = &pn->node;
+   node_id++;
+   }
}

-   set_bit(physical_node->node_id, acpi_dev->physical_node_id_bitmap);
+   physical_node->node_id = node_id;
physical_node->dev = dev;
-   list_add_tail(&physical_node->node, &acpi_dev->physical_node_list);
+   

Re: [PATCH v2 3/5] cgroup, memcg: move cgroup_event implementation to memcg

2013-08-05 Thread Li Zefan
>  static struct cftype mem_cgroup_files[] = {
>   {
>   .name = "usage_in_bytes",
> @@ -5973,6 +6192,12 @@ static struct cftype mem_cgroup_files[] = {
>   .read_u64 = mem_cgroup_hierarchy_read,
>   },
>   {
> + .name = "cgroup.event_control",
> + .write_string = cgroup_write_event_control,
> + .flags = CFTYPE_NO_PREFIX,
> + .mode = S_IWUGO,
> + },

One of the misdesigns of cgroup eventfd is that cgroup.event_control is
totally redundant...



Re: [PATCH 9/8] hugetlb: add pmd_huge_support() to migrate only pmd-based hugepage

2013-08-05 Thread Aneesh Kumar K.V
Naoya Horiguchi  writes:

> This patch is motivated by the discussion with Aneesh about "extend
> hugepage migration" patchset.
>   http://thread.gmane.org/gmane.linux.kernel.mm/103933/focus=104391
> I'll append this to the patchset in the next post, but before that
> I want this patch to be reviewed (I don't want to repeat posting the
> whole set for just minor changes.)
>
> Any comments?
>
> Thanks,
> Naoya Horiguchi
> ---
> From: Naoya Horiguchi 
> Date: Mon, 5 Aug 2013 13:33:02 -0400
> Subject: [PATCH] hugetlb: add pmd_huge_support() to migrate only pmd-based
>  hugepage
>
> Currently hugepage migration works well only for pmd-based hugepages,
> because core routines of hugepage migration use pmd specific internal
> functions like huge_pte_offset(). So we should not enable the migration
> of other levels of hugepages until we are ready for it.

I guess huge_pte_offset may not be the right reason because archs do
implement huge_pte_offsets even if they are not pmd-based hugepages

pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr)
{
/* Only called for hugetlbfs pages, hence can ignore THP */
return find_linux_pte_or_hugepte(mm->pgd, addr, NULL);
}

>
> Some users of hugepage migration (mbind, move_pages, and migrate_pages)
> do page table walk and check pud/pmd_huge() there, so they are safe.
> But the other users (softoffline and memory hotremove) don't do this,
> so they can try to migrate unexpected types of hugepages.
>
> To prevent this, we introduce an architecture dependent check of whether
> hugepage are implemented on a pmd basis or not. It returns 0 if pmd_huge()
> returns always 0, and 1 otherwise.
>

so why not #define pmd_huge_support pmd_huge or use pmd_huge directly ?

> Signed-off-by: Naoya Horiguchi 
> ---
>  arch/arm/mm/hugetlbpage.c |  5 +
>  arch/arm64/mm/hugetlbpage.c   |  5 +
>  arch/ia64/mm/hugetlbpage.c|  5 +
>  arch/metag/mm/hugetlbpage.c   |  5 +
>  arch/mips/mm/hugetlbpage.c|  5 +
>  arch/powerpc/mm/hugetlbpage.c | 10 ++
>  arch/s390/mm/hugetlbpage.c|  5 +
>  arch/sh/mm/hugetlbpage.c  |  5 +
>  arch/sparc/mm/hugetlbpage.c   |  5 +
>  arch/tile/mm/hugetlbpage.c|  5 +
>  arch/x86/mm/hugetlbpage.c |  8 
>  include/linux/hugetlb.h   |  2 ++
>  mm/migrate.c  | 11 +++
>  13 files changed, 76 insertions(+)
>
> diff --git a/arch/arm/mm/hugetlbpage.c b/arch/arm/mm/hugetlbpage.c
> index 3d1e4a2..3f3b6a7 100644
> --- a/arch/arm/mm/hugetlbpage.c
> +++ b/arch/arm/mm/hugetlbpage.c
> @@ -99,3 +99,8 @@ int pmd_huge(pmd_t pmd)
>  {
>   return pmd_val(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT);
>  }
> +
> +int pmd_huge_support(void)
> +{
> + return 1;
> +}

-aneesh



Re: [PATCH 8/8] prepare to remove /proc/sys/vm/hugepages_treat_as_movable

2013-08-05 Thread Aneesh Kumar K.V
Naoya Horiguchi  writes:


>> 
>> Considering that we have architectures that won't support migrating
>> explicit hugepages with this patch series, is it ok to use
>> GFP_HIGHUSER_MOVABLE for hugepage allocation ?
>
> Originally this parameter was introduced to make hugepage pool on 
> ZONE_MOVABLE.
> The benefit is that we can extend the hugepage pool more easily,
> because external fragmentation less likely happens than other zone type
> by rearranging fragmented pages with page migration/reclaim.
>
> So I think using ZONE_MOVABLE for hugepage allocation by default makes sense
> even on the architectures which don't support hugepage migration.

But allocating hugepages from ZONE_MOVABLE means we have pages in that
zone which we can't migrate. Doesn't that impact other features like
hotplug ?


-aneesh



Re: [PATCH] MAINTAINERS: drivers/power: add entry for SmartReflex AVS drivers

2013-08-05 Thread Anton Vorontsov
On Mon, Aug 05, 2013 at 09:29:37AM -0700, Kevin Hilman wrote:
> The SmartReflex AVS driver evolved out of the OMAP kernel and now
> lives under drivers/power/avs.
> 
> I've historically been maintainer of this but Nishanth Menon is doing
> most of the heavy lifting now.  Add us both as co-maintainers.
> 
...
> +M:   Kevin Hilman 
> +M:   Nishanth Menon 
> +S:   Maintained
> +F:   drivers/power/avs/smartreflex.c
> +F:  include/linux/power/smartreflex.h

This line is borken (whitespaces).

> +L:   linux...@vger.kernel.org
> +
>  PNP SUPPORT
>  M:   Rafael J. Wysocki 
>  M:   Bjorn Helgaas 
> -- 
> 1.8.3


[PATCH trivial] UAPI: Kbuild: add/modify comments for "uapi/Kbuild" and "uapi/linux/Kbuild"

2013-08-05 Thread Chen Gang
"include/uapi/" holds the whole Linux kernel API, so it is important enough
to deserve more general explanatory comments.

In "include/uapi/Kbuild", "Makefile..." and "non-arch..." comments are
meaningless for current 'Kbuild', so delete them.

And add more explanations for "include/uapi/" in "include/uapi/Kbuild",
also add more explanations for "include/uapi/linux/" in
"include/uapi/linux/Kbuild".


Signed-off-by: Chen Gang 
---
 include/uapi/Kbuild   |5 ++---
 include/uapi/linux/Kbuild |2 ++
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/include/uapi/Kbuild b/include/uapi/Kbuild
index 81d2106..c682891 100644
--- a/include/uapi/Kbuild
+++ b/include/uapi/Kbuild
@@ -1,7 +1,6 @@
 # UAPI Header export list
-# Top-level Makefile calls into asm-$(ARCH)
-# List only non-arch directories below
-
+# Except "linux/", UAPI means Universal API.
+# For "linux/", UAPI means User API which can be used by user mode.
 
 header-y += asm-generic/
 header-y += linux/
diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
index 997f9f2..0025e07 100644
--- a/include/uapi/linux/Kbuild
+++ b/include/uapi/linux/Kbuild
@@ -1,4 +1,6 @@
 # UAPI Header export list
+# UAPI is User API which can be used by user mode.
+
 header-y += byteorder/
 header-y += can/
 header-y += caif/
-- 
1.7.7.6




[PATCH 1/5] perf session: Export queue_event function

2013-08-05 Thread David Ahern
Taking a lesson from perf-trace and bringing control of event
processing into perf-kvm-stat-live: parse the sample to get access to the
time, leaving just the need to queue it to the ordered samples list.
For that the queue_event function needs to be exported.

Unexport perf_session__process_event.

Signed-off-by: David Ahern 
Cc: Frederic Weisbecker 
Cc: Ingo Molnar 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Cc: Runzhen Wang 
Cc: Xiao Guangrong 
---
 tools/perf/util/session.c |   10 +-
 tools/perf/util/session.h |6 ++
 2 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index b5ebd47..dedaeb2 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -643,7 +643,7 @@ static void __queue_event(struct sample_queue *new, struct 
perf_session *s)
 
 #define MAX_SAMPLE_BUFFER  (64 * 1024 / sizeof(struct sample_queue))
 
-static int perf_session_queue_event(struct perf_session *s, union perf_event 
*event,
+int perf_session_queue_event(struct perf_session *s, union perf_event *event,
struct perf_sample *sample, u64 file_offset)
 {
struct ordered_samples *os = &s->ordered_samples;
@@ -1049,10 +1049,10 @@ static void event_swap(union perf_event *event, bool 
sample_id_all)
swap(event, sample_id_all);
 }
 
-int perf_session__process_event(struct perf_session *session,
-   union perf_event *event,
-   struct perf_tool *tool,
-   u64 file_offset)
+static int perf_session__process_event(struct perf_session *session,
+  union perf_event *event,
+  struct perf_tool *tool,
+  u64 file_offset)
 {
struct perf_sample sample;
int ret;
diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
index 9818fc2..8bed17e 100644
--- a/tools/perf/util/session.h
+++ b/tools/perf/util/session.h
@@ -56,10 +56,8 @@ int __perf_session__process_events(struct perf_session *self,
 int perf_session__process_events(struct perf_session *self,
 struct perf_tool *tool);
 
-int perf_session__process_event(struct perf_session *session,
-   union perf_event *event,
-   struct perf_tool *tool,
-   u64 file_offset);
+int perf_session_queue_event(struct perf_session *s, union perf_event *event,
+struct perf_sample *sample, u64 file_offset);
 
 void perf_tool__fill_defaults(struct perf_tool *tool);
 
-- 
1.7.10.1



[PATCH 5/5] perf kvm stat report: Add option to analyze specific VM

2013-08-05 Thread David Ahern
Add an option to analyze a specific VM within a data file. This
allows the collection of kvm events for all VMs and then analyze
data for each VM (or set of VMs) individually.

Signed-off-by: David Ahern 
Cc: Arnaldo Carvalho de Melo 
Cc: Ingo Molnar 
Cc: Frederic Weisbecker 
Cc: Peter Zijlstra 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Xiao Guangrong 
Cc: Runzhen Wang 
---
 tools/perf/builtin-kvm.c |   38 --
 1 file changed, 36 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c
index 5edbd3b..16a672f 100644
--- a/tools/perf/builtin-kvm.c
+++ b/tools/perf/builtin-kvm.c
@@ -9,7 +9,7 @@
 #include "util/thread.h"
 #include "util/header.h"
 #include "util/session.h"
-
+#include "util/intlist.h"
 #include "util/parse-options.h"
 #include "util/trace-event.h"
 #include "util/debug.h"
@@ -108,6 +108,9 @@ struct perf_kvm_stat {
u64 lost_events;
u64 threshold;
 
+   const char *pid_str;
+   struct intlist *pid_list;
+
struct rb_root result;
 
int timerfd;
@@ -790,16 +793,29 @@ static int process_lost_event(struct perf_tool *tool,
return 0;
 }
 
+static bool skip_sample(struct perf_kvm_stat *kvm,
+   struct perf_sample *sample)
+{
+   if (kvm->pid_list && intlist__find(kvm->pid_list, sample->pid) == NULL)
+   return true;
+
+   return false;
+}
+
 static int process_sample_event(struct perf_tool *tool,
union perf_event *event,
struct perf_sample *sample,
struct perf_evsel *evsel,
struct machine *machine)
 {
-   struct thread *thread = machine__findnew_thread(machine, sample->tid);
+   struct thread *thread;
struct perf_kvm_stat *kvm = container_of(tool, struct perf_kvm_stat,
 tool);
 
+   if (skip_sample(kvm, sample))
+   return 0;
+
+   thread = machine__findnew_thread(machine, sample->tid);
if (thread == NULL) {
pr_debug("problem processing %d event, skipping it.\n",
event->header.type);
@@ -1222,11 +1238,27 @@ static int read_events(struct perf_kvm_stat *kvm)
return perf_session__process_events(kvm->session, &kvm->tool);
 }
 
+static int parse_target_str(struct perf_kvm_stat *kvm)
+{
+   if (kvm->pid_str) {
+   kvm->pid_list = intlist__new(kvm->pid_str);
+   if (kvm->pid_list == NULL) {
+   pr_err("Error parsing process id string\n");
+   return -EINVAL;
+   }
+   }
+
+   return 0;
+}
+
 static int kvm_events_report_vcpu(struct perf_kvm_stat *kvm)
 {
int ret = -EINVAL;
int vcpu = kvm->trace_vcpu;
 
+   if (parse_target_str(kvm) != 0)
+   goto exit;
+
if (!verify_vcpu(vcpu))
goto exit;
 
@@ -1313,6 +1345,8 @@ kvm_events_report(struct perf_kvm_stat *kvm, int argc, 
const char **argv)
OPT_STRING('k', "key", &kvm->sort_key, "sort-key",
"key for sorting: sample(sort by samples number)"
" time (sort by avg time)"),
+   OPT_STRING('p', "pid", &kvm->pid_str, "pid",
+  "analyze events only for given process id(s)"),
OPT_END()
};
 
-- 
1.7.10.1
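
To make the filtering step above concrete, here is a small self-contained
sketch. It is not the perf code: perf uses its own intlist helpers, while
this version assumes a sorted array searched with bsearch(). The names
struct pid_filter, pid_filter_parse and skip_sample_pid are hypothetical.
The behaviour mirrors the patch: with no list, nothing is skipped, and with
a list, only samples from listed pids are kept.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical replacement for perf's intlist: a sorted array of pids. */
struct pid_filter {
	int *pids;
	size_t nr;
};

static int cmp_int(const void *a, const void *b)
{
	int x = *(const int *)a, y = *(const int *)b;

	return (x > y) - (x < y);
}

/* Parse a string such as "1234,5678". Returns 0 on success, -1 on error. */
static int pid_filter_parse(struct pid_filter *f, const char *str)
{
	f->pids = NULL;
	f->nr = 0;

	while (*str) {
		char *end;
		long v = strtol(str, &end, 10);
		int *tmp;

		if (end == str || v < 0 || v > INT_MAX ||
		    (*end && *end != ','))
			goto err;

		tmp = realloc(f->pids, (f->nr + 1) * sizeof(*tmp));
		if (!tmp)
			goto err;
		f->pids = tmp;
		f->pids[f->nr++] = (int)v;

		str = *end ? end + 1 : end;	/* step over the comma */
	}

	if (f->nr == 0)
		return -1;		/* empty string: nothing to filter on */

	qsort(f->pids, f->nr, sizeof(*f->pids), cmp_int);
	return 0;
err:
	free(f->pids);
	f->pids = NULL;
	f->nr = 0;
	return -1;
}

/* True if the sample's pid is not in the list; no list means keep all. */
static bool skip_sample_pid(const struct pid_filter *f, int pid)
{
	if (!f || f->nr == 0)
		return false;

	return bsearch(&pid, f->pids, f->nr, sizeof(*f->pids),
		       cmp_int) == NULL;
}
```

Checking the pid before looking up the thread, as the patch does, avoids
creating thread entries for VMs that were never asked about.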



[PATCH 3/5] perf kvm: add min and max stats to display - v2

2013-08-05 Thread David Ahern
Add max and min times for exit events.

v2: address Xiao's comment to use get_event function for pulling max and
min from stats struct similar to mean and count

Signed-off-by: David Ahern 
Cc: Arnaldo Carvalho de Melo 
Cc: Ingo Molnar 
Cc: Frederic Weisbecker 
Cc: Peter Zijlstra 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Xiao Guangrong 
Cc: Runzhen Wang 
---
 tools/perf/builtin-kvm.c |   21 +
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c
index 29bfca7..b6595e9 100644
--- a/tools/perf/builtin-kvm.c
+++ b/tools/perf/builtin-kvm.c
@@ -337,14 +337,19 @@ static void clear_events_cache_stats(struct list_head 
*kvm_events_cache)
struct list_head *head;
struct kvm_event *event;
unsigned int i;
+   int j;
 
for (i = 0; i < EVENTS_CACHE_SIZE; i++) {
head = &kvm_events_cache[i];
list_for_each_entry(event, head, hash_entry) {
/* reset stats for event */
-   memset(&event->total, 0, sizeof(event->total));
-   memset(event->vcpu, 0,
-  event->max_vcpu * sizeof(*event->vcpu));
+   event->total.time = 0;
+   init_stats(&event->total.stats);
+
+   for (j = 0; j < event->max_vcpu; ++j) {
+   event->vcpu[j].time = 0;
+   init_stats(&event->vcpu[j].stats);
+   }
}
}
 }
@@ -583,6 +588,8 @@ static int compare_kvm_event_ ## func(struct kvm_event 
*one,\
 GET_EVENT_KEY(time, time);
 COMPARE_EVENT_KEY(count, stats.n);
 COMPARE_EVENT_KEY(mean, stats.mean);
+GET_EVENT_KEY(max, stats.max);
+GET_EVENT_KEY(min, stats.min);
 
 #define DEF_SORT_NAME_KEY(name, compare_key)   \
{ #name, compare_kvm_event_ ## compare_key }
@@ -727,20 +734,26 @@ static void print_result(struct perf_kvm_stat *kvm)
pr_info("%9s ", "Samples%");
 
pr_info("%9s ", "Time%");
+   pr_info("%10s ", "Min Time");
+   pr_info("%10s ", "Max Time");
pr_info("%16s ", "Avg time");
pr_info("\n\n");
 
while ((event = pop_from_result(&kvm->result))) {
-   u64 ecount, etime;
+   u64 ecount, etime, max, min;
 
ecount = get_event_count(event, vcpu);
etime = get_event_time(event, vcpu);
+   max = get_event_max(event, vcpu);
+   min = get_event_min(event, vcpu);
 
kvm->events_ops->decode_key(kvm, &event->key, decode);
pr_info("%20s ", decode);
pr_info("%10llu ", (unsigned long long)ecount);
pr_info("%8.2f%% ", (double)ecount / kvm->total_count * 100);
pr_info("%8.2f%% ", (double)etime / kvm->total_time * 100);
+   pr_info("%8" PRIu64 "us ", min / 1000);
+   pr_info("%8" PRIu64 "us ", max / 1000);
pr_info("%9.2fus ( +-%7.2f%% )", (double)etime / ecount/1e3,
kvm_event_rel_stddev(vcpu, event));
pr_info("\n");
-- 
1.7.10.1
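
A rough sketch of why the reset path above switches from memset() to an
explicit init may help. It assumes the minimum starts at the largest u64
value, so that the first sample always lowers it; a zero-filled struct would
then report a minimum of 0 forever. struct run_stats, run_stats_init and
run_stats_update are hypothetical names, not the perf helpers, and the
mean/variance update is the usual Welford recurrence.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical running-statistics struct with min/max tracking. */
struct run_stats {
	double n, mean, M2;
	uint64_t max, min;
};

static void run_stats_init(struct run_stats *s)
{
	s->n = 0.0;
	s->mean = 0.0;
	s->M2 = 0.0;
	s->min = UINT64_MAX;	/* any real sample will be smaller or equal */
	s->max = 0;
}

/* Fold one sample into the running mean, variance sum, min and max. */
static void run_stats_update(struct run_stats *s, uint64_t val)
{
	double delta;

	s->n++;
	delta = (double)val - s->mean;
	s->mean += delta / s->n;
	s->M2 += delta * ((double)val - s->mean);

	if (val > s->max)
		s->max = val;
	if (val < s->min)
		s->min = val;
}
```

Feeding in exit times of 3000, 1000 and 2000 ns gives min 1000, max 3000
and mean 2000, which the display code would then divide by 1000 to print
microseconds.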



[PATCH 4/5] perf kvm: option to print events that exceed a threshold

2013-08-05 Thread David Ahern
This is useful to spot high latency blips. It is normal for HLT
reasons to have long exit times, so strip those from the threshold
check.

Signed-off-by: David Ahern 
Cc: Arnaldo Carvalho de Melo 
Cc: Ingo Molnar 
Cc: Frederic Weisbecker 
Cc: Peter Zijlstra 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Xiao Guangrong 
Cc: Runzhen Wang 
---
 tools/perf/builtin-kvm.c |   25 +
 tools/perf/perf.h|3 +++
 2 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c
index b6595e9..5edbd3b 100644
--- a/tools/perf/builtin-kvm.c
+++ b/tools/perf/builtin-kvm.c
@@ -106,6 +106,7 @@ struct perf_kvm_stat {
u64 total_time;
u64 total_count;
u64 lost_events;
+   u64 threshold;
 
struct rb_root result;
 
@@ -470,7 +471,7 @@ static bool update_kvm_event(struct kvm_event *event, int 
vcpu_id,
 static bool handle_end_event(struct perf_kvm_stat *kvm,
 struct vcpu_event_record *vcpu_record,
 struct event_key *key,
-u64 timestamp)
+struct perf_sample *sample)
 {
struct kvm_event *event;
u64 time_begin, time_diff;
@@ -507,12 +508,24 @@ static bool handle_end_event(struct perf_kvm_stat *kvm,
vcpu_record->start_time = 0;
 
/* seems to happen once in a while during live mode */
-   if (timestamp < time_begin) {
+   if (sample->time < time_begin) {
pr_debug("End time before begin time; skipping event.\n");
return true;
}
 
-   time_diff = timestamp - time_begin;
+   time_diff = sample->time - time_begin;
+
+   if (kvm->threshold && time_diff > kvm->threshold) {
+   char decode[32];
+
+   kvm->events_ops->decode_key(kvm, &event->key, decode);
+   if (strcmp(decode, "HLT")) {
+   pr_info("%" PRIu64 " VM %d, vcpu %d: %s event took %" 
PRIu64 "usec\n",
+sample->time, sample->pid, 
vcpu_record->vcpu_id,
+decode, time_diff/1000);
+   }
+   }
+
return update_kvm_event(event, vcpu, time_diff);
 }
 
@@ -559,7 +572,7 @@ static bool handle_kvm_event(struct perf_kvm_stat *kvm,
return handle_begin_event(kvm, vcpu_record, &key, sample->time);
 
if (kvm->events_ops->is_end_event(evsel, sample, &key))
-   return handle_end_event(kvm, vcpu_record, &key, sample->time);
+   return handle_end_event(kvm, vcpu_record, &key, sample);
 
return true;
 }
@@ -1395,6 +1408,8 @@ static int kvm_events_live(struct perf_kvm_stat *kvm,
OPT_STRING('k', "key", &kvm->sort_key, "sort-key",
"key for sorting: sample(sort by samples number)"
" time (sort by avg time)"),
+   OPT_U64('T', "threshold", &kvm->threshold,
+   "show events other than HALT that take longer than threshold usecs"),
OPT_END()
};
const char * const live_usage[] = {
@@ -1433,6 +1448,8 @@ static int kvm_events_live(struct perf_kvm_stat *kvm,
usage_with_options(live_usage, live_options);
}
 
+   kvm->threshold *= NSEC_PER_USEC;   /* convert usec to nsec */
+
/*
 * target related setups
 */
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index 32bd102..cf20187 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -125,6 +125,9 @@
 #ifndef NSEC_PER_SEC
 # define NSEC_PER_SEC  1000000000ULL
 #endif
+#ifndef NSEC_PER_USEC
+# define NSEC_PER_USEC 1000ULL
+#endif
 
 static inline unsigned long long rdclock(void)
 {
-- 
1.7.10.1



[PATCH 2/5] perf kvm: add live mode - v4

2013-08-05 Thread David Ahern
perf kvm stat currently requires back to back record and report commands
to see stats, e.g.:

  perf kvm stat record -p $pid -- sleep 1
  perf kvm stat report

This is inconvenient for on-box monitoring of a VM. This patch introduces
a 'live' mode that in effect combines the record plus report into one
command. e.g., to monitor a single VM:

  perf kvm stat live -p $pid

or all VMs:

  perf kvm stat live

The same stats options as for the record+report path work in live mode.
Display rate defaults to 1 second and can be changed using the -d option.

v4:
- address comments from Xiao -- verify_vcpu check should not look at
  processors on line for the host, prune configurable options.
- set attr->{mmap,comm,task} to 0 - don't need task events so trim events
  we have to deal with
- better control of time for queue event flushing to reduce frequency of
  "Timestamp below last timeslice flush" failures.

v3:
updated to use existing tracepoint parsing code

v2:
removed ABSTIME arg from timerfd_settime as mentioned by Namhyung
only call perf_kvm__handle_stdin when poll returns activity.

Signed-off-by: David Ahern 
Cc: Arnaldo Carvalho de Melo 
Cc: Ingo Molnar 
Cc: Frederic Weisbecker 
Cc: Peter Zijlstra 
Cc: Jiri Olsa 
Cc: Namhyung Kim 
Cc: Xiao Guangrong 
Cc: Runzhen Wang 
---
 tools/perf/builtin-kvm.c |  659 --
 1 file changed, 633 insertions(+), 26 deletions(-)

diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c
index 7d14a3a..29bfca7 100644
--- a/tools/perf/builtin-kvm.c
+++ b/tools/perf/builtin-kvm.c
@@ -2,6 +2,7 @@
 #include "perf.h"
 
 #include "util/evsel.h"
+#include "util/evlist.h"
 #include "util/util.h"
 #include "util/cache.h"
 #include "util/symbol.h"
@@ -15,9 +16,12 @@
 #include 
 #include "util/tool.h"
 #include "util/stat.h"
+#include "util/top.h"
 
 #include 
+#include 
 
+#include 
 #include 
 #include 
 #include 
@@ -82,6 +86,8 @@ struct exit_reasons_table {
 
 struct perf_kvm_stat {
struct perf_tool tool;
+   struct perf_record_opts opts;
+   struct perf_evlist  *evlist;
struct perf_session *session;
 
const char *file_name;
@@ -96,10 +102,16 @@ struct perf_kvm_stat {
struct kvm_events_ops *events_ops;
key_cmp_fun compare;
struct list_head kvm_events_cache[EVENTS_CACHE_SIZE];
+
u64 total_time;
u64 total_count;
+   u64 lost_events;
 
struct rb_root result;
+
+   int timerfd;
+   unsigned int display_time;
+   bool live;
 };
 
 
@@ -320,6 +332,23 @@ static void init_kvm_event_record(struct perf_kvm_stat *kvm)
INIT_LIST_HEAD(&kvm->kvm_events_cache[i]);
 }
 
+static void clear_events_cache_stats(struct list_head *kvm_events_cache)
+{
+   struct list_head *head;
+   struct kvm_event *event;
+   unsigned int i;
+
+   for (i = 0; i < EVENTS_CACHE_SIZE; i++) {
+   head = &kvm_events_cache[i];
+   list_for_each_entry(event, head, hash_entry) {
+   /* reset stats for event */
+   memset(&event->total, 0, sizeof(event->total));
+   memset(event->vcpu, 0,
+  event->max_vcpu * sizeof(*event->vcpu));
+   }
+   }
+}
+
 static int kvm_events_hash_fn(u64 key)
 {
return key & (EVENTS_CACHE_SIZE - 1);
@@ -472,7 +501,11 @@ static bool handle_end_event(struct perf_kvm_stat *kvm,
vcpu_record->last_event = NULL;
vcpu_record->start_time = 0;
 
-   BUG_ON(timestamp < time_begin);
+   /* seems to happen once in a while during live mode */
+   if (timestamp < time_begin) {
+   pr_debug("End time before begin time; skipping event.\n");
+   return true;
+   }
 
time_diff = timestamp - time_begin;
return update_kvm_event(event, vcpu, time_diff);
@@ -639,24 +672,56 @@ static struct kvm_event *pop_from_result(struct rb_root 
*result)
return container_of(node, struct kvm_event, rb);
 }
 
-static void print_vcpu_info(int vcpu)
+static void print_vcpu_info(struct perf_kvm_stat *kvm)
 {
+   int vcpu = kvm->trace_vcpu;
+
pr_info("Analyze events for ");
 
+   if (kvm->live) {
+   if (kvm->opts.target.system_wide)
+   pr_info("all VMs, ");
+   else if (kvm->opts.target.pid)
+   pr_info("pid(s) %s, ", kvm->opts.target.pid);
+   else
+   pr_info("dazed and confused on what is monitored, ");
+   }
+
if (vcpu == -1)
pr_info("all VCPUs:\n\n");
else
pr_info("VCPU %d:\n\n", vcpu);
 }
 
+static void show_timeofday(void)
+{
+   char date[64];
+   struct timeval tv;
+   struct tm ltime;
+
+   gettimeofday(&tv, NULL);
+   if (localtime_r(&tv.tv_sec, &ltime)) {
+   strftime(date, sizeof(date), "%H:%M:%S", &ltime);
+   pr_info("%s.%06ld", date, tv.tv_usec);
+   } else
+  

[PATCH 0/5] perf kvm live - latest round take 4

2013-08-05 Thread David Ahern
Hi Arnaldo:

This round addresses all of Xiao's comments. It also includes a small
change in the live mode introduction to improve ordered samples
processing. For that a change in perf-session functions is needed.

David Ahern (5):
  perf session: Export queue_event function
  perf kvm: add live mode - v4
  perf kvm: add min and max stats to display - v2
  perf kvm: option to print events that exceed a threshold
  perf kvm stat report: Add option to analyze specific VM

 tools/perf/builtin-kvm.c  |  725 +++--
 tools/perf/perf.h |3 +
 tools/perf/util/session.c |   10 +-
 tools/perf/util/session.h |6 +-
 4 files changed, 708 insertions(+), 36 deletions(-)

-- 
1.7.10.1



Re: Enable arm_global_timer for Zynq brakes boot

2013-08-05 Thread Sören Brinkmann
Hi Daniel,

On Thu, Aug 01, 2013 at 07:48:04PM +0200, Daniel Lezcano wrote:
> On 08/01/2013 07:43 PM, Sören Brinkmann wrote:
> > On Thu, Aug 01, 2013 at 07:29:12PM +0200, Daniel Lezcano wrote:
> >> On 08/01/2013 01:38 AM, Sören Brinkmann wrote:
> >>> On Thu, Aug 01, 2013 at 01:01:27AM +0200, Daniel Lezcano wrote:
>  On 08/01/2013 12:18 AM, Sören Brinkmann wrote:
> > On Wed, Jul 31, 2013 at 11:08:51PM +0200, Daniel Lezcano wrote:
> >> On 07/31/2013 10:58 PM, Sören Brinkmann wrote:
> >>> On Wed, Jul 31, 2013 at 10:49:06PM +0200, Daniel Lezcano wrote:
>  On 07/31/2013 12:34 AM, Sören Brinkmann wrote:
> > On Tue, Jul 30, 2013 at 10:47:15AM +0200, Daniel Lezcano wrote:
> >> On 07/30/2013 02:03 AM, Sören Brinkmann wrote:
> >>> Hi Daniel,
> >>>
> >>> On Mon, Jul 29, 2013 at 02:51:49PM +0200, Daniel Lezcano wrote:
> >>> (snip)
> 
>  the CPUIDLE_FLAG_TIMER_STOP flag tells the cpuidle framework the local
>  timer will be stopped when entering to the idle state. In this case, the
>  cpuidle framework will call clockevents_notify(ENTER) and switches to a
>  broadcast timer and will call clockevents_notify(EXIT) when exiting the
>  idle state, switching the local timer back in use.
> >>>
> >>> I've been thinking about this, trying to understand how this makes my
> >>> boot attempts on Zynq hang. IIUC, the wrongly provided TIMER_STOP flag
> >>> would make the timer core switch to a broadcast device even though it
> >>> wouldn't be necessary. But shouldn't it still work? It sounds like we do
> >>> something useless, but nothing wrong in a sense that it should result in
> >>> breakage. I guess I'm missing something obvious. This timer system will
> >>> always remain a mystery to me.
> >>>
> >>> Actually this more or less leads to the question: What is this
> >>> 'broadcast timer'. I guess that is some clockevent device which is
> >>> common to all cores? (that would be the cadence_ttc for Zynq). Is the
> >>> hang pointing to some issue with that driver?
> >>
> >> If you look at the /proc/timer_list, which timer is used for broadcasting ?
> >
> > So, the correct run results (full output attached).
> >
> > The vanilla kernel uses the twd timers as local timers and the TTC as
> > broadcast device:
> > Tick Device: mode: 1
> >  
> > Broadcast device  
> > Clock Event Device: ttc_clockevent
> >
> > When I remove the offending CPUIDLE flag and add the DT fragment to
> > enable the global timer, the twd timers are still used as local timers
> > and the broadcast device is the global timer:
> > Tick Device: mode: 1
> >  
> > Broadcast device
> >  
> > Clock Event Device: arm_global_timer
> >
> > Again, since boot hangs in the actually broken case, I don't see way to
> > obtain this information for that case.
> 
>  Can't you use the maxcpus=1 option to ensure the system to boot up ?
> >>>
> >>> Right, that works. I forgot about that option after you mentioned, that
> >>> it is most likely not that useful.
> >>>
> >>> Anyway, this are those sysfs files with an unmodified cpuidle driver and
> >>> the gt enabled and having maxcpus=1 set.
> >>>
> >>> /proc/timer_list:
> >>>   Tick Device: mode: 1
> >>>   Broadcast device
> >>>   Clock Event Device: arm_global_timer
> >>>max_delta_ns:   12884902005
> >>>min_delta_ns:   1000
> >>>mult:   715827876
> >>>shift:  31
> >>>mode:   3
> >>
> >> Here the mode is 3 (CLOCK_EVT_MODE_ONESHOT)
> >>
> >> The previous timer_list output you gave me when removing the offending
> >> cpuidle flag, it was 1 (CLOCK_EVT_MODE_SHUTDOWN).
> >>
> >> Is it possible you try to get this output again right after onlining the
> >> cpu1 in order to check if the broadcast device switches to SHUTDOWN ?
> >
> > How do I do that? I tried to online CPU1 after booting with maxcpus=1
> > and that didn't end well:
> > # echo 1 > online && cat /proc/timer_list 
> 
>  Hmm, I was hoping to have a small delay before the kernel hangs 

RE: [PATCH v2 3/3] dma: Add Freescale eDMA engine driver support

2013-08-05 Thread Lu Jingchang-B35083
> -Original Message-
> From: Vinod Koul [mailto:vinod.k...@intel.com]
> Sent: Tuesday, August 06, 2013 12:35 AM
> To: Lu Jingchang-B35083
> Cc: d...@fb.com; shawn@linaro.org; linux-kernel@vger.kernel.org;
> linux-arm-ker...@lists.infradead.org; Wang Huan-B18965; Li Xiaochun-
> B41219
> Subject: Re: [PATCH v2 3/3] dma: Add Freescale eDMA engine driver support
> > +
> > +static void fsl_edma_free_desc(struct virt_dma_desc *vdesc) {
> > +   struct fsl_edma_desc *fsl_desc;
> > +   int i;
> > +
> > +   fsl_desc = to_fsl_edma_desc(vdesc);
> > +   for (i = 0; i < fsl_desc->n_tcds; i++)
> > +   dma_pool_free(fsl_desc->echan->tcd_pool,
> > +   fsl_desc->tcd[i].vtcd,
> > +   fsl_desc->tcd[i].ptcd);
> > +   kfree(fsl_desc);
> should this be called with lock held or not?
[Lu Jingchang-B35083] 
The list of descriptors to be freed is obtained with the lock held, and each
descriptor is freed independently, so the lock is not needed here. Thanks.






Best Regards,
Jingchang




Re: [PATCH RESEND] ARM: dts: Add USB host node for Exynos4

2013-08-05 Thread Jingoo Han
On Tuesday, August 06, 2013 3:09 AM, Dongjin Kim wrote:
> 
> This patch adds EHCI and OHCI host device nodes for Exynos4.
> 
> CC: Jingoo Han 

Acked-by: Jingoo Han 


> Signed-off-by: Dongjin Kim 
> ---
>  arch/arm/boot/dts/exynos4.dtsi |   18 ++
>  1 file changed, 18 insertions(+)




Re: [PATCH 1/1] checkpatch: fix some whitespace issues caused by --fix

2013-08-05 Thread Joe Perches
On Mon, 2013-08-05 at 14:08 +0300, Phil Carmody wrote:
> Lines with incorrect spacing around an operator, such as:
>   bystander, correct,incorrect
> would get "fixed" to
>   bystander,correct, incorrect
> as the correct argument as well as the incorrectly-spaced operator
> were both being trimmed. The correct argument only needs to be
> right trimmed.

Thanks for the patch, but I think it needs a different fix.

Even after your patch the --fix option still makes a mess
of several code spacing issues.

I'll work on it and propose something soonish.



[PATCH v2] [SCSI] sg: Fix user memory corruption when SG_IO is interrupted by a signal

2013-08-05 Thread Roland Dreier
From: Roland Dreier 

There is a nasty bug in the SCSI SG_IO ioctl that in some circumstances
leads to one process writing data into the address space of some other
random unrelated process if the ioctl is interrupted by a signal.
What happens is the following:

 - A process issues an SG_IO ioctl with direction DXFER_FROM_DEV (ie the
   underlying SCSI command will transfer data from the SCSI device to
   the buffer provided in the ioctl)

 - Before the command finishes, a signal is sent to the process waiting
   in the ioctl.  This will end up waking up the sg_ioctl() code:

result = wait_event_interruptible(sfp->read_wait,
(srp_done(sfp, srp) || sdp->detached));

   but neither srp_done() nor sdp->detached is true, so we end up just
   setting srp->orphan and returning to userspace:

srp->orphan = 1;
write_unlock_irq(&sfp->rq_list_lock);
return result;  /* -ERESTARTSYS because signal hit process */

   At this point the original process is done with the ioctl and
   blithely goes ahead handling the signal, reissuing the ioctl, etc.

 - Eventually, the SCSI command issued by the first ioctl finishes and
   ends up in sg_rq_end_io().  At the end of that function, we run through:

write_lock_irqsave(&sfp->rq_list_lock, iflags);
if (unlikely(srp->orphan)) {
	if (sfp->keep_orphan)
		srp->sg_io_owned = 0;
	else
		done = 0;
}
srp->done = done;
write_unlock_irqrestore(&sfp->rq_list_lock, iflags);

if (likely(done)) {
	/* Now wake up any sg_read() that is waiting for this
	 * packet.
	 */
	wake_up_interruptible(&sfp->read_wait);
	kill_fasync(&sfp->async_qp, SIGPOLL, POLL_IN);
	kref_put(&sfp->f_ref, sg_remove_sfp);
} else {
	INIT_WORK(&srp->ew.work, sg_rq_end_io_usercontext);
	schedule_work(&srp->ew.work);
}

   Since srp->orphan *is* set, we set done to 0 (assuming the
   userspace app has not set keep_orphan via an SG_SET_KEEP_ORPHAN
   ioctl), and therefore we end up scheduling sg_rq_end_io_usercontext()
   to run in a workqueue.

 - In workqueue context we go through sg_rq_end_io_usercontext() ->
   sg_finish_rem_req() -> blk_rq_unmap_user() -> ... ->
   bio_uncopy_user() -> __bio_copy_iov() -> copy_to_user().

   The key point here is that we are doing copy_to_user() on a
   workqueue -- that is, we're on a kernel thread with current->mm
   equal to whatever random previous user process was scheduled before
   this kernel thread.  So we end up copying whatever data the SCSI
   command returned to the virtual address of the buffer passed into
   the original ioctl, but it's quite likely we do this copying into a
   different address space!

As suggested by James Bottomley,
add a check for current->mm (which is NULL if we're on a kernel thread
without a real userspace address space) in bio_uncopy_user(), and skip
the copy if we're on a kernel thread.

There's no reason that I can think of for any caller of bio_uncopy_user()
to want to do copying on a kernel thread with a random active userspace
address space.
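
The guard described above can be modelled in plain userspace C. This is only a
sketch under stated assumptions: struct mm_model, struct task_model and
uncopy_model() are hypothetical stand-ins invented for illustration, not kernel
APIs, and the real change lives in bio_uncopy_user() as shown in the diff
below. A NULL mm pointer models a kernel thread that has no userspace address
space of its own.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for kernel types; none of these are real kernel APIs. */
struct mm_model {
	int id;
};

struct task_model {
	struct mm_model *mm;	/* NULL models a kernel thread */
};

/*
 * Models the decision described above: copy the bounce data back to the
 * user buffer only when the calling context owns a userspace address
 * space.  Returns the number of bytes copied.
 */
static size_t uncopy_model(const struct task_model *cur,
			   char *user_buf, const char *bounce, size_t len)
{
	assert(cur != NULL);
	if (cur->mm == NULL)
		return 0;	/* orphaned request: skip the copy */
	memcpy(user_buf, bounce, len);
	return len;
}
```

In this model a call made from the kernel-thread context leaves the user
buffer untouched, which is the behaviour wanted for an orphaned request.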

Huge thanks to Costa Sapuntzakis for the
original pointer to this bug in the sg code.

Signed-off-by: Roland Dreier 
Cc: 
---
 fs/bio.c | 20 +++-
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/fs/bio.c b/fs/bio.c
index 94bbc04..c5eae72 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -1045,12 +1045,22 @@ static int __bio_copy_iov(struct bio *bio, struct bio_vec *iovecs,
 int bio_uncopy_user(struct bio *bio)
 {
struct bio_map_data *bmd = bio->bi_private;
-   int ret = 0;
+   struct bio_vec *bvec;
+   int ret = 0, i;
 
-   if (!bio_flagged(bio, BIO_NULL_MAPPED))
-   ret = __bio_copy_iov(bio, bmd->iovecs, bmd->sgvecs,
-bmd->nr_sgvecs, bio_data_dir(bio) == READ,
-0, bmd->is_our_pages);
+   if (!bio_flagged(bio, BIO_NULL_MAPPED)) {
+   /*
+* if we're in a workqueue, the request is orphaned, so
+* don't copy into a random user address space, just free.
+*/
+   if (current->mm)
+   ret = __bio_copy_iov(bio, bmd->iovecs, bmd->sgvecs,
+bmd->nr_sgvecs, bio_data_dir(bio) == READ,
+0, bmd->is_our_pages);
+   else if (bmd->is_our_pages)
+   bio_for_each_segment_all(bvec, bio, i)
+   __free_page(bvec->bv_page);
+   }
bio_free_map_data(bmd);
bio_put(bio);
return ret;
-- 
1.8.3.2


Re: [PATCH] trivial: adjust code alignment

2013-08-05 Thread Joe Perches
On Mon, 2013-08-05 at 19:30 +0300, Dan Carpenter wrote:
> On Mon, Aug 05, 2013 at 09:17:07AM -0700, Joe Perches wrote:
> > ov7670_read via i2c_transfer can return a positive # too.
> > Perhaps all of these should be individually tested for "< 0".
> You're misreading something.  ov7670_read_i2c() only returns zero
> and negative error codes.

Yup, right, thanks, I skimmed over the

if (ret >= 0) {
...
ret = 0;
}

bit in ov7670_read_i2c

though I think this function via the i2c_transfer
can return 0 messages transferred as "success".




Re: [PATCH] bridge: don't try to update timers in case of broken MLD queries

2013-08-05 Thread Linus Lüssing
On Mon, Aug 05, 2013 at 03:42:22PM -0700, Stephen Hemminger wrote:
> On Tue,  6 Aug 2013 00:32:05 +0200
> Linus Lüssing  wrote:
> 
> > Currently we are reading an uninitialized value for the max_delay
> > variable when snooping an MLD query message of invalid length and would
> > update our timers with that.
> > 
> > Fixing this by simply ignoring such broken MLD queries (just like we do
> > for IGMP already).
> > 
> > This is a regression introduced by:
> > "bridge: disable snooping if there is no querier" (b00589af3b04)
> > 
> > Reported-by: Paul Bolle 
> > Signed-off-by: Linus Lüssing 
> > ---
> >  net/bridge/br_multicast.c |2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
> > index 61c5e81..08e576a 100644
> > --- a/net/bridge/br_multicast.c
> > +++ b/net/bridge/br_multicast.c
> > @@ -1195,7 +1195,7 @@ static int br_ip6_multicast_query(struct net_bridge *br,
> > max_delay = msecs_to_jiffies(ntohs(mld->mld_maxdelay));
> > if (max_delay)
> > group = &mld->mld_mca;
> > -   } else if (skb->len >= sizeof(*mld2q)) {
> > +   } else {
> > if (!pskb_may_pull(skb, sizeof(*mld2q))) {
> > err = -EINVAL;
> > goto out;
> 
> Why not use else if here, other than that looks great.

Because it isn't really necessary: the length check is already covered
by pskb_may_pull(), just as it is in the corresponding IGMP
code path.

And I thought it'd be nicer to handle it the same way as in the
IGMP code path to avoid diverging too much.

Cheers, Linus


[PATCH 5/5] ACPI: Clean up error code path in acpi_unbind_one()

2013-08-05 Thread Rafael J. Wysocki
From: Rafael J. Wysocki 

The error code path in acpi_unbind_one() is unnecessarily complicated
(in particular, the err label is not really necessary) and the error
message printed by it is inaccurate (there's nothing called
'acpi_handle' in that function), so clean up those things.

Signed-off-by: Rafael J. Wysocki 
---
 drivers/acpi/glue.c |   11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)

Index: linux-pm/drivers/acpi/glue.c
===
--- linux-pm.orig/drivers/acpi/glue.c
+++ linux-pm/drivers/acpi/glue.c
@@ -219,8 +219,10 @@ int acpi_unbind_one(struct device *dev)
return 0;
 
status = acpi_bus_get_device(ACPI_HANDLE(dev), &acpi_dev);
-   if (ACPI_FAILURE(status))
-   goto err;
+   if (ACPI_FAILURE(status)) {
+   dev_err(dev, "Oops, ACPI handle corrupt in %s()\n", __func__);
+   return -EINVAL;
+   }
 
mutex_lock(&acpi_dev->physical_node_lock);
 
@@ -242,12 +244,7 @@ int acpi_unbind_one(struct device *dev)
}
 
mutex_unlock(&acpi_dev->physical_node_lock);
-
return 0;
-
-err:
-   dev_err(dev, "Oops, 'acpi_handle' corrupt\n");
-   return -EINVAL;
 }
 EXPORT_SYMBOL_GPL(acpi_unbind_one);
 



[PATCH 3/5] ACPI: acpi_bind_one()/acpi_unbind_one() whitespace cleanups

2013-08-05 Thread Rafael J. Wysocki
From: Rafael J. Wysocki 

Clean up some inconsistent use of whitespace in acpi_bind_one() and
acpi_unbind_one().

Signed-off-by: Rafael J. Wysocki 
---
 drivers/acpi/glue.c |7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

Index: linux-pm/drivers/acpi/glue.c
===
--- linux-pm.orig/drivers/acpi/glue.c
+++ linux-pm/drivers/acpi/glue.c
@@ -182,9 +182,9 @@ int acpi_bind_one(struct device *dev, ac
 
acpi_bind_physnode_name(physical_node_name, node_id);
retval = sysfs_create_link(&acpi_dev->dev.kobj, &dev->kobj,
-   physical_node_name);
+  physical_node_name);
retval = sysfs_create_link(&dev->kobj, &acpi_dev->dev.kobj,
-   "firmware_node");
+  "firmware_node");
 
mutex_unlock(&acpi_dev->physical_node_lock);
 
@@ -228,12 +228,11 @@ int acpi_unbind_one(struct device *dev)
char physical_node_name[PHYSICAL_NODE_NAME_SIZE];
 
entry = list_entry(node, struct acpi_device_physical_node,
-   node);
+  node);
if (entry->dev != dev)
continue;
 
list_del(node);
-
acpi_dev->physical_node_count--;
 
acpi_bind_physnode_name(physical_node_name, entry->node_id);



[PATCH 2/5] ACPI: Create symlinks in acpi_bind_one() under physical_node_lock

2013-08-05 Thread Rafael J. Wysocki
From: Rafael J. Wysocki 

Put the creation of symlinks in acpi_bind_one() under the
physical_node_lock mutex of the given ACPI device objects, because
that is part of the binding operation logically (those links are
already removed under that mutex too).

Signed-off-by: Rafael J. Wysocki 
---
 drivers/acpi/glue.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-pm/drivers/acpi/glue.c
===
--- linux-pm.orig/drivers/acpi/glue.c
+++ linux-pm/drivers/acpi/glue.c
@@ -177,8 +177,6 @@ int acpi_bind_one(struct device *dev, ac
list_add(&physical_node->node, physnode_list);
acpi_dev->physical_node_count++;
 
-   mutex_unlock(&acpi_dev->physical_node_lock);
-
if (!ACPI_HANDLE(dev))
ACPI_HANDLE_SET(dev, acpi_dev->handle);
 
@@ -188,6 +186,8 @@ int acpi_bind_one(struct device *dev, ac
retval = sysfs_create_link(&dev->kobj, &acpi_dev->dev.kobj,
"firmware_node");
 
+   mutex_unlock(&acpi_dev->physical_node_lock);
+
if (acpi_dev->wakeup.flags.valid)
device_set_wakeup_capable(dev, true);
 



[PATCH 0/5] ACPI: acpi_bind_one()/acpi_unbind_one() cleanups

2013-08-05 Thread Rafael J. Wysocki
Hi All,

The following 5 patches clean up a little mess in acpi_bind_one() and
acpi_unbind_one().  They are on top of current linux-next plus the patch
at https://patchwork.kernel.org/patch/2839101/ .

[1/5] Move duplicated code from acpi_bind_one()/acpi_unbind_one() to a separate
  function.
[2/5] Create symlinks in acpi_bind_one() under physical_node_lock.
[3/5] Clean up inconsistent use of whitespace in
  acpi_bind_one()/acpi_unbind_one().
[4/5] Use list_for_each_entry() to walk the list in acpi_unbind_one().
[5/5] Clean up the error code path in acpi_unbind_one().

Thanks,
Rafael



[PATCH 4/5] ACPI: Use list_for_each_entry() in acpi_unbind_one()

2013-08-05 Thread Rafael J. Wysocki
From: Rafael J. Wysocki 

Since acpi_unbind_one() walks physical_node_list under the ACPI
device object's physical_node_lock mutex and the walk may be
terminated as soon as the matching entry has been found, it is
not necessary to use list_for_each_safe() for that walk, so use
list_for_each_entry() instead and make the code slightly more
straightforward.
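
The reasoning above can be illustrated with a small userspace model. This is
a sketch only: struct node_model and the *_model() helpers are hypothetical
names made up for the example, and the list is hand-rolled rather than the
kernel's struct list_head. The point it shows is that a plain (non-"safe")
walk may free the current entry as long as it leaves the loop right away,
because the loop step never gets to read the freed entry's next pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for an entry on physical_node_list. */
struct node_model {
	struct node_model *prev, *next;
	int dev_id;		/* stands in for entry->dev */
};

/* Circular list with a sentinel head, in the style of struct list_head. */
static void list_init_model(struct node_model *head)
{
	head->prev = head->next = head;
}

static void list_add_model(struct node_model *head, int dev_id)
{
	struct node_model *n = malloc(sizeof(*n));

	assert(n != NULL);
	n->dev_id = dev_id;
	n->prev = head;
	n->next = head->next;
	head->next->prev = n;
	head->next = n;
}

/* Unlink and free the first entry matching dev_id; returns 1 if found. */
static int unbind_model(struct node_model *head, int dev_id)
{
	struct node_model *e;
	int found = 0;

	for (e = head->next; e != head; e = e->next) {
		if (e->dev_id != dev_id)
			continue;
		e->prev->next = e->next;
		e->next->prev = e->prev;
		free(e);
		found = 1;
		break;	/* leave before the loop step would read e->next */
	}
	return found;
}

static size_t list_len_model(const struct node_model *head)
{
	const struct node_model *e;
	size_t n = 0;

	for (e = head->next; e != head; e = e->next)
		n++;
	return n;
}
```

Without the break, the loop step would dereference the entry that was just
freed, which is the case the "safe" iterator exists for.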

Signed-off-by: Rafael J. Wysocki 
---
 drivers/acpi/glue.c |   36 +---
 1 file changed, 17 insertions(+), 19 deletions(-)

Index: linux-pm/drivers/acpi/glue.c
===
--- linux-pm.orig/drivers/acpi/glue.c
+++ linux-pm/drivers/acpi/glue.c
@@ -214,7 +214,6 @@ int acpi_unbind_one(struct device *dev)
struct acpi_device_physical_node *entry;
struct acpi_device *acpi_dev;
acpi_status status;
-   struct list_head *node, *next;
 
if (!ACPI_HANDLE(dev))
return 0;
@@ -224,25 +223,24 @@ int acpi_unbind_one(struct device *dev)
goto err;
 
mutex_lock(&acpi_dev->physical_node_lock);
-   list_for_each_safe(node, next, &acpi_dev->physical_node_list) {
-   char physical_node_name[PHYSICAL_NODE_NAME_SIZE];
 
-   entry = list_entry(node, struct acpi_device_physical_node,
-  node);
-   if (entry->dev != dev)
-   continue;
-
-   list_del(node);
-   acpi_dev->physical_node_count--;
-
-   acpi_bind_physnode_name(physical_node_name, entry->node_id);
-   sysfs_remove_link(&acpi_dev->dev.kobj, physical_node_name);
-   sysfs_remove_link(&dev->kobj, "firmware_node");
-   ACPI_HANDLE_SET(dev, NULL);
-   /* acpi_bind_one increase refcnt by one */
-   put_device(dev);
-   kfree(entry);
-   }
+   list_for_each_entry(entry, &acpi_dev->physical_node_list, node)
+   if (entry->dev == dev) {
+   char physnode_name[PHYSICAL_NODE_NAME_SIZE];
+
+   list_del(&entry->node);
+   acpi_dev->physical_node_count--;
+
+   acpi_bind_physnode_name(physnode_name, entry->node_id);
+   sysfs_remove_link(&acpi_dev->dev.kobj, physnode_name);
+   sysfs_remove_link(&dev->kobj, "firmware_node");
+   ACPI_HANDLE_SET(dev, NULL);
+   /* acpi_bind_one() increase refcnt by one. */
+   put_device(dev);
+   kfree(entry);
+   break;
+   }
+
mutex_unlock(&acpi_dev->physical_node_lock);
 
return 0;



[PATCH 1/5] ACPI: Reduce acpi_bind_one()/acpi_unbind_one() code duplication

2013-08-05 Thread Rafael J. Wysocki
From: Rafael J. Wysocki 

Move some duplicated code from acpi_bind_one() and acpi_unbind_one()
into a separate function and make that function use snprintf()
instead of sprintf() for extra safety.
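
For readers who want to try the helper outside the kernel, here is a
userspace sketch of the same idea. It mirrors acpi_bind_physnode_name() from
the diff below, but the function name bind_physnode_name_model() is made up,
and the two macro definitions are assumptions for the example (the name
string matches the "physical_node%d" format in the removed code; the buffer
size is assumed to leave room for a 10-digit unsigned int).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed values for illustration; the kernel defines its own. */
#define PHYSICAL_NODE_STRING "physical_node"
#define PHYSICAL_NODE_NAME_SIZE (sizeof(PHYSICAL_NODE_STRING) + 10)

/*
 * Build the sysfs link name for a physical node: the bare string for
 * node 0, the string followed by the decimal id otherwise.  buf must
 * hold at least PHYSICAL_NODE_NAME_SIZE bytes.
 */
static void bind_physnode_name_model(char *buf, unsigned int node_id)
{
	assert(buf != NULL);
	if (node_id > 0)
		snprintf(buf, PHYSICAL_NODE_NAME_SIZE,
			 PHYSICAL_NODE_STRING "%u", node_id);
	else
		strcpy(buf, PHYSICAL_NODE_STRING);
}
```

With snprintf() the write is bounded by the buffer size even if the format or
the id type changes later, which is the extra safety the changelog refers to.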

Signed-off-by: Rafael J. Wysocki 
---
 drivers/acpi/glue.c |   22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

Index: linux-pm/drivers/acpi/glue.c
===
--- linux-pm.orig/drivers/acpi/glue.c
+++ linux-pm/drivers/acpi/glue.c
@@ -108,6 +108,15 @@ acpi_handle acpi_get_child(acpi_handle p
 }
 EXPORT_SYMBOL(acpi_get_child);
 
+static void acpi_bind_physnode_name(char *buf, unsigned int node_id)
+{
+   if (node_id > 0)
+   snprintf(buf, PHYSICAL_NODE_NAME_SIZE,
+PHYSICAL_NODE_STRING "%u", node_id);
+   else
+   strcpy(buf, PHYSICAL_NODE_STRING);
+}
+
 int acpi_bind_one(struct device *dev, acpi_handle handle)
 {
struct acpi_device *acpi_dev;
@@ -173,11 +182,7 @@ int acpi_bind_one(struct device *dev, ac
if (!ACPI_HANDLE(dev))
ACPI_HANDLE_SET(dev, acpi_dev->handle);
 
-   if (!physical_node->node_id)
-   strcpy(physical_node_name, PHYSICAL_NODE_STRING);
-   else
-   sprintf(physical_node_name,
-   "physical_node%d", physical_node->node_id);
+   acpi_bind_physnode_name(physical_node_name, node_id);
retval = sysfs_create_link(&acpi_dev->dev.kobj, &dev->kobj,
physical_node_name);
retval = sysfs_create_link(&dev->kobj, &acpi_dev->dev.kobj,
@@ -231,12 +236,7 @@ int acpi_unbind_one(struct device *dev)
 
acpi_dev->physical_node_count--;
 
-   if (!entry->node_id)
-   strcpy(physical_node_name, PHYSICAL_NODE_STRING);
-   else
-   sprintf(physical_node_name,
-   "physical_node%d", entry->node_id);
-
+   acpi_bind_physnode_name(physical_node_name, entry->node_id);
sysfs_remove_link(&acpi_dev->dev.kobj, physical_node_name);
sysfs_remove_link(&dev->kobj, "firmware_node");
ACPI_HANDLE_SET(dev, NULL);



Re: [PATCH] [SCSI] sg: Fix user memory corruption when SG_IO is interrupted by a signal

2013-08-05 Thread Douglas Gilbert

Roland,
When this sg code was originally designed, there wasn't a bio
in sight :-)

Now I'm trying to get my head around this. We have launched
a "data-in" SCSI command like READ(10) and the DMA is underway
so we are waiting for a "done" indication. Instead we receive
a signal interruption. It is not clear to me why that DMA
would not just keep going unless we can get to something that
can stop or redirect the DMA. That something is more likely to
be the low level driver being used rather than the block layer.
In the original design to cope with this the destination pages
were locked in memory until the DMA completed.

So originally the design was to allow for this case at the top
of the waterfall. Now it seems there is bio magic going on
half way down the waterfall in the case of a signal interruption.


BTW, the keep_orphan logic probably only works for the
asynchronous sg interface (i.e. write sg_io_hdr then read response)
rather than the synchronous SG_IO ioctl. To support keep_orphan
the user would need to do a read() on the sg device after the
SG_IO ioctl was interrupted.


Anyway, this obviously needs to be fixed.

Doug Gilbert

On 13-08-05 06:02 PM, Roland Dreier wrote:

From: Roland Dreier 

There is a nasty bug in the SCSI SG_IO ioctl that in some circumstances
leads to one process writing data into the address space of some other
random unrelated process if the ioctl is interrupted by a signal.
What happens is the following:

  - A process issues an SG_IO ioctl with direction DXFER_FROM_DEV (ie the
underlying SCSI command will transfer data from the SCSI device to
the buffer provided in the ioctl)

  - Before the command finishes, a signal is sent to the process waiting
in the ioctl.  This will end up waking up the sg_ioctl() code:

result = wait_event_interruptible(sfp->read_wait,
(srp_done(sfp, srp) || sdp->detached));

but neither srp_done() nor sdp->detached is true, so we end up just
setting srp->orphan and returning to userspace:

srp->orphan = 1;
write_unlock_irq(&sfp->rq_list_lock);
return result;  /* -ERESTARTSYS because signal hit process */

At this point the original process is done with the ioctl and
blithely goes ahead handling the signal, reissuing the ioctl, etc.

  - Eventually, the SCSI command issued by the first ioctl finishes and
ends up in sg_rq_end_io().  At the end of that function, we run through:

write_lock_irqsave(&sfp->rq_list_lock, iflags);
if (unlikely(srp->orphan)) {
        if (sfp->keep_orphan)
                srp->sg_io_owned = 0;
        else
                done = 0;
}
srp->done = done;
write_unlock_irqrestore(&sfp->rq_list_lock, iflags);

if (likely(done)) {
        /* Now wake up any sg_read() that is waiting for this
         * packet.
         */
        wake_up_interruptible(&sfp->read_wait);
        kill_fasync(&sfp->async_qp, SIGPOLL, POLL_IN);
        kref_put(&sfp->f_ref, sg_remove_sfp);
} else {
        INIT_WORK(&srp->ew.work, sg_rq_end_io_usercontext);
        schedule_work(&srp->ew.work);
}

Since srp->orphan *is* set, we set done to 0 (assuming the
userspace app has not set keep_orphan via an SG_SET_KEEP_ORPHAN
ioctl), and therefore we end up scheduling sg_rq_end_io_usercontext()
to run in a workqueue.

  - In workqueue context we go through sg_rq_end_io_usercontext() ->
sg_finish_rem_req() -> blk_rq_unmap_user() -> ... ->
bio_uncopy_user() -> __bio_copy_iov() -> copy_to_user().

The key point here is that we are doing copy_to_user() on a
workqueue -- that is, we're on a kernel thread with current->mm
equal to whatever random previous user process was scheduled before
this kernel thread.  So we end up copying whatever data the SCSI
command returned to the virtual address of the buffer passed into
the original ioctl, but it's quite likely we do this copying into a
different address space!
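
The completion-path decision described in the steps above can be
modelled in userspace.  This is only a sketch: struct req_model, enum
completion_path and model_rq_end_io() are hypothetical names invented
for illustration, not part of sg.c, and the model keeps just the flags
that the quoted sg_rq_end_io() fragment consults.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of the flags sg_rq_end_io() looks at. */
struct req_model {
	bool orphan;      /* set when SG_IO was interrupted by a signal */
	bool keep_orphan; /* set by the application via SG_SET_KEEP_ORPHAN */
	bool sg_io_owned; /* request still belongs to a waiting SG_IO call */
	bool done;        /* result is ready for a reader to pick up */
};

enum completion_path {
	WAKE_READER,        /* models wake_up_interruptible() on read_wait */
	DEFER_TO_WORKQUEUE  /* models schedule_work() of the cleanup handler */
};

/* Mirrors the branch structure of the quoted fragment, nothing more. */
static enum completion_path model_rq_end_io(struct req_model *r)
{
	bool done = true;

	assert(r != NULL);
	if (r->orphan) {
		if (r->keep_orphan)
			r->sg_io_owned = false; /* hand over to a later read() */
		else
			done = false;           /* nobody will collect the result */
	}
	r->done = done;
	return done ? WAKE_READER : DEFER_TO_WORKQUEUE;
}
```

Only the orphan-without-keep_orphan case reaches the workqueue, which
is the case where the later copy runs outside the issuing process.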

Fix this by telling sg_finish_rem_req() whether we're on a workqueue
or not, and if we are, calling a new function blk_rq_unmap_user_nocopy()
that does everything the original blk_rq_unmap_user() does except
calling copy_{to,from}_user().  This requires a few levels of plumbing
through a "copy" flag in the bio layer.

I also considered fixing this by having the sg code just set
BIO_NULL_MAPPED for bios that are unmapped from a workqueue, which
happens to work because the __free_page() part of __bio_copy_iov()
isn't needed for sg (because sg handles its own pages).  However, this
seems coincidental and fragile, so I preferred making the fix
explicit, at the cost of minor tweaks to the bio code.

Huge thanks to Costa Sapuntzakis  for the
original pointer to this bug in the sg code.

Signed-off-by: Roland Dreier 
Cc: 
---
  block/blk-map.c| 15 ---
  

Re: BUG cxgb3: Check and handle the dma mapping errors

2013-08-05 Thread Alexey Kardashevskiy
On 08/06/2013 04:41 AM, Jay Fenlason wrote:
> On Mon, Aug 05, 2013 at 12:59:04PM +1000, Alexey Kardashevskiy wrote:
>> Hi!
>>
>> Recently I started getting multiple errors like this:
>>
>> cxgb3 0006:01:00.0: iommu_alloc failed, tbl c3067980 vaddr
>> c01fbdaaa882 npages 1
>> cxgb3 0006:01:00.0: iommu_alloc failed, tbl c3067980 vaddr
>> c01fbdaaa882 npages 1
>> cxgb3 0006:01:00.0: iommu_alloc failed, tbl c3067980 vaddr
>> c01fbdaaa882 npages 1
>> cxgb3 0006:01:00.0: iommu_alloc failed, tbl c3067980 vaddr
>> c01fbdaaa882 npages 1
>> cxgb3 0006:01:00.0: iommu_alloc failed, tbl c3067980 vaddr
>> c01fbdaaa882 npages 1
>> cxgb3 0006:01:00.0: iommu_alloc failed, tbl c3067980 vaddr
>> c01fbdaaa882 npages 1
>> cxgb3 0006:01:00.0: iommu_alloc failed, tbl c3067980 vaddr
>> c01fbdaaa882 npages 1
>> ... and so on
>>
>> This is all happening on a PPC64 "powernv" platform machine. To trigger the
>> error state, it is enough to _flood_ ping CXGB3 card from another machine
>> (which has Emulex 10Gb NIC + Cisco switch). Just do "ping -f 172.20.1.2"
>> and wait 10-15 seconds.
>>
>>
>> The messages are coming from arch/powerpc/kernel/iommu.c and basically
>> mean that the driver requested more pages than the DMA window has which is
>> normally 1GB (there could be another possible source of errors -
>> ppc_md.tce_build callback - but on powernv platform it always succeeds).
>>
>>
>> The patch after which it broke is:
>> commit f83331bab149e29fa2c49cf102c0cd8c3f1ce9f9
>> Author: Santosh Rastapur 
>> Date:   Tue May 21 04:21:29 2013 +
>> cxgb3: Check and handle the dma mapping errors
>>
>> Any quick ideas? Thanks!
> 
> That patch adds error checking to detect failed dma mapping requests.
> Before it, the code always assumed that dma mapping requests succeeded,
> whether they actually do or not, so the fact that the older kernel
> does not log errors only means that the failures are being ignored,
> and any appearance of working is through pure luck.  The machine could
> have just crashed at that point.

From what I see, the patch adds a map_skb() function which is called in two
new places, so the patch does not just mechanically replace
skb_frag_dma_map() with map_skb() or something like that.

> What is the observed behavior of the system by the machine initiating
> the ping flood?  Do the older and newer kernels differ in the
> percentage of pings that do not receive replies? 

The other machine stops receiving replies. It is using a different adapter,
not Chelsio, and the kernel version does not really matter.

> On the newer kernel,
> when the mapping errors are detected, the packet that it is trying to
> transmit is dropped, but I'm not at all sure what happens on the older
> kernel after the dma mapping fails.  As I mentioned earlier, I'm
> surprised it does not crash.  Perhaps the folks from Chelsio have a
> better idea what happens after a dma mapping error is ignored?

No kernel can avoid the platform's iommu_alloc() on ppc64/powernv, so if
there had been a problem, we would have seen messages (and yes, the kernel
would have crashed).



-- 
Alexey


Re: [PATCH 6/7] Add EFI stub for ARM

2013-08-05 Thread Roy Franz
On Mon, Aug 5, 2013 at 8:33 AM, Leif Lindholm  wrote:
> On Mon, Aug 05, 2013 at 03:11:49PM +0100, Dave Martin wrote:
>> > diff --git a/arch/arm/boot/compressed/head.S 
>> > b/arch/arm/boot/compressed/head.S
>> > index 75189f1..4c70b9e 100644
>> > --- a/arch/arm/boot/compressed/head.S
>> > +++ b/arch/arm/boot/compressed/head.S
>> > @@ -122,19 +122,106 @@
>> > .arm@ Always enter in ARM state
>> >  start:
>> > .type   start,#function
>> > -   .rept   7
>> > +#ifdef CONFIG_EFI_STUB
>> > +   @ Magic MSDOS signature for PE/COFF + ADD opcode
>> > +   .word   0x62805a4d
>>
>> What about BE32?
>
> The ARM bindings for UEFI specify that the processor must be in
> little-endian mode.
>
>> In that case, the instruction is a coprocessor load, that loads from a
>> random address to a coprocessor that almost certainly doesn't exist.
>> This will probably fault.
>>
>> Since BE32 is only for older platforms (> solvable, it might be sensible to make the EFI stub support depend on
>> !CPU_ENDIAN_BE32.
>
> Well, it would make more sense to make EFI_STUB depend on EFI and
> EFI depend on !CPU_ENDIAN_BE32. Which is something I can add to
> my next set of general ARM UEFI patches. Thanks.
> /
> Leif

I had EFI_STUB depend on EFI at one point during my development, but took
it out because there was no actual dependency (the stub will work fine without
other EFI features).  The features will most likely be used together, but I
wasn't sure if we would want to enforce this with a config dependency.  I don't
care one way or the other, I'd just like the dependencies to be correct and
follow best practices.
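
For reference, the endianness point about the magic word in the quoted
hunk can be checked with a small userspace sketch.  The helper names
store_le32() and swap32() are made up for illustration; only the
constant comes from the patch.  swap32() gives the value the same four
bytes would have if read as a big-endian word, which is presumably the
BE32 case Dave describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EFI_STUB_MAGIC 0x62805a4dU /* value of the quoted .word directive */

/* Store a 32-bit word into out[] in little-endian byte order. */
static void store_le32(uint32_t word, uint8_t out[4])
{
	assert(out != NULL);
	out[0] = (uint8_t)(word & 0xff);
	out[1] = (uint8_t)((word >> 8) & 0xff);
	out[2] = (uint8_t)((word >> 16) & 0xff);
	out[3] = (uint8_t)((word >> 24) & 0xff);
}

/* Reverse the byte order of a 32-bit word. */
static uint32_t swap32(uint32_t word)
{
	return ((word & 0x000000ffU) << 24) |
	       ((word & 0x0000ff00U) << 8) |
	       ((word & 0x00ff0000U) >> 8) |
	       ((word & 0xff000000U) >> 24);
}
```

In little-endian memory the first two bytes are 'M' and 'Z', the MSDOS
signature that PE/COFF loaders look for; byte-reversed, the word is
0x4d5a8062, which no longer decodes as the intended ADD.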

Thanks,
Roy


Re: Cannot hot remove a memory device (patch, updated)

2013-08-05 Thread Rafael J. Wysocki
On Monday, August 05, 2013 05:19:56 PM Toshi Kani wrote:
> On Mon, 2013-08-05 at 15:14 +0200, Rafael J. Wysocki wrote:
>   :
> > Can you please test the appended patch?  I tested it somewhat, but since the
> > greatest number of physical nodes per ACPI device object I can get on my
> > test machines is 2 (and even that after hacking the kernel somewhat), that
> > was kind of inconclusive.
> > 
> > Thanks,
> > Rafael
> > 
> > 
> > ---
> > From: Rafael J. Wysocki 
> > Subject: ACPI: Drop physical_node_id_bitmap from struct acpi_device
> > 
> > The physical_node_id_bitmap in struct acpi_device is only used for
> > looking up the first currently unused physical dependent node ID
> > by acpi_bind_one().  It is not really necessary, however, because
> > acpi_bind_one() walks the entire physical_node_list of the given
> > device object for sanity checking anyway and if that list is always
> > sorted by node_id, it is straightforward to find the first gap
> > between the currently used node IDs and use that number as the ID
> > of the new list node.
> > 
> > This also removes the artificial limit of the maximum number of
> > dependent physical devices per ACPI device object, which now depends
> > only on the capacity of unsigned int.
> > 
> > Signed-off-by: Rafael J. Wysocki 
> 
> I like the change. Much better :-)
> 
> Acked-by: Toshi Kani 

However, it introduces a bug in acpi_unbind_one(), because the size of the name
array in there has to be increased too.  Updated patch follows.

Thanks,
Rafael


---
From: Rafael J. Wysocki 
Subject: ACPI: Drop physical_node_id_bitmap from struct acpi_device

The physical_node_id_bitmap in struct acpi_device is only used for
looking up the first currently unused dependent physical node ID
by acpi_bind_one().  It is not really necessary, however, because
acpi_bind_one() walks the entire physical_node_list of the given
device object for sanity checking anyway and if that list is always
sorted by node_id, it is straightforward to find the first gap
between the currently used node IDs and use that number as the ID
of the new list node.

This also removes the artificial limit of the maximum number of
dependent physical devices per ACPI device object, which now depends
only on the capacity of unsigned int.

Signed-off-by: Rafael J. Wysocki 
---
 drivers/acpi/glue.c |   34 +++---
 include/acpi/acpi_bus.h |8 ++--
 2 files changed, 21 insertions(+), 21 deletions(-)

Index: linux-pm/drivers/acpi/glue.c
===
--- linux-pm.orig/drivers/acpi/glue.c
+++ linux-pm/drivers/acpi/glue.c
@@ -31,6 +31,7 @@ static LIST_HEAD(bus_type_list);
 static DECLARE_RWSEM(bus_type_sem);
 
 #define PHYSICAL_NODE_STRING "physical_node"
+#define PHYSICAL_NODE_NAME_SIZE (sizeof(PHYSICAL_NODE_STRING) + 10)
 
 int register_acpi_bus_type(struct acpi_bus_type *type)
 {
@@ -112,7 +113,9 @@ int acpi_bind_one(struct device *dev, ac
struct acpi_device *acpi_dev;
acpi_status status;
struct acpi_device_physical_node *physical_node, *pn;
-   char physical_node_name[sizeof(PHYSICAL_NODE_STRING) + 2];
+   char physical_node_name[PHYSICAL_NODE_NAME_SIZE];
+   struct list_head *physnode_list;
+   unsigned int node_id;
int retval = -EINVAL;
 
if (ACPI_HANDLE(dev)) {
@@ -139,8 +142,14 @@ int acpi_bind_one(struct device *dev, ac
 
mutex_lock(&acpi_dev->physical_node_lock);
 
-   /* Sanity check. */
-   list_for_each_entry(pn, &acpi_dev->physical_node_list, node)
+   /*
+* Keep the list sorted by node_id so that the IDs of removed nodes can
+* be recycled.
+*/
+   physnode_list = &acpi_dev->physical_node_list;
+   node_id = 0;
+   list_for_each_entry(pn, &acpi_dev->physical_node_list, node) {
+   /* Sanity check. */
if (pn->dev == dev) {
dev_warn(dev, "Already associated with ACPI node\n");
if (ACPI_HANDLE(dev) == handle)
@@ -148,19 +157,15 @@ int acpi_bind_one(struct device *dev, ac
 
goto out_free;
}
-
-   /* allocate physical node id according to physical_node_id_bitmap */
-   physical_node->node_id =
-   find_first_zero_bit(acpi_dev->physical_node_id_bitmap,
-   ACPI_MAX_PHYSICAL_NODE);
-   if (physical_node->node_id >= ACPI_MAX_PHYSICAL_NODE) {
-   retval = -ENOSPC;
-   goto out_free;
+   if (pn->node_id == node_id) {
+   physnode_list = &pn->node;
+   node_id++;
+   }
}
 
-   set_bit(physical_node->node_id, acpi_dev->physical_node_id_bitmap);
+   physical_node->node_id = node_id;
physical_node->dev = dev;
-   list_add_tail(&physical_node->node, &acpi_dev->physical_node_list);
+   list_add(&physical_node->node, physnode_list);
acpi_dev->physical_node_count++;
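
The gap search over a list sorted by node_id, and the sizing of the
name buffer, can be sketched in userspace as follows.  The sketch uses
a sorted array in place of the kernel list; first_free_node_id() and
format_node_name() are hypothetical helpers, and the naming scheme
(plain base string for ID 0, base string plus decimal ID otherwise) is
an assumption made for the example.  Only the two macros are taken
from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define PHYSICAL_NODE_STRING "physical_node"
/* sizeof() counts the trailing NUL; 10 extra bytes fit a 32-bit decimal ID. */
#define PHYSICAL_NODE_NAME_SIZE (sizeof(PHYSICAL_NODE_STRING) + 10)

/*
 * ids[] must be sorted ascending without duplicates.  Returns the lowest
 * unused ID and stores in *insert_at the index at which a node with that
 * ID has to be inserted to keep the array sorted.
 */
static unsigned int first_free_node_id(const unsigned int *ids, size_t n,
				       size_t *insert_at)
{
	unsigned int node_id = 0;
	size_t pos = 0;
	size_t i;

	assert(insert_at != NULL);
	for (i = 0; i < n; i++) {
		if (ids[i] == node_id) {
			pos = i + 1; /* new node goes after this entry */
			node_id++;
		}
	}
	*insert_at = pos;
	return node_id;
}

/* Assumed naming scheme; returns the length snprintf() reports. */
static int format_node_name(char *buf, size_t size, unsigned int node_id)
{
	assert(buf != NULL && size > 0);
	if (node_id == 0)
		return snprintf(buf, size, "%s", PHYSICAL_NODE_STRING);
	return snprintf(buf, size, PHYSICAL_NODE_STRING "%u", node_id);
}
```

With IDs {0, 1, 3} in use, the walk stops advancing at 3 and yields ID 2
at index 2.  A 10-digit ID gives a 23-character name, which together
with the NUL exactly fills PHYSICAL_NODE_NAME_SIZE (24 bytes).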
 

Re: [PATCH 1/7] EFI stub documentation updates

2013-08-05 Thread Roy Franz
On Mon, Aug 5, 2013 at 7:12 AM, Dave Martin  wrote:
> On Fri, Aug 02, 2013 at 02:29:02PM -0700, Roy Franz wrote:
>> The ARM kernel also has an EFI stub which works largely the same way
>> as the x86 stub, so move the documentation out of x86 directory and
>> update to reflect that it is generic, and add ARM specific text.
>>
>> Signed-off-by: Roy Franz 
>> ---
>>  Documentation/efi-stub.txt |   78 
>> 
>>  Documentation/x86/efi-stub.txt |   65 -
>>  arch/x86/Kconfig   |2 +-
>>  3 files changed, 79 insertions(+), 66 deletions(-)
>>  create mode 100644 Documentation/efi-stub.txt
>>  delete mode 100644 Documentation/x86/efi-stub.txt
>>
>> diff --git a/Documentation/efi-stub.txt b/Documentation/efi-stub.txt
>> new file mode 100644
>> index 000..7837df1
>> --- /dev/null
>> +++ b/Documentation/efi-stub.txt
>> @@ -0,0 +1,78 @@
>> +   The EFI Boot Stub
>> +  ---
>> +
>> +On the x86 and ARM platforms, a bzImage can masquerade as a PE/COFF image,
>> +thereby convincing EFI firmware loaders to load it as an EFI
>> +executable. The code that modifies the bzImage header, along with the
>
>
> Minor nit, I don't think there is such a thing as "bzImage" for ARM.
>
> Cheers
> ---Dave
>
Yeah, I don't think so either...  How about:

On the x86 and ARM platforms, a kernel zImage/bzImage can masquerade


Roy


Re: [PATCH 001/001] CHAR DRIVERS: a simple device to give daemons a /sys-like interface

2013-08-05 Thread Bob Smith

Greg
Thanks for discussing the module with me.  I think I'm now
closer to distilling it down to its essence.


GOAL:
The goal of this module is to give user space programs an
interface similar to that enjoyed by the kernel using procfs
and sysfs.  All of the following should be possible
echo 1 > /proc/sys/net/ipv4/ip_forward  # procfs
echo 75 > /dev/motors/left/speed# proxy
echo 5 > /dev/wpa_supplicant/use_channel # proxy

IPC:
To accomplish the above goal a new IPC is required.  This
new IPC must have the following characteristics:
- bidirectional
- writer blocks until reader is present
- a writer can cause the reader to close
- works with 'echo' and 'cat'

No existing IPC in Linux has all of these characteristics
but proxy, the tiny self-contained module submitted, does.
(Greg, I'm kind of surprised that a shim of an IPC like this
wasn't added to Linux a long, long time ago.)

USE CASES:
Proxy should be added to the kernel because it can greatly
improve Linux in two significant ways.

USE CASE #1: User space device drivers
A viable approach to user space device drivers would make
life easier for both programmers and kernel maintainers.
The latter because a maintainer can now reasonably say
"go use proxy and a user space driver".   Some of the SPI
and I2C drivers might have been easier to do with proxy.
  Programmers doing device drivers might have an easier time
since it will be easier to prototype and debug a system in
user space.  SPI and I2C driver writers in particular may
appreciate the ability to build a working system without
having to go through the sometimes tedious process of a
kernel submission.
  Finally, some device drivers that are not possible today
would become possible.  In my case I have a USB-serial link
to a robot controller and so need a user space daemon to
terminate the serial line.  It is only with proxy that I
can hide the details of this and give users a nice /dev
view of the robot.

USE CASE #2:  End the madness of language bindings
Over 10 years ago kernel developers had the sense to escape
(some) ioctl language bindings with the introduction of
procfs.   How is it that in all this time we haven't done
the same thing for all the daemons that populate Linux?
No, today daemon writers are still being forced to open a
socket, define and document a protocol over it, and then
write a library for that protocol for all the popular
languages.   And we're not talking about just one or two
languages.  No, now it's more like C, Java, Python, PHP, and
soon node.js.  Next week some new language will wander in off
the street and need yet another binding.  Eeeech!
   Let's let daemons use the same kind of interface that the
kernel has with /sys and /proc.  With proxy, daemon coders
could define an ASCII interface in exactly the same way the
kernel has.  The inclusion of 'echo' and 'cat' above is kind
of a litmus test.  If a daemon interface works with cat and
echo, it will _NEVER_ need dedicated per-language bindings.



thanks
Bob Smith





Re: [PATCH RFC 46/51] ARM: DMA-API: better handing of DMA masks for coherent allocations

2013-08-05 Thread Russell King - ARM Linux
On Mon, Aug 05, 2013 at 05:43:47PM -0500, Rob Herring wrote:
> On Thu, Aug 1, 2013 at 5:20 PM, Russell King
>  wrote:
> > We need to start treating DMA masks as something which is specific to
> > the bus that the device resides on, otherwise we're going to hit all
> > sorts of nasty issues with LPAE and 32-bit DMA controllers in >32-bit
> > systems, where memory is offset from PFN 0.
> >
> > In order to start doing this, we convert the DMA mask to a PFN using
> > the device specific dma_to_pfn() macro.  This is the reverse of the
> > pfn_to_dma() macro which is used to get the DMA address for the device.
> >
> > This gives us a PFN mask, which we can then check against the PFN
> > limit of the DMA zone.
> >
> > Signed-off-by: Russell King 
> > ---
> >  arch/arm/mm/dma-mapping.c |   49 
> > 
> >  arch/arm/mm/init.c|2 +
> >  arch/arm/mm/mm.h  |2 +
> >  3 files changed, 48 insertions(+), 5 deletions(-)
> 
> I believe you missed handling __dma_alloc. I have a different fix than
> what Andreas posted. I think DMA zone handling is broken in all cases
> here. Feel free to combine this in to your patch if you agree.

I was starting to wonder whether anyone was going to look at those
patches...

> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
> index 7f9b179..3d9bdfb 100644
> --- a/arch/arm/mm/dma-mapping.c
> +++ b/arch/arm/mm/dma-mapping.c
> @@ -651,7 +651,7 @@ static void *__dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
> if (!mask)
> return NULL;
> 
> -   if (mask < 0xffffffffULL)
> +   if (mask <= (u64)arm_dma_limit)
> gfp |= GFP_DMA;

I'm not convinced on that - I think you've missed the entire point in
this patch series about what address space the 'mask' is in.  'mask' is
in the device's address space, which may not be the same as the physical
address space.

With LPAE, the two can become quite different address spaces with a
4GB offset between them.  That's why we must stop the old-school
thinking that DMA addresses and physical addresses are the same thing.
We also need to stop doing stuff like passing dma_addr_t variables into
phys_to_virt().  (All those short-cuts are going to break!)

arm_dma_limit is in the physical address space.  So, comparing the two
makes no sense whatsoever.

However, the use of arm_dma_limit at the top of get_coherent_dma_mask()
is a bug, I think that should just become something like a 24-bit
constant mask, so the NULL device gets GFP_DMA allocations like they
do on x86 for that case.

I also think that 'mask' should be converted to a pfn at the location
you point out before comparing with the amount of memory in the DMA
zone.
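
To illustrate why the mask has to be translated before it is compared,
here is a rough userspace sketch.  Everything in it is hypothetical:
struct bus_model, mask_to_max_pfn() and needs_dma_zone() are invented
names, the fixed PFN offset stands in for the device specific
dma_to_pfn() conversion, and 4 KiB pages are assumed.  It is not the
code from the patch series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Hypothetical bus on which DMA address 0 maps to a non-zero physical PFN. */
struct bus_model {
	uint64_t dma_pfn_offset; /* physical PFN that the device sees at 0 */
};

/* Highest physical PFN the device can reach with the given DMA mask. */
static uint64_t mask_to_max_pfn(const struct bus_model *bus, uint64_t mask)
{
	assert(bus != NULL);
	return bus->dma_pfn_offset + (mask >> PAGE_SHIFT);
}

/* True if some memory below limit_pfn is out of reach for the device. */
static bool needs_dma_zone(const struct bus_model *bus, uint64_t mask,
			   uint64_t limit_pfn)
{
	return mask_to_max_pfn(bus, mask) < limit_pfn;
}
```

With RAM starting at 4GB (PFN 0x100000) and a 32-bit mask, the device
reaches up to PFN 0x1fffff.  Comparing the raw mask 0xffffffff against a
physical limit would instead suggest it cannot reach any of that memory.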


Re: [PATCH] [SCSI] sg: Fix user memory corruption when SG_IO is interrupted by a signal

2013-08-05 Thread James Bottomley
On Mon, 2013-08-05 at 16:38 -0700, Roland Dreier wrote:
> On Mon, Aug 5, 2013 at 4:31 PM, James Bottomley
>  wrote:
> > I agree with the analysis.  The fix is a bit draconian, though.  A
> > workqueue actually runs in a kernel thread and there's a simple test for
> > that (!current->mm), so how about this instead (which is much less
> > intrusive)
> 
> > ---
> 
> > diff --git a/fs/bio.c b/fs/bio.c
> > index 94bbc04..e2ab39c 100644
> > --- a/fs/bio.c
> > +++ b/fs/bio.c
> > @@ -1045,12 +1045,22 @@ static int __bio_copy_iov(struct bio *bio, struct bio_vec *iovecs,
> >  int bio_uncopy_user(struct bio *bio)
> >  {
> > struct bio_map_data *bmd = bio->bi_private;
> > -   int ret = 0;
> > +   struct bio_vec *bvec;
> > +   int ret = 0, i;
> >
> > -   if (!bio_flagged(bio, BIO_NULL_MAPPED))
> > -   ret = __bio_copy_iov(bio, bmd->iovecs, bmd->sgvecs,
> > -bmd->nr_sgvecs, bio_data_dir(bio) == READ,
> > -0, bmd->is_our_pages);
> > +   if (!bio_flagged(bio, BIO_NULL_MAPPED)) {
> > +   /*
> > +* if we're in a workqueue, the request is orphaned, so
> > +* don't copy into the kernel address space, just free
> > +*/
> > +   if (current->mm)
> > +   ret = __bio_copy_iov(bio, bmd->iovecs, bmd->sgvecs,
> > +bmd->nr_sgvecs, bio_data_dir(bio) == READ,
> > +0, bmd->is_our_pages);
> > +   else if (bmd->is_our_pages)
> > +   bio_for_each_segment_all(bvec, bio, i)
> > +   __free_page(bvec->bv_page);
> > +   }
> > bio_free_map_data(bmd);
> > bio_put(bio);
> > return ret;
> 
> Yes, looks reasonable -- I can't think of any reason why anyone would
> ever want the bio code to copy to a random userspace address space.
> 
> Acked-by: Roland Dreier 

You did all the work ... just replace this patch with your previous one
and keep the original tags. (test it first, of course ...)

James




Re: [PATCH] [SCSI] sg: Fix user memory corruption when SG_IO is interrupted by a signal

2013-08-05 Thread Roland Dreier
On Mon, Aug 5, 2013 at 4:31 PM, James Bottomley
 wrote:
> I agree with the analysis.  The fix is a bit draconian, though.  A
> workqueue actually runs in a kernel thread and there's a simple test for
> that (!current->mm), so how about this instead (which is much less
> intrusive)

> ---

> diff --git a/fs/bio.c b/fs/bio.c
> index 94bbc04..e2ab39c 100644
> --- a/fs/bio.c
> +++ b/fs/bio.c
> @@ -1045,12 +1045,22 @@ static int __bio_copy_iov(struct bio *bio, struct bio_vec *iovecs,
>  int bio_uncopy_user(struct bio *bio)
>  {
> struct bio_map_data *bmd = bio->bi_private;
> -   int ret = 0;
> +   struct bio_vec *bvec;
> +   int ret = 0, i;
>
> -   if (!bio_flagged(bio, BIO_NULL_MAPPED))
> -   ret = __bio_copy_iov(bio, bmd->iovecs, bmd->sgvecs,
> -bmd->nr_sgvecs, bio_data_dir(bio) == READ,
> -0, bmd->is_our_pages);
> +   if (!bio_flagged(bio, BIO_NULL_MAPPED)) {
> +   /*
> +* if we're in a workqueue, the request is orphaned, so
> +* don't copy into the kernel address space, just free
> +*/
> +   if (current->mm)
> +   ret = __bio_copy_iov(bio, bmd->iovecs, bmd->sgvecs,
> +bmd->nr_sgvecs, bio_data_dir(bio) == READ,
> +0, bmd->is_our_pages);
> +   else if (bmd->is_our_pages)
> +   bio_for_each_segment_all(bvec, bio, i)
> +   __free_page(bvec->bv_page);
> +   }
> bio_free_map_data(bmd);
> bio_put(bio);
> return ret;

Yes, looks reasonable -- I can't think of any reason why anyone would
ever want the bio code to copy to a random userspace address space.

Acked-by: Roland Dreier 


Re: [PATCH] [SCSI] sg: Fix user memory corruption when SG_IO is interrupted by a signal

2013-08-05 Thread James Bottomley
On Mon, 2013-08-05 at 15:02 -0700, Roland Dreier wrote:
> From: Roland Dreier 
> 
> There is a nasty bug in the SCSI SG_IO ioctl that in some circumstances
> leads to one process writing data into the address space of some other
> random unrelated process if the ioctl is interrupted by a signal.
> What happens is the following:
> 
>  - A process issues an SG_IO ioctl with direction DXFER_FROM_DEV (ie the
>underlying SCSI command will transfer data from the SCSI device to
>the buffer provided in the ioctl)
> 
>  - Before the command finishes, a signal is sent to the process waiting
>in the ioctl.  This will end up waking up the sg_ioctl() code:
> 
>   result = wait_event_interruptible(sfp->read_wait,
>   (srp_done(sfp, srp) || sdp->detached));
> 
>but neither srp_done() nor sdp->detached is true, so we end up just
>setting srp->orphan and returning to userspace:
> 
>   srp->orphan = 1;
>   write_unlock_irq(&sfp->rq_list_lock);
>   return result;  /* -ERESTARTSYS because signal hit process */
> 
>At this point the original process is done with the ioctl and
>blithely goes ahead handling the signal, reissuing the ioctl, etc.
> 
>  - Eventually, the SCSI command issued by the first ioctl finishes and
>ends up in sg_rq_end_io().  At the end of that function, we run through:
> 
>   write_lock_irqsave(&sfp->rq_list_lock, iflags);
>   if (unlikely(srp->orphan)) {
>   if (sfp->keep_orphan)
>   srp->sg_io_owned = 0;
>   else
>   done = 0;
>   }
>   srp->done = done;
>   write_unlock_irqrestore(&sfp->rq_list_lock, iflags);
> 
>   if (likely(done)) {
>   /* Now wake up any sg_read() that is waiting for this
>* packet.
>*/
>   wake_up_interruptible(&sfp->read_wait);
>   kill_fasync(&sfp->async_qp, SIGPOLL, POLL_IN);
>   kref_put(&sfp->f_ref, sg_remove_sfp);
>   } else {
>   INIT_WORK(&srp->ew.work, sg_rq_end_io_usercontext);
>   schedule_work(&srp->ew.work);
>   }
> 
>Since srp->orphan *is* set, we set done to 0 (assuming the
>userspace app has not set keep_orphan via an SG_SET_KEEP_ORPHAN
>ioctl), and therefore we end up scheduling sg_rq_end_io_usercontext()
>to run in a workqueue.
> 
>  - In workqueue context we go through sg_rq_end_io_usercontext() ->
>sg_finish_rem_req() -> blk_rq_unmap_user() -> ... ->
>bio_uncopy_user() -> __bio_copy_iov() -> copy_to_user().
> 
>The key point here is that we are doing copy_to_user() on a
>workqueue -- that is, we're on a kernel thread with current->mm
>equal to whatever random previous user process was scheduled before
>this kernel thread.  So we end up copying whatever data the SCSI
>command returned to the virtual address of the buffer passed into
>the original ioctl, but it's quite likely we do this copying into a
>different address space!
> 
> Fix this by telling sg_finish_rem_req() whether we're on a workqueue
> or not, and if we are, calling a new function blk_rq_unmap_user_nocopy()
> that does everything the original blk_rq_unmap_user() does except
> calling copy_{to,from}_user().  This requires a few levels of plumbing
> through a "copy" flag in the bio layer.

I agree with the analysis.  The fix is a bit draconian, though.  A
workqueue actually runs in a kernel thread and there's a simple test for
that (!current->mm), so how about this instead (which is much less
intrusive)

James

---

diff --git a/fs/bio.c b/fs/bio.c
index 94bbc04..e2ab39c 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -1045,12 +1045,22 @@ static int __bio_copy_iov(struct bio *bio, struct bio_vec *iovecs,
 int bio_uncopy_user(struct bio *bio)
 {
struct bio_map_data *bmd = bio->bi_private;
-   int ret = 0;
+   struct bio_vec *bvec;
+   int ret = 0, i;
 
-   if (!bio_flagged(bio, BIO_NULL_MAPPED))
-   ret = __bio_copy_iov(bio, bmd->iovecs, bmd->sgvecs,
-bmd->nr_sgvecs, bio_data_dir(bio) == READ,
-0, bmd->is_our_pages);
+   if (!bio_flagged(bio, BIO_NULL_MAPPED)) {
+   /*
+* if we're in a workqueue, the request is orphaned, so
+* don't copy into the kernel address space, just free
+*/
+   if (current->mm)
+   ret = __bio_copy_iov(bio, bmd->iovecs, bmd->sgvecs,
+bmd->nr_sgvecs, bio_data_dir(bio) == READ,
+0, bmd->is_our_pages);
+   else if (bmd->is_our_pages)
+   bio_for_each_segment_all(bvec, bio, i)
+   __free_page(bvec->bv_page);
+   }
bio_free_map_data(bmd);
bio_put(bio);
return ret;
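
The effect of the current->mm test in the patch above can be modelled
in userspace.  This is a toy sketch with invented names (struct
mm_model, struct task_model, model_uncopy()): a NULL mm pointer stands
for a kernel thread such as a workqueue worker, and memcpy() stands in
for copy_to_user().  It leaves out the page freeing that the real
function also does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for struct mm_struct and struct task_struct. */
struct mm_model {
	int unused;
};

struct task_model {
	struct mm_model *mm; /* NULL models a kernel thread */
};

/*
 * Copy the bounce buffer back to the caller's buffer only when the
 * current task owns a user address space.  Returns the bytes copied.
 */
static size_t model_uncopy(const struct task_model *cur, char *user_buf,
			   const char *bounce, size_t len)
{
	assert(cur != NULL);
	if (cur->mm == NULL)
		return 0; /* orphaned request: skip the copy */
	memcpy(user_buf, bounce, len);
	return len;
}
```

A worker task leaves the destination untouched, while the issuing
process still gets its data on the normal completion path.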



Re: [PATCH] checkpatch: add a rule to check devinitconst mistakes

2013-08-05 Thread Joe Perches
On Mon, 2013-08-05 at 15:10 -0700, Andi Kleen wrote:
> From: Andi Kleen 
> 
> Check for const __devinitdata and non const __devinitconst
> 
> People get this regularly wrong and it breaks the LTO builds,
> as it causes a section attribute conflict.
> 
> This doesn't catch all mistakes -- spreading over multiple lines,
> getting const pointers wrong, but hopefully the common ones.
> 
> Signed-off-by: Andi Kleen 
> ---
>  scripts/checkpatch.pl | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> index 2ee9eb7..5d68d9c 100755
> --- a/scripts/checkpatch.pl
> +++ b/scripts/checkpatch.pl
> @@ -2676,6 +2676,15 @@ sub process {
>   }
>   }
>  
> +# check for __devinitdata with const or const without __devinitconst
> +# XXX should scan multiple lines and handle misplaced consts for pointers
> + if ($line =~ /const/ && $line =~ /__(dev)?initdata/) {
> + ERROR("DEVINITCONST", "const init definition must use __devinitconst");
> + }
> + if ($line =~ /__(dev)?initconst/ && $line !~ /\Wconst\W/) {

There are no more uses of __devinitconst and
__devinitdata in the tree.

Shouldn't these be "\bconst\b", "\b__initdata\b"
and "\b__initconst\b"?

I think also there'll be a few too many false
positives for function arguments.

It seems that every use of __initconst is of
the form

[static] const <type> [name] __initconst so

if ($line =~ /\b__initconst\b/ &&
    $line !~ /^\+\s*(?:static\b)?\s*const\s+$Type\s*(?:$Ident)?\s*__initconst\b/) {
etc...
}

should work reasonably well.



Re: [PATCH 1/3] clk: s2mps11: Add support for s2mps11

2013-08-05 Thread Mike Turquette
Quoting Yadwinder Singh Brar (2013-07-07 04:44:20)
> This patch adds support to register three (AP/CP/BT) buffered 32.768 KHz
> outputs of mfd-s2mps11 with common clock framework.
> 
> Signed-off-by: Yadwinder Singh Brar 

Yadwinder,

Looks good to me with the exception of a binding description document.
Can you provide one and squash it into this commit?

Thanks,
Mike

> ---
>  drivers/clk/Kconfig   |6 +
>  drivers/clk/Makefile  |1 +
>  drivers/clk/clk-s2mps11.c |  273 
> +
>  3 files changed, 280 insertions(+), 0 deletions(-)
>  create mode 100644 drivers/clk/clk-s2mps11.c
> 
> diff --git a/drivers/clk/Kconfig b/drivers/clk/Kconfig
> index 0357ac4..3fdf10e 100644
> --- a/drivers/clk/Kconfig
> +++ b/drivers/clk/Kconfig
> @@ -65,6 +65,12 @@ config COMMON_CLK_SI5351
>   This driver supports Silicon Labs 5351A/B/C programmable clock
>   generators.
>  
> +config COMMON_CLK_S2MPS11
> +   tristate "Clock driver for S2MPS11 MFD"
> +   depends on MFD_SEC_CORE
> +   ---help---
> + This driver supports S2MPS11 crystal oscillator clock.
> +
>  config CLK_TWL6040
> tristate "External McPDM functional clock from twl6040"
> depends on TWL6040_CORE
> diff --git a/drivers/clk/Makefile b/drivers/clk/Makefile
> index 137d3e7..5fd642d 100644
> --- a/drivers/clk/Makefile
> +++ b/drivers/clk/Makefile
> @@ -38,4 +38,5 @@ obj-$(CONFIG_COMMON_CLK_AXI_CLKGEN) += clk-axi-clkgen.o
>  obj-$(CONFIG_COMMON_CLK_WM831X) += clk-wm831x.o
>  obj-$(CONFIG_COMMON_CLK_MAX77686) += clk-max77686.o
>  obj-$(CONFIG_COMMON_CLK_SI5351) += clk-si5351.o
> +obj-$(CONFIG_COMMON_CLK_S2MPS11) += clk-s2mps11.o
>  obj-$(CONFIG_CLK_TWL6040)  += clk-twl6040.o
> diff --git a/drivers/clk/clk-s2mps11.c b/drivers/clk/clk-s2mps11.c
> new file mode 100644
> index 0000000..7be41e6
> --- /dev/null
> +++ b/drivers/clk/clk-s2mps11.c
> @@ -0,0 +1,273 @@
> +/*
> + * clk-s2mps11.c - Clock driver for S2MPS11.
> + *
> + * Copyright (C) 2013 Samsung Electronics
> + *
> + * This program is free software; you can redistribute  it and/or modify it
> + * under  the terms of  the GNU General  Public License as published by the
> + * Free Software Foundation;  either version 2 of the  License, or (at your
> + * option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
> + *
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#define s2mps11_name(a) (a->hw.init->name)
> +
> +static struct clk **clk_table;
> +static struct clk_onecell_data clk_data;
> +
> +enum {
> +   S2MPS11_CLK_AP = 0,
> +   S2MPS11_CLK_CP,
> +   S2MPS11_CLK_BT,
> +   S2MPS11_CLKS_NUM,
> +};
> +
> +struct s2mps11_clk {
> +   struct sec_pmic_dev *iodev;
> +   struct clk_hw hw;
> +   struct clk *clk;
> +   struct clk_lookup *lookup;
> +   u32 mask;
> +   bool enabled;
> +};
> +
> +static struct s2mps11_clk *to_s2mps11_clk(struct clk_hw *hw)
> +{
> +   return container_of(hw, struct s2mps11_clk, hw);
> +}
> +
> +static int s2mps11_clk_prepare(struct clk_hw *hw)
> +{
> +   struct s2mps11_clk *s2mps11 = to_s2mps11_clk(hw);
> +   int ret;
> +
> +   ret = regmap_update_bits(s2mps11->iodev->regmap,
> +   S2MPS11_REG_RTC_CTRL,
> +s2mps11->mask, s2mps11->mask);
> +   if (!ret)
> +   s2mps11->enabled = true;
> +
> +   return ret;
> +}
> +
> +static void s2mps11_clk_unprepare(struct clk_hw *hw)
> +{
> +   struct s2mps11_clk *s2mps11 = to_s2mps11_clk(hw);
> +   int ret;
> +
> +   ret = regmap_update_bits(s2mps11->iodev->regmap, S2MPS11_REG_RTC_CTRL,
> +  s2mps11->mask, ~s2mps11->mask);
> +
> +   if (!ret)
> +   s2mps11->enabled = false;
> +}
> +
> +static int s2mps11_clk_is_enabled(struct clk_hw *hw)
> +{
> +   struct s2mps11_clk *s2mps11 = to_s2mps11_clk(hw);
> +
> +   return s2mps11->enabled;
> +}
> +
> +static unsigned long s2mps11_clk_recalc_rate(struct clk_hw *hw,
> +unsigned long parent_rate)
> +{
> +   struct s2mps11_clk *s2mps11 = to_s2mps11_clk(hw);
> +   if (s2mps11->enabled)
> +   return 32768;
> +   else
> +   return 0;
> +}
> +
> +static struct clk_ops s2mps11_clk_ops = {
> +   .prepare= s2mps11_clk_prepare,
> +   .unprepare  = s2mps11_clk_unprepare,
> +   .is_enabled = 

Re: Cannot hot remove a memory device (patch)

2013-08-05 Thread Toshi Kani
On Mon, 2013-08-05 at 15:14 +0200, Rafael J. Wysocki wrote:
  :
> Can you please test the appended patch?  I tested it somewhat, but since the
> greatest number of physical nodes per ACPI device object I can get on my test
> machines is 2 (and even that after hacking the kernel somewhat), that was kind
> of inconclusive.
> 
> Thanks,
> Rafael
> 
> 
> ---
> From: Rafael J. Wysocki 
> Subject: ACPI: Drop physical_node_id_bitmap from struct acpi_device
> 
> The physical_node_id_bitmap in struct acpi_device is only used for
> looking up the first currently unused physical dependent node ID
> by acpi_bind_one().  It is not really necessary, however, because
> acpi_bind_one() walks the entire physical_node_list of the given
> device object for sanity checking anyway and if that list is always
> sorted by node_id, it is straightforward to find the first gap
> between the currently used node IDs and use that number as the ID
> of the new list node.
> 
> This also removes the artificial limit of the maximum number of
> dependent physical devices per ACPI device object, which now depends
> only on the capacity of unsigned int.
> 
> Signed-off-by: Rafael J. Wysocki 

I like the change. Much better :-)

Acked-by: Toshi Kani 

Thanks,
-Toshi
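
For readers following the thread, the lookup described in the changelog above can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the patch: struct phys_node and first_free_id() are hypothetical names, and the list is assumed to be kept sorted by ascending node_id, as the changelog requires.

```c
#include <stddef.h>

/* Hypothetical stand-in for one entry of a device's physical node list. */
struct phys_node {
	unsigned int node_id;
	struct phys_node *next;
};

/*
 * Walk a list sorted by ascending node_id and return the first ID that is
 * not in use. On return, *prevp points at the entry after which a new node
 * with that ID should be linked to keep the list sorted, or is NULL if the
 * new node belongs at the head of the list.
 */
static unsigned int first_free_id(struct phys_node *head,
				  struct phys_node **prevp)
{
	struct phys_node *prev = NULL;
	struct phys_node *pn;
	unsigned int id = 0;

	for (pn = head; pn; pn = pn->next) {
		if (pn->node_id != id)
			break;		/* gap found: 'id' is unused */
		id++;
		prev = pn;
	}

	*prevp = prev;
	return id;
}
```

With IDs 0, 1 and 3 in use, the walk stops at the entry with ID 3 and returns 2, with *prevp pointing at the entry with ID 1. No bitmap is needed, so the only limit on the number of entries is the range of unsigned int.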




Re: [PATCH 08/16] x86, asmlinkage, kexec: Drop bogus asmlinkage in machine_kexec_32

2013-08-05 Thread Andi Kleen
On Mon, Aug 05, 2013 at 04:03:27PM -0700, H. Peter Anvin wrote:
> On 08/05/2013 03:02 PM, Andi Kleen wrote:
> > From: Andi Kleen 
> > 
> > A function pointer cannot be asmlinkage. Just drop it.
> > 
> > Signed-off-by: Andi Kleen 
> 
> Eh?  It certainly matters for the function pointer if it is regparm(0)
> or regparm(3), and the pointed-to function definitely assumes
> regparm(0).  So I think this patch is wrong, and if it isn't, it
> definitely needs better explanation why it isn't wrong.

Ok. Good point.

It causes compiler warnings with __attribute__((externally_visible))
because only a definition can be visible.

But yes it is needed for regparm.

So in a sense these two are incompatible. I guess it can be dropped
right now, as it's just a warning.

So please drop the patch and I will too.
I'll not repost just for the drop, unless you ask me to 
(or other changes come up)

Longer term, this may need a different solution.

-Andi


-- 
a...@linux.intel.com -- Speaking for myself only.


[PATCH 1/2] ARM: msm: Add support for MSM8974 Dragonboard

2013-08-05 Thread Rohit Vaswani
This patch adds basic board support for MSM8974 Dragonboard
which belongs to the Snapdragon 800 family.
For now, just support a basic machine with device tree.

Signed-off-by: Rohit Vaswani 
---
 arch/arm/boot/dts/Makefile|  3 ++-
 arch/arm/boot/dts/msm8974-db.dts  | 26 ++
 arch/arm/mach-msm/Kconfig | 21 ++---
 arch/arm/mach-msm/Makefile|  1 +
 arch/arm/mach-msm/board-dt-8974.c | 23 +++
 5 files changed, 70 insertions(+), 4 deletions(-)
 create mode 100644 arch/arm/boot/dts/msm8974-db.dts
 create mode 100644 arch/arm/mach-msm/board-dt-8974.c

diff --git a/arch/arm/boot/dts/Makefile b/arch/arm/boot/dts/Makefile
index 641b3c9a..62cea36 100644
--- a/arch/arm/boot/dts/Makefile
+++ b/arch/arm/boot/dts/Makefile
@@ -97,7 +97,8 @@ dtb-$(CONFIG_ARCH_KIRKWOOD) += kirkwood-cloudbox.dtb \
kirkwood-openblocks_a6.dtb
 dtb-$(CONFIG_ARCH_MARCO) += marco-evb.dtb
 dtb-$(CONFIG_ARCH_MSM) += msm8660-surf.dtb \
-   msm8960-cdp.dtb
+   msm8960-cdp.dtb \
+   msm8974-db.dtb
 dtb-$(CONFIG_ARCH_MVEBU) += armada-370-db.dtb \
armada-370-mirabox.dtb \
armada-370-rd.dtb \
diff --git a/arch/arm/boot/dts/msm8974-db.dts b/arch/arm/boot/dts/msm8974-db.dts
new file mode 100644
index 0000000..badfc61
--- /dev/null
+++ b/arch/arm/boot/dts/msm8974-db.dts
@@ -0,0 +1,26 @@
+/dts-v1/;
+
+/include/ "skeleton.dtsi"
+
+/ {
+   model = "Qualcomm MSM8974 Dragonboard";
+   compatible = "qcom,msm8974-db", "qcom,msm8974";
+   interrupt-parent = <&intc>;
+
+   intc: interrupt-controller@f900 {
+   compatible = "qcom,msm-qgic2";
+   interrupt-controller;
+   #interrupt-cells = <3>;
+   reg = < 0xf900 0x1000 >,
+ < 0xf9002000 0x1000 >;
+   };
+
+   timer {
+   compatible = "arm,armv7-timer";
+   interrupts = <1 2 0xf08>,
+<1 3 0xf08>,
+<1 4 0xf08>,
+<1 1 0xf08>;
+   clock-frequency = <1920>;
+   };
+};
diff --git a/arch/arm/mach-msm/Kconfig b/arch/arm/mach-msm/Kconfig
index 614e41e..343675b 100644
--- a/arch/arm/mach-msm/Kconfig
+++ b/arch/arm/mach-msm/Kconfig
@@ -1,12 +1,12 @@
 if ARCH_MSM
 
 comment "Qualcomm MSM SoC Type"
-   depends on (ARCH_MSM8X60 || ARCH_MSM8960)
+   depends on ARCH_MSM_DT
 
 choice
prompt "Qualcomm MSM SoC Type"
default ARCH_MSM7X00A
-   depends on !(ARCH_MSM8X60 || ARCH_MSM8960)
+   depends on !ARCH_MSM_DT
 
 config ARCH_MSM7X00A
bool "MSM7x00A / MSM7x01A"
@@ -60,6 +60,19 @@ config ARCH_MSM8960
select MSM_SCM if SMP
select USE_OF
 
+config ARCH_MSM8974
+   bool "MSM8974"
+   select ARM_GIC
+   select CPU_V7
+   select HAVE_ARM_ARCH_TIMER
+   select HAVE_SMP
+   select MSM_SCM if SMP
+   select USE_OF
+
+config ARCH_MSM_DT
+   def_bool y
+   depends on (ARCH_MSM8X60 || ARCH_MSM8960 || ARCH_MSM8974)
+
 config MSM_HAS_DEBUG_UART_HS
bool
 
@@ -68,6 +81,7 @@ config MSM_SOC_REV_A
 
 config  ARCH_MSM_ARM11
bool
+
 config  ARCH_MSM_SCORPION
bool
 
@@ -75,6 +89,7 @@ config  MSM_VIC
bool
 
 menu "Qualcomm MSM Board Type"
+   depends on !ARCH_MSM_DT
 
 config MACH_HALIBUT
depends on ARCH_MSM
@@ -121,7 +136,7 @@ config MSM_SMD
bool
 
 config MSM_GPIOMUX
-   depends on !(ARCH_MSM8X60 || ARCH_MSM8960)
+   depends on !ARCH_MSM_DT
bool "MSM V1 TLMM GPIOMUX architecture"
help
  Support for MSM V1 TLMM GPIOMUX architecture.
diff --git a/arch/arm/mach-msm/Makefile b/arch/arm/mach-msm/Makefile
index d257ff4..80e3b15 100644
--- a/arch/arm/mach-msm/Makefile
+++ b/arch/arm/mach-msm/Makefile
@@ -29,5 +29,6 @@ obj-$(CONFIG_ARCH_MSM7X30) += board-msm7x30.o devices-msm7x30.o
 obj-$(CONFIG_ARCH_QSD8X50) += board-qsd8x50.o devices-qsd8x50.o
 obj-$(CONFIG_ARCH_MSM8X60) += board-dt-8660.o
 obj-$(CONFIG_ARCH_MSM8960) += board-dt-8960.o
+obj-$(CONFIG_ARCH_MSM8974) += board-dt-8974.o
 obj-$(CONFIG_MSM_GPIOMUX) += gpiomux.o
 obj-$(CONFIG_ARCH_QSD8X50) += gpiomux-8x50.o
diff --git a/arch/arm/mach-msm/board-dt-8974.c b/arch/arm/mach-msm/board-dt-8974.c
new file mode 100644
index 0000000..697623e
--- /dev/null
+++ b/arch/arm/mach-msm/board-dt-8974.c
@@ -0,0 +1,23 @@
+/* Copyright (c) 2013, The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+
+static const char * const 

Re: [PATCH 08/16] x86, asmlinkage, kexec: Drop bogus asmlinkage in machine_kexec_32

2013-08-05 Thread H. Peter Anvin
On 08/05/2013 03:02 PM, Andi Kleen wrote:
> From: Andi Kleen 
> 
> A function pointer cannot be asmlinkage. Just drop it.
> 
> Signed-off-by: Andi Kleen 

Eh?  It certainly matters for the function pointer if it is regparm(0)
or regparm(3), and the pointed-to function definitely assumes
regparm(0).  So I think this patch is wrong, and if it isn't, it
definitely needs better explanation why it isn't wrong.

-hpa
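
To illustrate the point about calling conventions, here is a minimal sketch, not taken from the patch. It assumes a simplified asmlinkage macro that expands to regparm(0) on 32-bit x86 and to nothing elsewhere; demo_callee, demo_ptr and demo_call are hypothetical names.

```c
#include <stddef.h>

/*
 * Simplified stand-in for asmlinkage: on 32-bit x86 it forces all arguments
 * onto the stack (regparm(0)). On other targets the attribute does not
 * apply, so the macro expands to nothing in this sketch.
 */
#if defined(__i386__)
#define asmlinkage __attribute__((regparm(0)))
#else
#define asmlinkage
#endif

/*
 * Hypothetical callee. In the case discussed here it would be an assembly
 * routine that reads its arguments from the stack.
 */
asmlinkage unsigned long demo_callee(unsigned long a, unsigned long b)
{
	return a + b;
}

/*
 * The pointer carries the same annotation as the callee. If it were
 * declared without it in a build that defaults to -mregparm=3, calls
 * through the pointer would pass arguments in registers while the callee
 * expects them on the stack.
 */
static asmlinkage unsigned long (*demo_ptr)(unsigned long, unsigned long);

unsigned long demo_call(unsigned long a, unsigned long b)
{
	demo_ptr = demo_callee;
	return demo_ptr(a, b);
}
```

The annotation on the pointer changes how the call site is compiled, which is why dropping it is not a purely cosmetic change.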



