Re: [RFC PATCH v2 6/6] powerpc: Use generic code for exception handling

2013-04-09 Thread Li Zhong
On Wed, 2013-04-10 at 13:32 +0800, Li Zhong wrote:
> On Wed, 2013-04-10 at 14:56 +1000, Michael Ellerman wrote:
> > On Fri, Mar 29, 2013 at 06:00:21PM +0800, Li Zhong wrote:
> > > After the exception handling moved to generic code, and some changes in
> > ...
> > > diff --git a/arch/powerpc/mm/hash_utils_64.c 
> > > b/arch/powerpc/mm/hash_utils_64.c
> > > index 360fba8..eeab30f 100644
> > > --- a/arch/powerpc/mm/hash_utils_64.c
> > > +++ b/arch/powerpc/mm/hash_utils_64.c
> > > @@ -33,6 +33,7 @@
> > >  #include 
> > >  #include 
> > >  #include 
> > > +#include 
> > >  
> > >  #include 
> > >  #include 
> > > @@ -56,7 +57,6 @@
> > >  #include 
> > >  #include 
> > >  #include 
> > > -#include 
> > >  
> > >  #ifdef DEBUG
> > >  #define DBG(fmt...) udbg_printf(fmt)
> > > @@ -919,13 +919,17 @@ int hash_page(unsigned long ea, unsigned long 
> > > access, unsigned long trap)
> > >   const struct cpumask *tmp;
> > >   int rc, user_region = 0, local = 0;
> > >   int psize, ssize;
> > > + enum ctx_state prev_state;
> > > +
> > > + prev_state = exception_enter();
> > >  
> > >   DBG_LOW("hash_page(ea=%016lx, access=%lx, trap=%lx\n",
> > >   ea, access, trap);
> > >  
> > >   if ((ea & ~REGION_MASK) >= PGTABLE_RANGE) {
> > >   DBG_LOW(" out of pgtable range !\n");
> > > - return 1;
> > > + rc = 1;
> > > + goto exit;
> > >   }
> > >  
> > >   /* Get region & vsid */
> > 
> > This no longer applies on mainline, please send an updated version.
> 
> Yes, for the current mainline (powerpc tree), only the previous five patches
> can be applied. The dependency of this patch is currently in the tip tree,
> and it seems it will be in for 3.10.
> 
> There are some more details in the cover letter (#0):
> 
> "I assume these patches would get in through powerpc tree, so I didn't
> combine the new patch (#6) with the original one (#2). So that if
> powerpc tree picks these, it could pick the first five patches, and
> apply patch #6 later when the dependency enters into powerpc tree (maybe
> on some 3.10-rcs)."

And I will send an updated version of this one when I see the dependency
commits in mainline. 
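
For reference, the overall shape of the conversion is roughly as follows (a
minimal sketch of the generic exception_enter()/exception_exit() pattern from
linux/context_tracking.h; the label name follows the quoted diff, the rest is
only illustrative):

int hash_page(unsigned long ea, unsigned long access, unsigned long trap)
{
	enum ctx_state prev_state = exception_enter();
	int rc;

	/* ... existing body, with each early "return x" rewritten as
	 * "rc = x; goto exit;" so that every path runs the exit code ... */

exit:
	exception_exit(prev_state);
	return rc;
}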

Thanks, Zhong

> Thanks, Zhong
> 
> > cheers
> > 
> 




Re: [PATCH v3 3/3] ARM: davinci: da850: add EHRPWM & ECAP DT node

2013-04-09 Thread Sekhar Nori
On 4/10/2013 11:00 AM, Philip, Avinash wrote:
> On Tue, Apr 09, 2013 at 17:05:25, Nori, Sekhar wrote:
>> On 4/9/2013 2:12 PM, Philip, Avinash wrote:
>>> On Mon, Apr 08, 2013 at 18:39:57, Nori, Sekhar wrote:

 On 4/8/2013 2:39 PM, Philip, Avinash wrote:
> On Tue, Apr 02, 2013 at 14:03:34, Nori, Sekhar wrote:
>> On 3/25/2013 1:19 PM, Philip Avinash wrote:
>>> Add da850 EHRPWM & ECAP DT node.
>>> Also adds OF_DEV_AUXDATA for EHRPWM & ECAP driver to use EHRPWM & ECAP
>>> clock.
>>
>> This looks fine to me but I will wait for the bindings to get accepted
>> before taking this one.
>
> Sekhar,
>
> Binding document got accepted in PWM tree [1].
> Can you accept this patch?

 Can you also add the pinmux definitions and resend just this patch?
 Sorry I did not notice those were missing earlier.
>>>
>>> According to latest schematics, ECAP instance 2 being used for PWM backlight
>>> control. Should I add pin-mux only for ECAP2 or for all PWM instances?
>>
>> I meant add definitions in .dtsi. Since there is only one pin a given
>> functionality can be present on in DaVinci, it can be done in a board
>> independent manner.
> 
> I think here the expectation would be that .dtsi should populate the complete
> pin-mux for SOC and board files should just be able to re-use it (add it as a 
> phandler).

Yes, that's the idea.

> Also as per the above description .dtsi file will end up contain majorly 
> pin-mux info
> rather than the hardware data. Is it a good idea?

Pinmux is also hardware data, no? That's why it's present in DT.

> On looking da850.dtsi file NAND pins were defined for 8-bit part. 
> In case of NAND flash, the device might be sitting under different 
> chip-select or may
> have 16 bit part on  different boards. So pin-mux defined in soc.dtsi has to 
> be split
> separately for CS, DATA, Address.

The idea is to define pin groups that most of the time can be reused by
a .dts file as-is, and if any board-specific extra pins are needed
then they can be handled directly in the .dts files. But the common cases
don't have to be repeated in all boards. In the case of NAND, the CS and the
top 8 pins used for a 16-bit bus can be moved to a different group. So I
agree that instead of nand_cs3_pins we could have had nand_pins and moved the
CS definitions to another re-usable group.

> So it is always challenging to create pin-mux info in .dtsi file. So more 
> useful/meaningful
> way is to actually create pin-mux in board file rather in .dtsi file.

I don't see why it is so challenging. Repeating the same pinmux
information over multiple .dts files (while making copying errors along the
way) would be challenging. And it's not as if this is my original idea: imx
(and I think some others) are doing it as well. See how pinmux is defined in
imx53.dtsi and reused in a number of boards like evk, qsb, smd and so on.
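
To make that concrete, a hedged sketch of the define-once/reference-by-phandle
pattern (the labels, node names and the ECAP2 entry below are made up for
illustration; da850 uses the pinctrl-single binding, and the real register
values would come from the TRM):

	/* da850.dtsi: pin group defined once, board-independent */
	ecap2_pins: pinmux_ecap2_pins {
		pinctrl-single,bits = <
			/* offset, value, mask for the ECAP2/APWM2 pin */
		>;
	};

	/* board .dts: just reference the group */
	&ecap2 {
		pinctrl-names = "default";
		pinctrl-0 = <&ecap2_pins>;
		status = "okay";
	};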

>> See examples for other peripherals in existing
>> da850.dtsi file.
> 
> I have gone through .dtsi. But it didn't describe the complete pin-mux like 
> I2C1, MMC1, etc.

Pinmux should be added for whatever nodes are added, since pinmux is part
of the node.

> So the expectation here is only to add ECAP2 pin-mux. Is it correct?

No, please add pinmux information for all the IP nodes you are adding. I
am not insisting that you add all IP nodes at the same time. You can add
whatever you have tested.

Thanks,
Sekhar


linux-next: manual merge of the staging tree with the vfs tree

2013-04-09 Thread Stephen Rothwell
Hi Greg,

Today's linux-next merge of the staging tree got a conflict in
drivers/staging/vt6655/device_main.c between commit f805442e130c
("vt6655: slightly clean reading config file") from the vfs tree and
commits 915006cddc79 ("staging:vt6655:device_main: Whitespace cleanups")
and 5e0cc8a231be ("staging: vt6655: Convert to kernel brace style") from
the staging tree.

I fixed it up (I think - see below) and can carry the fix as necessary
(no action is required).

-- 
Cheers,
Stephen Rothwell  s...@canb.auug.org.au

diff --cc drivers/staging/vt6655/device_main.c
index a89ab9b,be4f6c2..000
--- a/drivers/staging/vt6655/device_main.c
+++ b/drivers/staging/vt6655/device_main.c
@@@ -2933,39 -2723,61 +2724,39 @@@ static inline u32 ether_crc(int length
  
  //2008-8-4  by chester
  static int Config_FileGetParameter(unsigned char *string,
-   unsigned char *dest, unsigned char *source)
+  unsigned char *dest, unsigned char *source)
  {
-   unsigned char buf1[100];
-   int source_len = strlen(source);
+   unsigned char buf1[100];
+   int source_len = strlen(source);
  
- memset(buf1,0,100);
- strcat(buf1, string);
- strcat(buf1, "=");
- source+=strlen(buf1);
+   memset(buf1, 0, 100);
+   strcat(buf1, string);
+   strcat(buf1, "=");
+   source += strlen(buf1);
  
-memcpy(dest,source,source_len-strlen(buf1));
-  return true;
+   memcpy(dest, source, source_len - strlen(buf1));
+   return true;
  }
  
- int Config_FileOperation(PSDevice pDevice,bool fwrite,unsigned char 
*Parameter)
 -int Config_FileOperation(PSDevice pDevice, bool fwrite, unsigned char 
*Parameter) {
 -  unsigned char *config_path = CONFIG_PATH;
 -  unsigned char *buffer = NULL;
++int Config_FileOperation(PSDevice pDevice, bool fwrite, unsigned char 
*Parameter)
 +{
 +  unsigned char *buffer = kmalloc(1024, GFP_KERNEL);
unsigned char tmpbuffer[20];
 -  struct file   *filp = NULL;
 -  mm_segment_t old_fs = get_fs();
 -  //int oldfsuid=0,oldfsgid=0;
 +  struct file *file;
-   int result=0;
+   int result = 0;
  
 -  set_fs(KERNEL_DS);
 -
 -  /* Can't do this anymore, so we rely on correct filesystem permissions:
 -  //Make sure a caller can read or write power as root
 -  oldfsuid=current->cred->fsuid;
 -  oldfsgid=current->cred->fsgid;
 -  current->cred->fsuid = 0;
 -  current->cred->fsgid = 0;
 -  */
 -
 -  //open file
 -  filp = filp_open(config_path, O_RDWR, 0);
 -  if (IS_ERR(filp)) {
 -  printk("Config_FileOperation:open file fail?\n");
 -  result = -1;
 -  goto error2;
 -  }
 -
 -  if (!(filp->f_op) || !(filp->f_op->read) || !(filp->f_op->write)) {
 -  printk("file %s cann't readable or writable?\n", config_path);
 -  result = -1;
 -  goto error1;
 -  }
 -
 -  buffer = kmalloc(1024, GFP_KERNEL);
 -  if (buffer == NULL) {
 +  if (!buffer) {
printk("allocate mem for file fail?\n");
 -  result = -1;
 -  goto error1;
 +  return -1;
 +  }
 +  file = filp_open(CONFIG_PATH, O_RDONLY, 0);
 +  if (IS_ERR(file)) {
 +  kfree(buffer);
 +  printk("Config_FileOperation:open file fail?\n");
 +  return -1;
}
  
 -  if (filp->f_op->read(filp, buffer, 1024, >f_pos) < 0) {
 +  if (kernel_read(file, 0, buffer, 1024) < 0) {
printk("read file error?\n");
result = -1;
goto error1;




Re: linux-next: Tree for Apr 9 [cpufreq: NULL pointer deref]

2013-04-09 Thread Sedat Dilek
On Wed, Apr 10, 2013 at 7:41 AM, Sedat Dilek  wrote:
> On Tue, Apr 9, 2013 at 6:51 PM, Viresh Kumar  wrote:
>> On 9 April 2013 21:38, Sedat Dilek  wrote:
>>> With x=3 the system gets into an unusable state.
>>>
>>>  root# echo 0 > /sys/devices/system/cpu/cpu3/online
>>>
>>> I could not write my reply and had to do a hard/cold reboot.
>>> The dmesg log I saw looked similar to my digicam shot.
>>
>> A few things I need from you. First is the output of cpufreq-info. Then
>> all the steps you did to reproduce the above. Did you remove any other CPUs?
>>
>
> Here is the output of cpufreq-info of the stable distro-kernel I am using.
> If you need the one from the "BROKEN" kernel, please let me know.
>
> - Sedat -

Hmm, I see that the kernel sources themselves ship a...

./tools/power/cpupower/utils/cpufreq-info.c

IIRC the current 'make deb-pkg' does not build the tools, but I have seen
a patch on the linux-kbuild ML.
Can I build cpufreq-info.c afterwards? Do I need an already-compiled build dir?

Thanks in advance for answering my questions.

BTW, I have found a nice article on LWN "cpupowerutils - cpufrequtils
extended with quite some features" (see [1]).

- Sedat -

[1] http://lwn.net/Articles/433002/


Re: [PATCH v2 0/3] Support memory hot-delete to boot memory

2013-04-09 Thread David Rientjes
On Mon, 8 Apr 2013, Toshi Kani wrote:

> > So we don't need this new code if CONFIG_MEMORY_HOTPLUG=n?  If so, can
> > we please arrange for it to not be present if the user doesn't need it?
> 
> Good point!  Yes, since the new function is intended for memory
> hot-delete and is only called from __remove_pages() in
> mm/memory_hotplug.c, it should be added as #ifdef CONFIG_MEMORY_HOTPLUG
> in PATCH 2/3.
> 
> I will make the change, and send an updated patch to PATCH 2/3.
> 

It should actually depend on CONFIG_MEMORY_HOTREMOVE, but the pseries 
OF_RECONFIG_DETACH_NODE code seems to be the only code that doesn't 
make that distinction.  CONFIG_MEMORY_HOTREMOVE acts as a wrapper to 
protect configs that don't have ARCH_ENABLE_MEMORY_HOTREMOVE, so we'll 
want to keep it around and presumably that powerpc code depends on it as 
well.
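
In other words, the new helper would presumably end up built only for
hot-remove configurations, roughly like this (the function name and body here
are placeholders, not the ones from the actual patch):

#ifdef CONFIG_MEMORY_HOTREMOVE
/* only referenced from __remove_pages() in mm/memory_hotplug.c */
static int adjust_boot_memory_resource(u64 start, u64 size)
{
	/* ... find and shrink or split the matching iomem resource ... */
	return 0;
}
#endif /* CONFIG_MEMORY_HOTREMOVE */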


Re: [PATCH 1/3] mm, vmscan: count accidental reclaimed pages failed to put into lru

2013-04-09 Thread Joonsoo Kim
Hello, Minchan.

On Tue, Apr 09, 2013 at 02:55:14PM +0900, Minchan Kim wrote:
> Hello Joonsoo,
> 
> On Tue, Apr 09, 2013 at 10:21:16AM +0900, Joonsoo Kim wrote:
> > In shrink_(in)active_list(), we can fail to put into lru, and these pages
> > are reclaimed accidentally. Currently, these pages are not counted
> > for sc->nr_reclaimed, but with this information, we can stop to reclaim
> > earlier, so can reduce overhead of reclaim.
> > 
> > Signed-off-by: Joonsoo Kim 
> 
> Nice catch!
> 
> But this patch handles a very narrow corner case and makes the reclaim
> function's name rather stupid, so I'd like to see the text size change
> after we apply this patch. Other nitpicks below.

Ah... Yes.
I can re-work it to add number to sc->nr_reclaimed directly for both cases,
shrink_active_list() and age_active_anon().

> 
> > 
> > diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> > index 0f615eb..5d60ae0 100644
> > --- a/include/linux/gfp.h
> > +++ b/include/linux/gfp.h
> > @@ -365,7 +365,7 @@ void *alloc_pages_exact_nid(int nid, size_t size, gfp_t 
> > gfp_mask);
> >  extern void __free_pages(struct page *page, unsigned int order);
> >  extern void free_pages(unsigned long addr, unsigned int order);
> >  extern void free_hot_cold_page(struct page *page, int cold);
> > -extern void free_hot_cold_page_list(struct list_head *list, int cold);
> > +extern unsigned long free_hot_cold_page_list(struct list_head *list, int 
> > cold);
> >  
> >  extern void __free_memcg_kmem_pages(struct page *page, unsigned int order);
> >  extern void free_memcg_kmem_pages(unsigned long addr, unsigned int order);
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 8fcced7..a5f3952 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -1360,14 +1360,18 @@ out:
> >  /*
> >   * Free a list of 0-order pages
> >   */
> > -void free_hot_cold_page_list(struct list_head *list, int cold)
> > +unsigned long free_hot_cold_page_list(struct list_head *list, int cold)
> >  {
> > +   unsigned long nr_reclaimed = 0;
> 
> How about nr_free or nr_freed for consistent with function title?

Okay.

> 
> > struct page *page, *next;
> >  
> > list_for_each_entry_safe(page, next, list, lru) {
> > trace_mm_page_free_batched(page, cold);
> > free_hot_cold_page(page, cold);
> > +   nr_reclaimed++;
> > }
> > +
> > +   return nr_reclaimed;
> >  }
> >  
> >  /*
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 88c5fed..eff2927 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -915,7 +915,6 @@ static unsigned long shrink_page_list(struct list_head 
> > *page_list,
> >  */
> > __clear_page_locked(page);
> >  free_it:
> > -   nr_reclaimed++;
> >  
> > /*
> >  * Is there need to periodically free_page_list? It would
> > @@ -954,7 +953,7 @@ keep:
> > if (nr_dirty && nr_dirty == nr_congested && global_reclaim(sc))
> > zone_set_flag(zone, ZONE_CONGESTED);
> >  
> > -   free_hot_cold_page_list(_pages, 1);
> > +   nr_reclaimed += free_hot_cold_page_list(_pages, 1);
> 
> Nice cleanup.
> 
> >  
> > list_splice(_pages, page_list);
> > count_vm_events(PGACTIVATE, pgactivate);
> > @@ -1321,7 +1320,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct 
> > lruvec *lruvec,
> > if (nr_taken == 0)
> > return 0;
> >  
> > -   nr_reclaimed = shrink_page_list(_list, zone, sc, TTU_UNMAP,
> > +   nr_reclaimed += shrink_page_list(_list, zone, sc, TTU_UNMAP,
> > _dirty, _writeback, false);
> 
> Do you have any reason to change?
> To me, '=' is more clear to initialize the variable.
> When I see above, I have to look through above lines to catch where code
> used the nr_reclaimed.
> 

There is no reason, I will change it.

> >  
> > spin_lock_irq(>lru_lock);
> > @@ -1343,7 +1342,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct 
> > lruvec *lruvec,
> >  
> > spin_unlock_irq(>lru_lock);
> >  
> > -   free_hot_cold_page_list(_list, 1);
> > +   nr_reclaimed += free_hot_cold_page_list(_list, 1);
> 
> How about considering vmstat, too?
> It could be minor but you are considering freed page as
> reclaim context. (ie, sc->nr_reclaimed) so it would be more appropriate.

I don't understand what you mean.
Please explain more what you have in mind :)

> 
> >  
> > /*
> >  * If reclaim is isolating dirty pages under writeback, it implies
> > @@ -1438,7 +1437,7 @@ static void move_active_pages_to_lru(struct lruvec 
> > *lruvec,
> > __count_vm_events(PGDEACTIVATE, pgmoved);
> >  }
> >  
> > -static void shrink_active_list(unsigned long nr_to_scan,
> > +static unsigned long shrink_active_list(unsigned long nr_to_scan,
> >struct lruvec *lruvec,
> >struct scan_control *sc,
> >enum lru_list lru)
> > @@ -1534,7 +1533,7 @@ static void shrink_active_list(unsigned long 
> > nr_to_scan,
> > 

Re: linux-next: Tree for Apr 9 [cpufreq: NULL pointer deref]

2013-04-09 Thread Sedat Dilek
On Tue, Apr 9, 2013 at 6:51 PM, Viresh Kumar  wrote:
> On 9 April 2013 21:38, Sedat Dilek  wrote:
>> With x=3 the system gets into an unusable state.
>>
>>  root# echo 0 > /sys/devices/system/cpu/cpu3/online
>>
>> I could not write my reply and had to do a hard/cold reboot.
>> The dmesg log I saw looked similar to my digicam shot.
>
> A few things I need from you. First is the output of cpufreq-info. Then
> all the steps you did to reproduce the above. Did you remove any other CPUs?
>

Here is the output of cpufreq-info of the stable distro-kernel I am using.
If you need the one from the "BROKEN" kernel, please let me know.

- Sedat -
cpufrequtils 007: cpufreq-info (C) Dominik Brodowski 2004-2009
Report errors and bugs to cpuf...@vger.kernel.org, please.
analyzing CPU 0:
  driver: acpi-cpufreq
  CPUs which run at the same hardware frequency: 0 1 2 3
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency: 10.0 us.
  hardware limits: 800 MHz - 1.60 GHz
  available frequency steps: 1.60 GHz, 1.60 GHz, 1.50 GHz, 1.40 GHz, 1.30 GHz, 
1.20 GHz, 1.10 GHz, 1000 MHz, 900 MHz, 800 MHz
  available cpufreq governors: conservative, ondemand, userspace, powersave, 
performance
  current policy: frequency should be within 800 MHz and 1.60 GHz.
  The governor "ondemand" may decide which speed to use
  within this range.
  current CPU frequency is 1.60 GHz (asserted by call to hardware).
  cpufreq stats: 1.60 GHz:18.87%, 1.60 GHz:0.89%, 1.50 GHz:0.45%, 1.40 
GHz:0.89%, 1.30 GHz:1.15%, 1.20 GHz:1.75%, 1.10 GHz:1.83%, 1000 MHz:1.91%, 900 
MHz:0.92%, 800 MHz:71.34%  (6355)
analyzing CPU 1:
  driver: acpi-cpufreq
  CPUs which run at the same hardware frequency: 0 1 2 3
  CPUs which need to have their frequency coordinated by software: 1
  maximum transition latency: 10.0 us.
  hardware limits: 800 MHz - 1.60 GHz
  available frequency steps: 1.60 GHz, 1.60 GHz, 1.50 GHz, 1.40 GHz, 1.30 GHz, 
1.20 GHz, 1.10 GHz, 1000 MHz, 900 MHz, 800 MHz
  available cpufreq governors: conservative, ondemand, userspace, powersave, 
performance
  current policy: frequency should be within 800 MHz and 1.60 GHz.
  The governor "ondemand" may decide which speed to use
  within this range.
  current CPU frequency is 1.60 GHz (asserted by call to hardware).
  cpufreq stats: 1.60 GHz:18.68%, 1.60 GHz:3.85%, 1.50 GHz:0.35%, 1.40 
GHz:0.52%, 1.30 GHz:0.74%, 1.20 GHz:0.72%, 1.10 GHz:0.77%, 1000 MHz:1.02%, 900 
MHz:0.44%, 800 MHz:72.91%  (3815)
analyzing CPU 2:
  driver: acpi-cpufreq
  CPUs which run at the same hardware frequency: 0 1 2 3
  CPUs which need to have their frequency coordinated by software: 2
  maximum transition latency: 10.0 us.
  hardware limits: 800 MHz - 1.60 GHz
  available frequency steps: 1.60 GHz, 1.60 GHz, 1.50 GHz, 1.40 GHz, 1.30 GHz, 
1.20 GHz, 1.10 GHz, 1000 MHz, 900 MHz, 800 MHz
  available cpufreq governors: conservative, ondemand, userspace, powersave, 
performance
  current policy: frequency should be within 800 MHz and 1.60 GHz.
  The governor "ondemand" may decide which speed to use
  within this range.
  current CPU frequency is 1.60 GHz (asserted by call to hardware).
  cpufreq stats: 1.60 GHz:21.21%, 1.60 GHz:0.26%, 1.50 GHz:0.34%, 1.40 
GHz:0.48%, 1.30 GHz:0.59%, 1.20 GHz:0.73%, 1.10 GHz:0.75%, 1000 MHz:1.16%, 900 
MHz:0.56%, 800 MHz:73.91%  (4108)
analyzing CPU 3:
  driver: acpi-cpufreq
  CPUs which run at the same hardware frequency: 0 1 2 3
  CPUs which need to have their frequency coordinated by software: 3
  maximum transition latency: 10.0 us.
  hardware limits: 800 MHz - 1.60 GHz
  available frequency steps: 1.60 GHz, 1.60 GHz, 1.50 GHz, 1.40 GHz, 1.30 GHz, 
1.20 GHz, 1.10 GHz, 1000 MHz, 900 MHz, 800 MHz
  available cpufreq governors: conservative, ondemand, userspace, powersave, 
performance
  current policy: frequency should be within 800 MHz and 1.60 GHz.
  The governor "ondemand" may decide which speed to use
  within this range.
  current CPU frequency is 800 MHz (asserted by call to hardware).
  cpufreq stats: 1.60 GHz:16.28%, 1.60 GHz:0.28%, 1.50 GHz:0.22%, 1.40 
GHz:0.30%, 1.30 GHz:0.43%, 1.20 GHz:0.69%, 1.10 GHz:0.79%, 1000 MHz:1.17%, 900 
MHz:0.62%, 800 MHz:79.22%  (2995)


Re: [PATCH 00/10] Add Intel Atom S1200 series ioatdma support

2013-04-09 Thread Vinod Koul
On Wed, Apr 10, 2013 at 10:30:59AM +0530, Vinod Koul wrote:
> On Tue, Apr 09, 2013 at 05:53:59PM -0700, Dan Williams wrote:
> > On Tue, Apr 9, 2013 at 12:28 AM, Vinod Koul  wrote:
> > > Are you okay with the series. Merge window is very close...
> > >
> > >
> > Hi Vinod, thanks for the ping.
> > 
> > Chatted with Dave patches 1-3 and 5-8, and 10 are reviewed/acked.  For
> > patch 4 and 9 we think we have a slightly cleaner way to organize the quirk
> > handling and an update will be coming.
> Dave,
> 
> I tried these, but they fail to apply on the for-linus branch of my tree.
> Can you please rebase these 8 patches and resend.
Okay, I forgot about the HSW ID patch. I applied that, and then these worked
until the last two; you would need to rebase only those two.

I have applied the rest.
--
~Vinod


Re: [PATCH] dmaengine: omap-dma: Start DMA without delay for cyclic channels

2013-04-09 Thread Vinod Koul
On Tue, Apr 09, 2013 at 04:33:06PM +0200, Peter Ujfalusi wrote:
> cyclic DMA is only used by audio which needs DMA to be started without a
> delay.
> If the DMA for audio is started using the tasklet we experience random
> channel switch (to be more precise: channel shift).
> 
> Reported-by: Peter Meerwald 
> CC: sta...@vger.kernel.org  # v3.7+
> Signed-off-by: Peter Ujfalusi 
> Acked-by: Santosh Shilimkar 
> Acked-by: Russell King 
> ---
> Hi Vinod,
> 
> Would it be possible to send this patch for 3.9. The channel shift (or switch)
> issue in audio has been noticed recently and it turns out that it has been
> present since 3.7 kernel.
> It would be great if 3.9 kernel could work correctly out of box...
Applied to fixes. I will send this to linus in a day...

--
~Vinod
> 
> Changes since RFCv2:
> - added Acked-by from Santosh and Russell
> 
> Thank you,
> Peter
> 
>  drivers/dma/omap-dma.c | 20 ++--
>  1 file changed, 14 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/dma/omap-dma.c b/drivers/dma/omap-dma.c
> index 2ea3d7e..ec3fc4f 100644
> --- a/drivers/dma/omap-dma.c
> +++ b/drivers/dma/omap-dma.c
> @@ -282,12 +282,20 @@ static void omap_dma_issue_pending(struct dma_chan 
> *chan)
>  
>   spin_lock_irqsave(>vc.lock, flags);
>   if (vchan_issue_pending(>vc) && !c->desc) {
> - struct omap_dmadev *d = to_omap_dma_dev(chan->device);
> - spin_lock(>lock);
> - if (list_empty(>node))
> - list_add_tail(>node, >pending);
> - spin_unlock(>lock);
> - tasklet_schedule(>task);
> + /*
> +  * c->cyclic is used only by audio and in this case the DMA need
> +  * to be started without delay.
> +  */
> + if (!c->cyclic) {
> + struct omap_dmadev *d = to_omap_dma_dev(chan->device);
> + spin_lock(>lock);
> + if (list_empty(>node))
> + list_add_tail(>node, >pending);
> + spin_unlock(>lock);
> + tasklet_schedule(>task);
> + } else {
> + omap_dma_start_desc(c);
> + }
>   }
>   spin_unlock_irqrestore(>vc.lock, flags);
>  }
> -- 
> 1.8.1.5
> 


Re: [PATCH 00/10] Add Intel Atom S1200 series ioatdma support

2013-04-09 Thread Jiang, Dave

On Apr 9, 2013, at 10:30 PM, "Koul, Vinod"  wrote:

> On Tue, Apr 09, 2013 at 05:53:59PM -0700, Dan Williams wrote:
>> On Tue, Apr 9, 2013 at 12:28 AM, Vinod Koul  wrote:
>>> Are you okay with the series. Merge window is very close...
>>> 
>>> 
>> Hi Vinod, thanks for the ping.
>> 
>> Chatted with Dave patches 1-3 and 5-8, and 10 are reviewed/acked.  For
>> patch 4 and 9 we think we have a slightly cleaner way to organize the quirk
>> handling and an update will be coming.
> Dave,
> 
> I tried these, but they fail to apply on the for-linus branch of my tree.
> Can you please rebase these 8 patches and resend.
> 
> --
> ~Vinod

Yes, I will rebase and send them with the other two changed patches. It depends
on the Haswell patch. Could it be that?


Re: [RFC PATCH v2 6/6] powerpc: Use generic code for exception handling

2013-04-09 Thread Li Zhong
On Wed, 2013-04-10 at 14:56 +1000, Michael Ellerman wrote:
> On Fri, Mar 29, 2013 at 06:00:21PM +0800, Li Zhong wrote:
> > After the exception handling moved to generic code, and some changes in
> ...
> > diff --git a/arch/powerpc/mm/hash_utils_64.c 
> > b/arch/powerpc/mm/hash_utils_64.c
> > index 360fba8..eeab30f 100644
> > --- a/arch/powerpc/mm/hash_utils_64.c
> > +++ b/arch/powerpc/mm/hash_utils_64.c
> > @@ -33,6 +33,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  
> >  #include 
> >  #include 
> > @@ -56,7 +57,6 @@
> >  #include 
> >  #include 
> >  #include 
> > -#include 
> >  
> >  #ifdef DEBUG
> >  #define DBG(fmt...) udbg_printf(fmt)
> > @@ -919,13 +919,17 @@ int hash_page(unsigned long ea, unsigned long access, 
> > unsigned long trap)
> > const struct cpumask *tmp;
> > int rc, user_region = 0, local = 0;
> > int psize, ssize;
> > +   enum ctx_state prev_state;
> > +
> > +   prev_state = exception_enter();
> >  
> > DBG_LOW("hash_page(ea=%016lx, access=%lx, trap=%lx\n",
> > ea, access, trap);
> >  
> > if ((ea & ~REGION_MASK) >= PGTABLE_RANGE) {
> > DBG_LOW(" out of pgtable range !\n");
> > -   return 1;
> > +   rc = 1;
> > +   goto exit;
> > }
> >  
> > /* Get region & vsid */
> 
> This no longer applies on mainline, please send an updated version.

Yes, for the current mainline (powerpc tree), only the previous five patches
can be applied. The dependency of this patch is currently in the tip tree,
and it seems it will be in for 3.10.

There are some more details in the cover letter (#0):

"I assume these patches would get in through powerpc tree, so I didn't
combine the new patch (#6) with the original one (#2). So that if
powerpc tree picks these, it could pick the first five patches, and
apply patch #6 later when the dependency enters into powerpc tree (maybe
on some 3.10-rcs)."

Thanks, Zhong

> cheers
> 




RE: [PATCH v3 3/3] ARM: davinci: da850: add EHRPWM & ECAP DT node

2013-04-09 Thread Philip, Avinash
On Tue, Apr 09, 2013 at 17:05:25, Nori, Sekhar wrote:
> On 4/9/2013 2:12 PM, Philip, Avinash wrote:
> > On Mon, Apr 08, 2013 at 18:39:57, Nori, Sekhar wrote:
> >>
> >> On 4/8/2013 2:39 PM, Philip, Avinash wrote:
> >>> On Tue, Apr 02, 2013 at 14:03:34, Nori, Sekhar wrote:
>  On 3/25/2013 1:19 PM, Philip Avinash wrote:
> > Add da850 EHRPWM & ECAP DT node.
> > Also adds OF_DEV_AUXDATA for EHRPWM & ECAP driver to use EHRPWM & ECAP
> > clock.
> 
>  This looks fine to me but I will wait for the bindings to get accepted
>  before taking this one.
> >>>
> >>> Sekhar,
> >>>
> >>> Binding document got accepted in PWM tree [1].
> >>> Can you accept this patch?
> >>
> >> Can you also add the pinmux definitions and resend just this patch?
> >> Sorry I did not notice those were missing earlier.
> > 
> > According to latest schematics, ECAP instance 2 being used for PWM backlight
> > control. Should I add pin-mux only for ECAP2 or for all PWM instances?
> 
> I meant add definitions in .dtsi. Since there is only one pin a given
> functionality can be present on in DaVinci, it can be done in a board
> independent manner.

I think here the expectation would be that the .dtsi should populate the complete
pin-mux for the SoC, and board files should just be able to re-use it (add it as a
phandle).
Also, as per the above description, the .dtsi file will end up containing mostly
pin-mux info rather than the hardware data. Is that a good idea?

Looking at the da850.dtsi file, the NAND pins were defined for the 8-bit part.
In the case of NAND flash, the device might be sitting under a different chip-select
or may be a 16-bit part on different boards. So the pin-mux defined in the SoC .dtsi
has to be split separately for CS, data and address.

So it is always challenging to create pin-mux info in the .dtsi file. A more
useful/meaningful way is to actually create the pin-mux in the board file rather
than in the .dtsi file.

> See examples for other peripherals in existing
> da850.dtsi file.

I have gone through the .dtsi, but it didn't describe the complete pin-mux for
peripherals like I2C1, MMC1, etc.
So is the expectation here only to add the ECAP2 pin-mux? Is that correct?

Thanks
Avinash

> 
> Thanks,
> Sekhar
> 



Re: [PATCH v3 02/22] x86, microcode: Use common get_ramdisk_image()

2013-04-09 Thread Tang Chen

On 04/05/2013 07:46 AM, Yinghai Lu wrote:

Use common get_ramdisk_image() to get ramdisk start phys address.

We need this to get the correct ramdisk address for a 64-bit bzImage whose
initrd can be loaded above 4G by kexec-tools.

Signed-off-by: Yinghai Lu
Cc: Fenghua Yu
Acked-by: Tejun Heo
---
  arch/x86/kernel/microcode_intel_early.c | 8 
  1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/microcode_intel_early.c 
b/arch/x86/kernel/microcode_intel_early.c
index d893e8e..ea57bd8 100644
--- a/arch/x86/kernel/microcode_intel_early.c
+++ b/arch/x86/kernel/microcode_intel_early.c
@@ -742,8 +742,8 @@ load_ucode_intel_bsp(void)
struct boot_params *boot_params_p;

boot_params_p = (struct boot_params *)__pa_nodebug(_params);
-   ramdisk_image = boot_params_p->hdr.ramdisk_image;
-   ramdisk_size  = boot_params_p->hdr.ramdisk_size;
+   ramdisk_image = get_ramdisk_image(boot_params_p);
+   ramdisk_size  = get_ramdisk_image(boot_params_p);


Should be get_ramdisk_size(boot_params_p)?


initrd_start_early = ramdisk_image;
initrd_end_early = initrd_start_early + ramdisk_size;

@@ -752,8 +752,8 @@ load_ucode_intel_bsp(void)
(unsigned long *)__pa_nodebug(_saved_in_initrd),
initrd_start_early, initrd_end_early,);
  #else
-   ramdisk_image = boot_params.hdr.ramdisk_image;
-   ramdisk_size  = boot_params.hdr.ramdisk_size;
+   ramdisk_image = get_ramdisk_image(_params);
+   ramdisk_size  = get_ramdisk_size(_params);
initrd_start_early = ramdisk_image + PAGE_OFFSET;
initrd_end_early = initrd_start_early + ramdisk_size;
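
I.e. in the first hunk the corrected pair of lines would presumably read:

	ramdisk_image = get_ramdisk_image(boot_params_p);
	ramdisk_size  = get_ramdisk_size(boot_params_p);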




Re: [PATCH 00/10] Add Intel Atom S1200 series ioatdma support

2013-04-09 Thread Vinod Koul
On Tue, Apr 09, 2013 at 05:53:59PM -0700, Dan Williams wrote:
> On Tue, Apr 9, 2013 at 12:28 AM, Vinod Koul  wrote:
> > Are you okay with the series. Merge window is very close...
> >
> >
> Hi Vinod, thanks for the ping.
> 
> Chatted with Dave patches 1-3 and 5-8, and 10 are reviewed/acked.  For
> patch 4 and 9 we think we have a slightly cleaner way to organize the quirk
> handling and an update will be coming.
Dave,

I tried these, but they fail to apply on the for-linus branch of my tree.
Can you please rebase these 8 patches and resend.

--
~Vinod


Re: [PATCH] sched: wake-affine throttle

2013-04-09 Thread Alex Shi
On 04/10/2013 01:11 PM, Michael Wang wrote:
>> > BTW, could you try the kbuild, hackbench and aim for this?
> Sure, the patch has already been tested with aim7, also the hackbench,
> kbench, and ebizzy, no notable changes on my box with the default 1ms
> interval.

That's fine.

-- 
Thanks Alex


Re: [PATCHv3] driver: serial: prevent UART console idle on suspend while using "no_console_suspend"

2013-04-09 Thread Sourav Poddar

Hi Russell,
On Monday 08 April 2013 10:44 PM, Russell King - ARM Linux wrote:

On Fri, Apr 05, 2013 at 06:45:33PM +0530, Sourav Poddar wrote:

With DT boot, UART wakeup after suspend is non-functional while using
"no_console_suspend" in the bootargs. With "no_console_suspend" used, we
should prevent the runtime suspend of the UART port which is being used
as a console.

Cc: Santosh Shilimkar
Cc: Felipe Balbi
Cc: Rajendra nayak
Tested on omap5430evm, omap4430sdp.

Signed-off-by: Sourav Poddar
---
v2->v3
Based on Kevin Hilman and Santosh Shilimkar comments, modified
serial core/driver layer to bypass runtime suspend
for console uart while using "no_console_suspend".

This patch is based on Santosh Shilimkar serial patch[1]

Rather than introducing this "port_is_console" thing, please move
uart_console() into the serial_core.h header file, making it an inline
function, and use that in omap-serial.c.

Remember to fix drivers/tty/serial/mpc52xx_uart.c as well for that change.
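
(For reference, the helper being discussed currently lives as a static
function in drivers/tty/serial/serial_core.c, roughly along these lines;
moving it into include/linux/serial_core.h as an inline would make it usable
from omap-serial.c and mpc52xx_uart.c as well:)

static inline int uart_console(struct uart_port *port)
{
	return port->cons && port->cons->index == port->line;
}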

Thanks for the pointer. Will take care of your suggestions
in the next version.

~Sourav


Re: [PATCH 2/3] mm, slub: count freed pages via rcu as this task's reclaimed_slab

2013-04-09 Thread Joonsoo Kim
Hello, Christoph.

On Tue, Apr 09, 2013 at 02:28:06PM +, Christoph Lameter wrote:
> On Tue, 9 Apr 2013, Joonsoo Kim wrote:
> 
> > Currently, freed pages via rcu is not counted for reclaimed_slab, because
> > it is freed in rcu context, not current task context. But, this free is
> > initiated by this task, so counting this into this task's reclaimed_slab
> > is meaningful to decide whether we continue reclaim, or not.
> > So change code to count these pages for this task's reclaimed_slab.
> 
> slab->reclaim_state guides the reclaim actions in vmscan.c. With this
> patch slab->reclaim_state could get quite a high value without new pages being
> available for allocation. slab->reclaim_state will only be updated
> when the RCU period ends.

Okay.

In addition, there are only a few places that use SLAB_DESTROY_BY_RCU.
I will drop this patch [2/3] and [3/3] for the next spin.
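
(For context, the accounting being discussed looks roughly like this on the
vmscan.c side, following the shrink_slab() call sites quoted elsewhere in this
digest; only synchronously freed pages are credited:)

	struct reclaim_state *reclaim_state = current->reclaim_state;

	reclaim_state->reclaimed_slab = 0;
	shrink_slab(&shrink, sc->nr_scanned, lru_pages);
	/* pages freed directly by the shrinkers show up here; pages whose
	 * freeing is deferred to an RCU callback do not, until the grace
	 * period ends */
	sc->nr_reclaimed += reclaim_state->reclaimed_slab;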

Thanks.



Re: [PATCH 08/10] mm: vmscan: Have kswapd shrink slab only once per priority

2013-04-09 Thread Joonsoo Kim
Hello, Dave.

On Wed, Apr 10, 2013 at 11:07:34AM +1000, Dave Chinner wrote:
> On Tue, Apr 09, 2013 at 12:13:59PM +0100, Mel Gorman wrote:
> > On Tue, Apr 09, 2013 at 03:53:25PM +0900, Joonsoo Kim wrote:
> > 
> > > I think that outside of zone loop is better place to run shrink_slab(),
> > > because shrink_slab() is not directly related to a specific zone.
> > > 
> > 
> > This is true and has been the case for a long time. The slab shrinkers
> > are not zone aware and it is complicated by the fact that slab usage can
> > indirectly pin memory on other zones.
> ..
> > > And this is a question not related to this patch.
> > > Why nr_slab is used here to decide zone->all_unreclaimable?
> > 
> > Slab is not directly associated with a zone, but as reclaiming slab can
> > free memory from unpredictable zones we do not consider a zone to be
> > fully unreclaimable until we cannot shrink slab any more.
> 
> This is something the numa aware shrinkers will greatly help with -
> instead of being a global shrink it becomes a
> node-the-zone-belongs-to shrink, and so
> 
> > You may be thinking that this is extremely heavy handed and you're
> > right, it is.
> 
> ... it is much less heavy handed than the current code...
> 
> > > nr_slab is not directly related whether a specific zone is reclaimable
> > > or not, and, moreover, nr_slab is not directly related to number of
> > > reclaimed pages. It just say some objects in the system are freed.
> > > 
> > 
> > All true, it's the indirect relation between slab objects and the memory
> > that is freed when slab objects are reclaimed that has to be taken into
> > account.
> 
> Node awareness within the shrinker infrastructure and LRUs make the
> relationship much more direct ;)

Yes, I think so ;)

Thanks.

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> da...@fromorbit.com
> 


Re: [PATCH 08/10] mm: vmscan: Have kswapd shrink slab only once per priority

2013-04-09 Thread Joonsoo Kim
Hello, Mel.

On Tue, Apr 09, 2013 at 12:13:59PM +0100, Mel Gorman wrote:
> On Tue, Apr 09, 2013 at 03:53:25PM +0900, Joonsoo Kim wrote:
> > Hello, Mel.
> > Sorry for too late question.
> > 
> 
> No need to apologise at all.
> 
> > On Sun, Mar 17, 2013 at 01:04:14PM +, Mel Gorman wrote:
> > > If kswaps fails to make progress but continues to shrink slab then it'll
> > > either discard all of slab or consume CPU uselessly scanning shrinkers.
> > > This patch causes kswapd to only call the shrinkers once per priority.
> > > 
> > > Signed-off-by: Mel Gorman 
> > > ---
> > >  mm/vmscan.c | 28 +---
> > >  1 file changed, 21 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > index 7d5a932..84375b2 100644
> > > --- a/mm/vmscan.c
> > > +++ b/mm/vmscan.c
> > > @@ -2661,9 +2661,10 @@ static bool prepare_kswapd_sleep(pg_data_t *pgdat, 
> > > int order, long remaining,
> > >   */
> > >  static bool kswapd_shrink_zone(struct zone *zone,
> > >  struct scan_control *sc,
> > > -unsigned long lru_pages)
> > > +unsigned long lru_pages,
> > > +bool shrinking_slab)
> > >  {
> > > - unsigned long nr_slab;
> > > + unsigned long nr_slab = 0;
> > >   struct reclaim_state *reclaim_state = current->reclaim_state;
> > >   struct shrink_control shrink = {
> > >   .gfp_mask = sc->gfp_mask,
> > > @@ -2673,9 +2674,15 @@ static bool kswapd_shrink_zone(struct zone *zone,
> > >   sc->nr_to_reclaim = max(SWAP_CLUSTER_MAX, high_wmark_pages(zone));
> > >   shrink_zone(zone, sc);
> > >  
> > > - reclaim_state->reclaimed_slab = 0;
> > > - nr_slab = shrink_slab(, sc->nr_scanned, lru_pages);
> > > - sc->nr_reclaimed += reclaim_state->reclaimed_slab;
> > > + /*
> > > +  * Slabs are shrunk for each zone once per priority or if the zone
> > > +  * being balanced is otherwise unreclaimable
> > > +  */
> > > + if (shrinking_slab || !zone_reclaimable(zone)) {
> > > + reclaim_state->reclaimed_slab = 0;
> > > + nr_slab = shrink_slab(, sc->nr_scanned, lru_pages);
> > > + sc->nr_reclaimed += reclaim_state->reclaimed_slab;
> > > + }
> > >  
> > >   if (nr_slab == 0 && !zone_reclaimable(zone))
> > >   zone->all_unreclaimable = 1;
> > 
> > Why shrink_slab() is called here?
> 
> Preserves existing behaviour.

Yes, but with this patch the existing behaviour is changed, that is, we call
shrink_slab() once per priority. Then there is no reason for this function
to be called here. How about separating it out and executing it outside of the
zone loop?

We can do it with another zone loop in order to decide
zone->all_unreclaimable. Below is pseudo code from a quick thought.


for each zone
	shrink_zone()
end

nr_slab = shrink_slab()

if (nr_slab == 0) {
	for each zone
		if (!zone_reclaimable)
			zone->all_unreclaimable = 1
		end
	end
}

> 
> > I think that outside of zone loop is better place to run shrink_slab(),
> > because shrink_slab() is not directly related to a specific zone.
> > 
> 
> This is true and has been the case for a long time. The slab shrinkers
> are not zone aware and it is complicated by the fact that slab usage can
> indirectly pin memory on other zones. Consider for example a slab object
> that is an inode entry that is allocated from the Normal zone on a
> 32-bit machine. Reclaiming may free memory from the Highmem zone.
> 
> It's less obvious a problem on 64-bit machines but freeing slab objects
> from a zone like DMA32 can indirectly free memory from the Normal zone or
> even another node entirely.
> 
> > And this is a question not related to this patch.
> > Why nr_slab is used here to decide zone->all_unreclaimable?
> 
> Slab is not directly associated with a zone, but as reclaiming slab can
> free memory from unpredictable zones we do not consider a zone to be
> fully unreclaimable until we cannot shrink slab any more.
> 
> You may be thinking that this is extremely heavy handed and you're
> right, it is.
> 
> > nr_slab is not directly related whether a specific zone is reclaimable
> > or not, and, moreover, nr_slab is not directly related to number of
> > reclaimed pages. It just say some objects in the system are freed.
> > 
> 
> All true, it's the indirect relation between slab objects and the memory
> that is freed when slab objects are reclaimed that has to be taken into
> account.
> 
> > This question comes from my ignorance, so please enlighten me.
> > 
> 
> I hope this clarifies matters.

Very helpful :)

Thanks.

> 
> -- 
> Mel Gorman
> SUSE Labs
> 

Re: [PATCH] mm: page_alloc: Avoid marking zones full prematurely after zone_reclaim()

2013-04-09 Thread Ric Mason

Hi Michal,
On 04/09/2013 06:14 PM, Michal Hocko wrote:

On Tue 09-04-13 18:05:30, Simon Jeons wrote:
[...]

I try this in v3.9-rc5:
dd if=/dev/sda of=/dev/null bs=1MB
14813+0 records in
14812+0 records out
1481200 bytes (15 GB) copied, 105.988 s, 140 MB/s

free -m -s 1

             total       used       free     shared    buffers     cached
Mem:          7912       1181       6731          0        663        239
-/+ buffers/cache:         277       7634
Swap:         8011          0       8011

It seems that almost 15GB was copied before I stopped dd, but the used
pages which I monitored during dd stayed around 1200MB. Weird, why?


Sorry for waste your time, but the test result is weird, is it?

I am not sure which values you have been watching, but you have to
realize that you are reading a _partition_, not a file, and those pages
go into buffers rather than the page cache.


Interesting. ;-)

What's the difference between buffers and page cache? Why don't buffers
grow?





Re: [PATCH] sched: wake-affine throttle

2013-04-09 Thread Michael Wang
On 04/10/2013 12:16 PM, Alex Shi wrote:
> On 04/10/2013 11:30 AM, Michael Wang wrote:
>> Suggested-by: Peter Zijlstra 
>> Signed-off-by: Michael Wang 
> 
> Reviewed-by: Alex Shi 

Thanks for your review :)

> 
> BTW, could you try the kbuild, hackbench and aim for this?

Sure, the patch has already been tested with aim7, also the hackbench,
kbench, and ebizzy, no notable changes on my box with the default 1ms
interval.

Regards,
Michael Wang

> 



RE: [PATCH 2/2] powerpc/dma/raidengine: enable Freescale RaidEngine device

2013-04-09 Thread Shi Xuelin-B29237
Hi Dan & vinod,

Do you have any comments about this patch?

Thanks,
Forrest

-Original Message-
From: Shi Xuelin-B29237 
Sent: 21 November 2012 17:01
To: dan.j.willi...@gmail.com; vinod.k...@intel.com; 
linuxppc-...@lists.ozlabs.org; linux-kernel@vger.kernel.org
Cc: i...@ovro.caltech.edu; Shi Xuelin-B29237; Rai Harninder-B01044; Burmi 
Naveen-B16502
Subject: [PATCH 2/2] powerpc/dma/raidengine: enable Freescale RaidEngine device

From: Xuelin Shi 

The RaidEngine is a new FSL hardware block that is used as a hardware
accelerator for RAID5/6.

This patch enables the RaidEngine functionality and provides hardware
offloading capability for memcpy, xor and raid6 pq computation. It works under
dmaengine control with the async_tx layer interface.
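
For illustration, a RAID6 user (e.g. md) would reach an offload like this
through the async_tx API roughly as follows (a hedged sketch; the wrapper name
and all parameters are the caller's own, nothing here is defined by this
patch):

#include <linux/async_tx.h>

static struct dma_async_tx_descriptor *
sketch_gen_pq(struct page **blocks, int disks, size_t len,
	      dma_async_tx_callback done, void *ctx, addr_conv_t *scribble)
{
	struct async_submit_ctl submit;

	/* the async layer picks a dmaengine channel advertising DMA_PQ,
	 * such as the RAID Engine, or falls back to the CPU path */
	init_async_submit(&submit, ASYNC_TX_ACK, NULL, done, ctx, scribble);
	return async_gen_syndrome(blocks, 0, disks, len, &submit);
}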

Signed-off-by: Harninder Rai 
Signed-off-by: Naveen Burmi 
Signed-off-by: Xuelin Shi 
---
 drivers/dma/Kconfig|   14 +
 drivers/dma/Makefile   |1 +
 drivers/dma/fsl_raid.c |  990 
 drivers/dma/fsl_raid.h |  317 
 4 files changed, 1322 insertions(+)
 create mode 100644 drivers/dma/fsl_raid.c  create mode 100644 
drivers/dma/fsl_raid.h

diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig index d4c1218..aa37279 
100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -320,6 +320,20 @@ config MMP_PDMA
help
  Support the MMP PDMA engine for PXA and MMP platfrom.
 
+config FSL_RAID
+tristate "Freescale RAID Engine Device Driver"
+depends on FSL_SOC && !FSL_DMA
+select DMA_ENGINE
+select ASYNC_TX_ENABLE_CHANNEL_SWITCH
+select ASYNC_MEMCPY
+select ASYNC_XOR
+select ASYNC_PQ
+---help---
+  Enable support for Freescale RAID Engine. RAID Engine is
+  available on some QorIQ SoCs (like P5020). It has
+  the capability to offload RAID5/RAID6 operations from CPU.
+  RAID5 is XOR and memcpy. RAID6 is P/Q and memcpy
+
 config DMA_ENGINE
bool
 
diff --git a/drivers/dma/Makefile b/drivers/dma/Makefile index 7428fea..29b65eb 
100644
--- a/drivers/dma/Makefile
+++ b/drivers/dma/Makefile
@@ -9,6 +9,7 @@ obj-$(CONFIG_DMATEST) += dmatest.o
 obj-$(CONFIG_INTEL_IOATDMA) += ioat/
 obj-$(CONFIG_INTEL_IOP_ADMA) += iop-adma.o
 obj-$(CONFIG_FSL_DMA) += fsldma.o
+obj-$(CONFIG_FSL_RAID) += fsl_raid.o
 obj-$(CONFIG_MPC512X_DMA) += mpc512x_dma.o
 obj-$(CONFIG_MV_XOR) += mv_xor.o
 obj-$(CONFIG_DW_DMAC) += dw_dmac.o
diff --git a/drivers/dma/fsl_raid.c b/drivers/dma/fsl_raid.c new file mode 
100644 index 000..ec19817
--- /dev/null
+++ b/drivers/dma/fsl_raid.c
@@ -0,0 +1,990 @@
+/*
+ * drivers/dma/fsl_raid.c
+ *
+ * Freescale RAID Engine device driver
+ *
+ * Author:
+ * Harninder Rai 
+ * Naveen Burmi 
+ *
+ * Copyright (c) 2010-2012 Freescale Semiconductor, Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ * * Redistributions of source code must retain the above copyright
+ *   notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ *   notice, this list of conditions and the following disclaimer in the
+ *   documentation and/or other materials provided with the distribution.
+ * * Neither the name of Freescale Semiconductor nor the
+ *   names of its contributors may be used to endorse or promote products
+ *   derived from this software without specific prior written permission.
+ *
+ * ALTERNATIVELY, this software may be distributed under the terms of 
+the
+ * GNU General Public License ("GPL") as published by the Free Software
+ * Foundation, either version 2 of that License or (at your option) any
+ * later version.
+ *
+ * THIS SOFTWARE IS PROVIDED BY Freescale Semiconductor ``AS IS'' AND 
+ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 
+IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE 
+ARE
+ * DISCLAIMED. IN NO EVENT SHALL Freescale Semiconductor BE LIABLE FOR 
+ANY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 
+DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 
+SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 
+CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 
+OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE 
+USE OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ * Theory of operation:
+ *
+ * General capabilities:
+ * RAID Engine (RE) block is capable of offloading XOR, memcpy and P/Q
+ * calculations required in RAID5 and RAID6 operations. RE driver
+ * registers with Linux's ASYNC layer as dma driver. RE hardware
+ * maintains strict ordering of the requests through chained
+ * command queueing.
+ *
+ * Data flow:
+ * 

Re: [PATCH] powerpc: fixing ptrace_get_reg to return an error

2013-04-09 Thread Michael Neuling
Alexey Kardashevskiy  wrote:

> Currently ptrace_get_reg returns an error as a value,
> which makes it impossible to tell whether it is a correct value or an error code.
>
> The patch adds a parameter which points to the real return data and
> returns an error code.
>
> As get_user_msr() never fails and is used in multiple places, it has not
> been changed by this patch.
> 
> Signed-off-by: Alexey Kardashevskiy 

FWIW:
Acked-by: Michael Neuling 
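
(As a quick illustration of the new calling convention; PT_MSR is just an
example register index:)

	unsigned long val;
	int ret;

	ret = ptrace_get_reg(child, PT_MSR, &val);
	if (ret)
		return ret;
	/* val now holds the register contents */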


> ---
>  arch/powerpc/include/asm/ptrace.h |3 ++-
>  arch/powerpc/kernel/ptrace.c  |   29 ++---
>  arch/powerpc/kernel/ptrace32.c|   15 ---
>  3 files changed, 32 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/ptrace.h 
> b/arch/powerpc/include/asm/ptrace.h
> index 5f99568..becc08e 100644
> --- a/arch/powerpc/include/asm/ptrace.h
> +++ b/arch/powerpc/include/asm/ptrace.h
> @@ -92,7 +92,8 @@ static inline long regs_return_value(struct pt_regs *regs)
>   } while(0)
>  
>  struct task_struct;
> -extern unsigned long ptrace_get_reg(struct task_struct *task, int regno);
> +extern int ptrace_get_reg(struct task_struct *task, int regno,
> +   unsigned long *data);
>  extern int ptrace_put_reg(struct task_struct *task, int regno,
> unsigned long data);
>  
> diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
> index 245c1b6..d5ff7ea 100644
> --- a/arch/powerpc/kernel/ptrace.c
> +++ b/arch/powerpc/kernel/ptrace.c
> @@ -180,9 +180,10 @@ static int set_user_msr(struct task_struct *task, 
> unsigned long msr)
>  }
>  
>  #ifdef CONFIG_PPC64
> -static unsigned long get_user_dscr(struct task_struct *task)
> +static int get_user_dscr(struct task_struct *task, unsigned long *data)
>  {
> - return task->thread.dscr;
> + *data = task->thread.dscr;
> + return 0;
>  }
>  
>  static int set_user_dscr(struct task_struct *task, unsigned long dscr)
> @@ -192,7 +193,7 @@ static int set_user_dscr(struct task_struct *task, 
> unsigned long dscr)
>   return 0;
>  }
>  #else
> -static unsigned long get_user_dscr(struct task_struct *task)
> +static int get_user_dscr(struct task_struct *task, unsigned long *data)
>  {
>   return -EIO;
>  }
> @@ -216,19 +217,23 @@ static int set_user_trap(struct task_struct *task, 
> unsigned long trap)
>  /*
>   * Get contents of register REGNO in task TASK.
>   */
> -unsigned long ptrace_get_reg(struct task_struct *task, int regno)
> +int ptrace_get_reg(struct task_struct *task, int regno, unsigned long *data)
>  {
> - if (task->thread.regs == NULL)
> + if ((task->thread.regs == NULL) || !data)
>   return -EIO;
>  
> - if (regno == PT_MSR)
> - return get_user_msr(task);
> + if (regno == PT_MSR) {
> + *data = get_user_msr(task);
> + return 0;
> + }
>  
>   if (regno == PT_DSCR)
> - return get_user_dscr(task);
> + return get_user_dscr(task, data);
>  
> - if (regno < (sizeof(struct pt_regs) / sizeof(unsigned long)))
> - return ((unsigned long *)task->thread.regs)[regno];
> + if (regno < (sizeof(struct pt_regs) / sizeof(unsigned long))) {
> + *data = ((unsigned long *)task->thread.regs)[regno];
> + return 0;
> + }
>  
>   return -EIO;
>  }
> @@ -1559,7 +1564,9 @@ long arch_ptrace(struct task_struct *child, long 
> request,
>  
>   CHECK_FULL_REGS(child->thread.regs);
>   if (index < PT_FPR0) {
> - tmp = ptrace_get_reg(child, (int) index);
> + ret = ptrace_get_reg(child, (int) index, );
> + if (ret)
> + break;
>   } else {
>   unsigned int fpidx = index - PT_FPR0;
>  
> diff --git a/arch/powerpc/kernel/ptrace32.c b/arch/powerpc/kernel/ptrace32.c
> index c0244e7..f51599e 100644
> --- a/arch/powerpc/kernel/ptrace32.c
> +++ b/arch/powerpc/kernel/ptrace32.c
> @@ -95,7 +95,9 @@ long compat_arch_ptrace(struct task_struct *child, 
> compat_long_t request,
>  
>   CHECK_FULL_REGS(child->thread.regs);
>   if (index < PT_FPR0) {
> - tmp = ptrace_get_reg(child, index);
> + ret = ptrace_get_reg(child, index, );
> + if (ret)
> + break;
>   } else {
>   flush_fp_to_thread(child);
>   /*
> @@ -148,7 +150,11 @@ long compat_arch_ptrace(struct task_struct *child, 
> compat_long_t request,
>   tmp = ((u64 *)child->thread.fpr)
>   [FPRINDEX_3264(numReg)];
>   } else { /* register within PT_REGS struct */
> - tmp = ptrace_get_reg(child, numReg);
> + unsigned long tmp2;
> + ret = ptrace_get_reg(child, numReg, );
> + if (ret)
> +   

Re: [RFC PATCH v2 6/6] powerpc: Use generic code for exception handling

2013-04-09 Thread Michael Ellerman
On Fri, Mar 29, 2013 at 06:00:21PM +0800, Li Zhong wrote:
> After the exception handling moved to generic code, and some changes in
...
> diff --git a/arch/powerpc/mm/hash_utils_64.c b/arch/powerpc/mm/hash_utils_64.c
> index 360fba8..eeab30f 100644
> --- a/arch/powerpc/mm/hash_utils_64.c
> +++ b/arch/powerpc/mm/hash_utils_64.c
> @@ -33,6 +33,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -56,7 +57,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  
>  #ifdef DEBUG
>  #define DBG(fmt...) udbg_printf(fmt)
> @@ -919,13 +919,17 @@ int hash_page(unsigned long ea, unsigned long access, 
> unsigned long trap)
>   const struct cpumask *tmp;
>   int rc, user_region = 0, local = 0;
>   int psize, ssize;
> + enum ctx_state prev_state;
> +
> + prev_state = exception_enter();
>  
>   DBG_LOW("hash_page(ea=%016lx, access=%lx, trap=%lx\n",
>   ea, access, trap);
>  
>   if ((ea & ~REGION_MASK) >= PGTABLE_RANGE) {
>   DBG_LOW(" out of pgtable range !\n");
> - return 1;
> + rc = 1;
> + goto exit;
>   }
>  
>   /* Get region & vsid */

This no longer applies on mainline, please send an updated version.

cheers


Re: [PATCH] intel-iommu: Synchronize gcmd value with global command register

2013-04-09 Thread Takao Indoh
(2013/04/05 20:06), Joerg Roedel wrote:
> On Wed, Apr 03, 2013 at 09:24:39AM +0100, David Woodhouse wrote:
>> On Wed, 2013-04-03 at 16:11 +0900, Takao Indoh wrote:
>>> Yeah, you are right. I forgot such a case.
>>
>> If you disable translation and there's some device still doing DMA, it's
>> going to scribble over random areas of memory. You really want to have
>> translation enabled and all the page tables *cleared*, during kexec. I
>> think it's fair to insist that the secondary kernel should use the IOMMU
>> if the first one did.
> 
> Do we really need to insist on that? The IOMMU initialization on x86
> happens after the kernel scanned and enumerated the PCI bus. While doing
> this the kernel (at least it should) disables all devices it finds. So
> when the IOMMU init code runs we should be safe from any in-flight DMA
> and can either disable translation or re-initialize it for the kdump
> kernel. Until then translation needs to stay enabled of course, so that
> the old page-tables are still used and in-flight DMA doesn't corrupt
> any data.

So we should do in this order, right?
(1) PCI initialization. Stop all ongoing DMA here.
(2) Disable translation if it is already enabled.
(3) Set up the page tables and enable translation.

Thanks,
Takao Indoh



Re: [PATCH] kbuild: generate generic headers before recursing into scripts

2013-04-09 Thread Prabhakar Lad
On Tue, Apr 9, 2013 at 11:27 PM, Andreas Schwab  wrote:
> The headers are now needed inside scripts/mod since 6543bec
> ("mod/file2alias: make modalias generation safe for cross compiling").
>
> Signed-off-by: Andreas Schwab 

Reported-by: Lad, Prabhakar 
Tested-by: Lad, Prabhakar 

Regards,
--Prabhakar

> ---
> Prabhakar Lad  writes:
>
>> Whats the status of it ?
>
> I think it has sufficiently been tested by now.
>
> Andreas.
> ---
>  Makefile | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/Makefile b/Makefile
> index 6db672b..11157bd 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -513,7 +513,8 @@ ifeq ($(KBUILD_EXTMOD),)
>  # Carefully list dependencies so we do not try to build scripts twice
>  # in parallel
>  PHONY += scripts
> -scripts: scripts_basic include/config/auto.conf include/config/tristate.conf
> +scripts: scripts_basic include/config/auto.conf include/config/tristate.conf 
> \
> +asm-generic
> $(Q)$(MAKE) $(build)=$(@)
>
>  # Objects we will link into vmlinux / subdirs we need to visit
> --
> 1.8.2.1
>
>
> --
> Andreas Schwab, sch...@linux-m68k.org
> GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
> "And now for something completely different."


Re: [PATCHv2] rdma: add a new IB_ACCESS_GIFT flag

2013-04-09 Thread Michael R. Hines

On 04/09/2013 11:24 PM, Michael S. Tsirkin wrote:
Which mechanism do you refer to? You patches still seem to pin each 
page in guest memory at some point, which will break all COW. In 
particular any pagemap tricks to detect duplicates on source that I 
suggested won't work. 


Sorry, I misspoke. I'm referring to dynamic server page registration.

Of course it does not eliminate pinning, but it does mitigate the memory
footprint of the VM, which was a feature that was requested.


I have implemented it and documented it.

- Michael


On 04/09/2013 03:03 PM, Michael S. Tsirkin wrote:

presumably is_dup_page reads the page, so should not break COW ...

I'm not sure about the cgroups swap limit - you might have
too many non COW pages so attempting to fault them all in
makes you exceed the limit. You really should look at
what is going on in the pagemap, to see if there's
measureable gain from the patch.


On Fri, Apr 05, 2013 at 05:32:30PM -0400, Michael R. Hines wrote:

Well, I have the "is_dup_page()" commented out...when RDMA is
activated.

Is there something else in QEMU that could be touching the page that
I don't know about?

- Michael


On 04/05/2013 05:03 PM, Roland Dreier wrote:

On Fri, Apr 5, 2013 at 1:51 PM, Michael R. Hines
 wrote:

Sorry, I was wrong. ignore the comments about cgroups. That's still broken.
(i.e. trying to register RDMA memory while using a cgroup swap limit cause
the process get killed).

But the GIFT flag patch works (my understanding is that GIFT flag allows the
adapter to transmit stale memory information, it does not have anything to
do with cgroups specifically).

The point of the GIFT patch is to avoid triggering copy-on-write so
that memory doesn't blow up during migration.  If that doesn't work
then there's no point to the patch.

  - R.





Re: [PATCH 14/18] cpufreq: sh: move cpufreq driver to drivers/cpufreq

2013-04-09 Thread Simon Horman
On Wed, Apr 10, 2013 at 08:21:51AM +0530, Viresh Kumar wrote:
> On 10 April 2013 07:42, Simon Horman  wrote:
> > Thanks, I understand.
> >
> > I have no objections to this, but Paul should probably review it.
> 
> It is already Acked by him and applied by Rafael.

:)


Re: [PATCHv2] rdma: add a new IB_ACCESS_GIFT flag

2013-04-09 Thread Michael S. Tsirkin

On Tue, Apr 09, 2013 at 09:26:59PM -0400, Michael R. Hines wrote:
> With respect, I'm going to offload testing this patch back to the author =)
> because I'm trying to address all of Paolo's other minor issues
> with the RDMA patch before we can merge.

Fair enough, this likely means it won't happen anytime soon though.

> Since dynamic page registration (as you requested) is now fully
> implemented, this patch is less urgent since we now have a
> mechanism in place to avoid page pinning on both sides of the migration.
> 
> - Michael
> 

Which mechanism do you refer to? You patches still seem to pin
each page in guest memory at some point, which will break all
COW. In particular any pagemap tricks to detect duplicates
on source that I suggested won't work.

> On 04/09/2013 03:03 PM, Michael S. Tsirkin wrote:
> >presumably is_dup_page reads the page, so should not break COW ...
> >
> >I'm not sure about the cgroups swap limit - you might have
> >too many non COW pages so attempting to fault them all in
> >makes you exceed the limit. You really should look at
> >what is going on in the pagemap, to see if there's
> >measureable gain from the patch.
> >
> >
> >On Fri, Apr 05, 2013 at 05:32:30PM -0400, Michael R. Hines wrote:
> >>Well, I have the "is_dup_page()" commented out...when RDMA is
> >>activated.
> >>
> >>Is there something else in QEMU that could be touching the page that
> >>I don't know about?
> >>
> >>- Michael
> >>
> >>
> >>On 04/05/2013 05:03 PM, Roland Dreier wrote:
> >>>On Fri, Apr 5, 2013 at 1:51 PM, Michael R. Hines
> >>> wrote:
> Sorry, I was wrong. ignore the comments about cgroups. That's still 
> broken.
> (i.e. trying to register RDMA memory while using a cgroup swap limit cause
> the process get killed).
> 
> But the GIFT flag patch works (my understanding is that GIFT flag allows 
> the
> adapter to transmit stale memory information, it does not have anything to
> do with cgroups specifically).
> >>>The point of the GIFT patch is to avoid triggering copy-on-write so
> >>>that memory doesn't blow up during migration.  If that doesn't work
> >>>then there's no point to the patch.
> >>>
> >>>  - R.
> >>>


Re: [PATCH] sched: wake-affine throttle

2013-04-09 Thread Alex Shi
On 04/10/2013 11:30 AM, Michael Wang wrote:
> Suggested-by: Peter Zijlstra 
> Signed-off-by: Michael Wang 

Reviewed-by: Alex Shi 

BTW, could you try the kbulid, hackbench and aim for this?

-- 
Thanks Alex


[PATCH v2] of/base: release the node correctly in of_parse_phandle_with_args()

2013-04-09 Thread Yuantian.Tang
From: Tang Yuantian 

On success, call of_node_put() only when out_args is NULL; otherwise the
node's reference count will be wrong, because the caller will call
of_node_put() on the returned node itself.
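
For illustration, a hedged sketch of the caller-side convention this relies
on (the property names here are made up):

	struct of_phandle_args args;

	/* np: consumer device node */
	if (!of_parse_phandle_with_args(np, "example-list", "#example-cells",
					0, &args)) {
		/* ... use args.np and args.args[] ... */
		of_node_put(args.np);	/* the caller drops the reference */
	}

So when out_args is filled in, the reference taken on the node travels with
out_args->np and is dropped by the caller, not by the parsing helper itself.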

Signed-off-by: Tang Yuantian 
---
v2:
- modified the title and description. the 1st patch title is:
  of: remove the unnecessary of_node_put for 
of_parse_phandle_with_args()
  the 1st patch is not good enough.

 drivers/of/base.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/of/base.c b/drivers/of/base.c
index 321d3ef..ee94f64 100644
--- a/drivers/of/base.c
+++ b/drivers/of/base.c
@@ -1158,6 +1158,7 @@ static int __of_parse_phandle_with_args(const struct 
device_node *np,
if (!phandle)
goto err;
 
+   /* Found it! return success */
if (out_args) {
int i;
if (WARN_ON(count > MAX_PHANDLE_ARGS))
@@ -1166,11 +1167,10 @@ static int __of_parse_phandle_with_args(const struct 
device_node *np,
out_args->args_count = count;
for (i = 0; i < count; i++)
out_args->args[i] = 
be32_to_cpup(list++);
+   } else if (node) {
+   of_node_put(node);
}
 
-   /* Found it! return success */
-   if (node)
-   of_node_put(node);
return 0;
}
 
-- 
1.8.0




[PATCH v3] kernel: module: using strlcpy and strcpy instead of strncpy

2013-04-09 Thread Chen Gang

  namebuf is a NUL-terminated string, so it is better to always make sure it
  ends with '\0'; strlcpy() guarantees that, while strncpy() does not.

  module_name() always returns either the name field of struct module (a
  fixed-size array that is already NUL-terminated) or the literal "kernel",
  so a plain strcpy() is sufficient there.
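
For illustration only (not part of the patch), a small userspace sketch of
the difference, using libbsd's strlcpy() to stand in for the kernel's own
implementation in lib/string.c (compile with -lbsd):

#include <stdio.h>
#include <string.h>
#include <bsd/string.h>

int main(void)
{
	char dst[8];

	/* strncpy() copies at most sizeof(dst) bytes and does NOT write a
	 * terminating '\0' when the source is that long or longer, so dst
	 * would be left unterminated here. */
	strncpy(dst, "0123456789", sizeof(dst));

	/* strlcpy() always terminates: it copies at most sizeof(dst) - 1
	 * bytes and writes the trailing '\0' itself. */
	strlcpy(dst, "0123456789", sizeof(dst));
	printf("%s\n", dst);	/* safe: prints "0123456" */

	return 0;
}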


Signed-off-by: Chen Gang 
---
 kernel/module.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/module.c b/kernel/module.c
index 3c2c72d..09aeefd 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -1283,7 +1283,7 @@ static const struct kernel_symbol *resolve_symbol(struct 
module *mod,
 
 getname:
/* We must make copy under the lock if we failed to get ref. */
-   strncpy(ownername, module_name(owner), MODULE_NAME_LEN);
+   strcpy(ownername, module_name(owner));
 unlock:
mutex_unlock(&module_mutex);
return sym;
@@ -3464,7 +3464,7 @@ const char *module_address_lookup(unsigned long addr,
}
/* Make a copy in here where it's safe */
if (ret) {
-   strncpy(namebuf, ret, KSYM_NAME_LEN - 1);
+   strlcpy(namebuf, ret, KSYM_NAME_LEN);
ret = namebuf;
}
preempt_enable();
-- 
1.7.7.6


Re: [RFC PATCH 3/3] pstore/ram: avoid atomic accesses for ioremapped regions

2013-04-09 Thread Colin Cross
On Tue, Apr 9, 2013 at 8:08 PM, Rob Herring  wrote:
> From: Rob Herring 
>
> For persistent RAM outside of main memory, the memory may have limitations
> on supported accesses. For internal RAM on highbank platform exclusive
> accesses are not supported and will hang the system. So atomic_cmpxchg
> cannot be used. This commit uses spinlock protection for buffer size and
> start updates on ioremapped regions instead.

I used atomics in persistent_ram to support persistent ftrace, which
now exists as PSTORE_FTRACE.  At some point during development I had
trouble with recursive tracing causing an infinite loop, so you may
want to test that calling out to spinlock functions with PSTORE_FTRACE
turned on and enabled doesn't cause a problem.
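
(For reference, a minimal sketch of the spinlock-protected update described
in the quoted commit message; the field names follow fs/pstore/ram_core.c,
but the exact shape is an assumption, not the actual patch:)

static DEFINE_RAW_SPINLOCK(buffer_lock);

static void buffer_size_add_locked(struct persistent_ram_zone *prz, size_t a)
{
	unsigned long flags;
	size_t old, new;

	raw_spin_lock_irqsave(&buffer_lock, flags);
	old = atomic_read(&prz->buffer->size);
	new = old + a;
	if (new > prz->buffer_size)
		new = prz->buffer_size;
	atomic_set(&prz->buffer->size, new);	/* plain store, no cmpxchg */
	raw_spin_unlock_irqrestore(&buffer_lock, flags);
}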


Re: [Drbd-dev] [PATCH] drivers/block/drbd: remove erroneous semicolon

2013-04-09 Thread Chen Gang
On 2013年04月09日 23:56, Lars Ellenberg wrote:
> The original report (by kbuild test robot) is from the week before iirc
> (don't have it anymore), the original commit in our drbd repo the day before.


  really it is !!

  thanks.

  :-)



-- 
Chen Gang

Asianux Corporation


Re: [RFC PATCH 1/3] pstore-ram: use write-combine mappings

2013-04-09 Thread Colin Cross
On Tue, Apr 9, 2013 at 8:08 PM, Rob Herring  wrote:
> From: Rob Herring 
>
> Atomic operations are undefined behavior on ARM for device or strongly
> ordered memory types. So use write-combine variants for mappings. This
> corresponds to normal, non-cacheable memory on ARM. For many other
> architectures, this change should not change the mapping type.

This is going to make ramconsole less reliable.  A debugging printk
followed by a __raw_writel that causes an immediate hard crash is
likely to lose the last updates, including the most useful message, in
the write buffers.

Also, isn't this patch unnecessary after patch 3 in this set?

> Signed-off-by: Rob Herring 
> Cc: Anton Vorontsov 
> Cc: Colin Cross 
> Cc: Kees Cook 
> Cc: Tony Luck 
> Cc: linux-kernel@vger.kernel.org
> ---
>  fs/pstore/ram_core.c |4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fs/pstore/ram_core.c b/fs/pstore/ram_core.c
> index 0306303..e126d9f 100644
> --- a/fs/pstore/ram_core.c
> +++ b/fs/pstore/ram_core.c
> @@ -337,7 +337,7 @@ static void *persistent_ram_vmap(phys_addr_t start, 
> size_t size)
> page_start = start - offset_in_page(start);
> page_count = DIV_ROUND_UP(size + offset_in_page(start), PAGE_SIZE);
>
> -   prot = pgprot_noncached(PAGE_KERNEL);
> +   prot = pgprot_writecombine(PAGE_KERNEL);
Is this necessary?  Won't pgprot_noncached already be normal memory?

> pages = kmalloc(sizeof(struct page *) * page_count, GFP_KERNEL);
> if (!pages) {
> @@ -364,7 +364,7 @@ static void *persistent_ram_iomap(phys_addr_t start, 
> size_t size)
> return NULL;
> }
>
> -   return ioremap(start, size);
> +   return ioremap_wc(start, size);

ioremap_wc corresponds to MT_DEVICE_WC, which is still device memory,
so I don't see how this helps solve the problem in the commit message.

>  }
>
>  static int persistent_ram_buffer_map(phys_addr_t start, phys_addr_t size,
> --
> 1.7.10.4
>


linux-next: manual merge of the mfd tree with the v4l-dvb tree

2013-04-09 Thread Stephen Rothwell
Hi Samuel,

Today's linux-next merge of the mfd tree got a conflict in
drivers/mfd/Kconfig between commit 3f8ec5df11aa ("[media] mfd: Add header
files and Kbuild plumbing for SI476x MFD core") from the v4l-dvb tree and
commit ab85b120e692 ("mfd: Kconfig alphabetical re-ordering") from the
mfd tree.

I fixed it up (I think - see below) and can carry the fix as necessary
(no action is required).

diff --cc drivers/mfd/Kconfig
index b6bb6d5,2f3ce18..000
--- a/drivers/mfd/Kconfig
+++ b/drivers/mfd/Kconfig
@@@ -977,68 -920,38 +920,51 @@@ config MFD_WL1273_COR
  driver connects the radio-wl1273 V4L2 module and the wl1273
  audio codec.
  
 +config MFD_SI476X_CORE
 +  tristate "Support for Silicon Laboratories 4761/64/68 AM/FM radio."
 +  depends on I2C
 +  select MFD_CORE
 +  select REGMAP_I2C
 +  help
 +This is the core driver for the SI476x series of AM/FM
 +radio. This MFD driver connects the radio-si476x V4L2 module
 +and the si476x audio codec.
 +
 +To compile this driver as a module, choose M here: the
 +module will be called si476x-core.
 +
- config MFD_OMAP_USB_HOST
-   bool "Support OMAP USBHS core and TLL driver"
-   depends on USB_EHCI_HCD_OMAP || USB_OHCI_HCD_OMAP3
-   default y
-   help
- This is the core driver for the OAMP EHCI and OHCI drivers.
- This MFD driver does the required setup functionalities for
- OMAP USB Host drivers.
- 
- config MFD_PM8XXX
-   tristate
- 
- config MFD_PM8921_CORE
-   tristate "Qualcomm PM8921 PMIC chip"
-   depends on MSM_SSBI
+ config MFD_LM3533
+   tristate "TI/National Semiconductor LM3533 Lighting Power chip"
+   depends on I2C
select MFD_CORE
-   select MFD_PM8XXX
+   select REGMAP_I2C
+   depends on GENERIC_HARDIRQS
help
- If you say yes to this option, support will be included for the
- built-in PM8921 PMIC chip.
- 
- This is required if your board has a PM8921 and uses its features,
- such as: MPPs, GPIOs, regulators, interrupts, and PWM.
- 
- Say M here if you want to include support for PM8921 chip as a module.
- This will build a module called "pm8921-core".
+ Say yes here to enable support for National Semiconductor / TI
+ LM3533 Lighting Power chips.
  
- config MFD_PM8XXX_IRQ
-   bool "Support for Qualcomm PM8xxx IRQ features"
-   depends on MFD_PM8XXX
-   default y if MFD_PM8XXX
-   help
- This is the IRQ driver for Qualcomm PM 8xxx PMIC chips.
+ This driver provides common support for accessing the device;
+ additional drivers must be enabled in order to use the LED,
+ backlight or ambient-light-sensor functionality of the device.
  
- This is required to use certain other PM 8xxx features, such as GPIO
- and MPP.
+ config MFD_TIMBERDALE
+   tristate "Timberdale FPGA"
+   select MFD_CORE
+   depends on PCI && GPIOLIB
+   ---help---
+   This is the core driver for the timberdale FPGA. This device is a
+   multifunction device which exposes numerous platform devices.
  
- config TPS65911_COMPARATOR
-   tristate
+   The timberdale FPGA can be found on the Intel Atom development board
+   for in-vehicle infontainment, called Russellville.
  
- config MFD_TPS65090
-   bool "TPS65090 Power Management chips"
+ config MFD_TC3589X
+   bool "Toshiba TC35892 and variants"
depends on I2C=y && GENERIC_HARDIRQS
select MFD_CORE
-   select REGMAP_I2C
-   select REGMAP_IRQ
help
- If you say yes here you get support for the TPS65090 series of
- Power Management chips.
+ Support for the Toshiba TC35892 and variants I/O Expander.
+ 
  This driver provides common support for accessing the device,
  additional drivers must be enabled in order to use the
  functionality of the device.

-- 
Cheers,
Stephen Rothwell  s...@canb.auug.org.au




[PATCH] sched: wake-affine throttle

2013-04-09 Thread Michael Wang
Log since RFC:
1. Throttle only when wake-affine failed. (thanks to PeterZ)
2. Do throttle inside wake_affine(). (thanks to PeterZ)
3. Other small fix.

Recent testing shows that the wake-affine logic causes a regression on
pgbench; the hiding rat has finally been caught out.

The wake-affine logic always tries to pull the wakee close to the waker. In
theory this benefits us if the waker's CPU has cached data that is hot for
the wakee, or in the extreme ping-pong case.

However, the logic is somewhat blind: load balance is the only factor it
guarantees, and since the check itself is time-consuming, some workloads
suffer. pgbench is just the one that has been found so far.

Thus, throttling the wake-affine logic for such workloads is necessary.

This patch introduces a new knob, 'sysctl_sched_wake_affine_interval', with a
default value of 1ms (the default minimum balance interval), which means
wake-affine keeps silent for 1ms after it returns false.

By tuning the new knob, workloads that suffered get a chance to avoid the
regression.
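
A minimal sketch (hedged, not the actual hunk) of how the two helpers are
meant to gate wake_affine(), per the notes above: bail out while the task is
throttled, and arm the throttle only when the affine decision fails.

static int wake_affine(struct sched_domain *sd, struct task_struct *p, int sync)
{
	int affine = 0;

	if (wake_affine_throttled(p))
		return 0;	/* still inside the silent interval */

	/* ... the existing load/weight comparison sets 'affine' ... */

	if (!affine)
		wake_affine_throttle(p);	/* failed: arm the throttle */

	return affine;
}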

Test:
Test with 12 cpu X86 server and tip 3.9.0-rc2.

| db_size | clients | base tps | 1ms tps |  diff   | 10ms tps |  diff   | 100ms tps |  diff   |
+---------+---------+----------+---------+---------+----------+---------+-----------+---------+
| 21 MB   |       1 |    10572 |   10804 |         |    10802 |         |     10801 |         |
| 21 MB   |       2 |    21275 |   21533 |         |    21400 |         |     21498 |         |
| 21 MB   |       4 |    41866 |   42158 |         |    42410 |         |     42306 |         |
| 21 MB   |       8 |    53931 |   55796 |         |    58608 |  +8.67% |     59916 | +11.10% |
| 21 MB   |      12 |    50956 |   52266 |         |    54586 |  +7.12% |     55982 |  +9.86% |
| 21 MB   |      16 |    49911 |   52862 |  +5.91% |    55668 | +11.53% |     57255 | +14.71% |
| 21 MB   |      24 |    46046 |   48820 |  +6.02% |    54269 | +17.86% |     58113 | +26.21% |
| 21 MB   |      32 |    43405 |   46635 |  +7.44% |    53690 | +23.70% |     57729 | +33.00% |
| 7483 MB |       1 |     7734 |    8013 |         |     8046 |         |      7879 |         |
| 7483 MB |       2 |    19375 |   19459 |         |    19448 |         |     19421 |         |
| 7483 MB |       4 |    37408 |   37780 |         |    37937 |         |     37819 |         |
| 7483 MB |       8 |    49033 |   50389 |         |    51636 |  +5.31% |     52294 |  +6.65% |
| 7483 MB |      12 |    45525 |   47794 |  +4.98% |    49828 |  +9.45% |     50571 | +11.08% |
| 7483 MB |      16 |    45731 |   47921 |  +4.79% |    50203 |  +9.78% |     52033 | +13.78% |
| 7483 MB |      24 |    41533 |   44301 |  +6.67% |    49697 | +19.66% |     53833 | +29.62% |
| 7483 MB |      32 |    36370 |   38301 |  +5.31% |    48146 | +32.38% |     52795 | +45.16% |
| 15 GB   |       1 |     7576 |    7926 |         |     7722 |         |      7969 |         |
| 15 GB   |       2 |    19157 |   19284 |         |    19294 |         |     19304 |         |
| 15 GB   |       4 |    37285 |   37539 |         |    37281 |         |     37508 |         |
| 15 GB   |       8 |    48718 |   49176 |         |    50836 |  +4.35% |     51239 |  +5.17% |
| 15 GB   |      12 |    45167 |   47180 |  +4.45% |    49206 |  +8.94% |     50126 | +10.98% |
| 15 GB   |      16 |    45270 |   47293 |  +4.47% |    49638 |  +9.65% |     51748 | +14.31% |
| 15 GB   |      24 |    40984 |   43366 |  +5.81% |    49356 | +20.43% |     53157 | +29.70% |
| 15 GB   |      32 |    35918 |   37632 |  +4.77% |    47923 | +33.42% |     52241 | +45.45% |

(tps is pgbench transactions per second for the base kernel and for
wake-affine intervals of 1ms, 10ms and 100ms; diff is relative to base)

Suggested-by: Peter Zijlstra 
Signed-off-by: Michael Wang 
---
 include/linux/sched.h |5 +
 kernel/sched/fair.c   |   31 +++
 kernel/sysctl.c   |   10 ++
 3 files changed, 46 insertions(+), 0 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index d35d2b6..e9efd3a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1197,6 +1197,10 @@ enum perf_event_task_context {
perf_nr_task_contexts,
 };
 
+#ifdef CONFIG_SMP
+extern unsigned int sysctl_sched_wake_affine_interval;
+#endif
+
 struct task_struct {
volatile long state;/* -1 unrunnable, 0 runnable, >0 stopped */
void *stack;
@@ -1207,6 +1211,7 @@ struct task_struct {
 #ifdef CONFIG_SMP
struct llist_node wake_entry;
int on_cpu;
+   unsigned long next_wake_affine;
 #endif
int on_rq;
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7a33e59..68eedd7 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3087,6 +3087,22 @@ static inline unsigned long effective_load(struct 
task_group *tg, int cpu,
 
 #endif
 
+/*
+ * Default is 1ms, to prevent the wake_affine() stuff working too frequently.
+ */
+unsigned int sysctl_sched_wake_affine_interval = 1U;
+
+static inline int wake_affine_throttled(struct task_struct *p)
+{
+   return time_before(jiffies, p->next_wake_affine);
+}
+
+static inline void wake_affine_throttle(struct task_struct *p)
+{
+   p->next_wake_affine = jiffies +
+   msecs_to_jiffies(sysctl_sched_wake_affine_interval);
+}
+
 static int wake_affine(struct 

[PATCH v3 05/12] tracing: switch syscall tracing to use event_trace_ops backend

2013-04-09 Thread zhangwei(Jovi)
From: "zhangwei(Jovi)" 

Other tracepoints have already been switched to use event_trace_ops as the
backend storage mechanism; syscall tracing can use the same backend.

This change also exposes syscall tracing to external modules through the
same interface as the other tracepoints.

Signed-off-by: zhangwei(Jovi) 
---
 kernel/trace/trace_syscalls.c |   49 ++---
 1 file changed, 16 insertions(+), 33 deletions(-)

diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 322e164..72675b1 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -302,12 +302,10 @@ static int __init syscall_exit_define_fields(struct 
ftrace_event_call *call)
 static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
 {
struct trace_array *tr = data;
+   struct ftrace_event_file event_file;
+   struct trace_descriptor_t desc;
struct syscall_trace_enter *entry;
struct syscall_metadata *sys_data;
-   struct ring_buffer_event *event;
-   struct ring_buffer *buffer;
-   unsigned long irq_flags;
-   int pc;
int syscall_nr;
int size;
 
@@ -323,34 +321,26 @@ static void ftrace_syscall_enter(void *data, struct 
pt_regs *regs, long id)
 
size = sizeof(*entry) + sizeof(unsigned long) * sys_data->nb_args;
 
-   local_save_flags(irq_flags);
-   pc = preempt_count();
-
-   buffer = tr->trace_buffer.buffer;
-   event = trace_buffer_lock_reserve(buffer,
-   sys_data->enter_event->event.type, size, irq_flags, pc);
-   if (!event)
+   event_file.tr = tr;
+   event_file.event_call = sys_data->enter_event;
+   event_file.flags = FTRACE_EVENT_FL_ENABLED;
+   entry = tr->ops->pre_trace(&event_file, size, &desc);
+   if (!entry)
return;
 
-   entry = ring_buffer_event_data(event);
entry->nr = syscall_nr;
syscall_get_arguments(current, regs, 0, sys_data->nb_args, entry->args);
 
-   if (!filter_current_check_discard(buffer, sys_data->enter_event,
- entry, event))
-   trace_current_buffer_unlock_commit(buffer, event,
-  irq_flags, pc);
+   tr->ops->do_trace(&event_file, entry, size, &desc);
 }
 
 static void ftrace_syscall_exit(void *data, struct pt_regs *regs, long ret)
 {
struct trace_array *tr = data;
+   struct ftrace_event_file event_file;
+   struct trace_descriptor_t desc;
struct syscall_trace_exit *entry;
struct syscall_metadata *sys_data;
-   struct ring_buffer_event *event;
-   struct ring_buffer *buffer;
-   unsigned long irq_flags;
-   int pc;
int syscall_nr;
 
syscall_nr = trace_get_syscall_nr(current, regs);
@@ -363,24 +353,17 @@ static void ftrace_syscall_exit(void *data, struct 
pt_regs *regs, long ret)
if (!sys_data)
return;
 
-   local_save_flags(irq_flags);
-   pc = preempt_count();
-
-   buffer = tr->trace_buffer.buffer;
-   event = trace_buffer_lock_reserve(buffer,
-   sys_data->exit_event->event.type, sizeof(*entry),
-   irq_flags, pc);
-   if (!event)
+   event_file.tr = tr;
+   event_file.event_call = sys_data->exit_event;
+   event_file.flags = FTRACE_EVENT_FL_ENABLED;
+   entry = tr->ops->pre_trace(&event_file, sizeof(*entry), &desc);
+   if (!entry)
return;
 
-   entry = ring_buffer_event_data(event);
entry->nr = syscall_nr;
entry->ret = syscall_get_return_value(current, regs);
 
-   if (!filter_current_check_discard(buffer, sys_data->exit_event,
- entry, event))
-   trace_current_buffer_unlock_commit(buffer, event,
-  irq_flags, pc);
+   tr->ops->do_trace(&event_file, entry, sizeof(*entry), &desc);
 }
 
 static int reg_event_syscall_enter(struct ftrace_event_file *file,
-- 
1.7.9.7




[PATCH v3 04/12] tracing: export ftrace_events

2013-04-09 Thread zhangwei(Jovi)
From: "zhangwei(Jovi)" 

Let modules access the ftrace_events list.

Signed-off-by: zhangwei(Jovi) 
---
 include/linux/ftrace_event.h |1 +
 kernel/trace/trace.h |1 -
 kernel/trace/trace_events.c  |2 ++
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 4b55272..f6a6e48 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -346,6 +346,7 @@ enum {
 #define EVENT_STORAGE_SIZE 128
 extern struct mutex event_storage_mutex;
 extern char event_storage[EVENT_STORAGE_SIZE];
+extern struct list_head ftrace_events;
 
 extern int trace_event_raw_init(struct ftrace_event_call *call);
 extern int trace_define_field(struct ftrace_event_call *call, const char *type,
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 0a1f4be..8f4966b 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -917,7 +917,6 @@ extern int event_trace_add_tracer(struct dentry *parent, 
struct trace_array *tr)
 extern int event_trace_del_tracer(struct trace_array *tr);
 
 extern struct mutex event_mutex;
-extern struct list_head ftrace_events;
 
 extern const char *__start___trace_bprintk_fmt[];
 extern const char *__stop___trace_bprintk_fmt[];
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 09ca479..7c52a51 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -34,6 +34,8 @@ char event_storage[EVENT_STORAGE_SIZE];
 EXPORT_SYMBOL_GPL(event_storage);
 
 LIST_HEAD(ftrace_events);
+EXPORT_SYMBOL_GPL(ftrace_events);
+
 static LIST_HEAD(ftrace_common_fields);
 
 #define GFP_TRACE (GFP_KERNEL | __GFP_ZERO)
-- 
1.7.9.7




[PATCH v3 02/12] tracing: fix irqs-off tag display in syscall tracing

2013-04-09 Thread zhangwei(Jovi)
From: "zhangwei(Jovi)" 

Currently the irqs-off tag shown for every syscall tracing entry is wrong:
the syscall enter path does not actually run with IRQs disabled, yet the
entries below are tagged 'd'.

 [root@jovi tracing]#echo "syscalls:sys_enter_open" > set_event
 [root@jovi tracing]# cat trace
 # tracer: nop
 #
 # entries-in-buffer/entries-written: 13/13   #P:2
 #
 #  _-=> irqs-off
 # / _=> need-resched
 #| / _---=> hardirq/softirq
 #|| / _--=> preempt-depth
 #||| / delay
 #   TASK-PID   CPU#  TIMESTAMP  FUNCTION
 #  | |   |      | |
   irqbalance-513   [000] d... 56115.496766: sys_open(filename: 804e1a6, 
flags: 0, mode: 1b6)
   irqbalance-513   [000] d... 56115.497008: sys_open(filename: 804e1bb, 
flags: 0, mode: 1b6)
 sendmail-771   [000] d... 56115.827982: sys_open(filename: b770e6d1, 
flags: 0, mode: 1b6)

The reason is that syscall tracing does not record irq_flags into the buffer.
After this patch:

 [root@jovi tracing]#echo "syscalls:sys_enter_open" > set_event
 [root@jovi tracing]# cat trace
 # tracer: nop
 #
 # entries-in-buffer/entries-written: 14/14   #P:2
 #
 #  _-=> irqs-off
 # / _=> need-resched
 #| / _---=> hardirq/softirq
 #|| / _--=> preempt-depth
 #||| / delay
 #   TASK-PID   CPU#  TIMESTAMP  FUNCTION
 #  | |   |      | |
   irqbalance-514   [001] 46.213921: sys_open(filename: 804e1a6, 
flags: 0, mode: 1b6)
   irqbalance-514   [001] 46.214160: sys_open(filename: 804e1bb, 
flags: 0, mode: 1b6)
<...>-920   [001] 47.307260: sys_open(filename: 4e82a0c5, 
flags: 8, mode: 0)

Signed-off-by: zhangwei(Jovi) 
---
 kernel/trace/trace_syscalls.c |   21 +
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 8f2ac73..322e164 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -306,6 +306,8 @@ static void ftrace_syscall_enter(void *data, struct pt_regs 
*regs, long id)
struct syscall_metadata *sys_data;
struct ring_buffer_event *event;
struct ring_buffer *buffer;
+   unsigned long irq_flags;
+   int pc;
int syscall_nr;
int size;
 
@@ -321,9 +323,12 @@ static void ftrace_syscall_enter(void *data, struct 
pt_regs *regs, long id)
 
size = sizeof(*entry) + sizeof(unsigned long) * sys_data->nb_args;
 
+   local_save_flags(irq_flags);
+   pc = preempt_count();
+
buffer = tr->trace_buffer.buffer;
event = trace_buffer_lock_reserve(buffer,
-   sys_data->enter_event->event.type, size, 0, 0);
+   sys_data->enter_event->event.type, size, irq_flags, pc);
if (!event)
return;
 
@@ -333,7 +338,8 @@ static void ftrace_syscall_enter(void *data, struct pt_regs 
*regs, long id)
 
if (!filter_current_check_discard(buffer, sys_data->enter_event,
  entry, event))
-   trace_current_buffer_unlock_commit(buffer, event, 0, 0);
+   trace_current_buffer_unlock_commit(buffer, event,
+  irq_flags, pc);
 }
 
 static void ftrace_syscall_exit(void *data, struct pt_regs *regs, long ret)
@@ -343,6 +349,8 @@ static void ftrace_syscall_exit(void *data, struct pt_regs 
*regs, long ret)
struct syscall_metadata *sys_data;
struct ring_buffer_event *event;
struct ring_buffer *buffer;
+   unsigned long irq_flags;
+   int pc;
int syscall_nr;
 
syscall_nr = trace_get_syscall_nr(current, regs);
@@ -355,9 +363,13 @@ static void ftrace_syscall_exit(void *data, struct pt_regs 
*regs, long ret)
if (!sys_data)
return;
 
+   local_save_flags(irq_flags);
+   pc = preempt_count();
+
buffer = tr->trace_buffer.buffer;
event = trace_buffer_lock_reserve(buffer,
-   sys_data->exit_event->event.type, sizeof(*entry), 0, 0);
+   sys_data->exit_event->event.type, sizeof(*entry),
+   irq_flags, pc);
if (!event)
return;
 
@@ -367,7 +379,8 @@ static void ftrace_syscall_exit(void *data, struct pt_regs 
*regs, long ret)
 
if (!filter_current_check_discard(buffer, sys_data->exit_event,
  entry, event))
-   trace_current_buffer_unlock_commit(buffer, event, 0, 0);
+   trace_current_buffer_unlock_commit(buffer, event,
+  irq_flags, pc);
 }
 
 static int reg_event_syscall_enter(struct ftrace_event_file *file,
-- 
1.7.9.7



[PATCH v3 06/12] tracing: expose structure ftrace_event_field

2013-04-09 Thread zhangwei(Jovi)
From: "zhangwei(Jovi)" 

Currently event tracing field information is stored only in
struct ftrace_event_field, and that structure is defined in the
internal trace.h.
Move ftrace_event_field into include/linux/ftrace_event.h so that
external modules (like ktap) can use this structure to parse event
fields.
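
For illustration, a hedged sketch of how a module could walk such a field
list once it has obtained the list head (how the head is obtained is outside
the scope of this patch):

	struct ftrace_event_field *field;

	/* 'head' is assumed to be the event's list of fields */
	list_for_each_entry(field, head, link)
		pr_info("field %s: type=%s offset=%d size=%d signed=%d\n",
			field->name, field->type, field->offset,
			field->size, field->is_signed);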

Signed-off-by: zhangwei(Jovi) 
---
 include/linux/ftrace_event.h |   10 ++
 kernel/trace/trace.h |   10 --
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index f6a6e48..ee4dc8d 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -176,6 +176,16 @@ enum trace_reg {
 #endif
 };
 
+struct ftrace_event_field {
+	struct list_head	link;
+   const char  *name;
+   const char  *type;
+   int filter_type;
+   int offset;
+   int size;
+   int is_signed;
+};
+
 struct ftrace_event_call;
 
 struct ftrace_event_class {
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 8f4966b..89da073 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -800,16 +800,6 @@ enum {
TRACE_EVENT_TYPE_RAW= 2,
 };
 
-struct ftrace_event_field {
-   struct list_headlink;
-   const char  *name;
-   const char  *type;
-   int filter_type;
-   int offset;
-   int size;
-   int is_signed;
-};
-
 struct event_filter {
int n_preds;/* Number assigned */
int a_preds;/* allocated */
-- 
1.7.9.7




[PATCH v3 07/12] tracing: remove TRACE_EVENT_TYPE enum definition

2013-04-09 Thread zhangwei(Jovi)
From: "zhangwei(Jovi)" 

The TRACE_EVENT_TYPE enum is not used anywhere at present, so remove it.

Signed-off-by: zhangwei(Jovi) 
---
 kernel/trace/trace.h |6 --
 1 file changed, 6 deletions(-)

diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 89da073..9964695 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -794,12 +794,6 @@ static inline void trace_branch_disable(void)
 /* set ring buffers to default size if not already done so */
 int tracing_update_buffers(void);
 
-/* trace event type bit fields, not numeric */
-enum {
-   TRACE_EVENT_TYPE_PRINTF = 1,
-   TRACE_EVENT_TYPE_RAW= 2,
-};
-
 struct event_filter {
int n_preds;/* Number assigned */
int a_preds;/* allocated */
-- 
1.7.9.7




[PATCH v3 11/12] tracing: guard tracing_selftest_disabled by CONFIG_FTRACE_STARTUP_TEST

2013-04-09 Thread zhangwei(Jovi)
From: "zhangwei(Jovi)" 

The variable tracing_selftest_disabled is meaningless when
CONFIG_FTRACE_STARTUP_TEST is disabled.

This patch also removes the __read_mostly attribute, since
tracing_selftest_disabled is not actually read-mostly.

Signed-off-by: zhangwei(Jovi) 
---
 kernel/trace/trace.c|6 --
 kernel/trace/trace.h|2 +-
 kernel/trace/trace_events.c |2 ++
 3 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index ee4e110..09a3aa8 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -58,10 +58,12 @@ bool ring_buffer_expanded;
  */
 static bool __read_mostly tracing_selftest_running;
 
+#ifdef CONFIG_FTRACE_STARTUP_TEST
 /*
  * If a tracer is running, we do not want to run SELFTEST.
  */
-bool __read_mostly tracing_selftest_disabled;
+bool tracing_selftest_disabled;
+#endif
 
 /* For tracers that don't implement custom flags */
 static struct tracer_opt dummy_tracer_opt[] = {
@@ -1069,8 +1071,8 @@ int register_tracer(struct tracer *type)
tracing_set_tracer(type->name);
default_bootup_tracer = NULL;
/* disable other selftests, since this will break it. */
-   tracing_selftest_disabled = true;
 #ifdef CONFIG_FTRACE_STARTUP_TEST
+   tracing_selftest_disabled = true;
printk(KERN_INFO "Disabling FTRACE selftests due to running tracer 
'%s'\n",
   type->name);
 #endif
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 9b8afa7..e9ef8b7 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -546,10 +546,10 @@ extern int DYN_FTRACE_TEST_NAME(void);
 extern int DYN_FTRACE_TEST_NAME2(void);
 
 extern bool ring_buffer_expanded;
-extern bool tracing_selftest_disabled;
 DECLARE_PER_CPU(int, ftrace_cpu_disabled);
 
 #ifdef CONFIG_FTRACE_STARTUP_TEST
+extern bool tracing_selftest_disabled;
 extern int trace_selftest_startup_function(struct tracer *trace,
   struct trace_array *tr);
 extern int trace_selftest_startup_function_graph(struct tracer *trace,
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 7c52a51..7c4a16b 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -2251,7 +2251,9 @@ static __init int setup_trace_event(char *str)
 {
strlcpy(bootup_event_buf, str, COMMAND_LINE_SIZE);
ring_buffer_expanded = true;
+#ifdef CONFIG_FTRACE_STARTUP_TEST
tracing_selftest_disabled = true;
+#endif
 
return 1;
 }
-- 
1.7.9.7




[PATCH v3 10/12] tracing: use per trace_array clock_id instead of global trace_clock_id

2013-04-09 Thread zhangwei(Jovi)
From: "zhangwei(Jovi)" 

The tracing clock id has already been changed into a per-trace_array
variable, but some code still uses the global trace_clock_id, whose value is
now always 0.

Signed-off-by: zhangwei(Jovi) 
---
 kernel/trace/trace.c |8 +++-
 kernel/trace/trace.h |2 --
 2 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index dd0c122..ee4e110 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -652,8 +652,6 @@ static struct {
ARCH_TRACE_CLOCKS
 };
 
-int trace_clock_id;
-
 /*
  * trace_parser_get_init - gets the buffer for trace parser
  */
@@ -2806,7 +2804,7 @@ __tracing_open(struct inode *inode, struct file *file, 
bool snapshot)
iter->iter_flags |= TRACE_FILE_ANNOTATE;
 
/* Output in nanoseconds only if we are using a clock in nanoseconds. */
-   if (trace_clocks[trace_clock_id].in_ns)
+   if (trace_clocks[tr->clock_id].in_ns)
iter->iter_flags |= TRACE_FILE_TIME_IN_NS;
 
/* stop the trace while dumping if we are not opening "snapshot" */
@@ -3805,7 +3803,7 @@ static int tracing_open_pipe(struct inode *inode, struct 
file *filp)
iter->iter_flags |= TRACE_FILE_LAT_FMT;
 
/* Output in nanoseconds only if we are using a clock in nanoseconds. */
-   if (trace_clocks[trace_clock_id].in_ns)
+   if (trace_clocks[tr->clock_id].in_ns)
iter->iter_flags |= TRACE_FILE_TIME_IN_NS;
 
iter->cpu_file = tc->cpu;
@@ -5075,7 +5073,7 @@ tracing_stats_read(struct file *filp, char __user *ubuf,
cnt = ring_buffer_bytes_cpu(trace_buf->buffer, cpu);
trace_seq_printf(s, "bytes: %ld\n", cnt);
 
-   if (trace_clocks[trace_clock_id].in_ns) {
+   if (trace_clocks[tr->clock_id].in_ns) {
/* local or global for trace_clock */
t = ns2usecs(ring_buffer_oldest_event_ts(trace_buf->buffer, 
cpu));
usec_rem = do_div(t, USEC_PER_SEC);
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index bb3fd1b..9b8afa7 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -588,8 +588,6 @@ enum print_line_t print_trace_line(struct trace_iterator 
*iter);
 
 extern unsigned long trace_flags;
 
-extern int trace_clock_id;
-
 /* Standard output formatting function used for function return traces */
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 
-- 
1.7.9.7




[PATCH v3 12/12] libtraceevent: add libtraceevent prefix in warning message

2013-04-09 Thread zhangwei(Jovi)
From: "zhangwei(Jovi)" 

When using tracepoints with perf, perf prints warning messages that make it
hard to understand what is wrong in perf.

[root@jovi perf]# ./perf stat -e timer:* ls
  Warning: unknown op '{'
  Warning: unknown op '{'
...

These warning messages are actually produced by the libtraceevent format
parsing code.

So add a libtraceevent prefix to identify them more clearly.

(We should remove all of these warnings for 'perf stat' in the future; it is
not necessary to parse the event format when running 'perf stat'.)

Signed-off-by: zhangwei(Jovi) 
---
 tools/lib/traceevent/event-parse.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/lib/traceevent/event-parse.c 
b/tools/lib/traceevent/event-parse.c
index 82b0606..a3971d2 100644
--- a/tools/lib/traceevent/event-parse.c
+++ b/tools/lib/traceevent/event-parse.c
@@ -47,7 +47,7 @@ static int show_warning = 1;
 #define do_warning(fmt, ...)   \
do {\
if (show_warning)   \
-   warning(fmt, ##__VA_ARGS__);\
+   warning("libtraceevent: "fmt, ##__VA_ARGS__);   \
} while (0)
 
 static void init_input_buf(const char *buf, unsigned long long size)
-- 
1.7.9.7




[PATCH v3 01/12] tracing: move trace_array definition into include/linux/trace_array.h

2013-04-09 Thread zhangwei(Jovi)
From: "zhangwei(Jovi)" 

Prepare for exposing the event tracing infrastructure.
(struct trace_array will be used by external modules.)

Signed-off-by: zhangwei(Jovi) 
---
 include/linux/trace_array.h |  117 +++
 kernel/trace/trace.h|  116 +-
 2 files changed, 118 insertions(+), 115 deletions(-)
 create mode 100644 include/linux/trace_array.h

diff --git a/include/linux/trace_array.h b/include/linux/trace_array.h
new file mode 100644
index 000..c5b7a13
--- /dev/null
+++ b/include/linux/trace_array.h
@@ -0,0 +1,117 @@
+#ifndef _LINUX_KERNEL_TRACE_ARRAY_H
+#define _LINUX_KERNEL_TRACE_ARRAY_H
+
+#ifdef CONFIG_FTRACE_SYSCALLS
+#include /* For NR_SYSCALLS   */
+#include/* some archs define it here */
+#endif
+
+struct trace_cpu {
+   struct trace_array  *tr;
+   struct dentry   *dir;
+   int cpu;
+};
+
+/*
+ * The CPU trace array - it consists of thousands of trace entries
+ * plus some other descriptor data: (for example which task started
+ * the trace, etc.)
+ */
+struct trace_array_cpu {
+	struct trace_cpu	trace_cpu;
+	atomic_t		disabled;
+	void			*buffer_page;	/* ring buffer spare */
+
+   unsigned long   entries;
+   unsigned long   saved_latency;
+   unsigned long   critical_start;
+   unsigned long   critical_end;
+   unsigned long   critical_sequence;
+   unsigned long   nice;
+   unsigned long   policy;
+   unsigned long   rt_priority;
+   unsigned long   skipped_entries;
+   cycle_t preempt_timestamp;
+   pid_t   pid;
+   kuid_t  uid;
+	char			comm[TASK_COMM_LEN];
+};
+
+struct tracer;
+
+struct trace_buffer {
+   struct trace_array  *tr;
+   struct ring_buffer  *buffer;
+   struct trace_array_cpu __percpu *data;
+   cycle_t time_start;
+   int cpu;
+};
+
+/*
+ * The trace array - an array of per-CPU trace arrays. This is the
+ * highest level data structure that individual tracers deal with.
+ * They have on/off state as well:
+ */
+struct trace_array {
+	struct list_head	list;
+	char			*name;
+   struct trace_buffer trace_buffer;
+#ifdef CONFIG_TRACER_MAX_TRACE
+   /*
+* The max_buffer is used to snapshot the trace when a maximum
+* latency is reached, or when the user initiates a snapshot.
+* Some tracers will use this to store a maximum trace while
+* it continues examining live traces.
+*
+* The buffers for the max_buffer are set up the same as the 
trace_buffer
+* When a snapshot is taken, the buffer of the max_buffer is swapped
+* with the buffer of the trace_buffer and the buffers are reset for
+* the trace_buffer so the tracing can continue.
+*/
+   struct trace_buffer max_buffer;
+	bool			allocated_snapshot;
+#endif
+   int buffer_disabled;
+	struct trace_cpu	trace_cpu;	/* place holder */
+#ifdef CONFIG_FTRACE_SYSCALLS
+   int sys_refcount_enter;
+   int sys_refcount_exit;
+   DECLARE_BITMAP(enabled_enter_syscalls, NR_syscalls);
+   DECLARE_BITMAP(enabled_exit_syscalls, NR_syscalls);
+#endif
+   int stop_count;
+   int clock_id;
+   struct tracer   *current_trace;
+	unsigned int		flags;
+   raw_spinlock_t  start_lock;
+   struct dentry   *dir;
+   struct dentry   *options;
+   struct dentry   *percpu_dir;
+   struct dentry   *event_dir;
+	struct list_head	systems;
+	struct list_head	events;
+   struct task_struct  *waiter;
+   int ref;
+};
+
+enum {
+   TRACE_ARRAY_FL_GLOBAL   = (1 << 0)
+};
+
+extern struct list_head ftrace_trace_arrays;
+
+/*
+ * The global tracer (top) should be the first trace array added,
+ * but we check the flag anyway.
+ */
+static inline struct trace_array *top_trace_array(void)
+{
+   struct trace_array *tr;
+
+   tr = list_entry(ftrace_trace_arrays.prev,
+   typeof(*tr), list);
+   WARN_ON(!(tr->flags & TRACE_ARRAY_FL_GLOBAL));
+   return tr;
+}
+
+#endif /* _LINUX_KERNEL_TRACE_ARRAY_H */
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 9e01458..a8acfcd 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -12,11 +12,7 @@
 #include 
 #include 
 #include 
-
-#ifdef CONFIG_FTRACE_SYSCALLS
-#include /* For NR_SYSCALLS   */
-#include

[PATCH v3 03/12] tracing: expose event tracing infrastructure

2013-04-09 Thread zhangwei(Jovi)
From: "zhangwei(Jovi)" 

Currently event tracing can only be used by ftrace and perf; there is no
mechanism that lets modules (like an external tracing tool) register a
tracing callback function.

Event tracing is implemented on top of tracepoints. Compared with raw
tracepoints, the event tracing infrastructure provides a built-in structured
event annotation format, and this feature should be exposed to external
users.

For example, a simple ktap pseudo script demonstrates how this exposed event
tracing can be used:

function event_trace(e)
{
printf("%s", e.annotate);
}

os.trace("sched:sched_switch", event_trace);
os.trace("irq:softirq_raise", event_trace);

The running result:
sched_switch: prev_comm=rcu_sched prev_pid=10 prev_prio=120 prev_state=S ==> 
next_comm=swapper/1 next_pid=0 next_prio=120
softirq_raise: vec=1 [action=TIMER]
...

This exposed interface could also be used by other tracing tools, like
SystemTap or LTTng, if they chose to implement support for it.

This patch introduces struct event_trace_ops in trace_array; it has two
callback functions, pre_trace and do_trace. When a ftrace_raw_event_*
function is hit, it calls the registered event_trace_ops.
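
For illustration, a hedged sketch of what a module-side pair of callbacks
could look like, using the two signatures added below (the my_buffer_*
helpers and the way the ops get attached to a trace_array are assumptions,
not part of this patch):

static void *my_pre_trace(struct ftrace_event_file *file,
			  int entry_size, void *data)
{
	/* reserve entry_size bytes in the module's own buffer */
	return my_buffer_reserve(entry_size);		/* hypothetical helper */
}

static void my_do_trace(struct ftrace_event_file *file, void *entry,
			int entry_size, void *data)
{
	/* the tracepoint stub has filled 'entry'; commit it */
	my_buffer_commit(entry, entry_size);		/* hypothetical helper */
}

static struct event_trace_ops my_trace_ops = {
	.pre_trace	= my_pre_trace,
	.do_trace	= my_do_trace,
};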

The benefit of this change is that the kernel size shrinks by ~18K.

(The kernel size will shrink further once the perf tracing code is converted
to use this mechanism in the future.)

   text    data     bss      dec    hex filename
7402131  804364 3149824 11356319 ad489f vmlinux.old
7383115  804684 3149824 11337623 acff97 vmlinux.new

Signed-off-by: zhangwei(Jovi) 
---
 include/linux/ftrace_event.h |   21 +
 include/linux/trace_array.h  |1 +
 include/trace/ftrace.h   |   69 +-
 kernel/trace/trace.c |4 ++-
 kernel/trace/trace.h |2 ++
 kernel/trace/trace_events.c  |   51 +++
 6 files changed, 99 insertions(+), 49 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 4e28b01..4b55272 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct trace_array;
 struct trace_buffer;
@@ -245,6 +246,26 @@ struct ftrace_event_call {
 #endif
 };
 
+
+/*
+ * trace_descriptor_t is purpose for passing arguments between
+ * pre_trace and do_trace function.
+ */
+struct trace_descriptor_t {
+   struct ring_buffer_event *event;
+   struct ring_buffer *buffer;
+   unsigned long irq_flags;
+   int pc;
+};
+
+/* callback function for tracing */
+struct event_trace_ops {
+   void *(*pre_trace)(struct ftrace_event_file *file,
+  int entry_size, void *data);
+   void (*do_trace)(struct ftrace_event_file *file, void *entry,
+int entry_size, void *data);
+};
+
 struct trace_array;
 struct ftrace_subsystem_dir;
 
diff --git a/include/linux/trace_array.h b/include/linux/trace_array.h
index c5b7a13..b362c5f 100644
--- a/include/linux/trace_array.h
+++ b/include/linux/trace_array.h
@@ -56,6 +56,7 @@ struct trace_array {
struct list_headlist;
char*name;
struct trace_buffer trace_buffer;
+   struct event_trace_ops  *ops;
 #ifdef CONFIG_TRACER_MAX_TRACE
/*
 * The max_buffer is used to snapshot the trace when a maximum
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index 4bda044..743e754 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -401,41 +401,28 @@ static inline notrace int ftrace_get_offsets_##call(  
\
  *
  * static struct ftrace_event_call event_;
  *
- * static void ftrace_raw_event_(void *__data, proto)
+ * static notrace void ftrace_raw_event_##call(void *__data, proto)
  * {
  * struct ftrace_event_file *ftrace_file = __data;
- * struct ftrace_event_call *event_call = ftrace_file->event_call;
- * struct ftrace_data_offsets_ __maybe_unused __data_offsets;
- * struct ring_buffer_event *event;
- * struct ftrace_raw_ *entry; <-- defined in stage 1
- * struct ring_buffer *buffer;
- * unsigned long irq_flags;
- * int __data_size;
- * int pc;
+ * struct ftrace_data_offsets_##call __maybe_unused __data_offsets;
+ * struct trace_descriptor_t __desc;
+ * struct event_trace_ops *ops = ftrace_file->tr->ops;
+ * struct ftrace_raw_##call *entry; <-- defined in stage 1
+ * int __data_size, __entry_size;
  *
- * if (test_bit(FTRACE_EVENT_FL_SOFT_DISABLED_BIT,
- *  _file->flags))
- * return;
- *
- * local_save_flags(irq_flags);
- * pc = preempt_count();
- *
- * __data_size = ftrace_get_offsets_(&__data_offsets, args);
+ * __data_size = ftrace_get_offsets_##call(&__data_offsets, args);
+ * __entry_size = sizeof(*entry) + __data_size;
  *
- * event = trace_event_buffer_lock_reserve(, ftrace_file,
- *   event_->event.type,
- *   sizeof(*entry) + 

[PATCH v3 00/12] event tracing expose change and bugfix/cleanup

2013-04-09 Thread zhangwei(Jovi)
From: "zhangwei(Jovi)" 

Hi steven,

I have reworked this patchset again with minor changes.
[v2 -> v3:
-   change the trace_descriptor_t definition in patch 3
-   new patch "export ftrace_events"
-   remove patch "export syscall metadata"
    (syscall tracing uses the same event_trace_ops backend as normal event
     tracepoints, so there is no need to export anything syscall-specific)
-   remove the private data field in the ftrace_event_file struct (also not
    needed)
]

This patchset contains:
1) event tracing expose work (v3)
   The new implementation is based on the multi-instance buffer work; it also
   converts the syscall tracing code to use the same event backend store
   mechanism.
   The change includes patches 1-6 (patch 2 also fixes a long-standing minor
   bug).

2) some cleanup
   This includes patches 7-11.

3) patch 12 fixes a libtraceevent warning

Note that these patches are based on the latest linux-trace git tree
(on top of the multi-instance buffer implementation):

git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
tip/perf/core

All patches pass basic testing.


Note that the ktap code already makes use of this event tracing export work.
If you are interested, you can check the ktap code at the link below to see
how this export is used by an external module.
https://github.com/ktap/ktap/blob/master/library/trace.c

And even more, you can try it. :)

Thanks very much

zhangwei(Jovi) (12):
  tracing: move trace_array definition into include/linux/trace_array.h
  tracing: fix irqs-off tag display in syscall tracing
  tracing: expose event tracing infrastructure
  tracing: export ftrace_events
  tracing: switch syscall tracing to use event_trace_ops backend
  tracing: expose structure ftrace_event_field
  tracing: remove TRACE_EVENT_TYPE enum definition
  tracing: remove obsolete macro guard _TRACE_PROFILE_INIT
  tracing: remove ftrace(...) function
  tracing: use per trace_array clock_id instead of global
trace_clock_id
  tracing: guard tracing_selftest_disabled by
CONFIG_FTRACE_STARTUP_TEST
  libtraceevent: add libtraceevent prefix in warning message

 include/linux/ftrace_event.h   |   32 
 include/linux/trace_array.h|  118 +
 include/trace/ftrace.h |   71 ++
 kernel/trace/trace.c   |   27 +++
 kernel/trace/trace.h   |  144 +---
 kernel/trace/trace_events.c|   55 ++
 kernel/trace/trace_syscalls.c  |   36 -
 tools/lib/traceevent/event-parse.c |2 +-
 8 files changed, 257 insertions(+), 228 deletions(-)
 create mode 100644 include/linux/trace_array.h

-- 
1.7.9.7




[PATCH v3 09/12] tracing: remove ftrace(...) function

2013-04-09 Thread zhangwei(Jovi)
From: "zhangwei(Jovi)" 

The only caller of the ftrace(...) function was removed a long time ago,
so remove the function body as well.

Signed-off-by: zhangwei(Jovi) 
---
 kernel/trace/trace.c |9 -
 kernel/trace/trace.h |5 -
 2 files changed, 14 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 224b152..dd0c122 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1534,15 +1534,6 @@ trace_function(struct trace_array *tr,
__buffer_unlock_commit(buffer, event);
 }
 
-void
-ftrace(struct trace_array *tr, struct trace_array_cpu *data,
-   unsigned long ip, unsigned long parent_ip, unsigned long flags,
-   int pc)
-{
-   if (likely(!atomic_read(>disabled)))
-   trace_function(tr, ip, parent_ip, flags, pc);
-}
-
 #ifdef CONFIG_STACKTRACE
 
 #define FTRACE_STACK_MAX_ENTRIES (PAGE_SIZE / sizeof(unsigned long))
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 9964695..bb3fd1b 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -445,11 +445,6 @@ void tracing_iter_reset(struct trace_iterator *iter, int 
cpu);
 
 void poll_wait_pipe(struct trace_iterator *iter);
 
-void ftrace(struct trace_array *tr,
-   struct trace_array_cpu *data,
-   unsigned long ip,
-   unsigned long parent_ip,
-   unsigned long flags, int pc);
 void tracing_sched_switch_trace(struct trace_array *tr,
struct task_struct *prev,
struct task_struct *next,
-- 
1.7.9.7




[PATCH v3 08/12] tracing: remove obsolete macro guard _TRACE_PROFILE_INIT

2013-04-09 Thread zhangwei(Jovi)
From: "zhangwei(Jovi)" 

The _TRACE_PROFILE_INIT macro was removed a long time ago, but its
"#undef" guard was left behind; remove it.

Signed-off-by: zhangwei(Jovi) 
---
 include/trace/ftrace.h |2 --
 1 file changed, 2 deletions(-)

diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index 743e754..b95cc52 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -677,5 +677,3 @@ static inline void perf_test_probe_##call(void) 
\
 #include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
 #endif /* CONFIG_PERF_EVENTS */
 
-#undef _TRACE_PROFILE_INIT
-
-- 
1.7.9.7




Re: [PATCH v3 linux-next] cpufreq: ondemand: Calculate gradient of CPU load to early increase frequency

2013-04-09 Thread Viresh Kumar
On 9 April 2013 22:26, Stratos Karafotis  wrote:
> On 04/05/2013 10:50 PM, Stratos Karafotis wrote:
>>
>> Hi Viresh,
>>
>> On 04/04/2013 07:54 AM, Viresh Kumar wrote:
>>>
>>> Hi Stratos,
>>>
>>> Yes, your results show some improvements. BUT if performance is the only
>>> thing
>>> we were looking for, then we will never use ondemand governor but
>>> performance
>>> governor.
>>>
>>> I suspect this little increase in performance must have increased power
>>> numbers
>>> too (significantly). So, if you can get numbers in the form of
>>> power/performance
>>> with and without your patch, it will be great.
>>>
>>> --
>>> viresh
>>>
>>
>> I ran some more tests. I increased the number of iterations to 100 (from
>> 20).
>> I also tested with counter values of 1,000,000 (~4200us), 5,000,000 (~1us),
>> 15,000,000 (~3us).
>>
>> This time, I also extracted statistics from the cpufreq_stats driver. I think
>> this will be an indication of power consumption. Below are the results, and
>> attached is the program I used to get these numbers.
>
>
> Any comments would be appreciated.

Sorry, I forgot about this mail earlier.

Your performance numbers look improved, but I am still not sure about
power consumption. But as this is not going to be the default setting, I
think we can take this patch.

@Rafael:?


Re: [PATCH 2/3] mm, slub: count freed pages via rcu as this task's reclaimed_slab

2013-04-09 Thread Simon Jeons

Hi Christoph,
On 04/09/2013 10:32 PM, Christoph Lameter wrote:

On Tue, 9 Apr 2013, Simon Jeons wrote:


+   int pages = 1 << compound_order(page);

One question unrelated to this patch: why can a slab cache use compound
pages (hugetlbfs/THP pages)? Aren't they just used by applications to
reduce TLB misses?

Slab caches can use any order pages because these pages are never on
the LRU and are not part of the page cache. Large contiguous physical
memory means that objects can be arranged more efficiently in the
page. This is particularly useful for larger objects where we might use a
lot of memory because objects do not fit well into a 4k page.

It also reduces slab page management overhead if higher order pages are used.
In the case of slub the page size also determines the number of objects
that can be allocated/freed without the need for some form of
synchronization.


It seems you misunderstood my question. I don't doubt that slab/slub can
use high-order pages. What I am asking is why slab/slub can use compound
pages at all; isn't PageCompound() only meant for hugetlbfs or THP pages,
which are used by applications?
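
For reference, a minimal sketch of why a slab ends up as a compound page:
multi-page slabs are allocated with __GFP_COMP, so PageCompound() is true
for them without any hugetlbfs/THP involvement. This is a simplified,
invented helper for illustration, not actual mm/slub.c code.

#include <linux/gfp.h>
#include <linux/mm.h>

/*
 * Sketch only (invented helper, simplified from what slub does): allocate
 * the backing pages for one slab.  Any order > 0 allocation is requested
 * with __GFP_COMP so the whole block is a single compound page and can be
 * managed through its head page.
 */
static struct page *sketch_alloc_slab_pages(gfp_t flags, unsigned int order)
{
	struct page *page;

	if (order)
		flags |= __GFP_COMP;	/* make the block one compound page */

	page = alloc_pages(flags, order);

	/*
	 * For order > 0 this page is compound with no hugetlbfs/THP involved,
	 * and 1 << compound_order(page) gives the number of base pages in it.
	 */
	return page;
}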








mfd, arizona: Fix the deadlock between interrupt handler and dpm_suspend

2013-04-09 Thread Chuansheng Liu

When the system tries to suspend:

T1: suspend thread                     T2: arizona interrupt thread
enter_state()                          arizona_irq_thread()
 suspend_devices_and_enter()            regmap_read()
  __device_suspend()                     regmap_spi_read()
                                          spi_write_then_read()
                                           spi_sync()
                                            __spi_sync()
                                             wait_for_completion()
                                             <== T2 blocked here because the
                                                 SPI bus is already suspended
  suspend_enter()
   dpm_suspend_end()
    dpm_suspend_noirq()
     suspend_device_irqs()
      synchronize_irq()
      <== T1 blocked here waiting for T2 to finish

T1 and T2 then deadlock.

The arizona IRQ is not a NO_SUSPEND irq, so we can disable it while devices
are suspended and re-enable it only after device resume has finished.

Signed-off-by: liu chuansheng 
---
 drivers/mfd/arizona-core.c |   11 ---
 1 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/mfd/arizona-core.c b/drivers/mfd/arizona-core.c
index b562c7b..b11bd01 100644
--- a/drivers/mfd/arizona-core.c
+++ b/drivers/mfd/arizona-core.c
@@ -264,11 +264,11 @@ static int arizona_runtime_suspend(struct device *dev)
 #endif
 
 #ifdef CONFIG_PM_SLEEP
-static int arizona_resume_noirq(struct device *dev)
+static int arizona_suspend(struct device *dev)
 {
struct arizona *arizona = dev_get_drvdata(dev);
 
-   dev_dbg(arizona->dev, "Early resume, disabling IRQ\n");
+   dev_dbg(arizona->dev, "Suspend, disabling IRQ\n");
disable_irq(arizona->irq);
 
return 0;
@@ -278,7 +278,7 @@ static int arizona_resume(struct device *dev)
 {
struct arizona *arizona = dev_get_drvdata(dev);
 
-   dev_dbg(arizona->dev, "Late resume, reenabling IRQ\n");
+   dev_dbg(arizona->dev, "Resume, reenabling IRQ\n");
enable_irq(arizona->irq);
 
return 0;
@@ -289,10 +289,7 @@ const struct dev_pm_ops arizona_pm_ops = {
SET_RUNTIME_PM_OPS(arizona_runtime_suspend,
   arizona_runtime_resume,
   NULL)
-   SET_SYSTEM_SLEEP_PM_OPS(NULL, arizona_resume)
-#ifdef CONFIG_PM_SLEEP
-   .resume_noirq = arizona_resume_noirq,
-#endif
+   SET_SYSTEM_SLEEP_PM_OPS(arizona_suspend, arizona_resume)
 };
 EXPORT_SYMBOL_GPL(arizona_pm_ops);
 
-- 
1.7.0.4





[BUG] Fatal exception in interrupt - nf_nat_cleanup_conntrack during IPv6 tests

2013-04-09 Thread CAI Qian
Just hit this very often during IPv6 tests in both the latest stable
and mainline kernel.

[ 3597.179161] general protection fault:  [#1] SMP  
[ 3597.206166] Modules linked in: btrfs(F) zlib_deflate(F) vfat(F) fat(F) 
nfs_layout_nfsv41_files(F) nfsv4(F) auth_rpcgss(F) nfsv3(F) nfs_acl(F) nfsv2(F) 
nfs(F) lockd(F) sunrpc(F) fscache(F) nfnetlink_log(F) nfnetlink(F) bluetooth(F) 
rfkill(F) arc4(F) md4(F) nls_utf8(F) cifs(F) dns_resolver(F) nf_tproxy_core(F) 
nls_koi8_u(F) nls_cp932(F) ts_kmp(F) sctp(F) nf_conntrack_netbios_ns(F) 
nf_conntrack_broadcast(F) ipt_MASQUERADE(F) ip6table_mangle(F) ip6t_REJECT(F) 
nf_conntrack_ipv6(F) nf_defrag_ipv6(F) nf_nat_ipv4(F-) nf_nat(F) 
iptable_mangle(F) ipt_REJECT(F) nf_conntrack_ipv4(F) nf_defrag_ipv4(F) 
xt_conntrack(F) nf_conntrack(F) ebtable_filter(F) ebtables(F) 
ip6table_filter(F) ip6_tables(F) iptable_filter(F) ip_tables(F) sg(F) 
coretemp(F) ixgbe(F) kvm_intel(F) kvm(F) ptp(F) iTCO_wdt(F) 
iTCO_vendor_support(F) crc32c_intel(F) pps_core(F) mdio(F) 
ghash_clmulni_intel(F) e1000e(F) lpc_ich(F) dca(F) hpilo(F) hpwdt(F) 
mfd_core(F) pcspkr(F) serio_raw(F) microcode(F) xfs(F) libcrc32c(F) 
ata_generic(F) mgag200(F) i2c_algo_bit(F) pata_acpi(F) drm_kms_helper(F) 
sd_mod(F) crc_t10dif(F) ttm(F) ata_piix(F) drm(F) hpsa(F) libata(F) i2c_core(F) 
dm_mirror(F) dm_region_hash(F) dm_log(F) dm_mod(F) [last unloaded: iptable_nat] 
[ 3597.750154] CPU 3  
[ 3597.759743] Pid: 0, comm: swapper/3 Tainted: GF3.8.5+ #1 HP 
ProLiant DL120 G7 
[ 3597.804861] RIP: 0010:[]  [] 
nf_nat_cleanup_conntrack+0x42/0x70 [nf_nat] 
[ 3597.855207] RSP: 0018:880202c63d40  EFLAGS: 00010246 
[ 3597.881350] RAX:  RBX: 8801ac7bec28 RCX: 
8801d0eedbe0 
[ 3597.917226] RDX: dead00200200 RSI: 0011 RDI: 
a03265b8 
[ 3597.952432] RBP: 880202c63d50 R08: 03451753e1c0 R09: 
dead00200200 
[ 3597.989578] R10:  R11: 0004 R12: 
8801ac7beba8 
[ 3598.026430] R13: 8801d0eedb58 R14: a02fe6a0 R15: 
8801d0eedb58 
[ 3598.061575] FS:  () GS:880202c6() 
knlGS: 
[ 3598.102232] CS:  0010 DS:  ES:  CR0: 80050033 
[ 3598.131044] CR2: 0280e508 CR3: 018f2000 CR4: 
000407e0 
[ 3598.166274] DR0:  DR1:  DR2: 
 
[ 3598.202110] DR3:  DR6: 0ff0 DR7: 
0400 
[ 3598.238221] Process swapper/3 (pid: 0, threadinfo 8801ffaee000, task 
8801ffaf5010) 
[ 3598.282292] Stack: 
[ 3598.292908]   0001 880202c63d80 
a0305bb4 
[ 3598.332820]  8161646d 8801d0eedb58 819a5d40 
0100 
[ 3598.371541]  880202c63da0 a02fd3fe 8801d0eedb58 
819a5d40 
[ 3598.409036] Call Trace: 
[ 3598.421036]
[ 3598.430467]  [] __nf_ct_ext_destroy+0x44/0x60 
[nf_conntrack] 
[ 3598.467970]  [] ? notifier_call_chain+0x4d/0x70 
[ 3598.499191]  [] nf_conntrack_free+0x2e/0x70 [nf_conntrack] 
[ 3598.534121]  [] destroy_conntrack+0xbd/0x110 
[nf_conntrack] 
[ 3598.569981]  [] nf_conntrack_destroy+0x17/0x20 
[ 3598.599579]  [] death_by_timeout+0xdc/0x1b0 [nf_conntrack] 
[ 3598.634842]  [] ? kill_report+0x180/0x180 [nf_conntrack] 
[ 3598.669352]  [] call_timer_fn+0x3a/0x120 
[ 3598.696383]  [] ? kill_report+0x180/0x180 [nf_conntrack] 
[ 3598.730896]  [] ? cpufreq_p4_target+0x130/0x130 
[ 3598.762416]  [] run_timer_softirq+0x1fe/0x2b0 
[ 3598.793980]  [] ? cpufreq_p4_target+0x130/0x130 
[ 3598.826872]  [] __do_softirq+0xd0/0x210 
[ 3598.855564]  [] ? native_sched_clock+0x13/0x80 
[ 3598.887548]  [] ? cpufreq_p4_target+0x130/0x130 
[ 3598.920717]  [] call_softirq+0x1c/0x30 
[ 3598.947948]  [] do_softirq+0x75/0xb0 
[ 3598.973283]  [] irq_exit+0xb5/0xc0 
[ 3598.998710]  [] smp_apic_timer_interrupt+0x6e/0x99 
[ 3599.030033]  [] apic_timer_interrupt+0x6d/0x80 
[ 3599.060308]
[ 3599.069737]  [] ? cpuidle_wrap_enter+0x50/0xa0 
[ 3599.100918]  [] ? cpuidle_wrap_enter+0x49/0xa0 
[ 3599.130548]  [] cpuidle_enter_tk+0x10/0x20 
[ 3599.158994]  [] cpuidle_idle_call+0xa9/0x260 
[ 3599.187843]  [] cpu_idle+0xaf/0x120 
[ 3599.213581]  [] start_secondary+0x255/0x257 
[ 3599.241868] Code: 83 ec 08 0f b6 58 11 84 db 74 43 48 01 c3 48 83 7b 20 00 
74 39 48 c7 c7 b8 65 32 a0 e8 98 fc 2e e1 48 8b 03 48 8b 53 08 48 85 c0 <48> 89 
02 74 04 48 89 50 08 48 ba 00 02 20 00 00 00 ad de 48 c7  
[ 3599.337037] RIP  [] nf_nat_cleanup_conntrack+0x42/0x70 
[nf_nat] 
[ 3599.378020]  RSP  
[ 3599.402446] bad: scheduling from the idle thread! 
[ 3599.428245] Pid: 0, comm: swapper/3 Tainted: GF3.8.5+ #1 
[ 3599.463162] Call Trace: 
[ 3599.476367][] dequeue_task_idle+0x2f/0x40 
[ 3599.508112]  [] dequeue_task+0x8e/0xa0 
[ 3599.535485]  [] deactivate_task+0x23/0x30 
[ 3599.563276]  [] __schedule+0x599/0x7b0 
[ 3599.590450]  [] ? sched_clock+0x9/0x10 
[ 3599.616588]  [] schedule+0x29/0x70 
[ 3599.641650]  

[PATCH] checkpatch: Warn on comparisons to true and false

2013-04-09 Thread Joe Perches
Comparisons of A to true and false are better written
as A and !A.

Bleat a message on use.

Signed-off-by: Joe Perches 
---
 scripts/checkpatch.pl | 17 +
 1 file changed, 17 insertions(+)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 3fb6d86..080e7f6 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -3538,6 +3538,23 @@ sub process {
 "Using yield() is generally wrong. See yield() 
kernel-doc (sched/core.c)\n"  . $herecurr);
}
 
+# check for comparisons against true and false
+   if ($line =~ 
/\+\s*(.*?)($Lval)\s*(==|\!=)\s*(true|false)\b(.*)$/i) {
+   my $lead = $1;
+   my $arg = $2;
+   my $test = $3;
+   my $otype = $4;
+   my $trail = $5;
+   my $type = lc($otype);
+   my $op = "!";
+   if (("$test" eq "==" && "$type" eq "true") ||
+   ("$test" eq "!=" && "$type" eq "false")) {
+   $op = "";
+   }
+   WARN("BOOL_COMPARISON",
+"Using comparison to $otype is poor style. Use 
'${lead}${op}${arg}${trail}'\n" . $herecurr);
+   }
+
 # check for semaphores initialized locked
if ($line =~ /^.\s*sema_init.+,\W?0\W?\)/) {
WARN("CONSIDER_COMPLETION",




[RFC PATCH 3/3] pstore/ram: avoid atomic accesses for ioremapped regions

2013-04-09 Thread Rob Herring
From: Rob Herring 

For persistent RAM outside of main memory, the memory may have limitations
on supported accesses. For internal RAM on highbank platform exclusive
accesses are not supported and will hang the system. So atomic_cmpxchg
cannot be used. This commit uses spinlock protection for buffer size and
start updates on ioremapped regions instead.

Signed-off-by: Rob Herring 
Cc: Anton Vorontsov 
Cc: Colin Cross 
Cc: Kees Cook 
Cc: Tony Luck 
Cc: linux-kernel@vger.kernel.org
---
 fs/pstore/ram_core.c |   54 --
 1 file changed, 52 insertions(+), 2 deletions(-)

diff --git a/fs/pstore/ram_core.c b/fs/pstore/ram_core.c
index e126d9f..97e640b 100644
--- a/fs/pstore/ram_core.c
+++ b/fs/pstore/ram_core.c
@@ -46,7 +46,7 @@ static inline size_t buffer_start(struct persistent_ram_zone 
*prz)
 }
 
 /* increase and wrap the start pointer, returning the old value */
-static inline size_t buffer_start_add(struct persistent_ram_zone *prz, size_t 
a)
+static size_t buffer_start_add_atomic(struct persistent_ram_zone *prz, size_t 
a)
 {
int old;
int new;
@@ -62,7 +62,7 @@ static inline size_t buffer_start_add(struct 
persistent_ram_zone *prz, size_t a)
 }
 
 /* increase the size counter until it hits the max size */
-static inline void buffer_size_add(struct persistent_ram_zone *prz, size_t a)
+static void buffer_size_add_atomic(struct persistent_ram_zone *prz, size_t a)
 {
size_t old;
size_t new;
@@ -78,6 +78,53 @@ static inline void buffer_size_add(struct 
persistent_ram_zone *prz, size_t a)
} while (atomic_cmpxchg(&prz->buffer->size, old, new) != old);
 }
 
+static DEFINE_RAW_SPINLOCK(buffer_lock);
+
+/* increase and wrap the start pointer, returning the old value */
+static size_t buffer_start_add_locked(struct persistent_ram_zone *prz, size_t 
a)
+{
+   int old;
+   int new;
+   unsigned long flags;
+
+   raw_spin_lock_irqsave(&buffer_lock, flags);
+
+   old = atomic_read(&prz->buffer->start);
+   new = old + a;
+   while (unlikely(new > prz->buffer_size))
+   new -= prz->buffer_size;
+   atomic_set(&prz->buffer->start, new);
+
+   raw_spin_unlock_irqrestore(&buffer_lock, flags);
+
+   return old;
+}
+
+/* increase the size counter until it hits the max size */
+static void buffer_size_add_locked(struct persistent_ram_zone *prz, size_t a)
+{
+   size_t old;
+   size_t new;
+   unsigned long flags;
+
+   raw_spin_lock_irqsave(&buffer_lock, flags);
+
+   old = atomic_read(&prz->buffer->size);
+   if (old == prz->buffer_size)
+   goto exit;
+
+   new = old + a;
+   if (new > prz->buffer_size)
+   new = prz->buffer_size;
+   atomic_set(&prz->buffer->size, new);
+
+exit:
+   raw_spin_unlock_irqrestore(&buffer_lock, flags);
+}
+
+static size_t (*buffer_start_add)(struct persistent_ram_zone *, size_t) = 
buffer_start_add_atomic;
+static void (*buffer_size_add)(struct persistent_ram_zone *, size_t) = 
buffer_size_add_atomic;
+
 static void notrace persistent_ram_encode_rs8(struct persistent_ram_zone *prz,
uint8_t *data, size_t len, uint8_t *ecc)
 {
@@ -364,6 +411,9 @@ static void *persistent_ram_iomap(phys_addr_t start, size_t 
size)
return NULL;
}
 
+   buffer_start_add = buffer_start_add_locked;
+   buffer_size_add = buffer_size_add_locked;
+
return ioremap_wc(start, size);
 }
 
-- 
1.7.10.4
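
A user-space analogue of the approach, for reference (illustration only,
with a pthread mutex standing in for the raw spinlock): the read-modify-write
on the wrapping start counter is done entirely under a lock, so no
atomic/exclusive access ever has to hit the backing memory.

#include <pthread.h>
#include <stdio.h>

struct ring {
	unsigned int start;		/* mirrors buffer->start */
	unsigned int size;		/* mirrors buffer_size */
	pthread_mutex_t lock;		/* stands in for buffer_lock */
};

/* Same shape as buffer_start_add_locked(): wrap under the lock and
 * return the old value. */
static unsigned int ring_start_add_locked(struct ring *r, unsigned int a)
{
	unsigned int old, new;

	pthread_mutex_lock(&r->lock);
	old = r->start;
	new = old + a;
	while (new > r->size)
		new -= r->size;
	r->start = new;
	pthread_mutex_unlock(&r->lock);

	return old;
}

int main(void)
{
	struct ring r = { .start = 120, .size = 128,
			  .lock = PTHREAD_MUTEX_INITIALIZER };
	unsigned int old = ring_start_add_locked(&r, 20);

	printf("old start = %u, new start = %u\n", old, r.start);  /* 120, 12 */
	return 0;
}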



[RFC PATCH 1/3] pstore-ram: use write-combine mappings

2013-04-09 Thread Rob Herring
From: Rob Herring 

Atomic operations are undefined behavior on ARM for device or strongly
ordered memory types. So use write-combine variants for mappings. This
corresponds to normal, non-cacheable memory on ARM. For many other
architectures, this change should not change the mapping type.

Signed-off-by: Rob Herring 
Cc: Anton Vorontsov 
Cc: Colin Cross 
Cc: Kees Cook 
Cc: Tony Luck 
Cc: linux-kernel@vger.kernel.org
---
 fs/pstore/ram_core.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/pstore/ram_core.c b/fs/pstore/ram_core.c
index 0306303..e126d9f 100644
--- a/fs/pstore/ram_core.c
+++ b/fs/pstore/ram_core.c
@@ -337,7 +337,7 @@ static void *persistent_ram_vmap(phys_addr_t start, size_t 
size)
page_start = start - offset_in_page(start);
page_count = DIV_ROUND_UP(size + offset_in_page(start), PAGE_SIZE);
 
-   prot = pgprot_noncached(PAGE_KERNEL);
+   prot = pgprot_writecombine(PAGE_KERNEL);
 
pages = kmalloc(sizeof(struct page *) * page_count, GFP_KERNEL);
if (!pages) {
@@ -364,7 +364,7 @@ static void *persistent_ram_iomap(phys_addr_t start, size_t 
size)
return NULL;
}
 
-   return ioremap(start, size);
+   return ioremap_wc(start, size);
 }
 
 static int persistent_ram_buffer_map(phys_addr_t start, phys_addr_t size,
-- 
1.7.10.4



[RFC PATCH 2/3] pstore ram: remove the power of buffer size limitation

2013-04-09 Thread Rob Herring
From: Rob Herring 

There doesn't appear to be any reason for the overall pstore RAM buffer to
be a power-of-2 size, so remove that restriction. The individual console,
ftrace and oops buffers are still power-of-2 sized.

Signed-off-by: Rob Herring 
Cc: Anton Vorontsov 
Cc: Colin Cross 
Cc: Kees Cook 
Cc: Tony Luck 
Cc: linux-kernel@vger.kernel.org
---
 fs/pstore/ram.c |2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c
index 288f068..f980077 100644
--- a/fs/pstore/ram.c
+++ b/fs/pstore/ram.c
@@ -391,8 +391,6 @@ static int ramoops_probe(struct platform_device *pdev)
goto fail_out;
}
 
-   if (!is_power_of_2(pdata->mem_size))
-   pdata->mem_size = rounddown_pow_of_two(pdata->mem_size);
if (!is_power_of_2(pdata->record_size))
pdata->record_size = rounddown_pow_of_two(pdata->record_size);
if (!is_power_of_2(pdata->console_size))
-- 
1.7.10.4



Re: [RFC 2/2] initramfs with digital signature protection

2013-04-09 Thread Mimi Zohar
On Tue, 2013-04-09 at 10:38 -0400, Vivek Goyal wrote:
> On Mon, Apr 08, 2013 at 04:17:56PM -0400, Josh Boyer wrote:
> 
> [..]
> > >> > I was thinking about this point that keys can be loaded from signed
> > >> > initramfs. But how is it better than embedding the keys in kernel the
> > >> > way we do for module signing and lock down ima keyring before control
> > >> > is passed to initramfs.
> > >> >
> > >> > Reason being, that anyway a user can not put its own keys in signed
> > >> > initramfs. Signed initramfs will be shipped by distribution. So then
> > >> > it does not matter whether distribution's keys are embedded in the
> > >> > kernel or are loaded from signed initramfs.
> > >>
> > >> Although both the early initramfs and the kernel are signed, building
> > >> the keys into the kernel implies a static set of predefined public keys,
> > >> while the initramfs could load, in addition to the distro keys, keys
> > >> from the UEFI databases.
> > >
> > > Kernel already loads all the keys from UEFI database and MOK into module
> > > keyring.
> > 
> > Small point of order: there are patches to allow the kernel to do this.
> > None of those patches are upstream.
> 
> Ok, thanks Josh. We had been talking about copying all the UEFI keys
> (including dbx) and MOK keys, so I assumed that patches already went in.
> 
> So assuming all of that will go into the kernel (as it is required for
> module signatures too), it does not look like we will benefit from a
> signed initramfs.

The module keyring is a special case.  Loading these keys from the
kernel and, presumably, locking the keyring is probably fine.  In the
case of IMA, however, files will be signed by any number of package
owners.  If the _ima keyring is locked by the kernel, how would you add
these other keys?

thanks,

Mimi





Re: [PATCH 14/18] cpufreq: sh: move cpufreq driver to drivers/cpufreq

2013-04-09 Thread Viresh Kumar
On 10 April 2013 07:42, Simon Horman  wrote:
> Thanks, I understand.
>
> I have no objections to this, but Paul should probably review it.

It is already Acked by him and applied by Rafael.


Re: [RFC PATCH 0/4] Support vranges on files

2013-04-09 Thread Minchan Kim
On Tue, Apr 09, 2013 at 03:36:20PM -0700, John Stultz wrote:
> On 04/08/2013 10:07 PM, Minchan Kim wrote:
> >On Mon, Apr 08, 2013 at 08:27:50PM -0700, John Stultz wrote:
> >>marked volatile, it should remain volatile until someone who has the
> >>file open marks it as non-volatile.  The only time we clear the
> >>volatility is when the file is closed by all users.
> >Yes. We need volatile ranges to be cleared when the file is closed
> >by all users. That's what we need, and it clears up my concern.
> 
> Ok, sorry this wasn't more clear. In all the implementations I've
> pushed, the volatility only persists as long as someone holds the
> file open. Once its closed by all users, the volatility is cleared.

I have now confirmed it with your implementation.
Sorry for the confusion; I had not looked into your code in detail. :(

> 
> Hopefully that calms your worries here. :)

Yeb.

> 
> 
> 
> >>I think the concern about surprising an application that isn't
> >>expecting volatility is odd, since if an application jumped in and
> >>punched a hole in the data, that could surprise other applications
> >>as well.  If you're going to use a file that can be shared,
> >>applications have to deal with potential changes to that file by
> >>others.
> >True. My concern is delayed punching when no client holds the fd, and
> >there is no interface to detect whether some range of a file is in the
> >volatile state. It means anyone who mapped the file shared could
> >encounter SIGBUS even after a best-effort check with lsof before use.
> 
> I'll grant the SIGBUS semantics create the potential for stranger
> behavior than usual, but I think the use cases are still attractive
> enough to try to make it work.

Indeed.

> 
> 
> >>To me, the value in using volatile ranges on the file data is
> >>exactly because the file data can be shared. So it makes sense to me
> >>to have the volatility state be like the data in the file. I guess
> >>the only exception in my case is that if all the references to a
> >>file are closed, we can clear the volatility (since we don't have a
> >>sane way for the volatility to persist past that point).
> >Agree if you provide to clear out volatility when file are closed by
> >all stakeholder.
> 
> Agreed.
> 
> 
> >>One question that might help resolve this: Would having some sort of
> >>volatility checking interface be helpful in easing your concern
> >>about applications being surprised by volatility?
> >If we can provide above things, I think we don't need such interface
> >until someone want it with reasonable logic.
> 
> Sure, I just wanted to know if you saw a need right away. For now we
> can leave it be.
> 
> >>True. And performance needs to be good if this hinting interface is
> >>to be used easily. Although I worry about performance trumping sane
> >>semantics. So let me try to implement the desired behavior and we
> >>can measure the difference.
> >NP. But keep in mind that mmap_sem was really terrible for performance
> >when I ran an experiment (i.e., concurrent page faults from many threads
> >while another thread calls mmap).
> >I guess the primary reason is CONFIG_MUTEX_SPIN_ON_OWNER.
> >So at least we should avoid it by introducing new modes like
> >VOLATILE_ANON|VOLATILE_FILE|VOLATILE_BOTH if we want to support
> >mvrange on files, since an mvrange interface is what userland people
> >really want, although ashmem has used an fd-based model.
> 
> The VOLATILE_ANON|VOLATILE_FILE|VOLATILE_BOTH may be an interesting
> compromise.
> 
> Though, if one marks a VOLATILE_ANON range on an address that's an
> mmaped file, how do we detect this and provide a sane error value
> without checking the vmas?
> 

Should we check the VMAs?
If there is a conflict with the existing vrange type, just return -EINVAL?

> 
> thanks
> -john
> 

-- 
Kind regards,
Minchan Kim


[PATCH 1/3] efi: Determine how much space is used by boot services-only variables

2013-04-09 Thread Matthew Garrett
EFI variables can be flagged as being accessible only within boot services.
This makes it awkward for us to figure out how much space they use at
runtime. In theory we could figure this out by simply comparing the results
from QueryVariableInfo() to the space used by all of our variables, but
that fails if the platform doesn't garbage collect on every boot. Thankfully,
calling QueryVariableInfo() while still inside boot services gives a more
reliable answer. This patch passes that information from the EFI boot stub
up to the efivars code, letting us calculate a reasonably accurate value.

Signed-off-by: Matthew Garrett 
---
 arch/x86/boot/compressed/eboot.c  | 47 +++
 arch/x86/include/asm/efi.h| 10 
 arch/x86/include/uapi/asm/bootparam.h |  1 +
 arch/x86/platform/efi/efi.c   | 24 ++
 drivers/firmware/efivars.c| 29 +
 5 files changed, 111 insertions(+)

diff --git a/arch/x86/boot/compressed/eboot.c b/arch/x86/boot/compressed/eboot.c
index c205035..8615f75 100644
--- a/arch/x86/boot/compressed/eboot.c
+++ b/arch/x86/boot/compressed/eboot.c
@@ -251,6 +251,51 @@ static void find_bits(unsigned long mask, u8 *pos, u8 
*size)
*size = len;
 }
 
+static efi_status_t setup_efi_vars(struct boot_params *params)
+{
+   struct setup_data *data;
+   struct efi_var_bootdata *efidata;
+   u64 store_size, remaining_size, var_size;
+   efi_status_t status;
+
+   if (!sys_table->runtime->query_variable_info)
+   return EFI_UNSUPPORTED;
+
+   data = (struct setup_data *)(unsigned long)params->hdr.setup_data;
+
+   while (data && data->next)
+   data = (struct setup_data *)(unsigned long)data->next;
+
+   status = efi_call_phys4(sys_table->runtime->query_variable_info,
+   EFI_VARIABLE_NON_VOLATILE |
+   EFI_VARIABLE_BOOTSERVICE_ACCESS |
+   EFI_VARIABLE_RUNTIME_ACCESS, &store_size,
+   &remaining_size, &var_size);
+
+   if (status != EFI_SUCCESS)
+   return status;
+
+   status = efi_call_phys3(sys_table->boottime->allocate_pool,
+   EFI_LOADER_DATA, sizeof(*efidata), &efidata);
+
+   if (status != EFI_SUCCESS)
+   return status;
+
+   efidata->data.type = SETUP_EFI_VARS;
+   efidata->data.len = sizeof(struct efi_var_bootdata) -
+   sizeof(struct setup_data);
+   efidata->data.next = 0;
+   efidata->store_size = store_size;
+   efidata->remaining_size = remaining_size;
+   efidata->max_var_size = var_size;
+
+   if (data)
+   data->next = (unsigned long)efidata;
+   else
+   params->hdr.setup_data = (unsigned long)efidata;
+
+}
+
 static efi_status_t setup_efi_pci(struct boot_params *params)
 {
efi_pci_io_protocol *pci;
@@ -1157,6 +1202,8 @@ struct boot_params *efi_main(void *handle, 
efi_system_table_t *_table,
 
setup_graphics(boot_params);
 
+   setup_efi_vars(boot_params);
+
setup_efi_pci(boot_params);
 
status = efi_call_phys3(sys_table->boottime->allocate_pool,
diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
index 60c89f3..6c3a154 100644
--- a/arch/x86/include/asm/efi.h
+++ b/arch/x86/include/asm/efi.h
@@ -93,6 +93,9 @@ extern void __iomem *efi_ioremap(unsigned long addr, unsigned 
long size,
 
 #endif /* CONFIG_X86_32 */
 
+extern u64 efi_var_store_size;
+extern u64 efi_var_remaining_size;
+extern u64 efi_var_max_var_size;
 extern int add_efi_memmap;
 extern unsigned long x86_efi_facility;
 extern void efi_set_executable(efi_memory_desc_t *md, bool executable);
@@ -102,6 +105,13 @@ extern void efi_call_phys_epilog(void);
 extern void efi_unmap_memmap(void);
 extern void efi_memory_uc(u64 addr, unsigned long size);
 
+struct efi_var_bootdata {
+   struct setup_data data;
+   u64 store_size;
+   u64 remaining_size;
+   u64 max_var_size;
+};
+
 #ifdef CONFIG_EFI
 
 static inline bool efi_is_native(void)
diff --git a/arch/x86/include/uapi/asm/bootparam.h 
b/arch/x86/include/uapi/asm/bootparam.h
index c15ddaf..0874424 100644
--- a/arch/x86/include/uapi/asm/bootparam.h
+++ b/arch/x86/include/uapi/asm/bootparam.h
@@ -6,6 +6,7 @@
 #define SETUP_E820_EXT 1
 #define SETUP_DTB  2
 #define SETUP_PCI  3
+#define SETUP_EFI_VARS 4
 
 /* ram_size flags */
 #define RAMDISK_IMAGE_START_MASK   0x07FF
diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index c89c245..659da48 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -71,6 +71,13 @@ static efi_system_table_t efi_systab __initdata;
 
 unsigned long x86_efi_facility;
 
+u64 efi_var_store_size;
+EXPORT_SYMBOL(efi_var_store_size);
+u64 efi_var_remaining_size;
+EXPORT_SYMBOL(efi_var_remaining_size);
+u64 

[PATCH 3/3] efi: Distinguish between "remaining space" and actually used space

2013-04-09 Thread Matthew Garrett
EFI implementations distinguish between space that is actively used by a
variable and space that merely hasn't been garbage collected yet. Space
that hasn't yet been garbage collected isn't available for use and so isn't
counted in the remaining_space field returned by QueryVariableInfo().

Combined with commit 68d9298 this can cause problems. Some implementations
don't garbage collect until the remaining space is smaller than the maximum
variable size, and as a result check_var_size() will always fail once more
than 50% of the variable store has been used even if most of that space is
marked as available for garbage collection. The user is unable to create
new variables, and deleting variables doesn't increase the remaining space.

The problem that 68d9298 was attempting to avoid was one where certain
platforms fail if the actively used space is greater than 50% of the
available storage space. We should be able to calculate that by simply
summing the size of each available variable and subtracting that from
the total storage space. With luck this will fix the problem described in
https://bugzilla.kernel.org/show_bug.cgi?id=55471 without permitting
damage to occur to the machines 68d9298 was attempting to fix.

Signed-off-by: Matthew Garrett 
---
 drivers/firmware/efivars.c | 104 +++--
 1 file changed, 101 insertions(+), 3 deletions(-)

diff --git a/drivers/firmware/efivars.c b/drivers/firmware/efivars.c
index 60e7d8f..f5a4e87 100644
--- a/drivers/firmware/efivars.c
+++ b/drivers/firmware/efivars.c
@@ -104,6 +104,13 @@ MODULE_VERSION(EFIVARS_VERSION);
  */
 #define GUID_LEN 36
 
+/*
+ * There's some additional metadata associated with each
+ * variable. Intel's reference implementation is 60 bytes - bump that
+ * to account for potential alignment constraints
+ */
+#define VAR_METADATA_SIZE 64
+
 static bool efivars_pstore_disable =
IS_ENABLED(CONFIG_EFI_VARS_PSTORE_DEFAULT_DISABLE);
 
@@ -405,6 +412,52 @@ validate_var(struct efi_variable *var, u8 *data, unsigned 
long len)
 }
 
 static efi_status_t
+get_var_size_locked(struct efivars *efivars, struct efi_variable *var)
+{
+   efi_status_t status;
+   void *dummy;
+
+   var->DataSize = 0;
+   status = efivars->ops->get_variable(var->VariableName,
+   &var->VendorGuid,
+   &var->Attributes,
+   &var->DataSize,
+   var->Data);
+
+   if (status != EFI_BUFFER_TOO_SMALL)
+   return status;
+
+   dummy = kmalloc(var->DataSize, GFP_ATOMIC);
+
+   status = efivars->ops->get_variable(var->VariableName,
+   &var->VendorGuid,
+   &var->Attributes,
+   &var->DataSize,
+   dummy);
+
+   kfree(dummy);
+
+   return status;
+}
+
+static efi_status_t
+get_var_size(struct efivars *efivars, struct efi_variable *var)
+{
+   efi_status_t status;
+   unsigned long flags;
+
+   spin_lock_irqsave(&efivars->lock, flags);
+   status = get_var_size_locked(efivars, var);
+   spin_unlock_irqrestore(&efivars->lock, flags);
+
+   if (status != EFI_SUCCESS) {
+   printk(KERN_WARNING "efivars: Failed to get var size 0x%lx!\n",
+   status);
+   }
+   return status;
+}
+
+static efi_status_t
 get_var_data_locked(struct efivars *efivars, struct efi_variable *var)
 {
efi_status_t status;
@@ -415,6 +468,10 @@ get_var_data_locked(struct efivars *efivars, struct 
efi_variable *var)
&var->Attributes,
&var->DataSize,
var->Data);
+
+   if (status != EFI_SUCCESS)
+   var->DataSize = 0;
+
return status;
 }
 
@@ -440,8 +497,18 @@ check_var_size_locked(struct efivars *efivars, u32 
attributes,
unsigned long size)
 {
u64 storage_size, remaining_size, max_size;
+   struct efivar_entry *entry;
+   struct efi_variable *var;
efi_status_t status;
const struct efivar_operations *fops = efivars->ops;
+   unsigned long active_size = 0;
+
+   /*
+* Any writes other than EFI_VARIABLE_NON_VOLATILE will only hit
+* RAM, not flash, so ignore them.
+*/
+   if (!(attributes & EFI_VARIABLE_NON_VOLATILE))
+   return EFI_SUCCESS;
 
if (!efivars->ops->query_variable_info)
return EFI_UNSUPPORTED;
@@ -452,8 +519,39 @@ check_var_size_locked(struct efivars *efivars, u32 
attributes,
if (status != EFI_SUCCESS)
return status;
 
-   if (!storage_size || size > remaining_size || size > max_size ||
-   (remaining_size - size) < (storage_size / 2))
+   list_for_each_entry(entry, &efivars->list, list)
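
The shape of the check being introduced, sketched with illustrative numbers
rather than the real efivars code: sum what the enumerable variables
actively use (payload plus per-variable metadata) and refuse the write only
when that active use would cross half of the store.

#include <stdbool.h>
#include <stdio.h>

#define SKETCH_VAR_METADATA_SIZE 64	/* per-variable overhead, as in the patch */

/* Sketch of the new policy (not the actual kernel function): base the
 * 50%-of-store limit on actively used space, not on remaining_size. */
static bool write_would_be_safe(unsigned long storage_size,
				const unsigned long *var_sizes, int nvars,
				unsigned long new_size)
{
	unsigned long active = 0;
	int i;

	for (i = 0; i < nvars; i++)
		active += var_sizes[i] + SKETCH_VAR_METADATA_SIZE;

	return active + new_size + SKETCH_VAR_METADATA_SIZE <= storage_size / 2;
}

int main(void)
{
	unsigned long vars[] = { 1024, 2048, 512 };	/* bytes already in use */

	printf("4 KiB write into a 64 KiB store safe? %s\n",
	       write_would_be_safe(64 * 1024, vars, 3, 4096) ? "yes" : "no");
	return 0;
}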

[PATCH 2/3] Revert "x86, efivars: firmware bug workarounds should be in platform code"

2013-04-09 Thread Matthew Garrett
This reverts commit a6e4d5a03e9e3587e88aba687d8f225f4f04c792. Doing this
workaround properly requires us to work within the variable code.

Signed-off-by: Matthew Garrett 
---
 arch/x86/platform/efi/efi.c | 25 -
 drivers/firmware/efivars.c  | 18 +++---
 include/linux/efi.h |  9 +
 3 files changed, 16 insertions(+), 36 deletions(-)

diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 659da48..fdc5074 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -1023,28 +1023,3 @@ u64 efi_mem_attributes(unsigned long phys_addr)
}
return 0;
 }
-
-/*
- * Some firmware has serious problems when using more than 50% of the EFI
- * variable store, i.e. it triggers bugs that can brick machines. Ensure that
- * we never use more than this safe limit.
- *
- * Return EFI_SUCCESS if it is safe to write 'size' bytes to the variable
- * store.
- */
-efi_status_t efi_query_variable_store(u32 attributes, unsigned long size)
-{
-   efi_status_t status;
-   u64 storage_size, remaining_size, max_size;
-
-   status = efi.query_variable_info(attributes, &storage_size,
-&remaining_size, &max_size);
-   if (status != EFI_SUCCESS)
-   return status;
-
-   if (!storage_size || size > remaining_size || size > max_size ||
-   (remaining_size - size) < (storage_size / 2))
-   return EFI_OUT_OF_RESOURCES;
-
-   return EFI_SUCCESS;
-}
diff --git a/drivers/firmware/efivars.c b/drivers/firmware/efivars.c
index 684a118..60e7d8f 100644
--- a/drivers/firmware/efivars.c
+++ b/drivers/firmware/efivars.c
@@ -439,12 +439,24 @@ static efi_status_t
 check_var_size_locked(struct efivars *efivars, u32 attributes,
unsigned long size)
 {
+   u64 storage_size, remaining_size, max_size;
+   efi_status_t status;
const struct efivar_operations *fops = efivars->ops;
 
-   if (!efivars->ops->query_variable_store)
+   if (!efivars->ops->query_variable_info)
return EFI_UNSUPPORTED;
 
-   return fops->query_variable_store(attributes, size);
+   status = fops->query_variable_info(attributes, &storage_size,
+  &remaining_size, &max_size);
+
+   if (status != EFI_SUCCESS)
+   return status;
+
+   if (!storage_size || size > remaining_size || size > max_size ||
+   (remaining_size - size) < (storage_size / 2))
+   return EFI_OUT_OF_RESOURCES;
+
+   return status;
 }
 
 
@@ -2144,7 +2156,7 @@ efivars_init(void)
ops.get_variable = efi.get_variable;
ops.set_variable = efi.set_variable;
ops.get_next_variable = efi.get_next_variable;
-   ops.query_variable_store = efi_query_variable_store;
+   ops.query_variable_info = efi.query_variable_info;
 
 #ifdef CONFIG_X86
boot_used_size = efi_var_store_size - efi_var_remaining_size;
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 3d7df3d..9bf2f1f 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -333,7 +333,6 @@ typedef efi_status_t 
efi_query_capsule_caps_t(efi_capsule_header_t **capsules,
  unsigned long count,
  u64 *max_size,
  int *reset_type);
-typedef efi_status_t efi_query_variable_store_t(u32 attributes, unsigned long 
size);
 
 /*
  *  EFI Configuration Table and GUID definitions
@@ -576,15 +575,9 @@ extern void efi_enter_virtual_mode (void); /* switch EFI 
to virtual mode, if pos
 #ifdef CONFIG_X86
 extern void efi_late_init(void);
 extern void efi_free_boot_services(void);
-extern efi_status_t efi_query_variable_store(u32 attributes, unsigned long 
size);
 #else
 static inline void efi_late_init(void) {}
 static inline void efi_free_boot_services(void) {}
-
-static inline efi_status_t efi_query_variable_store(u32 attributes, unsigned 
long size)
-{
-   return EFI_SUCCESS;
-}
 #endif
 extern void __iomem *efi_lookup_mapped_addr(u64 phys_addr);
 extern u64 efi_get_iobase (void);
@@ -738,7 +731,7 @@ struct efivar_operations {
efi_get_variable_t *get_variable;
efi_get_next_variable_t *get_next_variable;
efi_set_variable_t *set_variable;
-   efi_query_variable_store_t *query_variable_store;
+   efi_query_variable_info_t *query_variable_info;
 };
 
 struct efivars {
-- 
1.8.1.2



[PATCH] Sys V shared memory limited to 8TiB.

2013-04-09 Thread Robin Holt
Trying to run an application that puts data into half of memory using
shmget(), we found that a shmall value below 8EiB-8TiB would prevent us
from using anything more than 8TiB.  Setting kernel.shmall greater than
8EiB-8TiB made the job work.

The problem is in the newseg() function: ns->shm_tot is an int, and with
4KiB pages an int page count tops out at 8TiB.

ipc/shm.c:
 458 static int newseg(struct ipc_namespace *ns, struct ipc_params *params)
 459 {
...
 465 int numpages = (size + PAGE_SIZE -1) >> PAGE_SHIFT;
...
 474 if (ns->shm_tot + numpages > ns->shm_ctlall)
 475 return -ENOSPC;

Signed-off-by: Robin Holt 
Reported-by: Alex Thorlton 
To: Andrew Morton 
Cc: Stable Kernel Maintainers 
Cc: lkml 

---

 include/linux/ipc_namespace.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/ipc_namespace.h b/include/linux/ipc_namespace.h
index ae221a7..ca62eca 100644
--- a/include/linux/ipc_namespace.h
+++ b/include/linux/ipc_namespace.h
@@ -43,8 +43,8 @@ struct ipc_namespace {
 
size_t  shm_ctlmax;
size_t  shm_ctlall;
+   unsigned long   shm_tot;
int shm_ctlmni;
-   int shm_tot;
/*
 * Defines whether IPC_RMID is forced for _all_ shm segments regardless
 * of shmctl()
-- 
1.8.1.2
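
The arithmetic, spelled out for reference (illustration only): an 8 TiB
segment is 2^31 pages of 4 KiB, which is one more than INT_MAX, so a signed
int shm_tot cannot count past it.

#include <limits.h>
#include <stdio.h>

int main(void)
{
	unsigned long long size = 8ULL << 40;		/* 8 TiB */
	unsigned long long numpages = size >> 12;	/* PAGE_SHIFT = 12, 4 KiB pages */

	/* 2147483648 pages vs INT_MAX = 2147483647: a signed int overflows,
	 * which is why the patch widens shm_tot to unsigned long. */
	printf("pages in 8 TiB: %llu\n", numpages);
	printf("INT_MAX       : %d\n", INT_MAX);
	return 0;
}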



Re: [PATCH 3.8.y] drm/i915: add quirk to invert brightness on eMachines e725

2013-04-09 Thread Ben Hutchings
On Tue, 2013-04-09 at 10:22 -0400, Josh Boyer wrote:
> On Tue, Apr 09, 2013 at 05:12:18PM +0300, Jani Nikula wrote:
> > On Tue, 09 Apr 2013, Josh Boyer  wrote:
> > > Upstream commit 01e3a8feb40e54b962a20fa7eb595c5efef5e109
> > 
> > This patch seems to be the above commit and
> > 
> > commit 160320879830e469e26062c18f75236822ba
> > Author: Jani Nikula 
> > Date:   Tue Jan 22 12:50:34 2013 +0200
> > 
> > drm/i915: add quirk to invert brightness on eMachines G725
> > 
> > squashed together. There's a separate bug for that too:
> > https://bugs.freedesktop.org/show_bug.cgi?id=59628
> 
> Oh...  that actually does seem to be the case.  I did a cherry-pick
> of just the commit I mentioned on top of 3.8.6 and it seems I resolved
> the conflict by including the changes for both.  The conflict came up as:
> 
> <<< HEAD
> /* Acer Aspire 4736Z */
> { 0x2a42, 0x1025, 0x0260, quirk_invert_brightness },
> ===
> /* Acer/eMachines G725 */
> { 0x2a42, 0x1025, 0x0210, quirk_invert_brightness },
> 
> /* Acer/eMachines e725 */
> { 0x2a42, 0x1025, 0x0212, quirk_invert_brightness },
> >>> 01e3a8f... drm/i915: add quirk to invert brightness on eMachines e725
> 
> Thanks for catching that.
> 
> > I think both are okay for stable, but by the stable rules you should
> > probably split this up, with the appropriate upstream references in
> > both. Or do whatever the stable team tells you to do. ;)
> 
> Yeah.  I'm happy to split them up and send them out separately.  Or if
> Greg wants to just list both upstream commit IDs with a single backport,
> that would also be fine.

I cherry-picked the following series for Debian:

[pre-3.8]
7bd90909bbf9 drm/i915: panel: invert brightness via parameter
4dca20efb1a9 drm/i915: panel: invert brightness via quirk
5a15ab5b93e4 drm/i915: panel: invert brightness acer aspire 5734z
5f85f176c2f1 DRM/i915: Add QUIRK_INVERT_BRIGHTNESS for NCR machines.
[post-3.8]
16032087 drm/i915: add quirk to invert brightness on eMachines G725
01e3a8feb40e drm/i915: add quirk to invert brightness on eMachines e725
5559ecadad5a drm/i915: add quirk to invert brightness on Packard Bell NCL20

Should these all be applied to the various 3.x.y stable branches?

Ben.

-- 
Ben Hutchings
The program is absolutely right; therefore, the computer must be wrong.




Re: [patch v3 6/8] sched: consider runnable load average in move_tasks

2013-04-09 Thread Alex Shi
On 04/09/2013 11:16 PM, Vincent Guittot wrote:
>> > Thanks a lot for info sharing! Vincent.
>> >
>> > But I checked the rq->avg and task->se.avg, seems none of them are
>> > possible be updated on different CPU at the same time. So my printk
>> > didn't catch this with benchmark kbuild and aim7 on my SNB EP box.
> The problem can happen because reader and writer are accessing the
> variable asynchronously and on different CPUs
> 
> CPUA write runnable_avg_sum
> CPUB read runnable_avg_sum
> CPUB read runnable_avg_period
> CPUA write runnable_avg_period
> 
> I agree that the time window, during which this can occur, is short
> but not impossible

Maybe I didn't say it clearly, Vincent.

Members of the rq struct, including avg, should be protected by rq->lock
when another CPU wants to access them, or be accessed only by the local
CPU. So the situation above should not happen for them.
And for the per-task avg, a task cannot be on two different CPUs at the
same time, so the problem above does not exist there either.

I thought the problem might exist for the middle-level entities, but it
seems each task group has its own entity on every CPU, so there is no such
problem there either.

So you had better hold rq->lock when checking the buddy CPU's info.

Correct me if something is wrong. :)

-- 
Thanks Alex
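
A user-space illustration of the window Vincent describes, for reference
(not kernel code): a writer keeps two counters in step, a reader on another
CPU samples them in the opposite order, and the reader occasionally observes
a pair that no single update ever produced.

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

/* Stand-ins for runnable_avg_sum / runnable_avg_period. */
static atomic_ulong avg_sum;
static atomic_ulong avg_period;
static atomic_int stop;

static void *writer(void *arg)
{
	(void)arg;
	while (!atomic_load_explicit(&stop, memory_order_relaxed)) {
		/* "CPUA": update the pair with no lock held. */
		atomic_fetch_add_explicit(&avg_sum, 1, memory_order_relaxed);
		atomic_fetch_add_explicit(&avg_period, 1, memory_order_relaxed);
	}
	return NULL;
}

int main(void)
{
	pthread_t tid;
	unsigned long torn = 0;
	int i;

	pthread_create(&tid, NULL, writer, NULL);

	for (i = 0; i < 10 * 1000 * 1000; i++) {
		/* "CPUB": read period first, then sum. */
		unsigned long period = atomic_load_explicit(&avg_period,
							    memory_order_relaxed);
		unsigned long sum = atomic_load_explicit(&avg_sum,
							 memory_order_relaxed);

		/* The writer keeps sum == period at rest, so any sample with
		 * sum > period mixes two different updates. */
		if (sum > period)
			torn++;
	}

	atomic_store(&stop, 1);
	pthread_join(tid, NULL);
	printf("inconsistent sum/period samples: %lu\n", torn);
	return 0;
}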


Re: [PATCH 09/10] memory-hotplug: enable memory hotplug to handle hugepage

2013-04-09 Thread Naoya Horiguchi
On Tue, Apr 09, 2013 at 09:56:58PM -0400, KOSAKI Motohiro wrote:
> On Tue, Apr 9, 2013 at 6:43 PM, Naoya Horiguchi
...
> > MIGRATE_ISOLATE is changed only within the range [start_pfn, end_pfn)
> > given as the argument of __offline_pages (see also 
> > start_isolate_page_range),
> > so it's set only for pages within the single memblock to be offlined.
> 
> For partial memory hot-remove, that's correct behavior; a different
> node is not required.
> 
> > BTW, in previous discussion I already agreed with checking migrate type
> > in hugepage allocation code (maybe it will be in dequeue_huge_page_vma(),)
> > so what you concern should be solved in the next post.
> 
> Umm.. Maybe I missed such discussion. Do you have a pointer?

Please see the bottom of the following:
  http://thread.gmane.org/gmane.linux.kernel.mm/96665/focus=96920
It's not exactly the same, but we need to prevent allocation from the
memblock being hot-removed, not only for efficiency but also to avoid
the race.

Thanks,
Naoya


Re: [PATCH] xen: minor fix in apic ipi interface func

2013-04-09 Thread Zhenzhong Duan


On 2013-04-10 03:20, Konrad Rzeszutek Wilk wrote:

On Tue, Apr 09, 2013 at 06:51:09PM +0800, Zhenzhong Duan wrote:

xen_send_IPI_mask_allbutself uses the native vector as input rather than the xen_vector.

Ouch. But it looks as the only user of the .send_IPI_mask_allbutself
is just xen_send_IPI_allbutself? Or is there another user of this?
It looks like __default_local_send_IPI_allbutself calls
apic->send_IPI_mask_allbutself.


In set_xen_basic_apic_ops, apic->send_IPI_mask_allbutself = 
xen_send_IPI_mask_allbutself.

apic->send_IPI_mask_allbutself takes a native vector as input like the other functions.


Is there a particular bug that was found with this?

No, I just noticed the error while reading the code.

zduan


Re: Crypto Fixes for 3.9

2013-04-09 Thread Herbert Xu
Hi Linus:

This push fixes a GCM bug that breaks IPsec and a compile problem
in ux500.

Please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6.git

or

master.kernel.org:/pub/scm/linux/kernel/git/herbert/crypto-2.6.git


Jussi Kivilinna (1):
  crypto: gcm - fix assumption that assoc has one segment

Linus Walleij (1):
  crypto: ux500 - add missing comma

 crypto/gcm.c  |   17 ++---
 drivers/crypto/ux500/cryp/cryp_core.c |2 +-
 2 files changed, 15 insertions(+), 4 deletions(-)

Thanks,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH v3 26/27] PCI: Make quirk_io_region to use addon_res

2013-04-09 Thread Yinghai Lu
On Thu, Apr 4, 2013 at 2:35 PM, Bjorn Helgaas  wrote:
> This patch has two changes that need to be separated:
>
>   1) Refactoring quirk_io_region() so the pci_read_config_word() is
> done by quirk_io_region() rather than the caller.
>   2) Whatever pci_dev resource changes we end up making.
>
> I think part 1 can be done at any time and is probably not controversial.

OK, I will split that out and make it the first patch in the next revision
of the patchset.

Thanks

Yinghai




Re: [PATCH 14/18] cpufreq: sh: move cpufreq driver to drivers/cpufreq

2013-04-09 Thread Simon Horman
On Tue, Apr 09, 2013 at 07:42:51PM +0530, Viresh Kumar wrote:
> On 9 April 2013 18:25, Simon Horman  wrote:
> > On Thu, Apr 04, 2013 at 06:24:22PM +0530, Viresh Kumar wrote:
> >> This patch moves cpufreq driver of SUPERH architecture to drivers/cpufreq.
> >
> > Why?
> >
> > I am missing the cover email where I assume the explanation lies.
> 
> Hi Simon,
> 
> The idea was to keep all cpufreq drivers at a common and most suitable
> place, so that future consolidation work can be done easily and efficiently.
> 
> So, functionally this patch shouldn't change anything.

Thanks, I understand.

I have no objections to this, but Paul should probably review it.


Re: [PATCH] thermal: exynos: fix handling of invalid frequency table entries

2013-04-09 Thread Zhang Rui
Hi, Andrew,

can you please verify that commits
fc35b35cbe24ef021ea9acfba21e54da958df747 and
57df8106932b57427df1eaaa13871857f75b1194 at
http://git.kernel.org/cgit/linux/kernel/git/rzhang/linux.git/log/?h=thermal
fix the problem for you?

thanks,
rui 

On Tue, 2013-04-09 at 14:59 -0700, Andrew Bresticker wrote:
> Similar to the error described in "thermal: cpu_cooling: fix handling
> of invalid frequency table entries," exynos_get_frequency_level() will
> enter an infinite loop if any CPU frequency table entries are invalid.
> This patch fixes the handling of invalid frequency entries so that
> there is no infinite loop and the correct level is returned.
> 
> Signed-off-by: Andrew Bresticker 
> ---
>  drivers/thermal/exynos_thermal.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/thermal/exynos_thermal.c 
> b/drivers/thermal/exynos_thermal.c
> index d5e6267..524b2a0 100644
> --- a/drivers/thermal/exynos_thermal.c
> +++ b/drivers/thermal/exynos_thermal.c
> @@ -237,7 +237,7 @@ static int exynos_get_crit_temp(struct 
> thermal_zone_device *thermal,
>  
>  static int exynos_get_frequency_level(unsigned int cpu, unsigned int freq)
>  {
> - int i = 0, ret = -EINVAL;
> + int i, level = 0, ret = -EINVAL;
>   struct cpufreq_frequency_table *table = NULL;
>  #ifdef CONFIG_CPU_FREQ
>   table = cpufreq_frequency_get_table(cpu);
> @@ -245,12 +245,12 @@ static int exynos_get_frequency_level(unsigned int cpu, 
> unsigned int freq)
>   if (!table)
>   return ret;
>  
> - while (table[i].frequency != CPUFREQ_TABLE_END) {
> + for (i = 0; table[i].frequency != CPUFREQ_TABLE_END; i++) {
>   if (table[i].frequency == CPUFREQ_ENTRY_INVALID)
>   continue;
>   if (table[i].frequency == freq)
> - return i;
> - i++;
> + return level;
> + level++;
>   }
>   return ret;
>  }
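
A stand-alone rendering of the fix, for reference (user-space, with the
kernel constants mocked up to match their kernel definitions): 'i' now
advances on every iteration, so an invalid entry is skipped instead of being
retested forever, while 'level' still only counts valid entries.

#include <stdio.h>

#define CPUFREQ_ENTRY_INVALID	~0u	/* values mirror the kernel's defines */
#define CPUFREQ_TABLE_END	~1u

struct cpufreq_frequency_table { unsigned int frequency; };

static int get_frequency_level(const struct cpufreq_frequency_table *table,
			       unsigned int freq)
{
	int i, level = 0;

	/* The old while-loop hit 'continue' before 'i++', so one invalid
	 * entry made it spin forever on the same slot. */
	for (i = 0; table[i].frequency != CPUFREQ_TABLE_END; i++) {
		if (table[i].frequency == CPUFREQ_ENTRY_INVALID)
			continue;
		if (table[i].frequency == freq)
			return level;
		level++;
	}
	return -1;
}

int main(void)
{
	const struct cpufreq_frequency_table table[] = {
		{ 1600000 },
		{ CPUFREQ_ENTRY_INVALID },	/* would have hung the old loop */
		{ 1200000 },
		{ 800000 },
		{ CPUFREQ_TABLE_END },
	};

	printf("level of 1200000 kHz: %d\n", get_frequency_level(table, 1200000));
	return 0;
}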




Re: [PATCH 09/10] memory-hotplug: enable memory hotplug to handle hugepage

2013-04-09 Thread KOSAKI Motohiro
On Tue, Apr 9, 2013 at 6:43 PM, Naoya Horiguchi
 wrote:
> On Tue, Apr 09, 2013 at 05:27:44PM -0400, KOSAKI Motohiro wrote:
>> >> numa_node_id() is really silly. This might lead to allocate from 
>> >> offlining node.
>> >
>> > Right, it should've been alloc_huge_page().
>> >
>> >> and, offline_pages() should mark hstate as isolated likes normal pages 
>> >> for prohibiting
>> >> new allocation at first.
>> >
>> > It seems that alloc_migrate_target() calls alloc_page() for normal pages
>> > and the destination pages can be in the same node with the source pages
>> > (new page allocation from the same memblock are prohibited.)
>>
>> No, it can't. Memory hotplug changes the buddy migrate type to
>> MIGRATE_ISOLATE first, so alloc_page() never allocates from the source
>> range. However, huge pages don't use the buddy allocator, so we need
>> another trick.
>
> MIGRATE_ISOLATE is changed only within the range [start_pfn, end_pfn)
> given as the argument of __offline_pages (see also start_isolate_page_range),
> so it's set only for pages within the single memblock to be offlined.

For partial memory hot-remove, that's correct behavior; a different
node is not required.


> BTW, in previous discussion I already agreed with checking migrate type
> in hugepage allocation code (maybe it will be in dequeue_huge_page_vma(),)
> so what you concern should be solved in the next post.

Umm.. Maybe I missed such discussion. Do you have a pointer?


>> > So if we want to avoid new page allocation from the same node,
>> > this is the problem both for normal and huge pages.
>> >
>> > BTW, is it correct to think that all users of memory hotplug assume
>> > that they want to hotplug a whole node (not the part of it?)
>>
>> Both are valid use case. admin can isolate a part of memory for isolating
>> broken memory range.
>>
>> but I'm sure almost user want to remove whole node.
>
> OK. So I will think about "allocation from the nearest neighbor node",
> although it can be a separate patch if it's hard to implement.

That's fine.


[PATCH] tracing: Check result of ring_buffer_read_prepare()

2013-04-09 Thread Namhyung Kim
From: Namhyung Kim 

The ring_buffer_read_prepare() can return NULL if memory allocation
fails.  Fail out in this case instead of succeeding and then having
no output.

Suggested-by: Steven Rostedt 
Signed-off-by: Namhyung Kim 
---
 kernel/trace/trace.c | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 7270460cfe3c..13200de31f0b 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -2826,6 +2826,8 @@ __tracing_open(struct inode *inode, struct file *file, 
bool snapshot)
for_each_tracing_cpu(cpu) {
iter->buffer_iter[cpu] =

ring_buffer_read_prepare(iter->trace_buffer->buffer, cpu);
+   if (!iter->buffer_iter[cpu])
+   goto free;
}
ring_buffer_read_prepare_sync();
for_each_tracing_cpu(cpu) {
@@ -2836,6 +2838,9 @@ __tracing_open(struct inode *inode, struct file *file, 
bool snapshot)
cpu = iter->cpu_file;
iter->buffer_iter[cpu] =
ring_buffer_read_prepare(iter->trace_buffer->buffer, 
cpu);
+   if (!iter->buffer_iter[cpu])
+   goto free;
+
ring_buffer_read_prepare_sync();
ring_buffer_read_start(iter->buffer_iter[cpu]);
tracing_iter_reset(iter, cpu);
@@ -2847,6 +2852,23 @@ __tracing_open(struct inode *inode, struct file *file, 
bool snapshot)
 
return iter;
 
+free:
+   /*
+* For simplicity, just keep single loop without comparing cpu_file.
+*/
+   for_each_tracing_cpu(cpu) {
+   if (iter->buffer_iter[cpu])
+   ring_buffer_read_finish(iter->buffer_iter[cpu]);
+   }
+
+   if (iter->trace && iter->trace->close)
+   iter->trace->close(iter);
+
+   if (!iter->snapshot)
+   tracing_start_tr(tr);
+
+   mutex_destroy(&iter->mutex);
+   free_cpumask_var(iter->started);
  fail:
mutex_unlock(&trace_types_lock);
kfree(iter->trace);
-- 
1.7.11.7
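
The same all-or-nothing pattern in plain user-space C, for reference
(illustration, not kernel code): prepare one object per CPU, and on the
first failure release everything already prepared before reporting the error.

#include <stdio.h>
#include <stdlib.h>

#define NR_CPUS 8

static int prepare_all(void *iters[NR_CPUS], size_t sz)
{
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		iters[cpu] = malloc(sz);
		if (!iters[cpu])
			goto free;	/* unwind the ones already prepared */
	}
	return 0;

free:
	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		free(iters[cpu]);	/* free(NULL) is a no-op */
		iters[cpu] = NULL;
	}
	return -1;
}

int main(void)
{
	void *iters[NR_CPUS] = { NULL };
	int cpu;

	if (prepare_all(iters, 4096) == 0)
		printf("all per-cpu iterators prepared\n");
	else
		printf("allocation failed, everything unwound\n");

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		free(iters[cpu]);
	return 0;
}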



Re: [PATCH v3 3/3] hugetlbfs: add swap entry check in follow_hugetlb_page()

2013-04-09 Thread KOSAKI Motohiro
> I rewrite the comment here, how about this?
>
> -   if (absent ||
> +   /*
> +* We need call hugetlb_fault for both hugepages under 
> migration
> +* (in which case hugetlb_fault waits for the migration,) and
> +* hwpoisoned hugepages (in which case we need to prevent the
> +* caller from accessing to them.) In order to do this, we use
> +* here is_swap_pte instead of is_hugetlb_entry_migration and
> +* is_hugetlb_entry_hwpoisoned. This is because it simply 
> covers
> +* both cases, and because we can't follow correct pages 
> directly
> +* from any kind of swap entries.
> +*/
> +   if (absent || is_swap_pte(huge_ptep_get(pte)) ||
> ((flags & FOLL_WRITE) && !pte_write(huge_ptep_get(pte 
> {
> int ret;

Looks ok to me.
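
For reference, a rough sketch of why is_swap_pte() subsumes both helpers,
based on the general shape those helpers had at the time (the _sketch names
are illustrative only, not proposed code):

/*
 * Illustrative sketch: both helpers first require the PTE to be a non-none,
 * non-present (i.e. swap-like) entry, which is exactly what is_swap_pte()
 * tests, so a single is_swap_pte() check covers both cases.
 */
static inline int is_hugetlb_entry_migration_sketch(pte_t pte)
{
	swp_entry_t swp;

	if (huge_pte_none(pte) || pte_present(pte))
		return 0;
	swp = pte_to_swp_entry(pte);
	return is_migration_entry(swp);
}

static inline int is_hugetlb_entry_hwpoisoned_sketch(pte_t pte)
{
	swp_entry_t swp;

	if (huge_pte_none(pte) || pte_present(pte))
		return 0;
	swp = pte_to_swp_entry(pte);
	return is_hwpoison_entry(swp);
}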


Re: [PATCH v2] kernel: module: using strlcpy and strcpy instead of strncpy

2013-04-09 Thread Rusty Russell
Chen Gang  writes:
>   namebuf is a NUL-terminated string; better to always end it with '\0'.
>   ownername and module_name(owner) have the same buffer length, so strcpy is better.
>
>
> Signed-off-by: Chen Gang 

Would be better to describe the justification for strcpy in
resolve_symbol(), eg.

For resolve_symbol() we just use strcpy: the module_name() is always the
name field of struct module (which is a fixed array), or a literal "kernel".
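
A minimal user-space sketch of the strncpy()/strlcpy() difference being relied
on for namebuf (illustrative only; strlcpy() comes from libbsd here, glibc >=
2.38 also provides it in <string.h>):

#include <stdio.h>
#include <string.h>
#include <bsd/string.h>

int main(void)
{
	char a[8] = "xxxxxxx", b[8];
	const char *src = "0123456789";	/* longer than either buffer */

	strncpy(a, src, sizeof(a));	/* 'a' now holds "01234567", no '\0' */
	strlcpy(b, src, sizeof(b));	/* 'b' holds "0123456" plus '\0' */

	printf("strlcpy: \"%s\"\n", b);
	printf("strncpy terminated? %s\n",
	       memchr(a, '\0', sizeof(a)) ? "yes" : "no");	/* prints "no" */
	return 0;
}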

Cheers,
Rusty.

> ---
>  kernel/module.c |4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/module.c b/kernel/module.c
> index 3c2c72d..09aeefd 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -1283,7 +1283,7 @@ static const struct kernel_symbol 
> *resolve_symbol(struct module *mod,
>  
>  getname:
>   /* We must make copy under the lock if we failed to get ref. */
> - strncpy(ownername, module_name(owner), MODULE_NAME_LEN);
> + strcpy(ownername, module_name(owner));
>  unlock:
>   mutex_unlock(_mutex);
>   return sym;
> @@ -3464,7 +3464,7 @@ const char *module_address_lookup(unsigned long addr,
>   }
>   /* Make a copy in here where it's safe */
>   if (ret) {
> - strncpy(namebuf, ret, KSYM_NAME_LEN - 1);
> + strlcpy(namebuf, ret, KSYM_NAME_LEN);
>   ret = namebuf;
>   }
>   preempt_enable();
> -- 
> 1.7.7.6


Re: [PATCH 3/3] tracing: Check cpu file on tracing_release()

2013-04-09 Thread Steven Rostedt
On Wed, 2013-04-10 at 10:30 +0900, Namhyung Kim wrote:

> From 7ba245dba217ef858b467552019acd49f7fdce7e Mon Sep 17 00:00:00 2001
> From: Namhyung Kim 
> Date: Wed, 10 Apr 2013 09:10:44 +0900
> Subject: [PATCH] tracing: Check result of ring_buffer_read_prepare()
> 
> The ring_buffer_read_prepare() can return NULL if memory allocation
> fails.  Fail out in this case instead of succeeding and then having
> no output.
> 
> Suggested-by: Steven Rostedt 
> Signed-off-by: Namhyung Kim 
> ---
>  kernel/trace/trace.c | 25 +
>  1 file changed, 25 insertions(+)
> 
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 7270460cfe3c..3b3514dc8e5e 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -2826,6 +2826,8 @@ __tracing_open(struct inode *inode, struct file *file, 
> bool snapshot)
>   for_each_tracing_cpu(cpu) {
>   iter->buffer_iter[cpu] =
>   
> ring_buffer_read_prepare(iter->trace_buffer->buffer, cpu);
> + if (!iter->buffer_iter[cpu])
> + goto free;
>   }
>   ring_buffer_read_prepare_sync();
>   for_each_tracing_cpu(cpu) {
> @@ -2836,6 +2838,9 @@ __tracing_open(struct inode *inode, struct file *file, 
> bool snapshot)
>   cpu = iter->cpu_file;
>   iter->buffer_iter[cpu] =
>   ring_buffer_read_prepare(iter->trace_buffer->buffer, 
> cpu);
> + if (!iter->buffer_iter[cpu])
> + goto free;
> +
>   ring_buffer_read_prepare_sync();
>   ring_buffer_read_start(iter->buffer_iter[cpu]);
>   tracing_iter_reset(iter, cpu);
> @@ -2847,6 +2852,26 @@ __tracing_open(struct inode *inode, struct file *file, 
> bool snapshot)
>  
>   return iter;
>  
> +free:
> + if (iter->cpu_file == RING_BUFFER_ALL_CPUS) {
> + for_each_tracing_cpu(cpu) {
> + if (iter->buffer_iter[cpu])
> + ring_buffer_read_finish(iter->buffer_iter[cpu]);
> + }
> + } else {
> + cpu = iter->cpu_file;
> + if (iter->buffer_iter[cpu])
> + ring_buffer_read_finish(iter->buffer_iter[cpu]);
> + }

As I said I would consider updating the release code, but here, it's an
error path that is extremely unlikely to be hit. Please just keep the
single loop, and leave out the cpu_file compare.

Thanks,

-- Steve

> +
> + if (iter->trace && iter->trace->close)
> + iter->trace->close(iter);
> +
> + if (!iter->snapshot)
> + tracing_start_tr(tr);
> +
> + mutex_destroy(&iter->mutex);
> + free_cpumask_var(iter->started);
>   fail:
>   mutex_unlock(&trace_types_lock);
>   kfree(iter->trace);




Re: [PATCH 3/3] tracing: Check cpu file on tracing_release()

2013-04-09 Thread Namhyung Kim
On Tue, 09 Apr 2013 20:46:27 -0400, Steven Rostedt wrote:
> On Wed, 2013-04-10 at 09:36 +0900, Namhyung Kim wrote:
>
>> You meant iter->cpu_file != RING_BUFFER_ALL_CPUS case, right?
>
> Yep.
>
>> 
>> So why bother trying to check other cpus then?
>
> Because it's a very slow path (closing a file), and it keeps the code
> simpler and more condensed.
>
> We could add your change for consistency, but right now, its very low
> priority.

Hmm.. okay.

>
> But looking at the code, I do see a clean up that looks like it would be
> worth updating. If the ring_buffer_read_prepare() fails, we should
> probably let the user know, instead of succeeding and then having no
> output.

How about below.. :)


From 7ba245dba217ef858b467552019acd49f7fdce7e Mon Sep 17 00:00:00 2001
From: Namhyung Kim 
Date: Wed, 10 Apr 2013 09:10:44 +0900
Subject: [PATCH] tracing: Check result of ring_buffer_read_prepare()

The ring_buffer_read_prepare() can return NULL if memory allocation
fails.  Fail out in this case instead of succeeding and then having
no output.

Suggested-by: Steven Rostedt 
Signed-off-by: Namhyung Kim 
---
 kernel/trace/trace.c | 25 +
 1 file changed, 25 insertions(+)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 7270460cfe3c..3b3514dc8e5e 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -2826,6 +2826,8 @@ __tracing_open(struct inode *inode, struct file *file, 
bool snapshot)
for_each_tracing_cpu(cpu) {
iter->buffer_iter[cpu] =

ring_buffer_read_prepare(iter->trace_buffer->buffer, cpu);
+   if (!iter->buffer_iter[cpu])
+   goto free;
}
ring_buffer_read_prepare_sync();
for_each_tracing_cpu(cpu) {
@@ -2836,6 +2838,9 @@ __tracing_open(struct inode *inode, struct file *file, 
bool snapshot)
cpu = iter->cpu_file;
iter->buffer_iter[cpu] =
ring_buffer_read_prepare(iter->trace_buffer->buffer, 
cpu);
+   if (!iter->buffer_iter[cpu])
+   goto free;
+
ring_buffer_read_prepare_sync();
ring_buffer_read_start(iter->buffer_iter[cpu]);
tracing_iter_reset(iter, cpu);
@@ -2847,6 +2852,26 @@ __tracing_open(struct inode *inode, struct file *file, 
bool snapshot)
 
return iter;
 
+free:
+   if (iter->cpu_file == RING_BUFFER_ALL_CPUS) {
+   for_each_tracing_cpu(cpu) {
+   if (iter->buffer_iter[cpu])
+   ring_buffer_read_finish(iter->buffer_iter[cpu]);
+   }
+   } else {
+   cpu = iter->cpu_file;
+   if (iter->buffer_iter[cpu])
+   ring_buffer_read_finish(iter->buffer_iter[cpu]);
+   }
+
+   if (iter->trace && iter->trace->close)
+   iter->trace->close(iter);
+
+   if (!iter->snapshot)
+   tracing_start_tr(tr);
+
+   mutex_destroy(&iter->mutex);
+   free_cpumask_var(iter->started);
  fail:
mutex_unlock(&trace_types_lock);
kfree(iter->trace);
-- 
1.7.11.7



Re: Error in linux-3.0.72 build.

2013-04-09 Thread Steven Rostedt
On Sat, Apr 06, 2013 at 08:41:01AM -0400, Theodore Ts'o wrote:
> On Sat, Apr 06, 2013 at 08:25:57PM +1000, Michael D. Setzer II wrote:
> > Just downloaded new kernels, and find this error in build.
> 
> Patch is here: 
> 
> http://www.spinics.net/lists/linux-ext4/msg37598.html
> 
> Explanation of why it's needed is here:
> 
> http://www.spinics.net/lists/linux-ext4/msg37600.html
> 
> Greg has queued it for the next stable release.
> 

Grumble, I just stumbled over this for the 3.0-rt release. I guess
3.0-rt will have to wait till the next stable is out.

-- Steve



Re: [PATCHv2] rdma: add a new IB_ACCESS_GIFT flag

2013-04-09 Thread Michael R. Hines

With respect, I'm going to offload testing this patch back to the author =)
because I'm trying to address all of Paolo's other minor issues
with the RDMA patch before we can merge.

Since dynamic page registration (as you requested) is now fully
implemented, this patch is less urgent since we now have a
mechanism in place to avoid page pinning on both sides of the migration.

- Michael

On 04/09/2013 03:03 PM, Michael S. Tsirkin wrote:

presumably is_dup_page reads the page, so should not break COW ...

I'm not sure about the cgroups swap limit - you might have
too many non COW pages so attempting to fault them all in
makes you exceed the limit. You really should look at
what is going on in the pagemap, to see if there's
measurable gain from the patch.


On Fri, Apr 05, 2013 at 05:32:30PM -0400, Michael R. Hines wrote:

Well, I have the "is_dup_page()" commented out...when RDMA is
activated.

Is there something else in QEMU that could be touching the page that
I don't know about?

- Michael


On 04/05/2013 05:03 PM, Roland Dreier wrote:

On Fri, Apr 5, 2013 at 1:51 PM, Michael R. Hines
 wrote:

Sorry, I was wrong. ignore the comments about cgroups. That's still broken.
(i.e. trying to register RDMA memory while using a cgroup swap limit causes
the process to get killed).

But the GIFT flag patch works (my understanding is that GIFT flag allows the
adapter to transmit stale memory information, it does not have anything to
do with cgroups specifically).

The point of the GIFT patch is to avoid triggering copy-on-write so
that memory doesn't blow up during migration.  If that doesn't work
then there's no point to the patch.

  - R.





[PATCH 1/1][RESUBMIT] input: fix weird issue of synaptics psmouse sync lost after resume

2013-04-09 Thread James M Leddy
From: Eric Miao 

In summary, the symptom is intermittent key events lost after resume
on some machines with a synaptics touchpad (this seems to be synaptics _only_),
and the key event loss is due to a serio port reconnect after psmouse sync is
lost. Removing psmouse and inserting it back during the suspend/resume process
works around the issue, so the difference between psmouse_connect()
and psmouse_reconnect() is the key to the root cause of this problem.

After comparing the two different paths: the synaptics driver has its own
implementation, synaptics_reconnect(), and the missing psmouse_probe() call
seems significant. The patch below adds psmouse_probe() to the reconnect
process, and it has been verified many times that with it applied the issue
could not be reliably reproduced.

There are two PS/2 commands in psmouse_probe():

  1. PSMOUSE_CMD_GETID
  2. PSMOUSE_CMD_RESET_DIS

Only the PSMOUSE_CMD_GETID seems to be significant. The
PSMOUSE_CMD_RESET_DIS is irrelevant to this issue after trying
several times.  So we have only implemented this patch to issue
the PSMOUSE_CMD_GETID so far.

Tested-by: Daniel Manrique 
Signed-off-by: James M Leddy 
---
 drivers/input/mouse/synaptics.c |2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/input/mouse/synaptics.c b/drivers/input/mouse/synaptics.c
index 12d12ca..3438a9d 100644
--- a/drivers/input/mouse/synaptics.c
+++ b/drivers/input/mouse/synaptics.c
@@ -1355,6 +1355,7 @@ static int synaptics_reconnect(struct psmouse *psmouse)
 {
 	struct synaptics_data *priv = psmouse->private;
 	struct synaptics_data old_priv = *priv;
+	unsigned char param[2];
 	int retry = 0;
 	int error;
 
@@ -1370,6 +1371,7 @@ static int synaptics_reconnect(struct psmouse *psmouse)
 			 */
 			ssleep(1);
 		}
+		ps2_command(&psmouse->ps2dev, param, PSMOUSE_CMD_GETID);
 		error = synaptics_detect(psmouse, 0);
 	} while (error && ++retry < 3);
-- 
1.7.9.5





Re: [PATCH 0/1][RESUBMIT] input: fix weird issue of synaptics psmouse sync lost

2013-04-09 Thread James M Leddy
I meant to send this to linux-input

On 04/09/2013 08:30 PM, James M Leddy wrote:
> We have been using this patch in Ubuntu kernels for 5 months now
> without issue. Since patch author Eric Miao no longer works for us,
> I'm sending to the list so that other distros can take advantage of
> this.
> 
> Last we left off, the suggestion was to make this generic across the
> entire range of protocols, not just synaptics. I'm against this for
> two reasons. The first is that I don't want to have to ask for
> additional testing (I don't have access to the machine). The second
> and far more important reason is that I am afraid that this will break
> other non-synaptics touchpads, and even though we have a wide range of
> hardware, it is impossible to guarantee that it'll work on everything
> out there. I will do it however if it's the way we need to go with
> this.
> 
> Please let me know what you think, or if you need any additional
> information or testing.
> 
> James M Leddy (1):
>   input: fix weird issue of synaptics psmouse sync lost
> after resume
> 
>  drivers/input/mouse/synaptics.c |2 ++
>  1 file changed, 2 insertions(+)
> 



Re: zsmalloc defrag (Was: [PATCH] mm: remove compressed copy from zram in-memory)

2013-04-09 Thread Ric Mason

Hi Minchan,
On 04/10/2013 08:50 AM, Minchan Kim wrote:

On Tue, Apr 09, 2013 at 01:25:45PM -0700, Dan Magenheimer wrote:

From: Minchan Kim [mailto:minc...@kernel.org]
Subject: Re: zsmalloc defrag (Was: [PATCH] mm: remove compressed copy from zram 
in-memory)

Hi Dan,

On Mon, Apr 08, 2013 at 09:32:38AM -0700, Dan Magenheimer wrote:

From: Minchan Kim [mailto:minc...@kernel.org]
Sent: Monday, April 08, 2013 12:01 AM
Subject: [PATCH] mm: remove compressed copy from zram in-memory

(patch removed)


Fragment ratio is almost same but memory consumption and compile time
is better. I am working to add defragment function of zsmalloc.

Hi Minchan --

I would be very interested in your design thoughts on
how you plan to add defragmentation for zsmalloc.  In

What I can say now about is only just a word "Compaction".
As you know, zsmalloc has a transparent handle so we can do whatever
under user. Of course, there is a tradeoff between performance
and memory efficiency. I'm biased to latter for embedded usecase.

Have you designed or implemented this yet?  I have a couple
of concerns:

Not yet implemented but just had a time to think about it, simply.
So surely, there are some obstacle so I want to uncase the code and
number after I make a prototype/test the performance.
Of course, if it has a severe problem, will drop it without wasting
many guys's time.


1) The handle is transparent to the "user", but it is still a form
of a "pointer" to a zpage.  Are you planning on walking zram's
tables and changing those pointers?  That may be OK for zram
but for more complex data structures than tables (as in zswap
and zcache) it may not be as easy, due to races, or as efficient
because you will have to walk potentially very large trees.

The rough concept is as follows.

I'm considering for zsmalloc to return transparent fake handle
but we have to maintain it with real one.
It could be done in zsmalloc internal so there isn't any race we should 
consider.



2) Compaction in the kernel is heavily dependent on page migration
and page migration is dependent on using flags in the struct page.
There's a lot of code in those two code modules and there
are going to be a lot of implementation differences between
compacting pages vs compacting zpages.

Compaction of kernel is never related to zsmalloc's one.


I'm also wondering if you will be implementing "variable length
zspages".  Without that, I'm not sure compaction will help
enough.  (And that is a good example of the difference between

Why do you think so?
variable length zspages could be a further step to improve things, but they
are not the only solution to fragmentation.


the kernel page compaction design/code and zspage compaction.)

particular, I am wondering if your design will also
handle the requirements for zcache (especially for
cleancache pages) and perhaps also for ramster.

I don't know requirements for cleancache pages but compaction is
general as you know well so I expect you can get a benefit from it
if you are concern on memory efficiency but not sure it's valuable
to compact cleancache pages for getting more slot in RAM.
Sometime, just discarding would be much better, IMHO.

Zcache has page reclaim.  Zswap has zpage reclaim.  I am
concerned that these continue to work in the presence of
compaction.   With no reclaim at all, zram is a simpler use
case but if you implement compaction in a way that can't be
used by either zcache or zswap, then zsmalloc is essentially
forking.

Don't go too far. If it's really problem for zswap and zcache,
maybe, we could add it optionally.


In https://lkml.org/lkml/2013/3/27/501 I suggested it
would be good to work together on a common design, but
you didn't reply.  Are you thinking that zsmalloc

I saw the thread but explicit agreement is really matter?
I believe everybody want it although they didn't reply. :)

You can make the design/post it or prototyping/post it.
If there are some conflit with something in my brain,
I will be happy to feedback. :)

Anyway, I think my above statement "COMPACTION" would be enough to
express my current thought to avoid duplicated work and you can catch up.

I will get around to it after LSF/MM.


improvements should focus only on zram, in which case

Just focusing zsmalloc.

Right.  Again, I am asking if you are changing zsmalloc in
a way that helps zram but hurts zswap and makes it impossible
for zcache to ever use the improvements to zsmalloc.

As I said, I'm biased to memory efficiency rather than performance.
Of course, severe performance drop is disaster but small drop will
be acceptable for memory-efficiency concerning systems.


If so, that's fine, but please make it clear that is your goal.

Simple, help memory hungry system. :)


Which kind of system are memory hungry?




we may -- and possibly should -- end up with a different
allocator for frontswap-based/cleancache-based compression
in zcache (and possibly zswap)?
I'm just trying to determine if I 

Re: [PATCH 08/10] mm: vmscan: Have kswapd shrink slab only once per priority

2013-04-09 Thread Dave Chinner
On Tue, Apr 09, 2013 at 12:13:59PM +0100, Mel Gorman wrote:
> On Tue, Apr 09, 2013 at 03:53:25PM +0900, Joonsoo Kim wrote:
> 
> > I think that outside of zone loop is better place to run shrink_slab(),
> > because shrink_slab() is not directly related to a specific zone.
> > 
> 
> This is true and has been the case for a long time. The slab shrinkers
> are not zone aware and it is complicated by the fact that slab usage can
> indirectly pin memory on other zones.
..
> > And this is a question not related to this patch.
> > Why nr_slab is used here to decide zone->all_unreclaimable?
> 
> Slab is not directly associated with a zone but as reclaiming slab can
> free memory from unpredictable zones we do not consider a zone to be
> fully unreclaimable until we cannot shrink slab any more.

This is something the numa aware shrinkers will greatly help with -
instead of being a global shrink it becomes a
node-the-zone-belongs-to shrink, and so

> You may be thinking that this is extremely heavy handed and you're
> right, it is.

... it is much less heavy handed than the current code...

> > nr_slab is not directly related whether a specific zone is reclaimable
> > or not, and, moreover, nr_slab is not directly related to number of
> > reclaimed pages. It just say some objects in the system are freed.
> > 
> 
> All true, it's the indirect relation between slab objects and the memory
> that is freed when slab objects are reclaimed that has to be taken into
> account.

Node awareness within the shrinker infrastructure and LRUs make the
relationship much more direct ;)

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: Readonly GDT

2013-04-09 Thread H. Peter Anvin
On 04/09/2013 05:53 PM, Steven Rostedt wrote:
> On Tue, 2013-04-09 at 17:43 -0700, H. Peter Anvin wrote:
>> OK, thinking about the GDT here.
>>
>> The GDT is quite small -- 256 bytes on i386, 128 bytes on x86-64.  As
>> such, we probably don't want to allocate a full page to it for only
>> that.  This means that in order to create a readonly mapping we have to
>> pack GDTs from different CPUs together in the same pages, *or* we
>> tolerate that other things on the same page gets reflected in the same
>> mapping.
> 
> What about grouping via nodes?
> 

Would be nicer for locality, although probably adds [even] more complexity.

We don't really care about 32-bit NUMA anymore -- it keeps getting
suggested for deletion, even.  For 64-bit it might make sense to just
reflect out of the percpu area even though it munches address space.

>>
>> However, the packing solution has the advantage of reducing address
>> space consumption which matters on 32 bits: even on i386 we can easily
>> burn a megabyte of address space for 4096 processors, but burning 16
>> megabytes starts to hurt.
> 
> Having 4096 32 bit processors, you deserve what you get. ;-)
> 

Well, the main problem is that it might get difficult to make this a
runtime thing; it more likely ends up being a compile-time bit.

-hpa




Re: zsmalloc defrag (Was: [PATCH] mm: remove compressed copy from zram in-memory)

2013-04-09 Thread Ric Mason

Hi Dan,
On 04/10/2013 04:25 AM, Dan Magenheimer wrote:

From: Minchan Kim [mailto:minc...@kernel.org]
Subject: Re: zsmalloc defrag (Was: [PATCH] mm: remove compressed copy from zram 
in-memory)

Hi Dan,

On Mon, Apr 08, 2013 at 09:32:38AM -0700, Dan Magenheimer wrote:

From: Minchan Kim [mailto:minc...@kernel.org]
Sent: Monday, April 08, 2013 12:01 AM
Subject: [PATCH] mm: remove compressed copy from zram in-memory

(patch removed)


Fragment ratio is almost same but memory consumption and compile time
is better. I am working to add defragment function of zsmalloc.

Hi Minchan --

I would be very interested in your design thoughts on
how you plan to add defragmentation for zsmalloc.  In

What I can say now about is only just a word "Compaction".
As you know, zsmalloc has a transparent handle so we can do whatever
under user. Of course, there is a tradeoff between performance
and memory efficiency. I'm biased to latter for embedded usecase.

Have you designed or implemented this yet?  I have a couple
of concerns:

1) The handle is transparent to the "user", but it is still a form
of a "pointer" to a zpage.  Are you planning on walking zram's
tables and changing those pointers?  That may be OK for zram
but for more complex data structures than tables (as in zswap
and zcache) it may not be as easy, due to races, or as efficient
because you will have to walk potentially very large trees.
2) Compaction in the kernel is heavily dependent on page migration
and page migration is dependent on using flags in the struct page.


Which flag?


There's a lot of code in those two code modules and there
are going to be a lot of implementation differences between
compacting pages vs compacting zpages.

I'm also wondering if you will be implementing "variable length
zspages".  Without that, I'm not sure compaction will help
enough.  (And that is a good example of the difference between
the kernel page compaction design/code and zspage compaction.)


particular, I am wondering if your design will also
handle the requirements for zcache (especially for
cleancache pages) and perhaps also for ramster.

I don't know requirements for cleancache pages but compaction is
general as you know well so I expect you can get a benefit from it
if you are concern on memory efficiency but not sure it's valuable
to compact cleancache pages for getting more slot in RAM.
Sometime, just discarding would be much better, IMHO.

Zcache has page reclaim.  Zswap has zpage reclaim.  I am
concerned that these continue to work in the presence of
compaction.   With no reclaim at all, zram is a simpler use
case but if you implement compaction in a way that can't be
used by either zcache or zswap, then zsmalloc is essentially
forking.


I fail to understand "then zsmalloc is essentially forking.", could you 
explain more?





In https://lkml.org/lkml/2013/3/27/501 I suggested it
would be good to work together on a common design, but
you didn't reply.  Are you thinking that zsmalloc

I saw the thread but explicit agreement is really matter?
I believe everybody want it although they didn't reply. :)

You can make the design/post it or prototyping/post it.
If there are some conflit with something in my brain,
I will be happy to feedback. :)

Anyway, I think my above statement "COMPACTION" would be enough to
express my current thought to avoid duplicated work and you can catch up.

I will get around to it after LSF/MM.


improvements should focus only on zram, in which case

Just focusing zsmalloc.

Right.  Again, I am asking if you are changing zsmalloc in
a way that helps zram but hurts zswap and makes it impossible
for zcache to ever use the improvements to zsmalloc.

If so, that's fine, but please make it clear that is your goal.


we may -- and possibly should -- end up with a different
allocator for frontswap-based/cleancache-based compression
in zcache (and possibly zswap)?
I'm just trying to determine if I should proceed separately
with my design (with Bob Liu, who expressed interest) or if
it would be beneficial to work together.

Just posting and if it affects zsmalloc/zram/zswap and goes the way
I don't want, I will involve the discussion because our product uses
zram heavily and consider zswap, too.

I really appreciate your enthusiastic collaboration model to find
optimal solution!

My goal is to have compression be an integral part of Linux
memory management.  It may be tied to a config option, but
the goal is that distros turn it on by default.  I don't think
zsmalloc meets that objective yet, but it may be fine for
your needs.  If so it would be good to understand exactly why
it doesn't meet the other zproject needs.


Re: zsmalloc defrag (Was: [PATCH] mm: remove compressed copy from zram in-memory)

2013-04-09 Thread Minchan Kim
Hi Seth,

On Tue, Apr 09, 2013 at 03:52:36PM -0500, Seth Jennings wrote:
> On 04/08/2013 08:36 PM, Minchan Kim wrote:
> > On Tue, Apr 09, 2013 at 10:27:19AM +0900, Minchan Kim wrote:
> >> Hi Dan,
> >>
> >> On Mon, Apr 08, 2013 at 09:32:38AM -0700, Dan Magenheimer wrote:
>  From: Minchan Kim [mailto:minc...@kernel.org]
>  Sent: Monday, April 08, 2013 12:01 AM
>  Subject: [PATCH] mm: remove compressed copy from zram in-memory
> >>>
> >>> (patch removed)
> >>>
>  Fragment ratio is almost same but memory consumption and compile time
>  is better. I am working to add defragment function of zsmalloc.
> >>>
> >>> Hi Minchan --
> >>>
> >>> I would be very interested in your design thoughts on
> >>> how you plan to add defragmentation for zsmalloc.  In
> >>
> >> What I can say now about is only just a word "Compaction".
> >> As you know, zsmalloc has a transparent handle so we can do whatever
> >> under user. Of course, there is a tradeoff between performance 
> >> and memory efficiency. I'm biased to latter for embedded usecase.
> >>
> >> And I might post it because as you know well, zsmalloc
> > 
> > Incomplete sentence,
> > 
> > I might not post it until promoting zsmalloc because as you know well,
> > zsmalloc/zram's all new stuffs are blocked into staging tree.
> > Even if we could add it into staging, as you know well, staging is where
> > every mm guys ignore so we end up needing another round to promote it. sigh.
> 
> Yes. The lack of compaction/defragmentation support in zsmalloc has not
> been raised as an obstacle to mainline acceptance so I think we should
> wait to add new features to a yet-to-be accepted codebase.
> 
> Also, I think this feature is more important to zram than it is to
> zswap/zcache as they can do writeback to free zpages.  In other words,
> the fragmentation is a transient issue for zswap/zcache since writeback
> to the swap device is possible.

Another benefit derived from the compaction work is that we can pick a zpage
from a zspage and move it somewhere else. It means core mm could control
pages in zsmalloc freely.

> 
> Thanks,
> Seth
> 

-- 
Kind regards,
Minchan Kim


Re: [PATCH 00/10] Add Intel Atom S1200 series ioatdma support

2013-04-09 Thread Dan Williams
On Tue, Apr 9, 2013 at 12:28 AM, Vinod Koul  wrote:
> On Tue, Mar 26, 2013 at 03:42:29PM -0700, Dave Jiang wrote:
>> The following series adds support for the Intel Atom S1200 product family
>> ioatdma. This ioatdma also implements a set of version v3.3 features such as 
>> 16
>> sources PQ, descriptor write back error status, and does not have many of the
>> silicon bugs that the 3.2 line of hardware has due to a brand new 
>> implementation.
>>
>> The series is dependent on the haswell update patch sent prior.
> Hey Dan,
>
> Are you okay with the series. Merge window is very close...
>

[resend, last one got html formatted/rejected]

Hi Vinod, thanks for the ping.

Chatted with Dave patches 1-3 and 5-8, and 10 are reviewed/acked.  For
patch 4 and 9 we think we have a slightly cleaner way to organize the
quirk handling and an update will be coming.

--
Dan


Re: zsmalloc defrag (Was: [PATCH] mm: remove compressed copy from zram in-memory)

2013-04-09 Thread Minchan Kim
On Tue, Apr 09, 2013 at 01:37:47PM -0700, Dan Magenheimer wrote:
> > From: Minchan Kim [mailto:minc...@kernel.org]
> > Subject: Re: zsmalloc defrag (Was: [PATCH] mm: remove compressed copy from 
> > zram in-memory)
> > 
> > On Tue, Apr 09, 2013 at 10:27:19AM +0900, Minchan Kim wrote:
> > > Hi Dan,
> > >
> > > On Mon, Apr 08, 2013 at 09:32:38AM -0700, Dan Magenheimer wrote:
> > > > > From: Minchan Kim [mailto:minc...@kernel.org]
> > > > > Sent: Monday, April 08, 2013 12:01 AM
> > > > > Subject: [PATCH] mm: remove compressed copy from zram in-memory
> > > >
> > > > (patch removed)
> > > >
> > > > > Fragment ratio is almost same but memory consumption and compile time
> > > > > is better. I am working to add defragment function of zsmalloc.
> > > >
> > > > Hi Minchan --
> > > >
> > > > I would be very interested in your design thoughts on
> > > > how you plan to add defragmentation for zsmalloc.  In
> > >
> > > What I can say now about is only just a word "Compaction".
> > > As you know, zsmalloc has a transparent handle so we can do whatever
> > > under user. Of course, there is a tradeoff between performance
> > > and memory efficiency. I'm biased to latter for embedded usecase.
> > >
> > > And I might post it because as you know well, zsmalloc
> > 
> > Incomplete sentence,
> > 
> > I might not post it until promoting zsmalloc because as you know well,
> > zsmalloc/zram's all new stuffs are blocked into staging tree.
> > Even if we could add it into staging, as you know well, staging is where
> > every mm guys ignore so we end up needing another round to promote it. sigh.
> > 
> > I hope it gets better after LSF/MM.
> 
> If zsmalloc is moving in the direction of supporting only zram,
> why should it be promoted into mm, or even lib?  Why not promote
> zram into drivers and put zsmalloc.c in the same directory?

I don't want to make zsmalloc zram-specific and will make my best effort
to generalize it for the whole z* family. If it is hard to reach an
agreement, yes, forking could be an easy solution, like at other embedded
product companies, but I don't want that.


> 

-- 
Kind regards,
Minchan Kim


Re: Readonly GDT

2013-04-09 Thread Steven Rostedt
On Tue, 2013-04-09 at 17:43 -0700, H. Peter Anvin wrote:
> OK, thinking about the GDT here.
> 
> The GDT is quite small -- 256 bytes on i386, 128 bytes on x86-64.  As
> such, we probably don't want to allocate a full page to it for only
> that.  This means that in order to create a readonly mapping we have to
> pack GDTs from different CPUs together in the same pages, *or* we
> tolerate that other things on the same page gets reflected in the same
> mapping.

What about grouping via nodes?

> 
> However, the packing solution has the advantage of reducing address
> space consumption which matters on 32 bits: even on i386 we can easily
> burn a megabyte of address space for 4096 processors, but burning 16
> megabytes starts to hurt.

Having 4096 32 bit processors, you deserve what you get. ;-)

-- Steve

> 
> It would be important to measure the performance impact on task switch,
> though.




Re: zsmalloc defrag (Was: [PATCH] mm: remove compressed copy from zram in-memory)

2013-04-09 Thread Minchan Kim
On Tue, Apr 09, 2013 at 01:25:45PM -0700, Dan Magenheimer wrote:
> > From: Minchan Kim [mailto:minc...@kernel.org]
> > Subject: Re: zsmalloc defrag (Was: [PATCH] mm: remove compressed copy from 
> > zram in-memory)
> > 
> > Hi Dan,
> > 
> > On Mon, Apr 08, 2013 at 09:32:38AM -0700, Dan Magenheimer wrote:
> > > > From: Minchan Kim [mailto:minc...@kernel.org]
> > > > Sent: Monday, April 08, 2013 12:01 AM
> > > > Subject: [PATCH] mm: remove compressed copy from zram in-memory
> > >
> > > (patch removed)
> > >
> > > > Fragment ratio is almost same but memory consumption and compile time
> > > > is better. I am working to add defragment function of zsmalloc.
> > >
> > > Hi Minchan --
> > >
> > > I would be very interested in your design thoughts on
> > > how you plan to add defragmentation for zsmalloc.  In
> > 
> > What I can say now about is only just a word "Compaction".
> > As you know, zsmalloc has a transparent handle so we can do whatever
> > under user. Of course, there is a tradeoff between performance
> > and memory efficiency. I'm biased to latter for embedded usecase.
> 
> Have you designed or implemented this yet?  I have a couple
> of concerns:

Not yet implemented but just had a time to think about it, simply.
So surely, there are some obstacle so I want to uncase the code and
number after I make a prototype/test the performance.
Of course, if it has a severe problem, will drop it without wasting
many guys's time.

> 
> 1) The handle is transparent to the "user", but it is still a form
>of a "pointer" to a zpage.  Are you planning on walking zram's
>tables and changing those pointers?  That may be OK for zram
>but for more complex data structures than tables (as in zswap
>and zcache) it may not be as easy, due to races, or as efficient
>because you will have to walk potentially very large trees.

The rough concept is as follows.

I'm considering for zsmalloc to return transparent fake handle
but we have to maintain it with real one.
It could be done in zsmalloc internal so there isn't any race we should 
consider.
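
Something like this, as a very rough sketch (all names and the locking
granularity are hypothetical; it only shows that the fake-to-real mapping can
be updated entirely inside zsmalloc while the handle the user holds stays
stable):

#include <linux/spinlock.h>

struct zs_handle_table {
	spinlock_t lock;
	unsigned long *real;	/* real[fake_handle] = current location */
	unsigned long nr;
};

/* Look up the real location; callers only ever hold the fake handle. */
static unsigned long zs_fake_to_real(struct zs_handle_table *tbl,
				     unsigned long fake)
{
	unsigned long real;

	spin_lock(&tbl->lock);
	real = tbl->real[fake];
	spin_unlock(&tbl->lock);
	return real;
}

/* Called by compaction after a zpage has been moved. */
static void zs_relocate(struct zs_handle_table *tbl, unsigned long fake,
			unsigned long new_real)
{
	spin_lock(&tbl->lock);
	tbl->real[fake] = new_real;
	spin_unlock(&tbl->lock);
}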


> 2) Compaction in the kernel is heavily dependent on page migration
>and page migration is dependent on using flags in the struct page.
>There's a lot of code in those two code modules and there
>are going to be a lot of implementation differences between
>compacting pages vs compacting zpages.

Compaction of kernel is never related to zsmalloc's one.

> 
> I'm also wondering if you will be implementing "variable length
> zspages".  Without that, I'm not sure compaction will help
> enough.  (And that is a good example of the difference between

Why do you think so?
variable length zspages could be a further step to improve things, but they
are not the only solution to fragmentation.

> the kernel page compaction design/code and zspage compaction.)

> 
> > > particular, I am wondering if your design will also
> > > handle the requirements for zcache (especially for
> > > cleancache pages) and perhaps also for ramster.
> > 
> > I don't know requirements for cleancache pages but compaction is
> > general as you know well so I expect you can get a benefit from it
> > if you are concern on memory efficiency but not sure it's valuable
> > to compact cleancache pages for getting more slot in RAM.
> > Sometime, just discarding would be much better, IMHO.
> 
> Zcache has page reclaim.  Zswap has zpage reclaim.  I am
> concerned that these continue to work in the presence of
> compaction.   With no reclaim at all, zram is a simpler use
> case but if you implement compaction in a way that can't be
> used by either zcache or zswap, then zsmalloc is essentially
> forking.

Don't go too far. If it's really problem for zswap and zcache,
maybe, we could add it optionally.

> 
> > > In https://lkml.org/lkml/2013/3/27/501 I suggested it
> > > would be good to work together on a common design, but
> > > you didn't reply.  Are you thinking that zsmalloc
> > 
> > I saw the thread but explicit agreement is really matter?
> > I believe everybody want it although they didn't reply. :)
> > 
> > You can make the design/post it or prototyping/post it.
> > If there are some conflit with something in my brain,
> > I will be happy to feedback. :)
> > 
> > Anyway, I think my above statement "COMPACTION" would be enough to
> > express my current thought to avoid duplicated work and you can catch up.
> > 
> > I will get around to it after LSF/MM.
> > 
> > > improvements should focus only on zram, in which case
> > 
> > Just focusing zsmalloc.
> 
> Right.  Again, I am asking if you are changing zsmalloc in
> a way that helps zram but hurts zswap and makes it impossible
> for zcache to ever use the improvements to zsmalloc.

As I said, I'm biased to memory efficiency rather than performance.
Of course, severe performance drop is disaster but small drop will
be acceptable for memory-efficiency concerning systems.

> 
> If so, that's fine, but please make it clear that is your goal.

Simple, help 

Re: [PATCH 3/3] tracing: Check cpu file on tracing_release()

2013-04-09 Thread Steven Rostedt
On Wed, 2013-04-10 at 09:36 +0900, Namhyung Kim wrote:

> You meant iter->cpu_file != RING_BUFFER_ALL_CPUS case, right?

Yep.

> 
> So why bother trying to check other cpus then?

Because it's a very slow path (closing a file), and it keeps the code
simpler and more condensed.

We could add your change for consistency, but right now, its very low
priority.

But looking at the code, I do see a clean up that looks like it would be
worth updating. If the ring_buffer_read_prepare() fails, we should
probably let the user know, instead of succeeding and then having no
output.

Looks like all users of the buffer_iter[cpu] will fail quietly if it is
NULL, thus it's not a problem with crashing the kernel.

-- Steve




Readonly GDT

2013-04-09 Thread H. Peter Anvin
OK, thinking about the GDT here.

The GDT is quite small -- 256 bytes on i386, 128 bytes on x86-64.  As
such, we probably don't want to allocate a full page to it for only
that.  This means that in order to create a readonly mapping we have to
pack GDTs from different CPUs together in the same pages, *or* we
tolerate that other things on the same page gets reflected in the same
mapping.

However, the packing solution has the advantage of reducing address
space consumption which matters on 32 bits: even on i386 we can easily
burn a megabyte of address space for 4096 processors, but burning 16
megabytes starts to hurt.
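
For reference, a quick user-space check of those numbers (a sketch only,
assuming 4 KiB pages and the GDT sizes quoted above):

#include <stdio.h>

int main(void)
{
	const unsigned long nr_cpus   = 4096;
	const unsigned long page_size = 4096;	/* bytes */
	const unsigned long gdt_i386  = 256;	/* bytes per CPU */
	const unsigned long gdt_x8664 = 128;	/* bytes per CPU */

	printf("packed, i386:   %lu KiB\n", nr_cpus * gdt_i386 / 1024);   /* 1024 KiB */
	printf("packed, x86-64: %lu KiB\n", nr_cpus * gdt_x8664 / 1024);  /*  512 KiB */
	printf("page per CPU:   %lu MiB\n", (nr_cpus * page_size) >> 20); /*   16 MiB */
	return 0;
}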

It would be important to measure the performance impact on task switch,
though.

-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



Re: linux-next: manual merge of the omap_dss2 tree with Linus' tree

2013-04-09 Thread Linus Torvalds
On Tue, Apr 9, 2013 at 5:13 PM, Stephen Rothwell  wrote:
>
> Since you really should have that fix in your -next branch as well (for
> testing), I would merge the same branch that Linus merged i.e. I would
> merge commit 090da752cdd6 ("video:uvesafb: Fix dereference NULL pointer
> code path") since that is already in your tree (presumably as a separate
> branch or tag).  I would also put a comment in the merge commit itself
> explaining why you did it.

I'd actually prefer people *not* do this unless they really have to.
Just fixing a merge conflict is not a good enough reason to add
another merge.

If you really really need the particular fix for some other reason (ie
that bug creates real problems for you and you need the bugfix in
order to test all the other development you've done), then yes, doing
the merge is worth it. But just to resolve a merge conflct early? No.

   Linus


Re: [PATCH] vfs: dcache: cond_resched in shrink_dentry_list

2013-04-09 Thread Greg Thelen
On Mon, Mar 25 2013, Greg Thelen wrote:

> On Mon, Mar 25 2013, Dave Chinner wrote:
>
>> On Mon, Mar 25, 2013 at 05:39:13PM -0700, Greg Thelen wrote:
>>> On Mon, Mar 25 2013, Dave Chinner wrote:
>>> > On Mon, Mar 25, 2013 at 10:22:31AM -0700, Greg Thelen wrote:
>>> >> Call cond_resched() from shrink_dentry_list() to preserve
>>> >> shrink_dcache_parent() interactivity.
>>> >> 
>>> >> void shrink_dcache_parent(struct dentry * parent)
>>> >> {
>>> >>  while ((found = select_parent(parent, &dispose)) != 0)
>>> >>  shrink_dentry_list(&dispose);
>>> >> }
>>> >> 
>>> >> select_parent() populates the dispose list with dentries which
>>> >> shrink_dentry_list() then deletes.  select_parent() carefully uses
>>> >> need_resched() to avoid doing too much work at once.  But neither
>>> >> shrink_dcache_parent() nor its called functions call cond_resched().
>>> >> So once need_resched() is set select_parent() will return single
>>> >> dentry dispose list which is then deleted by shrink_dentry_list().
>>> >> This is inefficient when there are a lot of dentry to process.  This
>>> >> can cause softlockup and hurts interactivity on non preemptable
>>> >> kernels.
>>> >
>>> > Hi Greg,
>>> >
>>> > I can see how this could cause problems, but isn't the problem then
>>> > that shrink_dcache_parent()/select_parent() itself is mishandling
>>> > the need for rescheduling rather than being a problem with
>>> > the shrink_dentry_list() implementation?  i.e. select_parent() is
>>> > aborting batching based on a need for rescheduling, but then not
>>> > doing that itself and assuming that someone else will do the
>>> > reschedule for it?
>>> >
>>> > Perhaps this is a better approach:
>>> >
>>> > - while ((found = select_parent(parent, &dispose)) != 0)
>>> > + while ((found = select_parent(parent, &dispose)) != 0) {
>>> > shrink_dentry_list(&dispose);
>>> > + cond_resched();
>>> > + }
>>> >
>>> > With this, select_parent() stops batching when a resched is needed,
>>> > we dispose of the list as a single batch and only then resched if it
>>> > was needed before we go and grab the next batch. That should fix the
>>> > "small batch" problem without the potential for changing the
>>> > shrink_dentry_list() behaviour adversely for other users
>>> 
>>> I considered only modifying shrink_dcache_parent() as you show above.
>>> Either approach fixes the problem I've seen.  My initial approach adds
>>> cond_resched() deeper into shrink_dentry_list() because I thought that
>>> there might a secondary benefit: shrink_dentry_list() would be willing
>>> to give up the processor when working on a huge number of dentry.  This
>>> could improve interactivity during shrinker and umount.  I don't feel
>>> strongly on this and would be willing to test and post the
>>> add-cond_resched-to-shrink_dcache_parent approach.
>>
>> The shrinker has interactivity problems because of the global
>> dcache_lru_lock, not because of ithe size of the list passed to
>> shrink_dentry_list(). The amount of work that shrink_dentry_list()
>> does here is already bound by the shrinker batch size. Hence in the
>> absence of the RT folk complaining about significant holdoffs I
>> don't think there is an interactivity problem through the shrinker
>> path.
>
> No arguments from me.
>
>> As for the unmount path - shrink_dcache_for_umount_subtree() - that
>> doesn't use shrink_dentry_list() and so would need it's own internal
>> calls to cond_resched().  Perhaps it's shrink_dcache_sb() that you
>> are concerned about?  Either way, And there are lots more similar
>> issues in the unmount path such as evict_inodes(), so unless you are
>> going to give every possible path through unmount/remount/bdev
>> invalidation the same treatment then changing shrink_dentry_list()
>> won't significantly improve the interactivity of the system
>> situation in these paths...
>
> Ok.  As stated, I wasn't sure if the cond_resched() in
> shrink_dentry_list() had any appeal.  Apparently it doesn't.  I'll drop
> this approach in favor of the following:
>
> --->8---
>
> From: Greg Thelen 
> Date: Sat, 23 Mar 2013 18:25:02 -0700
> Subject: [PATCH] vfs: dcache: cond_resched in shrink_dcache_parent
>
> Call cond_resched() in shrink_dcache_parent() to maintain
> interactivity.
>
> Before this patch:
>
> void shrink_dcache_parent(struct dentry * parent)
> {
>   while ((found = select_parent(parent, &dispose)) != 0)
>   shrink_dentry_list(&dispose);
> }
>
> select_parent() populates the dispose list with dentries which
> shrink_dentry_list() then deletes.  select_parent() carefully uses
> need_resched() to avoid doing too much work at once.  But neither
> shrink_dcache_parent() nor its called functions call cond_resched().
> So once need_resched() is set select_parent() will return single
> dentry dispose list which is then deleted by shrink_dentry_list().
> This is inefficient when there are a lot of dentry to process.  This
> can cause softlockup and hurts interactivity on non preemptable
> kernels.
>
> This 
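
In sketch form, the change described above amounts to the following
(illustrative only, not the actual committed diff):

void shrink_dcache_parent(struct dentry *parent)
{
	LIST_HEAD(dispose);
	int found;

	while ((found = select_parent(parent, &dispose)) != 0) {
		shrink_dentry_list(&dispose);
		cond_resched();
	}
}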

Re: mfd: Core driver for Winbond chips

2013-04-09 Thread Guenter Roeck
On Tue, Apr 09, 2013 at 07:31:15PM +0200, Wim Van Sebroeck wrote:
> Hi Guenter,
> 
> > > > I was waiting for feedback from Wim, who submitted a similar driver, 
> > > > about his
> > > > thoughts. Key question is how to reserve access to the shared resource 
> > > > - either
> > > > through an exported function in the mfd driver requesting a mutex, or 
> > > > through
> > > > request_muxed_region(). I am going back and forth myself on which one 
> > > > is better.
> > > > 
> > > > Maybe it does not really matter, but using a function has the slight 
> > > > advantage
> > > > that it auto-loads and locks the mfd module while one of its client 
> > > > modules
> > > > is loaded. If we use request_muxed_region, that is not the case and the 
> > > > client
> > > > module must use another means to request and lock the mfd module.
> > > > 
> > > > Maybe you have an opinion ?
> > > 
> > > This is indeed the main issue that has to be solved. Both options will 
> > > work.
> > > I like the auto-load and lock, but I need to look at the 
> > > request_muxed_region
> > > code again first before I can see what the possible drawbacks are :-).
> > > 
> > One drawback of using request_muxed_region is that it needs a return value
> > from superio_enter. Also, it needs some code in the client driver init 
> > function
> > to ensure that the mfd driver gets loaded, and possibly a call to 
> > __module_get()
> > in the client driver probe function to keep the mfd driver loaded.
> > 
> > winbond_superio_enter() would not need a return value and could use
> > devm_request_region. We could also consider allocating the hwmon memory 
> > space in
> > the mfd driver and pass it as resource to the client drivers, which would 
> > remove
> > a few more lines of code from those.
> > 
> > Overall I am slightly in favor of using an exported function.
> 
> I looked at commit 8b6d043b7ee2d1b819dc833d677ea2aead71a0c0 (which implements
> request_muxed_region). You indeed need some extra code for loading the 
> lowl-level
> mfd driver. So I am also in favour of the exported function.
> 
So which way should we go ? Take your driver as a starting point or mine ?

One thing I'll want to add is support for both superio regions, as I have a use
case for it.
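
For reference, a minimal sketch of the exported-function approach being
discussed (names are hypothetical and the enter/exit byte sequences are the
usual Winbond ones, shown here only for illustration):

#include <linux/io.h>
#include <linux/module.h>
#include <linux/mutex.h>

static DEFINE_MUTEX(winbond_sio_lock);

/* Serialize access to the shared Super-I/O configuration space. */
void winbond_superio_enter(int ioreg)
{
	mutex_lock(&winbond_sio_lock);
	outb(0x87, ioreg);		/* enter extended function mode */
	outb(0x87, ioreg);
}
EXPORT_SYMBOL_GPL(winbond_superio_enter);

void winbond_superio_exit(int ioreg)
{
	outb(0xaa, ioreg);		/* leave extended function mode */
	mutex_unlock(&winbond_sio_lock);
}
EXPORT_SYMBOL_GPL(winbond_superio_exit);

A client module that calls these exported symbols also picks up a symbol
dependency on the mfd driver, which gives the auto-load and pinning behaviour
mentioned above.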

Thanks,
Guenter


[PATCH] Don't disable hpet emulation on suspend

2013-04-09 Thread Derek Basehore
There's a bug where rtc alarms are ignored after the rtc cmos suspends but
before the system finishes suspend. Since hpet emulation is disabled but the
hpet still handles the interrupts, a wake event is never registered (that is
normally done from the rtc layer). This reverts an earlier commit which
disabled hpet emulation. To fix the problem mentioned in that commit, the
hpet_rtc_timer_init function is called directly on resume.

This reverts commit d1b2efa83fbf7b33919238fa29ef6ab935820103.

Signed-off-by: Derek Basehore 
---
 drivers/rtc/rtc-cmos.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/rtc/rtc-cmos.c b/drivers/rtc/rtc-cmos.c
index af97c94..cc5bea9 100644
--- a/drivers/rtc/rtc-cmos.c
+++ b/drivers/rtc/rtc-cmos.c
@@ -804,9 +804,8 @@ static int cmos_suspend(struct device *dev)
mask = RTC_IRQMASK;
tmp &= ~mask;
CMOS_WRITE(tmp, RTC_CONTROL);
+   hpet_mask_rtc_irq_bit(mask);
 
-   /* shut down hpet emulation - we don't need it for alarm */
-   hpet_mask_rtc_irq_bit(RTC_PIE|RTC_AIE|RTC_UIE);
cmos_checkintr(cmos, tmp);
}
spin_unlock_irq(&rtc_lock);
@@ -870,6 +869,7 @@ static int cmos_resume(struct device *dev)
rtc_update_irq(cmos->rtc, 1, mask);
tmp &= ~RTC_AIE;
hpet_mask_rtc_irq_bit(RTC_AIE);
+   hpet_rtc_timer_init();
} while (mask & RTC_AIE);
spin_unlock_irq(&rtc_lock);
}
-- 
1.8.1.3



Re: [PATCH 3/3] tracing: Check cpu file on tracing_release()

2013-04-09 Thread Namhyung Kim

Hi Steve,

On 2013-04-10 9:31 AM, Steven Rostedt wrote:

On Wed, 2013-04-10 at 09:18 +0900, Namhyung Kim wrote:

From: Namhyung Kim 

It looks like tracing_release() lacks checking iter->cpu_file so that
closing a per_cpu trace file would attempt to close all cpu buffers.

Signed-off-by: Namhyung Kim 
---
  kernel/trace/trace.c | 8 +++-
  1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 7270460cfe3c..0beddcb80509 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -2883,7 +2883,13 @@ static int tracing_release(struct inode *inode, struct 
file *file)
WARN_ON(!tr->ref);
tr->ref--;

-   for_each_tracing_cpu(cpu) {
+   if (iter->cpu_file == RING_BUFFER_ALL_CPUS) {
+   for_each_tracing_cpu(cpu) {
+   if (iter->buffer_iter[cpu])


Only the cpu that is assigned gets buffer_iter[cpu] set. The other
buffer_iter[cpus] will simply be ignored.


You meant iter->cpu_file != RING_BUFFER_ALL_CPUS case, right?

So why bother trying to check other cpus then?

Thanks,
Namhyung
