Re: [RFC PATCH] mm: introduce N_LRU_MEMORY to distinguish between normal and movable memory

2012-08-16 Thread Hanjun Guo
On 2012/8/14 22:14, Christoph Lameter wrote:
> On Tue, 14 Aug 2012, Hanjun Guo wrote:
> 
>> N_NORMAL_MEMORY means !LRU allocs possible.
> 
> Ok. I am fine with that change. However this is a significant change that
> needs to be mentioned prominently in the changelog and there need to be
> some comments explaining the meaning of these flags clearly in the source.

No problem, we will handle it in the next version of this patch.

Thanks
Hanjun Guo

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH] mm: introduce N_LRU_MEMORY to distinguish between normal and movable memory

2012-08-14 Thread Christoph Lameter
On Tue, 14 Aug 2012, Hanjun Guo wrote:

> N_NORMAL_MEMORY means !LRU allocs possible.

Ok. I am fine with that change. However this is a significant change that
needs to be mentioned prominently in the changelog and there need to be
some comments explaining the meaning of these flags clearly in the source.



Re: [RFC PATCH] mm: introduce N_LRU_MEMORY to distinguish between normal and movable memory

2012-08-14 Thread Hanjun Guo
On 2012/8/10 22:12, Christoph Lameter (Open Source) wrote:
> On Fri, 10 Aug 2012, Hanjun Guo wrote:
> 
>> On 2012/8/9 22:06, Christoph Lameter (Open Source) wrote:
>>> On Thu, 9 Aug 2012, Hanjun Guo wrote:
>>>
>>>> Now, We have node masks for both N_NORMAL_MEMORY and N_HIGH_MEMORY to
>>>> distinguish between normal and highmem on platforms such as x86.
>>>> But we still don't have such a mechanism to distinguish between "normal"
>>>> and "movable" memory.
>>>
>>> What is the exact difference that you want to establish?
>>
>> Hi Christoph,
>> Thanks for your comments very much!
>>
>> We want to identify the node only has ZONE_MOVABLE memory.
>> for example:
>>  node 0: ZONE_DMA, ZONE_DMA32, ZONE_NORMAL --> N_LRU_MEMORY, N_NORMAL_MEMORY
>>  node 1: ZONE_MOVABLE --> N_LRU_MEMORY
>> thus, in SLUB allocator, will not allocate memory control structures for
>> node1.
> 
> So this would change the N_NORMAL_MEMORY definition so that N_NORMAL
> means !LRU allocs possible? So far N_NORMAL_MEMORY has a wider scope of
> meaning. We need an accurate definition of the meaning of all these
> attributes.

Hi Christoph,
Sorry for the late reply.

Yes. N_LRU_MEMORY means LRU allocations are possible on the node, and
N_NORMAL_MEMORY means non-LRU allocations are possible on the node.
A node with ZONE_DMA/ZONE_DMA32/ZONE_NORMAL is marked with both N_LRU_MEMORY
and N_NORMAL_MEMORY; a node with only ZONE_MOVABLE is marked *only* with
N_LRU_MEMORY.
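
The flag semantics described above could be modelled roughly as follows. This is only a user-space sketch under stated assumptions, not the patch itself: the enum values, struct node_model, classify_node() and node_has_state() are names made up for illustration and do not reflect the real layout of include/linux/nodemask.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical zone list, reduced to the zones named in the thread. */
enum zone_type { ZONE_DMA, ZONE_DMA32, ZONE_NORMAL, ZONE_MOVABLE, NR_ZONES };

/* Hypothetical node states with the meanings proposed in the thread. */
enum node_state {
	N_LRU_MEMORY,    /* node has memory usable for LRU allocations */
	N_NORMAL_MEMORY, /* node has memory usable for non-LRU allocations */
	NR_NODE_STATES
};

/* Illustrative stand-in for a NUMA node: present pages per zone. */
struct node_model {
	unsigned long zone_pages[NR_ZONES];
};

/* Returns a bitmask with one bit per enum node_state. */
static unsigned int classify_node(const struct node_model *n)
{
	unsigned int states = 0;
	int z;

	assert(n != NULL);
	for (z = 0; z < NR_ZONES; z++) {
		if (n->zone_pages[z] == 0)
			continue;
		/* Any populated zone can back LRU allocations. */
		states |= 1u << N_LRU_MEMORY;
		/* Only non-movable zones can back non-LRU allocations. */
		if (z != ZONE_MOVABLE)
			states |= 1u << N_NORMAL_MEMORY;
	}
	return states;
}

static bool node_has_state(unsigned int states, enum node_state s)
{
	return (states & (1u << s)) != 0;
}
```

With this model, a node that populates ZONE_DMA, ZONE_DMA32 and ZONE_NORMAL gets both bits, while a node that populates only ZONE_MOVABLE gets N_LRU_MEMORY alone, which matches the node 0 / node 1 example in the thread.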

> 
>>> For the slab case that you want to solve here you will need to know if the
>>> node has *only* movable memory and will never have any ZONE_NORMAL memory.
>>> If so then memory control structures for allocators that do not allow
>>> movable memory will not need to be allocated for these node. The node can
>>> be excluded from handling.
>>
>> I think this is what we are trying to do in this patch.
>> did I miss something?
> 
> The meaning of ZONE_NORMAL seems to change, which causes confusion. Please
> describe in detail what each of these attributes means.


Re: [RFC PATCH] mm: introduce N_LRU_MEMORY to distinguish between normal and movable memory

2012-08-10 Thread Christoph Lameter (Open Source)
On Fri, 10 Aug 2012, Hanjun Guo wrote:

> On 2012/8/9 22:06, Christoph Lameter (Open Source) wrote:
> > On Thu, 9 Aug 2012, Hanjun Guo wrote:
> >
> >> Now, We have node masks for both N_NORMAL_MEMORY and N_HIGH_MEMORY to
> >> distinguish between normal and highmem on platforms such as x86.
> >> But we still don't have such a mechanism to distinguish between "normal"
> >> and "movable" memory.
> >
> > What is the exact difference that you want to establish?
>
> Hi Christoph,
> Thanks for your comments very much!
>
> We want to identify the node only has ZONE_MOVABLE memory.
> for example:
>   node 0: ZONE_DMA, ZONE_DMA32, ZONE_NORMAL --> N_LRU_MEMORY, N_NORMAL_MEMORY
>   node 1: ZONE_MOVABLE --> N_LRU_MEMORY
> thus, in SLUB allocator, will not allocate memory control structures for
> node1.

So this would change the N_NORMAL_MEMORY definition so that N_NORMAL
means !LRU allocs possible? So far N_NORMAL_MEMORY has a wider scope of
meaning. We need an accurate definition of the meaning of all these
attributes.

> > For the slab case that you want to solve here you will need to know if the
> > node has *only* movable memory and will never have any ZONE_NORMAL memory.
> > If so then memory control structures for allocators that do not allow
> > movable memory will not need to be allocated for these node. The node can
> > be excluded from handling.
>
> I think this is what we are trying to do in this patch.
> did I miss something?

The meaning of ZONE_NORMAL seems to change, which causes confusion. Please
describe in detail what each of these attributes means.


Re: [RFC PATCH] mm: introduce N_LRU_MEMORY to distinguish between normal and movable memory

2012-08-10 Thread Jiang Liu
Hi Ishimatsu,
We have worked out a changeset to enable offlinable nodes, which
is based on a new ACPI-based hotplug framework
(http://www.spinics.net/lists/linux-pci/msg16826.html).
With it we can now hot-add/hot-remove a computer node with CPU/memory/PCI
host bridge, but it is still a prototype and we are improving the code
quality before sending it out for review.
We have noticed Jiangsan's work on the same topic, and it would
be better to cooperate on it.
Regards!
Gerry

On 2012-8-10 17:43, Yasuaki Ishimatsu wrote:
> Hi Guo,
> 
> I have a question. How do you create the offlinable node? The current linux
> cannot offline all memory on node. So we cannot hit the bug.
> 
> Recently Lai sent the following patches which create the movable node.
> I think these patches consider the problem.
> 
> https://lkml.org/lkml/2012/8/6/113
> => Hi Lai,
>    I think your patches solve Guo's problem. What do you think?
> 
> Thanks,
> Yasuaki Ishimatu
> 
> 2012/08/09 13:39, Hanjun Guo wrote:
>> From: Wu Jianguo 
>>
>> Hi all,
>> Now, We have node masks for both N_NORMAL_MEMORY and N_HIGH_MEMORY to
>> distinguish between normal and highmem on platforms such as x86.
>> But we still don't have such a mechanism to distinguish between "normal"
>> and "movable" memory.
>>
>> As suggested by Christoph Lameter in threads
>> http://marc.info/?l=linux-mm&m=134323057602484&w=2, we introduce
>> N_LRU_MEMORY to distinguish between "normal" and "movable" memory.
>>
>> And this patch will fix the bug described as follow:
>>
>> When handling a memory node with only movable zone, function
>> early_kmem_cache_node_alloc() will allocate a page from remote node but
>> still increase object count on local node, which will trigger a BUG_ON()
>> as below when hot-removing this memory node. Actually there's no need to
>> create kmem_cache_node for memory node with only movable zone at all.
>>
>> [ cut here ]
>> kernel BUG at mm/slub.c:3590!
>> invalid opcode:  [#1] SMP
>> CPU 61
>> Modules linked in: autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table
>> mperf ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables
>> ipv6 vfat fat dm_mirror dm_region_hash dm_log uinput iTCO_wdt
>> iTCO_vendor_support coretemp hwmon kvm_intel kvm crc32c_intel
>> ghash_clmulni_intel serio_raw pcspkr cdc_ether usbnet mii i2c_i801 i2c_core 
>> sg
>> lpc_ich mfd_core shpchp ioatdma i7core_edac edac_core igb dca bnx2 ext4
>> mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif aesni_intel cryptd aes_x86_64
>> aes_generic bfa scsi_transport_fc scsi_tgt pata_acpi ata_generic ata_piix
>> megaraid_sas dm_mod [last unloaded: microcode]
>>
>> Pid: 46287, comm: sh Not tainted 3.5.0-rc4-pgtable-00215-g35f0828-dirty #85
>> IBM System x3850 X5 -[7143O3G]-/Node 1, Processor Card
>> RIP: 0010:[]  []
>> slab_memory_callback+0x1ba/0x1c0
>> RSP: 0018:880efdcb7c68  EFLAGS: 00010202
>> RAX: 0001 RBX: 880f7ec06100 RCX: 00010041
>> RDX: 00010042 RSI: 880f7ec02000 RDI: 880f7ec06100
>> RBP: 880efdcb7c78 R08: 88107b6fb098 R09: 81160a00
>> R10:  R11:  R12: 0019
>> R13: fffb R14:  R15: 81abe930
>> FS:  7f709f342700() GS:880f7f3a() knlGS:
>> CS:  0010 DS:  ES:  CR0: 8005003b
>> CR2: 003b5a874570 CR3: 000f0da2 CR4: 07e0
>> DR0:  DR1:  DR2: 
>> DR3:  DR6: 0ff0 DR7: 0400
>> Process sh (pid: 46287, threadinfo 880efdcb6000, task 880f0fa5)
>> Stack:
>>   0004 880efdcb7da8 880efdcb7cb8 81524af5
>>   0001 81a8b620 81a8b640 0004
>>   880efdcb7da8  880efdcb7d08 8107a89a
>> Call Trace:
>>   [] notifier_call_chain+0x55/0x80
>>   [] __blocking_notifier_call_chain+0x5a/0x80
>>   [] blocking_notifier_call_chain+0x16/0x20
>>   [] memory_notify+0x1b/0x20
>>   [] offline_pages+0x624/0x700
>>   [] remove_memory+0x1e/0x20
>>   [] memory_block_change_state+0x13c/0x2e0
>>   [] ? alloc_pages_current+0xb6/0x120
>>   [] store_mem_state+0xc2/0xd0
>>   [] dev_attr_store+0x20/0x30
>>   [] sysfs_write_file+0xef/0x170
>>   [] vfs_write+0xc8/0x190
>>   [] sys_write+0x51/0x90
>>   [] system_call_fastpath+0x16/0x1b
>> Code: 8b 3d cb fd c4 00 be d0 00 00 00 e8 71 de ff ff 48 85 c0 75 9c 48 c7 c7
>> c0 7f a5 81 e8 c0 89 f1 ff b8 0d 80 00 00 e9 69 fe ff ff <0f> 0b eb fe 66 90
>> 55 48 89 e5 41 57 41 56 41 55 41 54 53 48 83
>> RIP  [] slab_memory_callback+0x1ba/0x1c0
>>   RSP 
>> ---[ end trace 749e9e9a67c78c12 ]---
>>
>> Signed-off-by: Wu Jianguo 
>> Signed-off-by: Jiang Liu 
>> ---
>>   arch/alpha/mm/numa.c |2 +-
>>   arch/m32r/mm/discontig.c |2 +-
>>   arch/m68k/mm/motorola.c  |2 +-
>>   arch/parisc/mm/init.c|2 +-
>>   

Re: [RFC PATCH] mm: introduce N_LRU_MEMORY to distinguish between normal and movable memory

2012-08-10 Thread Yasuaki Ishimatsu

Hi Guo,

I have a question. How do you create the offlinable node? Current Linux
cannot offline all of the memory on a node, so we cannot hit the bug.

Recently Lai sent the following patches, which create the movable node.
I think these patches take this problem into account.

https://lkml.org/lkml/2012/8/6/113
=> Hi Lai,
   I think your patches solve Guo's problem. What do you think?

Thanks,
Yasuaki Ishimatsu

2012/08/09 13:39, Hanjun Guo wrote:

From: Wu Jianguo 

Hi all,
We now have node masks for both N_NORMAL_MEMORY and N_HIGH_MEMORY to
distinguish between normal and highmem on platforms such as x86.
But we still don't have such a mechanism to distinguish between "normal"
and "movable" memory.

As suggested by Christoph Lameter in the thread
http://marc.info/?l=linux-mm&m=134323057602484&w=2, we introduce N_LRU_MEMORY
to distinguish between "normal" and "movable" memory.

This patch also fixes the bug described below:

When handling a memory node that has only a movable zone,
early_kmem_cache_node_alloc() allocates a page from a remote node but
still increases the object count on the local node, which triggers the
BUG_ON() shown below when this memory node is hot-removed. In fact, there is
no need to create a kmem_cache_node for a memory node with only a movable
zone at all.
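
The accounting mismatch described above can be illustrated with a small user-space model. This is a hedged sketch, not SLUB code: struct node_counters, pick_backing_node(), early_node_alloc_buggy(), early_node_alloc_skipping() and node_can_go_offline() are hypothetical names, and a single nr_slabs counter stands in for the kernel's real per-node bookkeeping.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 2

/* Hypothetical, simplified stand-in for per-node slab bookkeeping. */
struct node_counters {
	bool has_normal_memory; /* false for a ZONE_MOVABLE-only node */
	long nr_slabs;          /* slabs charged to this node */
};

static struct node_counters nodes[MAX_NODES] = {
	{ .has_normal_memory = true,  .nr_slabs = 0 }, /* node 0 */
	{ .has_normal_memory = false, .nr_slabs = 0 }, /* node 1: movable only */
};

/* Find the node that can really supply a kernel page for 'wanted'. */
static int pick_backing_node(int wanted)
{
	int i;

	assert(wanted >= 0 && wanted < MAX_NODES);
	if (nodes[wanted].has_normal_memory)
		return wanted;
	for (i = 0; i < MAX_NODES; i++)
		if (nodes[i].has_normal_memory)
			return i;
	return -1;
}

/* Buggy pattern: the page comes from a remote node, the count goes local. */
static int early_node_alloc_buggy(int wanted)
{
	int backing = pick_backing_node(wanted);

	if (backing < 0)
		return -1;
	nodes[wanted].nr_slabs++; /* charged to the wrong node */
	return backing;
}

/* Intended pattern: skip nodes that cannot hold kernel memory at all. */
static int early_node_alloc_skipping(int wanted)
{
	assert(wanted >= 0 && wanted < MAX_NODES);
	if (!nodes[wanted].has_normal_memory)
		return -1;
	nodes[wanted].nr_slabs++;
	return wanted;
}

/* Offline check: a node can go away only if nothing is charged to it. */
static bool node_can_go_offline(int node)
{
	assert(node >= 0 && node < MAX_NODES);
	return nodes[node].nr_slabs == 0;
}
```

In this model, calling early_node_alloc_buggy(1) leaves a slab charged to node 1 even though the page lives on node 0, so node_can_go_offline(1) fails; that failed check plays the role of the BUG_ON() in the trace below. The skipping variant never charges the movable-only node.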

[ cut here ]
kernel BUG at mm/slub.c:3590!
invalid opcode:  [#1] SMP
CPU 61
Modules linked in: autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table
mperf ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables
ipv6 vfat fat dm_mirror dm_region_hash dm_log uinput iTCO_wdt
iTCO_vendor_support coretemp hwmon kvm_intel kvm crc32c_intel
ghash_clmulni_intel serio_raw pcspkr cdc_ether usbnet mii i2c_i801 i2c_core sg
lpc_ich mfd_core shpchp ioatdma i7core_edac edac_core igb dca bnx2 ext4
mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif aesni_intel cryptd aes_x86_64
aes_generic bfa scsi_transport_fc scsi_tgt pata_acpi ata_generic ata_piix
megaraid_sas dm_mod [last unloaded: microcode]

Pid: 46287, comm: sh Not tainted 3.5.0-rc4-pgtable-00215-g35f0828-dirty #85
IBM System x3850 X5 -[7143O3G]-/Node 1, Processor Card
RIP: 0010:[]  []
slab_memory_callback+0x1ba/0x1c0
RSP: 0018:880efdcb7c68  EFLAGS: 00010202
RAX: 0001 RBX: 880f7ec06100 RCX: 00010041
RDX: 00010042 RSI: 880f7ec02000 RDI: 880f7ec06100
RBP: 880efdcb7c78 R08: 88107b6fb098 R09: 81160a00
R10:  R11:  R12: 0019
R13: fffb R14:  R15: 81abe930
FS:  7f709f342700() GS:880f7f3a() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 003b5a874570 CR3: 000f0da2 CR4: 07e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process sh (pid: 46287, threadinfo 880efdcb6000, task 880f0fa5)
Stack:
  0004 880efdcb7da8 880efdcb7cb8 81524af5
  0001 81a8b620 81a8b640 0004
  880efdcb7da8  880efdcb7d08 8107a89a
Call Trace:
  [] notifier_call_chain+0x55/0x80
  [] __blocking_notifier_call_chain+0x5a/0x80
  [] blocking_notifier_call_chain+0x16/0x20
  [] memory_notify+0x1b/0x20
  [] offline_pages+0x624/0x700
  [] remove_memory+0x1e/0x20
  [] memory_block_change_state+0x13c/0x2e0
  [] ? alloc_pages_current+0xb6/0x120
  [] store_mem_state+0xc2/0xd0
  [] dev_attr_store+0x20/0x30
  [] sysfs_write_file+0xef/0x170
  [] vfs_write+0xc8/0x190
  [] sys_write+0x51/0x90
  [] system_call_fastpath+0x16/0x1b
Code: 8b 3d cb fd c4 00 be d0 00 00 00 e8 71 de ff ff 48 85 c0 75 9c 48 c7 c7
c0 7f a5 81 e8 c0 89 f1 ff b8 0d 80 00 00 e9 69 fe ff ff <0f> 0b eb fe 66 90
55 48 89 e5 41 57 41 56 41 55 41 54 53 48 83
RIP  [] slab_memory_callback+0x1ba/0x1c0
  RSP 
---[ end trace 749e9e9a67c78c12 ]---

Signed-off-by: Wu Jianguo 
Signed-off-by: Jiang Liu 
---
 arch/alpha/mm/numa.c     |    2 +-
 arch/m32r/mm/discontig.c |    2 +-
 arch/m68k/mm/motorola.c  |    2 +-
 arch/parisc/mm/init.c    |    2 +-
 arch/tile/kernel/setup.c |    2 +-
 arch/x86/mm/init_64.c    |    2 +-
 drivers/base/node.c      |    4 +++-
 include/linux/nodemask.h |    5 +++--
 mm/page_alloc.c          |   10 ++++++++--
 9 files changed, 20 insertions(+), 11 deletions(-)

diff --git a/arch/alpha/mm/numa.c b/arch/alpha/mm/numa.c
index 3973ae3..8402b29 100644
--- a/arch/alpha/mm/numa.c
+++ b/arch/alpha/mm/numa.c
@@ -313,7 +313,7 @@ void __init paging_init(void)
 			zones_size[ZONE_DMA] = dma_local_pfn;
 			zones_size[ZONE_NORMAL] = (end_pfn - start_pfn) - dma_local_pfn;
 		}
-		node_set_state(nid, N_NORMAL_MEMORY);
+		node_set_state(nid, N_LRU_MEMORY);
 		free_area_init_node(nid, zones_size, start_pfn, NULL);
 	}

diff --git a/arch/m32r/mm/discontig.c 

Re: [RFC PATCH] mm: introduce N_LRU_MEMORY to distinguish between normal and movable memory

2012-08-10 Thread Hanjun Guo
On 2012/8/9 22:06, Christoph Lameter (Open Source) wrote:
> On Thu, 9 Aug 2012, Hanjun Guo wrote:
> 
>> Now, We have node masks for both N_NORMAL_MEMORY and N_HIGH_MEMORY to
>> distinguish between normal and highmem on platforms such as x86.
>> But we still don't have such a mechanism to distinguish between "normal"
>> and "movable" memory.
> 
> What is the exact difference that you want to establish?

Hi Christoph,
Thank you very much for your comments!

We want to identify nodes that have only ZONE_MOVABLE memory.
For example:
	node 0: ZONE_DMA, ZONE_DMA32, ZONE_NORMAL --> N_LRU_MEMORY, N_NORMAL_MEMORY
	node 1: ZONE_MOVABLE --> N_LRU_MEMORY
Thus the SLUB allocator will not allocate memory control structures for
node 1.

static int init_kmem_cache_nodes(struct kmem_cache *s)
{
	int node;

	/* Skips nodes that have only ZONE_MOVABLE memory. */
	for_each_node_state(node, N_NORMAL_MEMORY) {
		struct kmem_cache_node *n;

		if (slab_state == DOWN) {
			early_kmem_cache_node_alloc(node);
			continue;
		}
		n = kmem_cache_alloc_node(kmem_cache_node,
						GFP_KERNEL, node);

		...
	}
	...
}
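
To see why iterating over N_NORMAL_MEMORY leaves out a movable-only node, here is a minimal user-space analogue of that loop. It is a sketch under assumptions: the bitmask array, the simplified for_each_node_state() macro, node_initialised[] and init_cache_nodes() are illustrative stand-ins, not the kernel's nodemask implementation.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 4

/* Hypothetical stand-ins for the state masks discussed in the thread. */
enum node_state { N_LRU_MEMORY, N_NORMAL_MEMORY, NR_NODE_STATES };

/* One bitmask per state; bit i set means node i has that state. */
static unsigned long node_states[NR_NODE_STATES];

static void node_set_state(int node, enum node_state state)
{
	assert(node >= 0 && node < MAX_NODES);
	node_states[state] |= 1UL << node;
}

/* Simplified analogue of the kernel's for_each_node_state() iterator. */
#define for_each_node_state(node, state)                 \
	for ((node) = 0; (node) < MAX_NODES; (node)++)   \
		if (node_states[(state)] & (1UL << (node)))

/* Records which nodes received a per-node control structure. */
static int node_initialised[MAX_NODES];

/* Returns how many nodes were set up; movable-only nodes are never visited. */
static size_t init_cache_nodes(void)
{
	size_t count = 0;
	int node;

	for_each_node_state(node, N_NORMAL_MEMORY) {
		node_initialised[node] = 1;
		count++;
	}
	return count;
}
```

If node 0 is marked with both states and node 1 only with N_LRU_MEMORY, init_cache_nodes() sets up node 0 alone, so nothing is ever charged to node 1.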

> 
>> As suggested by Christoph Lameter in threads
>> http://marc.info/?l=linux-mm&m=134323057602484&w=2, we introduce
>> N_LRU_MEMORY to distinguish between "normal" and "movable" memory.
> 
> Well, it seems that I am having second thoughts about this. While it is true
> that current page migration can only move pages on the LRU, there are
> already various mechanisms proposed and implemented that can move pages
> not on the LRU (like page table pages). Not sure if this is still a useful
> distinction to make. There is also the issue that segments from
> "N_LRU_MEMORY" may be allocated and then become not movable anymore.

Some kernel pages, such as memmap pages and usemap pages, still cannot be
migrated.

> 
> For the slab case that you want to solve here you will need to know if the
> node has *only* movable memory and will never have any ZONE_NORMAL memory.
> If so then memory control structures for allocators that do not allow
> movable memory will not need to be allocated for these node. The node can
> be excluded from handling.

I think this is what we are trying to do in this patch.
Did I miss something?



Re: [RFC PATCH] mm: introduce N_LRU_MEMORY to distinguish between normal and movable memory

2012-08-10 Thread Christoph Lameter (Open Source)
On Fri, 10 Aug 2012, Hanjun Guo wrote:

> On 2012/8/9 22:06, Christoph Lameter (Open Source) wrote:
> > On Thu, 9 Aug 2012, Hanjun Guo wrote:
> >
> >> Now, We have node masks for both N_NORMAL_MEMORY and
> >> N_HIGH_MEMORY to distinguish between normal and highmem on platforms such as x86.
> >> But we still don't have such a mechanism to distinguish between "normal" and "movable"
> >> memory.
> >
> > What is the exact difference that you want to establish?
>
> Hi Christoph,
> Thanks for your comments very much!
>
> We want to identify the node only has ZONE_MOVABLE memory.
> for example:
>   node 0: ZONE_DMA, ZONE_DMA32, ZONE_NORMAL --> N_LRU_MEMORY, N_NORMAL_MEMORY
>   node 1: ZONE_MOVABLE --> N_LRU_MEMORY
> thus, in SLUB allocator, will not allocate memory control structures for node1.

So this would change the N_NORMAL_MEMORY definition so that N_NORMAL
means !LRU allocs possible? So far N_NORMAL_MEMORY has a wider scope of
meaning. We need an accurate definition of the meaning of all these
attributes.

> > For the slab case that you want to solve here you will need to know if the
> > node has *only* movable memory and will never have any ZONE_NORMAL memory.
> > If so then memory control structures for allocators that do not allow
> > movable memory will not need to be allocated for these nodes. The node can
> > be excluded from handling.
>
> I think this is what we are trying to do in this patch.
> did I miss something?

The meaning of ZONE_NORMAL seems to change, which causes confusion. Please
describe in detail what each of these attributes means.
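
A minimal user-space sketch of the node 0 / node 1 example quoted above may
help pin down the distinction being asked about. It is only an illustration
under stated assumptions: the names NS_LRU_MEMORY, NS_NORMAL_MEMORY, Z_*,
node_states_for_zones() and slab_needs_node_structures() are hypothetical,
highmem is ignored, and the kernel keeps node states as nodemasks rather
than as per-node flag words.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical zone bits for one node (not the kernel's enum zone_type). */
enum zone_bit {
    Z_DMA     = 1u << 0,
    Z_DMA32   = 1u << 1,
    Z_NORMAL  = 1u << 2,
    Z_MOVABLE = 1u << 3,
    Z_ALL     = (1u << 4) - 1
};

/* Hypothetical per-node state bits, named after the flags in this thread. */
enum node_state_bit {
    NS_LRU_MEMORY    = 1u << 0, /* node has memory that can hold LRU pages */
    NS_NORMAL_MEMORY = 1u << 1  /* node has memory for non-movable kernel allocations */
};

/* Derive the state bits of a node from the zones it has populated. */
static unsigned int node_states_for_zones(unsigned int zones)
{
    unsigned int states = 0;

    assert((zones & ~(unsigned int)Z_ALL) == 0);

    /* Assumption: any populated zone can hold LRU pages. */
    if (zones != 0)
        states |= NS_LRU_MEMORY;

    /* Only non-movable zones can back kernel allocations such as slab. */
    if (zones & (Z_DMA | Z_DMA32 | Z_NORMAL))
        states |= NS_NORMAL_MEMORY;

    return states;
}

/* Per-node slab structures are only needed where slab pages can live. */
static bool slab_needs_node_structures(unsigned int states)
{
    return (states & NS_NORMAL_MEMORY) != 0;
}
```

With these definitions, a node with ZONE_DMA, ZONE_DMA32 and ZONE_NORMAL
gets both bits, while a node with only ZONE_MOVABLE gets NS_LRU_MEMORY
alone and is skipped by slab_needs_node_structures().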


Re: [RFC PATCH] mm: introduce N_LRU_MEMORY to distinguish between normal and movable memory

2012-08-09 Thread Christoph Lameter (Open Source)
On Thu, 9 Aug 2012, Hanjun Guo wrote:

> Now, We have node masks for both N_NORMAL_MEMORY and
> N_HIGH_MEMORY to distinguish between normal and highmem on platforms such as x86.
> But we still don't have such a mechanism to distinguish between "normal" and "movable"
> memory.

What is the exact difference that you want to establish?

> As suggested by Christoph Lameter in threads
> http://marc.info/?l=linux-mm&m=134323057602484&w=2, we introduce N_LRU_MEMORY to
> distinguish between "normal" and "movable" memory.

Well, it seems that I am having second thoughts about this. While it is true
that current page migration can only move pages on the LRU, there are
already various mechanisms proposed and implemented that can move pages
not on the LRU (like page table pages). Not sure if this is still a useful
distinction to make. There is also the issue that segments from
"N_LRU_MEMORY" may be allocated and then become not movable anymore.

For the slab case that you want to solve here you will need to know if the
node has *only* movable memory and will never have any ZONE_NORMAL memory.
If so then memory control structures for allocators that do not allow
movable memory will not need to be allocated for these nodes. The node can
be excluded from handling.


[RFC PATCH] mm: introduce N_LRU_MEMORY to distinguish between normal and movable memory

2012-08-08 Thread Hanjun Guo
From: Wu Jianguo <wujian...@huawei.com>

Hi all,
Now, we have node masks for both N_NORMAL_MEMORY and N_HIGH_MEMORY to
distinguish between normal and highmem on platforms such as x86.
But we still don't have such a mechanism to distinguish between "normal"
and "movable" memory.

As suggested by Christoph Lameter in the thread
http://marc.info/?l=linux-mm&m=134323057602484&w=2, we introduce N_LRU_MEMORY to
distinguish between "normal" and "movable" memory.

This patch also fixes the bug described as follows:

When handling a memory node that has only a movable zone, function
early_kmem_cache_node_alloc() will allocate a page from a remote node but
still increase the object count on the local node, which will trigger a
BUG_ON() as below when hot-removing this memory node. Actually there's no
need to create a kmem_cache_node for a memory node with only a movable zone
at all.

------------[ cut here ]------------
kernel BUG at mm/slub.c:3590!
invalid opcode:  [#1] SMP
CPU 61
Modules linked in: autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table
mperf ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables
ipv6 vfat fat dm_mirror dm_region_hash dm_log uinput iTCO_wdt
iTCO_vendor_support coretemp hwmon kvm_intel kvm crc32c_intel
ghash_clmulni_intel serio_raw pcspkr cdc_ether usbnet mii i2c_i801 i2c_core sg
lpc_ich mfd_core shpchp ioatdma i7core_edac edac_core igb dca bnx2 ext4
mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif aesni_intel cryptd aes_x86_64
aes_generic bfa scsi_transport_fc scsi_tgt pata_acpi ata_generic ata_piix
megaraid_sas dm_mod [last unloaded: microcode]

Pid: 46287, comm: sh Not tainted 3.5.0-rc4-pgtable-00215-g35f0828-dirty #85
IBM System x3850 X5 -[7143O3G]-/Node 1, Processor Card
RIP: 0010:[<81160b2a>]  [<81160b2a>] slab_memory_callback+0x1ba/0x1c0
RSP: 0018:880efdcb7c68  EFLAGS: 00010202
RAX: 0001 RBX: 880f7ec06100 RCX: 00010041
RDX: 00010042 RSI: 880f7ec02000 RDI: 880f7ec06100
RBP: 880efdcb7c78 R08: 88107b6fb098 R09: 81160a00
R10:  R11:  R12: 0019
R13: fffb R14:  R15: 81abe930
FS:  7f709f342700() GS:880f7f3a() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 003b5a874570 CR3: 000f0da2 CR4: 07e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process sh (pid: 46287, threadinfo 880efdcb6000, task 880f0fa5)
Stack:
 0004 880efdcb7da8 880efdcb7cb8 81524af5
 0001 81a8b620 81a8b640 0004
 880efdcb7da8  880efdcb7d08 8107a89a
Call Trace:
 [<81524af5>] notifier_call_chain+0x55/0x80
 [<8107a89a>] __blocking_notifier_call_chain+0x5a/0x80
 [<8107a8d6>] blocking_notifier_call_chain+0x16/0x20
 [<81352f0b>] memory_notify+0x1b/0x20
 [<81507104>] offline_pages+0x624/0x700
 [<811619de>] remove_memory+0x1e/0x20
 [<813530cc>] memory_block_change_state+0x13c/0x2e0
 [<81153e96>] ? alloc_pages_current+0xb6/0x120
 [<81353332>] store_mem_state+0xc2/0xd0
 [<8133e190>] dev_attr_store+0x20/0x30
 [<811e2d4f>] sysfs_write_file+0xef/0x170
 [<81173e28>] vfs_write+0xc8/0x190
 [<81173ff1>] sys_write+0x51/0x90
 [<81528d29>] system_call_fastpath+0x16/0x1b
Code: 8b 3d cb fd c4 00 be d0 00 00 00 e8 71 de ff ff 48 85 c0 75 9c 48 c7 c7
c0 7f a5 81 e8 c0 89 f1 ff b8 0d 80 00 00 e9 69 fe ff ff <0f> 0b eb fe 66 90
55 48 89 e5 41 57 41 56 41 55 41 54 53 48 83
RIP  [<81160b2a>] slab_memory_callback+0x1ba/0x1c0
 RSP <880efdcb7c68>
---[ end trace 749e9e9a67c78c12 ]---
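
The accounting mismatch described above can be modelled in user space. The
following is a rough sketch, not the SLUB code: struct node_model and the
functions early_node_alloc_buggy(), early_node_alloc_skip() and
can_offline() are hypothetical names invented for illustration, and the
offline check only loosely mirrors the kind of BUG_ON() seen in the trace.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 2

/* Hypothetical stand-in for per-node slab state; not a kernel structure. */
struct node_model {
    bool has_normal_memory; /* node has non-movable, kernel-usable memory */
    int  nr_slabs;          /* slabs accounted to this node */
};

/*
 * Find the node that actually supplies the page: the requested node if it
 * has normal memory, otherwise the first node that does. Returns -1 if no
 * node can supply one.
 */
static int page_source_node(const struct node_model *nodes, int nid)
{
    assert(nid >= 0 && nid < MAX_NODES);

    if (nodes[nid].has_normal_memory)
        return nid;
    for (int i = 0; i < MAX_NODES; i++)
        if (nodes[i].has_normal_memory)
            return i;
    return -1;
}

/* Buggy variant: the page may be remote, but the count goes to nid. */
static void early_node_alloc_buggy(struct node_model *nodes, int nid)
{
    if (page_source_node(nodes, nid) >= 0)
        nodes[nid].nr_slabs++;
}

/* Proposed behaviour: do nothing for nodes without normal memory. */
static void early_node_alloc_skip(struct node_model *nodes, int nid)
{
    assert(nid >= 0 && nid < MAX_NODES);

    if (nodes[nid].has_normal_memory)
        nodes[nid].nr_slabs++;
}

/* A node can only go offline if no slabs are accounted to it. */
static bool can_offline(const struct node_model *nodes, int nid)
{
    assert(nid >= 0 && nid < MAX_NODES);

    return nodes[nid].nr_slabs == 0;
}
```

In this model, calling early_node_alloc_buggy() for a movable-only node
leaves a non-zero count behind, so can_offline() fails for that node, while
early_node_alloc_skip() leaves the count at zero.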

Signed-off-by: Wu Jianguo <wujian...@huawei.com>
Signed-off-by: Jiang Liu <jiang@huawei.com>
---
 arch/alpha/mm/numa.c     |    2 +-
 arch/m32r/mm/discontig.c |    2 +-
 arch/m68k/mm/motorola.c  |    2 +-
 arch/parisc/mm/init.c    |    2 +-
 arch/tile/kernel/setup.c |    2 +-
 arch/x86/mm/init_64.c    |    2 +-
 drivers/base/node.c      |    4 +++-
 include/linux/nodemask.h |    5 +++--
 mm/page_alloc.c          |   10 ++++++++--
 9 files changed, 20 insertions(+), 11 deletions(-)

diff --git a/arch/alpha/mm/numa.c b/arch/alpha/mm/numa.c
index 3973ae3..8402b29 100644
--- a/arch/alpha/mm/numa.c
+++ b/arch/alpha/mm/numa.c
@@ -313,7 +313,7 @@ void __init paging_init(void)
 			zones_size[ZONE_DMA] = dma_local_pfn;
 			zones_size[ZONE_NORMAL] = (end_pfn - start_pfn) - dma_local_pfn;
 		}
-		node_set_state(nid, N_NORMAL_MEMORY);
+		node_set_state(nid, N_LRU_MEMORY);
 		free_area_init_node(nid, zones_size, start_pfn, NULL);
 	}

diff --git a/arch/m32r/mm/discontig.c b/arch/m32r/mm/discontig.c
index 2c468e8..4d76e19 100644
--- a/arch/m32r/mm/discontig.c
+++ b/arch/m32r/mm/discontig.c
@@ -149,7 +149,7 @@ unsigned long __init zone_sizes_init(void)
 		zholes_size[ZONE_DMA] = mp->holes;
 		holes += zholes_size[ZONE_DMA];
 
-		node_set_state(nid, N_NORMAL_MEMORY);
+		node_set_state(nid, N_LRU_MEMORY);
 		free_area_init_node(nid, zones_size, start_pfn, zholes_size);
 
