LSF/MM 2018: Call for Proposals

2018-01-15 Thread Johannes Weiner
The annual Linux Storage, Filesystem and Memory Management (LSF/MM)
Summit for 2018 will be held from April 23-25 at the Deer Valley
Lodges in Park City, Utah. LSF/MM is an invitation-only technical
workshop to map out improvements to the Linux storage, filesystem and
memory management subsystems that will make their way into the
mainline kernel within the coming years.


http://events.linuxfoundation.org/events/linux-storage-filesystem-and-mm-summit

LSF/MM 2018 will be a three day, stand-alone conference with three
subsystem-specific tracks, cross-track discussions, as well as BoF and
hacking sessions.

On behalf of the committee I am issuing a call for agenda proposals
that are suitable for cross-track discussion as well as technical
subjects for the breakout sessions.

If advance notice is required for visa applications then please point
that out in your proposal or request to attend, and submit the topic
as soon as possible.

1) Proposals for agenda topics should be sent before January 31st,
2018 to:

lsf...@lists.linux-foundation.org

and CC the mailing lists that are relevant for the topic in question:

FS: linux-fsde...@vger.kernel.org
MM: linux...@kvack.org
Block:  linux-bl...@vger.kernel.org
ATA:linux-...@vger.kernel.org
SCSI:   linux-scsi@vger.kernel.org
NVMe:   linux-n...@lists.infradead.org

Please tag your proposal with [LSF/MM TOPIC] to make it easier to
track. In addition, please make sure to start a new thread for each
topic rather than following up to an existing one. Agenda topics and
attendees will be selected by the program committee, but the final
agenda will be formed by consensus of the attendees on the day.

2) Requests to attend the summit for those that are not proposing a
topic should be sent to:

lsf...@lists.linux-foundation.org

Please summarize what expertise you will bring to the meeting, and
what you would like to discuss. Please also tag your email with
[LSF/MM ATTEND] and send it as a new thread so there is less chance of
it getting lost.

We will try to cap attendance at around 25-30 per track to facilitate
discussions although the final numbers will depend on the room sizes
at the venue.

For discussion leaders, slides and visualizations are encouraged to
outline the subject matter and focus the discussions. Please refrain
from lengthy presentations and talks; the sessions are supposed to be
interactive, inclusive discussions.

There will be no recording or audio bridge. However, we expect that
written minutes will be published as we did in previous years:

2017: https://lwn.net/Articles/lsfmm2017/

2016: https://lwn.net/Articles/lsfmm2016/

2015: https://lwn.net/Articles/lsfmm2015/

2014: http://lwn.net/Articles/LSFMM2014/

2013: http://lwn.net/Articles/548089/

3) If you have feedback on last year's meeting that we can use to
improve this year's, please also send that to:

lsf...@lists.linux-foundation.org

Thank you on behalf of the program committee:

Anna Schumaker (Filesystems)
Jens Axboe (Storage)
Josef Bacik (Filesystems)
Martin K. Petersen (Storage)
Michal Hocko (MM)
Rik van Riel (MM)

Johannes Weiner


Re: [4.10, panic, regression] iscsi: null pointer deref at iscsi_tcp_segment_done+0x20d/0x2e0

2017-01-09 Thread Johannes Weiner
On Mon, Jan 09, 2017 at 09:30:05PM +0100, Jan Kara wrote:
> On Sat 07-01-17 21:02:00, Johannes Weiner wrote:
> > On Tue, Jan 03, 2017 at 01:28:25PM +0100, Jan Kara wrote:
> > > On Mon 02-01-17 16:11:36, Johannes Weiner wrote:
> > > > On Fri, Dec 23, 2016 at 03:33:29AM -0500, Johannes Weiner wrote:
> > > > > On Fri, Dec 23, 2016 at 02:32:41AM -0500, Johannes Weiner wrote:
> > > > > > On Thu, Dec 22, 2016 at 12:22:27PM -0800, Hugh Dickins wrote:
> > > > > > > On Wed, 21 Dec 2016, Linus Torvalds wrote:
> > > > > > > > On Wed, Dec 21, 2016 at 9:13 PM, Dave Chinner 
> > > > > > > > <da...@fromorbit.com> wrote:
> > > > > > > > > I unmounted the fs, mkfs'd it again, ran the
> > > > > > > > > workload again and about a minute in this fired:
> > > > > > > > >
> > > > > > > > > [628867.607417] [ cut here ]
> > > > > > > > > [628867.608603] WARNING: CPU: 2 PID: 16925 at 
> > > > > > > > > mm/workingset.c:461 shadow_lru_isolate+0x171/0x220
> > > > > > > > 
> > > > > > > > Well, part of the changes during the merge window were the 
> > > > > > > > shadow
> > > > > > > > entry tracking changes that came in through Andrew's tree. 
> > > > > > > > Adding
> > > > > > > > Johannes Weiner to the participants.
> > 
> > Okay, the below patch should address this problem. Dave Jones managed
> > to reproduce it with the added WARN_ONs, and they made it obvious. He
> > cannot trigger it anymore with this fix applied. Thanks Dave!
> 
> FWIW the patch looks good to me. I'd just note that the need to pass the
> callback to deletion function and the fact that we do it only in cases
> where we think it is needed appears errorprone. With the warning you've
> added it should at least catch the cases where we got it wrong but more
> robust would be if the radix tree root contained a pointer to the callback
> function so that we would not rely on passing the callback to every place
> which can possibly free a node. Also conceptually this would make more
> sense to me since the fact that we may need to do some cleanup on node
> deletion is a property of the particular radix tree and how we use it.
> OTOH that would mean growing radix tree root by one pointer which would be
> unpopular I guess.

The last sentence is the crux, unfortunately. The first iteration of
the shadow shrinker linked up mappings that contained shadow entries,
rather than nodes. The code would have been drastically simpler in
pretty much all regards, without changes to the radix tree API. But
Dave Chinner objected to adding a pointer to the inode when we could
stick them into empty space (slab fragmentation) inside the nodes. A
fair point, I guess, especially when you consider metadata-intense
workloads. But now we're kind of stuck with the complexity of this.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [4.10, panic, regression] iscsi: null pointer deref at iscsi_tcp_segment_done+0x20d/0x2e0

2017-01-07 Thread Johannes Weiner
On Tue, Jan 03, 2017 at 01:28:25PM +0100, Jan Kara wrote:
> On Mon 02-01-17 16:11:36, Johannes Weiner wrote:
> > On Fri, Dec 23, 2016 at 03:33:29AM -0500, Johannes Weiner wrote:
> > > On Fri, Dec 23, 2016 at 02:32:41AM -0500, Johannes Weiner wrote:
> > > > On Thu, Dec 22, 2016 at 12:22:27PM -0800, Hugh Dickins wrote:
> > > > > On Wed, 21 Dec 2016, Linus Torvalds wrote:
> > > > > > On Wed, Dec 21, 2016 at 9:13 PM, Dave Chinner <da...@fromorbit.com> 
> > > > > > wrote:
> > > > > > > I unmounted the fs, mkfs'd it again, ran the
> > > > > > > workload again and about a minute in this fired:
> > > > > > >
> > > > > > > [628867.607417] [ cut here ]
> > > > > > > [628867.608603] WARNING: CPU: 2 PID: 16925 at mm/workingset.c:461 
> > > > > > > shadow_lru_isolate+0x171/0x220
> > > > > > 
> > > > > > Well, part of the changes during the merge window were the shadow
> > > > > > entry tracking changes that came in through Andrew's tree. Adding
> > > > > > Johannes Weiner to the participants.

Okay, the below patch should address this problem. Dave Jones managed
to reproduce it with the added WARN_ONs, and they made it obvious. He
cannot trigger it anymore with this fix applied. Thanks Dave!

Linus? Andrew?

---

>From 503eeb20e68bdf3529bdc14aca1ce564880129f2 Mon Sep 17 00:00:00 2001
From: Johannes Weiner <han...@cmpxchg.org>
Date: Fri, 6 Jan 2017 19:21:43 -0500
Subject: [PATCH] mm: workingset: fix use-after-free in shadow node shrinker

Several people report seeing warnings about inconsistent radix tree
nodes followed by crashes in the workingset code, which all looked
like use-after-free access from the shadow node shrinker. Dave Jones
managed to reproduce the issue with a debug patch applied, which
confirmed that the radix tree shrinking indeed frees shadow nodes
while they are still linked to the shadow LRU:

  WARNING: CPU: 2 PID: 53 at lib/radix-tree.c:643 delete_node+0x1e4/0x200
  CPU: 2 PID: 53 Comm: kswapd0 Not tainted 4.10.0-rc2-think+ #3
  Call Trace:
   dump_stack+0x4f/0x73
   __warn+0xcb/0xf0
   warn_slowpath_null+0x1d/0x20
   delete_node+0x1e4/0x200
   __radix_tree_delete_node+0xd/0x10
   shadow_lru_isolate+0xe6/0x220
   __list_lru_walk_one.isra.4+0x9b/0x190
   ? memcg_drain_all_list_lrus+0x1d0/0x1d0
   list_lru_walk_one+0x23/0x30
   scan_shadow_nodes+0x2e/0x40
   shrink_slab.part.44+0x23d/0x5d0
   ? 0xa023a077
   shrink_node+0x22c/0x330
   kswapd+0x392/0x8f0

This is the WARN_ON_ONCE(!list_empty(>private_list)) placed in
the inlined radix_tree_shrink().

The problem is with 14b468791fa9 ("mm: workingset: move shadow entry
tracking to radix tree exceptional tracking"), which passes an update
callback into the radix tree to link and unlink shadow leaf nodes when
tree entries change, but forgot to pass the callback when reclaiming a
shadow node. While the reclaimed shadow node itself is unlinked by the
shrinker, its deletion from the tree can cause the left-most leaf node
in the tree to be shrunk. If that happens to be a shadow node as well,
we don't unlink it from the LRU as we should.

Consider this tree, where the s are shadow entries:

 root->rnode
  |
 [0   n]
  |   |
   [s] [s]

Now the shadow node shrinker reclaims the rightmost leaf node through
the shadow node LRU:

 root->rnode
  |
 [0]
  |
  [s ]

Because the parent of the deleted node is the first level below the
root and has only one child in the left-most slot, the intermediate
level is shrunk and the node containing the single shadow is put in
its place:

 root->rnode
  |
 [s]

The shrinker again sees a single left-most slot in a first level node
and thus decides to store the shadow in root->rnode directly and free
the node - which is a leaf node on the shadow node LRU.

root->rnode
 |
 s

Without the update callback, the freed node remains on the shadow LRU,
where it causes later shrinker runs to crash.

Pass the node updater callback into __radix_tree_delete_node() in case
the deletion causes the left-most branch in the tree to collapse too.

Also add warnings when linked nodes are freed right away, rather than
wait for the use-after-free when the list is scanned much later.

Fixes: 14b468791fa9 ("mm: workingset: move shadow entry tracking to radix tree 
exceptional tracking")
Reported-by: Dave Chinner <da...@fromorbit.com>
Reported-by: Hugh Dickins <hu...@google.com>
Reported-by: Andrea Arcangeli <aarca...@redhat.com>
Reported-by: Dave Jones <da...@codemonkey.org.uk>
Tested-by: Dave Jones <da...@codemonkey.org.uk>
Signed-off-by: Johannes Weiner <han..

Re: [4.10, panic, regression] iscsi: null pointer deref at iscsi_tcp_segment_done+0x20d/0x2e0

2017-01-02 Thread Johannes Weiner
On Fri, Dec 23, 2016 at 03:33:29AM -0500, Johannes Weiner wrote:
> On Fri, Dec 23, 2016 at 02:32:41AM -0500, Johannes Weiner wrote:
> > On Thu, Dec 22, 2016 at 12:22:27PM -0800, Hugh Dickins wrote:
> > > On Wed, 21 Dec 2016, Linus Torvalds wrote:
> > > > On Wed, Dec 21, 2016 at 9:13 PM, Dave Chinner <da...@fromorbit.com> 
> > > > wrote:
> > > > > I unmounted the fs, mkfs'd it again, ran the
> > > > > workload again and about a minute in this fired:
> > > > >
> > > > > [628867.607417] [ cut here ]
> > > > > [628867.608603] WARNING: CPU: 2 PID: 16925 at mm/workingset.c:461 
> > > > > shadow_lru_isolate+0x171/0x220
> > > > 
> > > > Well, part of the changes during the merge window were the shadow
> > > > entry tracking changes that came in through Andrew's tree. Adding
> > > > Johannes Weiner to the participants.
> > > > 
> > > > > Now, this workload does not touch the page cache at all - it's
> > > > > entirely an XFS metadata workload, so it should not really be
> > > > > affecting the working set code.
> > > > 
> > > > Well, I suspect that anything that creates memory pressure will end up
> > > > triggering the working set code, so ..
> > > > 
> > > > That said, obviously memory corruption could be involved and result in
> > > > random issues too, but I wouldn't really expect that in this code.
> > > > 
> > > > It would probably be really useful to get more data points - is the
> > > > problem reliably in this area, or is it going to be random and all
> > > > over the place.
> > > 
> > > Data point: kswapd got WARNING on mm/workingset.c:457 in 
> > > shadow_lru_isolate,
> > > soon followed by NULL pointer deref in list_lru_isolate, one time when
> > > I tried out Sunday's git tree.  Not seen since, I haven't had time to
> > > investigate, just set it aside as something to worry about if it happens
> > > again.  But it looks like shadow_lru_isolate() has issues beyond Dave's
> > > case (I've no XFS and no iscsi), suspect unrelated to his other problems.
> > 
> > This seems consistent with what Dave observed: we encounter regular
> > pages in radix tree nodes on the shadow LRU that should only contain
> > nodes full of exceptional shadow entries. It could be an issue in the
> > new slot replacement code and the node tracking callback.
> 
> Both encounters seem to indicate use-after-free. Dave's node didn't
> warn about an unexpected node->count / node->exceptional state, but
> had entries that were inconsistent with that. Hugh got the counter
> warning but crashed on a list_head that's not NULLed in a live node.
> 
> workingset_update_node() should be called on page cache radix tree
> leaf nodes that go empty. I must be missing an update_node callback
> where a leaf node gets freed somewhere.

Sorry for dropping silent on this. I'm traveling over the holidays
with sporadic access to my emails and no access to real equipment.

The times I managed to sneak away to look at the code didn't turn up
anything useful yet.

Andrea encountered the warning as well and I gave him a debugging
patch (attached below), but he hasn't been able to reproduce this
condition. I've personally never seen the warning trigger, even though
the patches have been running on my main development machine for quite
a while now. Albeit against an older base; I've updated to Linus's
master branch now in case it's an interaction with other new code.

If anybody manages to reproduce this, that would be helpful. Any extra
eyes on this would be much appreciated too until I'm back at my desk.

Thanks

diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index 6f382e07de77..0783af1c0ebb 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -640,6 +640,8 @@ static inline void radix_tree_shrink(struct radix_tree_root 
*root,
update_node(node, private);
}
 
+   WARN_ON_ONCE(!list_empty(>private_list));
+
radix_tree_node_free(node);
}
 }
@@ -666,6 +668,8 @@ static void delete_node(struct radix_tree_root *root,
root->rnode = NULL;
}
 
+   WARN_ON_ONCE(!list_empty(>private_list));
+
radix_tree_node_free(node);
 
node = parent;
@@ -767,6 +771,7 @@ static void radix_tree_free_nodes(struct radix_tree_node 
*node)
struct radix_tree_node *old = child;
offset = child->offset + 1;
child = child->parent;
+   WARN_ON_ONCE(!list_empty(>private_list));
radix_tree_node_free(old);
if (old == entry_to_node(node))
return;

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [4.10, panic, regression] iscsi: null pointer deref at iscsi_tcp_segment_done+0x20d/0x2e0

2016-12-23 Thread Johannes Weiner
On Fri, Dec 23, 2016 at 02:32:41AM -0500, Johannes Weiner wrote:
> On Thu, Dec 22, 2016 at 12:22:27PM -0800, Hugh Dickins wrote:
> > On Wed, 21 Dec 2016, Linus Torvalds wrote:
> > > On Wed, Dec 21, 2016 at 9:13 PM, Dave Chinner <da...@fromorbit.com> wrote:
> > > > I unmounted the fs, mkfs'd it again, ran the
> > > > workload again and about a minute in this fired:
> > > >
> > > > [628867.607417] [ cut here ]
> > > > [628867.608603] WARNING: CPU: 2 PID: 16925 at mm/workingset.c:461 
> > > > shadow_lru_isolate+0x171/0x220
> > > 
> > > Well, part of the changes during the merge window were the shadow
> > > entry tracking changes that came in through Andrew's tree. Adding
> > > Johannes Weiner to the participants.
> > > 
> > > > Now, this workload does not touch the page cache at all - it's
> > > > entirely an XFS metadata workload, so it should not really be
> > > > affecting the working set code.
> > > 
> > > Well, I suspect that anything that creates memory pressure will end up
> > > triggering the working set code, so ..
> > > 
> > > That said, obviously memory corruption could be involved and result in
> > > random issues too, but I wouldn't really expect that in this code.
> > > 
> > > It would probably be really useful to get more data points - is the
> > > problem reliably in this area, or is it going to be random and all
> > > over the place.
> > 
> > Data point: kswapd got WARNING on mm/workingset.c:457 in shadow_lru_isolate,
> > soon followed by NULL pointer deref in list_lru_isolate, one time when
> > I tried out Sunday's git tree.  Not seen since, I haven't had time to
> > investigate, just set it aside as something to worry about if it happens
> > again.  But it looks like shadow_lru_isolate() has issues beyond Dave's
> > case (I've no XFS and no iscsi), suspect unrelated to his other problems.
> 
> This seems consistent with what Dave observed: we encounter regular
> pages in radix tree nodes on the shadow LRU that should only contain
> nodes full of exceptional shadow entries. It could be an issue in the
> new slot replacement code and the node tracking callback.

Both encounters seem to indicate use-after-free. Dave's node didn't
warn about an unexpected node->count / node->exceptional state, but
had entries that were inconsistent with that. Hugh got the counter
warning but crashed on a list_head that's not NULLed in a live node.

workingset_update_node() should be called on page cache radix tree
leaf nodes that go empty. I must be missing an update_node callback
where a leaf node gets freed somewhere.
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [4.10, panic, regression] iscsi: null pointer deref at iscsi_tcp_segment_done+0x20d/0x2e0

2016-12-22 Thread Johannes Weiner
On Thu, Dec 22, 2016 at 12:22:27PM -0800, Hugh Dickins wrote:
> On Wed, 21 Dec 2016, Linus Torvalds wrote:
> > On Wed, Dec 21, 2016 at 9:13 PM, Dave Chinner <da...@fromorbit.com> wrote:
> > > I unmounted the fs, mkfs'd it again, ran the
> > > workload again and about a minute in this fired:
> > >
> > > [628867.607417] [ cut here ]
> > > [628867.608603] WARNING: CPU: 2 PID: 16925 at mm/workingset.c:461 
> > > shadow_lru_isolate+0x171/0x220
> > 
> > Well, part of the changes during the merge window were the shadow
> > entry tracking changes that came in through Andrew's tree. Adding
> > Johannes Weiner to the participants.
> > 
> > > Now, this workload does not touch the page cache at all - it's
> > > entirely an XFS metadata workload, so it should not really be
> > > affecting the working set code.
> > 
> > Well, I suspect that anything that creates memory pressure will end up
> > triggering the working set code, so ..
> > 
> > That said, obviously memory corruption could be involved and result in
> > random issues too, but I wouldn't really expect that in this code.
> > 
> > It would probably be really useful to get more data points - is the
> > problem reliably in this area, or is it going to be random and all
> > over the place.
> 
> Data point: kswapd got WARNING on mm/workingset.c:457 in shadow_lru_isolate,
> soon followed by NULL pointer deref in list_lru_isolate, one time when
> I tried out Sunday's git tree.  Not seen since, I haven't had time to
> investigate, just set it aside as something to worry about if it happens
> again.  But it looks like shadow_lru_isolate() has issues beyond Dave's
> case (I've no XFS and no iscsi), suspect unrelated to his other problems.

This seems consistent with what Dave observed: we encounter regular
pages in radix tree nodes on the shadow LRU that should only contain
nodes full of exceptional shadow entries. It could be an issue in the
new slot replacement code and the node tracking callback.

Looking...
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html