Re: [PATCH 03/29] mm: slb: add knowledge of reserve pages

2007-12-15 Thread Daniel Phillips
On Friday 14 December 2007 14:51, I wrote: On Friday 14 December 2007 07:39, Peter Zijlstra wrote: Note that false sharing of slab pages is still possible between two unrelated writeout processes, both of which obey rules for their own writeout path, but the pinned combination does not. This

Re: [PATCH 16/29] netvm: INET reserves.

2007-12-14 Thread Daniel Phillips
Hi Peter, sysctl_intvec_fragment, proc_dointvec_fragment, sysctl_intvec_fragment seem to suffer from cut-n-pastitis. Regards, Daniel -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at

Re: [PATCH 00/29] Swap over NFS -v15

2007-12-14 Thread Daniel Phillips
Hi Peter, A major feature of this patch set is the network receive deadlock avoidance, but there is quite a bit of stuff bundled with it, the NFS user accounting for a big part of the patch by itself. Is it possible to provide a before and after demonstration case for just the network receive

Re: [PATCH 04/29] mm: kmem_estimate_pages()

2007-12-14 Thread Daniel Phillips
On Friday 14 December 2007 07:39, Peter Zijlstra wrote: Provide a method to get the upper bound on the pages needed to allocate a given number of objects from a given kmem_cache. This lays the foundation for a generic reserve framework as presented in a later patch in this series. This

Re: [PATCH 03/29] mm: slb: add knowledge of reserve pages

2007-12-14 Thread Daniel Phillips
On Friday 14 December 2007 07:39, Peter Zijlstra wrote: Restrict objects from reserve slabs (ALLOC_NO_WATERMARKS) to allocation contexts that are entitled to it. This is done to ensure reserve pages don't leak out and get consumed. Tighter definitions of leak out and get consumed would be

Re: [1/1] Block device throttling [Re: Distributed storage.]

2007-09-01 Thread Daniel Phillips
On Friday 31 August 2007 14:41, Alasdair G Kergon wrote: On Thu, Aug 30, 2007 at 04:20:35PM -0700, Daniel Phillips wrote: Resubmitting a bio or submitting a dependent bio from inside a block driver does not need to be throttled because all resources required to guarantee completion must

Re: [1/1] Block device throttling [Re: Distributed storage.]

2007-08-30 Thread Daniel Phillips
On Wednesday 29 August 2007 01:53, Evgeniy Polyakov wrote: Then, if of course you will want, which I doubt, you can reread previous mails and find that it was pointed to that race and possibilities to solve it way too long ago. What still bothers me about your response is that, while you know

Re: [1/1] Block device throttling [Re: Distributed storage.]

2007-08-28 Thread Daniel Phillips
On Tuesday 28 August 2007 02:35, Evgeniy Polyakov wrote: On Mon, Aug 27, 2007 at 02:57:37PM -0700, Daniel Phillips ([EMAIL PROTECTED]) wrote: Say Evgeniy, something I was curious about but forgot to ask you earlier... On Wednesday 08 August 2007 03:17, Evgeniy Polyakov wrote: ...All

Re: [1/1] Block device throttling [Re: Distributed storage.]

2007-08-28 Thread Daniel Phillips
On Tuesday 28 August 2007 10:54, Evgeniy Polyakov wrote: On Tue, Aug 28, 2007 at 10:27:59AM -0700, Daniel Phillips ([EMAIL PROTECTED]) wrote: We do not care about one cpu being able to increase its counter higher than the limit, such inaccuracy (maximum bios in flight thus can be more

Re: [1/1] Block device throttling [Re: Distributed storage.]

2007-08-27 Thread Daniel Phillips
Say Evgeniy, something I was curious about but forgot to ask you earlier... On Wednesday 08 August 2007 03:17, Evgeniy Polyakov wrote: ...All operations are not atomic, since we do not care about the precise number of bios, but the fact that we are close or close enough to the limit. ... in

Re: Block device throttling [Re: Distributed storage.]

2007-08-14 Thread Daniel Phillips
On Tuesday 14 August 2007 01:46, Evgeniy Polyakov wrote: On Mon, Aug 13, 2007 at 06:04:06AM -0700, Daniel Phillips ([EMAIL PROTECTED]) wrote: Perhaps you never worried about the resources that the device mapper mapping function allocates to handle each bio and so did not consider this hole

Re: Block device throttling [Re: Distributed storage.]

2007-08-14 Thread Daniel Phillips
On Tuesday 14 August 2007 04:30, Evgeniy Polyakov wrote: And it will not solve the deadlock problem in general. (Maybe it works for your virtual device, but I wonder...) If the virtual device allocates memory during generic_make_request then the memory needs to be throttled. Daniel, if

Re: Block device throttling [Re: Distributed storage.]

2007-08-14 Thread Daniel Phillips
On Tuesday 14 August 2007 04:50, Evgeniy Polyakov wrote: On Tue, Aug 14, 2007 at 04:35:43AM -0700, Daniel Phillips ([EMAIL PROTECTED]) wrote: On Tuesday 14 August 2007 04:30, Evgeniy Polyakov wrote: And it will not solve the deadlock problem in general. (Maybe it works for your

Re: Block device throttling [Re: Distributed storage.]

2007-08-14 Thread Daniel Phillips
On Tuesday 14 August 2007 05:46, Evgeniy Polyakov wrote: The throttling of the virtual device must begin in generic_make_request and last to ->endio. You release the throttle of the virtual device at the point you remap the bio to an underlying device, which you have convinced yourself is

Re: Block device throttling [Re: Distributed storage.]

2007-08-13 Thread Daniel Phillips
On Sunday 12 August 2007 22:36, I wrote: Note! There are two more issues I forgot to mention earlier. Oops, and there is also: 3) The bio throttle, which is supposed to prevent deadlock, can itself deadlock. Let me see if I can remember how it goes. * generic_make_request puts a bio in

Re: Distributed storage.

2007-08-13 Thread Daniel Phillips
On Monday 13 August 2007 00:28, Jens Axboe wrote: On Sun, Aug 12 2007, Daniel Phillips wrote: Right, that is done by bi_vcnt. I meant bi_max_vecs, which you can derive efficiently from BIO_POOL_IDX() provided the bio was allocated in the standard way. That would only be feasible, if we

Re: Distributed storage.

2007-08-13 Thread Daniel Phillips
On Monday 13 August 2007 00:45, Jens Axboe wrote: On Mon, Aug 13 2007, Jens Axboe wrote: You did not comment on the one about putting the bio destructor in the ->endio handler, which looks dead simple. The majority of cases just use the default endio handler and the default

Re: Distributed storage.

2007-08-13 Thread Daniel Phillips
On Monday 13 August 2007 02:13, Jens Axboe wrote: On Mon, Aug 13 2007, Daniel Phillips wrote: On Monday 13 August 2007 00:45, Jens Axboe wrote: On Mon, Aug 13 2007, Jens Axboe wrote: You did not comment on the one about putting the bio destructor in the -endio handler, which looks

Re: Distributed storage.

2007-08-13 Thread Daniel Phillips
On Monday 13 August 2007 03:06, Jens Axboe wrote: On Mon, Aug 13 2007, Daniel Phillips wrote: Of course not. Nothing I said stops endio from being called in the usual way as well. For this to work, endio just needs to know that one call means end and the other means destroy

Re: Distributed storage.

2007-08-13 Thread Daniel Phillips
On Monday 13 August 2007 02:18, Evgeniy Polyakov wrote: On Mon, Aug 13, 2007 at 02:08:57AM -0700, Daniel Phillips ([EMAIL PROTECTED]) wrote: But that idea fails as well, since reference counts and IO completion are two completely separate entities. So unless end IO just happens

Re: Distributed storage.

2007-08-13 Thread Daniel Phillips
On Monday 13 August 2007 03:22, Jens Axboe wrote: I never compared the bio to struct page, I'd obviously agree that shrinking struct page was a worthy goal and that it'd be ok to uglify some code to do that. The same isn't true for struct bio. I thought I just said that. Regards, Daniel - To

Re: Block device throttling [Re: Distributed storage.]

2007-08-13 Thread Daniel Phillips
On Monday 13 August 2007 01:14, Evgeniy Polyakov wrote: Oops, and there is also: 3) The bio throttle, which is supposed to prevent deadlock, can itself deadlock. Let me see if I can remember how it goes. * generic_make_request puts a bio in flight * the bio gets past the

Re: Block device throttling [Re: Distributed storage.]

2007-08-13 Thread Daniel Phillips
On Monday 13 August 2007 01:23, Evgeniy Polyakov wrote: On Sun, Aug 12, 2007 at 10:36:23PM -0700, Daniel Phillips ([EMAIL PROTECTED]) wrote: (previous incomplete message sent accidentally) On Wednesday 08 August 2007 02:54, Evgeniy Polyakov wrote: On Tue, Aug 07, 2007 at 10:55:38PM

Re: Distributed storage.

2007-08-13 Thread Daniel Phillips
On Monday 13 August 2007 04:03, Evgeniy Polyakov wrote: On Mon, Aug 13, 2007 at 03:12:33AM -0700, Daniel Phillips ([EMAIL PROTECTED]) wrote: This is not a very good solution, since it requires all users of the bios to know how to free it. No, only the specific ->endio needs to know

Re: Block device throttling [Re: Distributed storage.]

2007-08-13 Thread Daniel Phillips
On Monday 13 August 2007 05:04, Evgeniy Polyakov wrote: On Mon, Aug 13, 2007 at 04:04:26AM -0700, Daniel Phillips ([EMAIL PROTECTED]) wrote: On Monday 13 August 2007 01:14, Evgeniy Polyakov wrote: Oops, and there is also: 3) The bio throttle, which is supposed to prevent deadlock

Re: Block device throttling [Re: Distributed storage.]

2007-08-13 Thread Daniel Phillips
On Monday 13 August 2007 05:18, Evgeniy Polyakov wrote: Say you have a device mapper device with some physical device sitting underneath, the classic use case for this throttle code. Say 8,000 threads each submit an IO in parallel. The device mapper mapping function will be called 8,000

Re: Distributed storage.

2007-08-13 Thread Daniel Phillips
On Monday 13 August 2007 02:12, Jens Axboe wrote: It is a system wide problem. Every block device needs throttling, otherwise queues expand without limit. Currently, block devices that use the standard request library get a slipshod form of throttling for free in the form of limiting

Re: Distributed storage.

2007-08-12 Thread Daniel Phillips
On Tuesday 07 August 2007 13:55, Jens Axboe wrote: I don't like structure bloat, but I do like nice design. Overloading is a necessary evil sometimes, though. Even today, there isn't enough room to hold bi_rw and bi_flags in the same variable on 32-bit archs, so that concern can be scratched.

Re: Block device throttling [Re: Distributed storage.]

2007-08-12 Thread Daniel Phillips
On Wednesday 08 August 2007 02:54, Evgeniy Polyakov wrote: On Tue, Aug 07, 2007 at 10:55:38PM +0200, Jens Axboe ([EMAIL PROTECTED]) wrote: So, what did we decide? To bloat bio a bit (add a queue pointer) or to use physical device limits? The latter requires replacing all occurrences of

Re: Block device throttling [Re: Distributed storage.]

2007-08-12 Thread Daniel Phillips
(previous incomplete message sent accidentally) On Wednesday 08 August 2007 02:54, Evgeniy Polyakov wrote: On Tue, Aug 07, 2007 at 10:55:38PM +0200, Jens Axboe wrote: So, what did we decide? To bloat bio a bit (add a queue pointer) or to use physical device limits? The latter requires to

Re: Distributed storage.

2007-08-07 Thread Daniel Phillips
On Tuesday 07 August 2007 05:05, Jens Axboe wrote: On Sun, Aug 05 2007, Daniel Phillips wrote: A simple way to solve the stable accounting field issue is to add a new pointer to struct bio that is owned by the top level submitter (normally generic_make_request but not always

Re: Distributed storage.

2007-08-05 Thread Daniel Phillips
On Saturday 04 August 2007 09:37, Evgeniy Polyakov wrote: On Fri, Aug 03, 2007 at 06:19:16PM -0700, I wrote: To be sure, I am not very proud of this throttling mechanism for various reasons, but the thing is, _any_ throttling mechanism no matter how sucky solves the deadlock problem. Over

Re: Distributed storage.

2007-08-05 Thread Daniel Phillips
On Saturday 04 August 2007 09:44, Evgeniy Polyakov wrote: On Tuesday 31 July 2007 10:13, Evgeniy Polyakov wrote: * storage can be formed on top of remote nodes and be exported simultaneously (iSCSI is peer-to-peer only, NBD requires device mapper and is synchronous) In fact, NBD

Re: Distributed storage.

2007-08-05 Thread Daniel Phillips
On Sunday 05 August 2007 08:08, Evgeniy Polyakov wrote: If we are sleeping in memory pool, then we already do not have memory to complete previous requests, so we are in trouble. Not at all. Any requests in flight are guaranteed to get the resources they need to complete. This is guaranteed

Re: Distributed storage.

2007-08-05 Thread Daniel Phillips
On Sunday 05 August 2007 08:01, Evgeniy Polyakov wrote: On Sun, Aug 05, 2007 at 01:06:58AM -0700, Daniel Phillips wrote: DST original code worked as device mapper plugin too, but its two additional allocations (io and clone) per block request ended up for me as a show stopper. Ah

Re: Distributed storage.

2007-08-03 Thread Daniel Phillips
On Friday 03 August 2007 06:49, Evgeniy Polyakov wrote: ...rx has global reserve (always allocated on startup or sometime way before reclaim/oom)where data is originally received (including skb, shared info and whatever is needed, page is just an example), then it is copied into per-socket

Re: Distributed storage.

2007-08-03 Thread Daniel Phillips
On Friday 03 August 2007 07:53, Peter Zijlstra wrote: On Fri, 2007-08-03 at 17:49 +0400, Evgeniy Polyakov wrote: On Fri, Aug 03, 2007 at 02:27:52PM +0200, Peter Zijlstra wrote: ...my main position is to allocate per socket reserve from socket's queue, and copy data there from main

Re: Distributed storage.

2007-08-03 Thread Daniel Phillips
Hi Evgeniy, Nit alert: On Tuesday 31 July 2007 10:13, Evgeniy Polyakov wrote: * storage can be formed on top of remote nodes and be exported simultaneously (iSCSI is peer-to-peer only, NBD requires device mapper and is synchronous) In fact, NBD has nothing to do with device

Re: Distributed storage.

2007-08-03 Thread Daniel Phillips
Hi Mike, On Thursday 02 August 2007 21:09, Mike Snitzer wrote: But NBD's synchronous nature is actually an asset when coupled with MD raid1 as it provides guarantees that the data has _really_ been mirrored remotely. And bio completion doesn't? Regards, Daniel - To unsubscribe from this

Re: Distributed storage.

2007-08-03 Thread Daniel Phillips
On Friday 03 August 2007 03:26, Evgeniy Polyakov wrote: On Thu, Aug 02, 2007 at 02:08:24PM -0700, I wrote: I see bits that worry me, e.g.: + req = mempool_alloc(st->w->req_pool, GFP_NOIO); which seems to be callable in response to a local request, just the case where NBD

Re: Distributed storage.

2007-08-02 Thread Daniel Phillips
On Tuesday 31 July 2007 10:13, Evgeniy Polyakov wrote: Hi. I'm pleased to announce the first release of the distributed storage subsystem, which allows one to form a storage on top of remote and local nodes, which in turn can be exported to another storage as a node to form tree-like storages.

Re: [RFC][PATCH 2/9] deadlock prevention core

2006-08-18 Thread Daniel Phillips
Andrew Morton wrote: Daniel Phillips wrote: Andrew Morton wrote: ...it's runtime configurable. So we default to less than the best because we are too lazy to fix the network starvation issue properly? Maybe we don't really need a mempool for struct bio either, isn't that one rather like

Re: [RFC][PATCH 2/9] deadlock prevention core

2006-08-18 Thread Daniel Phillips
Andrew Morton wrote: ...in my earlier emails I asked a number of questions regarding whether existing facilities, queued patches or further slight kernel changes could provide a sufficient solution to these problems. The answer may well be no. But diligence requires that we be able to prove

Re: Network receive stall avoidance (was [PATCH 2/9] deadlock prevention core)

2006-08-18 Thread Daniel Phillips
Andrew Morton wrote: handwaving - The mmap(MAP_SHARED)-the-whole-world scenario should be fixed by mm-tracking-shared-dirty-pages.patch. Please test it and if you are still able to demonstrate deadlocks, describe how, and why they are occurring. OK, but please see atomic 0 order

Re: [RFC][PATCH 0/9] Network receive deadlock prevention for NBD

2006-08-17 Thread Daniel Phillips
Evgeniy Polyakov wrote: Just for clarification - it will be completely impossible to login using openssh or some other privilege separation protocol to the machine due to the nature of unix sockets. So you will be unable to manage your storage system just because it is in OOM - it is not what

Re: [RFC][PATCH 0/9] Network receive deadlock prevention for NBD

2006-08-17 Thread Daniel Phillips
Evgeniy Polyakov wrote: On Thu, Aug 17, 2006 at 09:15:14PM +0200, Peter Zijlstra ([EMAIL PROTECTED]) wrote: I got openssh as example of situation when system does not know in advance, what sockets must be marked as critical. OpenSSH works with network and unix sockets in parallel, so you

Re: [RFC][PATCH 2/9] deadlock prevention core

2006-08-17 Thread Daniel Phillips
Andrew Morton wrote: Daniel Phillips [EMAIL PROTECTED] wrote: What happened to the case where we just fill memory full of dirty file pages backed by a remote disk? Processes which are dirtying those pages throttle at /proc/sys/vm/dirty_ratio% of memory dirty. So it is not possible to fill

Re: [RFC][PATCH 2/9] deadlock prevention core

2006-08-17 Thread Daniel Phillips
Daniel Phillips wrote: Andrew Morton wrote: Processes which are dirtying those pages throttle at /proc/sys/vm/dirty_ratio% of memory dirty. So it is not possible to fill memory with dirty pages. If the amount of physical memory which is dirty exceeds 40%: bug. So we make 400 MB of a 1 GB

Re: [RFC][PATCH 2/9] deadlock prevention core

2006-08-16 Thread Daniel Phillips
Andrew Morton wrote: Peter Zijlstra [EMAIL PROTECTED] wrote: Testcase: Mount an NBD device as sole swap device and mmap physical RAM, then loop through touching pages only once. Fix: don't try to swap over the network. Yes, there may be some scenarios where people have no local storage,

Re: [RFC][PATCH 2/9] deadlock prevention core

2006-08-16 Thread Daniel Phillips
Andrew Morton wrote: What is a socket wait queue and how/why can it consume so much memory? Two things: 1) sk_buffs in flight between device receive interrupt and layer 3 protocol/socket identification. 2) sk_buffs queued onto a particular socket waiting for some task to come

Re: [RFC][PATCH 0/9] Network receive deadlock prevention for NBD

2006-08-16 Thread Daniel Phillips
Evgeniy Polyakov wrote: On Sun, Aug 13, 2006 at 01:16:15PM -0700, Daniel Phillips ([EMAIL PROTECTED]) wrote: Indeed. The rest of the corner cases like netfilter, layered protocol and so on need to be handled, however they do not need to be handled right now in order to make remote storage

Re: [RFC][PATCH 0/9] Network receive deadlock prevention for NBD

2006-08-16 Thread Daniel Phillips
Evgeniy Polyakov wrote: On Mon, Aug 14, 2006 at 08:45:43AM +0200, Peter Zijlstra ([EMAIL PROTECTED]) wrote: Just pure openssh for control connection (admin should be able to login). These periods of degenerated functionality should be short and infrequent albeit critical for machine

Re: [RFC][PATCH 8/9] 3c59x driver conversion

2006-08-13 Thread Daniel Phillips
David Miller wrote: I think he's saying that he doesn't think your code is yet a reasonable way to solve the problem, and therefore doesn't belong upstream. That is why it has not yet been submitted upstream. Respectfully, I do not think that jgarzik has yet put in the work to know if this

Re: [RFC][PATCH 0/9] Network receive deadlock prevention for NBD

2006-08-13 Thread Daniel Phillips
Peter Zijlstra wrote: On Wed, 2006-08-09 at 16:54 -0700, David Miller wrote: People are doing I/O over IP exactly for its ubiquity and flexibility. It seems a major limitation of the design if you cancel out major components of this flexibility. We're not, that was a bit of my own

Re: [RFC][PATCH 2/9] deadlock prevention core

2006-08-13 Thread Daniel Phillips
David Miller wrote: From: Peter Zijlstra [EMAIL PROTECTED] Hmm, what does sk_buff::input_dev do? That seems to store the initial device? You can run grep on the tree just as easily as I can which is what I did to answer this question. It only takes a few seconds of your time to grep the

Re: [RFC][PATCH 2/9] deadlock prevention core

2006-08-13 Thread Daniel Phillips
Rik van Riel wrote: Thomas Graf wrote: skb->dev is not guaranteed to still point to the allocating device once the skb is freed again so reserve/unreserve isn't symmetric. You'd need skb->alloc_dev or something. There's another consequence of this property of the network stack. Every network

Re: rename *MEMALLOC flags

2006-08-13 Thread Daniel Phillips
Peter Zijlstra wrote: Jeff Garzik in his infinite wisdom spake thusly: Peter Zijlstra wrote: Index: linux-2.6/include/linux/gfp.h === --- linux-2.6.orig/include/linux/gfp.h 2006-08-12 12:56:06.0 +0200 +++

Re: [RFC][PATCH 0/4] VM deadlock prevention -v4

2006-08-13 Thread Daniel Phillips
Peter Zijlstra wrote: On Sat, 2006-08-12 at 20:16 +0200, Indan Zupancic wrote: What was missing or wrong in the old approach? Can't you use the new approach, but use alloc_pages() instead of SROG? Sorry if I bug you so, but I'm also trying to increase my knowledge here. ;-) I'm almost sorry

Re: [RFC][PATCH 0/9] Network receive deadlock prevention for NBD

2006-08-13 Thread Daniel Phillips
Evgeniy Polyakov wrote: One must receive a packet to determine if that packet must be dropped until tricky hardware with header split capabilities or MMIO copying is used. Peter uses special pool to get data from when system is in OOM (at least in his latest patchset), so allocations are

Re: [RFC][PATCH 2/9] deadlock prevention core

2006-08-13 Thread Daniel Phillips
David Miller wrote: From: Daniel Phillips [EMAIL PROTECTED] David Miller wrote: The reason is that there is no refcounting performed on these devices when they are attached to the skb, for performance reasons, and thus the device can be downed, the module for it removed, etc. long before

Re: [RFC][PATCH 2/9] deadlock prevention core

2006-08-13 Thread Daniel Phillips
David Miller wrote: I think there is more profitability from a solution that really does something about network memory, and doesn't try to say these devices are special or these sockets are special. Special cases generally suck. We already limit and control TCP socket memory globally in the

Re: [RFC][PATCH 2/9] deadlock prevention core

2006-08-08 Thread Daniel Phillips
Indan Zupancic wrote: Hello, Saw the patch on lkml, and wondered about some things. On Tue, August 8, 2006 21:33, Peter Zijlstra said: +static inline void dev_unreserve_skb(struct net_device *dev) +{ + if (atomic_dec_return(&dev->rx_reserve_used) < 0) +

Re: [RFC][PATCH 2/9] deadlock prevention core

2006-08-08 Thread Daniel Phillips
Stephen Hemminger wrote: How much of this is just building special case support for large allocations for jumbo frames? Wouldn't it make more sense to just fix those drivers to do scatter and add the support hooks for that? Short answer: none of it is. If it happens to handle jumbo frames

Re: [RFC][PATCH 2/9] deadlock prevention core

2006-08-08 Thread Daniel Phillips
Hi Dave, David Miller wrote: I think the new atomic operation that will seemingly occur on every device SKB free is unacceptable. Alternate suggestion? You also cannot modify netdev-flags in the lockless manner in which you do, it must be done with the appropriate locking, such as holding

Re: [RFC][PATCH 2/9] deadlock prevention core

2006-08-08 Thread Daniel Phillips
Thomas Graf wrote: skb->dev is not guaranteed to still point to the allocating device once the skb is freed again so reserve/unreserve isn't symmetric. You'd need skb->alloc_dev or something. Can you please characterize the conditions under which skb->dev changes after the alloc? Are there

Re: [RFC][PATCH 2/9] deadlock prevention core

2006-08-08 Thread Daniel Phillips
David Miller wrote: From: Daniel Phillips [EMAIL PROTECTED] David Miller wrote: I think the new atomic operation that will seemingly occur on every device SKB free is unacceptable. Alternate suggestion? Sorry, I have none. But you're unlikely to get your changes considered seriously

Re: [RFC][PATCH 2/9] deadlock prevention core

2006-08-08 Thread Daniel Phillips
David Miller wrote: From: Daniel Phillips [EMAIL PROTECTED] Can you please characterize the conditions under which skb->dev changes after the alloc? Are there writings on this subtlety? The packet scheduler and classifier can redirect packets to different devices, and can the netfilter

Re: [RFC][PATCH 8/9] 3c59x driver conversion

2006-08-08 Thread Daniel Phillips
Jeff Garzik wrote: Peter Zijlstra wrote: Update the driver to make use of the netdev_alloc_skb() API and the NETIF_F_MEMALLOC feature. NETIF_F_MEMALLOC does not exist in the upstream tree... nor should it, IMO. Elaborate please. Do you think that all drivers should be updated to fix the

Re: [RFC][PATCH 0/9] Network receive deadlock prevention for NBD

2006-08-08 Thread Daniel Phillips
Evgeniy Polyakov wrote: On Tue, Aug 08, 2006 at 09:33:25PM +0200, Peter Zijlstra ([EMAIL PROTECTED]) wrote: http://lwn.net/Articles/144273/ Kernel Summit 2005: Convergence of network and storage paths We believe that an approach very much like today's patch set is necessary for NBD,

Re: [RFC/PATCH 1/2] in-kernel sockets API

2006-06-14 Thread Daniel Phillips
Hi Harald, You wrote: On Tue, Jun 13, 2006 at 02:12:41PM -0700, I wrote: This has the makings of a nice stable internal kernel api. Why do we want to provide this nice stable internal api to proprietary modules? because there is IMHO legally nothing we can do about it anyway. Speaking as

Re: [RFC/PATCH 1/2] in-kernel sockets API

2006-06-13 Thread Daniel Phillips
Brian F. G. Bidulock wrote: Stephen, On Tue, 13 Jun 2006, Stephen Hemminger wrote: @@ -2176,3 +2279,13 @@ EXPORT_SYMBOL(sock_wake_async); EXPORT_SYMBOL(sockfd_lookup); EXPORT_SYMBOL(kernel_sendmsg); EXPORT_SYMBOL(kernel_recvmsg); +EXPORT_SYMBOL(kernel_bind); +EXPORT_SYMBOL(kernel_listen);

Re: [RFC/PATCH 1/2] in-kernel sockets API

2006-06-13 Thread Daniel Phillips
Chase Venters wrote: can you name some non-GPL non-proprietary modules we should be concerned about? You probably meant non-GPL-compatible non-proprietary. If so, then by definition there are none. Regards, Daniel - To unsubscribe from this list: send the line unsubscribe netdev in the body

Re: [stable] [NET] Fix zero-size datagram reception

2005-11-08 Thread Daniel Phillips
On Tuesday 08 November 2005 10:13, Greg KH wrote: On Thu, Nov 03, 2005 at 07:55:38AM +1100, Herbert Xu wrote: The recent rewrite of skb_copy_datagram_iovec broke the reception of zero-size datagrams. This patch fixes it. Signed-off-by: Herbert Xu [EMAIL PROTECTED] Please apply it to

[TESTME][PATCH] Make skb_copy_datagram_iovec nonrecursive

2005-08-25 Thread Daniel Phillips
Hi, I noticed that skb_copy_datagram_iovec calls itself recursively to copy a fragment list. This isn't actually wrong or even inefficient, it is just somehow disturbing. Oh, and it uses an extra stack frame, and is hard to read. Once I got started straightening that out, I couldn't resist

Re: [TESTME][PATCH] Make skb_copy_datagram_iovec nonrecursive

2005-08-25 Thread Daniel Phillips
On Thursday 25 August 2005 02:44, David S. Miller wrote: Frag lists cannot be deeper than one level of nesting, and I think the recursive version is easier to understand, so I really don't see the value of your change. Losing 34 lines of a 74 line function is the value. The real problem with

[TESTME][PATCH] Make skb_copy_datagram_iovec nonrecursive (revised)

2005-08-25 Thread Daniel Phillips
The fragment list handling was wrong in the previous version, now correct I think. datagram.c | 82 +++-- 1 files changed, 26 insertions(+), 56 deletions(-) diff -up --recursive 2.6.12.3.clean/net/core/datagram.c

Re: [TESTME][PATCH] Make skb_copy_datagram_iovec nonrecursive

2005-08-25 Thread Daniel Phillips
On Thursday 25 August 2005 03:30, David S. Miller wrote: From: Daniel Phillips [EMAIL PROTECTED] As far as I can see, it is illegal for any but the first skb to have a non-null skb_shinfo(skb)-frag_list, is this correct? As currently used, yes. That's a relief. I updated the patch

[TESTME][PATCH] Make skb_copy_datagram_iovec nonrecursive (really revised)

2005-08-25 Thread Daniel Phillips
Gah, this time the revised patch is included, not just the diffstat. datagram.c | 82 +++-- 1 files changed, 26 insertions(+), 56 deletions(-) diff -up --recursive 2.6.12.3.clean/net/core/datagram.c 2.6.12.3/net/core/datagram.c ---

[RFC] Net vm deadlock fix, version 6

2005-08-11 Thread Daniel Phillips
Hi, This version corrects a couple of bugs previously noted and ties up some loose ends in the e1000 driver. Some versions of this driver support packet splitting into multiple pages, with just the protocol header in the skb itself. This is a very good thing because it avoids the high order

[RFC] Net vm deadlock fix, version 5

2005-08-08 Thread Daniel Phillips
Hi, This version introduces the idea of having a network driver adjust the global memalloc reserve when it brings an interface up or down. The interface is: int adjust_memalloc_reserve(int bytes) which is just a thin shell over the min_free_kbytes interface that already exists. The

Re: [RFC] Net vm deadlock fix, version 5

2005-08-08 Thread Daniel Phillips
Hi, A couple of goofs. First, the sysctl interface to min_free_kbytes could stomp on any in-kernel adjustments. Now there are two variables, summed in setup_per_zone_pages_min: min_free_kbytes and var_free_kbytes. The adjust_memalloc_reserve call operates on only the latter, so the user can freely

[RFC] Net vm deadlock fix (take two)

2005-08-06 Thread Daniel Phillips
Hi, This version does not do blatantly stupid things in hardware irq context, is more efficient, and... wow the patch is smaller! (That never happens.) I don't mark skbs as being allocated from reserve any more. That works, but it is slightly bogus, because it doesn't matter which skb came

Re: [PATCH] netpoll can lock up on low memory.

2005-08-06 Thread Daniel Phillips
On Saturday 06 August 2005 12:32, Steven Rostedt wrote: If you need to really get the data out, then the design should be changed. Have some return value showing the failure, check for oops_in_progress or whatever, and try again after turning interrupts back on, and getting to a point

Re: kfree_skb questions

2005-08-06 Thread Daniel Phillips
On Sunday 07 August 2005 06:26, Patrick McHardy wrote: Anyway, do we not want BUG_ON(!atomic_read(&skb->users)) at the beginning of kfree_skb, since we rely on it? Why do you care if skb->users is 0 or 1 in __kfree_skb()? Because I am a neatness freak and I like to check things that

[RFC] Net vm deadlock fix, version 4

2005-08-06 Thread Daniel Phillips
Hi, This patch fills in some missing pieces: * Support v4 udp: same as v4 tcp, when in reserve, drop packets on noncritical sockets * Support v4 icmp: when in reserve, drop icmp traffic * Add reserve skb support to e1000 driver * API for dropping packets before delivery

Re: argh... ;/

2005-08-05 Thread Daniel Phillips
On Friday 05 August 2005 13:04, Mateusz Berezecki wrote: I accidentally posted the patches as MIME attachments... it's 5:03 am here already. Sorry guys. I can resubmit if you want. I just don't want to do that now and trash your mailboxes. Does anybody still care if patches are posted as

Re: argh... ;/

2005-08-05 Thread Daniel Phillips
On Saturday 06 August 2005 03:49, Dave Jones wrote: On Fri, Aug 05, 2005 at 01:20:59PM -0400, John W. Linville wrote: On Sat, Aug 06, 2005 at 02:41:30AM +1000, Daniel Phillips wrote: On Friday 05 August 2005 13:04, Mateusz Berezecki wrote: I accidentally posted the patches as MIME

Re: Bypass softnet

2005-08-05 Thread Daniel Phillips
On Saturday 06 August 2005 02:33, David S. Miller wrote: You can't call into the networking packet input path from hardware interrupt context, it simply is not allowed. And that's the context in which netif_rx() gets called. Duh. I assumed we already were in softirq context here (but with

Re: [RFC] Net vm deadlock fix (preliminary)

2005-08-04 Thread Daniel Phillips
Hi, I spent the last day mulling things over and doing research. It seems to me that the patch as first posted is correct and solves the deadlock, except that some uses of __GFP_MEMALLOC in __dev_alloc_skb may escape into contexts where the reserve is not guaranteed to be reclaimed. It may

[RFC] Net vm deadlock fix (preliminary)

2005-08-03 Thread Daniel Phillips
Hi, Here is a preliminary patch, not tested at all, just to give everybody a target to aim bricks at. * A new __GFP_MEMALLOC flag gives access to the memalloc reserve. * In dev_alloc_skb, if GFP_ATOMIC fails then try again with __GFP_MEMALLOC. * We know an skb was allocated from reserve

Re: [RFC] Net vm deadlock fix (preliminary)

2005-08-03 Thread Daniel Phillips
On Wednesday 03 August 2005 16:59, Martin Josefsson wrote: On Wed, 3 Aug 2005, Daniel Phillips wrote: Hi, Here is a preliminary patch, not tested at all, just to give everybody a target to aim bricks at. * A new __GFP_MEMALLOC flag gives access to the memalloc reserve

Re: Network vm deadlock... solution?

2005-08-02 Thread Daniel Phillips
On Wednesday 03 August 2005 07:43, Francois Romieu wrote: Daniel Phillips [EMAIL PROTECTED] : [...] A point on memory pressure: here, we are not talking about the continuous state of running under heavy load, but rather the microscopically short periods where not a single page of memory

Re: Network vm deadlock... solution?

2005-08-02 Thread Daniel Phillips
On Wednesday 03 August 2005 08:39, Martin J. Bligh wrote: --Francois Romieu [EMAIL PROTECTED] wrote (on Tuesday, August 02, 2005 Btw I do not get what the mempool/GFP_CRITICAL idea buys: it seems redundant with the threshold (if (memory_pressure)) used in the Rx path to decide that memory