> On 10 Nov 2016, at 02:43, Glyph Lefkowitz <gl...@twistedmatrix.com> wrote:
> 
> And the problem isn't just with Tubes at the infrastructure level; most 
> applications are fundamentally not stream processing, and it will be 
> idiomatically challenging to express them as such.  HTTP connections are 
> short-lived and interfaces to comprehending them (i.e. 'json.loads') are not 
> themselves stream oriented.  Even if you did have a stream-oriented JSON 
> parser, expressing what you want from it is hard; you want a data structure 
> that you can simultaneously inspect multiple elements from, not a stream of 
> "object began" / "string began" / "string ended" / "list began" / "list 
> ended" events.

[snip]

> 
> More importantly, backpressure at scale in distributed systems often means 
> really weird stuff, like, traffic shaping on a front-end tier by coordinating 
> with a data store or back-end tier to identify problem networks or network 
> ranges.  Tubes operates at a simpler level: connections are individual 
> entities, and backpressure is applied uniformly across all of them.  Granted, 
> this is the basic layer you need in place to make addressing backpressure 
> throughout a system work properly, but it's also not an exciting product that 
> solves a super hard or complex problem.

So these two things here are the bit I’m most interested in focusing on. You’re 
totally right: backpressure in stream-oriented systems is most effectively 
managed in the form that Tubes allows. Many systems, however, are not 
stream-oriented but instead deal in quanta of work. Backpressure is still a real 
part of system design for systems like that, and it would be good to have a 
higher-level API for designing backpressure propagation for quantised work.

The biggest problem there, I think, is that there’s not just one way to do that. 
For example, a common mechanism is to provide something like a token bucket at 
the edges of your system, whereby you allow only N work items to be outstanding 
at any one time. The obvious problem with this is that there is no one true 
value for N: it depends on how intensive your work items are on the system and 
what their latency/throughput characteristics look like. That means that 
Twisted, for example, cannot simply choose a value of N for its users.
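
To make that concrete, here’s a rough sketch of the kind of edge-of-system cap I 
mean. I’m using asyncio rather than anything Twisted-specific purely for 
illustration, and the names (OutstandingWorkLimiter, handle) and the value of N 
are placeholders, not a proposed API:

    import asyncio

    class OutstandingWorkLimiter:
        """Admit at most N work items into the system at once.

        New work waits at the edge until an in-flight item completes,
        which is one crude way of propagating backpressure to callers.
        """

        def __init__(self, max_outstanding):
            self._slots = asyncio.Semaphore(max_outstanding)

        async def submit(self, work, *args):
            await self._slots.acquire()   # blocks once N items are in flight
            try:
                return await work(*args)
            finally:
                self._slots.release()     # frees a slot for the next item

    async def handle(item):
        # Stand-in for the actual per-item work.
        await asyncio.sleep(0.1)
        return item

    async def main():
        limiter = OutstandingWorkLimiter(max_outstanding=10)  # no one true value
        results = await asyncio.gather(
            *(limiter.submit(handle, i) for i in range(100)))
        print(len(results))

    asyncio.run(main())

The only interesting part of that sketch is that the cap lives at the edge; 
everything hard is in choosing max_outstanding, which is exactly the part a 
framework can’t decide on behalf of its users.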

At this point we’re probably off into the weeds, though. More important, I 
think, is to make a system that is amenable to having something like a token 
bucket attached to it and integrated into the stream-based flow control 
mechanisms.
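
To sketch what “amenable to having something like a token bucket attached” might 
look like, the quantised limiter really only needs a hook that pauses and 
resumes whatever IPushProducer-shaped thing is feeding it. pauseProducing() and 
resumeProducing() are the real Twisted method names; everything else below, 
including the class names, is just my illustration, not a proposed API:

    class QuantisedBackpressure:
        """Bridge a cap on outstanding work back into stream flow control.

        Expects a producer with pauseProducing()/resumeProducing() methods
        (the IPushProducer shape); pauses it once `limit` items are
        outstanding and resumes it when the count drops back below.
        """

        def __init__(self, producer, limit):
            self._producer = producer
            self._limit = limit
            self._outstanding = 0
            self._paused = False

        def work_started(self):
            self._outstanding += 1
            if self._outstanding >= self._limit and not self._paused:
                self._paused = True
                self._producer.pauseProducing()

        def work_finished(self):
            self._outstanding -= 1
            if self._outstanding < self._limit and self._paused:
                self._paused = False
                self._producer.resumeProducing()

    class FakeProducer:
        """Stand-in for a transport, just to show which calls get made."""

        def pauseProducing(self):
            print("paused upstream")

        def resumeProducing(self):
            print("resumed upstream")

    bp = QuantisedBackpressure(FakeProducer(), limit=2)
    bp.work_started()
    bp.work_started()   # hits the limit -> pauses the upstream producer
    bp.work_finished()  # drops below the limit -> resumes it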

Cory