> On 10 Nov 2016, at 02:43, Glyph Lefkowitz <gl...@twistedmatrix.com> wrote:
>
> And the problem isn't just with Tubes at the infrastructure level; most
> applications are fundamentally not stream processing, and it will be
> idiomatically challenging to express them as such. HTTP connections are
> short-lived and interfaces to comprehending them (i.e. 'json.loads') are not
> themselves stream oriented. Even if you did have a stream-oriented JSON
> parser, expressing what you want from it is hard; you want a data structure
> that you can simultaneously inspect multiple elements from, not a stream of
> "object began" / "string began" / "string ended" / "list began" / "list
> ended" events.
[snip]
>
> More importantly, backpressure at scale in distributed systems often means
> really weird stuff, like, traffic shaping on a front-end tier by coordinating
> with a data store or back-end tier to identify problem networks or network
> ranges. Tubes operates at a simpler level: connections are individual
> entities, and backpressure is applied uniformly across all of them. Granted,
> this is the basic layer you need in place to make addressing backpressure
> throughout a system work properly, but it's also not an exciting product that
> solves a super hard or complex problem.

So these two things here are the bit I'm most interested in focusing on. You're
totally right: backpressure in stream-oriented systems is most effectively
managed in the form that Tubes allows. However, many systems are not
stream-oriented but instead operate on quanta of work. Backpressure is still a
real part of system design for systems like that, and it would be good to have
a higher-level API for designing backpressure propagation for quantised work.
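
To make "higher-level API" a little less hand-wavy, here's a purely
hypothetical sketch of the kind of surface I mean (none of these names exist
in Twisted or Tubes; it's just one possible shape):

from abc import ABC, abstractmethod

class QuantisedLimiter(ABC):
    """Hypothetical interface for backpressure over discrete work items."""

    @abstractmethod
    def may_accept(self) -> bool:
        """Return True if another work item can be admitted right now."""

    @abstractmethod
    def item_started(self) -> None:
        """Record that a work item has been admitted."""

    @abstractmethod
    def item_finished(self) -> None:
        """Record that a work item completed and capacity may have freed up."""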

The biggest problem there, I think, is that there's not just one way to do
that. For example, a common mechanism is to provide something like a token
bucket at the edges of your system, whereby you allow only N work items to be
outstanding at any one time. The obvious problem with this is that there is no
one true value for N: it depends on how demanding your work items are on the
system and what their latency/throughput characteristics look like. That means
that, for example, Twisted cannot simply choose a value of N for its users.
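
As a strawman, here's a minimal asyncio sketch of that kind of edge limiter;
handle_item is a hypothetical per-item handler and 100 is just a stand-in
for N:

import asyncio

N = 100  # no universally correct value; this has to be tuned per deployment
limiter = asyncio.Semaphore(N)

async def handle_item(item):
    ...  # hypothetical per-item work (parse, hit a backend, respond, etc.)

async def admit(item):
    # New work waits here until one of the N slots frees up; the added
    # latency is what upstream callers observe as backpressure.
    async with limiter:
        return await handle_item(item)

Whether the right unit is outstanding items, bytes, or tokens refilled per
second is exactly the kind of policy decision a framework can't make for you.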

At this point we're probably off into the weeds, though. More important, I
think, is to make a system that is amenable to having something like a token
bucket attached to it and integrated with the stream-based flow control
mechanisms.
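
Concretely, I'm imagining something roughly like the sketch below: a token
bucket for quantised work that, when it runs dry, reaches down and pauses
whatever stream is feeding it. The pause/resume hooks are hypothetical
callables; in Twisted they might map onto a transport's pauseProducing and
resumeProducing, and in Tubes onto whatever pause mechanism a fount exposes.

import time

class TokenBucket:
    """Minimal token bucket: refills at `rate` tokens/second up to `capacity`.

    `pause` and `resume` are hypothetical zero-argument callables that hook
    into whatever stream-level flow control sits underneath.
    """

    def __init__(self, rate, capacity, pause, resume):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()
        self.pause = pause
        self.resume = resume
        self.paused = False

    def _refill(self):
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_consume(self):
        """Spend one token for a unit of work; pause the feed if we can't."""
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        if not self.paused:
            self.paused = True
            self.pause()  # push back on whatever is producing work items
        return False

    def maybe_resume(self):
        """Call when work completes (or periodically) to lift the pause."""
        self._refill()
        if self.paused and self.tokens >= 1:
            self.paused = False
            self.resume()

The interesting part isn't the bucket itself, of course; it's where those
pause/resume hooks plug in, which is the integration point I mean above.
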
Cory
_______________________________________________
Async-sig mailing list
Async-sig@python.org
https://mail.python.org/mailman/listinfo/async-sig
Code of Conduct: https://www.python.org/psf/codeofconduct/