Re: [nodejs] Re: New Streams confusion

Marco Rogers Mon, 25 Mar 2013 14:36:25 -0700

You're absolutely right. I totally forgot about that aspect of the
semantics. It changes my metaphor a little. So stream.read() isn't a blind
blocking call that can get you into lots of trouble, which I think my
previous message implied. Instead it's a peek into the underlying semantics
of how node is managing data off the fd. As I said, reading that data off
the fd now happens under the covers in node and is handled consistently.
Calling stream.read() gives you a hook into that consistent process and
should come with a few expectations.


1) If there is data returned from read(), then the stream is active and you
should keep calling read() until it returns null.
2) Calling read() is a signal to node that if backpressure is being
exerted (e.g. you're in a paused state), it should stop, because you're
ready to resume pulling off the fd.
3) If read() returns null, you can assume you've left the stream in an active
state and node wants to give you more data when it becomes available. The
way node will signal that to you is the next "readable" event. So you
should probably listen for that.
4) By calling read(), you are explicitly pulling out of the in-memory
buffer and thus explicitly affecting the high/low watermark calculations.
This is where things still get a little murky for me. I'm not sure what the
implications are here in terms of how application code should react.
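
To make expectations 1 and 3 concrete, here is a minimal sketch, assuming a
current Node.js and its built-in stream module. The helper names
readAvailable and consume are made up for illustration and are not part of
the streams API, and the in-memory Readable just stands in for a socket or
HTTP response.

```javascript
const { Readable } = require('stream');

// Expectation 1: keep calling read() until it returns null.
// (readAvailable is a hypothetical helper name, not a Node API.)
function readAvailable(stream) {
  const chunks = [];
  let chunk;
  while ((chunk = stream.read()) !== null) {
    chunks.push(chunk);
  }
  return chunks;
}

// Expectation 3: after read() returns null, wait for the next
// "readable" event before trying again.
// (consume is also a hypothetical helper name.)
function consume(stream, onChunk, onEnd) {
  stream.on('readable', function () {
    readAvailable(stream).forEach(onChunk);
  });
  stream.on('end', onEnd);
}

// Usage with a small in-memory stream; the no-op read() means data only
// arrives when we push() it ourselves.
const source = new Readable({ read() {} });

consume(
  source,
  function (chunk) { console.log('chunk: ' + chunk.toString()); },
  function () { console.log('end'); }
);

source.push('hello ');
source.push('world');
source.push(null); // signal end of stream
```

Several pushed chunks may come back from a single read() call, so the
sketch makes no assumption about how many chunks you get, only that read()
eventually returns null and the next "readable" event tells you when to try
again.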

Again, please add to this. I like talking to people about streams and may
do more talks soon. So this is helpful to frame my understanding and help
me convey it to other people.

:Marco



On Mon, Mar 25, 2013 at 2:23 PM, Dean Landolt <[email protected]> wrote:

>
>
>
> On Mon, Mar 25, 2013 at 5:01 PM, Marco Rogers <[email protected]> wrote:
>
>> I haven't experimented with streams2 as much as I should have. But I
>> remember talking to Isaac about it early on. The way I think about it is
>> still the same.
>>
>> It feels like the semantics of how node streams produce data is much more
>> consistent and predictable now. Node still starts by reading data off the
>> fd fast. Instead of pushing it through to application code immediately (the
>> old way), it starts buffering that in memory so it's ready for you to read.
>> If you're ready to read that, then you can do it in a pull fashion by
>> calling stream.read(). The buffering is now controlled by the high and low
>> water marks. This is a mechanism for making sure we don't fill up memory.
>> We can be more efficient with slow and fast consumers/producers. If the
>> in-memory boundaries are reached, then node will start exerting back pressure.
>> You don't have to ask for this or manage it. It's consistent and controlled
>> by the high/low water marks.
>>
>> So this makes sense to me. It's really the semantics of the api around
>> this that we need to play around with. So "readable" is an event that says
>> there is data to read. But even if you missed the readable event, or you're
>> early and it hasn't been fired, there may still be data to read. The
>> stream.read() method is decoupled from that convenience event. If you call
>> stream.read() it'll block until there's data or until the stream closes for
>> some other reason.
>>
>
> I like this as a general metaphor, but it's important to note read will
> return `null` if there's no data, not block indefinitely. It's up to you to
> call it again later when it may have data. The best way to do that is to
> listen for the `readable` event, preferably with a `once` handler.
>
>
>> The convenience of "readable" is that it gives you a framework for
>> getting around the blocking nature of read(). Because blocking when there's
>> nothing to do is bad. "Readable" lets us still consume data in a way that's
>> semantically more understandable than push "data" events, but also still
>> efficient in a way that pure blocking calls are not. NOT using "readable" is
>> still viable, but it puts you in a situation where you don't know how to be
>> most efficient with reads.
>>
>> Does that make sense? I think it fits with other answers here as well.
>> I'm sure folks will correct me where I'm off base.
>>
>> :Marco
>>
>>
>>
>> On Mon, Mar 25, 2013 at 1:42 PM, Michael Jackson <[email protected]> wrote:
>>
>>> readable is emitted after you've actually started reading.
>>>>
>>>
>>> That's not what it says in the docs
>>> <http://nodejs.org/api/stream.html#stream_event_readable>.
>>>
>>> ###
>>> Event: 'readable'
>>> When there is data ready to be consumed, this event will fire.
>>> When this event emits, call the read() method to consume the data.
>>>  ###
>>>
>>> Calling stream.read *before* you get the "readable" event is totally
>>> counterintuitive.
>>>
>>> --
>>> Michael Jackson
>>> @mjackson
>>>
>>> In your example, you don't ever call `response.read()`, so no readable event
>>>> is ever emitted.
>>>>
>>>> As you said, streams start in paused state and ready to be read.
>>>>
>>>> On 03/25/13 22:28, Michael Jackson wrote:
>>>> > Is it correct to assume that a Readable won't emit the "readable"
>>>> > event until you're registered for it?
>>>> >
>>>> > Reading through the streams2 docs, I was under the impression that all
>>>> > streams start out paused and don't start emitting data until you add
>>>> > either a "data" (for old streams) or a "readable" listener. For new
>>>> > streams, this should mean that they don't emit "readable" until at
>>>> > least one listener is registered. Otherwise we still need to do some
>>>> > buffering in order to capture all the data.
>>>> >
>>>> > For example, this code misses the readable event on node 0.10:
>>>> >
>>>> >     var http = require('http');
>>>> >
>>>> >     http.get('http://www.google.com', function (response) {
>>>> >       console.log('got response with status ' + response.statusCode);
>>>> >
>>>> >       setTimeout(function () {
>>>> >         response.on('readable', function () {
>>>> >           console.log('readable');
>>>> >         });
>>>> >
>>>> >         response.on('end', function () {
>>>> >           console.log('end');
>>>> >         });
>>>> >       }, 5);
>>>> >     });
>>>> >
>>>> > Here's my shell session:
>>>> >
>>>> > $ node -v
>>>> > v0.10.0
>>>> > $ node http-test.js
>>>> > got response with status 200
>>>> > $
>>>> >
>>>> > Is this the correct behavior?
>>>> >
>>>> > --
>>>> > Michael Jackson
>>>> > @mjackson
>>>> >
>>>> >
>>>> > On Thu, Mar 21, 2013 at 4:27 PM, Isaac Schlueter <[email protected]> wrote:
>>>> >
>>>> >     re old-mode
>>>> >
>>>> >     Yes, that's fine.  If you just want to get all the data asap, use
>>>> >     on('data', handler).  It'll work great, and it's still very fast.
>>>> >     pause()/resume(), the whole bit.  (The difference is that it won't
>>>> >     emit data until you're listening, and pause() will *actually*
>>>> >     pause.)
>>>> >
>>>> >
>>>> >     Re read(cb)
>>>> >
>>>> >     It's problematic for reasons that I've discussed all of the places
>>>> >     where it's been brought up.  That horse is dead, let's stop
>>>> >     beating it.  (There were a few other proposals as well, btw.
>>>> >     Reducibles and some other monadic approaches come to mind.)
>>>> >
>>>> >
>>>> >     Re pipe() vs looping around read() vs custom Writable vs
>>>> >     on('data')
>>>> >
>>>> >     Whatever works for your case is fine.  It's flexible on purpose,
>>>> >     and allows more types of consumption than streams1, and creating
>>>> >     custom writables is easier than it was in streams1.
>>>> >
>>>> >     If you find something that the API can't do for you, or find
>>>> >     yourself doing a lot of backflips or overriding a lot of methods
>>>> >     to get your stuff working, then let's chat about it in a github
>>>> >     issue.  You might be missing something, or you might have found a
>>>> >     genuine shortcoming in the API.
>>>> >
>>>> >
>>>> >
>>>> >     On Thu, Mar 21, 2013 at 2:01 PM, Sigurgeir Jonsson
>>>> >     <[email protected]> wrote:
>>>> >     > Thanks for all the answers. I almost forgot to look back at this
>>>> >     > thread as the custom writeStreams have exceeded the high
>>>> >     > expectation I had already for Streams 2.
>>>> >     > For me, the reference manual was a little confusing, as there
>>>> >     > are complete examples on using the read method, but no mention
>>>> >     > of "reading" through a writeStream endpoint.
>>>> >     >
>>>> >     > Marco, I agree that read has more detailed control of minimum
>>>> >     > incoming content.  However I wonder if it would be more efficient
>>>> >     > to default pipe.chunkSize to a "lowWatermark" of the receiver (if
>>>> >     > defined).  This lowWatermark could be adjusted dynamically and the
>>>> >     > callback in the writable should keep sequence of events under
>>>> >     > control?
>>>> >     > Anyway, thanks Node team, I'm very impressed!
>>>> >     >
>>>> >     > On Wednesday, March 20, 2013 4:45:32 AM UTC-4, Marco Rogers wrote:
>>>> >     >>
>>>> >     >> @Nathan's response is right. Creating a writable stream is
>>>> >     >> preferable in most cases. But I wanted to add a little context to
>>>> >     >> that. If you're dealing with a base readable stream, it's just
>>>> >     >> pushing chunks of data at you off the wire. Your first task is to
>>>> >     >> collect those chunks into meaningful data. So IMO the reason
>>>> >     >> creating a writable stream is preferable is because it prompts you
>>>> >     >> not just to read off the stream, but to create semantics around
>>>> >     >> what the new stream is supposed to be. The api reflects this
>>>> >     >> opinion and that's why creating writable streams feels like the
>>>> >     >> more natural way, and the ugliness of dealing with read() is
>>>> >     >> wrapped up in the pipe() method. It was kind of designed that way.
>>>> >     >>
>>>> >     >> But the read() api was also designed for a use case. It's meant to
>>>> >     >> handle low/high water marks effectively, as well as enable more
>>>> >     >> optimized special parsing by reading off specific lengths of
>>>> >     >> chunks. These were things that people kept needing, but the old
>>>> >     >> api didn't support well. If you were writing a library for a
>>>> >     >> special parser, you might write a custom Writable stream and
>>>> >     >> inside it you would be using the read(n) api to control *how* you
>>>> >     >> read data off the socket. I hope that makes sense.
>>>> >     >>
>>>> >     >> :Marco
>>>> >     >>
>>>> >     >> On Monday, March 18, 2013 11:06:48 AM UTC-7, Sigurgeir Jonsson wrote:
>>>> >     >>>
>>>> >     >>> The new streams have excellent support for high/low watermarks
>>>> >     >>> and auto-pausing/resuming, but the documentation confuses me a
>>>> >     >>> little... particularly the read method.
>>>> >     >>>
>>>> >     >>> When I read the new docs for the first time I was under the
>>>> >     >>> impression that the optimal way to become a user of a stream is
>>>> >     >>> to write loops around the read function.  However in practice I
>>>> >     >>> find myself simply writing custom writeStreams and using the
>>>> >     >>> callback to control upstream pressure (in addition to source
>>>> >     >>> Watermarks if needed).   Here is an example where I move the
>>>> >     >>> output to a queue that executes a custom function in parallel
>>>> >     >>> (i.e. uploading to a database):
>>>> >     >>> https://gist.github.com/ZJONSSON/5189249
>>>> >     >>>
>>>> >     >>> Are there any benefits to using the read method directly on a
>>>> >     >>> stream vs. piping to a custom Writable stream?
>>>> >     >
>>>> >     > --
>>>> >     > --
>>>> >     > Job Board: http://jobs.nodejs.org/
>>>> >     > Posting guidelines:
>>>> >     >
>>>> https://github.com/joyent/node/wiki/Mailing-List-Posting-Guidelines
>>>> >     > You received this message because you are subscribed to the
>>>> Google
>>>> >     > Groups "nodejs" group.
>>>> >     > To post to this group, send email to [email protected]
>>>> >     <mailto:[email protected]>
>>>> >     > To unsubscribe from this group, send email to
>>>> >     > [email protected]
>>>> >     <mailto:nodejs%[email protected]>
>>>> >     > For more options, visit this group at
>>>> >     > http://groups.google.com/group/nodejs?hl=en?hl=en
>>>> >     >
>>>> >     > ---
>>>> >     > You received this message because you are subscribed to the
>>>> Google
>>>> >     Groups
>>>> >     > "nodejs" group.
>>>> >     > To unsubscribe from this group and stop receiving emails from
>>>> it,
>>>> >     send an
>>>> >     > email to [email protected]
>>>> >     <mailto:nodejs%[email protected]>.
>>>> >     > For more options, visit
>>>> https://groups.google.com/groups/opt_out.
>>>> >     >
>>>> >     >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>>
>> --
>> Marco Rogers
>> [email protected] | https://twitter.com/polotek
>>
>> Life is ten percent what happens to you and ninety percent how you
>> respond to it.
>> - Lou Holtz
>>
>>
>>
>>
>
>
>
>



-- 
Marco Rogers
[email protected] | https://twitter.com/polotek

Life is ten percent what happens to you and ninety percent how you respond
to it.
- Lou Holtz

