Hi Patrik,

That can be done, but what bothers me is the potential lack of
standardization. Every programmer will solve this problem all over again
for each stream they build. Even if particular Reactive Streams
implementations offer generic tools, they won't be compatible across
implementations or languages, which is the whole point of Reactive Streams.

Any processor in a stream that emits new ByteStrings needs to know the
allowed or preferred min/max size to emit. E.g., an uncompressing processor
that expands a 1kb input to a 1MB input may need to avoid sending the whole
1MB as a single ByteString (or you end up with zip bombs).

The output size to use depends on the processor that comes after it (and
not just on global per-stream settings). Some consumers can process any
size input very quickly, so if you have a big ByteString in memory, you
should send it over to free your own buffer. Others may need to buffer
their input and this can consume a lot of extra memory - e.g. if the
pipeline is based on byte arrays and splitting these means making copies.
(ByteStrings mostly avoid copying, but what about non-Scala implementations
that you need to work with, things like Netty?) Some processors might
require exactly sized input chunks (e.g. encryption), and you may want to
make sure the buffering/chunking is done once (in the producer) and not
twice (in both producer and consumer) for efficiency.

All of this argues for a way for byte stream producers/consumers to
communicate directly about their preferences.

Also, even if the programmer is going to specify the correct sizes manually
for each producer, those producers still have to be configurable in this
way. This is something I feel would benefit greatly from a standardized
approach.

Without it, I'm afraid that a lot of the idioms, examples, and library
functions that will evolve around reactive streams won't be applicable to
byte streams. Generic buffering/chunking idioms, like collecting up to X
items, or buffering and batching them, need a 'flatMap'-like transformation
to apply to byte streams.

I think the way forward is to propose optional 'byte stream' interfaces
extending the Producer/Consumer ones and see what sticks - but this has to
wait for Akka Streams to mature. In particular, I feel akka-http will have
to solve this problem somehow. E.g., if I write a spray server and I have
an HttpResponsePart producer, how do I know what size ByteStrings to send
to the consumer that ships these out to the network?

Daniel


On Mon, Apr 21, 2014 at 2:50 PM, Patrik Nordwall
<[email protected]>wrote:

>
>
> 20 apr 2014 kl. 00:32 skrev Daniel Armak <[email protected]>:
>
> Hi,
>
> How would I use reactive streams in general, and/or akka streams in
> particular, to represent a stream of binary data? I'm thinking of all
> binary streams that don't have a natural division into chunks of a fixed
> (small) size; you can divide them into chunks any way you want and all
> processors in the pipeline will still work. Like downloading a big file
> over HTTP, decompressing and saving it to disk.
>
> The natural representation is a stream of ByteString (or byte[] in Java).
> But each ByteString can be arbitrarily large.
>
>
> Can it? Wouldn't it be good if a ByteString producer has a chunk size
> limit as part of its contract? Then a chunk corresponds to 1 element.
>
> /Patrik
>
> It's no good to tell the producer I'm willing to accept one more element,
> if I have no idea what size it's going to be. Maybe the producer is reading
> from a 100MB ByteString it already has in memory, and the easiest thing for
> it to do (i.e. the easiest way for a programmer to code the producer) is to
> send all of the data as one element. What I really want is to tell it how
> many more bytes (or characters, etc) I'm willing to accept, but in the
> current reactive streams API that would require sending each byte in a
> separate call to onNext.
>
> Some of the implementations might address this scenario, but it seems to
> me that this will be a common use case and so standardization and
> interoperability would be of value.
>
> I'm sure this has all been thought of. What's the recommended usage
> pattern?
>
> -----
>
> Also, some processors do have a natural chunk size (e.g. compression,
> encryption, network packet transmission). This size will rarely match the
> size of the incoming chunks (if only because different processors in a
> pipeline have different chunk sizes). To maintain efficiency it might be
> desirable for some processors to buffer their output and only forward
> chunks of discrete sizes.
>
> This, too, might benefit from typed declarations of the chunk sizes each
> processor prefers or requires as input. Otherwise the programmer will have
> to configure each processor manually to do the right amount of buffering,
> because the processor won't be automatically aware of the preferences of
> the next processor in the pipeline. This is a more complex scenario, so I
> expect it will be left to each implementation to introduce its own patterns.
>
> Daniel Armak
>
> --
> >>>>>>>>>> Read the docs: http://akka.io/docs/
> >>>>>>>>>> Check the FAQ:
> http://doc.akka.io/docs/akka/current/additional/faq.html
> >>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
> ---
> You received this message because you are subscribed to the Google Groups
> "Akka User List" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/akka-user.
> For more options, visit https://groups.google.com/d/optout.
>
>  --
> >>>>>>>>>> Read the docs: http://akka.io/docs/
> >>>>>>>>>> Check the FAQ:
> http://doc.akka.io/docs/akka/current/additional/faq.html
> >>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
> ---
> You received this message because you are subscribed to the Google Groups
> "Akka User List" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/akka-user.
> For more options, visit https://groups.google.com/d/optout.
>

-- 
>>>>>>>>>>      Read the docs: http://akka.io/docs/
>>>>>>>>>>      Check the FAQ: 
>>>>>>>>>> http://doc.akka.io/docs/akka/current/additional/faq.html
>>>>>>>>>>      Search the archives: https://groups.google.com/group/akka-user
--- 
You received this message because you are subscribed to the Google Groups "Akka 
User List" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/akka-user.
For more options, visit https://groups.google.com/d/optout.

Reply via email to