On Mon, Apr 21, 2014 at 2:26 PM, Daniel Armak <[email protected]> wrote:

> Hi Patrik,
>
> That can be done, but what bothers me is the potential lack of
> standardization. Every programmer will solve this problem all over again
> for each stream they build. Even if particular Reactive Streams
> implementations offer generic tools, they won't be compatible across
> implementations or languages, which is the whole point of Reactive Streams.
>
> Any processor in a stream that emits new ByteStrings needs to know the
> allowed or preferred min/max size to emit. E.g., a decompressing processor
> that expands a 1 kB input to a 1 MB output may need to avoid sending the
> whole 1 MB as a single ByteString (or you end up with zip bombs).
>
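
To make the size cap concrete, here is a minimal sketch of a splitter that never emits more than a given number of bytes per element. The class name ChunkSplitter and the maxChunkSize parameter are hypothetical, not part of Reactive Streams or Akka; plain byte[] is used, so every chunk is a copy, which a ByteString-based version could probably avoid.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not an existing Akka or Reactive Streams API.
final class ChunkSplitter {

    /** Splits data into consecutive chunks of at most maxChunkSize bytes each. */
    public static List<byte[]> split(byte[] data, int maxChunkSize) {
        if (maxChunkSize <= 0) {
            throw new IllegalArgumentException("maxChunkSize must be positive");
        }
        List<byte[]> chunks = new ArrayList<>();
        int offset = 0;
        while (offset < data.length) {
            // The last chunk may be shorter than maxChunkSize.
            int len = Math.min(maxChunkSize, data.length - offset);
            // copyOfRange copies the bytes; a ByteString-like type could share them.
            chunks.add(Arrays.copyOfRange(data, offset, offset + len));
            offset += len;
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Example: a 2500-byte payload split with a 1024-byte cap.
        for (byte[] chunk : split(new byte[2500], 1024)) {
            System.out.println(chunk.length);
        }
    }
}
```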

Yes, a decompressing processor must be able to buffer and send smaller chunks
downstream. If it has a compression ratio of 10:1 and it gets demand from
downstream for 1000 chunks, it should request 100 chunks from its upstream.
It cannot know the ratio exactly, but it can make a fairly good guess, and
even adapt it dynamically.
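
As a rough sketch of that arithmetic (not an existing Akka or Reactive Streams API): the class DemandEstimator, its method names, and the 0.2 smoothing factor below are all assumptions made for illustration. It converts downstream demand into upstream demand using an estimated expansion ratio, and nudges the estimate toward what was actually observed.

```java
// Hypothetical helper illustrating the demand calculation; names are made up.
final class DemandEstimator {
    // Estimated number of output chunks produced per input chunk.
    private double expansionRatio;

    public DemandEstimator(double initialRatio) {
        if (initialRatio <= 0) {
            throw new IllegalArgumentException("initialRatio must be positive");
        }
        this.expansionRatio = initialRatio;
    }

    /** Upstream chunks to request in order to satisfy downstreamDemand output chunks. */
    public long upstreamDemand(long downstreamDemand) {
        if (downstreamDemand <= 0) {
            return 0;
        }
        // Round up and request at least one chunk so the stream cannot stall.
        return Math.max(1L, (long) Math.ceil(downstreamDemand / expansionRatio));
    }

    /** Blends an observed ratio into the estimate (exponential moving average). */
    public void observe(long inputChunks, long outputChunks) {
        if (inputChunks <= 0) {
            return;
        }
        double observed = (double) outputChunks / inputChunks;
        // 0.2 is an arbitrary smoothing factor chosen for this sketch.
        expansionRatio = 0.8 * expansionRatio + 0.2 * observed;
    }

    public static void main(String[] args) {
        DemandEstimator estimator = new DemandEstimator(10.0);
        System.out.println(estimator.upstreamDemand(1000));
    }
}
```

Since the estimate can be wrong in either direction, the processor still needs its own buffer for output that exceeds the current downstream demand.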

I might be oversimplifying this. Thoughts, anyone else?

Regards,
Patrik


> The output size to use depends on the processor that comes after it (and
> not just on global per-stream settings). Some consumers can process any
> size input very quickly, so if you have a big ByteString in memory, you
> should send it over to free your own buffer. Others may need to buffer
> their input and this can consume a lot of extra memory - e.g. if the
> pipeline is based on byte arrays and splitting these means making copies.
> (ByteStrings mostly avoid copying, but what about non-Scala implementations
> that you need to work with, things like Netty?) Some processors might
> require exactly sized input chunks (e.g. encryption), and you may want to
> make sure the buffering/chunking is done once (in the producer) and not
> twice (in both producer and consumer) for efficiency.
>
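
For the exactly-sized case, a minimal sketch of a re-chunker that accepts incoming chunks of any size and emits only full blocks. FixedSizeRechunker, push and flush are hypothetical names, not an existing API, and the simple copy-based buffering is chosen for clarity rather than efficiency.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not an existing Akka or Reactive Streams API.
final class FixedSizeRechunker {
    private final int blockSize;
    // Bytes received so far that do not yet fill a whole block.
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();

    public FixedSizeRechunker(int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        this.blockSize = blockSize;
    }

    /** Accepts one incoming chunk of any size; returns zero or more full blocks. */
    public List<byte[]> push(byte[] chunk) {
        pending.write(chunk, 0, chunk.length);
        byte[] all = pending.toByteArray();
        List<byte[]> blocks = new ArrayList<>();
        int offset = 0;
        while (all.length - offset >= blockSize) {
            blocks.add(Arrays.copyOfRange(all, offset, offset + blockSize));
            offset += blockSize;
        }
        // Keep the incomplete tail for the next call.
        pending.reset();
        pending.write(all, offset, all.length - offset);
        return blocks;
    }

    /** At end of stream: returns the leftover bytes (shorter than blockSize, possibly empty). */
    public byte[] flush() {
        byte[] rest = pending.toByteArray();
        pending.reset();
        return rest;
    }
}
```

Doing this once, on whichever side of the boundary knows the required block size, is what avoids the double buffering described above.
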
> All of this argues for a way for byte stream producers/consumers to
> communicate directly about their preferences.
>
> Also, even if the programmer is going to specify the correct sizes
> manually for each producer, those producers still have to be configurable
> in this way. This is something I feel would benefit greatly from a
> standardized approach.
>
> Without it, I'm afraid that a lot of the idioms, examples, and library
> functions that will evolve around reactive streams won't be applicable to
> byte streams. Generic buffering/chunking idioms, like collecting up to X
> items, or buffering and batching them, need a 'flatMap'-like transformation
> to apply to byte streams.
>
> I think the way forward is to propose optional 'byte stream' interfaces
> extending the Producer/Consumer ones and see what sticks - but this has to
> wait for Akka Streams to mature. In particular, I feel akka-http will have
> to solve this problem somehow. E.g., if I write a spray server and I have
> an HttpResponsePart producer, how do I know what size ByteStrings to send
> to the consumer that ships these out to the network?
>

> Daniel
>
>
> On Mon, Apr 21, 2014 at 2:50 PM, Patrik Nordwall <
> [email protected]> wrote:
>
>>
>>
>> On 20 Apr 2014 at 00:32, Daniel Armak <[email protected]> wrote:
>>
>> Hi,
>>
>> How would I use reactive streams in general, and/or akka streams in
>> particular, to represent a stream of binary data? I'm thinking of all
>> binary streams that don't have a natural division into chunks of a fixed
>> (small) size; you can divide them into chunks any way you want and all
>> processors in the pipeline will still work. Like downloading a big file
>> over HTTP, decompressing and saving it to disk.
>>
>> The natural representation is a stream of ByteString (or byte[] in Java).
>> But each ByteString can be arbitrarily large.
>>
>>
>> Can it? Wouldn't it be good if a ByteString producer has a chunk size
>> limit as part of its contract? Then a chunk corresponds to 1 element.
>>
>> /Patrik
>>
>> It's no good to tell the producer I'm willing to accept one more element,
>> if I have no idea what size it's going to be. Maybe the producer is reading
>> from a 100MB ByteString it already has in memory, and the easiest thing for
>> it to do (i.e. the easiest way for a programmer to code the producer) is to
>> send all of the data as one element. What I really want is to tell it how
>> many more bytes (or characters, etc) I'm willing to accept, but in the
>> current reactive streams API that would require sending each byte in a
>> separate call to onNext.
>>
>> Some of the implementations might address this scenario, but it seems to
>> me that this will be a common use case and so standardization and
>> interoperability would be of value.
>>
>> I'm sure this has all been thought of. What's the recommended usage
>> pattern?
>>
>> -----
>>
>> Also, some processors do have a natural chunk size (e.g. compression,
>> encryption, network packet transmission). This size will rarely match the
>> size of the incoming chunks (if only because different processors in a
>> pipeline have different chunk sizes). To maintain efficiency it might be
>> desirable for some processors to buffer their output and only forward
>> chunks of discrete sizes.
>>
>> This, too, might benefit from typed declarations of the chunk sizes each
>> processor prefers or requires as input. Otherwise the programmer will have
>> to configure each processor manually to do the right amount of buffering,
>> because the processor won't be automatically aware of the preferences of
>> the next processor in the pipeline. This is a more complex scenario, so I
>> expect it will be left to each implementation to introduce its own patterns.
>>
>> Daniel Armak
>>
>> --
>> >>>>>>>>>> Read the docs: http://akka.io/docs/
>> >>>>>>>>>> Check the FAQ:
>> http://doc.akka.io/docs/akka/current/additional/faq.html
>> >>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
>> ---
>> You received this message because you are subscribed to the Google Groups
>> "Akka User List" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at http://groups.google.com/group/akka-user.
>> For more options, visit https://groups.google.com/d/optout.
>>
>



-- 

Patrik Nordwall
Typesafe <http://typesafe.com/> -  Reactive apps on the JVM
Twitter: @patriknw
