Hi Tim,

sorry again for the delay. You raise a valid question that is not answered in 
the documentation, perhaps it should be.

The handling of substreams has been a concern in our design from day 1, the 
most prominent problem being that users may decide to not consume all of them 
(using combinators like filter() or take() that drop elements containing 
substream Publishers to the floor). This has an associated cost because the 
substreams bind resources that need to be released explicitly, plus it is very 
easy to create deadlocks by never consuming the data that would need to be fed 
into these streams. Another pitfall for users is that the substreams are 
coupled and must therefore be drained in a fashion that is compatible with 
their source, in particular groupBy would deadlock easily when merging the 
resulting streams with a breadth that is exceeded at runtime.

The idea for the change came from the Gearpump team, they mentioned that the 
signature of groupBy in big data analytics does not usually return a stream of 
streams, it returns a restricted substream abstraction that can be handled 
specially by the platform. The issue for these engines is that they are 
designed for flow graphs with a stable (i.e. non-dynamic) stream layout, the 
network of combinators must be known before elements flow through it. This is 
exactly what the new representation of splitWhen/splitAfter/groupBy achieves, 
the materializer knows up-front what shall be materialized for the substreams; 
this allows a wider range of optimizations to be applied.

Looking at the current signatures we see that they prevent users from 
expressing most of the dangerous (deadlocking) stream layouts: groupBy will 
limit the number of open streams and configure the merge such that it will not 
deadlock, the split combinators make it impossible to write code that tries to 
consume the substreams out-of-order. Dynamic processing can still be expressed 
through stateful combinators like fold and scan that configure the contained 
computation according to the first processed element—the normal use of split is 
to factor out boundary detection from the rest of the stream transformation.

We left an escape hatch for those use-cases which require true substreams, but 
we intentionally do not document it as such: using prefixAndTail(0) you can 
lift a stream into a single element that contains a sub-source that you can 
transform to your heart’s content—with all the pitfalls.

Looking ahead I see several more interesting applications of the SubFlow 
infrastructure: we might offer a retry mechanism for pieces of a Flow, or we 
might parallelize a piece of a Flow, or we might dynamically insert processing 
steps depending on the first element of a Flow.

I hope this explains the rationale and gives you some ideas on how to exploit 
the new features, and if you see use-cases that are currently not covered then 
please let us know!

Regards,

Roland

> 4 jan 2016 kl. 06:50 skrev Tim Harper <[email protected]>:
> 
> A big reason why groupBy, splitWhen (and other split* friends) were useful 
> was the ability to apply different behavior to the bifurcated streams. With 
> the recent (and, what seems to been a bit sudden) change to SubFlow, I'm 
> having a difficulty understanding the value of these methods.
> 
> It seems like the only thing it offers you, now, is to provide a grouping for 
> parallelizing the streams, which I'll confess to be somewhat useful for 
> groupBy, but for the split* family of methods there seems to be little 
> tenable benefit.
> 
> I scanned the gitter backlog for more backstory on why this change happened. 
> Being acquainted with Roland Kuhn's work, I entirely trust that it was made 
> thoughtfully and with good reason. Is there anything I can read to understand 
> more of the motivation for the change? Also, are there any examples of 
> circumstances in which split{When,After} are useful, now that bifurcated 
> streams have homogenous processing behavior?
> 
> Thanks!
> 
> Tim
> 
> -- 
> >>>>>>>>>> Read the docs: http://akka.io/docs/ <http://akka.io/docs/>
> >>>>>>>>>> Check the FAQ: 
> >>>>>>>>>> http://doc.akka.io/docs/akka/current/additional/faq.html 
> >>>>>>>>>> <http://doc.akka.io/docs/akka/current/additional/faq.html>
> >>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user 
> >>>>>>>>>> <https://groups.google.com/group/akka-user>
> --- 
> You received this message because you are subscribed to the Google Groups 
> "Akka User List" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected] 
> <mailto:[email protected]>.
> To post to this group, send email to [email protected] 
> <mailto:[email protected]>.
> Visit this group at https://groups.google.com/group/akka-user 
> <https://groups.google.com/group/akka-user>.
> For more options, visit https://groups.google.com/d/optout 
> <https://groups.google.com/d/optout>.



Dr. Roland Kuhn
Akka Tech Lead
Typesafe <http://typesafe.com/> – Reactive apps on the JVM.
twitter: @rolandkuhn
 <http://twitter.com/#!/rolandkuhn>

-- 
>>>>>>>>>>      Read the docs: http://akka.io/docs/
>>>>>>>>>>      Check the FAQ: 
>>>>>>>>>> http://doc.akka.io/docs/akka/current/additional/faq.html
>>>>>>>>>>      Search the archives: https://groups.google.com/group/akka-user
--- 
You received this message because you are subscribed to the Google Groups "Akka 
User List" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/akka-user.
For more options, visit https://groups.google.com/d/optout.

Reply via email to