Hi,
I'm working with a flow that downloads data, parses JSON, and adds the IDs to a
set (to dedupe them). It works fine; however, when I modify the flow to run
in parallel, I get different results.
Here's my graph:
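    import akka.stream.scaladsl.{Keep, RunnableGraph, Sink, Source}
    import org.joda.time.{DateTime, Duration => JodaDuration}
    import scala.collection.immutable.HashSet
    import scala.concurrent.{Await, Future}
    import scala.concurrent.duration.Duration

    // assumes an implicit materializer (e.g. ActorMaterializer) is in scope for graph.run()
    val graph: RunnableGraph[Future[HashSet[Long]]] =
      Source.fromGraph(new MinuteSource(firstMinuteYesterday, firstMinuteYesterday.plusDays(1)))
        .via(dsl(parallelize = 4))
        // union every per-minute id set into one accumulator (the dedupe step)
        .toMat(Sink.fold(new HashSet[Long]())((accSet, set) => accSet ++ set))(Keep.right)

    val deduped: Set[Long] = Await.result(graph.run(), Duration.Inf)
    println(s"seq size is ${deduped.size} in ${new JodaDuration(start, new DateTime()).toString}")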
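As a sanity check on the Sink.fold step: set union is commutative and associative, so folding the same per-minute sets in any order should give the same set. A minimal stdlib-only Scala sketch; the id sets are made up for illustration, and the shuffle only stands in for Merge interleaving the workers' outputs differently:

```scala
import scala.util.Random

object FoldOrderCheck {
  // Same accumulation as the Sink.fold in the graph: union each incoming set into the accumulator.
  def foldUnion(batches: Seq[Set[Long]]): Set[Long] =
    batches.foldLeft(Set.empty[Long])((accSet, set) => accSet ++ set)

  def main(args: Array[String]): Unit = {
    // Made-up per-minute id sets; neighbouring minutes share ids, which the union dedupes.
    val batches: Seq[Set[Long]] =
      (0 until 100).map(m => (m.toLong until m.toLong + 10L).toSet)
    val inOrder = foldUnion(batches)
    // Same sets, different arrival order (as if Merge interleaved the workers differently).
    val shuffled = foldUnion(new Random(42).shuffle(batches))
    println(s"in order: ${inOrder.size}, shuffled: ${shuffled.size}, equal: ${inOrder == shuffled}")
  }
}
```

So the order in which Merge emits the sets should not, by itself, change the final size.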
The dsl method looks like this:
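    import akka.stream.FlowShape
    import akka.stream.scaladsl.{Balance, Flow, GraphDSL, Merge}
    import org.joda.time.DateTime

    // consumptionFlow (defined elsewhere) is a Flow[DateTime, Set[Long], _]: download + parse one minute
    def dsl(parallelize: Int) = Flow.fromGraph(GraphDSL.create() { implicit builder =>
      import GraphDSL.Implicits._
      // send each minute to whichever worker currently has demand
      val dispatcher = builder.add(Balance[DateTime](parallelize))
      // merge the per-minute id sets back into a single stream
      val merger = builder.add(Merge[Set[Long]](parallelize))
      for (i <- 0 until parallelize) {
        // .async gives each worker its own asynchronous island so the workers run concurrently
        dispatcher.out(i) ~> consumptionFlow.async ~> merger.in(i)
      }
      FlowShape(dispatcher.in, merger.out)
    })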
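Likewise, if consumptionFlow computes each minute's ids purely from that minute, spreading the minutes across workers should not change the union either. A rough sketch of the same Balance / workers / Merge / fold layout using plain Futures instead of Akka Streams; idsForMinute is a hypothetical stand-in for consumptionFlow, and minutes are dealt out by index here, whereas Balance routes each element to whichever worker has demand:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object PartitionCheck {
  // Hypothetical stand-in for consumptionFlow: a pure function of the minute,
  // with no state shared between minutes or between workers.
  def idsForMinute(minute: Int): Set[Long] =
    (0 until 50).map(i => ((minute * 31 + i) % 2000).toLong).toSet

  // Rough stdlib analogue of Balance -> `parallelize` workers -> Merge -> fold union.
  def run(minutes: Seq[Int], parallelize: Int): Set[Long] = {
    val workers: Seq[Future[Set[Long]]] =
      (0 until parallelize).map { w =>
        Future {
          // worker w handles every minute whose index maps to it
          minutes.zipWithIndex
            .collect { case (m, idx) if idx % parallelize == w => idsForMinute(m) }
            .foldLeft(Set.empty[Long])(_ ++ _)
        }
      }
    // "Merge" the worker results and union them, like the Sink.fold does
    val merged = Future.sequence(workers).map(results => results.foldLeft(Set.empty[Long])(_ ++ _))
    Await.result(merged, 10.seconds)
  }

  def main(args: Array[String]): Unit = {
    val minutes = 0 until 1440 // one day of minutes
    for (p <- Seq(1, 2, 4)) println(s"parallelize $p -> size ${run(minutes, p).size}")
  }
}
```

With a pure per-minute function like this, all three parallelize values produce the same set.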
Here are the results for different parallelize values:
// parallelize 1 -> seq size is 48560 in 175
// parallelize 2 -> seq size is 48531 in 117
// parallelize 4 -> seq size is 48481 in 107
The resulting set size varies with the parallelize value. Interestingly, for a
given parallelize value the set size is consistent across runs. Does this
make sense to anyone? Thanks!
Andrew
---
You received this message because you are subscribed to the Google Groups "Akka
User List" group.