Hi again,
I tried the "Poison Pill" solution, but it looks kind of ugly, so maybe I
misunderstood it.
First, some background on what I am doing, which seems close to Jan's
project. I am crawling some web pages, extracting data and following URLs
when I encounter pagination.
Here is a simplified version of the graph I made:
  source ~> merge ~> fetchUrl ~> broadcast ~> extractNextUrl
            merge.preferred <~ extractNextUrl
                                 broadcast ~> sink
- `source` is a source of initial URLs.
- `fetchUrl` is responsible for fetching the received URL and sending the
URL content downstream.
- `extractNextUrl` is responsible for extracting a possible 'next-page' URL
when one is found in the downloaded content.
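For illustration, `extractNextUrl` could be backed by a plain function from page content to an optional next URL. This is a hypothetical sketch (`NextPage` and `nextUrl` are made-up names), assuming pages mark their pagination link with `rel="next"`; regular expressions are only a stand-in here, and a real crawler would rather use an HTML parser.

```scala
object NextPage {
  // Hypothetical heuristic: the "next page" link is an <a> tag carrying rel="next".
  // Two patterns cover both attribute orders (rel before href, href before rel).
  private val RelThenHref = """<a\s[^>]*rel="next"[^>]*href="([^"]*)"""".r
  private val HrefThenRel = """<a\s[^>]*href="([^"]*)"[^>]*rel="next"""".r

  // Some(url) when the page links to a next page, None on the last page.
  def nextUrl(html: String): Option[String] =
    RelThenHref.findFirstMatchIn(html)
      .orElse(HrefThenRel.findFirstMatchIn(html))
      .map(_.group(1))
}
```

In the graph it would plug in as `Flow[String].mapConcat(html => NextPage.nextUrl(html).toList)`, which emits nothing for the last page.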
The graph works, but the stream never completes. That is expected, since the
merge stage is pulling from itself: with default settings it only completes
once all of its inputs have completed, and its feedback input can only
complete after the merge itself has.
Using the eagerClose option on the Merge stage is not applicable either, as
it would drop every URL still in the feedback loop once the source completes.
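The graph above can be reproduced with a small sketch, assuming Akka 2.6, where the eagerClose flag is called `eagerComplete`. The names `FeedbackCrawl`, `nextPageOf` and the page URLs are hypothetical, and `fetchUrl` only looks up an in-memory map instead of issuing HTTP requests. With `eagerComplete = false` every page reaches the sink, but the materialized Future never completes; with `eagerComplete = true` the stream completes once the seed source is exhausted, but a page such as "a2" that is still in the feedback loop may be dropped.

```scala
import akka.actor.ActorSystem
import akka.stream.ClosedShape
import akka.stream.scaladsl.{Broadcast, Flow, GraphDSL, Keep, MergePreferred, RunnableGraph, Sink, Source}
import scala.collection.immutable
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.util.Try

object FeedbackCrawl {
  // Hypothetical in-memory "site": page URL -> URL of its next page, if any.
  // Stands in for real HTTP fetching and content parsing.
  val nextPageOf: Map[String, Option[String]] =
    Map("a1" -> Some("a2"), "a2" -> None, "b1" -> None)

  // Wires source ~> merge ~> fetchUrl ~> broadcast; broadcast feeds both the sink
  // and extractNextUrl, whose output goes back into merge.preferred.
  def crawl(seeds: List[String], eagerComplete: Boolean)(
      implicit system: ActorSystem): Future[immutable.Seq[String]] = {
    // Keeps only the page URL; the Future from Sink.seq completes when the sink completes.
    val sink = Flow[(String, Option[String])].map(_._1).toMat(Sink.seq[String])(Keep.right)

    RunnableGraph.fromGraph(GraphDSL.create(sink) { implicit b => out =>
      import GraphDSL.Implicits._

      val source = Source(seeds)
      val merge = b.add(MergePreferred[String](1, eagerComplete))
      val fetchUrl = Flow[String].map(url => url -> nextPageOf(url))
      val broadcast = b.add(Broadcast[(String, Option[String])](2))
      val extractNextUrl = Flow[(String, Option[String])].mapConcat(_._2.toList)

      source ~> merge ~> fetchUrl ~> broadcast ~> out
      broadcast ~> extractNextUrl ~> merge.preferred
      ClosedShape
    }).run()
  }

  def main(args: Array[String]): Unit = {
    implicit val system: ActorSystem = ActorSystem("feedback-crawl")
    try {
      // Every page is fetched, but merge waits for its preferred input,
      // which in turn waits for merge, so completion never arrives.
      val lazyRun = crawl(List("a1", "b1"), eagerComplete = false)
      println(s"eagerComplete = false, completed: ${Try(Await.result(lazyRun, 2.seconds)).isSuccess}")

      // Completes once the seeds are exhausted, possibly losing pages still in the loop.
      val eagerRun = Await.result(crawl(List("a1", "b1"), eagerComplete = true), 2.seconds)
      println(s"eagerComplete = true, fetched: $eagerRun")
    } finally system.terminate()
  }
}
```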
I tried the Poison Pill approach, using an Either[T, PoisonPill] as the data
transiting in the graph, but it's kind of messy and forces me to handle the
PoisonPill case at each stage. Moreover, one of the stages I am using is an
Http.superPool[T](), which only accepts a tuple of (HttpRequest, T), so I'm
stuck.
Now I'm thinking of creating a custom async stage that wraps this logic
using a Future-based HTTP request, but I would prefer to stick to a
stream-based approach. If anyone has ideas on how to achieve that, I would be
very happy to hear them.
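For what it's worth, one stream-based way around the completion problem (not something proposed in this thread) is to drop the cycle entirely: follow each pagination chain with `Source.unfoldAsync`, whose state is the next URL to fetch, and flatten the chains with `flatMapConcat`. A minimal sketch, assuming Akka 2.6; `UnfoldCrawl`, `fetch` and the in-memory `site` map are hypothetical stand-ins for a Future-based HTTP call (for example Akka HTTP's `Http().singleRequest`) followed by content parsing.

```scala
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}
import scala.collection.immutable
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object UnfoldCrawl {
  // Hypothetical site: URL -> (page content, optional next-page URL).
  val site: Map[String, (String, Option[String])] = Map(
    "a1" -> ("content of a1", Some("a2")),
    "a2" -> ("content of a2", None),
    "b1" -> ("content of b1", None)
  )

  // Stand-in for a Future-based HTTP request plus parsing.
  def fetch(url: String): Future[(String, Option[String])] = Future.successful(site(url))

  // Follows one pagination chain. The state is the next URL to fetch; None means
  // the previous page was the last one, so the source completes on its own.
  def paginate(start: String)(implicit ec: ExecutionContext): Source[String, NotUsed] =
    Source.unfoldAsync[Option[String], String](Some(start)) {
      case None      => Future.successful(None)
      case Some(url) => fetch(url).map { case (content, next) => Some((next, content)) }
    }

  // No feedback edge: each seed becomes a finite sub-stream.
  def crawl(seeds: List[String])(implicit system: ActorSystem): Future[immutable.Seq[String]] = {
    import system.dispatcher
    Source(seeds).flatMapConcat(url => paginate(url)).runWith(Sink.seq)
  }

  def main(args: Array[String]): Unit = {
    implicit val system: ActorSystem = ActorSystem("unfold-crawl")
    try println(Await.result(crawl(List("a1", "b1")), 3.seconds))
    finally system.terminate()
  }
}
```

Because nothing feeds back into an upstream stage, the stream completes once the seed source and every chain have completed. `flatMapConcat` crawls one chain at a time; `flatMapMerge(n, ...)` would run up to n chains concurrently, at the cost of output ordering.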
Best Regards,
JP.
On Thursday, November 5, 2015 at 1:48:31 PM UTC+1, Jean-Pierre Thomasset
wrote:
>
> Hi Jan,
>
> Sorry for the necrobump, but I ended up in a similar situation and I was
> wondering what your final implementation was to circumvent this deadlock on
> completion. Did you go for the "Poison Pill" approach suggested on GitHub?
>
> Best regards,
> Jean-Pierre.
>
> On Monday, May 18, 2015 at 4:46:55 PM UTC+2, Jan Liße wrote:
>>
>> Hello Endre,
>>
>> thanks for your quick response. I have filed an issue here:
>> https://github.com/akka/akka/issues/17507
>>
>> Best regards,
>> Jan
>>
>> On Monday, May 18, 2015 12:42:05 UTC+2, drewhk wrote:
>>>
>>>
>>>
>>> On Mon, May 18, 2015 at 12:28 PM, Jan Liße wrote:
>>>
>>>> Hello,
>>>>
>>>> I'm currently building a scraper system on top of Akka Streams. I have
>>>> written a Flow that is able to follow paginated sites and scrape them in a
>>>> loop.
>>>> For this I use a feedback merge.
>>>>
>>>> My code: <https://gist.github.com/janlisse/f2672bf8bbee009ef009>
>>>>
>>>> scrapePaginated takes a function that decides whether there are further
>>>> pages to scrape. If there are, it returns a Some() with the next URL as
>>>> part of the response tuple, and of course a None for the last page.
>>>> The iteration and the feedback loop work, and all pages are scraped
>>>> properly. But even when all URLs are processed, the stream never
>>>> completes: onComplete never gets invoked.
>>>> Is this expected behaviour, or is there an error in my scrapePaginated
>>>> method? I read the docs' chapter on graph deadlocks and liveness issues
>>>> and finally added a buffer step with OverflowStrategy.fail to the
>>>> feedback loop, but to no avail.
>>>> If it helps to clarify the problem, I can provide a simple Spec that
>>>> reproduces the issue.
>>>>
>>>
>>> I might be wrong here, but it seems like:
>>>
>>> - merge does not stop, because the feedback loop does not stop
>>> - the feedback loop does not stop, because unzip does not stop
>>> - unzip does not stop, because merge does not stop.
>>>
>>> If I am correct, then this is an interesting twist on the deadlock
>>> scenarios: it does not deadlock on elements/backpressure, but on the
>>> completion signal.
>>>
>>> Please open a ticket for discussion. I am not sure how to solve this in
>>> a generic fashion, but the collection of deadlock scenarios is growing,
>>> we need to provide an answer eventually, and I want this one documented
>>> in a ticket, too.
>>>
>>> -Endre
>>>
>>>
>>>>
>>>> Thanks in advance for any help!
>>>>
>>>> Jan
>>>>
>>>>
>>>> --
>>>> >>>>>>>>>> Read the docs: http://akka.io/docs/
>>>> >>>>>>>>>> Check the FAQ: http://doc.akka.io/docs/akka/current/additional/faq.html
>>>> >>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
>>>> ---
>>>> You received this message because you are subscribed to the Google
>>>> Groups "Akka User List" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to akka-user+unsubscribe@googlegroups.com.
>>>> To post to this group, send email to akka-user@googlegroups.com.
>>>> Visit this group at http://groups.google.com/group/akka-user.
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>>>