Hello José,

The use case is finding the frequency of occurrences of clumps of patterns
in a DNA sequence.

For example, given the following sequence:
CGGACTCGACAGATGTGAAGAACGACAATGTGAAGACTCGACACGACAGAGTGAAGAGAAGAGGAAACATTGTAA
and the numbers l = 50, k = 5 and t = 4,

a "clump-finding" algorithm will return all patterns of length k that appear
at least t times within some window (subsequence) of length l of the sequence.
The answers for the above sequence are: CGACA and GAAGA.
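
As a rough illustration of the definition above, here is a naive,
single-process sketch. The module name ClumpSketch and the function find/4
are hypothetical and not part of my actual solution. It assumes a recent
Elixir, where Enum.chunk_every/4 is the newer name for Enum.chunk and
Enum.frequencies/1 is available.

```elixir
defmodule ClumpSketch do
  # Hypothetical helper for illustration only.
  # Returns the k-mers that occur at least t times inside at least one
  # window of length l, as a sorted list of strings.
  def find(sequence, l, k, t) do
    sequence
    |> String.to_charlist()
    # Every window of length l, sliding one nucleotide at a time.
    |> Enum.chunk_every(l, 1, :discard)
    |> Enum.flat_map(&frequent_kmers(&1, k, t))
    |> Enum.uniq()
    |> Enum.map(&List.to_string/1)
    |> Enum.sort()
  end

  # Counts every (overlapping) k-mer in one window and keeps those
  # seen at least t times.
  defp frequent_kmers(window, k, t) do
    window
    |> Enum.chunk_every(k, 1, :discard)
    |> Enum.frequencies()
    |> Enum.filter(fn {_kmer, count} -> count >= t end)
    |> Enum.map(fn {kmer, _count} -> kmer end)
  end
end
```

Calling ClumpSketch.find(sequence, 50, 5, 4) on the sequence above should
give the two patterns listed. It recounts every window from scratch, so it
is only meant to pin down the definition, not to be fast.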

I have written a solution that begins with essentially two uses of chunk in
a row:

sequence
|> String.to_charlist
|> Enum.chunk(l, 1)
|> Flow.from_enumerable
|> Flow.map(fn e -> Enum.chunk(e, k, 1) end)
...


and it works just fine. If the sequence is coming from a file, I take
advantage of Stream.chunk.

Note that the sequence can easily be thousands, if not millions, of
nucleotides long, and for that reason I thought that GenStage (Flow) would
be a good tool.

I don't think windows would help for this use case; one has to slide the
window nucleotide by nucleotide and grab, as in the above example, 50
nucleotides at a time.
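
To make the difference concrete, here is a small sketch on a plain list,
without Flow. The module WindowSketch and its function names are
hypothetical. Batching by event count splits the input into
non-overlapping groups, which I call "tumbling" here, whereas this use
case needs a window that advances by one element.

```elixir
defmodule WindowSketch do
  # Hypothetical helpers for illustration only.

  # Sliding: each window starts one nucleotide after the previous one.
  def sliding(charlist, size), do: Enum.chunk_every(charlist, size, 1, :discard)

  # Tumbling: each window starts where the previous one ended,
  # which is what splitting purely by count would produce.
  def tumbling(charlist, size), do: Enum.chunk_every(charlist, size, size, :discard)
end
```

With a 10-nucleotide input and a size of 5, sliding/2 yields 6 overlapping
windows while tumbling/2 yields only 2, so any clump that straddles a
batch boundary would be missed by the second form.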

Having looked at the implementation of chunk in Stream, I can see that it
is fairly complicated.

Thanks for considering this.

Peter





On Wed, Nov 9, 2016 at 2:44 PM, José Valim <[email protected]>
wrote:

> Hi Peter, can you talk a bit more about your use case?
>
> chunk may be very sensitive to ordering so you will be chunking in no
> defined order. Maybe using a window that is based on event count may be a
> better fit semantically?
>
>
>
> *José Valim*
> www.plataformatec.com.br
> Skype: jv.ptec
> Founder and Director of R&D
>
> On Wed, Nov 9, 2016 at 8:13 PM, Peter C. Marks <[email protected]>
> wrote:
>
>> In an application of GenStage that I've been writing, I came across the
>> need for something like Stream.chunk/2 or Stream.chunk/4 but for Flows.
>>
>> Is there a reason why Flow.chunk is not defined?
>>
>> Thanks,
>>
>> Peter
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elixir-lang-core" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit https://groups.google.com/d/ms
>> gid/elixir-lang-core/7c5e942f-1e55-4f90-bc79-79dc0622d292%
>> 40googlegroups.com
>> <https://groups.google.com/d/msgid/elixir-lang-core/7c5e942f-1e55-4f90-bc79-79dc0622d292%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>



-- 
Peter C. Marks
@PeterCMarks
