Hello José,
Before I post to my blog, I want to make sure that I understand what you
meant when you said:
> I believe you don't want to call Flow.chunk/2. Calling Enum.chunk(l, 1)
> before Flow.from_enumerable/2 is the way to go in your case as it
> guarantees *chunks*, and not letters, are spread around on Flow.map/2. If
> instead you called Flow.chunk/2 after Flow.from_enumerable/2, the DNA order
> would be lost by the time you get to Flow.chunk/2. You would effectively
> chunk items in random order. I would possibly only suggest to use
> Stream.chunk/2 instead of Enum.chunk/2 (so you don't need to build all
> chunks upfront).
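For illustration only, here is a minimal sketch of the eager-versus-lazy difference in the suggestion quoted above. It assumes a recent Elixir, where Enum.chunk/Stream.chunk have been superseded by Enum.chunk_every/Stream.chunk_every. Passing :discard keeps the older behaviour of dropping incomplete trailing windows. The module name Windows is hypothetical.

```elixir
defmodule Windows do
  # Hypothetical helper module. Both functions return every window of `l`
  # consecutive bases, advancing one base at a time; :discard drops the
  # incomplete windows at the end, as the older Enum.chunk/3 did.

  # Eager: builds the full list of windows before returning.
  def eager(bases, l), do: Enum.chunk_every(bases, l, 1, :discard)

  # Lazy: returns a stream; windows are built only as a consumer asks for them.
  def lazy(bases, l), do: Stream.chunk_every(bases, l, 1, :discard)
end

# An infinite input works with the lazy version because Enum.take/2 only
# pulls the first three windows.
Stream.cycle(~c"ACGT")
|> Windows.lazy(4)
|> Enum.take(3)
|> IO.inspect()
```

Calling Windows.eager/2 on the same infinite input would never return. That is the "build all chunks upfront" cost the quote refers to, in its most extreme form.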
My original code:
sequence
|> String.to_charlist
|> Enum.chunk(l, 1) # CHUNK A
|> Flow.from_enumerable
|> Flow.partition
|> Flow.map(fn e -> Enum.chunk(e, k, 1) end) # CHUNK B
|> Flow.map(fn e ->
  Enum.reduce(e, %{}, fn w, acc ->
    Map.update(acc, w, 1, & &1 + 1)
  end)
end)
|> Flow.flat_map(fn e ->
  Enum.reject(e, fn({_, n}) -> n < t end)
end)
|> Flow.map(fn({seq, _}) -> seq end)
|> Enum.to_list
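Here is a tiny, hypothetical walk-through of the two chunk levels above that uses only the standard library. The values l = 5 and k = 3 are made up and much smaller than real window and k-mer sizes. Enum.chunk_every/4 with :discard stands in for the older Enum.chunk/3.

```elixir
# Hypothetical toy values: real windows are 50 to 500 bases, k-mers 5 to 9.
sequence = "ACGTACG"
l = 5
k = 3

# CHUNK A: every window of l consecutive bases, advancing one base at a time.
# :discard drops the incomplete windows at the end, as Enum.chunk/3 did.
windows =
  sequence
  |> String.to_charlist()
  |> Enum.chunk_every(l, 1, :discard)

# CHUNK B: every k-mer inside each window, again advancing one base at a time.
kmers_per_window = Enum.map(windows, &Enum.chunk_every(&1, k, 1, :discard))

IO.inspect(windows)
IO.inspect(kmers_per_window)
```

The windows here are ACGTA, CGTAC and GTACG. Every element leaving CHUNK A is already a whole window, so nothing downstream of it ever handles a lone base.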
Do you mean that, if Flow.chunk were implemented, it would behave
differently from Enum.chunk or Stream.chunk? Or are you only talking about
where one places the call to chunk? The first chunk (CHUNK A) typically
creates chunks that are 50 to 500 bases long. The second chunk (CHUNK B)
creates chunks from these chunks that are usually 5 to 9 bases long. I'm not
sure I see how single-base letters could arise unless the chunk size were
set to one.
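A toy model may help with the "letters" question. If the charlist went into Flow.from_enumerable without chunking first, each individual base would become an event. Once those events are spread across parallel stages (in the original code, by Flow.partition), a chunk step placed after that point would only see the bases its own stage happened to receive. The sketch below does not use Flow and does not reproduce Flow's real, demand-driven and non-deterministic dispatch. ScatterDemo, the round-robin dealing and the batch size are all hypothetical stand-ins.

```elixir
defmodule ScatterDemo do
  # Toy model, NOT Flow's dispatcher: deal `batch_size`-letter batches
  # round-robin to `stages` workers, then concatenate each worker's letters.
  def scatter(letters, stages, batch_size) do
    letters
    |> Enum.chunk_every(batch_size)
    |> Enum.with_index()
    |> Enum.group_by(fn {_batch, i} -> rem(i, stages) end, fn {batch, _i} -> batch end)
    |> Enum.sort()
    |> Enum.map(fn {_stage, batches} -> Enum.concat(batches) end)
  end

  # Chunking inside each worker only sees that worker's letters.
  def chunk_after_scatter(letters, stages, batch_size, l) do
    letters
    |> scatter(stages, batch_size)
    |> Enum.flat_map(&Enum.chunk_every(&1, l, 1, :discard))
  end
end

letters = ~c"ACGTTGCA"
IO.inspect(Enum.chunk_every(letters, 3, 1, :discard), label: "chunk before scatter")
IO.inspect(ScatterDemo.chunk_after_scatter(letters, 2, 2, 3), label: "chunk after scatter")
```

In this model, none of the four windows produced after scattering (ACT, CTG, GTC, TCA) occurs in the original sequence. That is the loss of DNA order described in the quote. Chunking before the data is distributed avoids it.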
Thanks very much,
Peter
On Fri, Nov 11, 2016 at 10:22 AM, José Valim <[email protected]> wrote:
> Glad to help! If you have any questions, feel free to ask.
>
>
>
> *José Valim*
> www.plataformatec.com.br
> Skype: jv.ptec
> Founder and Director of R&D
>
> On Fri, Nov 11, 2016 at 4:15 PM, Peter C. Marks <[email protected]> wrote:
>
>> Thank you, José, for your critique of my code and your suggested rewrite.
>> Your code works great! (I just needed to add the parameter t to the call to
>> find_sequences.) I will need to spend a little more time understanding why
>> you suggested those changes. I plan on blogging about this soon.
>>
>> Thanks again,
>>
>> Peter
>>
>> On Wed, Nov 9, 2016 at 6:57 PM, José Valim <[email protected]> wrote:
>>
>>> Thanks Peter!
>>>
>>> I believe you don't want to call Flow.chunk/2. Calling Enum.chunk(l, 1)
>>> before Flow.from_enumerable/2 is the way to go in your case as it
>>> guarantees *chunks*, and not letters, are spread around on Flow.map/2. If
>>> instead you called Flow.chunk/2 after Flow.from_enumerable/2, the DNA order
>>> would be lost by the time you get to Flow.chunk/2. You would effectively
>>> chunk items in random order. I would possibly only suggest to use
>>> Stream.chunk/2 instead of Enum.chunk/2 (so you don't need to build all
>>> chunks upfront).
>>>
>>> On the other hand, if you are referring to the inner chunk in
>>> Flow.map/2, it also wouldn't yield the correct results, because you would
>>> be chunking groups of "e" and not a single "e" like now.
>>>
>>> Finally, it doesn't seem you need partitioning at all, since you are not
>>> reducing over any state (I may have misled you in a previous reply,
>>> sorry). My suggestion:
>>>
>>> sequence
>>> |> String.to_charlist
>>> |> Stream.chunk(l, 1)
>>> |> Flow.from_enumerable
>>> |> Flow.flat_map(&find_sequences(&1, k, t))
>>> |> Enum.to_list
>>>
>>> def find_sequences(e, k, t) do
>>>   e
>>>   |> Enum.chunk(k, 1)
>>>   |> Enum.reduce(%{}, fn w, acc ->
>>>     Map.update(acc, w, 1, & &1 + 1)
>>>   end)
>>>   |> Enum.reject(fn({_, n}) -> n < t end)
>>>   |> Enum.map(fn({seq, _}) -> seq end)
>>> end
>>>
>>>
>>> PS: I haven't tested it.
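Since the suggestion above was untested, here is a minimal, stdlib-only sketch of the same logic that runs without the Flow dependency. Enum.flat_map/2 stands in for Flow.flat_map/2, t is passed explicitly (the fix Peter mentions later in the thread), and chunk_every with :discard replaces the older chunk calls. The module name Clumps is hypothetical. It is intended as a sequential reference to check Flow results against, not as the suggested implementation.

```elixir
defmodule Clumps do
  # Hypothetical module: a sequential, Flow-free version of the suggested
  # pipeline, used here only as a reference to check results against.

  # Builds every window of `l` bases, then collects the k-mers that occur
  # at least `t` times inside each window. As in the Flow version, a k-mer
  # that qualifies in several windows appears once per window.
  def find(sequence, l, k, t) do
    sequence
    |> String.to_charlist()
    |> Stream.chunk_every(l, 1, :discard)
    |> Enum.flat_map(&find_sequences(&1, k, t))
  end

  # Same steps as the suggested find_sequences, with `t` as an explicit argument.
  def find_sequences(window, k, t) do
    window
    |> Enum.chunk_every(k, 1, :discard)
    |> Enum.reduce(%{}, fn kmer, acc -> Map.update(acc, kmer, 1, &(&1 + 1)) end)
    |> Enum.reject(fn {_kmer, n} -> n < t end)
    |> Enum.map(fn {kmer, _n} -> kmer end)
  end
end
```

For example, Clumps.find("ACGACGATT", 7, 3, 2) yields ACG and CGA from the first window and CGA again from the second. Flow does not guarantee output order across stages, so sorting both results (or applying Enum.uniq/1 if only distinct k-mers matter) makes for a fair comparison.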
>>>
>>> On Wed, Nov 9, 2016 at 11:18 PM, Peter C. Marks <[email protected]> wrote:
>>>
>>>> Yes, I do use partition. The full flow is:
>>>>
>>>> sequence
>>>> |> String.to_charlist
>>>> |> Enum.chunk(l, 1)
>>>> |> Flow.from_enumerable
>>>> |> Flow.partition
>>>> |> Flow.map(fn e -> Enum.chunk(e, k, 1) end)
>>>> |> Flow.map(fn e ->
>>>>   Enum.reduce(e, %{}, fn w, acc ->
>>>>     Map.update(acc, w, 1, & &1 + 1)
>>>>   end)
>>>> end)
>>>> |> Flow.flat_map(fn e ->
>>>>   Enum.reject(e, fn({_, n}) -> n < t end)
>>>> end)
>>>> |> Flow.map(fn({seq, _}) -> seq end)
>>>> |> Enum.to_list
>>>>
>>>>
>>>>
>>>> On Wed, Nov 9, 2016 at 4:43 PM, José Valim <[email protected]> wrote:
>>>>
>>>>>
>>>>>
>>>>>> sequence
>>>>>> |> String.to_charlist
>>>>>> |> Enum.chunk(l, 1)
>>>>>> |> Flow.from_enumerable
>>>>>> |> Flow.map(fn e -> Enum.chunk(e, k, 1) end)
>>>>>>
>>>>>
>>>>> Do you call partition at some point in your flow? Otherwise it won't
>>>>> exploit parallelism if you have only one source. Also, if you need to
>>>>> chunk before you partition, you can chunk before calling from_enumerable:
>>>>>
>>>>> sequence
>>>>> |> String.to_charlist
>>>>> |> Stream.chunk(l, 1)
>>>>> |> Flow.from_enumerable
>>>>> |> ...
>>>>>
>>>>>
>>>>> I think it will be easy to add chunking to Flow, because we can
>>>>> delegate to Stream, but I just want to make sure I fully understand your
>>>>> use case and where parallelism is being introduced.
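On where parallelism is introduced: one stdlib-only way to see the chunk-before-distributing idea is Task.async_stream from Elixir's standard library, swapped in here for Flow. This is not Flow's API. The windows are produced in order by Stream.chunk_every/4 before anything is handed to a task, so each task counts k-mers over one complete window. ParallelClumps is a hypothetical name, and the sketch assumes Elixir 1.10 or later for Enum.frequencies/1.

```elixir
defmodule ParallelClumps do
  # Swapped-in technique: Task.async_stream (standard library) stands in
  # for Flow. Windows are built in order *before* work is distributed, so
  # every task receives one complete window, never loose bases.
  def find(sequence, l, k, t) do
    sequence
    |> String.to_charlist()
    |> Stream.chunk_every(l, 1, :discard)
    |> Task.async_stream(&kmers_at_least(&1, k, t))
    |> Enum.flat_map(fn {:ok, kmers} -> kmers end)
  end

  # k-mers occurring at least `t` times inside a single window.
  defp kmers_at_least(window, k, t) do
    window
    |> Enum.chunk_every(k, 1, :discard)
    |> Enum.frequencies()
    |> Enum.reject(fn {_kmer, n} -> n < t end)
    |> Enum.map(fn {kmer, _n} -> kmer end)
  end
end
```

Task.async_stream keeps results in input order by default (ordered: true), so unlike a Flow pipeline this output is deterministic. The point of the sketch is only where the split happens: after chunking, on whole windows.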
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to a topic in the
>>>>> Google Groups "elixir-lang-core" group.
>>>>> To unsubscribe from this topic, visit https://groups.google.com/d/topic/elixir-lang-core/Avea6YFZLRQ/unsubscribe.
>>>>> To unsubscribe from this group and all its topics, send an email to
>>>>> [email protected].
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/elixir-lang-core/CAGnRm4KpD-tf5p5sAS2nwZusd0reKdti-zL8-wzf%3DHjD8p%3D5qQ%40mail.gmail.com.
>>>>>
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Peter C. Marks
>>>> @PeterCMarks
>>>