Thank you, José, for your critique of my code and your suggested rewrite. Your code works great! (I just needed to add the parameter t to the call to find_sequences.) I will need to spend a little more time understanding why you suggested those changes. I plan on blogging about this soon.
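For anyone following along, here is a rough sketch of what passing t through could look like. It is not the exact code run in this thread, and it makes a few assumptions: the module name ClumpSketch and the function clumps/4 are made up for illustration; Enum.chunk_every/4 and Stream.chunk_every/4 with :discard are used in place of Enum.chunk/3 and Stream.chunk/3, which later Elixir releases deprecate; and a sequential Stream.flat_map/2 stands in for Flow.from_enumerable |> Flow.flat_map so the example runs without the Flow dependency.

```elixir
defmodule ClumpSketch do
  # Hypothetical module name, used only for this illustration.

  # Returns the k-mers that occur at least t times inside one window
  # (a charlist). This is the quoted find_sequences with t added as a
  # third parameter.
  def find_sequences(window, k, t) do
    window
    # Overlapping k-mers; :discard drops a trailing chunk shorter than k.
    |> Enum.chunk_every(k, 1, :discard)
    # Count how often each k-mer occurs in the window.
    |> Enum.reduce(%{}, fn kmer, acc -> Map.update(acc, kmer, 1, &(&1 + 1)) end)
    # Keep only k-mers seen at least t times.
    |> Enum.reject(fn {_kmer, n} -> n < t end)
    |> Enum.map(fn {kmer, _n} -> kmer end)
  end

  # Sequential stand-in for the Flow pipeline: slide a window of length l
  # over the sequence and collect the frequent k-mers of every window.
  # The same k-mer can appear more than once in the result, because each
  # window reports independently.
  def clumps(sequence, k, l, t) do
    sequence
    |> String.to_charlist()
    |> Stream.chunk_every(l, 1, :discard)
    |> Stream.flat_map(&find_sequences(&1, k, t))
    |> Enum.to_list()
  end
end
```

With Flow available, the Stream.flat_map/2 step would be replaced by Flow.from_enumerable followed by Flow.flat_map, as in the suggestion quoted below; calling Enum.uniq/1 on the result removes repeats reported by overlapping windows.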
Thanks again,
Peter

On Wed, Nov 9, 2016 at 6:57 PM, José Valim <[email protected]> wrote:

> Thanks Peter!
>
> I believe you don't want to call Flow.chunk/2. Calling Enum.chunk(l, 1)
> before Flow.from_enumerable/2 is the way to go in your case as it
> guarantees *chunks*, and not letters, are spread around on Flow.map/2. If
> instead you called Flow.chunk/2 after Flow.from_enumerable/2, the DNA order
> would be lost by the time you get to Flow.chunk/2. You would effectively
> chunk items in random order. I would possibly only suggest to use
> Stream.chunk/2 instead of Enum.chunk/2 (so you don't need to build all
> chunks upfront).
>
> On the other hand, if you are referring to the inner chunk in Flow.map/2,
> it also wouldn't yield the correct results, because you would be chunking
> groups of "e" and not a single "e" like now.
>
> Finally, it doesn't seem you need partitioning at all as well, since you
> are not reducing over any state (I may have misled you on a previous
> reply, sorry). My suggestion:
>
> sequence
> |> String.to_charlist
> |> Stream.chunk(l, 1)
> |> Flow.from_enumerable
> |> Flow.flat_map(&find_sequences(&1, k))
> |> Enum.to_list
>
> def find_sequences(e, k) do
>   e
>   |> Enum.chunk(k, 1)
>   |> Enum.reduce(%{}, fn w, acc ->
>     Map.update(acc, w, 1, & &1 + 1)
>   end)
>   |> Enum.reject(fn({_, n}) -> n < t end)
>   |> Enum.map(fn({seq, _}) -> seq end)
> end
>
> PS: I haven't tested it.
>
> *José Valim*
> www.plataformatec.com.br
> Skype: jv.ptec
> Founder and Director of R&D
>
> On Wed, Nov 9, 2016 at 11:18 PM, Peter C. Marks <[email protected]>
> wrote:
>
>> Yes, I do use partition. The full flow is:
>>
>> sequence
>> |> String.to_charlist
>> |> Enum.chunk(l, 1)
>> |> Flow.from_enumerable
>> |> Flow.partition
>> |> Flow.map(fn e -> Enum.chunk(e, k, 1) end)
>> |> Flow.map(
>>      fn e ->
>>        Enum.reduce(e, %{},
>>          fn w, acc ->
>>            Map.update(acc, w, 1, & &1 + 1)
>>          end)
>>      end)
>> |> Flow.flat_map(
>>      fn e ->
>>        Enum.reject(e, fn({_, n}) -> n < t end)
>>      end)
>> |> Flow.map(fn({seq, _}) -> seq end)
>> |> Enum.to_list
>>
>> On Wed, Nov 9, 2016 at 4:43 PM, José Valim <[email protected].br> wrote:
>>
>>>> sequence
>>>> |> String.to_charlist
>>>> |> Enum.chunk(l, 1)
>>>> |> Flow.from_enumerable
>>>> |> Flow.map(fn e -> Enum.chunk(e, k, 1) end)
>>>
>>> Do you call partition at some point in your flow? Otherwise it won't
>>> exploit parallelism if you have only one source. Also, if you need to chunk
>>> before you partition, you can chunk before calling from_enumerable:
>>>
>>> sequence
>>> |> String.to_charlist
>>> |> Stream.chunk(e, k, 1)
>>> |> Flow.from_enumerable
>>> |> ...
>>>
>>> I think it will be easy to add chunking to Flow because we can delegate
>>> to Stream but I just want to make sure I fully understand your use case and
>>> where parallelism is being introduced.
>>>
>>> --
>>> You received this message because you are subscribed to a topic in the
>>> Google Groups "elixir-lang-core" group.
>>> To unsubscribe from this topic, visit
>>> https://groups.google.com/d/topic/elixir-lang-core/Avea6YFZLRQ/unsubscribe.
>>> To unsubscribe from this group and all its topics, send an email to
>>> [email protected].
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/elixir-lang-core/CAGnRm4KpD-tf5p5sAS2nwZusd0reKdti-zL8-wzf%3DHjD8p%3D5qQ%40mail.gmail.com.
>>> For more options, visit https://groups.google.com/d/optout.
>>
>> --
>> Peter C. Marks
>> @PeterCMarks

--
Peter C. Marks
@PeterCMarks
