Note: This proposal contains images and rich text that may not display correctly in your email. If so, you can read this proposal in a gist <https://gist.github.com/josevalim/5c6735a4b90acc1bafdafec09acabe4f>.
There is prior art in languages like Common Lisp, Haskell, and even in C# with LINQ on having very powerful comprehensions as part of the language. While Elixir comprehensions are already very expressive, allowing you to map, filter, reduce, and collect over multiple enumerables at the same time, it is still not capable of expressing other constructs, such as map_reduce. The challenge here is how to continue adding more expressive power to comprehensions without making the API feel massive. That's why, 7 years after v1.0, only two new options have been added to comprehensions, :uniq and :reduce, to a total of 3 (:into, :uniq, and :reduce). Imperative loops I have been on the record a couple times saying that, while many problems are more cleanly solved with recursion, there is a category of problems that are much more elegant with imperative loops. One of those problems have been described in the "nested-data-structures-traversal" <https://github.com/josevalim/nested-data-structure-traversal> repository, with solutions available in many different languages. Please read the problem statement in said repository, as I will assume from now on that you are familiar with it. Personally speaking, the most concise and clear solution is the Python one, which I reproduce here: section_counter = 1lesson_counter = 1 for section in sections: if section["reset_lesson_position"]: lesson_counter = 1 section["position"] = section_counter section_counter += 1 for lesson in section["lessons"]: lesson["position"] = lesson_counter lesson_counter += 1 There are many things that make this solution clear: - Reassignment - Mutability - Sensitive whitespace Let's compare it with the Elixir solution I wrote and personally prefer <https://github.com/josevalim/nested-data-structure-traversal/blob/master/elixir/map_reduce.exs>. I am pasting an image below which highlights certain aspects: [image: Screenshot 2021-12-13 at 10 02 48] <https://user-images.githubusercontent.com/9582/145821890-6557ea21-e61f-4813-8c54-53c4ea1a9438.png> - Lack of reassignment: in Elixir, we can't reassign variables, we can only rebind them. The difference is, when you do var = some_value inside a if, for, etc, the value won't "leak" to the outer scope. This implies two things in the snippet above: 1. We need to use Enum.map_reduce/3 and pass the state in and out (highlighted in red) 2. When resetting the lesson counter, we need both sides of the conditional (hihhlighted in yellow) - Lack of mutability: even though we set the lesson counter inside the inner map_reduce, we still need to update the lesson inside the session (highlighted in green) - Lack of sensitive whitespace: we have two additional lines with end in them (highlighted in blue) As you can see, do-end blocks add very litte noise to the final solution compared to sensitive whitespace. In fact, the only reason I brought it up is so we can confidently discard it from the discussion from now on. And also because there is zero chance of the language suddenly becoming whitespace sensitive. There is also zero chance of us introducing reassignment and making mutability first class in Elixir too. The reason for this is because we all agree that, the majority of the time, lack of reassignment and lack of mutability are features that make our code more readable and understandable in the long term. The snippet above is one of the few examples where we are on the wrong end of the trade-offs. Therefore, how can we move forward? Comprehensions Comprehensions in Elixir have always been a syntax sugar to more complex data-structure traversals. Do you want to have the cartesian product between all points in x and y? You could write this: Enum.flat_map(x, fn i -> Enum.map(y, fn j -> {i, j} end)end) Or with a comprehension: for i <- x, j <- y, do: {i, j} Or maybe you want to brute force your way into finding Pythagorean Triples? Enum.flat_map(1..20, fn a -> Enum.flat_map(1..20, fn b -> 1..20 |> Enum.filter(fn c -> a*a + b*b == c*c end) |> Enum.map(fn c -> {a, b, c} end) end)end) Or with a comprehension: for a <- 1..20, b <- 1..20, c <- 1..20, a*a + b*b == c*c, do: {a, b, c} There is no question the comprehensions are more concise and clearer, once you understand their basic syntax elements (which are, at this point, common to many languages). As mentioned in the introduction, we can express map, filter, reduce, and collect inside comprehensions. But how can we represent map_reduce in a clear and concise way? The :map_reduce option Since we have :reduce in comprehensions, we could introduce :map_reduce. The solution above would look like this: {sections, _acc} = for section <- sections, map_reduce: {1, 1} do {section_counter, lesson_counter} -> lesson_counter = if section["reset_lesson_position"], do: 1, else: lesson_counter {lessons, lesson_counter} = for lesson <- section["lessons"], map_reduce: lesson_counter do lesson_counter -> {Map.put(lesson, "position", lesson_counter), lesson_counter + 1} end section = section |> Map.put("lessons", lessons) |> Map.put("position", section_counter) {section, {section_counter + 1, lesson_counter}} end While there is a bit less noise compared to the original solution, the reduction of noise mostly happened by the removal of modules names and a few tokens, such as fn, (, and ). In terms of implementation, there is still a lot of book keeping required to manage the variables. Can we do better? Introducing :let Our goal is to declare variables that are automatically looped within the comprehension. So let's introduce a new option that does exactly that: :let. :let expects one or a tuple of variables that will be reused across the comprehension. At the end, :let returns a tuple with the comprehension elements and the let variables. Here is how the solution would look like: section_counter = 1lesson_counter = 1 {sections, _} = for section <- sections, let: {section_counter, lesson_counter} do lesson_counter = if section["reset_lesson_position"], do: 1, else: lesson_counter {lessons, lesson_counter} = for lesson <- section["lessons"], let: lesson_counter do lesson = Map.put(lesson, "position", lesson_counter) lesson_counter = lesson_counter + 1 lesson end section = section |> Map.put("lessons", lessons) |> Map.put("position", section_counter) section_counter = section_counter + 1 section end The :let option automatically takes care of passing the variables across the comprehension, considerably cutting down the noise, without introducing any mutability into the language. At the end, for+:let returns the result of the comprehension plus the :let variables wrapped in a tuple. Extensions Here are some extensions to the proposal above. Not all of them might be available on the initial implementation. Let initialization You can also initialize the variables within let for convenience: {sections, _} = for section <- sections, let: {section_counter = 1, lesson_counter = 1} do This should be available in the initial implementation. :reduce vs :let With :let, :reduce becomes somewhat redundant. For example, Enum.group_by/2 could be written as: for {k, v} <- Enum.reverse(list), reduce: %{} do acc -> Map.update(acc, k, [v], &[v | &1])end with :let: {_, acc} = for {k, v} <- Enum.reverse(list), let: acc = %{} do acc = Map.update(acc, k, [v], &[v | &1]) end The difference, however, is that :let returns the collection, while :reduce does not. While the Elixir compiler could be smart enough to optimize away building the collection in the :let case if we don't use it, we may want to keep both :let and :reduce options for clarity. If this is the case, I propose to align the syntaxes such that :reduce uses the same semantics as :let. The only difference is the return type: for {k, v} <- Enum.reverse(list), reduce: acc = %{} do acc = Map.update(acc, k, [v], &[v | &1])end This can be done in a backwards compatible fashion. after When you look at our solution to the problem using let, we had to introduce temporary variables in order to update our let variables: {lessons, lesson_counter} = for lesson <- section["lessons"], let: lesson_counter do lesson = Map.put(lesson, "position", lesson_counter) lesson_counter = lesson_counter + 1 lesson end One extension is to add after to the comprehensions, which are computed after the result is returned: {lessons, lesson_counter} = for lesson <- section["lessons"], let: lesson_counter do Map.put(lesson, "position", lesson_counter) after lesson_counter = lesson_counter + 1 end This does not need to be part of the initial implementation. Summary Feedback on the proposal and extensions is welcome! -- You received this message because you are subscribed to the Google Groups "elixir-lang-core" group. To unsubscribe from this group and stop receiving emails from it, send an email to elixir-lang-core+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/elixir-lang-core/CAGnRm4JjyZ2EUcYm1TA747pP2pTgDAD_%3DW%2BM9mSizFHJXFfqnQ%40mail.gmail.com.