Note: This proposal contains images and rich text that may not display
correctly in your email. If so, you can read this proposal in a gist
<https://gist.github.com/josevalim/5c6735a4b90acc1bafdafec09acabe4f>.

There is prior art in languages like Common Lisp, Haskell, and even in C#
with LINQ on having very powerful comprehensions as part of the language.
While Elixir comprehensions are already very expressive, allowing you to map,
filter, reduce, and collect over multiple enumerables at the same time, it
is still not capable of expressing other constructs, such as map_reduce.

The challenge here is how to continue adding more expressive power to
comprehensions without making the API feel massive. That's why, 7 years
after v1.0, only two new options have been added to comprehensions, :uniq
and :reduce, to a total of 3 (:into, :uniq, and :reduce).
Imperative loops

I have been on the record a couple times saying that, while many problems
are more cleanly solved with recursion, there is a category of problems
that are much more elegant with imperative loops. One of those problems
have been described in the "nested-data-structures-traversal"
<https://github.com/josevalim/nested-data-structure-traversal> repository,
with solutions available in many different languages. Please read the
problem statement in said repository, as I will assume from now on that you
are familiar with it.

Personally speaking, the most concise and clear solution is the Python one,
which I reproduce here:

section_counter = 1lesson_counter = 1
for section in sections:
    if section["reset_lesson_position"]:
        lesson_counter = 1

    section["position"] = section_counter
    section_counter += 1

    for lesson in section["lessons"]:
        lesson["position"] = lesson_counter
        lesson_counter += 1

There are many things that make this solution clear:

   - Reassignment
   - Mutability
   - Sensitive whitespace

Let's compare it with the Elixir solution I wrote and personally prefer
<https://github.com/josevalim/nested-data-structure-traversal/blob/master/elixir/map_reduce.exs>.
I am pasting an image below which highlights certain aspects:

[image: Screenshot 2021-12-13 at 10 02 48]
<https://user-images.githubusercontent.com/9582/145821890-6557ea21-e61f-4813-8c54-53c4ea1a9438.png>

   -

   Lack of reassignment: in Elixir, we can't reassign variables, we can
   only rebind them. The difference is, when you do var = some_value inside
   a if, for, etc, the value won't "leak" to the outer scope. This implies
   two things in the snippet above:
   1. We need to use Enum.map_reduce/3 and pass the state in and out
      (highlighted in red)
      2. When resetting the lesson counter, we need both sides of the
      conditional (hihhlighted in yellow)
   -

   Lack of mutability: even though we set the lesson counter inside the
   inner map_reduce, we still need to update the lesson inside the session
   (highlighted in green)
   -

   Lack of sensitive whitespace: we have two additional lines with end in
   them (highlighted in blue)

As you can see, do-end blocks add very litte noise to the final solution
compared to sensitive whitespace. In fact, the only reason I brought it up
is so we can confidently discard it from the discussion from now on. And
also because there is zero chance of the language suddenly becoming
whitespace sensitive.

There is also zero chance of us introducing reassignment and making
mutability first class in Elixir too. The reason for this is because we all
agree that, the majority of the time, lack of reassignment and lack of
mutability are features that make our code more readable and understandable
in the long term. The snippet above is one of the few examples where we are
on the wrong end of the trade-offs.

Therefore, how can we move forward?
Comprehensions

Comprehensions in Elixir have always been a syntax sugar to more complex
data-structure traversals. Do you want to have the cartesian product
between all points in x and y? You could write this:

Enum.flat_map(x, fn i ->
  Enum.map(y, fn j -> {i, j} end)end)

Or with a comprehension:

for i <- x, j <- y, do: {i, j}

Or maybe you want to brute force your way into finding Pythagorean Triples?

Enum.flat_map(1..20, fn a ->
  Enum.flat_map(1..20, fn b ->
    1..20
    |> Enum.filter(fn c -> a*a + b*b == c*c end)
    |> Enum.map(fn c -> {a, b, c} end)
  end)end)

Or with a comprehension:

for a <- 1..20,
    b <- 1..20,
    c <- 1..20,
    a*a + b*b == c*c,
    do: {a, b, c}

There is no question the comprehensions are more concise and clearer, once
you understand their basic syntax elements (which are, at this point,
common to many languages).

As mentioned in the introduction, we can express map, filter, reduce, and
collect inside comprehensions. But how can we represent map_reduce in a
clear and concise way?
The :map_reduce option

Since we have :reduce in comprehensions, we could introduce :map_reduce.
The solution above would look like this:

{sections, _acc} =
  for section <- sections, map_reduce: {1, 1} do
    {section_counter, lesson_counter} ->
      lesson_counter = if section["reset_lesson_position"], do: 1,
else: lesson_counter

      {lessons, lesson_counter} =
        for lesson <- section["lessons"], map_reduce: lesson_counter do
          lesson_counter ->
            {Map.put(lesson, "position", lesson_counter), lesson_counter + 1}
        end

      section =
        section
        |> Map.put("lessons", lessons)
        |> Map.put("position", section_counter)

      {section, {section_counter + 1, lesson_counter}}
  end

While there is a bit less noise compared to the original solution, the
reduction of noise mostly happened by the removal of modules names and a
few tokens, such as fn, (, and ). In terms of implementation, there is
still a lot of book keeping required to manage the variables. Can we do
better?
Introducing :let

Our goal is to declare variables that are automatically looped within the
comprehension. So let's introduce a new option that does exactly that: :let.
:let expects one or a tuple of variables that will be reused across the
comprehension. At the end, :let returns a tuple with the comprehension
elements and the let variables.

Here is how the solution would look like:

section_counter = 1lesson_counter = 1
{sections, _} =
  for section <- sections,
      let: {section_counter, lesson_counter} do
    lesson_counter = if section["reset_lesson_position"], do: 1, else:
lesson_counter

    {lessons, lesson_counter} =
      for lesson <- section["lessons"], let: lesson_counter do
        lesson = Map.put(lesson, "position", lesson_counter)
        lesson_counter = lesson_counter + 1
        lesson
      end

    section =
      section
      |> Map.put("lessons", lessons)
      |> Map.put("position", section_counter)

    section_counter = section_counter + 1
    section
  end

The :let option automatically takes care of passing the variables across
the comprehension, considerably cutting down the noise, without introducing
any mutability into the language. At the end, for+:let returns the result
of the comprehension plus the :let variables wrapped in a tuple.
Extensions

Here are some extensions to the proposal above. Not all of them might be
available on the initial implementation.
Let initialization

You can also initialize the variables within let for convenience:

{sections, _} =
  for section <- sections,
      let: {section_counter = 1, lesson_counter = 1} do

This should be available in the initial implementation.
:reduce vs :let

With :let, :reduce becomes somewhat redundant. For example, Enum.group_by/2
could be written as:

for {k, v} <- Enum.reverse(list), reduce: %{} do
  acc -> Map.update(acc, k, [v], &[v | &1])end

with :let:

{_, acc} =
  for {k, v} <- Enum.reverse(list), let: acc = %{} do
    acc = Map.update(acc, k, [v], &[v | &1])
  end

The difference, however, is that :let returns the collection, while :reduce
does not. While the Elixir compiler could be smart enough to optimize away
building the collection in the :let case if we don't use it, we may want to
keep both :let and :reduce options for clarity. If this is the case, I
propose to align the syntaxes such that :reduce uses the same semantics as
:let. The only difference is the return type:

for {k, v} <- Enum.reverse(list), reduce: acc = %{} do
  acc = Map.update(acc, k, [v], &[v | &1])end

This can be done in a backwards compatible fashion.
after

When you look at our solution to the problem using let, we had to introduce
temporary variables in order to update our let variables:

{lessons, lesson_counter} =
  for lesson <- section["lessons"], let: lesson_counter do
    lesson = Map.put(lesson, "position", lesson_counter)
    lesson_counter = lesson_counter + 1
    lesson
  end

One extension is to add after to the comprehensions, which are computed
after the result is returned:

{lessons, lesson_counter} =
  for lesson <- section["lessons"], let: lesson_counter do
    Map.put(lesson, "position", lesson_counter)
  after
    lesson_counter = lesson_counter + 1
  end

This does not need to be part of the initial implementation.
Summary

Feedback on the proposal and extensions is welcome!

-- 
You received this message because you are subscribed to the Google Groups 
"elixir-lang-core" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elixir-lang-core+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elixir-lang-core/CAGnRm4JjyZ2EUcYm1TA747pP2pTgDAD_%3DW%2BM9mSizFHJXFfqnQ%40mail.gmail.com.

Reply via email to