First off, I'm sorry if this has been discussed before - I've searched 
several places and haven't come across anything so that's why I'm here :) 
I'm sure there's a reason why this does not exist, but on the off chance it 
was simply never considered here's the formal request!

I believe there is currently a gap inside the Stream module with regards to 
reductions. As it stands it appears the only option available to someone 
who does not wish to buffer an entire File (for example) whilst running a 
Stream is Stream.run/1. As the documentation explains, this lets you run a 
Stream which may have side effects and ignores the value at the end. I 
believe there should be something similar which returns state at the end, 
namely Stream.reduce/3.

In the past, I have argued against this myself and said that there's no 
"good" use case but I've hit one I feel totally justifies the addition 
(assuming there's nothing I'm missing). Below is an example use case - it's 
a dumb example, but the concept is sound.

Consider you're streaming a file containing line separated numbers. Your 
task is to count the occurrences of even numbers, but the file is 50GB in 
size and you only have 4GB of RAM on your machine (you get the idea). So 
you can't just buffer the whole thing, because your app would crash. 
Instead you have to resort to a Stream, but you only ever need to keep 
track of a single variable; the number of even numbers.

If we look at the Stream library, it might appear that Stream.scan/3 is 
perfect for this as it passes the result of the last function through to 
the next execution - fantastic.

So we devise our call:

require Integer

"my_file.txt"
|> File.stream!
|> Stream.scan(0, fn(line, count) ->
     num = convert_to_number(line)

     if Integer.is_even(num) do
       count + 1
     else
       count
     end
   end)

This looks great because it means the last return value of our Stream is 
going to be the number of even numbers. But we then hit a wall, because 
Stream.run/1 doesn't return anything except :ok - and the only other way to 
get the last return value of the Stream is to do Enum.to_list/1 and walk 
the entire list until the end and then pull it out, which defeats the point 
of what we're trying to do.

It's because of this that I feel we need a Stream.reduce/3 which has the 
only intent of being able to run a Stream whilst returning a value - 
meaning that the last value returned by the callback would be the return 
value of the Stream.reduce/3 call.

Then you could do:

require Integer

"my_file.txt"
|> File.stream!
|> Stream.reduce(0, fn(line, count) ->
     num = convert_to_number(line)

     if Integer.is_even(num) do
       {:cont, count + 1}
     else
       {:cont, count}
     end
   end)
|> IO.inspect # prints a number of even numbers, without ever buffering the 
list

This has the powerful ability to return a single value after processing a 
Stream without ever holding the entire thing in memory, which is awesome 
(imagine all of the neat use cases for analytics apps).

I don't know much about how this would be implemented, but it appears to be 
a small modification of scan to simply keep the last returned value rather 
than add it to a list. That's partly why I feel it should be added - it's 
likely that Stream.scan/3 could re-use Stream.reduce/3, meaning there's 
hopefully only a small amount of code bloat.

Is there a reason this does not/could not exist? The closest thing I've 
found to implementing this appears to be with Stream.transform/4 but I have 
been unsuccessful and even if I got it to work it feels hacky.

Thank you in advance for your replies!

-- 
You received this message because you are subscribed to the Google Groups 
"elixir-lang-core" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elixir-lang-core/4b6a48d2-4855-4877-b00c-d72a26748aac%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to