Re: Trouble using r/fold to optimise a reduce

Francis Avila Sat, 02 Apr 2016 20:37:10 -0700

You misunderstand how r/fold works.

My understanding is that reducers preserve the order of operation, so it 
> should give the correct answer.

This is not quite true. There are two fundamental principles behind 
reducers. The first is that the process of reduction, stripped of all 
non-essentials, is "apply this function individually to every item in a 
collection, accumulating the result". Notice that there is no mention of 
order.

The second principle is, when reduction has been thus "reduced" to its 
essentials, it is possible for a *collection* to control how reductions are 
done over itself. Whether a reduction process has a particular order is a 
non-essential property that is provided or not-provided by the collection 
itself. The IReduceInit protocol is purposefully agnostic to order. Vectors 
implement it in an ordered way, but hash maps have no meaningful order.

So that's IReduceInit. The coll-fold protocol takes advantage of delegating 
the reduction implementation to the collection by allowing collections to 
provide a potentially-parallel reduction strategy. How this works is that 
the collection divides itself up into chunks, runs a normal reduction over 
each chunk using the same init value, then applies a "combine" function to 
combine the chunks. In the case of vectors, the chunks are combined in 
order, so the operation you are performing does not need to be commutative. 
However, the combine function still needs to be *associative*, because each 
individual sub-reduction is not initialized with the result of the previous 
reduction.

Even stranger - the parallel version seems to at least produce an output 
> (not sure how correct) if run for the first 50 commands instead of 300.
>

The problem here is misunderstanding the "combine" step of fold. You supply 
the same function apply-cmd for both reducing and combining. However, the 
reducing function is called with a set of "on" lights and a command, but 
the combine function is called with two sets of on lights (i.e., it is 
called with the result of two reductions). So in the combine phase, r/fold 
is calling apply-cmd something like this: (apply-cmd #{[1 1]} #{[2 2]}), 
which is not what it expects, hence your strange exception about Longs.

Anyway, the combine function must be associative, but this advent problem 
is *not* associative, because you cannot rearrange the order in which the 
toggle functions are run. Each toggle function must always have the final 
state of every previous line done. You could compute the commands *in 
between* the toggle commands in parallel (essentially coalescing them into 
a single "turn on" command), but you still need to run the whole sequence 
of toggle commands, from start to finish, in order.

Here is a simple example to illustrate:

; Our commands. :toggle simplified to illustrate the point.
(def cmds [[:on #{1 2}] [:toggle 2] [:on #{3}]])
;=> #'advent.a6/cmds
; Our reduction function, with initializing arity
(defn apply-cmd
  ([] #{})
  ([on [cmd arg]]
   (case cmd
     :on (into on arg)
     :toggle (if (contains? on arg)
               (disj on arg)
               (conj on arg)))))
;=> #'advent.a6/apply-cmd
;; The correct answer, for reference
(r/reduce apply-cmd cmds)
;=> #{1 3}

Now let's perform the steps an r/fold would perform. We have three commands 
and will do two chunks, but we will divide the chunks differently. Here is 
what happens when we split between the second and third command:

; Chunk one
(r/reduce apply-cmd (subvec cmds 0 2))
;=> #{1}
; Chunk two
(r/reduce apply-cmd (subvec cmds 2))
;=> #{3}
; Combine
(into *1 *2)
;=> #{1 3}
; Looks ok...

Now lets try splitting between the first and second command:

; Chunk one
(r/reduce apply-cmd (subvec cmds 0 1))
;=> #{1 2}
; Chunk two
(r/reduce apply-cmd (subvec cmds 1))
;=> #{3 2}
;; Combine
(into *1 *2)
;=> #{1 3 2}
; WRONG ANSWER!!

Notice *how you split the commands up* affects the final answer, which 
means this algorithm can't be safely implemented with r/fold!

On Saturday, April 2, 2016 at 6:45:16 PM UTC-5, Divyansh Prakash wrote:
>
> Just verified - it works and gives the correct answer for shorter 
> collections. 
> No performance boost though. And fails (?) on larger collections.
> Not sure what's happening. The foldvec implementation is also pretty hard 
> to understand.
>
> On Sunday, April 3, 2016 at 2:38:26 AM UTC+5:30, Francis Avila wrote:
>>
>> Your input collection to r/fold is provided by cmds-from-input which 
>> returns a lazy-seq (from map) which is not a parallel-foldable type. You 
>> can try mapv instead: vectors are parallel-foldable. (Note only 
>> PersistentVector and PersistentHashMap have useful coll-fold 
>> implementations: all other objects (including sets) fall back on normal 
>> reduction: 
>> https://github.com/clojure/clojure/blob/d5708425995e8c83157ad49007ec2f8f43d8eac8/src/clj/clojure/core/reducers.clj#L347-L367
>> )
>>
>> Additionally, I'm not sure how this algorithm could be parallelized 
>> because the order in which you apply the toggle operation matters! I 
>> suspect if you make the mapv change I suggest you will get different final 
>> answers.
>>
>>
>>
>>
>>
>>
>> On Saturday, April 2, 2016 at 3:24:47 PM UTC-5, Divyansh Prakash wrote:
>>>
>>> Hi! 
>>> I'm solving the problem described here <http://adventofcode.com/day/6>. 
>>> I've got the solution 
>>> <https://github.com/divs1210/advent-of-code/blob/master/src/aoc/day6.clj>, 
>>> but it takes ~50 s to compute.
>>> I tried optimising it by replacing the main reduce operation with 
>>> r/fold, but it doesn't seem to have any effect on the performance for 
>>> whatever n (batch size) I use.
>>> Any suggestions?
>>>
>>> Note: I'm using a MacBook Pro with 8 cores. JDK 7. Clojure 1.8.
>>>
>>

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
"Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Trouble using r/fold to optimise a reduce

Reply via email to