Re: parallel sequence side-effect processor

Dragan Djuric Fri, 23 Sep 2016 16:49:24 -0700

There are a few typos, and defn is missing from fb in my last message, but I 
hope it is still readable. Sorry, I am typing this on a mobile device while 
watching the last episode of The Man in the High Castle :) Also, I am 
talking about code I wrote years ago, off the top of my head, without 
access to the REPL :)


On Saturday, September 24, 2016 at 1:44:10 AM UTC+2, Dragan Djuric wrote:
>
> A couple of things:
> 1. How fold/foldmap (or any other function) works depends on the actual 
> type. For example, if you look at 
> https://github.com/uncomplicate/neanderthal/blob/master/src/clojure/uncomplicate/neanderthal/impl/fluokitten.clj#L396
>  
> you can see that there are no intermediate allocations, and everything is 
> primitive.
> 2. Now, if you give foldmap a sequence (or a vector), it goes to the 
> implementation that you pointed to. The difference from map: if I 
> understand correctly, map would produce an unnecessary resulting sequence; 
> foldmap does not. Your accumulating function does not need to return 1, or 
> sequences - why not return nil? Also, the accumulator is nil, and you can 
> use any dummy function that just nils everything. That leaves only the 
> matter of calling first/next. Do they really produce any new object 
> instances? That depends on the implementation of seq, I believe, but it 
> would be the same even if we used loop/recur, I believe.
> 3. The sequences in your printout are the result of how Clojure 
> treats varargs, unless I am missing something. So, if I give it a function 
> such as (fb [_ a b] (println a b)), what exactly is allocated that is not 
> allocated even when using loop/recur directly with first/next?
>
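
To illustrate point 3 above, here is a minimal sketch using only clojure.core. It shows that a variadic function receives its rest arguments packed into a seq, while a fixed-arity function receives them directly. The names rest-args-seq? and pair-sum are hypothetical and are not part of fluokitten.

```clojure
;; Hypothetical helpers for illustration only; not part of fluokitten.

(defn rest-args-seq?
  "Variadic: the rest arguments arrive packed into a single seq,
  which is what (println \"fb \" args) was printing."
  [& args]
  (seq? args))

(defn pair-sum
  "Fixed arity: a and b are passed directly, with no wrapping seq.
  The first parameter is ignored, as in the fb example above."
  [_ a b]
  (+ a b))

(rest-args-seq? 1 4)   ; args is bound to the seq (1 4)
(pair-sum nil 1 4)     ; a = 1, b = 4
```

Whether a given foldmap implementation ends up calling the fixed-arity path without building such a seq depends on how it invokes the function, so this only shows the clojure.core side of the question.
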
> On Saturday, September 24, 2016 at 1:25:14 AM UTC+2, tbc++ wrote:
>>
>> Yeah, I have to call you out on this one Dragan. I ran the following 
>> code: 
>>
>> (ns fold-test
>>   (:require [uncomplicate.fluokitten.core :refer [foldmap]]))
>>
>> (defn fa [& args]
>>   (println "fa " args)
>>   1)
>>
>> (defn fb [& args]
>>   (println "fb " args)
>>   1)
>>
>> (defn test-fold []
>>   (foldmap fa nil fb [1 2 3] [4 5 6]))
>>
>> (test-fold)
>>
>>
>> This code produced: 
>>
>> fb  (1 4)
>> fa  (nil 1)
>> fb  (2 5)
>> fa  (1 1)
>> fb  (3 6)
>> fa  (1 1)
>>
>> So I put a breakpoint in `fb` and ran it again. The stacktrace says it 
>> ends up in algo/collection-foldmap which we can see here: 
>> https://github.com/uncomplicate/fluokitten/blob/master/src/uncomplicate/fluokitten/algo.clj#L415-L443
>>
>> That function is creating seqs out of all its arguments! So it really is 
>> not better than clojure.core/map as far as allocation is concerned. 
>>
>> Timothy
>>
>> On Fri, Sep 23, 2016 at 5:15 PM, Francis Avila <fav...@breezeehr.com> 
>> wrote:
>>
>>> There are a few intermediate collections here:
>>>
>>>
>>>    1. The source coll may produce a seq object. How costly this is 
>>>    depends on the type of coll and the quality of its iterator/ireduce/seq 
>>>    implementations.
>>>    2. You may need to collect multiple source colls into a tuple-like 
>>>    thing to produce a single object for the side-effecting function.
>>>    3. You may have an intermediate seq/coll of these tuple-like things.
>>>    4. You may have a useless seq/coll of "output" from the 
>>>    side-effecting function.
>>>
>>> In the single-coll case:
>>>
>>> (map f col1) pays 1,4.
>>> (doseq [x col1] (f x)) pays 1.
>>> (run! f col1) pays 1 if col1 has an inefficient IReduce, otherwise it 
>>> pays nothing.
>>> (fold f col1) is the same (using reducers r/fold protocol for vectors, 
>>> which ultimately uses IReduce)
>>>
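
The single-coll options listed above can be tried side by side. This is a minimal sketch using only clojure.core; the atom `seen` and the body of `f` are stand-ins invented here so the side effects can be checked. Note that `map` is lazy, so it is wrapped in `dorun` to force the calls.

```clojure
(def col1 [1 2 3])

;; Records every call so the side effects can be inspected afterwards.
(def seen (atom []))

(defn f
  "Stand-in side-effecting function."
  [x]
  (swap! seen conj x))

(dorun (map f col1))     ; builds a lazy seq of results that dorun walks and discards
(doseq [x col1] (f x))   ; walks a seq over col1 and keeps no results
(run! f col1)            ; goes through reduce and returns nil

@seen                    ; each form called f once per element
```

All three forms call `f` on every element in order; they differ only in which of the intermediate objects listed above they may create.
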
>>> In the multi-coll case:
>>>
>>> (map f col1 col2) pays all four.
>>> (run! (fn [[a b]] (f a b)) (map vector col1 col2)) pays 1, 2, and 3.
>>> (doseq [[a b] (map vector col1 col2)] (f a b)) pays 1, 2, 3.
>>> (fold f col1 col2) pays 1 from what I can see? (It uses first+next to 
>>> walk over the items stepwise? There's a lot of indirection so I'm not 100% 
>>> sure what the impl is for vectors that actually gets used.)
>>>
>>> There is no way to avoid 1 in the multi-coll case (or 2 if you are fully 
>>> variadic); all you can do is use the most efficient possible intermediate 
>>> object to track the traversal. Iterators are typically cheaper than seqs, 
>>> so the ideal case would be a loop-recur over multiple iterators.
>>>
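
The loop-recur over multiple iterators mentioned above could look roughly like this for the two-collection case. It is a sketch only: `run-pairs!` is a hypothetical name, it is not how fluokitten or clojure.core do it, and it assumes both arguments implement java.lang.Iterable (true of Clojure vectors, lists, and lazy seqs, but not of nil, strings, or arrays).

```clojure
(defn run-pairs!
  "Hypothetical helper: calls (f a b) for side effects on the items of
  c1 and c2 in step, stopping when either collection is exhausted.
  Assumes both collections implement java.lang.Iterable. Returns nil."
  [f ^Iterable c1 ^Iterable c2]
  (let [^java.util.Iterator i1 (.iterator c1)
        ^java.util.Iterator i2 (.iterator c2)]
    (loop []
      (when (and (.hasNext i1) (.hasNext i2))
        (f (.next i1) (.next i2))
        (recur)))))

(run-pairs! println [1 2 3] [4 5 6])
```

The only traversal objects created here are the two iterators; no tuple per step and no output collection are built.
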
>>> In the multi-coll case there is also no way IReduce can help. IReduce is 
>>> a trade: you give up the power to see each step of iteration in order to 
>>> allow the collection to perform the overall reduction operation more 
>>> efficiently. However with multi-coll you really do need to control the 
>>> iteration so you can get all the items at an index together.
>>>
>>> The ideal for multi-collection would probably be something that 
>>> internally looks like clojure.core/sequence but doesn't accumulate the 
>>> results. (Unfortunately some of the classes necessary to do this 
>>> (MultiIterator) are private.)
>>>
>>> Fluokitten could probably do it with some tweaking to its 
>>> algo/collection-foldmap to use iterators where possible instead of 
>>> first/next.
>>>
>>>
>>> On Friday, September 23, 2016 at 5:23:51 PM UTC-5, Dragan Djuric wrote:
>>>>
>>>> fluokitten's fold is MUCH better than (map f a b) because it does NOT 
>>>> create intermediate collections. Just use (fold f a b) and it would fold 
>>>> everything into one thing (in this case nil). If f is a function with side 
>>>> effects, it will invoke them. No intermediate collection is created AND 
>>>> the folding would be optimized per the type of a.
>>>>
>>>> On Friday, September 23, 2016 at 10:56:00 PM UTC+2, tbc++ wrote:
>>>>>
>>>>> How is fluokitten's fold any better than using seqs like (map f a b) 
>>>>> would? Both create intermediate collections.
>>>>>
>>>>> On Fri, Sep 23, 2016 at 11:40 AM, Dragan Djuric <drag...@gmail.com> 
>>>>> wrote:
>>>>>
>>>>>> If you do not insist on vanilla Clojure, but can use a library, fold 
>>>>>> from fluokitten might enable you to do this. It is similar to reduce, 
>>>>>> but accepts multiple arguments. Give it a vararg folding function that 
>>>>>> prints what you need and ignores the first parameter, and you'd get 
>>>>>> what you asked for.
>>>>>>
>>>>>>
>>>>>> On Friday, September 23, 2016 at 7:15:42 PM UTC+2, Mars0i wrote:
>>>>>>>
>>>>>>> On Friday, September 23, 2016 at 11:11:07 AM UTC-5, Alan Thompson 
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Huh.  I was also unaware of the run! function.
>>>>>>>>
>>>>>>>> I suppose you could always write it like this:
>>>>>>>>
>>>>>>>> (def x (vec (range 3)))
>>>>>>>> (def y (vec (reverse x)))
>>>>>>>>
>>>>>>>> (run!
>>>>>>>>   (fn [[x y]] (println x y))
>>>>>>>>   (map vector x y))
>>>>>>>>
>>>>>>>>
>>>>>>>>  > lein run
>>>>>>>> 0 2
>>>>>>>> 1 1
>>>>>>>> 2 0
>>>>>>>>
>>>>>>>>
>>>>>>> Yes.  But that's got the same problem.  Doesn't matter with a toy 
>>>>>>> example, but the (map vector ...) could be undesirable with large 
>>>>>>> collections in performance-critical code.
>>>>>>>
>>>>>>>> Although the plain old for loop with dotimes looks simpler:
>>>>>>>>
>>>>>>>> (dotimes [i (count x)]
>>>>>>>>   (println (x i) (y i)))
>>>>>>>>
>>>>>>>>
>>>>>>>> Maybe that is the best answer? It is hard to beat the flexibility 
>>>>>>>> of a loop and an explicit index.
>>>>>>>>
>>>>>>>
>>>>>>> I agree that this is clearer, but it kind of bothers me to index 
>>>>>>> through a vector sequentially in Clojure.  We need indexing in Clojure 
>>>>>>> because sometimes you need to access a vector more arbitrarily.  If 
>>>>>>> you're just walking the vector in order, we have better methods--as 
>>>>>>> long as we don't want to walk multiple vectors in the same order for 
>>>>>>> side effects.
>>>>>>>
>>>>>>> However, the real drawback of the dotimes method is that it's not 
>>>>>>> efficient for the general case; it could be slow on lists, lazy 
>>>>>>> sequences, etc. (again, on non-toy examples).  Many of the most 
>>>>>>> convenient Clojure functions return lazy sequences.  Even the non-lazy 
>>>>>>> sequences returned by transducers aren't efficiently indexable, afaik.  
>>>>>>> Of course you can always throw any sequence into 'vec' and get out a 
>>>>>>> vector, but that's an unnecessary transformation if you just want to 
>>>>>>> iterate through the sequences element by element.
>>>>>>>
>>>>>>> If I'm writing a function that will plot points or that will write 
>>>>>>> data to a file, it shouldn't be a requirement for the sake of 
>>>>>>> efficiency that the data come in the form of vectors.  I should be 
>>>>>>> able to pass in the data in whatever form is easiest.  Right now, if I 
>>>>>>> wanted efficiency for walking through sequences in the same order, 
>>>>>>> without creating unnecessary data structures, I'd have to write the 
>>>>>>> function using loop/recur.  On the other hand, if I wanted the cross 
>>>>>>> product of the sequences, I'd use doseq and be done a lot quicker with 
>>>>>>> clearer code.
>>>>>>>
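
The loop/recur approach mentioned above might be sketched as follows for two sequences. `walk-together!` is a hypothetical name chosen here; the function accepts anything seqable (vectors, lists, lazy sequences) and needs no indexing, though it may still create seq objects to track the traversal.

```clojure
(defn walk-together!
  "Hypothetical helper: calls (f a b) for side effects on the items of
  xs and ys in step, stopping at the end of the shorter one. Returns nil."
  [f xs ys]
  (loop [xs (seq xs)
         ys (seq ys)]
    (when (and xs ys)
      (f (first xs) (first ys))
      (recur (next xs) (next ys)))))

;; Same data as the run! example earlier in the thread.
(walk-together! println (range 3) (reverse (range 3)))
```

No tuples and no result collection are built, which is the part that (map vector ...) adds.
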
>>>>>> -- 
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "Clojure" group.
>>>>>> To post to this group, send email to clo...@googlegroups.com
>>>>>> Note that posts from new members are moderated - please be patient 
>>>>>> with your first post.
>>>>>> To unsubscribe from this group, send email to
>>>>>> clojure+u...@googlegroups.com
>>>>>> For more options, visit this group at
>>>>>> http://groups.google.com/group/clojure?hl=en
>>>>>> --- 
>>>>>> You received this message because you are subscribed to the Google 
>>>>>> Groups "Clojure" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it, 
>>>>>> send an email to clojure+u...@googlegroups.com.
>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> -- 
>>>>> “One of the main causes of the fall of the Roman Empire was 
>>>>> that–lacking zero–they had no way to indicate successful termination of 
>>>>> their C programs.”
>>>>> (Robert Firth) 
>>>>>
>>>
>>
>>
>

