Michael,
I have been in that position -- let me file the performance issue on your
behalf.
Then you can follow this link
<https://github.com/JuliaLang/julia/issues/17581> to see what I did to make
it an issue.
Welcome,
Jeffrey
On Saturday, July 23, 2016 at 11:36:13 AM UTC-4, Stefan Karpinski wrote:
>
> Go here: https://github.com/JuliaLang/julia/issues/new; describe the
> issue (much as you did here) and submit. Thank you!
>
> On Fri, Jul 22, 2016 at 8:58 PM, Michael Prange wrote:
>
>> I'm new to Julia and do not know how to file a performance issue, but I
>> am happy to do it. Can you point me to the right place?
>>
>> On Jul 22, 2016, at 18:09, Stefan Karpinski wrote:
>>
>> Can you file a performance issue? The built-in circshift should not have
>> these performance issues.
>>
>> On Fri, Jul 22, 2016 at 4:18 PM, Michael Prange wrote:
>>
>>> I just discovered that Julia already has a function for circularly
>>> shifting the data in an array: circshift(A, shifts). However, its
>>> performance is worst of all. Using this new method,
>>>
>>> function fill_W4!{TF}(W::Matrix{TF}, icol::Int, w::Vector{TF}, ishift::Int)
>>>     @assert(size(W,1) == length(w), "Dimension mismatch between W and w")
>>>     # A negative shift moves entries upward, matching fill_W!
>>>     W[:,icol] = circshift(w,[-ishift,])
>>>     return
>>> end
>>>
>>>
>>> the resulting timings are given by (with new random numbers)
>>>
>>> fill_W!: 0.002918 seconds (4 allocations: 160 bytes)
>>> fill_W1!: 0.006440 seconds (10 allocations: 7.630 MB)
>>> fill_W2!: 0.009244 seconds (8 allocations: 7.630 MB, 21.61% gc time)
>>> fill_W3!: 0.002014 seconds (8 allocations: 352 bytes)
>>> fill_W4!: 0.049601 seconds (19 allocations: 30.518 MB, 3.63% gc time)
>>>
>>>
>>> I would have expected the built-in method circshift to achieve the best
>>> results, but it is worst in all categories: time, allocations and memory.
>>>
>>> Michael
>>>
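For comparison, here is a minimal sketch of writing the shifted vector straight into the column without a temporary array. It assumes a current Julia 1.x release, where `circshift!(dest, src, shifts)` and `view` are available in Base; the name `fill_col_circshift!` is hypothetical and does not appear in the thread. Whether this beats the hand-written loop on a given Julia version would need to be measured; no timings are claimed here.

```julia
# Sketch for Julia 1.x (assumption). `fill_col_circshift!` is a hypothetical
# name used only for illustration.
function fill_col_circshift!(W::Matrix{T}, icol::Int, w::Vector{T}, ishift::Int) where {T}
    @assert size(W, 1) == length(w) "Dimension mismatch between W and w"
    # A negative shift moves entries toward lower indices ("upward"),
    # so entry j of the column receives w[j + ishift], wrapping around.
    circshift!(view(W, :, icol), w, -ishift)
    return W
end

W = [1 2; 3 4; 5 6]
w = [7, 8, 9]
fill_col_circshift!(W, 2, w, 1)
```

The destination is a view of column `icol`, so the shifted values land directly in `W` rather than in a freshly allocated vector that is copied afterwards.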
>>> On Friday, July 22, 2016 at 2:23:16 PM UTC-4, Michael Prange wrote:
>>>>
>>>> Gunnar,
>>>>
>>>> Thank you for your explanation of the extra allocations and the tip
>>>> about sub. I implemented a version with sub as fill_W3!:
>>>>
>>>> function fill_W3!{TF}(W::Matrix{TF}, icol::Int, w::Vector{TF}, ishift::Int)
>>>>     @assert(size(W,1) == length(w), "Dimension mismatch between W and w")
>>>>     # Shift upward, matching fill_W!: row j receives w[j+ishift], wrapping around
>>>>     W[1:(end-ishift),icol] = sub(w, (ishift+1):length(w))
>>>>     W[(end-ishift+1):end,icol] = sub(w, 1:ishift)
>>>>     return
>>>> end
>>>>
>>>> Is this what you had in mind? I reran the tests above (with new random
>>>> numbers) and had the following results:
>>>>
>>>> fill_W!: 0.003234 seconds (4 allocations: 160 bytes)
>>>> fill_W1!: 0.005898 seconds (9 allocations: 7.630 MB)
>>>> fill_W2!: 0.005904 seconds (7 allocations: 7.630 MB)
>>>> fill_W3!: 0.002347 seconds (8 allocations: 352 bytes)
>>>>
>>>> Using sub consistently achieves better times than fill_W!, even though it
>>>> uses twice as many allocations as fill_W!. This seems to be the
>>>> way to go.
>>>>
>>>>
>>>> Michael
>>>>
>>>>
>>>> On Thursday, July 21, 2016 at 5:35:47 PM UTC-4, Gunnar Farnebäck wrote:
>>>>>
>>>>> fill_W1! allocates memory because it makes copies when constructing
>>>>> the right hand sides. fill_W2! allocates memory in order to construct the
>>>>> comprehensions (that you then discard). In both cases memory allocation
>>>>> could plausibly be avoided by a sufficiently smart compiler, but until
>>>>> Julia becomes that smart, have a look at the sub function to provide
>>>>> views instead of copies for the right hand sides of fill_W1!.
>>>>>
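The copies-versus-views point can be illustrated with a minimal sketch. It assumes a current Julia 1.x release, where `sub` has been replaced by `view` and the `@views` macro; the name `fill_col_views!` is hypothetical and is not one of the methods posted in the thread.

```julia
# Sketch for Julia 1.x (assumption): `@views` stands in for the old `sub`.
# `fill_col_views!` is a hypothetical name used only for illustration.
function fill_col_views!(W::Matrix{T}, icol::Int, w::Vector{T}, ishift::Int) where {T}
    @assert size(W, 1) == length(w) "Dimension mismatch between W and w"
    n = length(w)
    # A plain slice such as w[(ishift+1):n] allocates a copy of the data.
    # Inside @views, each slice is a lightweight view of the parent array.
    @views begin
        W[1:(n-ishift), icol] .= w[(ishift+1):n]
        W[(n-ishift+1):n, icol] .= w[1:ishift]
    end
    return W
end

W = [1 2; 3 4; 5 6]
w = [7, 8, 9]
fill_col_views!(W, 2, w, 1)
```

With the small matrix from the original question, column 2 ends up holding 8, 9, 7, which is the upward shift with periodic wrap-around described there.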
>>>>> On Thursday, July 21, 2016 at 5:07:34 PM UTC+2, Michael Prange wrote:
>>>>>>
>>>>>> I'm a new user, so have mercy in your responses.
>>>>>>
>>>>>> I've written a method that takes a matrix and vector as input and
>>>>>> then fills in column icol of that matrix with the vector of given values
>>>>>> that have been shifted upward by ishift indices with periodic boundary
>>>>>> conditions. To make this clear, given the matrix
>>>>>>
>>>>>> W = [1 2
>>>>>>      3 4
>>>>>>      5 6]
>>>>>>
>>>>>> the vector w = [7, 8, 9], icol = 2 and ishift = 1, the new value of W
>>>>>> is given by
>>>>>>
>>>>>> W = [1 8
>>>>>>      3 9
>>>>>>      5 7]
>>>>>>
>>>>>> I need a fast way of doing this for large matrices. I wrote three
>>>>>> methods that should (in my naive mind) give the same performance
>>>>>> results, but @time reports otherwise. The method definitions and the
>>>>>> performance results are given below. Can someone teach me why the
>>>>>> results are so different? The method fill_W! is too wordy for my
>>>>>> tastes, but the more compact notation in fill_W1! and fill_W2!
>>>>>> achieves poorer results. And why do these latter two methods allocate
>>>>>> so much memory when the whole point of these methods is to use
>>>>>> already-allocated memory?
>>>>>>
>>>>>> Michael
>>>>>>
>>>>>> ### Definitions
>>>>>>
>>>>>>
>>>>>> function fill_W1!{TF}(W::Matrix{TF}, icol::Int, w::Vector{TF}, ishift::Int)
>>>>>>     @assert(size(W,1) == length(w), "Dimension mismatch between W and w")
>>>>>>     W[1:(end-ishift),icol] = w[(ishift+1):end]
>>>>>>     W[(end-(ishift-1)):end,icol] = w[1:ishift]
>>>>>>     return
>>>>>> end
>>>>>>
>>>>>>
>>>>>> function fill_W2!{TF}(W::Matrix{TF}, icol::Int, w::Vector{TF}, ishift::Int)
>>>>>>     @assert(size(W,1) == length(w), "Dimension mismatch between W and w")
>>>>>>     [W[i,icol] = w[i+ishift] for i in 1:(length(w)-ishift)]
>>>>>>     [W[end-ishift+i,icol] = w[i] for i in 1:ishift]
>>>>>>     return
>>>>>> end
>>>>>>
>>>>>>
>>>>>> function fill_W!{TF}(W::Matrix{TF}, icol::Int, w::Vector{TF}, ishift::Int)
>>>>>>     @assert(size(W,1) == length(w), "Dimension mismatch between W and w")
>>>>>>     n = length(w)
>>>>>>     for j in 1:(n-ishift)
>>>>>>         W[j,icol] = w[j+ishift]
>>>>>>     end
>>>>>>     for j in (n-(ishift-1)):n
>>>>>>         W[j,icol] = w[j-(n-ishift)]
>>>>>>     end
>>>>>> end
>>>>>>
>>>>>>
>>>>>> ### Performance Results
>>>>>> julia>
>>>>>> W = rand(1000000,2)
>>>>>> w = rand(1000000)
>>>>>> println("fill_W!:")
>>>>>> println(@time fill_W!(W, 2, w, 2))
>>>>>> println("fill_W1!:")
>>>>>> println(@time fill_W1!(W, 2, w, 2))
>>>>>> println("fill_W2!:")
>>>>>> println(@time fill_W2!(W, 2, w, 2))
>>>>>>
>>>>>>
>>>>>> Out>
>>>>>> fill_W!:
>>>>>> 0.002801 seconds (4 allocations: 160 bytes)
>>>>>> nothing
>>>>>> fill_W1!:
>>>>>> 0.007427 seconds (9 allocations: 7.630 MB)
>>>>>> [0.152463397611579,0.6314166578356002]
>>>>>> fill_W2!:
>>>>>> 0.005587 seconds (7 allocations: 7.630 MB)
>>>>>> [0.152463397611579,0.6314166578356002]
>>>>>>
>>>>>>
>>>>>>
>>
>
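A note for anyone reproducing the `@time` comparisons in this thread: the first call to a Julia function includes compilation, so it helps to make one warm-up call before timing. The sketch below assumes Julia 1.x syntax; `fill_col_loop!` is a hypothetical name that mirrors the loop in fill_W!, and the array size is chosen only for illustration.

```julia
# Sketch for Julia 1.x (assumption). `fill_col_loop!` is a hypothetical name
# mirroring the explicit-loop method from the thread.
function fill_col_loop!(W::Matrix{T}, icol::Int, w::Vector{T}, ishift::Int) where {T}
    n = length(w)
    @assert size(W, 1) == n "Dimension mismatch between W and w"
    for j in 1:(n - ishift)
        W[j, icol] = w[j + ishift]          # entries shifted upward
    end
    for j in (n - ishift + 1):n
        W[j, icol] = w[j - (n - ishift)]    # wrapped-around entries
    end
    return W
end

W = rand(1000, 2)
w = rand(1000)
fill_col_loop!(W, 2, w, 2)         # warm-up call: triggers compilation
@time fill_col_loop!(W, 2, w, 2)   # timed call: compilation already done
```

Timing the second call keeps compilation out of the measurement, which makes comparisons between variants of the method more meaningful.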