I am trying to profile this code, so here is what I have so far. I added
the following code to the path taken in single-process mode.
I didn't bother with the multi-process one since I didn't know how to deal
with @profile and remotecall_wait.
@profile processOneFile(3085, 35649, filename)
bt, lidict = Profile.retrieve()
println("Profiling done")
for (k, v) in lidict
    println(v)
end
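For the multi-process path, maybe something like this would work (an
untested sketch; profile_on_worker is just a name I made up): wrap both
@profile and Profile.retrieve() in a function that runs entirely on the
worker, and use remotecall_fetch (instead of remotecall_wait) to bring the
result back, since each process keeps its own profile buffer.
@everywhere function profile_on_worker(a, b, filename)
    @profile processOneFile(a, b, filename)
    return Profile.retrieve()  # (backtrace data, line-info dict) from this worker
end
# run on worker 2 and pull its profile data back to the master process
bt, lidict = remotecall_fetch(2, profile_on_worker, 3085, 35649, filename)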
Output is here:
https://github.com/harikb/scratchpad1/blob/master/julia2/run1.txt (run
with julia 0.3.7), and another run is here:
https://github.com/harikb/scratchpad1/blob/master/julia2/run2.txt (run
with julia-debug 0.3.7), in case it gives better results.
However, quite a few lines are marked without line or file info.
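I suppose I could also look at Profile.print(), which (if I read the docs
right; I haven't dug into it much) prints an indented call tree or a flat
listing with sample counts, and might make it easier to see where the
unattributed frames sit:
Profile.print()                # call-tree view of the current process's samples
Profile.print(format = :flat)  # flat view, one line per (file, line) pair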
On Wednesday, April 22, 2015 at 2:44:13 AM UTC-7, Yuuki Soho wrote:
If I understand correctly now you are doing only 5 requests at the same
time? It seems to me you could do much more.
But that would only hide the inefficiency, at whatever level it exists. The
Go program also uses only 5 parallel connections.
On Wednesday, April 22, 2015 at 1:15:20 PM UTC-7, Stefan Karpinski wrote:
Honestly, I'm pretty pleased with that performance. This kind of thing
is Go's bread and butter – being within a factor of 2 of Go at something
like this is really good. That said, if you do figure out anything that's a
bottleneck here, please file issues – there's no fundamental reason Julia
can't be just as fast or faster than any other language at this.
Stefan, yes, it is about 2x if I subtract the roughly 10 seconds (as best I
can tell) of startup time. I am running Julia 0.3.7 on a box (RHEL) with a
deprecated GnuTLS. The deprecation warning comes about 8 seconds into the
run, and I wait another 2 seconds before I see the first print statement
from my code (the "Started N processes" message). My calculations already
exclude these 10 seconds.
I wonder if I would get better startup time with 0.4, but Requests.jl is
not compatible with it (nor have I found any other HTTP library that works
on 0.4). I will try 0.4 again and see if I can fix Requests.jl.
Any help on further analysis of the profile output is appreciated.
Thanks
--
Harry
On Thursday, April 23, 2015 at 7:21:11 AM UTC-7, Seth wrote:
>
> The use of Requests.jl makes this very hard to benchmark accurately since
> it introduces (non-measurable) dependencies on network resources.
>
> If you @profile the function, can you tell where it's spending most of its
> time?
>
> On Tuesday, April 21, 2015 at 2:12:52 PM UTC-7, Harry B wrote:
>>
>> Hello,
>>
>> I had the need to take a text file with several million lines, construct
>> a URL with parameters picked from the tab-delimited file, and fire them one
>> after the other. After I read about Julia, I decided to try this in Julia.
>> However, my initial implementation turned out to be slow and I was getting
>> close to my deadline. I then set the Julia implementation aside and wrote
>> the same thing in Go, my other favorite language. The Go version is at
>> least twice as fast as the Julia version. Now that the task/deadline is
>> over, I am coming back to the Julia version to see what I did wrong.
>>
>> The Go and Julia versions are not written alike. In Go, I have just one
>> main goroutine reading the file and 5 goroutines waiting on a channel; one
>> of them picks up the 'line/job', fires off the URL, waits for a response,
>> parses the JSON, looks for an id in a specific place, and goes back to
>> waiting for more items from the channel.
>>
>> The Julia code is very similar to the one discussed in the thread quoted
>> below. I invoke Julia with -p 5 and then have *each* process open the file
>> and read all lines. However, each process only processes 1/5th of the lines
>> and skips the others (a rough sketch of the pattern follows below). It is a
>> slight modification of what was discussed in this thread:
>> https://groups.google.com/d/msg/julia-users/Kr8vGwdXcJA/8ynOghlYaGgJ
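>>
>> A rough sketch of that pattern (hypothetical names, untested; the real
>> code is in the repo linked below) is:
>>
>> @everywhere function process_my_share(filename, nprocs, myindex)
>>     open(filename) do io
>>         for (i, line) in enumerate(eachline(io))
>>             # every process reads every line, but only handles its own slice
>>             i % nprocs == myindex - 1 || continue  # myindex runs from 1 to nprocs
>>             fields = split(chomp(line), '\t')
>>             # ... build the URL from `fields`, fire the GET, parse the JSON id ...
>>         end
>>     end
>> end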
>>
>> The Julia code (no server URL or source for that, though):
>> https://github.com/harikb/scratchpad1/tree/master/julia2
>> The server could be anything that returns a static JSON response.
>>
>> The files will fit entirely in the filesystem cache, and I am running this
>> on a fairly large system (procinfo says 24 cores, 100G RAM, 50G or so free
>> even after excluding cache). The input file is only 875K. This should
>> ideally mean I can read the files several times in any programming language
>> and not skip a beat. wc -l on the file takes only 0m0.002s. Any log/output
>> is written to a fusion-io based flash disk. All fairly high end.
>>
>> At this point, considering the machine is reasonably good, the only
>> bottleneck should be the time the URL requests take (it is a GET request,
>> but the other side has some processing to do) or the subsequent JSON
>> parsing.
>>
>> Where do I go from here? How do I find out whether HTTP connections are
>> being re-used by the underlying library? I am using
>> https://github.com/JuliaWeb/Requests.jl
>> If they are not, that could explain the difference. How do I profile this
>> code? I am using julia 0.3.7 (since Requests.jl does not work with 0.4
>> nightly).
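>>
>> One crude way I can think of to check the connection re-use question (just
>> an idea, not verified; example.com is a placeholder) is to time a few
>> back-to-back requests to the same host and see whether every request pays
>> the same setup cost as the first:
>>
>> using Requests
>> for i in 1:5
>>     t = @elapsed Requests.get("http://example.com/")  # placeholder URL
>>     println("request $i took $(t * 1000) ms")
>> end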
>>
>> Any help is appreciated.
>> Thanks
>> --
>> Harry
>>
>>