I think it's fair to say that Profile.print() will be quite a lot more
informative than printing lidict yourself: that way all you're getting is the
list of lines visited, not anything about how much time each one takes.
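
For example, something like this (an untested sketch, reusing the call from
the snippet below) will show sample counts per line, which is what tracks
where the time actually goes:

    @profile processOneFile(3085, 35649, filename)
    Profile.print()               # tree view: counts nested by call chain
    Profile.print(format=:flat)   # flat view: one count per source line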

--Tim

On Thursday, April 23, 2015 04:19:08 PM Harry B wrote:
> I am trying to profile this code, so here is what I have so far. I added
> the following code to the path taken in single-process mode.
> I didn't bother with the multi-process one since I didn't know how to deal
> with @profile and remotecall_wait.
> 
>     @profile processOneFile(3085, 35649, filename)
>     bt, lidict = Profile.retrieve()
>     println("Profiling done")
>     for (k,v) in lidict
>         println(v)
>     end
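> 
> A rough, untested sketch of how @profile might be combined with the worker
> processes (wrapping the call in a function on the worker and fetching the
> profile data back; this assumes Julia 0.3's remotecall_fetch(worker_id, f,
> args...) argument order):
> 
>     @everywhere function profiledRun(a, b, fname)
>         @profile processOneFile(a, b, fname)
>         return Profile.retrieve()   # portable (backtraces, line-info dict)
>     end
>     bt, lidict = remotecall_fetch(2, profiledRun, 3085, 35649, filename)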
> 
> Output is here:
> https://github.com/harikb/scratchpad1/blob/master/julia2/run1.txt (ran with
> julia 0.3.7)
> Another run:
> https://github.com/harikb/scratchpad1/blob/master/julia2/run2.txt (ran with
> julia-debug 0.3.7), in case it gave better results.
> 
> However, quite a few of the entries have no line or file info.
> 
> On Wednesday, April 22, 2015 at 2:44:13 AM UTC-7, Yuuki Soho wrote:
> 
>     If I understand correctly, you are now doing only 5 requests at the
>     same time? It seems to me you could do many more.
> 
> But that would just hide the inefficiency, at whatever level it exists. The
> Go program also uses only 5 parallel connections.
> 
> On Wednesday, April 22, 2015 at 1:15:20 PM UTC-7, Stefan Karpinski wrote:
> 
>     Honestly, I'm pretty pleased with that performance. This kind of thing
> is Go's bread and butter – being within a factor of 2 of Go at something
> like this is really good. That said, if you do figure out anything that's a
> bottleneck here, please file issues – there's no fundamental reason Julia
> can't be just as fast as or faster than any other language at this.
> 
> Stefan, yes, it is about 2x if I subtract the 10 seconds or so (as it
> appears to me) of startup time. I am running Julia 0.3.7 on a box with a
> deprecated GnuTLS (RHEL). The deprecation warning message comes about 8
> seconds into the run, and I wait another 2 seconds before I see the first
> print statement from my code (the "Started N processes" message). My
> calculations already exclude these 10 seconds.
> I wonder if I would get better startup time with 0.4, but Requests.jl is not
> compatible with it (nor can I find any other library that works with 0.4). I
> will try 0.4 again and see if I can fix Requests.jl.
> 
> Any help is appreciated on further analysis of the profile output.
> 
> Thanks
> --
> Harry
> 
> On Thursday, April 23, 2015 at 7:21:11 AM UTC-7, Seth wrote:
> > The use of Requests.jl makes this very hard to benchmark accurately since
> > it introduces (non-measurable) dependencies on network resources.
> > 
> > If you @profile the function, can you tell where it's spending most of its
> > time?
> > 
> > On Tuesday, April 21, 2015 at 2:12:52 PM UTC-7, Harry B wrote:
> >> Hello,
> >> 
> >> I needed to take a text file with several million lines, construct a URL
> >> with parameters picked from each line of the tab-delimited file, and fire
> >> the requests one after the other. After I read about Julia, I decided to
> >> try this in Julia.
> >> However my initial implementation turned out to be slow and I was getting
> >> close to my deadline. I then set the Julia implementation aside and wrote
> >> the same thing in Go, my other favorite language. The Go version is at
> >> least twice as fast as the Julia version. Now that the task/deadline is
> >> over, I am coming back to the Julia version to see what I did wrong.
> >> 
> >> The Go and Julia versions are not written alike. In Go, I have just one
> >> main goroutine reading the file and 5 goroutines waiting on a channel;
> >> whichever one gets the 'line/job' fires off the URL, waits for the
> >> response, parses the JSON, looks for an id in a specific place, and goes
> >> back to waiting for more items from the channel.
> >> 
> >> The Julia code is very similar to the one discussed in the thread linked
> >> below. I invoke Julia with -p 5 and then have *each* process open the file
> >> and read all lines; however, each process only processes 1/5th of the
> >> lines and skips the others. It is a slight modification of what was
> >> discussed in this thread:
> >> https://groups.google.com/d/msg/julia-users/Kr8vGwdXcJA/8ynOghlYaGgJ
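> >> 
> >> In rough form (a simplified sketch of that pattern, not the actual code in
> >> the repo linked below), each worker does something like:
> >> 
> >>     @everywhere function processShare(fname, nw, widx)
> >>         for (i, line) in enumerate(eachline(open(fname)))
> >>             i % nw == widx - 1 || continue   # only this worker's share
> >>             fields = split(chomp(line), '\t')
> >>             # build the URL from `fields`, GET it via Requests.jl,
> >>             # parse the JSON response, pull out the id ...
> >>         end
> >>     end
> >> 
> >>     # Julia 0.3 argument order: remotecall_wait(worker_id, f, args...)
> >>     @sync for (idx, w) in enumerate(workers())
> >>         @async remotecall_wait(w, processShare, filename, nworkers(), idx)
> >>     end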
> >> 
> >> Julia code (without the server URL or the server source, though):
> >> https://github.com/harikb/scratchpad1/tree/master/julia2
> >> The server could be anything that returns a static JSON response.
> >> 
> >> The file fits entirely in the filesystem cache, and I am running this on a
> >> fairly large system (procinfo says 24 cores, 100G RAM, 50G or so free even
> >> after subtracting cached memory). The input file is only 875K. Ideally
> >> that means I can read the file several times in any programming language
> >> and not skip a beat; wc -l on the file takes only 0m0.002s. Any log/output
> >> is written to a fusion-io based flash disk. All fairly high end.
> >> 
> >> At this point, considering the machine is reasonably good, the only
> >> bottleneck should be the time each URL request takes (it is a GET request,
> >> but the other side has some processing to do) or the subsequent JSON
> >> parsing.
> >> 
> >> Where do I go from here? (a) How do I find out whether HTTP connections
> >> are being re-used by the underlying library? I am using
> >> https://github.com/JuliaWeb/Requests.jl
> >> If they are not, that could explain the difference. (b) How do I profile
> >> this code? I am using julia 0.3.7 (since Requests.jl does not work with
> >> 0.4 nightly).
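> >> 
> >> One crude check for (a), under the assumption that connection setup would
> >> show up as extra per-request latency (I do not know Requests.jl
> >> internals): time a few sequential requests to the same host and see
> >> whether the cost drops after the first call.
> >> 
> >>     using Requests
> >>     url = "http://example.com/endpoint"  # placeholder for the real service
> >>     for i in 1:5
> >>         @time Requests.get(url)          # rough wall-clock time per request
> >>     end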
> >> 
> >> Any help is appreciated.
> >> Thanks
> >> --
> >> Harry
