The use of Requests.jl makes this very hard to benchmark accurately since it introduces (non-measurable) dependencies on network resources.
If you @profile the function, can you tell where it's spending most of its time?

On Tuesday, April 21, 2015 at 2:12:52 PM UTC-7, Harry B wrote:
>
> Hello,
>
> I had the need to take a text file with several million lines, construct a
> URL with parameters picked from the tab-delimited file, and fire them one
> after the other. After I read about Julia, I decided to try this in Julia.
> However, my initial implementation turned out to be slow and I was getting
> close to my deadline. I then set the Julia implementation aside and wrote
> the same thing in Go, my other favorite language. The Go version is at
> least twice as fast as the Julia version. Now that the task/deadline is
> over, I am coming back to the Julia version to see what I did wrong.
>
> The Go and Julia versions are not written alike. In Go, I have just one
> main thread reading the file and 5 goroutines waiting on a channel; one of
> them will get the 'line/job', fire off the URL, wait for a response, parse
> the JSON, look for an id in a specific place, and go back to wait for more
> items from the channel.
>
> The Julia code is very similar to the one discussed in the thread quoted
> below. I invoke Julia with -p 5 and then have *each* process open the file
> and read all lines. However, each process only processes 1/5th of the
> lines and skips the others. It is a slight modification of what was
> discussed in this thread:
> https://groups.google.com/d/msg/julia-users/Kr8vGwdXcJA/8ynOghlYaGgJ
>
> Julia code (no server URL or source for that, though):
> https://github.com/harikb/scratchpad1/tree/master/julia2
> The server could be anything that returns a static JSON.
>
> Considering that the file will fit entirely in the filesystem cache, and
> that I am running this on a fairly large system (procinfo says 24 cores,
> 100G RAM, 50G free even after subtracting cache) while the input file is
> only 875K, I should be able to read the file several times in any
> programming language without skipping a beat. wc -l on the file takes only
> 0m0.002s. Any log/output is written to a fusion-io based flash disk. All
> fairly high end.
>
> https://github.com/harikb/scratchpad1/tree/master/julia2
>
> At this point, considering the machine is reasonably good, the only
> bottleneck should be the time the URL request takes (it is a GET request,
> but the other side has some processing to do) or the subsequent JSON
> parsing.
>
> Where do I go from here? How do I find out (a) whether HTTP connections
> are being re-used by the underlying library? I am using this library:
> https://github.com/JuliaWeb/Requests.jl
> If not, that could explain the difference. How do I profile this code? I
> am using Julia 0.3.7 (since Requests.jl does not work with 0.4 nightly).
>
> Any help is appreciated.
> Thanks
> --
> Harry
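To expand on the @profile suggestion above, here is a minimal sketch of how you might profile the driver loop. The function name `process_file` and the path `"input.tsv"` are placeholders for your actual code; the loop body stands in for the Requests.get + JSON work:

```julia
# Placeholder for the real per-line work (build URL, fire GET, parse JSON).
function process_file(path)
    n = 0
    for line in eachline(open(path))
        n += 1   # stand-in for the Requests.jl call and JSON parsing
    end
    n
end

process_file("input.tsv")            # warm-up run so compilation isn't profiled
@profile process_file("input.tsv")   # run again, collecting profile samples
Profile.print()                      # report showing where the time was spent
```

If most samples land inside the HTTP call, the network round-trip (or lack of connection reuse) is the bottleneck rather than your Julia code.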
