Hello, I needed to take a text file with several million lines, construct a URL from parameters picked out of each tab-delimited line, and fire the requests one after the other. Having just read about Julia, I decided to try it for this. However, my initial implementation turned out to be slow and I was getting close to my deadline, so I set the Julia implementation aside and wrote the same thing in Go, my other favorite language. The Go version is at least twice as fast as the Julia version. Now that the task/deadline is over, I am coming back to the Julia version to see what I did wrong.
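To make the per-line work concrete, it has roughly this shape (the file name, column positions, and URL below are placeholders, not the real service):

open("input.tsv") do f
    for line in eachline(f)
        fields = split(chomp(line), '\t')
        length(fields) >= 2 || continue   # skip malformed lines
        url = "http://example.com/lookup?id=$(fields[1])&name=$(fields[2])"
        # fire the GET here (e.g. Requests.get(url)) and pull the id out of the JSON
    end
end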
The Go and Julia versions are not written alike. In Go, I have just one main goroutine reading the file and 5 worker goroutines waiting on a channel; whichever worker picks up a line/job fires off the URL, waits for the response, parses the JSON, looks for an id in a specific place, and goes back to waiting for more items from the channel.

The Julia code is very similar to the one discussed in the thread linked below. I invoke Julia with -p 5 and then have *each* process open the file and read all the lines, but each process only processes its 1/5th of the lines and skips the others (a rough sketch of that scheme is further down). It is a slight modification of what was discussed in this thread: https://groups.google.com/d/msg/julia-users/Kr8vGwdXcJA/8ynOghlYaGgJ. The Julia code (without the server URL or the server source) is here: https://github.com/harikb/scratchpad1/tree/master/julia2. The server could be anything that returns static JSON.

The file fits entirely in the filesystem cache, and I am running this on a fairly large machine: procinfo reports 24 cores and 100G of RAM, with 50G still free even after subtracting cache. The input file is only 875K, so I should be able to read it several times in any programming language without skipping a beat; wc -l on the file takes only 0m0.002s. Any log/output goes to a fusion-io based flash disk. All fairly high end.

Given that the machine is reasonably good, the only bottleneck should be the time each URL request takes (it is a GET, but the other side has some processing to do) or the subsequent JSON parsing. Where do I go from here? (a) How do I find out whether HTTP connections are being re-used by the underlying library? I am using https://github.com/JuliaWeb/Requests.jl; if they are not, that could explain the difference. (b) How do I profile this code? I am on julia 0.3.7, since Requests.jl does not work with the 0.4 nightly.
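For reference, the line-skipping scheme each worker runs looks roughly like this (simplified; the real version, and the driver that kicks it off on every worker, are in the repo above):

@everywhere function process_lines(path)
    offset = myid() - 2                      # with -p 5 the worker ids are 2..6
    n = nworkers()
    open(path) do f
        for (i, line) in enumerate(eachline(f))
            (i - 1) % n == offset || continue   # only handle this worker's 1/5th
            # build the URL from the tab-separated fields and fire it, as above
        end
    end
end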
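On the profiling question: what I was expecting to use is the built-in sampling profiler wrapped around the per-worker function (sketched above), something like the lines below, but I am not sure whether that is the right tool across the -p worker processes, or whether it will show the time spent inside Requests.jl:

@time process_lines("input.tsv")       # coarse wall-clock timing first
@profile process_lines("input.tsv")    # sample where the time actually goes
Profile.print()                        # dump the collected backtraces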
Any help is appreciated. Thanks -- Harry