The use of Requests.jl makes this very hard to benchmark accurately, since 
it introduces unmeasurable dependencies on network resources.

If you @profile the function, can you tell where it's spending most of its 
time?
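
In case it helps, here is a minimal profiling sketch. The function name is 
illustrative, not from your code; on 1.x `Profile` is a stdlib you import, 
while on 0.3.x it was available in Base:

```julia
using Profile   # stdlib on Julia 1.x; on 0.3.x it lived in Base

# Illustrative stand-in for the real per-line work.
function busywork(n)
    s = 0.0
    for i in 1:n
        s += sin(i)
    end
    return s
end

busywork(10)              # warm-up run so compilation isn't what gets profiled
@profile busywork(10^7)
Profile.print()           # sample counts per function/line; hot spots dominate
Profile.clear()           # reset before the next measurement
```

The lines with the highest sample counts are where the time goes; if most 
samples land inside the HTTP or JSON calls, the bottleneck is not your loop.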

On Tuesday, April 21, 2015 at 2:12:52 PM UTC-7, Harry B wrote:
>
> Hello,
>
> I needed to take a text file with several million lines, construct a URL 
> with parameters picked from the tab-delimited file, and fire the requests 
> one after the other. After reading about Julia, I decided to try this in Julia. 
> However, my initial implementation turned out to be slow and I was getting 
> close to my deadline, so I set the Julia implementation aside and wrote the 
> same thing in Go, my other favorite language. The Go version is at least 
> twice as fast as the Julia version. Now that the task/deadline is over, I 
> am coming back to the Julia version to see what I did wrong.
>
> The Go and Julia versions are not written alike. In Go, I have just one 
> main thread reading the file and five goroutines waiting on a channel; each 
> one picks up a line/job, fires off the URL, waits for the response, parses 
> the JSON, looks for an id in a specific place, and goes back to waiting for 
> more items from the channel. 
>
> The Julia code is very similar to the one discussed in the thread quoted 
> below. I invoke Julia with -p 5 and then have *each* process open the file 
> and read all lines; however, each process processes only 1/5th of the 
> lines and skips the others. It is a slight modification of what was 
> discussed in this thread: 
> https://groups.google.com/d/msg/julia-users/Kr8vGwdXcJA/8ynOghlYaGgJ
>
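
[For readers: the skip-every-Nth-line scheme described above can be sketched 
as follows; the function and file names are illustrative, not taken from the 
linked repository.]

```julia
# Every worker reads the whole file but keeps only every Nth line.
function my_lines(path, nworkers, worker)      # `worker` is 1-based
    out = String[]
    for (i, line) in enumerate(eachline(path))
        (i - 1) % nworkers == worker - 1 || continue   # not this worker's line
        push!(out, chomp(line))                # each kept line is tab-delimited
    end
    return out
end
```

Each of the five processes would call this with its own worker index; note 
that the file is still read five times in total, once per process.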
> Julia code (no server URL, or source for the server, though): 
> https://github.com/harikb/scratchpad1/tree/master/julia2
> The server could be anything that returns static JSON.
>
> The file fits entirely in the filesystem cache, and I am running this on a 
> fairly large system (procinfo says 24 cores, 100G RAM, 50G of it free even 
> after subtracting cache). The input file is only 875K. This should mean I 
> can read the file several times in any programming language without 
> skipping a beat; wc -l on the file takes only 0m0.002s. Any log/output is 
> written to a fusion-io based flash disk. All fairly high end.
>
> At this point, considering the machine is reasonably good, the only 
> bottleneck should be the time the requests take (each is a GET, but the 
> other side has some processing to do) or the subsequent JSON parsing.
>
> Where do I go from here? How do I find out (a) whether HTTP connections 
> are being re-used by the underlying library? I am using 
> https://github.com/JuliaWeb/Requests.jl
> If they are not, that could explain the difference. And (b) how do I 
> profile this code? I am using Julia 0.3.7 (since Requests.jl does not work 
> with the 0.4 nightlies).
>
> Any help is appreciated.
> Thanks
> --
> Harry
>
>
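
For what it's worth, the Go pattern you describe maps fairly directly onto 
Tasks and a Channel inside a single Julia process, which also avoids reading 
the file five times. A minimal sketch, assuming a newer Julia than the 0.3.7 
in this thread (Channel arrived later), with `handle` standing in for the 
fetch-URL-and-parse-JSON step:

```julia
# One feeder task plus `nworkers` consumer tasks, all in one process.
# `handle` stands in for "fire the URL, parse the JSON, extract the id".
function run_pool(lines, nworkers, handle)
    jobs    = Channel{String}(64)
    results = Channel{Any}(64)
    for _ in 1:nworkers
        @async for line in jobs          # each worker drains the jobs channel
            put!(results, handle(line))
        end
    end
    @async begin                         # feeder task
        for line in lines
            put!(jobs, line)
        end
        close(jobs)                      # lets the worker loops terminate
    end
    out = [take!(results) for _ in 1:length(lines)]
    close(results)
    return out                           # completion order, not input order
end
```

Because the tasks are cooperative, this overlaps the network waits without 
spawning extra processes; results come back in completion order, so carry the 
line along if you need to match inputs to outputs.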
