Honestly, I'm pretty pleased with that performance. This kind of thing is
Go's bread and butter – being within a factor of 2 of Go at something like
this is really good. That said, if you do figure out anything that's a
bottleneck here, please file issues – there's no fundamental reason Julia
can't be just as fast as, or faster than, any other language at this.

On Tue, Apr 21, 2015 at 5:12 PM, Harry B <[email protected]> wrote:

> Hello,
>
> I had the need to take a text file with several million lines, construct a
> URL with parameters picked from the tab-delimited file, and fire the
> requests one after the other. After I read about Julia, I decided to try
> this in Julia. However, my initial implementation turned out to be slow,
> and I was getting close to my deadline. I set the Julia implementation
> aside and wrote the same thing in Go, my other favorite language. The Go
> version is at least twice as fast as the Julia version. Now that the
> task/deadline is over, I am coming back to the Julia version to see what
> I did wrong.
>
> The Go and Julia versions are not written alike. In Go, I have just one
> main thread reading the file and five goroutines waiting on a channel;
> one of them picks up the 'line/job', fires off the URL, waits for a
> response, parses the JSON, looks for an id in a specific place, and goes
> back to waiting for more items from the channel.
>
> The Julia code is very similar to the one discussed in the thread quoted
> below. I invoke Julia with -p 5 and have *each* process open the file and
> read all lines; however, each process only handles a fifth of the lines
> and skips the others. It is a slight modification of what was discussed
> in this thread:
> https://groups.google.com/d/msg/julia-users/Kr8vGwdXcJA/8ynOghlYaGgJ
>
> Julia code (without the server URL or the server source, though):
> https://github.com/harikb/scratchpad1/tree/master/julia2
> Server could be anything that returns a static JSON.
>
> The file fits entirely in the filesystem cache, and I am running this on
> a fairly large system (procinfo says 24 cores, 100G RAM, 50G or so free
> even after subtracting cache). The input file is only 875K. This should
> mean I can read the file several times in any programming language
> without skipping a beat. wc -l on the file takes only 0m0.002s. Any
> log/output is written to a fusion-io-based flash disk. All fairly high
> end.
>
> At this point, considering the machine is reasonably good, the only
> bottleneck should be the time each URL request takes (it is a GET
> request, but the other side has some processing to do) or the subsequent
> JSON parsing.
>
> Where do I go from here? How do I find out whether (a) HTTP connections
> are being re-used by the underlying library? I am using
> https://github.com/JuliaWeb/Requests.jl
> If they are not, that could explain the difference. And (b) how do I
> profile this code? I am using Julia 0.3.7 (since Requests.jl does not
> work with the 0.4 nightly).
>
> Any help is appreciated.
> Thanks
> --
> Harry
>
>
