Here is the output of Profile.print():
https://github.com/harikb/scratchpad1/blob/master/julia2/run3.txt
I don't know how to interpret these results, but I would guess this is
where most of the time is spent:
10769 stream.jl; stream_wait; line: 263
10774 stream.jl; readavailable; line: 709
10774 stream.jl; wait_readnb; line: 316
Is the issue that stream.jl is reading byte by byte? If a Content-Length is
available in the response header (and I know it is), the body could probably
be read as one chunk.
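A minimal sketch of the two reading strategies in question, assuming Julia 1.x rather than 0.3.7. An in-memory `IOBuffer` stands in for the socket, and `read_bytewise` and `read_chunk` are hypothetical helpers; this does not show what stream.jl or Requests.jl actually do. On a real socket even a single large read has to wait until the bytes arrive, so samples in `wait_readnb` may reflect time spent waiting on the server rather than per-byte overhead.

```julia
# Two ways to consume a response body whose length is known up front
# (e.g. from a Content-Length header). An IOBuffer stands in for the socket.

# Hypothetical helper: one read call per byte.
function read_bytewise(io::IO, n::Integer)
    buf = Vector{UInt8}(undef, n)
    for i in 1:n
        buf[i] = read(io, UInt8)   # one call per byte
    end
    return buf
end

# Hypothetical helper: a single call asking for all n bytes at once.
read_chunk(io::IO, n::Integer) = read(io, n)   # returns up to n bytes

body = Vector{UInt8}("{\"id\": 42}")
a = read_bytewise(IOBuffer(body), length(body))
b = read_chunk(IOBuffer(body), length(body))
println(a == b)   # true
```

Both helpers return identical bytes here; whether the library behaves like the first or the second is something the profile (or the library source) would have to confirm.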
Again, I am only taking a shot in the dark, so I should probably stop
speculating.
Any help with the next steps is appreciated.
--
Harry
On Thursday, April 23, 2015 at 5:52:09 PM UTC-7, Tim Holy wrote:
>
> I think it's fair to say that Profile.print() will be quite a lot more
> informative---all you're getting is the list of lines visited, not anything
> about how much time each one takes.
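To expand on this point, a minimal sketch assuming Julia 1.x, where `Profile` is a standard library loaded with `using Profile` (in 0.3.x, `@profile` and `Profile.print()` are available from Base). `work` is a hypothetical CPU-bound stand-in for `processOneFile`. The flat report gives each line a sample count, which is the "how much time" information that looping over `lidict` does not provide.

```julia
using Profile

# Hypothetical stand-in for processOneFile: some CPU-bound work.
function work(n)
    s = 0.0
    for i in 1:n
        s += sqrt(i)
    end
    return s
end

work(10)          # run once first so JIT compilation is not what gets profiled
Profile.clear()   # drop samples left over from any earlier @profile call

# @profile returns the value of the profiled expression.
result = @profile sum(work(10_000_000) for _ in 1:20)

# Flat report: one row per (file, function, line) with its sample count,
# sorted by count instead of by file name.
Profile.print(format = :flat, sortedby = :count)
```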
>
> --Tim
>
> On Thursday, April 23, 2015 04:19:08 PM Harry B wrote:
> > I am trying to profile this code, so here is what I have so far. I added
> > the following code to the path taken in single-process mode.
> > I didn't bother with the multi-process one, since I didn't know how to deal
> > with @profile and remotecall_wait.
> >
> > @profile processOneFile(3085, 35649, filename)
> > bt, lidict = Profile.retrieve()
> > println("Profiling done")
> > for (k,v) in lidict
> >     println(v)
> > end
> >
> > Output is here (ran with julia 0.3.7):
> > https://github.com/harikb/scratchpad1/blob/master/julia2/run1.txt
> > Another run, with julia-debug 0.3.7 in case it gave better results:
> > https://github.com/harikb/scratchpad1/blob/master/julia2/run2.txt
> >
> > However, quite a few lines are marked without line or file info.
> >
> > On Wednesday, April 22, 2015 at 2:44:13 AM UTC-7, Yuuki Soho wrote:
> > >
> > > If I understand correctly, you are now doing only 5 requests at a
> > > time? It seems to me you could do many more.
> >
> > But that would hide the inefficiency, at whatever level it exists. The Go
> > program also uses only 5 parallel connections.
> >
> > On Wednesday, April 22, 2015 at 1:15:20 PM UTC-7, Stefan Karpinski wrote:
> > >
> > > Honestly, I'm pretty pleased with that performance. This kind of thing
> > > is Go's bread and butter; being within a factor of 2 of Go at something
> > > like this is really good. That said, if you do figure out anything that's
> > > a bottleneck here, please file issues: there's no fundamental reason
> > > Julia can't be just as fast as, or faster than, any other language at this.
> >
> > Stefan, yes, it is about 2x if I subtract the roughly 10 seconds that
> > appear to be startup time. I am running Julia 0.3.7 on a box with a
> > deprecated GnuTLS (RHEL). The deprecation warning message comes about 8
> > seconds into the run, and I wait another 2 seconds before I see the first
> > print statement from my code (the "Started N processes" message). My
> > calculations already exclude these 10 seconds.
> > I wonder if I would get better startup time with 0.4, but Requests.jl is
> > not compatible with it (nor can I find any other library for 0.4). I will
> > try 0.4 again and see if I can fix Requests.jl.
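To keep the roughly 10 seconds of startup (and JIT warm-up) out of the comparison by construction, the script itself can time just the work. A minimal sketch assuming Julia 1.x; `process_lines` and the synthetic `lines` are hypothetical placeholders for the real per-file work and input.

```julia
# Hypothetical placeholder for the real work (building URLs, firing requests, ...).
process_lines(lines) = sum(length, lines)

lines = ["a\tb\tc" for _ in 1:100_000]   # synthetic tab-delimited input

process_lines(lines[1:10])   # warm-up call so JIT compilation is not timed

t0 = time()                  # wall-clock seconds since the epoch
n = process_lines(lines)
elapsed = time() - t0

println("processed $n characters in $(round(elapsed; digits = 3)) s")
```

Timing the same region in the Go program gives a comparison that no longer depends on estimating the startup time.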
> >
> > Any help with further analysis of the profile output is appreciated.
> >
> > Thanks
> > --
> > Harry
> >
> > On Thursday, April 23, 2015 at 7:21:11 AM UTC-7, Seth wrote:
> > > The use of Requests.jl makes this very hard to benchmark accurately, since
> > > it introduces (non-measurable) dependencies on network resources.
> > >
> > > If you @profile the function, can you tell where it's spending most of its
> > > time?
> > >
> > > On Tuesday, April 21, 2015 at 2:12:52 PM UTC-7, Harry B wrote:
> > >> Hello,
> > >>
> > >> I needed to take a text file with several million lines, construct
> > >> a URL with parameters picked from the tab-delimited file, and fire the
> > >> requests one after the other. After I read about Julia, I decided to try
> > >> this in Julia. However, my initial implementation turned out to be slow
> > >> and I was getting close to my deadline. I then set the Julia
> > >> implementation aside and wrote the same thing in Go, my other favorite
> > >> language. The Go version is at least twice as fast as the Julia version.
> > >> Now that the deadline is over, I am coming back to the Julia version to
> > >> see what I did wrong.
> > >>
> > >> The Go and Julia versions are not written alike. In Go, I have just one
> > >> main thread reading the file and 5 goroutines waiting on a channel; one
> > >> of them gets the 'line/job', fires off the URL, waits for a response,
> > >> parses the JSON, looks for an id in a specific place, and goes back to
> > >> waiting for more items from the channel.
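For comparison, the Go design described above (one reader feeding a channel, five workers taking jobs from it) maps fairly directly onto tasks and a `Channel` in Julia 1.x; `Channel` is not available in 0.3.7. A sketch under stated assumptions: `fetch_id` is a hypothetical stand-in for firing the URL and parsing the JSON, and all tasks share one process, so this models concurrent I/O within a process rather than the `-p 5` multi-process setup.

```julia
# Hypothetical stand-in for: build URL, send GET, parse JSON, extract the id.
function fetch_id(line::AbstractString)
    sleep(0.01)                               # pretend network latency
    return String(first(split(line, '\t')))   # pretend the id is the first field
end

function run_pool(lines; ntasks::Int = 5)
    jobs = Channel{String}(32)                 # bounded queue, like a buffered Go channel
    results = Channel{String}(length(lines))   # large enough that workers never block

    # Producer: a single task feeds every line into the channel, then closes it.
    producer = @async begin
        for line in lines
            put!(jobs, line)
        end
        close(jobs)
    end

    # Worker body: keep taking lines until jobs is closed and drained.
    function worker()
        for line in jobs
            put!(results, fetch_id(line))
        end
    end
    workers = [@async(worker()) for _ in 1:ntasks]

    wait(producer)
    foreach(wait, workers)
    close(results)
    return collect(results)                    # order depends on task scheduling
end

lines = ["id$(i)\tx\ty" for i in 1:50]
ids = run_pool(lines)
println(length(ids))   # 50
```

Raising `ntasks` corresponds to issuing more requests at once; as noted earlier in the thread, that can raise throughput while also hiding per-request inefficiency.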
> > >>
> > >> The Julia code is very similar to the one discussed in the thread quoted
> > >> below. I invoke Julia with -p 5 and then have *each* process open the
> > >> file and read all lines. However, each process only processes 1/5th of
> > >> the lines and skips the others. It is a slight modification of what was
> > >> discussed in this thread:
> > >> https://groups.google.com/d/msg/julia-users/Kr8vGwdXcJA/8ynOghlYaGgJ
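The "each process reads every line but keeps only its fifth" scheme is a round-robin split by worker rank. A minimal sketch for Julia 1.x; `my_share` is a hypothetical helper, `k` is assumed to be a 1-based rank among `n` workers (how the real code numbers its processes is not shown here), and an `IOBuffer` stands in for the input file.

```julia
# Worker k of n keeps lines k, k+n, k+2n, ... and skips the rest,
# so every worker still reads (and discards) the entire input.
function my_share(io::IO, k::Int, n::Int)
    mine = String[]
    for (i, line) in enumerate(eachline(io))
        if mod(i - 1, n) == k - 1
            push!(mine, line)
        end
    end
    return mine
end

text = join(["line$(i)" for i in 1:12], "\n")
shares = [my_share(IOBuffer(text), k, 5) for k in 1:5]
println(shares[1])   # ["line1", "line6", "line11"]
```

For data already in memory, the same split is simply `lines[k:n:end]`.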
> > >>
> > >> Julia code (no server URL or source for that, though):
> > >> https://github.com/harikb/scratchpad1/tree/master/julia2
> > >> Server could be anything that returns a static JSON.
> > >>
> > >> The files fit entirely in the filesystem cache, and I am running this on
> > >> a fairly large system (procinfo says 24 cores, 100G RAM, 50G or so free
> > >> even after excluding cache). The input file is only 875K. This should
> > >> ideally mean I can read the files several times in any programming
> > >> language and not skip a beat; wc -l on the file takes only 0m0.002s. Any
> > >> log/output is written to a fusion-io based flash disk. All fairly high
> > >> end.
> > >>
> > >> At this point, considering the machine is reasonably good, the only
> > >> bottleneck should be the time URL firing takes (it is a GET request, but
> > >> the other side has some processing to do) or the subsequent JSON parsing.
> > >>
> > >> Where do I go from here? How do I find out whether HTTP connections are
> > >> being reused by the underlying library? I am using this library:
> > >> https://github.com/JuliaWeb/Requests.jl
> > >> If not, that could explain this difference. How do I profile this code?
> > >> I am using julia 0.3.7 (since Requests.jl does not work with 0.4
> > >> nightly).
> > >>
> > >> Any help is appreciated.
> > >> Thanks
> > >> --
> > >> Harry
>
>