Here is the output of Profile.print()

https://github.com/harikb/scratchpad1/blob/master/julia2/run3.txt

I don't know how to interpret these results, but I would guess this is where
most of the time is spent:

              10769 stream.jl; stream_wait; line: 263
            10774 stream.jl; readavailable; line: 709
             10774 stream.jl; wait_readnb; line: 316
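
In case it helps, the flat format may be easier to scan than the tree output.
Something along these lines (a rough sketch only; I am assuming the format
and C keyword options behave the same way on 0.3.7):

    # Collect fresh samples around the hot call, then print a flat per-line
    # summary (sample counts) instead of dumping lidict entries.
    Profile.clear()
    @profile processOneFile(3085, 35649, filename)
    Profile.print(format=:flat)   # one row per file/line with its sample count
    # Profile.print(C=true)       # include C/libuv frames such as stream_wait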

Is the issue that stream.jl is reading byte by byte? If a Content-Length is
available in the response header (and I know it is in this case), it could
probably read the whole body as one chunk. Then again, I am throwing darts in
the dark, so I should probably stop speculating.
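
To make the speculation concrete, this is roughly what I have in mind. It is
only a sketch: sock, headers, and nb below are hypothetical placeholders, and
I do not know whether Requests.jl or the underlying stream code already does
something equivalent.

    # Hypothetical sketch (0.3 syntax): once the response headers are parsed
    # and Content-Length is known, pull the whole body in one call rather
    # than looping on readavailable() for whatever happens to be buffered.
    nb   = int(headers["Content-Length"])   # assumes the header is present
    body = readbytes(sock, nb)              # read up to nb bytes in one go
    # instead of something like:
    #   while length(body) < nb
    #       append!(body, readavailable(sock))
    #   end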

Any help on the next steps is appreciated.

--
Harry

On Thursday, April 23, 2015 at 5:52:09 PM UTC-7, Tim Holy wrote:
>
> I think it's fair to say that Profile.print() will be quite a lot more
> informative---all you're getting is the list of lines visited, not anything
> about how much time each one takes.
>
> --Tim 
>
> On Thursday, April 23, 2015 04:19:08 PM Harry B wrote: 
> > I am trying to profile this code, so here is what I have so far. I added
> > the following code to the path taken for the single-process mode. I didn't
> > bother with the multi-process one since I didn't know how to deal with
> > @profile and remotecall_wait.
> > 
> >     @profile processOneFile(3085, 35649, filename) 
> >     bt, lidict = Profile.retrieve() 
> >     println("Profiling done") 
> >     for (k,v) in lidict 
> >         println(v) 
> >     end 
> > 
> > Output is here:
> > https://github.com/harikb/scratchpad1/blob/master/julia2/run1.txt
> > (run with julia 0.3.7)
> > Another run:
> > https://github.com/harikb/scratchpad1/blob/master/julia2/run2.txt
> > (run with julia-debug 0.3.7), in case it gave better results.
> > 
> > However, quite a few lines are marked without line or file info.
> > 
> > On Wednesday, April 22, 2015 at 2:44:13 AM UTC-7, Yuuki Soho wrote: 
> > 
> >     If I understand correctly, you are now doing only 5 requests at the
> >     same time? It seems to me you could do much more.
> > 
> > But that hides the inefficiency, at whatever level it exists. The Go
> > program also uses only 5 parallel connections.
> > 
> > On Wednesday, April 22, 2015 at 1:15:20 PM UTC-7, Stefan Karpinski wrote:
> > 
> >     Honestly, I'm pretty pleased with that performance. This kind of thing
> >     is Go's bread and butter – being within a factor of 2 of Go at
> >     something like this is really good. That said, if you do figure out
> >     anything that's a bottleneck here, please file issues – there's no
> >     fundamental reason Julia can't be just as fast or faster than any
> >     other language at this.
> > 
> > Stefan, yes, it is about 2x if I subtract the 10 seconds or so (or
> > whatever it appears to be) of startup time. I am running Julia 0.3.7 on a
> > box with a deprecated GnuTLS (RHEL). The deprecation warning message comes
> > about 8 seconds into the run, and I wait another 2 seconds before I see
> > the first print statement from my code (the "Started N processes"
> > message). My calculations already exclude these 10 seconds.
> > I wonder if I would get better startup time with 0.4, but Requests.jl is
> > not compatible with it (nor can I find any other HTTP library that works
> > on 0.4). I will try 0.4 again and see if I can fix Requests.jl.
> > 
> > Any help with further analysis of the profile output is appreciated.
> > 
> > Thanks 
> > -- 
> > Harry 
> > 
> > On Thursday, April 23, 2015 at 7:21:11 AM UTC-7, Seth wrote: 
> > > The use of Requests.jl makes this very hard to benchmark accurately
> > > since it introduces (non-measurable) dependencies on network resources.
> > >
> > > If you @profile the function, can you tell where it's spending most of
> > > its time?
> > > 
> > > On Tuesday, April 21, 2015 at 2:12:52 PM UTC-7, Harry B wrote: 
> > >> Hello, 
> > >> 
> > >> I had the need to take a text file with several million lines,
> > >> construct a URL with parameters picked from the tab-delimited file, and
> > >> fire them off one after the other. After I read about Julia, I decided
> > >> to try this in Julia. However, my initial implementation turned out to
> > >> be slow and I was getting close to my deadline. I then set the Julia
> > >> implementation aside and wrote the same thing in Go, my other favorite
> > >> language. The Go version is at least twice as fast as the Julia
> > >> version. Now that the task/deadline is over, I am coming back to the
> > >> Julia version to see what I did wrong.
> > >> 
> > >> The Go and Julia versions are not written alike. In Go, I have just
> > >> one main thread reading the file and 5 goroutines waiting on a channel;
> > >> one of them gets the 'line/job', fires off the URL, waits for a
> > >> response, parses the JSON, looks for an id in a specific place, and
> > >> goes back to waiting for more items from the channel.
> > >> 
> > >> The Julia code is very similar to the one discussed in the thread
> > >> quoted below. I invoke Julia with -p 5 and then have *each* process
> > >> open the file and read all lines. However, each process only processes
> > >> 1/5th of the lines and skips the others. It is a slight modification of
> > >> what was discussed in this thread:
> > >> https://groups.google.com/d/msg/julia-users/Kr8vGwdXcJA/8ynOghlYaGgJ
> > >> 
> > >> Julia code (without the server URL or the server source, though):
> > >> https://github.com/harikb/scratchpad1/tree/master/julia2
> > >> The server could be anything that returns static JSON.
> > >> 
> > >> Considering that the file will fit entirely in the filesystem cache,
> > >> and I am running this on a fairly large system (procinfo says 24 cores,
> > >> 100G RAM, 50G or so free even after excluding cache), and the input
> > >> file is only 875K, I should ideally be able to read the file several
> > >> times in any programming language without skipping a beat. wc -l on the
> > >> file takes only 0m0.002s. Any log/output is written to a fusion-io
> > >> based flash disk. All fairly high end.
> > >> 
> > >> At this point, considering the machine is reasonably good, the only
> > >> bottleneck should be the time each URL request takes (it is a GET
> > >> request, but the other side has some processing to do) or the
> > >> subsequent JSON parsing.
> > >> 
> > >> Where do I go from here? How do I find out whether HTTP connections
> > >> are being re-used by the underlying library? I am using this library:
> > >> https://github.com/JuliaWeb/Requests.jl
> > >> If they are not, that could explain the difference. How do I profile
> > >> this code? I am using julia 0.3.7 (since Requests.jl does not work with
> > >> the 0.4 nightly).
> > >> 
> > >> Any help is appreciated. 
> > >> Thanks 
> > >> -- 
> > >> Harry 
>
>
