Hi Stefan, 

I'm attaching the code I'm running. I'll type out a description of how the 
code runs. 

The code is supposed to simulate two different load balancing algorithms in 
a system between a load balancer and a server cluster. 

I define two user types: Fyle and Node.
Fyle has 3 field definitions: size::Float64, origsize::Float64, and 
lifetime::Int.
Initially it had functions to retrieve these values, but those have been 
commented out; I replaced them with direct field access, e.g. fyle.size.

Node simulates a server in the cluster; its fields and function definitions 
are a bit more in-depth.
Node has 5 fields:
threads: an Array which stores Fyles; each entry in the Array is supposed to 
simulate a process thread.
maxthreads::Int: gives the length of the threads Array. I've been thinking of 
using it to actually limit the Array size, but haven't implemented that yet.
complete: also an Array of Fyles; after a file is completed it is moved to 
the complete Array and removed from the threads Array.
percent::Float64: calculated as length(threads)/maxthreads in a function.

add_file: push!()'s a Fyle into the threads Array.
pop_file: pop!()'s from the threads Array.
process_threads: subtracts 0.25 from each Fyle's size in the threads Array; 
when a Fyle's size drops below 0 it is removed from the threads Array and 
pushed into the complete Array.
thread_utilization: computes percent and returns it.
to_string: actually does nothing, but it's a carryover from the Python code 
so I kept it.

So those are the two types that form the basis of the simulation.

The simulation is initiated in main(seed, time_sim); in the current 
implementation the seed is used to call srand(seed) before generating the 
file inputs, for reproducibility.
For each time-step t a number of files come into the system, based on a fit 
of data plus a randomly generated noise term. The Fyle input is generated as 
an Array of Fyle Arrays, so each entry can hold a different number of files 
to be processed. In the original Python implementation this was done in the 
actual simulation loop: at each step a new list of files would be generated. 
To try to speed up the Julia code I moved this generation to before the 
simulation, then use the same Fyle input twice to test each algorithm, rather 
than generating the same input twice.
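The pregeneration step is just building the Array-of-Arrays up front; a 
self-contained sketch, where plain Float64 sizes stand in for Fyle and 
nfiles(t) is a made-up stand-in for the fit-plus-noise arrival model:

```julia
srand(1234)                     # seed once so both algorithms see identical
                                # input (Random.seed! in Julia >= 1.0)

nfiles(t) = rand(1:10)          # hypothetical stand-in for the fitted model

time_sim = 100
fyle_input = Vector{Float64}[]  # one inner Array of file sizes per time-step
for t in 1:time_sim
    push!(fyle_input, rand(nfiles(t)))
end
```

Each inner Array can then hold a different number of files, and the same 
fyle_input is replayed once per algorithm.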

I also define a bunch of path names here for output files so I can vary and 
analyze my results; these are defined in main() but passed to run(). 
Currently two are not actually used, to cut down on writes per time-step.
At each time-step the Fyles are distributed to the Nodes in the cluster 
Array based on either the round-robin algorithm or a least-thread-utilization 
algorithm, and then all the Nodes process their threads at the end of each 
time-step.
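The two dispatch rules differ only in how the target Node index is picked; 
with the per-node utilizations in a plain Vector{Float64} (a stand-in for 
calling thread_utilization on each Node), each choice is one line:

```julia
utilization = [0.5, 0.2, 0.8, 0.2]  # made-up per-node utilizations
nnodes = length(utilization)

# Round robin: cycle through nodes regardless of load.
rr_target(t) = mod1(t, nnodes)      # time-step t -> node 1, 2, ..., nnodes, 1, ...

# Least utilization: always pick the currently least-loaded node.
lu_target() = indmin(utilization)   # indmin is spelled argmin in Julia >= 0.7
```

Here rr_target(5) gives 1 and lu_target() gives 2 (the first of the tied 
least-loaded nodes).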

Every 60 time-steps results are averaged, data is written to a .csv file, 
and the data structures are reinitialized to begin a fresh 60 time-step 
measurement.
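That every-60-steps pattern is also what keeps file I/O out of the hot loop 
(Stefan's point below): measurements are buffered in memory and flushed once 
per window rather than once per time-step. A minimal sketch, with rand() 
standing in for the real per-step measurement and a hypothetical output path:

```julia
window = 60
buffer = Float64[]                # per-step measurements for current window

open("results.csv", "w") do io    # hypothetical output path
    for t in 1:600
        push!(buffer, rand())     # stand-in for the real measurement
        if t % window == 0
            # One write per 60 steps instead of one per step.
            println(io, t, ",", mean(buffer))  # `using Statistics` for mean
                                               # in Julia >= 1.0
            empty!(buffer)        # reinitialize for a fresh window
        end
    end
end
```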

I think that mostly covers the code at a high level. The only other thing I 
can think of is that every time a Fyle is generated in the initialization 
phase, an inverse_transform_sample is performed to randomly generate the 
Fyle size. This involves a binary search through an Array; in the original 
Python implementation this was a linear search.
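For reference, that binary search is exactly what Base's searchsortedfirst 
does on a sorted CDF array, so the whole sampler can be this small (made-up 
empirical distribution; a sketch of the technique, not the attached code):

```julia
srand(42)                             # Random.seed! in Julia >= 1.0

sizes = [0.5, 1.0, 2.0, 4.0]          # hypothetical support of the distribution
cdf   = cumsum([0.1, 0.4, 0.3, 0.2])  # cumulative probabilities
cdf[end] = 1.0                        # guard against floating-point round-off

# Inverse transform sampling: draw u ~ U(0,1), binary-search the sorted CDF
# for the first entry >= u, and return the corresponding size.
inverse_transform_sample() = sizes[searchsortedfirst(cdf, rand())]

samples = [inverse_transform_sample() for _ in 1:5]
```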

I've really tried to make minor tweaks between the two versions to speed 
them up, but so far no luck.

Thanks for any help

On Wednesday, March 25, 2015 at 3:32:29 AM UTC-4, Stefan Karpinski wrote:
>
> Yes, writing to a file is one of the slower things you can do. So if 
> that's in a performance-critical loop it will very much slow things down. 
> But that would be true for Python and PyPy as well. Are you doing the same 
> thing in that code? 
>
>
> > On Mar 25, 2015, at 4:00 AM, Michael Bullman <[email protected] 
> <javascript:>> wrote: 
> > 
> > Hi Guys, 
> > 
> > So I just went back through my code. I didn't see any global variables. 
> I'm going to try and start using the @time macro tomorrow to try and 
> identify the worse functions. Would writes to file significantly impact 
> speed? I know looking on google writing to files is frowned upon, but what 
> is a better alternative? Hold everything in an Array until the program 
> finishes then write out at the end? Are data bases a viable option when 
> output is very large? Or when records need to be kept? 
> > 
> > I'm also going over the code again and might post a copy if people are 
> interested, but I'm not going to be doing that tonight. 
> > 
> > Thanks again 
>

Attachment: lb_sim_v3.jl
Description: Binary data
