Your types should not generally have fields of type Function; they should only contain object state. Sticking function objects into types so that you can do Python-style OO programming in Julia is an antipattern and will result in terrible performance. Instead, you should define methods for external generic functions that dispatch on your types. If you have a foo::Foo object, you should be doing f(foo), not foo.f().
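For example, a minimal sketch of the two styles in current Julia syntax (the `Counter` type and `bump` function are made up for illustration, not taken from the attached code):

```julia
# Antipattern: a Function-typed field. Function is an abstract type, so
# every call through the field is dynamically dispatched and cannot be
# specialized or inlined.
struct SlowCounter
    n::Int
    bump::Function
end

# Idiomatic: the type holds only state; behavior is a method of an
# external generic function that dispatches on the type.
struct Counter
    n::Int
end

bump(c::Counter) = Counter(c.n + 1)

c = bump(bump(Counter(0)))   # f(foo), not foo.f()
println(c.n)                 # prints 2
```

With the method form, `bump(c)` specializes on the concrete type `Counter`; a `Function`-typed field forces dynamic dispatch on every call.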
On Fri, Mar 27, 2015 at 6:52 PM, Michael Bullman <[email protected]> wrote:

> Hi Stefan,
>
> I'm attaching the code I'm running. I'll type out a description of how the code runs.
>
> The code is supposed to simulate two different load-balancing algorithms in a system between a load balancer and a server cluster.
>
> I define two user types: Fyle and Node. Fyle has 3 field definitions: Size::Float64, OrigSize::Float64, and lifetime::Int. Initially it had functions to retrieve these values, but those have been commented out; I replaced them with Fyle.size to get that value.
>
> Node simulates a server in the cluster; its field and function definitions are a bit more in-depth. Node has 5 fields:
>
> - threads: an Array which stores Fyles; each entry in the Array simulates a process thread.
> - maxThreads::Int: gives the length of the threads Array. I've been thinking of using it to actually limit the Array size, but haven't implemented that yet.
> - complete: also an Array of Fyles; after a file is completed it is moved to the complete Array and removed from the threads Array.
> - percent::Float64: calculated as length(threads)/maxThreads in a function.
>
> Its functions:
>
> - add_file push!()'s a Fyle into the threads Array.
> - pop_file does pop!() on threads.
> - process_threads subtracts 0.25 from each Fyle in the threads Array; when a Fyle's size drops below 0 it is removed from the threads Array and pushed into the completed Array.
> - thread_utilization computes percent and returns it.
> - to_string actually does nothing; it's a carryover from the Python code, so I kept it.
>
> So those are the two types which are the basis of the simulation.
>
> The simulation is initiated in main(seed, time_sim); in the current implementation the seed is used to call srand(seed) before generating the file inputs, for reproducibility. For each time-step t we have a number of files which come into the system.
> This is based off of a fit of data plus a randomly generated noise term. When the Fyle input is being generated, it's an Array of Fyle Arrays, so each entry can hold a different number of files to be processed. In the original Python implementation this was done in the actual simulation loop: at each step a new list of files would be generated. But to try and speed up the Julia code I moved this generation to before the simulation, then use the same Fyle input twice to test each algorithm, rather than generating the same input twice.
>
> I also define a bunch of path names here for output files so I can vary and analyze my results. These are defined in main() but passed to run(); currently two are not actually used, to cut down on writes per time-step.
>
> At each time step the Fyles are distributed to the nodes in the cluster Array based on either the round-robin algorithm or a least-thread-utilization algorithm, and then all the nodes process their threads at the end of each time-step.
>
> Every 60 time steps results are averaged, data is written to a .csv file, and data structures are reinitialized to begin a fresh 60-time-step measurement.
>
> I think that mostly covers the code from a high level. The only other thing I can think of is that every time a Fyle is generated in the initialization phase, it performs an inverse_transform_sample to randomly generate Fyle sizes. This involves a binary search through an Array; in the original Python implementation this was a linear search.
>
> I've really tried to make minor tweaks between the two versions to speed them up, but so far no luck.
>
> Thanks for any help
>
> On Wednesday, March 25, 2015 at 3:32:29 AM UTC-4, Stefan Karpinski wrote:
>>
>> Yes, writing to a file is one of the slower things you can do. So if that's in a performance-critical loop it will very much slow things down. But that would be true for Python and PyPy as well. Are you doing the same thing in that code?
>>
>> > On Mar 25, 2015, at 4:00 AM, Michael Bullman <[email protected]> wrote:
>> >
>> > Hi Guys,
>> >
>> > So I just went back through my code. I didn't see any global variables. I'm going to try and start using the @time macro tomorrow to try and identify the worst functions. Would writes to file significantly impact speed? I know from looking on Google that writing to files is frowned upon, but what is a better alternative? Hold everything in an Array until the program finishes, then write out at the end? Are databases a viable option when output is very large? Or when records need to be kept?
>> >
>> > I'm also going over the code again and might post a copy if people are interested, but I'm not going to be doing that tonight.
>> >
>> > Thanks again
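To make the advice at the top of the thread concrete for this design: a minimal sketch of the described Fyle/Node layout with the behavior moved into methods of external generic functions, in current Julia syntax. Field and function names follow the description above; the attached code may differ in its details.

```julia
# State-only types: no Function-valued fields.
mutable struct Fyle
    size::Float64
    origSize::Float64
    lifetime::Int
end

mutable struct Node
    threads::Vector{Fyle}
    complete::Vector{Fyle}
    maxThreads::Int
end

Node(maxThreads::Int) = Node(Fyle[], Fyle[], maxThreads)

# Behavior lives in generic functions that dispatch on Node.
add_file!(n::Node, f::Fyle) = push!(n.threads, f)

thread_utilization(n::Node) = length(n.threads) / n.maxThreads

# Subtract 0.25 from every in-flight Fyle; move finished ones to `complete`.
function process_threads!(n::Node)
    still_running = Fyle[]
    for f in n.threads
        f.size -= 0.25
        f.size <= 0 ? push!(n.complete, f) : push!(still_running, f)
    end
    n.threads = still_running
    return n
end
```

Calls like `process_threads!(node)` then specialize on the concrete `Node` type, which is what lets the compiler generate fast code.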
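For reference, inverse transform sampling with a binary search, as described above, typically looks something like the following sketch. The names `cdf_table` and `sizes` are hypothetical, not taken from the attached code; the table is assumed sorted and ending at 1.0.

```julia
# cdf_table[i] is the cumulative probability of drawing sizes[i].
function inverse_transform_sample(cdf_table::Vector{Float64},
                                  sizes::Vector{Float64})
    u = rand()                            # uniform draw in [0, 1)
    i = searchsortedfirst(cdf_table, u)   # binary search: first index with cdf_table[i] >= u
    return sizes[min(i, length(sizes))]   # guard against u landing past the table
end

cdf_table = [0.25, 0.5, 0.75, 1.0]
sizes     = [1.0, 2.0, 4.0, 8.0]
s = inverse_transform_sample(cdf_table, sizes)
```

`searchsortedfirst` from Base is O(log n), which is why it beats the linear search in the original Python version.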

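On the question of alternatives to per-step file writes: one common pattern is to accumulate records in memory and write once at the end (or once per 60-step window). A minimal sketch, with a made-up record layout and file name:

```julia
function run_sim(nsteps::Int, path::AbstractString)
    buffer = Vector{String}()
    for t in 1:nsteps
        utilization = rand()              # stand-in for the real per-step measurement
        push!(buffer, "$t,$utilization")  # cheap in-memory append
    end
    # one write at the end instead of nsteps small writes
    open(path, "w") do io
        for line in buffer
            println(io, line)
        end
    end
end

run_sim(120, "results.csv")
```

A database is worth it mainly when the output outgrows memory or needs to be queried afterwards; for a simulation that emits a few values per time-step, batching to a .csv like this is usually enough.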