Your types should not generally have fields of type Function; they should only contain object state. Sticking function objects into types so that you can do Python-style OO programming in Julia is an antipattern and will result in terrible performance. Instead, you should define methods for external generic functions that dispatch on your types. If you have a foo::Foo object, you should be doing f(foo), not foo.f().
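For example, a minimal sketch of the two styles in current Julia syntax (the `Counter` type and `bump` function are made up for illustration, not taken from the attached code):

```julia
# Antipattern: a Function-typed field. Function is an abstract type, so
# every call through the field is dynamically dispatched and cannot be
# specialized or inlined.
struct SlowCounter
    n::Int
    bump::Function
end

# Idiomatic: the type holds only state; behavior is a method of an
# external generic function that dispatches on the type.
struct Counter
    n::Int
end

bump(c::Counter) = Counter(c.n + 1)

c = bump(bump(Counter(0)))   # f(foo), not foo.f()
println(c.n)                 # prints 2
```

With the method form, `bump(c)` specializes on the concrete type `Counter`; a `Function`-typed field forces dynamic dispatch on every call.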
On Fri, Mar 27, 2015 at 6:52 PM, Michael Bullman <[email protected]> wrote:

> Hi Stefan,
>
> I'm attaching the code I'm running. I'll type out a description of how the code runs.
>
> The code is supposed to simulate two different load-balancing algorithms in a system between a load balancer and a server cluster.
>
> I define two user types: Fyle and Node. Fyle has 3 field definitions: Size::Float64, OrigSize::Float64, and lifetime::Int. Initially it had functions to retrieve these values, but those have been commented out; I replaced them with Fyle.size to get that value.
>
> Node simulates a server in the cluster; its field and function definitions are a bit more in-depth. Node has 5 fields:
>
> - threads: an Array which stores Fyles; each entry in the Array simulates a process thread.
> - maxThreads::Int: gives the length of the threads Array. I've been thinking of using it to actually limit the Array size, but haven't implemented that yet.
> - complete: also an Array of Fyles; after a file is completed it is moved to the complete Array and removed from the threads Array.
> - percent::Float64: calculated as length(threads)/maxThreads in a function.
>
> Its functions:
>
> - add_file push!()'s a Fyle into the threads Array.
> - pop_file does pop!() on threads.
> - process_threads subtracts 0.25 from each Fyle in the threads Array; when a Fyle's size drops below 0 it is removed from the threads Array and pushed into the completed Array.
> - thread_utilization computes percent and returns it.
> - to_string actually does nothing; it's a carryover from the Python code, so I kept it.
>
> So those are the two types which are the basis of the simulation.
>
> The simulation is initiated in main(seed, time_sim); in the current implementation the seed is used to call srand(seed) before generating the file inputs, for reproducibility. For each time-step t we have a number of files which come into the system.
> This is based off of a fit of data plus a randomly generated noise term. When the Fyle input is being generated, it's an Array of Fyle Arrays, so each entry can hold a different number of files to be processed. In the original Python implementation this was done in the actual simulation loop: at each step a new list of files would be generated. But to try and speed up the Julia code I moved this generation to before the simulation, then use the same Fyle input twice to test each algorithm, rather than generating the same input twice.
>
> I also define a bunch of path names here for output files so I can vary and analyze my results. These are defined in main() but passed to run(); currently two are not actually used, to cut down on writes per time-step.
>
> At each time step the Fyles are distributed to the nodes in the cluster Array based on either the round-robin algorithm or a least-thread-utilization algorithm, and then all the nodes process their threads at the end of each time-step.
>
> Every 60 time steps results are averaged, data is written to a .csv file, and data structures are reinitialized to begin a fresh 60-time-step measurement.
>
> I think that mostly covers the code from a high level. The only other thing I can think of is that every time a Fyle is generated in the initialization phase, it performs an inverse_transform_sample to randomly generate Fyle sizes. This involves a binary search through an Array; in the original Python implementation this was a linear search.
>
> I've really tried to make minor tweaks between the two versions to speed them up, but so far no luck.
>
> Thanks for any help
>
> On Wednesday, March 25, 2015 at 3:32:29 AM UTC-4, Stefan Karpinski wrote:
>>
>> Yes, writing to a file is one of the slower things you can do. So if that's in a performance-critical loop it will very much slow things down. But that would be true for Python and PyPy as well. Are you doing the same thing in that code?
>>
>> > On Mar 25, 2015, at 4:00 AM, Michael Bullman <[email protected]> wrote:
>> >
>> > Hi Guys,
>> >
>> > So I just went back through my code. I didn't see any global variables. I'm going to try and start using the @time macro tomorrow to try and identify the worst functions. Would writes to file significantly impact speed? I know from looking on Google that writing to files is frowned upon, but what is a better alternative? Hold everything in an Array until the program finishes, then write out at the end? Are databases a viable option when output is very large? Or when records need to be kept?
>> >
>> > I'm also going over the code again and might post a copy if people are interested, but I'm not going to be doing that tonight.
>> >
>> > Thanks again
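To make the advice at the top of the thread concrete for this design: a minimal sketch of the described Fyle/Node layout with the behavior moved into methods of external generic functions, in current Julia syntax. Field and function names follow the description above; the attached code may differ in its details.

```julia
# State-only types: no Function-valued fields.
mutable struct Fyle
    size::Float64
    origSize::Float64
    lifetime::Int
end

mutable struct Node
    threads::Vector{Fyle}
    complete::Vector{Fyle}
    maxThreads::Int
end

Node(maxThreads::Int) = Node(Fyle[], Fyle[], maxThreads)

# Behavior lives in generic functions that dispatch on Node.
add_file!(n::Node, f::Fyle) = push!(n.threads, f)

thread_utilization(n::Node) = length(n.threads) / n.maxThreads

# Subtract 0.25 from every in-flight Fyle; move finished ones to `complete`.
function process_threads!(n::Node)
    still_running = Fyle[]
    for f in n.threads
        f.size -= 0.25
        f.size <= 0 ? push!(n.complete, f) : push!(still_running, f)
    end
    n.threads = still_running
    return n
end
```

Calls like `process_threads!(node)` then specialize on the concrete `Node` type, which is what lets the compiler generate fast code.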
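For reference, inverse transform sampling with a binary search, as described above, typically looks something like the following sketch. The names `cdf_table` and `sizes` are hypothetical, not taken from the attached code; the table is assumed sorted and ending at 1.0.

```julia
# cdf_table[i] is the cumulative probability of drawing sizes[i].
function inverse_transform_sample(cdf_table::Vector{Float64},
                                  sizes::Vector{Float64})
    u = rand()                            # uniform draw in [0, 1)
    i = searchsortedfirst(cdf_table, u)   # binary search: first index with cdf_table[i] >= u
    return sizes[min(i, length(sizes))]   # guard against u landing past the table
end

cdf_table = [0.25, 0.5, 0.75, 1.0]
sizes     = [1.0, 2.0, 4.0, 8.0]
s = inverse_transform_sample(cdf_table, sizes)
```

`searchsortedfirst` from Base is O(log n), which is why it beats the linear search in the original Python version.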

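On the question of alternatives to per-step file writes: one common pattern is to accumulate records in memory and write once at the end (or once per 60-step window). A minimal sketch, with a made-up record layout and file name:

```julia
function run_sim(nsteps::Int, path::AbstractString)
    buffer = Vector{String}()
    for t in 1:nsteps
        utilization = rand()              # stand-in for the real per-step measurement
        push!(buffer, "$t,$utilization")  # cheap in-memory append
    end
    # one write at the end instead of nsteps small writes
    open(path, "w") do io
        for line in buffer
            println(io, line)
        end
    end
end

run_sim(120, "results.csv")
```

A database is worth it mainly when the output outgrows memory or needs to be queried afterwards; for a simulation that emits a few values per time-step, batching to a .csv like this is usually enough.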