Cool to see more Bayesian inference in Julia! Here are the generic tips, in case you haven't gone through them:
http://julia.readthedocs.org/en/latest/manual/performance-tips/

I particularly recommend profiling your code:

Profile.clear()
@profile some_function_call(...)
ProfileView.view()   # you'll have to Pkg.add("ProfileView") first

The red boxes will show memory allocation. Also, if you @time your code, it will tell you what fraction of the time is spent in GC (most likely a lot if it is allocating 20 GB). A minimal sketch of this workflow is at the end of this message.

That's quite a bit of code; if you can tell us which part is the bottleneck, it will be easier to help out.

Best,
Cédric

On Wednesday, September 9, 2015 at 9:17:34 AM UTC-4, Adham Beyki wrote:
>
> Well, Julia newbie here! I intend to implement a number of Bayesian
> hierarchical clustering models (more specifically, topic models) in Julia,
> and here is my implementation of Latent Dirichlet Allocation
> <https://en.wikipedia.org/wiki/Latent_Dirichlet_allocation> as a gist:
> https://gist.github.com/odinay/3e49d50ba580a9bff8e3
>
> I should say my Julia implementation is almost 100 times faster than my
> Python (NumPy) implementation. For instance, for a simulated dataset from 5
> clusters with 1000 observations, each containing 100 points:
>
> true_kk = 5
> n_groups = 1000
> n_group_j = 100 * ones(Int64, n_groups)
>
> Julia spends nearly 0.1 sec on each LDA Gibbs sampling iteration, while it
> takes almost 9.5 sec in Python on my machine. But the code is still too slow
> for real datasets. I know that Gibbs inference for these models is expensive
> by nature, but how can I make sure I have optimised the performance of my
> code as far as possible? For example, for a slightly larger dataset such as
>
> true_kk = 20
> n_groups = 1000
> n_group_j = 1000 * ones(Int64, n_groups)
>
> the output is:
>
> iteration: 98, number of components: 20, elapsed time: 3.209459973
> iteration: 99, number of components: 20, elapsed time: 3.265090272
> iteration: 100, number of components: 20, elapsed time: 3.204902689
> elapsed time: 332.600401208 seconds (20800255280 bytes allocated, 12.87% gc time)
>
> As I move to more complex models, optimizing the code becomes a bigger
> concern. How can I make sure that, *without changing the algorithm* (I don't
> want to switch to other Bayesian approaches such as variational methods),
> this is the best performance I can get? Also, parallelization is not the
> answer here. Although efficient parallel Gibbs sampling for LDA has been
> proposed (e.g. here <http://lccc.eecs.berkeley.edu/Slides/Gonzalez10.pdf>),
> that is not the case for more complex statistical models. So I want to know
> whether I am writing the loops and passing variables and types correctly, or
> whether it can be done more efficiently.
>
> What made me unsure of my work is the huge amount of memory that is
> allocated, almost 20 GB. I am aware that since numbers are immutable types,
> Julia has to copy them for manipulation and calculations. But considering
> the complexity of my problem (three nested loops) and the size of my data,
> can you tell from your experience whether moving around 20 GB is normal, or
> am I doing something wrong?
>
> Best,
> Adham
>
> julia> versioninfo()
> Julia Version 0.3.11
> Commit 483dbf5* (2015-07-27 06:18 UTC)
> Platform Info:
>   System: Windows (x86_64-w64-mingw32)
>   CPU: Intel(R) Core(TM) i5-3470 CPU @ 3.20GHz
>   WORD_SIZE: 64
>   BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
>   LAPACK: libopenblas
>   LIBM: libopenlibm
>   LLVM: libLLVM-3.3
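Here is the profiling sketch promised above. Note that lda_gibbs_step!, state and docs are placeholder names; swap in whatever function and data actually dominate your run time in the gist.

using ProfileView               # Pkg.add("ProfileView") if you don't have it

lda_gibbs_step!(state, docs)    # run once first so compilation time isn't profiled
Profile.clear()                 # drop any samples collected so far
@profile for it in 1:10         # collect samples over a few sampling iterations
    lda_gibbs_step!(state, docs)
end
ProfileView.view()              # opens the flame graph (the red boxes mentioned above)

The widest boxes in the flame graph are where most of the time goes; that is the part of the code worth posting here.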
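As for the ~20 GB: in code like this it usually comes from temporary arrays created inside the sampling loops rather than from copying individual numbers, so it is worth checking with @time whether pre-allocating buffers helps. A generic illustration, not taken from your gist (the function and variable names here are made up):

# allocating version: a fresh probability vector is created for every token,
# so the GC has to clean up millions of small temporaries
function step_alloc(n_tokens, kk)
    for i in 1:n_tokens
        p = zeros(kk)            # new array on every pass through the loop
        for k in 1:kk
            p[k] = rand()        # stand-in for computing the conditional probability
        end
    end
end

# pre-allocated version: one buffer, reused and overwritten in place
function step_prealloc(n_tokens, kk)
    p = zeros(kk)                # allocated once, outside the loop
    for i in 1:n_tokens
        for k in 1:kk
            p[k] = rand()
        end
    end
end

@time step_alloc(10^6, 20)       # reports bytes allocated and gc time
@time step_prealloc(10^6, 20)    # far fewer bytes allocated

The difference shows up directly in the bytes-allocated figure that @time prints, the same figure you quoted.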
