Well, Julia newbie here! I intend to implement a number of Bayesian
hierarchical clustering models (more specifically, topic models) in Julia,
and here is my implementation of Latent Dirichlet Allocation
<https://en.wikipedia.org/wiki/Latent_Dirichlet_allocation> as a gist:
https://gist.github.com/odinay/3e49d50ba580a9bff8e3
I should say my Julia implementation is almost 100 times faster than my
Python (NumPy) implementation. For instance, for a simulated dataset drawn
from 5 clusters, with 1000 groups each containing 100 points:
true_kk = 5
n_groups = 1000
n_group_j = 100 * ones(Int64, n_groups)
Julia spends nearly 0.1 sec per LDA Gibbs sampling iteration, while Python
takes almost 9.5 sec on my machine. But the code is still slow for real
datasets. I know that Gibbs inference for these models is inherently
expensive, but how can I make sure I have optimized the performance of my
code as far as it will go? For example, for a slightly bigger dataset such
as
true_kk = 20
n_groups = 1000
n_group_j = 1000 * ones(Int64, n_groups)
the output is:
iteration: 98, number of components: 20, elapsed time: 3.209459973
iteration: 99, number of components: 20, elapsed time: 3.265090272
iteration: 100, number of components: 20, elapsed time: 3.204902689
elapsed time: 332.600401208 seconds (20800255280 bytes allocated, 12.87% gc time)
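To give an idea of what each iteration does, the per-token update is the
usual collapsed Gibbs step for LDA. Below is a stripped-down sketch of one
sweep (illustrative variable names, not the exact code in the gist):

# z    -- z[j][i] is the topic of token i in document j
# docs -- docs[j][i] is the word id of token i in document j
# ndk  -- document x topic counts, nkw -- topic x word counts, nk -- topic counts
function gibbs_sweep!(z, docs, ndk, nkw, nk, alpha, beta)
    kk = length(nk)                       # number of topics
    vv = size(nkw, 2)                     # vocabulary size
    p = zeros(kk)                         # preallocated conditional weights
    for j in 1:length(docs)
        doc = docs[j]
        for i in 1:length(doc)
            w = doc[i]
            k = z[j][i]
            # remove the current assignment from the counts
            ndk[j, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
            # unnormalized full conditional p(z_ji = k | rest)
            for t in 1:kk
                p[t] = (ndk[j, t] + alpha) * (nkw[t, w] + beta) / (nk[t] + vv * beta)
            end
            # draw the new topic by inverting the cumulative sum
            r = rand() * sum(p)
            knew = 1
            s = p[1]
            while s < r && knew < kk
                knew += 1
                s += p[knew]
            end
            # add the new assignment back
            z[j][i] = knew
            ndk[j, knew] += 1; nkw[knew, w] += 1; nk[knew] += 1
        end
    end
    return z
end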
As I move to more complex models, optimizing the code becomes an even bigger
concern. How can I make sure, *without changing the algorithm* (I don't want
to switch to other Bayesian approaches such as variational methods), that
this is the best performance I can get? Parallelization is also not the
answer: although efficient parallel Gibbs sampling for LDA has been proposed
(e.g. here <http://lccc.eecs.berkeley.edu/Slides/Gonzalez10.pdf>), that is
not the case for more complex statistical models. So I want to know whether
I am tuning the loops and passing variables and types correctly, or whether
it can be done more efficiently.
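Concretely, by "tuning the loops and passing variables and types correctly"
I mean the pattern below: every hot loop lives inside a function whose
arguments have concrete types, instead of iterating over untyped globals at
the top level (just a sketch of the style, not the gist itself):

# slow style: looping over untyped globals at the top level
# for doc in z, k in doc
#     nk[k] += 1
# end

# faster style: the same loop inside a function with concretely typed arguments
function count_topics!(nk::Vector{Int}, z::Vector{Vector{Int}})
    fill!(nk, 0)
    for doc in z
        for k in doc
            nk[k] += 1   # count how many tokens are assigned to each topic
        end
    end
    return nk
end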
What made me unsure of my work is the huge amount of memory that is
allocated, almost 20 GB. I am aware that, since numbers are immutable types,
Julia has to copy them for manipulation and calculations. But considering
the complexity of my problem (3 nested loops) and the size of my data, maybe
based on your experience you can tell me whether moving around 20 GB is
normal or whether I am doing something wrong?
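For what it is worth, the kind of change I have been experimenting with is
replacing vectorized expressions, which allocate temporary arrays for every
single token, with in-place updates into a buffer allocated once per sweep
(a hypothetical before/after, not the exact gist code):

# allocating version: creates several temporary vectors per token
# p = (ndk[j, :] .+ alpha) .* (nkw[:, w] .+ beta) ./ (nk .+ vv * beta)

# in-place version: writes into a preallocated buffer p
function conditional!(p::Vector{Float64}, ndk::Matrix{Int}, nkw::Matrix{Int},
                      nk::Vector{Int}, j::Int, w::Int,
                      alpha::Float64, beta::Float64)
    vv = size(nkw, 2)                 # vocabulary size
    for k in 1:length(p)
        p[k] = (ndk[j, k] + alpha) * (nkw[k, w] + beta) / (nk[k] + vv * beta)
    end
    return p
end

# running @time on a single sweep reports the bytes allocated per iteration,
# e.g. @time gibbs_sweep!(z, docs, ndk, nkw, nk, alpha, beta)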
Best,
Adham
julia> versioninfo()
Julia Version 0.3.11
Commit 483dbf5* (2015-07-27 06:18 UTC)
Platform Info:
System: Windows (x86_64-w64-mingw32)
CPU: Intel(R) Core(TM) i5-3470 CPU @ 3.20GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
LAPACK: libopenblas
LIBM: libopenlibm
LLVM: libLLVM-3.3