Cool to see more Bayesian inference in Julia! Here are the generic tips, in 
case you haven't gone through them:

http://julia.readthedocs.org/en/latest/manual/performance-tips/

I particularly recommend profiling your code with

using ProfileView   # Pkg.add("ProfileView") first if you don't have it
Profile.clear()
@profile ...some_function_call...
ProfileView.view()

The red boxes will show memory allocation. Also, if you @time your code, 
it'll tell you what fraction of the time is spent in GC (most likely a lot 
if it's allocating 20 GB).
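
If GC does turn out to dominate, allocation tracking will point at the exact 
lines responsible. A minimal recipe, assuming your sampler lives in a script 
(the file name below is just a placeholder):

julia --track-allocation=user run_lda.jl
# when Julia exits, a run_lda.jl.mem file appears next to the source, with
# the bytes allocated by each line in the left margin; the lines inside the
# sampling loops are the ones to look at

(I believe --track-allocation works on 0.3, but check julia --help; the 
counts also include allocation from compilation, so treat them as a guide 
rather than an exact measure.)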

That's quite a bit of code; if you can tell us which part is the 
bottleneck, it'll be easier to help out.
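
In the meantime, here's the kind of change the performance-tips page has in 
mind when it talks about avoiding temporaries: 20 GB usually means the inner 
loop is allocating a small array for every token. This is a generic sketch, 
not taken from your gist, and all the names and sizes are made up:

# Allocating version: the vectorized expression creates a new temporary
# array on every pass through the loop.
function allocating(counts, beta, n)
    s = 0.0
    for i in 1:n
        p = counts .+ beta            # fresh array each iteration
        s += sum(p)
    end
    s
end

# Preallocated version: one buffer, filled in place.
function preallocated(counts, beta, n)
    p = similar(counts)
    s = 0.0
    for i in 1:n
        for k in 1:length(counts)
            p[k] = counts[k] + beta
        end
        s += sum(p)
    end
    s
end

counts = rand(20)
allocating(counts, 0.1, 1)            # warm up so @time excludes compilation
preallocated(counts, 0.1, 1)
@time allocating(counts, 0.1, 10^6)
@time preallocated(counts, 0.1, 10^6)

In a collapsed Gibbs sampler the analogue is the vector of unnormalized topic 
probabilities computed for every token: if it is rebuilt from vectorized 
count arithmetic at every step, the allocations add up to tens of GB over a 
full run, whereas filling a preallocated buffer in a small for loop allocates 
essentially nothing.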

Best,

Cédric


On Wednesday, September 9, 2015 at 9:17:34 AM UTC-4, Adham Beyki wrote:
>
> Well, Julia newbie here! I intend to implement a number of Bayesian 
> hierarchical clustering models (more specifically, topic models) in Julia, 
> and here is my implementation of Latent Dirichlet Allocation 
> <https://en.wikipedia.org/wiki/Latent_Dirichlet_allocation> as a gist:
> https://gist.github.com/odinay/3e49d50ba580a9bff8e3
>
>
> I should say my Julia implementation is almost 100 times faster than my 
> Python (NumPy) implementation. For instance, for a simulated dataset from 5 
> clusters with 1000 observations, each containing 100 points:
>
>
> true_kk = 5
> n_groups = 1000
> n_group_j = 100 * ones(Int64, n_groups)
>
>
> Julia spends nearly 0.1 sec on each LDA Gibbs sampling iteration, while it 
> takes almost 9.5 sec in Python on my machine. But the code is still slow 
> for real datasets. I know that Gibbs inference for these models is 
> expensive by nature. But how can I make sure I have optimised the 
> performance of my code as far as possible? For example, for a slightly 
> bigger dataset such as
>
>
> true_kk = 20
> n_groups = 1000
> n_group_j = 1000 * ones(Int64, n_groups)
>
>
> the output is:
>
>
> iteration: 98, number of components: 20, elapsed time: 3.209459973
> iteration: 99, number of components: 20, elapsed time: 3.265090272
> iteration: 100, number of components: 20, elapsed time: 3.204902689
> elapsed time: 332.600401208 seconds (20800255280 bytes allocated, 12.87% gc time)
>
>
> As I move to more complex models, optimizing the code becomes a bigger 
> concern. How can I make sure, *without changing the algorithm* (I don't 
> want to switch to other Bayesian approaches such as variational methods), 
> that this is the best performance I can get? Also, parallelization is not 
> the answer. Although efficient parallel Gibbs sampling has been proposed 
> for LDA (e.g. here <http://lccc.eecs.berkeley.edu/Slides/Gonzalez10.pdf>), 
> that is not the case for more complex statistical models. So I want to know 
> whether I am writing the loops and passing variables and types correctly, 
> or whether it can be done more efficiently.
>
>
> What made me unsure about my code is the huge amount of memory that is 
> allocated, almost 20 GB. I am aware that since numbers are immutable types, 
> Julia has to copy them for manipulation and calculations. But considering 
> the complexity of my problem (3 nested loops) and the size of my data, 
> maybe based on your experience you can tell whether moving around 20 GB is 
> normal or whether I am doing something wrong?
>
>
> Best, 
>
> Adham
>
>
> julia> versioninfo()
> Julia Version 0.3.11
> Commit 483dbf5* (2015-07-27 06:18 UTC)
> Platform Info:
>   System: Windows (x86_64-w64-mingw32)
>   CPU: Intel(R) Core(TM) i5-3470 CPU @ 3.20GHz
>   WORD_SIZE: 64
>   BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
>   LAPACK: libopenblas
>   LIBM: libopenlibm
>   LLVM: libLLVM-3.3
>
