[ https://issues.apache.org/jira/browse/MADLIB-945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frank McQuillan updated MADLIB-945:
-----------------------------------
    Fix Version/s:     (was: v1.9.1)
                   v1.9.2

> Matrix operations performance
> -----------------------------
>
>                 Key: MADLIB-945
>                 URL: https://issues.apache.org/jira/browse/MADLIB-945
>             Project: Apache MADlib
>          Issue Type: Bug
>          Components: Module: Utilities
>            Reporter: Frank McQuillan
>             Fix For: v1.9.2
>
>         Attachments: MemoryTest.ipynb, MemoryTest.pdf
>
>
> Opened on behalf of Gautam Muralidhar:
> Hi Guys,
> Last evening I analyzed the memory bottlenecks on the single-node GPDB 
> Sandbox VM. I stripped the MCMC iteration code down to a single table 
> drop and create: I drop a MADlib matrix_product result table, create it 
> again, and repeat this drop and create for 3,000 iterations. Please see 
> the attached code and look at the function mcmc.MemTest (you can focus on 
> the loop "for ite in range(0,num_iter):").
> The first 400 or so iterations go through quickly, but then the loop 
> slows down immensely. For instance, I started a 3,000-iteration run last 
> evening at around 6 PM and it was still running this morning at 10 AM! 
> Combined with a couple of other operations needed for MCMC, this causes 
> the query to fail with a 'failed to allocate memory' error.
> Questions:
> 1. By any chance, is the MADlib matrix operation not freeing memory 
> completely?
> 2. What overhead is introduced by repeated drop/create table?
> 3. The MADlib matrix operations also produce a lot of logging on the 
> console. Can we disable this? Do you think it could be contributing to 
> the problem? Console writes should only be I/O, but they will certainly 
> slow operations down.
> 4. Lastly, what best practices do you have for creating tables in a loop? Do 
> you guys do something like this in any of the current iterative algorithms in 
> MADlib?
> Finally, this problem might vanish on the cluster.
> Best,
> Gautam
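
For readers without the attachments, the drop-and-create loop described above can be sketched roughly as follows. This is a minimal illustration, not the code in MemoryTest.ipynb: the function name mem_test, the table names mat_a, mat_b, and matrix_product_result, and the column names are all hypothetical, and the call assumes the sparse-format madlib.matrix_mult(matrix_a, a_args, matrix_b, b_args, matrix_out) form. It takes any DB-API style cursor and records per-iteration wall time, which is one way to see where the slowdown begins.

```python
import time


def mem_test(cursor, num_iter, out_table="matrix_product_result"):
    """Drop and re-create a matrix product table num_iter times.

    Returns a list with the wall-clock seconds spent in each iteration.
    All table and column names here are hypothetical placeholders.
    """
    drop_sql = "DROP TABLE IF EXISTS {0}".format(out_table)
    # Assumed call shape for MADlib's sparse matrix multiply.
    mult_sql = (
        "SELECT madlib.matrix_mult("
        "'mat_a', 'row=row_id, col=col_id, val=value', "
        "'mat_b', 'row=row_id, col=col_id, val=value', "
        "'{0}')".format(out_table)
    )
    timings = []
    for ite in range(0, num_iter):
        start = time.time()
        cursor.execute(drop_sql)   # drop the previous result table
        cursor.execute(mult_sql)   # create it again via the matrix product
        timings.append(time.time() - start)
    return timings
```

Comparing the early entries of the returned list with the later ones shows whether iterations get slower over time, as reported here after roughly 400 iterations.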



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
