[ https://issues.apache.org/jira/browse/MADLIB-945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Frank McQuillan updated MADLIB-945:
-----------------------------------
Fix Version/s: v1.9.2 (was: v1.9.1)
> Matrix operations performance
> -----------------------------
>
> Key: MADLIB-945
> URL: https://issues.apache.org/jira/browse/MADLIB-945
> Project: Apache MADlib
> Issue Type: Bug
> Components: Module: Utilities
> Reporter: Frank McQuillan
> Fix For: v1.9.2
>
> Attachments: MemoryTest.ipynb, MemoryTest.pdf
>
>
> Opened on behalf of Gautam Muralidhar:
> Hi Guys,
> So, I carried out some analysis of the memory bottlenecks last evening on
> the single-node GPDB Sandbox VM. I stripped down the MCMC iteration code to
> a single table deletion and creation. Basically, I just drop a MADlib
> matrix_product result table, create it again, and repeat this drop and
> create for 3,000 iterations. Please see the attached code and look at the
> function mcmc.MemTest (you can focus on the loop for ite in
> range(0,num_iter):).
> What I noticed was that the first 400 or so iterations go through quickly,
> but then it slows down immensely. For instance, I started a 3,000 iteration
> run last evening at around 6 PM and it was still running this morning at 10
> AM! Now, this, coupled with a couple of other operations needed for MCMC,
> causes the query to error out with a 'failed to allocate memory' error.
> Questions:
> 1. By any chance, is the MADlib matrix operation not freeing memory
> completely?
> 2. What overhead is introduced by repeated drop/create table?
> 3. I also observe that the MADlib matrix operations produce a lot of logging
> on the console. Can we disable this? Do you think this could be a problem?
> Console writes should just be I/O, though they will certainly slow down
> operations.
> 4. Lastly, what best practices do you have for creating tables in a loop? Do
> you guys do something like this in any of the current iterative algorithms in
> MADlib?
> Finally, this might vanish on the cluster.
> Best,
> Gautam
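For anyone trying to reproduce the slowdown, the drop/create loop described above could be timed per iteration along these lines. This is only a minimal sketch and not the code from the attached notebook: the helper names time_drop_create and slowdown_ratio are hypothetical, the SQL statements are passed in by the caller because the exact MADlib call used in mcmc.MemTest is not shown here, and the demo runs against an in-memory SQLite table purely as a stand-in for GPDB.

```python
import sqlite3
import time


def time_drop_create(execute, drop_sql, create_sql, num_iter):
    """Run drop_sql then create_sql num_iter times.

    `execute` is any callable that runs one SQL statement (for example a
    DB-API cursor's execute method). Returns the wall-clock seconds taken
    by each iteration.
    """
    timings = []
    for ite in range(num_iter):
        start = time.perf_counter()
        execute(drop_sql)
        execute(create_sql)
        timings.append(time.perf_counter() - start)
    return timings


def slowdown_ratio(timings, window):
    """Mean of the last `window` timings divided by the mean of the first."""
    first = sum(timings[:window]) / window
    last = sum(timings[-window:]) / window
    return last / first


# Stand-in demo: SQLite instead of GPDB, and a plain CREATE TABLE AS
# instead of the MADlib matrix operation (both are assumptions).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (row_id INTEGER, val REAL)")
conn.executemany(
    "INSERT INTO src VALUES (?, ?)", [(i, float(i)) for i in range(100)]
)

timings = time_drop_create(
    conn.execute,
    "DROP TABLE IF EXISTS result",
    "CREATE TABLE result AS SELECT row_id, val * 2 AS val FROM src",
    num_iter=50,
)
print("iterations timed:", len(timings))
print("late/early ratio:", slowdown_ratio(timings, window=10))
```

On the sandbox, the same helper could be pointed at a GPDB cursor with the real drop and matrix statements and num_iter set to 3,000. A late/early ratio well above 1 would suggest that the per-iteration cost is growing, which is the behaviour reported after roughly 400 iterations.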
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)