[ https://issues.apache.org/jira/browse/MADLIB-945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15092313#comment-15092313 ]
Frank McQuillan commented on MADLIB-945:
----------------------------------------
Performance test by Gautam:
"Thanks for creating the JIRA for this. I had a chance to test my algorithm on
a full rack GPDB cluster. The matrix operations appear a lot slower on the
cluster than on the sandbox. For instance, for a matrix of dimension 8525 x 11
(stored as table mcmc.x in the code below), the matrix multiplication takes
approximately 30 seconds. As a result, if there are 5 matrix operations within
an iteration, then each iteration takes approximately 2.5 - 3 minutes. So, for
an MCMC simulation comprising 2000 iterations, the run time is on the order
of 3-4 days! There might be something going on in the matrix operations that
needs to be investigated further."
{code:sql}
gmurlidhar=# select
gmurlidhar-# madlib.matrix_mult(
gmurlidhar(# 'mcmc.x',
gmurlidhar(# 'row=rowid,val=row_vec, trans=true',
gmurlidhar(# 'mcmc.x',
gmurlidhar(# 'row=rowid, val=row_vec',
gmurlidhar(# 'mcmc.x_t_x',
gmurlidhar(# 'row=rowid, val=row_vec'
gmurlidhar(# );
NOTICE: table "matrix_out1" does not exist, skipping
CONTEXT: SQL statement "DROP TABLE IF EXISTS matrix_out1"
PL/Python function "matrix_mult"
NOTICE: table "matrix_out2" does not exist, skipping
CONTEXT: SQL statement "DROP TABLE IF EXISTS matrix_out2"
PL/Python function "matrix_mult"
NOTICE: table "matrix_out3" does not exist, skipping
CONTEXT: SQL statement "DROP TABLE IF EXISTS matrix_out3"
PL/Python function "matrix_mult"
NOTICE: table "matrix_out4" does not exist, skipping
CONTEXT: SQL statement "DROP TABLE IF EXISTS matrix_out4"
PL/Python function "matrix_mult"
NOTICE: table "matrix_out5" does not exist, skipping
CONTEXT: SQL statement "DROP TABLE IF EXISTS matrix_out5"
PL/Python function "matrix_mult"
-[ RECORD 1 ]-------------
matrix_mult | (mcmc.x_t_x)
Time: 28409.857 ms
{code}
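As a rough sanity check on the run-time estimate quoted above, the arithmetic can be sketched as follows. The inputs are the figures from the comment (28.4 s measured per multiplication, 5 matrix operations per iteration, 2000 iterations); the function name is hypothetical, and the sketch assumes every matrix operation costs about the same as the multiplication shown.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def estimated_days(seconds_per_op, ops_per_iteration, iterations):
    """Total wall-clock time in days, assuming every operation costs the same."""
    return seconds_per_op * ops_per_iteration * iterations / SECONDS_PER_DAY


# Lower bound: the 28.4 s measured for one matrix_mult call.
low = estimated_days(28.4, 5, 2000)
# Upper bound: 36 s per operation, i.e. 3 minutes per 5-operation iteration.
high = estimated_days(36.0, 5, 2000)

print(f"{low:.1f} to {high:.1f} days")  # prints "3.3 to 4.2 days"
```

This agrees with the "3-4 days" figure in the comment.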
> Investigate potential matrix memory leak
> ----------------------------------------
>
> Key: MADLIB-945
> URL: https://issues.apache.org/jira/browse/MADLIB-945
> Project: Apache MADlib
> Issue Type: Bug
> Components: Module: Utilities
> Reporter: Frank McQuillan
> Fix For: v1.9
>
> Attachments: MemoryTest.ipynb, MemoryTest.pdf
>
>
> Opened on behalf of Gautam Muralidhar:
> Hi Guys,
> So, I carried out some analysis of the memory bottlenecks last evening on
> the single-node GPDB Sandbox VM. I stripped down the MCMC iteration code to
> a single table deletion and creation. Basically, I just drop a MADlib
> matrix_product result table and create it again, and repeat this drop and
> create for 3,000 iterations. Please see the attached code and look at the
> function mcmc.MemTest (you can focus on the loop for ite in
> range(0,num_iter):).
> What I noticed was that the first 400 or so iterations go through quickly,
> but then it slows down immensely. For instance, I started a 3,000 iteration
> run last evening at around 6 PM and it was still running this morning at 10
> AM! This, coupled with a couple of other operations needed for MCMC,
> causes the query to error out with a 'failed to allocate memory' error.
> Questions:
> 1. By any chance, is the MADlib matrix operation not freeing memory
> completely?
> 2. What overhead is introduced by repeated drop/create table?
> 3. I also observe that the MADlib matrix operations produce a lot of logging
> on the console. Can we disable this? Do you think this could be a problem?
> Console writes should only be I/O, although they will certainly slow down
> operations.
> 4. What best practices do you have for creating tables in a loop? Do
> you guys do something like this in any of the current iterative algorithms in
> MADlib?
> Lastly, this problem might vanish on the cluster.
> Best,
> Gautam
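
One way to narrow down where the slowdown described in the issue begins (around iteration 400 in the report) is to time each drop/create iteration separately. The following is a minimal sketch, not the code from the attached MemoryTest notebook: the function names, the `baseline_n` and `factor` thresholds, and the demo statements are all hypothetical. `execute` is any callable that runs one SQL string; inside a PL/Python function it could be `plpy.execute`.

```python
import time


def time_iterations(execute, statements, num_iter, clock=time.perf_counter):
    """Run every statement num_iter times; return seconds spent per iteration."""
    timings = []
    for _ in range(num_iter):
        start = clock()
        for sql in statements:
            execute(sql)
        timings.append(clock() - start)
    return timings


def first_slow_iteration(timings, baseline_n=50, factor=5.0):
    """Index of the first iteration slower than factor x the early median.

    The median of the first baseline_n timings serves as the baseline.
    Returns None if no later iteration exceeds the threshold.
    """
    head = sorted(timings[:baseline_n])
    if not head:
        return None
    baseline = head[len(head) // 2]
    for i, elapsed in enumerate(timings):
        if i >= baseline_n and elapsed > factor * baseline:
            return i
    return None


# Demo with a stand-in executor that only records the statements it receives.
issued = []
demo_timings = time_iterations(
    issued.append,
    ["DROP TABLE IF EXISTS demo_out", "SELECT 1"],  # placeholder statements
    num_iter=3,
)
print(len(issued), len(demo_timings))  # prints "6 3"
```

Logging the per-iteration timings alongside memory usage on the segment hosts would help show whether the slowdown is gradual or begins abruptly.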