[ 
https://issues.apache.org/jira/browse/MADLIB-945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15092313#comment-15092313
 ] 

Frank McQuillan commented on MADLIB-945:
----------------------------------------

Performance test by Gautam:

"Thanks for creating the JIRA for this. I had a chance to test my algorithm on
a full-rack GPDB cluster. The matrix operations appear to be much slower on the
cluster than on the sandbox. For instance, for a matrix of dimension 8525 x 11
(stored as table mcmc.x in the code below), the matrix multiplication takes
approximately 30 seconds. As a result, with 5 matrix operations per iteration,
each iteration takes approximately 2.5-3 minutes, so an MCMC simulation
consisting of 2000 iterations would run on the order of 3-4 days! Something in
the matrix operations may be causing this, and it needs further investigation."

{code:sql}
gmurlidhar=# select
gmurlidhar-#     madlib.matrix_mult(
gmurlidhar(#         'mcmc.x',
gmurlidhar(#         'row=rowid,val=row_vec, trans=true',
gmurlidhar(#         'mcmc.x',
gmurlidhar(#         'row=rowid, val=row_vec',
gmurlidhar(#         'mcmc.x_t_x',
gmurlidhar(#         'row=rowid, val=row_vec'
gmurlidhar(#     );
NOTICE:  table "matrix_out1" does not exist, skipping
CONTEXT:  SQL statement "DROP TABLE IF EXISTS matrix_out1"
PL/Python function "matrix_mult"
NOTICE:  table "matrix_out2" does not exist, skipping
CONTEXT:  SQL statement "DROP TABLE IF EXISTS matrix_out2"
PL/Python function "matrix_mult"
NOTICE:  table "matrix_out3" does not exist, skipping
CONTEXT:  SQL statement "DROP TABLE IF EXISTS matrix_out3"
PL/Python function "matrix_mult"
NOTICE:  table "matrix_out4" does not exist, skipping
CONTEXT:  SQL statement "DROP TABLE IF EXISTS matrix_out4"
PL/Python function "matrix_mult"
NOTICE:  table "matrix_out5" does not exist, skipping
CONTEXT:  SQL statement "DROP TABLE IF EXISTS matrix_out5"
PL/Python function "matrix_mult"
-[ RECORD 1 ]-------------
matrix_mult | (mcmc.x_t_x)

Time: 28409.857 ms
{code}
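
As a quick check of the run-time estimate in Gautam's note: with five matrix operations per iteration at roughly 30 s each (the session above measured about 28.4 s for a single multiplication), the per-iteration and total times work out as below. This sketch assumes the matrix operations dominate each MCMC iteration and ignores all other work in the step.

{code}
\begin{aligned}
t_{\text{iter}}  &\approx 5 \times 30\,\text{s} = 150\,\text{s} = 2.5\,\text{min} \\
t_{\text{total}} &\approx 2000 \times (2.5 \text{ to } 3)\,\text{min}
                  = 5000 \text{ to } 6000\,\text{min}
                  \approx 3.5 \text{ to } 4.2\,\text{days}
\end{aligned}
{code}

So the 3-4 day figure follows directly from the roughly 30 s cost of one 8525 x 11 multiplication.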

> Investigate potential matrix memory leak
> ----------------------------------------
>
>                 Key: MADLIB-945
>                 URL: https://issues.apache.org/jira/browse/MADLIB-945
>             Project: Apache MADlib
>          Issue Type: Bug
>          Components: Module: Utilities
>            Reporter: Frank McQuillan
>             Fix For: v1.9
>
>         Attachments: MemoryTest.ipynb, MemoryTest.pdf
>
>
> Opened on behalf of Gautam Muralidhar:
> Hi guys,
> I carried out some analysis of the memory bottlenecks last evening on the
> single-node GPDB Sandbox VM. I stripped the MCMC iteration code down to a
> single table deletion and creation: I drop a MADlib matrix_product result
> table, create it again, and repeat this drop and create for 3,000
> iterations. Please see the attached code and look at the function
> mcmc.MemTest (you can focus on the {{for ite in range(0,num_iter):}} loop).
> The first 400 or so iterations go through quickly, but then the loop slows
> down dramatically. For instance, I started a 3,000-iteration run last evening
> at around 6 PM and it was still running at 10 AM this morning! Combined with
> a couple of other operations needed for MCMC, this causes the query to fail
> with a 'failed to allocate memory' error.
> Questions:
> 1. Is the MADlib matrix operation possibly not freeing all of its memory?
> 2. What overhead is introduced by repeatedly dropping and creating tables?
> 3. The MADlib matrix operations also produce a lot of logging on the
> console. Can we disable this? Could it be part of the problem? Console
> writes are I/O, so they will certainly slow operations down somewhat.
> 4. What best practices do you have for creating tables in a loop? Do any of
> the current iterative algorithms in MADlib do something like this?
> Finally, it is possible that this issue goes away on the cluster.
> Best,
> Gautam
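
The drop/create loop described in the quoted report can be sketched as a PL/Python function like the one below. This is not the attached mcmc.MemTest code: the function name mcmc.mem_test_sketch and the per-iteration notice are hypothetical, and the madlib.matrix_mult arguments are simply copied from the psql session above. Logging the wall-clock time of each iteration is one way to see where the slowdown after roughly 400 iterations begins.

{code:sql}
-- Hypothetical reproduction sketch, not the attached mcmc.MemTest function.
CREATE OR REPLACE FUNCTION mcmc.mem_test_sketch(num_iter integer)
RETURNS void AS $$
import time

for ite in range(0, num_iter):
    start = time.time()
    # Drop the previous result table, then recreate it with matrix_mult.
    plpy.execute("DROP TABLE IF EXISTS mcmc.x_t_x")
    plpy.execute("""
        SELECT madlib.matrix_mult(
            'mcmc.x', 'row=rowid, val=row_vec, trans=true',
            'mcmc.x', 'row=rowid, val=row_vec',
            'mcmc.x_t_x', 'row=rowid, val=row_vec')
    """)
    # Report per-iteration wall-clock time so a gradual slowdown is visible.
    plpy.notice("iteration %d took %.2f s" % (ite, time.time() - start))
$$ LANGUAGE plpythonu;

SELECT mcmc.mem_test_sketch(3000);
{code}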



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)