[ 
https://issues.apache.org/jira/browse/SYSTEMML-1396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nakul Jindal updated SYSTEMML-1396:
-----------------------------------
    Description: 
Currently, CUDA memory chunks are deallocated asynchronously. This came about 
because {{cudaFree}} operations are expensive; the idea behind freeing 
asynchronously was that the cudaFree could happen while the CPU was busy with 
other work. However, in tight loops where most operations run on the GPU, the 
asynchronous cudaFree calls weren't really asynchronous: operations waiting to 
use the GPU would still pay the penalty for the cudaFree operation.

After adding extra instrumentation, it was determined that {{cudaMalloc}} 
operations were fairly expensive as well. 
Most GPU operations are done in loops that constantly allocate and deallocate 
memory chunks of the same size on every iteration. It would be more efficient 
to keep a freed chunk around and "clear it out" (set the memory to 0) for 
reuse, instead of freeing it and reallocating the same size again.
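The reuse idea above can be sketched as a size-keyed free list. This is a 
hypothetical illustration, not SystemML's actual implementation: plain 
{{byte[]}} buffers stand in for device pointers, and the class name 
{{GPUMemoryPool}} is invented for the sketch. A real version would call 
cudaMalloc / cudaFree / cudaMemset through JCuda instead.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Sketch of lazily freeing chunks: instead of cudaFree-ing a chunk and later
// cudaMalloc-ing the same size again, keep the freed chunk in a per-size pool
// and hand it back zeroed on the next same-size request.
public class GPUMemoryPool {
    // Freed chunks, grouped by size so a same-size request can reuse one.
    private final Map<Integer, Deque<byte[]>> freeLists = new HashMap<>();

    /** Allocate a chunk, reusing a cached same-size chunk if one exists. */
    public byte[] allocate(int size) {
        Deque<byte[]> pool = freeLists.get(size);
        if (pool != null && !pool.isEmpty()) {
            byte[] chunk = pool.pop();
            Arrays.fill(chunk, (byte) 0); // stand-in for cudaMemset(ptr, 0, size)
            return chunk;
        }
        return new byte[size]; // stand-in for cudaMalloc
    }

    /** Return a chunk to the pool instead of freeing it immediately. */
    public void release(byte[] chunk) {
        freeLists.computeIfAbsent(chunk.length, k -> new ArrayDeque<>()).push(chunk);
    }

    public static void main(String[] args) {
        GPUMemoryPool pool = new GPUMemoryPool();
        byte[] a = pool.allocate(1024);
        a[0] = 42;
        pool.release(a);
        byte[] b = pool.allocate(1024); // same-size request in the next loop iteration
        System.out.println(b == a);     // reused, not reallocated
        System.out.println(b[0]);       // cleared to 0 on reuse
    }
}
```

In a tight loop that allocates and releases the same shape each iteration, 
every iteration after the first hits the pool, so the per-iteration cost drops 
from a cudaMalloc plus a cudaFree to a single memset.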

  was:
The current version of deallocating cuda memory chunks is done lazily. That 
came about as a result of the {{cudaFree}} operations being expensive. After 
adding extra instrumentation, it was determined that {{cudaAlloc}} operations 
were fairly expensive as well. 
Most GPU operations are done in loops with constantly allocating and 
deallocating the same size of memory chunks per loop. What would be more 
efficient is to "clear out" or set the memory to 0 instead.


> Enable lazily freeing cuda allocated memory chunks
> --------------------------------------------------
>
>                 Key: SYSTEMML-1396
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1396
>             Project: SystemML
>          Issue Type: Improvement
>          Components: Runtime
>            Reporter: Nakul Jindal
>            Assignee: Nakul Jindal
>             Fix For: SystemML 1.0
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
