[ 
https://issues.apache.org/jira/browse/MADLIB-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nikhil updated MADLIB-1337:
---------------------------
    Description: 
We support the use case where the number of GPUs is less than the number of 
segments. However, we have noticed that this sometimes causes GPDB failures like
{code:java}
could not connect to segment: initialization of segworker group failed
{code}
 # We should give the user a meaningful warning that this feature may or may 
not work, and also make a recommendation.
 # We should also come up with a better heuristic for the memory fraction 
value. Currently we default to using 90% of the available memory and distribute 
it evenly among the segments.
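
For illustration, the current default described in item 2, together with the warning proposed in item 1, could be sketched roughly as below. This is a hypothetical sketch, not MADlib's actual code: the function name, its parameters, the warning text, and the assumption that the GPU and segment counts are per host are all made up here.

```python
import warnings


def per_segment_gpu_memory_fraction(num_gpus, num_segments, usable_fraction=0.9):
    """Hypothetical helper: share `usable_fraction` of a GPU's memory evenly
    among the segments that may end up on the same GPU.

    Assumes num_gpus and num_segments are counted per host.
    """
    if num_gpus < 1 or num_segments < 1:
        raise ValueError("num_gpus and num_segments must be positive")

    # Worst-case number of segments sharing one GPU (integer ceiling division).
    segments_per_gpu = (num_segments + num_gpus - 1) // num_gpus

    if num_gpus < num_segments:
        # Illustrative warning text only; the real wording is still to be decided.
        warnings.warn(
            "{0} GPUs for {1} segments: segments will share GPU memory and "
            "training may or may not work. Consider using as many GPUs as "
            "segments.".format(num_gpus, num_segments),
            RuntimeWarning,
        )

    return usable_fraction / segments_per_gpu
```

Under these assumptions, 4 GPUs with 4 segments gives each segment the full 0.9, while 2 GPUs with 4 segments gives each segment 0.45 and emits the warning.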

Possible recommendations:
 1. Use as many GPUs as segments (this may not be practical).
 2. Maybe a smaller buffer size will help. Use minibatch preprocessor dl to 
pack fewer images. (We need to test this before we recommend it.)

  was:
We support the use case where the number of GPUs is less than the number of 
segments. However, we have noticed that this sometimes causes GPDB failures like
{code}
could not connect to segment: initialization of segworker group failed
{code}

We should give the user a meaningful warning that this feature may or may 
not work, and also make a recommendation.

Possible recommendations:
1. Use as many GPUs as segments (this may not be practical).
2. Maybe a smaller buffer size will help. Use minibatch preprocessor dl to 
pack fewer images. (We need to test this before we recommend it.)

        Summary: DL: Better warning and default for GPU memory fraction when 
no. of GPUs < no. of segments (was: DL: Better warning when no. of GPUs < no. of 
segments)

> DL: Better warning and default for GPU memory fraction when no. of GPUs < 
> no. of segments
> ---------------------------------------------------------------------------------------
>
>                 Key: MADLIB-1337
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1337
>             Project: Apache MADlib
>          Issue Type: Improvement
>          Components: Deep Learning
>            Reporter: Nikhil
>            Priority: Major
>             Fix For: v1.16
>
>
> We support the use case where the number of GPUs is less than the number of 
> segments. However, we have noticed that this sometimes causes GPDB failures 
> like
> {code:java}
> could not connect to segment: initialization of segworker group failed
> {code}
>  # We should give the user a meaningful warning that this feature may or may 
> not work, and also make a recommendation.
>  # We should also come up with a better heuristic for the memory fraction 
> value. Currently we default to using 90% of the available memory and 
> distribute it evenly among the segments.
>
> Possible recommendations:
>  1. Use as many GPUs as segments (this may not be practical).
>  2. Maybe a smaller buffer size will help. Use minibatch preprocessor dl to 
> pack fewer images. (We need to test this before we recommend it.)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
