Bimlesh759-AI opened a new issue #52:
URL: https://github.com/apache/incubator-bluemarlin/issues/52


   Training scenario:
   
           The following dataset details include users with at least one click, with step = 10.
           test_dataset_count = 110755727,
           train_dataset_count = 517801469,
           user_count = 94315979,
           item_count = 19
           
           EPOCH = 250
           train_batch_size = 20480
           test_batch_size = 2048
           
           The current model takes around 12 hours to train one epoch on the full dataset. Even if we randomly select around 50% of the dataset, the model still takes around 7-8 hours per epoch.
           
           By this reasoning, training the model for the complete 250 epochs on the full dataset would take around 125 days.
           
           We are currently using TensorFlow 1.15. Two GPUs are available for training, but only one GPU is used.
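   For reference, the 125-day estimate follows directly from the per-epoch time reported above (a quick sanity-check sketch; the 12 h/epoch figure is the measurement quoted earlier):

```python
# Back-of-the-envelope training-time estimate from the figures above.
hours_per_epoch = 12      # measured: ~12 h per epoch on the full dataset
epochs = 250              # EPOCH = 250

total_hours = hours_per_epoch * epochs
total_days = total_hours / 24

print(f"{total_hours} hours = {total_days:.0f} days")  # 3000 hours = 125 days
```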
           
   Targets for the model:
        
           1. The model should not take more than 24 hours to train.
           2. The model should be able to use all available GPUs.
           3. Is it possible to further reduce the dataset size without losing insights?
           4. Is it possible to get the DIN-Lookalike model and trainer code in a TensorFlow 2.0 version?
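   Regarding point 3, one common way to shrink a dataset of this shape without dropping whole users is stratified per-user downsampling: keep a fixed fraction of each user's records (at least one), so every one of the ~94M users stays represented. The sketch below is hypothetical and not part of the DIN-Lookalike code; the record/field names (`user_id`, `item_id`) are assumptions for illustration.

```python
import random
from collections import defaultdict

def downsample_per_user(records, fraction, seed=42):
    """Keep roughly `fraction` of each user's records (at least one),
    so that no user disappears from the reduced dataset."""
    rng = random.Random(seed)  # fixed seed for a reproducible subset
    by_user = defaultdict(list)
    for rec in records:
        by_user[rec["user_id"]].append(rec)

    sampled = []
    for user_recs in by_user.values():
        k = max(1, int(len(user_recs) * fraction))
        sampled.extend(rng.sample(user_recs, k))
    return sampled

# Toy usage: 3 users with 10 click records each, reduced by half.
records = [{"user_id": u, "item_id": i}
           for u in ("a", "b", "c") for i in range(10)]
reduced = downsample_per_user(records, fraction=0.5)
print(len(records), "->", len(reduced))  # 30 -> 15
```

   Compared with uniform random 50% sampling (which the measurements above show only cuts epoch time to 7-8 hours anyway), this at least guarantees full user coverage; whether it preserves the insights the model needs would have to be validated on the test set.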


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@bluemarlin.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

