Bimlesh759-AI opened a new issue #52: URL: https://github.com/apache/incubator-bluemarlin/issues/52
Training scenario: the datasets below include users with a minimum of one click, with step = 10.

- test_dataset_count = 110755727
- train_dataset_count = 517801469
- user_count = 94315979
- item_count = 19
- EPOCH = 250
- train_batch_size = 20480
- test_batch_size = 2048

The current model takes around 12 hours to train one epoch on the full dataset. Even if we use a random 50% sample of the data, one epoch still takes around 7-8 hours. At this rate, training for the full 250 epochs on the full dataset would take around 125 days. We are currently using TensorFlow 1.15; two GPUs are available for training, but only one is used.

Targets for the model:

1. Training should take no more than 24 hours in total.
2. The model should be able to use all available GPUs.
3. Is it possible to further reduce the dataset size without losing insights?
4. Is it possible to get the DIN-Lookalike model and trainer code in a TensorFlow 2.0 version?
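On point 3, one possible direction (an assumption on my part, not something already in the BlueMarlin codebase) is stratified downsampling: since item_count is only 19, sampling a fixed fraction within each item stratum keeps the per-item click distribution intact while shrinking the training set. A minimal plain-Python sketch, where `sample_stratified` and the record layout `(user_id, item_id, clicks)` are illustrative names:

```python
import random
from collections import defaultdict

def sample_stratified(records, key_fn, fraction, seed=42):
    """Downsample `records` to roughly `fraction` of their original count,
    preserving the proportion of each stratum defined by key_fn
    (e.g. item id, or a user click-count bucket)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[key_fn(rec)].append(rec)
    sample = []
    for recs in strata.values():
        # Keep at least one record per stratum so no item disappears.
        k = max(1, round(len(recs) * fraction))
        sample.extend(rng.sample(recs, k))
    rng.shuffle(sample)
    return sample

# Illustrative usage: records as (user_id, item_id, clicks),
# stratified by item_id so all 19 items stay represented.
records = [(u, u % 19, 1) for u in range(10000)]
half = sample_stratified(records, key_fn=lambda r: r[1], fraction=0.5)
```

For the real dataset this would run as a one-off preprocessing pass (e.g. in Spark rather than in-memory Python, given the half-billion-row train set), but the sampling logic would be the same.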