Bimlesh759-AI commented on issue #52: URL: https://github.com/apache/incubator-bluemarlin/issues/52#issuecomment-1060821616
**Update on Horovod Implementation**

Below are a few lines from the log. Training currently runs on 2 GPUs with the Horovod implementation. One complete epoch over half of the total training dataset is expected to take around 8-9 hours.

2022-03-07T14:37:31.14694774Z Current epoch is :0 out of total epoch size : 250
2022-03-07T14:37:31.151764703Z Current epoch is :0 out of total epoch size : 250
2022-03-07T14:37:34.149059123Z 2022-03-07 22:37:34.147682: Successfully opened dynamic library libcublas.so.10.0
2022-03-07T14:37:34.195508001Z 2022-03-07 22:37:34.194611: Successfully opened dynamic library libcublas.so.10.0
2022-03-07T14:49:41.36024786Z Current batch is :500 out of total batch size : 25284
2022-03-07T14:49:41.360453056Z Current GPU is 0
2022-03-07T14:49:53.371842581Z Current batch is :500 out of total batch size : 25284
2022-03-07T14:49:53.371892941Z Current GPU is 1
2022-03-07T15:00:15.9504008Z Current batch is :1000 out of total batch size : 25284
2022-03-07T15:00:15.950470233Z Current GPU is 1
2022-03-07T15:00:16.496543414Z Current batch is :1000 out of total batch size : 25284
2022-03-07T15:00:16.496598614Z Current GPU is 0
2022-03-07T15:12:15.966611817Z Current batch is :1500 out of total batch size : 25284
2022-03-07T15:12:15.966671727Z Current GPU is 0
2022-03-07T15:13:11.603205931Z Current batch is :1500 out of total batch size : 25284
2022-03-07T15:13:11.603255662Z Current GPU is 1
2022-03-07T15:26:53.688137055Z Current batch is :2000 out of total batch size : 25284
2022-03-07T15:26:53.688267245Z Current GPU is 0
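
For anyone who wants to re-check the per-epoch estimate as more log output arrives, here is a minimal sketch that extrapolates epoch time from the progress lines. It is not part of the training code: the helper names `parse_progress` and `estimate_epoch_hours` are hypothetical, the sample uses only the GPU 0 batch lines quoted above, and it assumes the batch rate stays constant for the rest of the epoch.

```python
import re
from datetime import datetime

# GPU 0 progress lines copied from the log above.
LOG = """\
2022-03-07T14:49:41.36024786Z Current batch is :500 out of total batch size : 25284
2022-03-07T15:00:16.496543414Z Current batch is :1000 out of total batch size : 25284
2022-03-07T15:12:15.966611817Z Current batch is :1500 out of total batch size : 25284
2022-03-07T15:26:53.688137055Z Current batch is :2000 out of total batch size : 25284
"""

# The fractional part has a variable number of digits, so only whole
# seconds are captured; that is precise enough for an hours-scale estimate.
PATTERN = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\.\d+Z "
    r"Current batch is :(\d+) out of total batch size : (\d+)"
)


def parse_progress(log_text):
    """Return (timestamp, batch, total_batches) for each progress line."""
    rows = []
    for line in log_text.splitlines():
        match = PATTERN.match(line)
        if match:
            ts = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
            rows.append((ts, int(match.group(2)), int(match.group(3))))
    return rows


def estimate_epoch_hours(rows):
    """Extrapolate full-epoch duration from the first and last progress lines."""
    (t0, b0, total), (t1, b1, _) = rows[0], rows[-1]
    seconds_per_batch = (t1 - t0).total_seconds() / (b1 - b0)
    return seconds_per_batch * total / 3600.0


rows = parse_progress(LOG)
hours = estimate_epoch_hours(rows)
print(f"Estimated time per epoch: {hours:.1f} h")
```

On these four lines alone the extrapolation comes out at roughly 10 hours, because the gap between successive 500-batch checkpoints grows from about 10.5 to about 14.5 minutes. Both this and the 8-9 hour figure should be treated as approximate until more of the epoch has been logged.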