[ https://issues.apache.org/jira/browse/IGNITE-7456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16329234#comment-16329234 ]
Oleg Ignatenko commented on IGNITE-7456:
----------------------------------------

As of now it looks good to go to master and to the 2.4 branch: the code changes are OK, unit tests pass, and the examples run correctly.

One thing worth doing prior to merge is to get rid of the double whitespace in the text of the message in MLPGroupTrainerExample: {{">>> Distributed  multilayer perceptron example started."}} (it is between the words "Distributed" and "multilayer").

Another interesting thing I noticed when I tried the MLP group example with a number of steps larger than the one set in the code (100 instead of 20): there were confusing exceptions in the example log. Note that I only managed to reproduce it on my machine; when Artem tried it on his machine, the example ran without exceptions. Because of that I attached my execution log here: [^IGNITE-7456.NPE.MLPGroupTrainerExample.tweaked.log]. This issue is out of scope for this ticket, since it only appears with settings that are not in the code, but after this change is merged to master we had better open a separate ticket to investigate what could go wrong with my trial change.

> Fix wrong batch logic in distributed MLP training.
> --------------------------------------------------
>
>                 Key: IGNITE-7456
>                 URL: https://issues.apache.org/jira/browse/IGNITE-7456
>             Project: Ignite
>          Issue Type: Bug
>          Components: ml
>    Affects Versions: 2.4
>            Reporter: Artem Malykh
>            Assignee: Artem Malykh
>            Priority: Major
>             Fix For: 2.4
>
>         Attachments: IGNITE-7456.NPE.MLPGroupTrainerExample.tweaked.log
>
>
> The batch for training is created outside of the training loop; therefore, on each local step we work with the same batch.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
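The batch bug described in the ticket can be sketched in plain Java. This is a minimal illustration only, not the actual Ignite ML training code: the class {{BatchBugSketch}} and the helper {{sampleBatch}} are hypothetical names. The point it shows is that a mini-batch sampled once before the training loop is reused on every local step, while resampling inside the loop gives each step fresh data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of the bug pattern (not the Ignite ML API):
// a mini-batch created outside the training loop is reused on every step.
public class BatchBugSketch {
    // Draw a random mini-batch of the given size (with replacement).
    static List<Integer> sampleBatch(List<Integer> data, int size, Random rnd) {
        List<Integer> batch = new ArrayList<>();
        for (int i = 0; i < size; i++)
            batch.add(data.get(rnd.nextInt(data.size())));
        return batch;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 100; i++)
            data.add(i);
        Random rnd = new Random(42);

        // Buggy pattern: the batch is created once, outside the loop,
        // so every "local step" trains on exactly the same data.
        List<Integer> staleBatch = sampleBatch(data, 10, rnd);
        for (int step = 0; step < 3; step++)
            System.out.println("buggy step " + step + ": " + staleBatch);

        // Fixed pattern: a fresh batch is drawn inside the loop,
        // so each local step sees different data.
        for (int step = 0; step < 3; step++)
            System.out.println("fixed step " + step + ": " + sampleBatch(data, 10, rnd));
    }
}
```

In the buggy loop the printed batch is identical on every step; in the fixed loop it changes, which is the behavior the patch restores for distributed MLP training.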