mboehm7 commented on pull request #1195: URL: https://github.com/apache/systemds/pull/1195#issuecomment-788225280
First of all (now with a bit more time), thanks for the initiative fixing the comments and other potential issues. Few comments on the code though: * Please leave the `Y = ifelse(Y <= 0, max_y + 1, Y)` as it is (converting all <=0 labels to a new maximum label). That way it naturally applied to binary classification where labels are often given as -1 and 1 * Remove the call `LT = decompress(LT)` (that is exactly why we should disable its use at script level other than in tests) * Personally, I would remove the support for one-hot encoded labels y in order to reduce the surface of the API. If kept, please move the checks for min/max y inside the respective branch * The isNaN(X) returns an empty block if there are no NaN (without allocation) - so it's overhead is actually very small. The sum can be faster due to multi-threading, but if that is true we might want to introduce the rewrite `sum(isNaN(X))!=0 --> isNan(sum(X))` instead of modifying such scripts. * Fix the introduced typos (e.g., witch, alreaddy) ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
