GitHub user mengxr commented on the pull request:

https://github.com/apache/spark/pull/79#issuecomment-37224328

@manishamde Thanks for updating the code style and adding more docs! I made a first pass over the code.

On code style: we do not have a good style checker for Scala, and @rxin can say more about style checking. That said, Spark's code style is easy to pick up through code review, so please make the style consistent in the next update. See my inline comments for some examples and update similar code in other places.

For the implementation, I have the following suggestions:

1. The code checks for Regression versus Classification in many places. It would be nice to create a DecisionTree base class and make RegressionTree and ClassificationTree two subclasses of it.
2. For loops are used in some performance-critical code. These should be replaced by "while" loops, which are much faster than "for" in Scala.
3. Several nested methods are used in findBestSplits. It would feel safer to have some unit tests for them.
4. The threshold for classification is fixed at 0.5. This should be configurable.

I will try to make a second pass focusing on the algorithm later today. In the meantime, would you please fix the remaining code style problems and the for loops? Thanks!
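To make suggestions 1, 2, and 4 a bit more concrete, here is a minimal sketch of the shape I have in mind. Every name in it (DecisionTreeSketch, RegressionTreeSketch, ClassificationTreeSketch, LoopSketch) is hypothetical and none of it is taken from the PR. The `>=` comparison against the threshold is also an assumption, so adjust it to match the existing behavior.

```scala
// Hypothetical names throughout; this is an illustration, not the code in the PR.

// Suggestion 1: a shared base class, so the regression/classification
// decision is made once when the tree is constructed.
abstract class DecisionTreeSketch {
  // Turn the raw value stored at a leaf into a prediction.
  def predict(leafValue: Double): Double
}

// Regression returns the leaf value unchanged.
class RegressionTreeSketch extends DecisionTreeSketch {
  override def predict(leafValue: Double): Double = leafValue
}

// Suggestion 4: the threshold is a constructor parameter that defaults to 0.5.
// Assumption: values at or above the threshold map to class 1.0.
class ClassificationTreeSketch(val threshold: Double = 0.5) extends DecisionTreeSketch {
  override def predict(leafValue: Double): Double =
    if (leafValue >= threshold) 1.0 else 0.0
}

object LoopSketch {
  // A for loop over a Range, which desugars to a foreach call taking a closure.
  def sumFor(values: Array[Double]): Double = {
    var sum = 0.0
    for (i <- 0 until values.length) {
      sum += values(i)
    }
    sum
  }

  // Suggestion 2: the same computation as a while loop with an explicit index.
  def sumWhile(values: Array[Double]): Double = {
    var sum = 0.0
    var i = 0
    while (i < values.length) {
      sum += values(i)
      i += 1
    }
    sum
  }
}
```

With this shape, a caller could write `new ClassificationTreeSketch(threshold = 0.7)` instead of relying on a hard-coded constant, and the hot loops keep the same result while avoiding the closure call per iteration.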