Greetings,

I worked with the theory of SVMs during my graduate studies and I'm relatively new to existing ML software. Assuming that I want to create new scalable ML algorithms starting from the math, my question is: how do scikit-learn, Mahout Samsara and SystemML compare to each other?
I see interesting Python-based frameworks such as scikit-learn, but then I read SystemML's article on Wikipedia, which made me question how well ("pure") Python scales in a distributed setting for large amounts of data:

"[...] It was observed that data scientists would write machine learning algorithms in languages such as R and Python for small data. When it came time to scale to big data, a systems programmer would be needed to scale the algorithm in a language such as Scala. This process typically involved days or weeks per iteration, and errors would occur translating the algorithms to operate on big data."
( https://en.wikipedia.org/wiki/Apache_SystemML )

The article also opens by stating that Apache SystemML has "algorithm customizability via [...] Python-like languages".

Mahout Samsara is based on Scala. PredictionIO (predictionio.incubator.apache.org) algorithms are based on Mahout Samsara and Scala.

I asked Mr. Matthias Boehm at a conference how one could compare Mahout Samsara to SystemML. From what I understood, Samsara needs "explicit declarations" in expressions for distributed computing, while SystemML doesn't (please correct me if I'm wrong). Also, SystemML will optimize the entire script, while Samsara will optimize individual expressions (again, please correct me if I'm wrong).

While my main criterion is scalability (cluster, GPU support, etc.), other criteria for evaluating these frameworks may be:

a) public adoption,
b) active dev community,
c) quality of development tools,
d) backing by big companies,
e) simplicity of working with clusters (delegating the complexities of clustering to the framework, "hiding" them from the user),
f) quality of documentation,
g) quality of the software itself.

(My question was deleted from stats.stackexchange.com for being off-topic and deleted from Stack Overflow for being bound to get answers with "opinions rather than facts" [sic]. I'm very much interested in hearing balanced and insightful comments from the list.)

Thank you,
Gustavo