On Tuesday, 13 October 2015 at 23:26:14 UTC, Laeeth Isharc wrote:
https://www.quora.com/Why-is-Python-so-popular-despite-being-so-slow
Andrei suggested posting more widely.

I am coming at D by way of R, C++, Python, etc., so I speak as a statistician who is interested in data science applications.

It's about programmer time. You have to weigh the time it takes you to do the task in each programming language; if you are doing statistical analysis now, R and Python come out streaks ahead.

The scope, roughly speaking, is Research -> Deployment. R and Python sit on the research side, and Python/JVM technologies sit on the deployment side (broadly speaking). The question is: where does D sit? What should D's data science strategy be?

To sit on the deployment side, D needs to grow its big data/NoSQL infrastructure for a start, then hook into a whole ecosystem of analytic tools in an easy and straightforward manner. This will take a lot of work!

I believe it is easier and more effective to start on the research side. D will need:

1. A data table structure like R's data.frame or data.table. This is a dynamic data structure that represents a table and can have lots of operations applied to it. It is the data structure that separates R from most programming languages. It is what pandas tries to emulate. It should include text file and database I/O, with MySQL and ODBC for a start.

2. Formula class: the ability to talk about statistical models using formulas, e.g. y ~ x1 + x2 + x3, and then use these formulas to generate model matrices for input into statistical algorithms.

3. A solid interface to a big data database that makes moving data between a D data table and the database (D data table <-> database) easy.

4. Functional programming, especially around data table and array structures. R's apply(), lapply(), tapply(), plyr and now data.table's DT[, , by = list()] provide powerful tools for data manipulation.

5. A factor data type for categorical variables. This is easy to implement! This ties into the creation of model matrices.

6. Nullable types make talking about missing data more straightforward and give you the opportunity to code missing values to a set value in your analysis. D is streaks ahead of Python here, but this is built into R at a basic level.
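
To make points 2, 5 and 6 a bit more concrete, here is a minimal sketch in plain Python (used here only as neutral pseudocode; a D version would look different). It shows a factor type, a formula such as y ~ x1 + g being parsed, and a model matrix being built with an intercept and treatment (dummy) coding, dropping rows that contain missing values. The names Factor, parse_formula and model_matrix are hypothetical and do not come from R, pandas or any D library; only "+" terms are handled and the response is assumed to be numeric.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple, Union


@dataclass
class Factor:
    """Categorical variable: sorted levels plus one integer code per observation."""
    levels: List[str]
    codes: List[Optional[int]]  # None marks a missing observation

    @classmethod
    def from_values(cls, values: Sequence[Optional[str]]) -> "Factor":
        levels = sorted({v for v in values if v is not None})
        index = {lvl: i for i, lvl in enumerate(levels)}
        return cls(levels, [None if v is None else index[v] for v in values])


# A column is either a Factor or a list of numbers where None means "missing".
Column = Union[Factor, List[Optional[float]]]


def parse_formula(formula: str) -> Tuple[str, List[str]]:
    """Split 'y ~ x1 + x2' into ('y', ['x1', 'x2']). Only '+' terms are supported."""
    response, rhs = formula.split("~")
    return response.strip(), [term.strip() for term in rhs.split("+")]


def model_matrix(formula: str, table: Dict[str, Column]):
    """Return (column names, design rows, response values).

    Adds an intercept, expands each factor into 0/1 columns for every level
    except the first, and drops any row that has a missing value.
    """
    response, terms = parse_formula(formula)
    names = ["(Intercept)"]
    for term in terms:
        col = table[term]
        if isinstance(col, Factor):
            names += [f"{term}{lvl}" for lvl in col.levels[1:]]
        else:
            names.append(term)

    rows, y = [], []
    for i, y_i in enumerate(table[response]):
        row, complete = [1.0], y_i is not None
        for term in terms:
            col = table[term]
            if isinstance(col, Factor):
                code = col.codes[i]
                if code is None:
                    complete = False
                    break
                row += [1.0 if code == k else 0.0 for k in range(1, len(col.levels))]
            else:
                if col[i] is None:
                    complete = False
                    break
                row.append(float(col[i]))
        if complete:
            rows.append(row)
            y.append(float(y_i))
    return names, rows, y


# Example: one numeric predictor, one factor, and two rows with missing data.
table = {
    "y": [1.0, 2.0, None, 4.0],
    "x1": [0.5, 1.5, 2.5, 3.5],
    "g": Factor.from_values(["a", "b", "a", None]),
}
names, X, y = model_matrix("y ~ x1 + g", table)
```

The point of the sketch is that the formula, the factor type and the missing-value handling all meet in one place, the model matrix, which is why they are worth designing together rather than as separate features.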

If D can get points 1, 2 and 3, many people would be all over D, because it is a fantastic programming language and is wicked fast.
