On Thursday, 15 October 2015 at 21:16:18 UTC, Laeeth Isharc wrote:
On Wednesday, 14 October 2015 at 22:11:56 UTC, data pulverizer wrote:
On Tuesday, 13 October 2015 at 23:26:14 UTC, Laeeth Isharc wrote:
https://www.quora.com/Why-is-Python-so-popular-despite-being-so-slow
Andrei suggested posting more widely.

I am coming at D by way of R, C++, Python etc. so I speak as a statistician who is interested in data science applications.

Welcome...  Looks like we have similar interests.

That's good to know

To sit on the deployment side, D needs to grow its big data/NoSQL infrastructure for a start, then hook into the whole ecosystem of analytic tools in an easy and straightforward manner. This will take a lot of work!

Indeed. The dlangscience project managed by John Colvin is very interesting. It is not a pure stats project, but there will be many shared areas of need. He has some very interesting ideas, and being able to mix Python and D in a Jupyter notebook is rather nice (you can do this already).

Thanks for bringing my attention to this, this looks interesting.

Sounds interesting. Take a look at Colvin's dlangscience draft white paper, and see what you would add. It's a chance to shape things whilst they are still fluid.

Good suggestion.

3. A solid interface to a big data database that makes D data table <-> database round trips easy

Which ones do you have in mind for stats? The different choices seem to serve quite different needs. And when you say big data, how big do you typically mean?

What I mean is to start by tapping into current big data technologies. HDFS and Cassandra have C APIs which we can wrap for D.
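To give a flavour of what the wrapping involves: below is a hypothetical, minimal extern(C) binding to a few calls from the DataStax Cassandra C driver. The signatures are a sketch of its documented API rather than a checked translation of cassandra.h, so treat the details as assumptions:

// Hand-written extern(C) declarations for a few calls from the DataStax
// Cassandra C driver (cassandra.h). Signatures are a sketch of the
// documented API; check the real header before relying on them.
// Build with something like: dmd app.d -L-lcassandra
extern (C) nothrow @nogc
{
    struct CassCluster;  // opaque handles owned by the C library
    struct CassSession;
    struct CassFuture;

    CassCluster* cass_cluster_new();
    void cass_cluster_free(CassCluster* cluster);
    int cass_cluster_set_contact_points(CassCluster* cluster, const(char)* hosts);

    CassSession* cass_session_new();
    void cass_session_free(CassSession* session);
    CassFuture* cass_session_connect(CassSession* session, const(CassCluster)* cluster);

    void cass_future_wait(CassFuture* future);
    int cass_future_error_code(CassFuture* future);  // really a CassError enum
    void cass_future_free(CassFuture* future);
}

void main()
{
    import std.stdio : writeln;
    import std.string : toStringz;

    auto cluster = cass_cluster_new();
    scope (exit) cass_cluster_free(cluster);
    cass_cluster_set_contact_points(cluster, "127.0.0.1".toStringz);

    auto session = cass_session_new();
    scope (exit) cass_session_free(session);

    auto future = cass_session_connect(session, cluster);
    scope (exit) cass_future_free(future);
    cass_future_wait(future);
    writeln(cass_future_error_code(future) == 0 ? "connected" : "failed");
}

The nice part is that D consumes C ABIs directly, so a binding is mostly a matter of transcribing declarations; no glue layer in C is needed.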

4. Functional programming: especially around data table and array structures. R's apply(), lapply(), tapply(), plyr, and now data.table's DT[, , by = list()] provide powerful tools for data manipulation.

Any thoughts on what the design should look like?

Yes, I think this is easy to implement but still important. The real devil is my point #1: the dynamic data table object.
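Agreed that the apply() side falls out almost for free: std.algorithm ranges give you the lapply/sapply pattern, and a plain associative array covers tapply-style grouping. A rough sketch, with invented column data:

import std.algorithm : map;
import std.array : array;
import std.stdio : writeln;

void main()
{
    // lapply-style: map a function over a column and collect the result
    auto heights = [1.52, 1.80, 1.75, 1.63];
    auto cm = heights.map!(h => h * 100).array;

    // tapply-style: aggregate one column by another, using a plain
    // associative array as the grouping device
    auto group = ["a", "b", "a", "b"];
    auto value = [1.0, 2.0, 3.0, 4.0];
    double[string] total;
    foreach (i, g; group)
        total[g] = total.get(g, 0.0) + value[i];

    writeln(cm);     // [152, 180, 175, 163]
    writeln(total);  // ["a":4, "b":6] (AA order is unspecified)
}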


To an extent there is a balance between wanting to explore data iteratively (when you don't know where you will end up) and wanting to build a robust process for production. I have been wondering myself about using LuaJIT to strap together D building blocks for the exploration, driving it from a custom console built around Adam Ruppe's terminal.d.
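On the D side of that setup, a building block can be as small as an extern(C) function compiled into a shared library, which LuaJIT's FFI (or any other C-compatible host) can then load. A hypothetical example, with the name and layout chosen purely for illustration:

// Compile as a shared library, e.g. dmd -shared -fPIC stats.d,
// then load it from LuaJIT via ffi.load and a matching ffi.cdef.
extern (C) double mean(const(double)* xs, size_t n)
{
    double total = 0;
    foreach (i; 0 .. n)
        total += xs[i];
    return n ? total / n : double.nan;
}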

Sounds interesting

6. Nullable types make talking about missing data more straightforward and give you the opportunity to code missing values to a set value in your analysis. D is streets ahead of Python here, but this is built into R at a basic level.

So matrices with nullable types within? Is NaN enough for you? If not, it could be quite expensive if the back end is C.

I am not suggesting that we pass nullable matrices to C algorithms. Yes, NaN is how this is done in practice, but you wouldn't have NaNs in your matrix at the point of modeling: they'll just propagate and trash your answer. Nullable types are useful in data acquisition and exploration, the more practical side of data handling. I was quite shocked to see them in D, when they are essentially absent from "high level" programming languages like Python. Real data is messy, and having nullable types is useful in processing, storing and summarizing raw data. I put this in as #6 because I think it is possible to do practical statistics by working around them with notional hacks. Nullables are something that C# and R have and that Python's pandas has struggled with. The great news is that they are available in D, so we can use them.
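For anyone following along, here is a minimal sketch of that workflow with std.typecons.Nullable, using an invented column with one missing entry:

import std.typecons : Nullable, nullable;
import std.algorithm : filter, map, sum;
import std.range : walkLength;
import std.stdio : writeln;

void main()
{
    // A raw column with a hole in it, as read from a messy source;
    // Nullable!double.init plays the role of R's NA
    auto col = [nullable(1.5), Nullable!double.init, nullable(2.5)];

    // Drop the missing entries, then summarise the observed values
    auto observed = col.filter!(x => !x.isNull).map!(x => x.get);
    writeln("n observed = ", observed.walkLength);  // 2
    writeln("sum        = ", observed.sum);         // 4
}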


If D can get points 1, 2, and 3, many people would be all over D, because it is a fantastic programming language and is wicked fast.
What do you like best about it? And in your own domain, what have the biggest payoffs been in practice?

I am playing with D at the moment. To become useful to me, the data table structure is a must. I previously said points 1, 2, and 3 would get data scientists sucked into D, but the data table structure is the seed: a dynamic structure like that in D would catalyze the rest. Everything else is either wrappers or routine work, maybe a lot of it, but straightforward to implement. The data table structure is, for me, the real enigma.
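Just to make the shape of the problem concrete, the crudest possible version is a map from column names to Variant-wrapped typed arrays. Everything below is illustrative rather than a proposed design:

import std.variant : Variant;
import std.stdio : writeln;

// Each column is a typed array stored behind a Variant and looked up by
// name at runtime, loosely like an R data.frame. A real design would need
// column ordering, row views, joins, and so on.
struct DataTable
{
    Variant[string] columns;

    void add(T)(string name, T[] col) { columns[name] = Variant(col); }
    T[] get(T)(string name) { return columns[name].get!(T[]); }
}

void main()
{
    DataTable dt;
    dt.add("id", [1, 2, 3]);
    dt.add("score", [0.5, 0.9, 0.7]);
    writeln(dt.get!double("score"));  // [0.5, 0.9, 0.7]
}

Even this toy version shows the tension: the table is dynamic at the column level, but you still have to name a concrete element type to get anything useful back out.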

The way that R's data types are structured around SEXPs is the key to all of this. I am currently reading through the R Internals documentation to get my head around it.

https://cran.r-project.org/doc/manuals/r-release/R-ints.html
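Stripped to its essence, a SEXP is one runtime-tagged value type that every R object flows through. That tag-plus-payload shape can be mimicked in D with std.variant.Algebraic; the payload set below is purely illustrative:

import std.variant : Algebraic;

// One tagged value type that can carry any of the supported payloads,
// loosely echoing R's SEXPTYPE tag; the payload set here is illustrative.
alias RValue = Algebraic!(double[], int[], string[], bool[]);

void main()
{
    RValue v = [1.0, 2.0, 3.0];
    assert(v.type == typeid(double[]));

    v = ["a", "b"];                     // retagged at runtime, like a SEXP
    assert(v.peek!(string[]) !is null); // typed access via the tag
}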
