On Thursday, 15 October 2015 at 07:57:51 UTC, Russel Winder wrote:
On Thu, 2015-10-15 at 06:48 +0000, data pulverizer via Digitalmars-d-learn wrote:

Just because D doesn't have this now doesn't mean it cannot. C doesn't have such a capability, but R and Python do, even though R and CPython are themselves just C code.

I think the way R does this is that its dynamic runtime environment is used to bind together native C arrays of basic types. I wonder if we could simulate that dynamic behaviour by leveraging D's short compilation times to write/update data-table source file(s) describing the structure of new or modified data tables on the fly?
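For instance, instead of writing new source files and recompiling at run time, D can already generate a typed column-store table at compile time from a schema, via CTFE and a string mixin. A minimal sketch (DataTable and makeColumns are hypothetical names of mine, not an existing library):

import std.stdio;

/// CTFE helper: turn parallel name/type lists into column declarations.
string makeColumns(string[] names, string[] types)
{
    string code;
    foreach (i, name; names)
        code ~= types[i] ~ "[] " ~ name ~ ";\n";
    return code;
}

/// A toy column-store table whose fields are generated at compile time.
struct DataTable
{
    // Expands to: long[] id; double[] score; string[] label;
    mixin(makeColumns(["id", "score", "label"],
                      ["long", "double", "string"]));
}

void main()
{
    DataTable t;
    t.id    = [1, 2, 3];
    t.score = [0.5, 1.5, 2.5];
    t.label = ["a", "b", "c"];
    writeln(t.id, " ", t.score, " ", t.label);
}

A CTFE helper like this could in principle be fed a schema produced at build time, which gets part of the way to R's dynamic tables without invoking the compiler at run time.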

Pandas data structures rely on the NumPy n-dimensional array implementation; it is not beyond the bounds of possibility that that data structure could be realized as a D module.
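At its core a NumPy-style array is just a flat buffer plus shape and stride metadata, so a starting point is small. A minimal sketch, assuming a hypothetical NDArray type (a real module would need slicing, views, broadcasting, and much more):

import std.stdio;

/// A NumPy-style n-dimensional array: flat storage plus
/// shape and row-major strides.
struct NDArray(T)
{
    T[] data;
    size_t[] shape;
    size_t[] strides;

    this(size_t[] shape...)
    {
        this.shape = shape.dup;
        strides = new size_t[](shape.length);
        size_t s = 1;
        foreach_reverse (i; 0 .. shape.length)
        {
            strides[i] = s;
            s *= shape[i];
        }
        data = new T[](s);
    }

    /// Index with one coordinate per dimension.
    ref T opIndex(size_t[] idx...)
    {
        size_t offset = 0;
        foreach (i, ix; idx)
            offset += ix * strides[i];
        return data[offset];
    }
}

void main()
{
    auto a = NDArray!double(2, 3);
    a[1, 2] = 42.0;
    writeln(a.data); // [0, 0, 0, 0, 0, 42]
}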

Julia's DArray object is an interesting take on this: https://github.com/JuliaParallel/DistributedArrays.jl

I believe that parallelism on arrays and parallelism on data tables are different challenges. Data tables are easier since we can parallelise by row, hence the preference for row-based tuples.
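With row-based tuples, std.parallelism already makes per-row parallelism cheap, since each row can be processed independently. A minimal sketch (the Row layout is just for illustration):

import std.parallelism : parallel;
import std.stdio;
import std.typecons : Tuple;

alias Row = Tuple!(int, "id", double, "score");

void main()
{
    auto rows = new Row[](1_000);
    foreach (i, ref r; rows)
    {
        r.id = cast(int) i;
        r.score = i * 0.5;
    }

    auto doubled = new double[](rows.length);
    // Rows are independent, so the loop body can run on any worker thread.
    foreach (i, row; parallel(rows))
        doubled[i] = row.score * 2.0;

    writeln(doubled[0 .. 5]);
}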

The core issue is to have a seriously efficient n-dimensional array that is amenable to data parallelism and is extensible. As far as I am currently aware (I will investigate more), the NumPy array is a good native-code array but has some issues with data parallelism, and Pandas has to do quite a lot of work to get the extensibility. I wonder how the R data.table works.

R's data.table is not currently parallelised.
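On the data-parallelism half of that wish list, D's std.parallelism can already map a function across a plain native array on all cores. A minimal sketch, not a full n-dimensional design:

import std.array : array;
import std.math : sqrt;
import std.parallelism : taskPool;
import std.range : iota;
import std.stdio;

void main()
{
    auto xs = iota(10_000_000).array;   // a plain native int[]
    // amap evaluates the function across the task pool's worker
    // threads and returns a newly allocated double[].
    auto ys = taskPool.amap!(x => sqrt(cast(double) x))(xs);
    writeln(ys[0 .. 5]);
}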

I have this nagging feeling that, like NumPy, data.table seems a lot better than it actually is. From small experiments, D (and Chapel even more so) is hugely faster than Python/NumPy at things Python people think NumPy is brilliant for. Expectations of Python programmers are set by the scale of Python performance, so NumPy seems brilliant. Compared to the scale set by D and Chapel, NumPy is very disappointing. I bet the same is true of R (I have never really used R).
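For concreteness, the kind of micro-benchmark behind such comparisons might look like this in D: an element-wise multiply over large arrays, expressed as a single native array operation (a rough sketch; timings vary by machine and compiler flags):

import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio;

void main()
{
    enum n = 50_000_000;
    auto a = new double[](n);
    auto b = new double[](n);
    auto c = new double[](n);
    a[] = 1.5;
    b[] = 2.5;

    auto sw = StopWatch(AutoStart.yes);
    c[] = a[] * b[];   // array-wise op: one tight native loop, no interpreter
    sw.stop();
    writeln("elementwise multiply: ", sw.peek.total!"msecs", " ms");
}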

Thanks for notifying me about Chapel - something else interesting to investigate. When it comes to speed, R is very strange. Basic math operations (e.g. *, +, /) on an R array can be fast, but explicit for loops can be hundreds of times slower - most things are slow in R unless they are baked directly into its base operations. You can, however, write code in C or C++ and call it very easily from R using the Rcpp interface.

This is therefore an opportunity for D to step in. However, it is a journey of a thousand miles to get something production-worthy. Python/NumPy/Pandas have had a very large number of programmer hours expended on them. Doing this poorly as a D module is likely worse than not doing it at all.

I think D has a lot to offer the world of data science.
