+1 for scientific programming in Nim

There are layers of functionality/complexity required (as I see it), based on 
comparison to Python and R:

  1. vector/matrix algebra
  2. dataframes
  3. groupby of dataframes (aggregated manipulation of dataframes)
  4. easy graphing and display of data
  5. memory-limit agnostic structures



As I see it, the development and provision of libraries is at stage 1 (which is 
what you are primarily discussing, although Numpy has been mentioned a number 
of times). There are already quite a few linear algebra libraries available 
through Nimble.

To provide the equivalent of Pandas (the dataframe layer that sits on top of 
Numpy) requires, among other things, a dataframe mechanism, which is more than 
hard-wiring your own seq[]. A dataframe needs to easily handle multiple fields 
of different types (dates, strings, ints, floats, and so on) and to display 
the data easily.
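
To make that requirement concrete, here is a minimal sketch of a column-oriented frame, written in Python only because that is the yardstick used above. The names `Frame`, `row` and `show` are hypothetical and not taken from any existing library, and the sample values are made up. It just shows columns of different types sharing one row index and being printed as a table.

```python
from datetime import date


class Frame:
    """Hypothetical column-oriented table: column name -> list of values."""

    def __init__(self, columns):
        lengths = {len(values) for values in columns.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same length")
        self.columns = dict(columns)
        self.nrows = lengths.pop() if lengths else 0

    def row(self, i):
        """Return row i as a tuple, one value per column."""
        return tuple(values[i] for values in self.columns.values())

    def show(self):
        """Render the frame as an aligned plain-text table."""
        names = list(self.columns)
        cells = [names] + [[str(v) for v in self.row(i)] for i in range(self.nrows)]
        widths = [max(len(r[c]) for r in cells) for c in range(len(names))]
        return "\n".join(
            "  ".join(cell.ljust(w) for cell, w in zip(r, widths)).rstrip()
            for r in cells
        )


# Made-up sample data: a date, a string, an int and a float column.
df = Frame({
    "day": [date(2016, 1, 1), date(2016, 1, 2)],
    "city": ["Oslo", "Rome"],
    "visits": [3, 12],
    "temp": [-4.5, 11.0],
})
print(df.show())
```

A Nim version would need some way to hold columns of different element types in one container, which is the part a plain seq[] does not give you.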

I have listed the third item (GroupBy) as distinct from a dataframe, because 
the result of grouping a dataframe is a dataframe on steroids (a superset of a 
dataframe). So I assume dataframes are made up of sequences, and GroupBy 
objects are made up of dataframes. 
    
    
    sequence -> dataframe -> groupby
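
Read literally, that layering might look like the sketch below (Python again, as the comparison language). A frame is assumed here to be a plain dict of equal-length lists, and `group_by` and `aggregate` are hypothetical helper names. The grouped result is a mapping from key to sub-frame, which is the sense in which it is made up of dataframes.

```python
def group_by(frame, key):
    """Split a frame (dict of equal-length lists) into one sub-frame per key value."""
    groups = {}
    nrows = len(frame[key])
    for i in range(nrows):
        # Create an empty sub-frame with the same columns the first time a key is seen.
        sub = groups.setdefault(frame[key][i], {name: [] for name in frame})
        for name, values in frame.items():
            sub[name].append(values[i])
    return groups


def aggregate(groups, column, func):
    """Apply func to one column of every sub-frame."""
    return {k: func(sub[column]) for k, sub in groups.items()}


# Made-up sample data.
frame = {
    "city": ["Oslo", "Rome", "Oslo", "Rome"],
    "visits": [3, 12, 5, 8],
}
groups = group_by(frame, "city")
totals = aggregate(groups, "visits", sum)
print(totals)  # {'Oslo': 8, 'Rome': 20}
```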
    

Point 4 would be straightforward if it were easy to port pyplot to Nim (but it 
is highly Python dependent, IIRC). Someone has already provided a library for 
accessing gnuplot, so that might be an option (although its output is not 
quite as nice).

Point 5 is for handling "large" datasets. Not many people will take a tool 
seriously when the program dies because it couldn't fit the data into memory. 
It needs to seamlessly page data to disk as an option for analysing large 
datasets. The Spills library does this, but that functionality would need to be 
included in the dataframe and groupby functionality. 
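
As a rough illustration of the idea (Python once more), the sketch below keeps a float column in a temporary file and reads it back one chunk at a time, so an aggregate never needs the whole column in memory. `DiskColumn` and its methods are hypothetical names. This is not a description of how Spills works, and a real implementation would more likely use memory-mapped files.

```python
import struct
import tempfile


class DiskColumn:
    """Hypothetical float column kept in a temporary file instead of RAM."""

    ITEM = struct.Struct("<d")  # one little-endian 64-bit float per value

    def __init__(self):
        self._file = tempfile.TemporaryFile()
        self._count = 0

    def __len__(self):
        return self._count

    def extend(self, values):
        """Append values to the end of the backing file."""
        self._file.seek(0, 2)  # 2 = seek relative to end of file
        for v in values:
            self._file.write(self.ITEM.pack(v))
            self._count += 1

    def chunks(self, size=1024):
        """Yield lists of at most `size` values, reading one chunk at a time."""
        self._file.seek(0)
        while True:
            raw = self._file.read(size * self.ITEM.size)
            if not raw:
                break
            yield [v for (v,) in self.ITEM.iter_unpack(raw)]


col = DiskColumn()
col.extend(float(i) for i in range(10))
# Only one chunk of four values is held in memory at any time.
total = sum(sum(chunk) for chunk in col.chunks(size=4))
print(total)  # 45.0
```

The point for the dataframe and groupby layers is that they would have to be written against an interface like `chunks` rather than against an in-memory seq, otherwise the paging cannot be seamless.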
