@krux: Thanks for the feedback! I was thinking about your `dataLanguage` idea
as well. But I don't know if this would allow good composability in larger
projects, i.e., you want data frames to be something that can be passed around
etc. What I can probably do in the end is to write a macro which takes the
iteration body:

    iterate(dataFrame):
      echo x

and the macro would analyze the processing pipeline and generate a single for
loop internally. The good thing is that the user side of the API only exposes
transformations and actions. So it doesn't really matter for now if I use
closure iterators, and I can still switch the iteration logic internally later.
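
To make the "only transformations and actions" point concrete, here is a minimal sketch of how a lazy pipeline can be built on closure iterators. It is not the actual NimData implementation: the `DataFrame` type here is a simplified stand-in fixed to `int`, and `fromSeq`, `newIter`, and `IntIter` are names assumed for illustration only.

```nim
type
  IntIter = iterator (): int {.closure.}
  # Hypothetical, simplified data frame: it only stores a factory that
  # creates a fresh closure iterator each time an action runs.
  DataFrame = object
    newIter: proc (): IntIter

proc fromSeq(data: seq[int]): DataFrame =
  result.newIter = proc (): IntIter =
    return iterator (): int =
      for x in data:
        yield x

# Transformations are lazy: they only wrap the upstream iterator.
proc map(df: DataFrame, f: proc (x: int): int): DataFrame =
  let makeUpstream = df.newIter
  result.newIter = proc (): IntIter =
    let upstream = makeUpstream()
    return iterator (): int =
      for x in upstream():
        yield f(x)

proc filter(df: DataFrame, pred: proc (x: int): bool): DataFrame =
  let makeUpstream = df.newIter
  result.newIter = proc (): IntIter =
    let upstream = makeUpstream()
    return iterator (): int =
      for x in upstream():
        if pred(x):
          yield x

# Actions drive the pipeline and pull the elements through it.
proc collect(df: DataFrame): seq[int] =
  result = @[]
  let makeIter = df.newIter
  let it = makeIter()
  for x in it():
    result.add(x)

let source = fromSeq(@[1, 2, 3, 4, 5])
let scaled = source.map(proc (x: int): int = x * 10)
let large = scaled.filter(proc (x: int): bool = x > 20)
echo large.collect()  # prints @[30, 40, 50]
```

Since callers only ever see `map`, `filter`, and `collect`, the body of these procs could later be replaced by a macro-generated loop without touching user code.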
And the good news is that I already get exceptionally good performance with the
closure iterator approach (I pushed some initial [benchmark
results](https://github.com/bluenote10/NimData#benchmarks)). I have optimized
the CSV parsing macro a little bit, and CSV parsing is now a factor of 2 faster
than Pandas, which is known to have a very fast parser. As expected for data
which is still too small for Spark to shine, Nim is faster by a factor of 10
(although Spark already runs on 4 cores).

I also made some good progress in terms of features and updated the
documentation a lot, so this is reaching a state where it is actually quite
usable.

@perturbation2: Yes, very good point, and thanks for the link. If Nim does not
feature multiple dispatch in the future, this might become a problem. For now I
don't need multiple dispatch for higher-order functions like `map` though (and
maybe I never will), so I will simply stick to using a proc (which I hope will
continue to work).

I also understand issue 3 now, for the record: The compiler is basically
complaining that the base method has a provable lock level of 0, while one of
the overriding methods calls a procvar that could potentially lock. The locking
system needs to know lock levels at compile time though, which the dynamic
dispatch makes impossible in this case. There are two ways out: convince the
compiler that the procvar does not lock, or tell the compiler in the base method
that there will be overrides of unknown lock level. The former would be nicer,
but I can't really get it to work for now -- for the latter I have opened a tiny
PR to allow doing that in Nim.
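
For readers who want to see the shape of the code behind issue 3, the following is a stripped-down sketch, assuming hypothetical `BaseFrame` and `MappedFrame` types that are not taken from NimData. The base method touches only its own data, while the overriding method calls a procvar whose behavior is unknown at compile time. Whether a lock level warning is actually emitted depends on the compiler version, so none is shown here.

```nim
type
  # Hypothetical types for illustration only.
  BaseFrame = ref object of RootObj
    data: seq[int]
  MappedFrame = ref object of BaseFrame
    f: proc (x: int): int  # procvar: the compiler cannot see what it does

# Base method: it provably does nothing that could acquire a lock.
method collect(df: BaseFrame): seq[int] {.base.} =
  result = df.data

# Overriding method: it calls the procvar, so its effects are unknown.
method collect(df: MappedFrame): seq[int] =
  result = @[]
  let f = df.f
  for x in df.data:
    result.add(f(x))

let plain = BaseFrame(data: @[1, 2, 3])
# Static type is BaseFrame, so the call below is dispatched at runtime.
let mapped: BaseFrame = MappedFrame(data: @[1, 2, 3],
                                    f: proc (x: int): int = x * 2)
echo plain.collect()   # prints @[1, 2, 3]
echo mapped.collect()  # prints @[2, 4, 6]
```

Because `mapped.collect()` is resolved at runtime, the compiler has to reason about the base method and all of its overrides together, which is where the mismatch in provable lock levels comes from.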