On 10/21/2015 09:01 AM, Russel Winder via Digitalmars-d wrote:
> I believe (currently, I am open to being re-educated) that most of the
> rest of the world calls these persistent data structures, despite the
> title of Okasaki's thesis. I am not sure I like either of the labels
> "functional" or "persistent"; there is little affordance with the
> concept underneath.
OK. We should go with the more widespread name.
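Whatever the label, the defining property is that an "update" returns a new structure while the old one stays valid, typically by sharing most of the old structure. A minimal sketch in Python (the thread itself has no code; all names here are illustrative):

```python
# A persistent singly-linked list with structural sharing:
# cons() returns a new list that reuses the old list as its tail.
class Node:
    __slots__ = ("head", "tail")
    def __init__(self, head, tail=None):
        self.head = head
        self.tail = tail

def cons(value, lst):
    # O(1): the new list shares the entire old list as its tail.
    return Node(value, lst)

def to_list(lst):
    out = []
    while lst is not None:
        out.append(lst.head)
        lst = lst.tail
    return out

base = cons(2, cons(3, None))
extended = cons(1, base)      # base is untouched ("persistent")
assert to_list(base) == [2, 3]
assert to_list(extended) == [1, 2, 3]
assert extended.tail is base  # structural sharing, not a copy
```

The `is` check at the end is the whole point: the old and new versions physically share their common suffix, which is why persistent containers pair so naturally with immutability.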
>> By default eager value containers are mutable. They should support
>> immutable and const meaningfully.
> Is there a need for separate eager structures explicitly? Is it not
> enough for all structures to be lazy, with a pull operation?
Well it is but I wanted to clarify where STL-style containers fit.
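The eager/lazy distinction, with "pull" as the bridge between them, can be sketched in a few lines of Python (generators as the lazy structure, lists as the eager one):

```python
# Eager container vs. lazy sequence with an explicit "pull".
# The generator computes elements only when next() (the pull) is called.
def squares_lazy(n):
    for i in range(n):
        yield i * i                 # computed on demand

eager = [i * i for i in range(5)]   # all five elements exist right now
lazy = squares_lazy(5)              # nothing has been computed yet

assert next(lazy) == 0              # pull one element
assert next(lazy) == 1
assert list(lazy) == [4, 9, 16]     # pulling everything materializes
assert eager == [0, 1, 4, 9, 16]    # an eager, STL-style container
```

In these terms the STL-style containers are exactly what you get after pulling a lazy structure to exhaustion, which is why it is still worth deciding where they fit in the design.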
> N-dimensional arrays, N-dimensional dynamic arrays, singly-linked list,
> doubly-linked list, hash map (aka dictionary, associative array, …),
> red-black tree, 2-3 tree, doubly-linked tree, tree map, B-tree. And no
> doubt a few others will be proposed, since they are classic, appear in
> undergraduate courses, and are actually used for real.
These are orthogonal to the kinds I discussed. So we have container
topologies (of which you enumerate a few) and container kinds, which
concern the copying semantics of entire containers.
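The orthogonality is easy to demonstrate: one and the same topology (here a dynamic array) can come in different kinds, distinguished only by what assignment and copying mean. A Python sketch, using the built-in list as a reference container and an explicit copy to simulate a value container:

```python
import copy

data = [1, 2, 3]          # one topology: a dynamic array

alias = data              # reference semantics: two names, one container
alias.append(4)
assert data == [1, 2, 3, 4]

clone = copy.copy(data)   # value (copy) semantics: independent container
clone.append(5)
assert data == [1, 2, 3, 4]
assert clone == [1, 2, 3, 4, 5]
```

A persistent container would be a third kind with this same topology: every "modifying" operation returns a new handle and leaves `data` untouched.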
> However, these are classic sequential data structures.
Hashes and some of the trees would actually be associative data structures.
> We should be
> looking to support data structures that can be used in a multi-threaded
> context without having explicit locks. The obvious exemplars to
> indicate the way are ConcurrentHashMap in Java and the NumPy array. (cf.
> the other thread talking about these quasi-weird array data structures.)
> The NumPy array is an opaque data structure that should never be accessed
> in any way other than through the functions and higher-order functions of
> the library. It is a nice design, and works well enough for data
> scientists, quants, and bioinformatics people to think it is great, but
> fundamentally it is seriously slow. If D could provide data structures
> that did NumPy/SciPy/Pandas things significantly faster than
> NumPy/SciPy/Pandas, there would be traction.
Cool thoughts, thanks. Those would have their own APIs, separate from
mainstream containers.
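The "opaque, go through the library" style being described looks like this in practice (assuming NumPy is installed; the per-element path at the end is the one a natively compiled container library could beat):

```python
import numpy as np

a = np.arange(1_000_000, dtype=np.float64)

# Idiomatic NumPy: whole-array operations, one compiled loop under the hood.
fast = np.sqrt(a) * 2.0

# The anti-pattern: reaching into the opaque structure element by element,
# paying Python-interpreter overhead on every access. (Only a few elements
# here, to keep the sketch quick; in full it is orders of magnitude slower.)
slow = np.empty_like(a)
for i in range(3):
    slow[i] = (a[i] ** 0.5) * 2.0

assert np.allclose(fast[:3], slow[:3])
```

The claim that NumPy is "fundamentally slow" is about the first path, not the second: every whole-array operation allocates temporaries and makes a separate pass over memory, which is exactly where a compiled language with fused, lazy array expressions has room to win.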
> Then there is the question of whether to try to do the sort of thing
> Chapel and X10 are doing. They have the ideas of domains and places,
> which allow for partitioned arrays that can be mapped to clusters of
> multicore, multiprocessor machines (Partitioned Global Address Space).
> IBM has now released its APGAS library on top of Hazelcast for Java
> users. This will begin to challenge Apache Spark (and Hadoop).
We don't support PGAS at the language level, but I'll need to look into
how APGAS works with Java.
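For readers unfamiliar with the Chapel/X10 terminology: a "domain" is a global index set and a "place" (locale) owns a contiguous chunk of it. A toy sketch of the standard block distribution, with all names illustrative rather than any real PGAS API:

```python
# Block-partition a global index domain [0, n) across a fixed number of
# places: place p owns one contiguous chunk of ceil(n / places) indices.
def owner(index, n, places):
    chunk = -(-n // places)          # ceil(n / places)
    return index // chunk

N, PLACES = 10, 3
partition = {p: [i for i in range(N) if owner(i, N, PLACES) == p]
             for p in range(PLACES)}

assert partition == {0: [0, 1, 2, 3], 1: [4, 5, 6, 7], 2: [8, 9]}
```

The point of PGAS is that code can index the global domain directly while the runtime routes each access to the owning place; a library-level version of this (rather than language-level, as in Chapel) is essentially what APGAS-over-Hazelcast provides for Java.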
> Obviously I am conflicted here: I use Go for networking (but mostly
> because I earn money running learning workshops), quite like Rust
> because it is the fastest language currently on my microbenchmarks,
> like D because it's nice, love Chapel because it is great (better than
> D in so many ways), sort of like X10 (because it is a bit Scala-like
> and can be JVM or native), but end up having to work on some C++
> codebases. I am not entirely sure D can compete with Chapel in HPC and,
> increasingly, the Python-verse. Laeeth and others are championing D
> instead of Python, but really the competition is Python/Chapel vs.
> Python/NumPy/SciPy/Pandas. Unless, that is, D can get Pandas-style
> data structures. Which brings my ramblings back on point.
My finance folks also rave about Pandas. I wish I could fork myself to
look into it.
Nevertheless, what I'm working on now should help libraries such as
Pandas for D. We have a large language but are unclear on recipe-style
idioms for writing library code.
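For concreteness, the bread-and-butter "Pandas thing" is split-apply-combine over labeled data (`df.groupby(...).sum()` in Pandas itself). A dependency-free sketch of the semantics a D library would need to make fast, with the data invented for illustration:

```python
# Group-by + aggregate (split-apply-combine), the operation at the heart
# of Pandas-style data frames, written out by hand.
from collections import defaultdict

trades = [
    ("AAPL", 10.0),
    ("MSFT", 20.0),
    ("AAPL", 30.0),
]

totals = defaultdict(float)
for symbol, amount in trades:
    totals[symbol] += amount      # split by key, apply +, combine

assert dict(totals) == {"AAPL": 40.0, "MSFT": 20.0}
```

A columnar layout plus compiled loops over operations like this one is precisely the niche where a D library could plausibly outrun the NumPy/Pandas stack.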
Andrei