First of all, apologies for my lack of input over the last
week--unfortunately I went on holidays pretty much as this list went online,
so I've been pretty quiet. Anyhow, for those who don't know me, I'm the
chair of perl6-language-data. For the remainder of its existence I'll be
more active.

This message is designed to provide an overview of this list's role,
potential inspirations, and key RFCs. If you reply with a comment on a
particular issue, please change the subject line accordingly.

The role of this list is to try to create the features necessary to make
Perl the best language around for data crunching, while keeping it Perlish.
These two goals should be complementary, not conflicting.

'Data crunching' involves the input, manipulation, and output of large
enough amounts of data that memory usage and speed of operation are
very important factors. This includes numeric data crunching such as image
processing or scientific data processing, data-intensive modelling such as
neural network training or stochastic modelling, and text processing such as
dictionary analysis or large transaction log processing.

PDL provides a source of many important lessons for us. It shows the kinds
of features that are required for numeric data crunching, and it also shows
the kinds of obstacles we need to remove from Perl 6.

Other sources of inspiration for syntax, paradigms, and implementation ideas
include:

 - Mathematica (combines functional, declarative, and procedural styles;
implements memoization, lazy lists, and array notation)
 - Matlab (fast and simple array language)
 - C++ expression templates such as POOMA and Blitz++ (implicit looping and
generalised slicing; loops unrolled and parse trees walked completely at
compile time resulting in zero run-time overhead)
 - FORTRAN (still the most widely used numeric programming language)
 - Haskell (effective data crunching in a purely functional paradigm)

RFCs that are important for data crunching (available from
http://dev.perl.org/rfc/) are:

 - 23 (v4): Higher order functions--frequently used in reduce() and list
generation/slicing
 - 24 (v1): Semi-finite (lazy) lists--not active, but maybe still required
in a revised form
 - 45 (v2): C<||> and C<&&> should propagate result context to both
sides--need to fix incompatibility with RFC 82
 - 76 (v1): Builtin: reduce--provides the generalised mechanism to reduce
lists/arrays
 - 81 (v2): Lazily evaluated list generation functions--key platform for
generalised slicing
 - 82 (v2): Apply operators component-wise in a list context--makes @a=@b+@c
DWIM
 - 90 (v1): Builtins: zip() and unzip() & 91 (v1): Builtin:
partition--allows 1d arrays to be split and combined flexibly; can be used
to generalise 1d arrays to act as n-dim tensors
 - 107 (v1): lvalue subs should receive the rvalue as an argument & 118
(v1): lvalue subs: parameters, explicit assignment, and wantarray()
changes--lvalue subs are already useful for PDL
 - 115 (v1): Default methods for objects--provides overloading for brackets
 - 116 (v1): Efficient numerics with perl--overview of PDL
 - 117 (v1): Perl syntax support for ranges--should be merged with RFC 81
shortly
 - 123 (v1): Builtin: lazy--lets the programmer decide when lazy evaluation
is safe
 - 128 (v2): Subroutines: Extend subroutine contexts to include name
parameters and lazy arguments
 - 142 (v1): Enhanced Pack/Unpack--important data crunching features
 - 148 (v1): Add reshape() for multi-dimensional array reshaping--an
alternative notation for what RFCs 90 and 91 provide
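
As a rough illustration of the behaviour several of these RFCs are aiming
at (82, 76, 90, 91, and 24/81), here is a small sketch. It is written in
Python purely for illustration and is not proposed Perl 6 syntax. The helper
names zip_lists, unzip, partition, and elementwise are hypothetical, and
their exact semantics (interleaving for zip, every n-th element for unzip,
n equal contiguous chunks for partition) are my assumptions rather than the
RFCs' definitive signatures.

```python
from functools import reduce
from itertools import count, islice
import operator


def zip_lists(*lists):
    """Interleave equal-length lists (assumed reading of RFC 90 zip)."""
    return [item for group in zip(*lists) for item in group]


def unzip(n, flat):
    """Split a flat list into n lists by taking every n-th element."""
    return [flat[i::n] for i in range(n)]


def partition(n, flat):
    """Split a flat list into n contiguous chunks of equal size
    (assumed reading of RFC 91; len(flat) must be divisible by n)."""
    size = len(flat) // n
    return [flat[i * size:(i + 1) * size] for i in range(n)]


def elementwise(op, a, b):
    """Apply a binary operator component-wise, as RFC 82 wants @b + @c to."""
    return [op(x, y) for x, y in zip(a, b)]


b = [1, 2, 3]
c = [10, 20, 30]

a = elementwise(operator.add, b, c)       # component-wise sum
total = reduce(operator.add, a)           # generalised reduction (RFC 76)
flat = zip_lists(b, c)                    # two 1d lists combined into one
columns = unzip(2, flat)                  # ...and split back apart
rows = partition(3, flat)                 # 1d data viewed as a 3x2 array

squares = (n * n for n in count(1))       # lazy, conceptually infinite list
first_squares = list(islice(squares, 5))  # only five elements are computed
```

The point of the rows line is the one made under RFCs 90 and 91: a flat 1d
list plus a splitting rule is enough to treat the data as a 2d (and, by
repeated splitting, n-dim) structure.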

I've cc'd the authors of these RFCs--you may want to change your RFC so that
it has this list as its home (if appropriate), or at least please subscribe
to this list since we might discuss things of relevance to you.

