All,

P.S. What follows is rough and will be smoothed out or reworked.

I propose, perhaps redundantly, that Perl 6 include a complete set of native language constructs for a relational data model, akin to the one introduced in E. F. Codd's classic paper, "A Relational Model of Data for Large Shared Data Banks" (a copy of which is at http://www.acm.org/classics/nov95/toc.html ), and discussed at length in books such as C. J. Date's "Database in Depth" (O'Reilly, 2005). Codd's paper itself (see its section 1.5) says that the necessary pieces are good candidates for a sub-language of any typical programming language.

The actual relational data model (which is not the same as SQL per se) is expressible in terms of mathematics, such as sets and predicate calculus, and therefore I believe that Perl 6 already has most of what is needed.
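To illustrate that claim, several of the standard relational operators can be written directly in set-builder notation. This sketch uses the conventional relational-algebra symbols (σ for Restrict, π for Project, ⋈ for Join), which are an assumption here rather than notation from this post. A tuple t is treated as a function from attribute names to values, t|_A is its restriction to the attribute set A, and H_R is the set of attribute names of relation R.

```latex
\begin{aligned}
\sigma_{p}(R) &= \{\, t \in R \mid p(t) \,\} \\
\pi_{A}(R)    &= \{\, t|_{A} \mid t \in R \,\} \\
R \bowtie S   &= \{\, t \cup u \mid t \in R,\; u \in S,\; t|_{C} = u|_{C} \,\},
                 \qquad C = H_R \cap H_S
\end{aligned}
```

Because t and u agree on C, t ∪ u is again a function, so each common value appears once in the result. Union, Intersect, and Difference are simply R ∪ S, R ∩ S, and R \ S on relations with the same heading.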

Essentially it comes down to better handling of data sets.

It is quite possible, then, that all that is necessary is an extension of the standard data types, operators, or built-in functions, and/or use of the Perl 6 object model.

What I would like, for example, are standard data types akin to Relations/RelVars/etc. (tables/rowsets), Tuples (rows), Attributes (fields), Sets (enums), Domains (data types), and such. Largely these already map to existing Perl 6 entities:

* a Domain is like a class that defines a set of possible values, each of which can be multi-part; it is equivalent to a Perl class

* an Attribute stores a value, which is a Perl object

* a Tuple is an associative array having one or more Attributes, each of which has a name or ordinal position and is typed according to a Domain; this is somewhat like a restricted Hash, where each key has a specific type

* a Relation is an unordered set of Tuples, where every Tuple has the same definition, as if the Relation were akin to a specific Perl class and every Tuple in it were akin to a Perl object of that class
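The mapping above can be sketched as follows. This is written in Python purely for illustration (the proposal is about Perl 6 constructs): Python types stand in for Domains, a dict stands in for a Tuple's restricted Hash, and a frozenset of tuples gives a Relation its unordered, duplicate-free body. The `Relation` class is hypothetical and identifies Attributes by name only.

```python
class Relation:
    """A set of tuples that all share one heading (attribute name -> type)."""

    def __init__(self, heading, rows=()):
        # heading maps attribute names to Python types, standing in for Domains
        self.heading = dict(heading)
        # a frozenset gives set semantics: no order and no duplicate tuples
        self.body = frozenset(self._check(row) for row in rows)

    def _check(self, row):
        # a tuple must have exactly the heading's attributes, each of the declared type
        if set(row) != set(self.heading):
            raise ValueError(f"attributes {sorted(row)} do not match heading {sorted(self.heading)}")
        for name, value in row.items():
            if not isinstance(value, self.heading[name]):
                raise TypeError(f"{name}={value!r} is not of type {self.heading[name].__name__}")
        # store the tuple as an unordered, hashable set of (name, value) pairs
        return frozenset(row.items())

    def __len__(self):
        return len(self.body)

    def __contains__(self, row):
        return frozenset(row.items()) in self.body


emp = Relation({"ENO": str, "DNO": str, "SALARY": int}, [
    {"ENO": "E1", "DNO": "D1", "SALARY": 40000},
    {"ENO": "E2", "DNO": "D2", "SALARY": 42000},
    {"ENO": "E2", "DNO": "D2", "SALARY": 42000},  # duplicate: collapses into one tuple
])
print(len(emp))  # 2
```

Every tuple is checked against the same heading, which is what makes the Relation behave like "a specific class" whose tuples are its objects.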

Fairly standard so far.

Specifically, what I would like to see added to Perl, if it doesn't already exist, is a set of operators that work on Relations, such as set operations like these (the bulleted definitions below are quoted from "Database in Depth", section 1.3.3, with some context omitted):

* Restrict - Returns a relation containing all tuples from a specified relation that satisfy a specified condition. For example, we might restrict relation EMP to just the tuples where the DNO value is D2.

* Project - Returns a relation containing all (sub)tuples that remain in a specified relation after specified attributes have been removed. For example, we might project relation EMP on just the ENO and SALARY attributes.

* Product - Returns a relation containing all possible tuples that are a combination of two tuples, one from each of two specified relations. Product is also known variously as cartesian product, cross product, cross join, and cartesian join (in fact, it is just a special case of join, as we'll see in Chapter 5).

* Intersect - Returns a relation containing all tuples that appear in both of two specified relations. (Actually, intersect also is a special case of join.)

* Union - Returns a relation containing all tuples that appear in either or both of two specified relations.

* Difference - Returns a relation containing all tuples that appear in the first and not the second of two specified relations.

* Join - Returns a relation containing all possible tuples that are a combination of two tuples, one from each of two specified relations, such that the two tuples contributing to any given result tuple have a common value for the common attributes of the two relations (and that common value appears just once, not twice, in that result tuple). NOTE: This kind of join was originally called the natural join. Since natural join is far and away the most important kind, however, it's become standard practice to take the unqualified term join to mean the natural join specifically, and I'll follow that practice in this book.

* Divide - Takes two relations, one binary and one unary, and returns a relation consisting of all values of one attribute of the binary relation that match (in the other attribute) all values in the unary relation.
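To show how little machinery the eight operators above need, here is a minimal sketch in Python (a stand-in, not Perl 6), assuming a relation is a frozenset of tuples and each tuple is a frozenset of (attribute, value) pairs. The function names and the sample DNAME and SKILL data are hypothetical. Headings are inferred from the tuples, so an empty relation has an empty heading here, a simplification a real design would avoid.

```python
from itertools import product as _pairs


def rel(*rows):
    """Build a relation from dicts; duplicate tuples collapse because it is a set."""
    return frozenset(frozenset(row.items()) for row in rows)


def heading(r):
    """Attribute names of a relation (inferred from its tuples; empty if r is empty)."""
    return frozenset(name for t in r for name, _ in t)


def restrict(r, pred):
    # keep only tuples satisfying the condition; pred receives each tuple as a dict
    return frozenset(t for t in r if pred(dict(t)))


def project(r, attrs):
    # drop attributes not in attrs; tuples that become equal merge into one
    keep = set(attrs)
    return frozenset(frozenset((k, v) for k, v in t if k in keep) for t in r)


def join(r, s):
    # natural join: combine tuples that agree on every common attribute
    common = heading(r) & heading(s)
    out = set()
    for t, u in _pairs(r, s):
        dt, du = dict(t), dict(u)
        if all(dt[a] == du[a] for a in common):
            out.add(frozenset({**dt, **du}.items()))  # common values appear once
    return frozenset(out)


def product(r, s):
    # cartesian product: a join whose operands share no attributes
    if heading(r) & heading(s):
        raise ValueError("product requires relations with no common attributes")
    return join(r, s)


def union(r, s):
    return r | s


def intersect(r, s):
    return r & s


def difference(r, s):
    return r - s


def divide(binary, unary, keep):
    # values of `keep` in the binary relation whose other attribute
    # matches every value in the unary relation
    required = {v for t in unary for _, v in t}
    matches = {}
    for t in binary:
        d = dict(t)
        x = d.pop(keep)
        (y,) = d.values()  # the binary relation's other attribute
        matches.setdefault(x, set()).add(y)
    return rel(*({keep: x} for x, ys in matches.items() if required <= ys))


emp = rel({"ENO": "E1", "DNO": "D1", "SALARY": 40000},
          {"ENO": "E2", "DNO": "D2", "SALARY": 42000},
          {"ENO": "E3", "DNO": "D2", "SALARY": 40000})
dept = rel({"DNO": "D1", "DNAME": "Sales"}, {"DNO": "D2", "DNAME": "Ops"})

in_d2 = restrict(emp, lambda t: t["DNO"] == "D2")
print(len(in_d2))                                  # 2
print(len(project(emp, ["SALARY"])))               # 2 (E1 and E3 share a salary)
print(len(join(emp, dept)))                        # 3
print(join(emp, in_d2) == intersect(emp, in_d2))   # True

skills = rel({"ENO": "E1", "SKILL": "Perl"}, {"ENO": "E1", "SKILL": "SQL"},
             {"ENO": "E2", "SKILL": "Perl"})
needed = rel({"SKILL": "Perl"}, {"SKILL": "SQL"})
print(divide(skills, needed, "ENO") == rel({"ENO": "E1"}))  # True
```

The last two comparisons mirror Date's remarks: Intersect is a Join of two relations with the same heading, and Product (as written here) is a Join of relations with no common attributes.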

Now, everything I'm describing could be implemented as a Perl 6 module, and if necessary I can do this for illustrative purposes, but I believe that it is essentially simple and that something analogous should be included in the core language, for reasons similar to those for including junctions and PDL.

I also want to make clear that this functionality is entirely about better support for data processing with native Perl variables, and has nothing to do with external data repositories such as SQL databases. That said, I anticipate that one could extend or override the built-ins so that they interact with remote databases instead of internal variables, for example through sub-classing, role reuse, or tying.

Thank you. -- Darren Duncan
