Hello,

In my work on designing the Muldis D language, one of the biggest unresolved design problems I'm having is working out features that involve sorting a set of values for various reasons, including implementing ordered output of database queries, or implementing '<'|'>' operators, or min|max|between operators, or implementing quota queries or windowed queries.

I'm looking for some input on how I might best proceed with getting these working.

The design needs to have semantics explicitly designed enough that the features provide the range of actual features or behaviours that people want to use with a database, that are highly deterministic while being highly portable, so its easy to predict what any request would give you and have that be the same in any implementation, and that is easy to translate both ways semantics intact between Muldis D and various SQL dialects and various normal programming languages like Perl et al, and that is easy to use.

In various SQL dialects, it is common practice that when you want to sort a rowset deterministically, you use an ORDER BY clause that lists which columns (stored or computed) of the rowset whose values you are sorting the rows on, and their order of precedence when some column values compare as equal and others inequal. This has the advantage of being very terse and generally polymorphic but it requires that the type of the column's values has some built-in sorting method to automatically use.

In Perl in the generic case, you sort a list by saying eg "sort { <binary-order-compare-expr> } @rowset", and the expression would explicitly invoke whatever behaviour-specific operators you want; this gives you the most control, but it is more verbose and potentially less polymorphic.

AFAIK, most generic languages take the Perl approach, where a generic higher-order sort function takes a binary comparison function as an argument which determines order of 2 arbitrary list items.

Up to now in Muldis D, I have tried to setup an environment more like SQL's in that if a data type is marked as being Ordered and defines a certain fundamental compare operator, then values of that type can be used in generic sorting or order-compare or quota context without said using code having to code differently depending on what the data type is, as per SQL's ORDER BY.

I've had some problems so far in conceiving how to get all that to work, some of which are related to certain other language design issues which could potentially be changed, and others which I will discuss here next.

One main issue is that so-called ordered types may have more than one desired linear order of values depending on the context, and so the desired algorithm would have to be specified when asking to sort values of this type, in order to provide feature flexibility.

For a common example, see text collations; depending on the user, base characters with particular accents may sort above or below or beside the same characters with other accents or no accents. (Note that Muldis D only has a single built-in character repertoire, which is latest-Unicode, though one can define subtypes of it that just allow a subset of those characters. Its character strings are also encoding and normal-form agnostic.)

Now, design and implementation-wise I'm inclined to think that it would be easiest to adopt the approach typical in non-SQL languages for Muldis D's order-sensitive operators, where there is explicitly a different-named one for each data type and you invoke that explicitly when you want to do a sort or compare or what have you. And of course, users can define their own operators and have them invoked here. I see this as a simpler and more flexible design, letting users say exactly what they want and get it.

But in the general case we don't get the terseness of SQL's "ORDER BY foo ASC, bar DESC, baz ASC"; instead we say something along this level of verbosity: "Seq.sort( 'what' => $myrowset, 'how' => [['foo', 'asc', 'Int.cmp'], ['bar', 'desc', 'Text.cmp', 'french'], ['baz', 'asc', 'Date.cmp']] )". Or alternately the user can define their own MyPkg.foobarbazsort() function and then at use time they simply say "MyPkg.foobarbazsort( 'what' => $myrowset )".

So one main question for the moment, does it seem okay for you as potential users that an Ordered role would be eliminated, and that each applicable type would have their own ones of these operators rather than sharing same-spelling generic ones: '<', '>', between, min, max, sort, etc (or alternately just '<' or 'compare' and all the others become generic higher-order functions taking the prior ones as arguments?); and that you would invoke each of these directly where applicable?

Or generally speaking, are there any other advice about dealing with these matters like collations and such in language design? I'm looking for something easy and extensible. What do your projects already do about these?

Barring feedback, I'll probably try eliminating the Ordered role and do what I said above, and just see how that works.

Thank you. -- Darren Duncan

_______________________________________________
List: http://lists.scsys.co.uk/cgi-bin/mailman/listinfo/dbix-class
IRC: irc.perl.org#dbix-class
SVN: http://dev.catalyst.perl.org/repos/bast/DBIx-Class/
Searchable Archive: http://www.grokbase.com/group/[EMAIL PROTECTED]

Reply via email to