[Dbix-class] rfc - how best to do design sorting, collations, etc

Darren Duncan Tue, 19 Feb 2008 20:45:21 -0800

Hello,

In my work on designing the Muldis D language, one of the biggest unresolved design problems I'm having is working out features that involve sorting a set of values for various reasons, including implementing ordered output of database queries, or implementing '<'|'>' operators, or min|max|between operators, or implementing quota queries or windowed queries.

I'm looking for some input on how I might best proceed with getting these working.

The design needs semantics explicit enough that the features provide the range of actual features or behaviours that people want to use with a database; that are highly deterministic while being highly portable, so it's easy to predict what any request would give you and have that be the same in any implementation; that are easy to translate both ways, semantics intact, between Muldis D and various SQL dialects and various normal programming languages like Perl et al; and that are easy to use.

In various SQL dialects, it is common practice that when you want to sort a rowset deterministically, you use an ORDER BY clause that lists the columns (stored or computed) of the rowset whose values you are sorting the rows on, and their order of precedence when some column values compare as equal and others unequal. This has the advantage of being very terse and generally polymorphic, but it requires that the type of the column's values has some built-in sorting method to automatically use.

In Perl in the generic case, you sort a list by saying eg "sort { <binary-order-compare-expr> } @rowset", and the expression would explicitly invoke whatever behaviour-specific operators you want; this gives you the most control, but it is more verbose and potentially less polymorphic.

AFAIK, most generic languages take the Perl approach, where a generic higher-order sort function takes a binary comparison function as an argument, which determines the order of 2 arbitrary list items.
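To illustrate the comparator-as-argument style outside of Perl, here is a minimal sketch in Python. The function name compare_by_length and the sample data are made up for illustration; functools.cmp_to_key is the standard-library adapter that lets sorted() use a binary comparison function.

```python
from functools import cmp_to_key


def compare_by_length(a: str, b: str) -> int:
    """Binary comparator: negative if a sorts first, positive if b does, 0 if tied.

    Hypothetical example comparator; it plays the role of the
    <binary-order-compare-expr> in Perl's sort { ... } @list.
    """
    return (len(a) > len(b)) - (len(a) < len(b))


words = ["pear", "fig", "banana"]

# The generic sort knows nothing about the items; all ordering
# knowledge lives in the comparator passed to it.
by_length = sorted(words, key=cmp_to_key(compare_by_length))
```

The caller has full control over the ordering, at the cost of having to name a comparator every time.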

Up to now in Muldis D, I have tried to set up an environment more like SQL's, in that if a data type is marked as being Ordered and defines a certain fundamental compare operator, then values of that type can be used in a generic sorting or order-compare or quota context without the using code having to be written differently depending on what the data type is, as per SQL's ORDER BY.

I've had some problems so far in conceiving how to get all that to work, some of which are related to certain other language design issues which could potentially be changed, and others which I will discuss here next.

One main issue is that so-called ordered types may have more than one desired linear order of values depending on the context, and so the desired algorithm would have to be specified when asking to sort values of this type, in order to provide feature flexibility.

For a common example, see text collations; depending on the user, base characters with particular accents may sort above or below or beside the same characters with other accents or no accents. (Note that Muldis D only has a single built-in character repertoire, which is latest-Unicode, though one can define subtypes of it that just allow a subset of those characters. Its character strings are also encoding and normal-form agnostic.)
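As a rough illustration of one text type having more than one useful order, the Python sketch below defines two comparators over the same strings. Both function names are hypothetical, and the accent-ignoring one is only a crude stand-in for a real collation (it strips combining marks after NFD normalization; it is not the Unicode Collation Algorithm and not any Muldis D feature).

```python
import unicodedata
from functools import cmp_to_key


def cmp_codepoint(a: str, b: str) -> int:
    """Order strings by raw code point value (Python's default str order)."""
    return (a > b) - (a < b)


def strip_accents(s: str) -> str:
    """Decompose to NFD and drop combining marks, so 'é' becomes 'e'."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def cmp_ignoring_accents(a: str, b: str) -> int:
    """Order strings as if accented letters were their base letters."""
    return cmp_codepoint(strip_accents(a), strip_accents(b))


words = ["zebra", "éclair", "apple"]

# Same values, same type, two different linear orders.
by_codepoint = sorted(words, key=cmp_to_key(cmp_codepoint))
by_base_letter = sorted(words, key=cmp_to_key(cmp_ignoring_accents))
```

Under code point order "éclair" lands after "zebra", while the accent-ignoring order places it between "apple" and "zebra"; which one is right depends on the user, so the choice has to be stated somewhere.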

Now, design and implementation-wise, I'm inclined to think that it would be easiest to adopt the approach typical in non-SQL languages for Muldis D's order-sensitive operators, where there is explicitly a differently-named one for each data type and you invoke that explicitly when you want to do a sort or compare or what have you. And of course, users can define their own operators and have them invoked here. I see this as a simpler and more flexible design, letting users say exactly what they want and get it.

But in the general case we don't get the terseness of SQL's "ORDER BY foo ASC, bar DESC, baz ASC"; instead we say something along this level of verbosity: "Seq.sort( 'what' => $myrowset, 'how' => [['foo', 'asc', 'Int.cmp'], ['bar', 'desc', 'Text.cmp', 'french'], ['baz', 'asc', 'Date.cmp']] )". Or alternately the user can define their own MyPkg.foobarbazsort() function and then at use time they simply say "MyPkg.foobarbazsort( 'what' => $myrowset )".
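A sort driven by such a list of (column, direction, comparator) entries could be sketched in Python roughly as follows. Everything here is an assumption made for illustration: seq_sort, int_cmp and text_cmp are hypothetical stand-ins for Seq.sort, Int.cmp and Text.cmp, rows are modelled as plain dicts, and the optional collation argument ('french') is left out.

```python
from functools import cmp_to_key


def int_cmp(a: int, b: int) -> int:
    """Hypothetical stand-in for a type-specific Int.cmp operator."""
    return (a > b) - (a < b)


def text_cmp(a: str, b: str) -> int:
    """Hypothetical stand-in for Text.cmp (plain code point order here)."""
    return (a > b) - (a < b)


def seq_sort(what, how):
    """Sort rows by each (column, direction, comparator) entry in turn.

    Later entries only matter when all earlier ones compare as equal,
    mirroring the precedence of columns in an ORDER BY clause.
    """
    def row_cmp(row1, row2):
        for column, direction, cmp in how:
            result = cmp(row1[column], row2[column])
            if result != 0:
                return -result if direction == "desc" else result
        return 0

    return sorted(what, key=cmp_to_key(row_cmp))


rows = [
    {"foo": 2, "bar": "b"},
    {"foo": 1, "bar": "a"},
    {"foo": 1, "bar": "c"},
]

# Roughly "ORDER BY foo ASC, bar DESC", with each comparator named explicitly.
ordered = seq_sort(rows, [("foo", "asc", int_cmp), ("bar", "desc", text_cmp)])
```

A user-defined wrapper in the spirit of MyPkg.foobarbazsort() would then just be a function that calls seq_sort with a fixed 'how' list.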

So one main question for the moment: does it seem okay for you as potential users that an Ordered role would be eliminated, and that each applicable type would have its own versions of these operators rather than sharing same-spelling generic ones: '<', '>', between, min, max, sort, etc (or alternately just '<' or 'compare', and all the others become generic higher-order functions taking the prior ones as arguments?); and that you would invoke each of these directly where applicable?
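The alternative in the parenthetical, where only 'compare' is type-specific and the rest are generic higher-order functions, might look something like this Python sketch. All names are hypothetical and chosen for illustration only; int_cmp stands in for a per-type compare operator.

```python
from functools import reduce


def int_cmp(a: int, b: int) -> int:
    """Hypothetical per-type compare: negative, zero, or positive."""
    return (a > b) - (a < b)


def less_than(cmp, a, b) -> bool:
    """Generic '<' built from any compare function."""
    return cmp(a, b) < 0


def minimum(cmp, values):
    """Generic min over a non-empty sequence, given a compare function."""
    return reduce(lambda best, v: v if cmp(v, best) < 0 else best, values)


def maximum(cmp, values):
    """Generic max over a non-empty sequence, given a compare function."""
    return reduce(lambda best, v: v if cmp(v, best) > 0 else best, values)


def between(cmp, value, low, high) -> bool:
    """Generic inclusive range test: low <= value <= high under cmp."""
    return cmp(low, value) <= 0 and cmp(value, high) <= 0
```

With this shape, supporting a new type or a new collation means writing one compare function, and every derived operator works with it unchanged.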

Or generally speaking, is there any other advice about dealing with these matters like collations and such in language design? I'm looking for something easy and extensible. What do your projects already do about these?

Barring feedback, I'll probably try eliminating the Ordered role and doing what I said above, and just see how that works.


Thank you. -- Darren Duncan

_______________________________________________
List: http://lists.scsys.co.uk/cgi-bin/mailman/listinfo/dbix-class
IRC: irc.perl.org#dbix-class
SVN: http://dev.catalyst.perl.org/repos/bast/DBIx-Class/
Searchable Archive: http://www.grokbase.com/group/[EMAIL PROTECTED]
