Re: [Haskell-cafe] The difficulty of designing a sequence class

Robert Dockins Mon, 31 Jul 2006 06:05:11 -0700

On Jul 30, 2006, at 5:28 PM, Brian Hulley wrote:

Robert Dockins wrote:
On Sunday 30 July 2006 07:47, Brian Hulley wrote:
Another option is the Edison library, which uses:
     class (Functor s, MonadPlus s) => Sequence s where

so here MonadPlus is used instead of Monoid to provide empty and
append. So I've got three main questions:
1) Did Edison choose MonadPlus just because this fitted in with the
lack of multi-parameter typeclasses in H98?
Edison's design hails from a time when MPTCs were not only
non-standard (as they still are), but also not widely used, and
before fundeps were available (I think). So the answer to this one
is pretty much "yes".
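To illustrate how MonadPlus can stand in for Monoid, here is a minimal sketch in Haskell 98 style. It is not Edison's actual Sequence class, which has many more methods; the class SimpleSeq and the names consS, toListS, emptyS and appendS are hypothetical. Because the element type never appears as a class parameter, "empty" and "append" are taken from the MonadPlus superclass as mzero and mplus.

```haskell
import Control.Monad (MonadPlus (..))

-- Hypothetical single-parameter sequence class: the element type 'a' is not
-- a class parameter, so empty/append come from MonadPlus rather than Monoid.
class (Functor s, MonadPlus s) => SimpleSeq s where
  consS   :: a -> s a -> s a
  toListS :: s a -> [a]

-- Empty sequence, borrowed from MonadPlus.
emptyS :: SimpleSeq s => s a
emptyS = mzero

-- Append, borrowed from MonadPlus.
appendS :: SimpleSeq s => s a -> s a -> s a
appendS = mplus

instance SimpleSeq [] where
  consS   = (:)
  toListS = id

main :: IO ()
main = print (toListS (consS 1 emptyS `appendS` [2, 3 :: Int]))
```

In current versions of base, mzero and mplus default to empty and (<|>) from Alternative, so the same arrangement still type-checks today.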
[snip]
Hi - Thanks for the answers to this and my other questions. One thing I just realised is that there don't seem to be any instance declarations anywhere in the standard libs relating Monoid to MonadPlus, so it's a bit unsettling to have to make a "random" choice on the question of what kind of object a Sequence is...
I tried:

   class (forall a. Monoid s a) => Sequence s where ...
but of course that doesn't work, so I suppose MonadPlus is the only option when 'a' doesn't appear as a type variable arg of the class being defined.
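For comparison: much later versions of GHC (8.6 and up) accept nearly this declaration through the QuantifiedConstraints extension, written with base's single-parameter Monoid applied to `s a`. The following is a minimal sketch; the class Sequence here, its method singleton and the helper fromListS are hypothetical and not taken from any library.

```haskell
{-# LANGUAGE QuantifiedConstraints #-}
{-# LANGUAGE FlexibleContexts #-}

-- Hypothetical class: the superclass says "s a is a Monoid for every a",
-- which needs QuantifiedConstraints (GHC 8.6+).
class (forall a. Monoid (s a)) => Sequence s where
  singleton :: a -> s a

-- The superclass obligation (forall a. Monoid [a]) is met by base's
-- instance Monoid [a].
instance Sequence [] where
  singleton x = [x]

-- Generic code gets mempty/mappend on s a from the Sequence s constraint alone.
fromListS :: Sequence s => [a] -> s a
fromListS = foldr (\x acc -> singleton x `mappend` acc) mempty

main :: IO ()
main = putStrLn (fromListS "abc")
```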
BTW, for what purpose are you designing a new sequence class? You are
clearly aware of other efforts in this area; in what ways do they not
meet your needs?
The existing sequence and collection classes I've looked at don't do enough.
For example, when I tried to represent the text in an edit widget, I realised I needed a sequence of characters that could also be considered to be a sequence of lines. It is necessary to be able to index the sequence by character position as well as by line position, and to keep track of the total number of characters, the total number of lines, and the maximum number of characters on any one line (so as to be able to calculate the x,y extents when laying out the widget, assuming a fixed-width font (tabs ignored!)), all with very efficient split and append operations.

So, what you want is a sequence of sequences that can be transparently converted to a "flattened" sequence and vice versa? And you also want to keep track of the total number of lines and characters within each line. Additionally, you want to keep track of the maximum number of characters in any one line.

I managed to get a good representation by using a FingerTree of lines where each line uses a ByteString. I made my own FingerTree class based on the one referenced in the paper at http://www.soi.city.ac.uk/~ross/papers/FingerTree.html but without the symbolic names, which I find totally unreadable and confusing, and also so I could get full control of the strictness of the implementation, and also as a way of understanding them since I'd never come across such a complicated data structure before. (I highly recommend this paper to anyone who wants to learn about FingerTrees, Monoids and other very useful concepts.)
So one thing existing sequence classes don't have (apart from FingerTree) is the concept of measurement, which is essential when you want efficient updates. E.g. in my text buffer, the measurement maintained for a sequence is the number of chars, the number of lines and the maximum line length.
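A minimal sketch of such a measure, assuming the buffer is held as a sequence of whole lines with the newline characters not stored. The names TextMeasure, measureLine, measureBuffer and locateChar are hypothetical; this is not the code described above. In the FingerTree paper's sense a measure only has to form a monoid: here the counts add and the maximum line length combines with max. locateChar maps a character offset to a (line, column) pair by accumulating measures along a plain list, which is linear; a finger tree performs the same kind of search over cached subtree measures in logarithmic time.

```haskell
import Data.List (foldl')

-- Hypothetical measure for a buffer stored as a sequence of whole lines.
-- Newline characters are not stored, so they are not counted.
data TextMeasure = TextMeasure
  { tmChars   :: !Int -- total number of characters
  , tmLines   :: !Int -- total number of lines
  , tmMaxLine :: !Int -- length of the longest line
  } deriving (Eq, Show)

-- Combine the measures of two adjacent runs of whole lines.
instance Semigroup TextMeasure where
  TextMeasure c1 l1 m1 <> TextMeasure c2 l2 m2 =
    TextMeasure (c1 + c2) (l1 + l2) (max m1 m2)

instance Monoid TextMeasure where
  mempty = TextMeasure 0 0 0

-- Measure of a single line.
measureLine :: String -> TextMeasure
measureLine s = TextMeasure n 1 n
  where n = length s

-- Measure of a whole buffer: the monoidal sum of its line measures.
measureBuffer :: [String] -> TextMeasure
measureBuffer = foldl' (\acc l -> acc <> measureLine l) mempty

-- Find (line, column) for a character offset by accumulating measures
-- until the running character count passes the offset (linear scan).
locateChar :: Int -> [String] -> Maybe (Int, Int)
locateChar off buf
  | off < 0   = Nothing
  | otherwise = go mempty buf
  where
    go _ [] = Nothing
    go acc (l : ls)
      | tmChars acc' > off = Just (tmLines acc, off - tmChars acc)
      | otherwise          = go acc' ls
      where acc' = acc <> measureLine l

main :: IO ()
main = do
  let buf = ["hello", "", "world!"]
  print (measureBuffer buf)
  print (map (`locateChar` buf) [0, 5, 10, 11])
```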

Edison has support for transparently keeping track of the size of a sequence.

http://www.eecs.tufts.edu/~rdocki01/docs/edison/Data-Edison-Seq-SizedSeq.html

It may well be possible to create a slightly generalized wrapper that keeps track of arbitrary "measures" (provided they can be computed by a function which is associative, commutative and has a unit).

Humm, sort of an incremental fold.... I like it.
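A rough sketch of that "incremental fold" idea, not Edison's SizedSeq API: a hypothetical MeasuredSeq wrapper pairs a plain list with a cached measure and updates the cache on cons, snoc and append instead of refolding the whole sequence. The measure function is passed explicitly, which keeps the sketch simple but relies on the caller using the same function consistently. This sketch only relies on the measure forming a monoid (associative, with a unit); since appends keep left-to-right order, commutativity is not needed here. The underlying list append is still linear; only the measure is maintained incrementally.

```haskell
import Data.Monoid (Sum (..))

-- Hypothetical wrapper caching a monoidal measure next to the sequence.
data MeasuredSeq v a = MeasuredSeq
  { cachedMeasure :: v   -- measure of the whole sequence, kept up to date
  , elements      :: [a] -- the underlying sequence (a plain list here)
  }

emptyM :: Monoid v => MeasuredSeq v a
emptyM = MeasuredSeq mempty []

-- Each update combines measures in sequence order, so only associativity
-- and the unit are used.
consM :: Monoid v => (a -> v) -> a -> MeasuredSeq v a -> MeasuredSeq v a
consM f x (MeasuredSeq v xs) = MeasuredSeq (f x <> v) (x : xs)

snocM :: Monoid v => (a -> v) -> MeasuredSeq v a -> a -> MeasuredSeq v a
snocM f (MeasuredSeq v xs) x = MeasuredSeq (v <> f x) (xs ++ [x])

appendM :: Monoid v => MeasuredSeq v a -> MeasuredSeq v a -> MeasuredSeq v a
appendM (MeasuredSeq v1 xs) (MeasuredSeq v2 ys) = MeasuredSeq (v1 <> v2) (xs ++ ys)

fromListM :: Monoid v => (a -> v) -> [a] -> MeasuredSeq v a
fromListM f = foldr (consM f) emptyM

main :: IO ()
main = do
  let m = fromListM sizeAndChars ["edison", "finger"]
            `appendM` snocM sizeAndChars (fromListM sizeAndChars ["tree"]) "seq"
      (Sum n, Sum c) = cachedMeasure m
  print (n, c)
  print (elements m)
  where
    -- Two measures tracked at once: element count (like SizedSeq) and total length.
    sizeAndChars :: String -> (Sum Int, Sum Int)
    sizeAndChars w = (Sum 1, Sum (length w))
```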

Then I needed a structure for a Trie widget a bit like (details omitted):
     data Node = Expanded Value T | Collapsed Value T | Leaf Value
     newtype T = T (FingerTree (Key, Node))
where objects of type T could be regarded as a finite map (e.g. from hierarchical module names to modules) as well as a flattened linear sequence indexed by line number (for display on the screen in a widget given the current scroll bar position), and which also needed to keep track of the total horizontal and vertical extent of the Trie as it would appear in the widget's font.
There are several different kinds of measurement going on in this data structure, as well as the complexity of the extra recursion through the leaf to a new level. Existing sequence abstractions don't seem to provide the operations needed to treat a nested data structure as a single sequence.
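To make the two views of such a Trie concrete, here is a simplified sketch that uses plain lists in place of the FingerTree and keeps no cached measures, so every operation is linear. Key and Value are assumed to be strings, the two-column indent per level is an arbitrary choice, and all function names are hypothetical. flatten and lineAt give the sequence view "through the leaf", while lookupPath gives the finite-map view over hierarchical keys.

```haskell
-- Simplified, list-based stand-in for the Trie above (no cached measures).
type Key = String
type Value = String -- hypothetical payload, e.g. a module description

data Node = Expanded Value T | Collapsed Value T | Leaf Value
newtype T = T [(Key, Node)]

nodeValue :: Node -> Value
nodeValue (Expanded v _)  = v
nodeValue (Collapsed v _) = v
nodeValue (Leaf v)        = v

children :: Node -> Maybe T
children (Expanded _ t)  = Just t
children (Collapsed _ t) = Just t
children (Leaf _)        = Nothing

-- Sequence view: one (depth, key) line per visible entry, in display order.
-- Children of Expanded nodes are visible; children of Collapsed nodes are not.
flatten :: T -> [(Int, Key)]
flatten = go 0
  where
    go d (T kvs) = concatMap (entry d) kvs
    entry d (k, Expanded _ sub) = (d, k) : go (d + 1) sub
    entry d (k, _)              = [(d, k)]

-- Index the flattened view by line number (e.g. from a scroll position).
lineAt :: Int -> T -> Maybe (Int, Key)
lineAt n t = case drop n (flatten t) of
  (l : _) | n >= 0 -> Just l
  _                -> Nothing

-- Vertical extent in lines and horizontal extent in columns, assuming a
-- fixed-width font and two columns of indent per level.
extents :: T -> (Int, Int)
extents t = (length ls, maximum (0 : [2 * d + length k | (d, k) <- ls]))
  where ls = flatten t

-- Finite-map view: look up a hierarchical key path, ignoring expansion state.
lookupPath :: [Key] -> T -> Maybe Value
lookupPath [] _ = Nothing
lookupPath (k : ks) (T kvs) = do
  node <- lookup k kvs
  if null ks then Just (nodeValue node) else children node >>= lookupPath ks

sample :: T
sample = T
  [ ("Data", Expanded "package" (T [("Map", Leaf "Data.Map"), ("Set", Leaf "Data.Set")]))
  , ("Control", Collapsed "package" (T [("Monad", Leaf "Control.Monad")]))
  ]

main :: IO ()
main = do
  print (flatten sample)
  print (extents sample)
  print (lineAt 3 sample)
  print (lookupPath ["Control", "Monad"] sample)
```

A real implementation would cache the line count and extents as FingerTree measures so that lineAt and extents do not have to walk the whole structure.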
In summary:
1) Often a complex data structure must be able to be simultaneously regarded as a single flattened sequence.
2) Measurements are needed for efficient updates (may need to keep track of several at once).
3) Indexing and size are sometimes needed relative to the flattened sequence, not just the top level.
4) It is useful to have a finite map that can also be regarded as a linear sequence.
5) Such finite maps may also be nested (when the keys are hierarchical), but this nesting should be hidden from the user...
6) I want a design that can allow complex data structures to be built up easily and instanced to the appropriate interfaces.
7) Also, naming conventions in the existing libs are a bit irregular and burdened with old-fashioned Lisp-isms, e.g. in Data.Edison.Seq there are functions "lview" and "reducel". I'd argue that there must be one and only one way of forming any identifier in any program, namely that the function should appear first followed by qualifiers (so that related functionality always appears together in a lexicographical listing of functions), and it must use camel case with no exceptions at all, thus "viewL" and "reduceL" (and "foldL").

OK. Point taken. I'm not sure I agree with the no-exceptions camel-case, but the lexicographical-listing-groups-functionality holds strong appeal for me.

8) More factoring needs to be done, since not all sequences need to be indexed or measured or to be "flattened through the leaf" (e.g. the FingerTree paper already has a separate class for Reduce and I believe their implementation also referred to a class for Foldable), rather than bundling everything in a single Sequence class.
Anyway, apologies for my very rambling answer - I'm still a long way from finding a good set of classes to address the above issues :-)
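One possible shape for points 7 and 8, offered only as a hypothetical sketch: each capability is a small class of its own, names put the function first and the qualifier last (viewL, foldL, indexAt), and generic functions ask only for the classes they actually use. The Measured class is shaped like the one in the fingertree package; SeqViewL, SeqIndexed, measureAll and Line are made up for illustration.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies, FlexibleInstances #-}

import Data.Monoid (Sum (..))

-- Hypothetical class: taking a sequence apart from the left.
class SeqViewL s where
  viewL :: s a -> Maybe (a, s a)

-- Hypothetical class: positional indexing, only for sequences that support it.
class SeqIndexed s where
  indexAt :: Int -> s a -> Maybe a

-- Per-element measures, shaped like the Measured class in fingertree.
class Monoid v => Measured v a | a -> v where
  measure :: a -> v

-- Strict left fold, defined once for every SeqViewL instance.
foldL :: SeqViewL s => (b -> a -> b) -> b -> s a -> b
foldL f z s = case viewL s of
  Nothing      -> z
  Just (x, s') -> let z' = f z x in z' `seq` foldL f z' s'

-- Uses only the two capabilities it needs, not a monolithic Sequence class.
measureAll :: (SeqViewL s, Measured v a) => s a -> v
measureAll = foldL (\acc x -> acc <> measure x) mempty

instance SeqViewL [] where
  viewL []       = Nothing
  viewL (x : xs) = Just (x, xs)

instance SeqIndexed [] where
  indexAt i xs
    | i < 0     = Nothing
    | otherwise = case drop i xs of
        (x : _) -> Just x
        []      -> Nothing

-- Hypothetical element type: a line measured by its length.
newtype Line = Line String

instance Measured (Sum Int) Line where
  measure (Line s) = Sum (length s)

main :: IO ()
main = do
  print (foldL (+) 0 [1 .. 10 :: Int])
  print (indexAt 2 "viewL")
  print (getSum (measureAll [Line "foldL", Line "viewL"]))
```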

Well, I guess I'd suggest you attempt to identify specific problems with already existing packages and attempt to work with those who maintain such packages before reinventing something as basic (and difficult to get right!) as data structure abstractions.

Such maintainers may be willing to accept patches and/or implement requested features in order to reduce fragmentation in this space *hint, hint* :-)



<soapbox type="Edison plug">

I personally think that Edison is a great piece of work, and I took up maintainership because I felt it was a great shame that no one was using it. My ultimate goal is to make Edison the package that everyone thinks of first when they discover they need a Haskell data structure for some purpose. Even if Edison does not fill that need, I want every Haskeller to compare his needs against what Edison provides before striking out on his own, and I want that to be a decision made with some hesitation. Over time I hope to make the cases where Edison doesn't cut the mustard fewer and further between.

So, if you've ever looked at Edison, or ever do so in the future, and decide it isn't what you need, please let me know why so I can make it better for the next time. After all, squeaky wheels get the grease, but only if I can hear the squeaking!

</soapbox>



Rob Dockins

Speak softly and drive a Sherman tank.
Laugh hard; it's a long way to the bank.
          -- TMBG



_______________________________________________
Haskell-Cafe mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/haskell-cafe
