On Mon, Sep 18, 2006 at 04:08:28PM +0000, Tuomo Valkonen wrote:
> * UTF-16 is, of course, a rather different case than just a change in
> encoding. The way I'd go about it, is to make the current patch type
> polymorphic to input in arbitrary character types, if it isn't already,
> and add skeleton support for plugging in and specifying different patch
> type for files of arbitrary formats. (So, one day, support could be
> written for structural formats to have structural instead of line-based
> patches, and so on.)

I think this could be a worthwhile task, although not an easy one. If darcs could handle MS Word documents and other "industrial" file formats, it would become a "real" RCS in one more sense of the word. It would probably also boost the development of new patch types, which would be interesting.

One complication is the diff algorithm. It forms hunks, and would have to form the UTF-16 hunks and many of the eventual plug-in structural format hunks. It needs to be polymorphic as well, or worse...

My number one wish for a new patch type, once I finally get time to finish the replace-with-space patch type, is a hunk-move patch type that can move a block of lines between files, and trivially within the same file. This would sort of be a higher order patch type, since one would want it to be able to move _any_ kind of hunk, including the plug-in ones.

It would be nice if the user didn't have to ask for a specific diff algorithm on each record. The diff function could take functions that partition each file at one (or more) levels and automatically produce hunks of a sort. But it would probably be a waste of time to run the diff for each different type on each different file. And it would make for a weird dialogue, with lots of strange dependencies between changes when recording.

And there must be some way to guarantee that "general hunks" abide by the rules of the patch algebra, and to automatically compute the commutations between them. This would probably limit the plug-in "language" they'd be expressed in.

Hand-coding a UTF-16 hunk would be easier, but there's still the problem of how to do it in the diff algorithm and the "select changes" dialogue, unless there should simply be _either_ raw 8-bit or UTF-16, which wouldn't be so nice, I think.

-- 
Tommy Pettersson <[EMAIL PROTECTED]>
_______________________________________________
darcs-devel mailing list
http://www.abridgegame.org/cgi-bin/mailman/listinfo/darcs-devel
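
The idea of a diff function that takes a partitioning function can be sketched roughly as below. This is only an illustration in Python, not darcs code: the names Hunk, diff_with, apply_hunks, by_lines and by_utf16_lines are all hypothetical, and the standard library's difflib stands in for whatever diff algorithm darcs actually uses. It also says nothing about commutation, which is the hard part.

```python
from difflib import SequenceMatcher
from typing import NamedTuple


class Hunk(NamedTuple):
    """A generic hunk over 'units' (hypothetical type, not a darcs patch)."""
    pos: int        # index of the first affected unit in the old file
    removed: list   # units taken out at pos
    added: list     # units put in at pos


def diff_with(partition, old: bytes, new: bytes) -> list:
    """Diff two files after splitting each into units with `partition`."""
    a, b = partition(old), partition(new)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [Hunk(i1, a[i1:i2], b[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


def apply_hunks(units: list, hunks: list) -> list:
    """Apply hunks to a unit list, last hunk first so positions stay valid."""
    out = list(units)
    for h in reversed(hunks):
        out[h.pos:h.pos + len(h.removed)] = h.added
    return out


def by_lines(data: bytes) -> list:
    """Raw 8-bit partitioning: split on the newline byte."""
    return data.split(b"\n")


def by_utf16_lines(data: bytes) -> list:
    """UTF-16 partitioning: decode first, then split on newline characters."""
    return data.decode("utf-16").split("\n")


old = "a\nb\nc".encode("utf-16")
new = "a\nB\nc".encode("utf-16")
utf16_hunks = diff_with(by_utf16_lines, old, new)
raw_hunks = diff_with(by_lines, old, new)
```

The same diff_with serves both file types; only the partition function changes. The raw 8-bit partitioning still round-trips on the UTF-16 data, but its units are cut in the middle of 16-bit code units, which is one way to see why a dedicated UTF-16 hunk would be wanted.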
