On 4/23/12 3:06 PM, Alvaro Gutierrez wrote:
I see. The first thing that comes to mind is the notion of module
granularity, which of course is subjective, so whether a single module or
multiple ones should handle e.g. doubles and integrals is a good question;
are there guidelines as to how those choices are made?

I'm not sure if there are any guidelines per se; that's more of a general software engineering problem. If you browse around on Hackage you'll get a fairly good idea of what the norms are, though. Everyone seems to have settled on a common range of scope, with notable exceptions: the containers library has far too many functions per module, and some of Ed Kmett's work on category theory tends towards very few declarations per module.

At any rate, why do these modules, with sufficiently-different
functionality, live in the same library -- is it that they share some
common bits of implementation, or to ease the management of source code?

I contacted Don Stewart (the former maintainer) to see whether he thought I should release the integral stuff on its own, or integrate it into bytestring-lexing. We agreed that it made more sense to try to build up a core library for lexing various common data types, rather than having a bunch of little libraries. He'd just never had time to get around to developing bytestring-lexing further; so I took over.

Eventually I plan to add rendering functions for floating point, and to split up the parsers for different floating point formats[1], so that it more closely resembles the integral stuff. But that won't be until this fall or later, unless someone requests it sooner.


[1] Having an omni-parser can be helpful when you want to be liberal about your input. But when you're writing parsers for a specified format, they're usually not that liberal, so we need to offer restricted lexers in order to enable code reuse.
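To make that concrete, here's a rough sketch of the idea (the names are purely illustrative, not the actual bytestring-lexing API): each restricted lexer accepts exactly one format, and the liberal omni-parser is just a choice over them.

    import qualified Data.ByteString.Char8 as BS
    import Data.Char (isDigit)

    -- Restricted lexer: unsigned decimal digits only, e.g. "123".
    lexDigits :: BS.ByteString -> Maybe (Integer, BS.ByteString)
    lexDigits bs =
        case BS.span isDigit bs of
            (ds, rest)
                | BS.null ds -> Nothing
                | otherwise  -> Just (read (BS.unpack ds), rest)

    -- Restricted lexer: "digits.digits" only, no exponent part.
    lexDecimalFrac :: BS.ByteString -> Maybe (Double, BS.ByteString)
    lexDecimalFrac bs = do
        (whole, rest) <- lexDigits bs
        ('.', rest')  <- BS.uncons rest
        let (fracDs, rest'') = BS.span isDigit rest'
        if BS.null fracDs
            then Nothing
            else Just ( fromInteger whole
                          + read (BS.unpack fracDs) / 10 ^^ BS.length fracDs
                      , rest'' )

    -- The liberal omni-parser is a choice over the restricted pieces,
    -- whereas a parser for a stricter format can reuse only the lexer
    -- its grammar actually allows.
    lexFractional :: BS.ByteString -> Maybe (Double, BS.ByteString)
    lexFractional bs =
        case lexDecimalFrac bs of
            Just r  -> Just r
            Nothing -> fmap (\(i, rest) -> (fromInteger i, rest)) (lexDigits bs)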


When dealing with FFI code, because of the impedance mismatch between
Haskell and imperative languages like C, it's clear that there's going to
be some massaging of the API beyond simply declaring FFI calls. As such,
clearly we'd like to have separate modules for doing the low-level binding
vs presenting a high-level API. Moreover, depending on what you're
interfacing with, you may be forced to have multiple low-level modules.

Ah, that's a good use case. Is the lower-level module usually made "public"
as well, or is it only an implementation detail?

Depends on the project. For ByteStrings, most of that is hidden away as implementation details. For binding to C libraries, I think the current advice is to offer the low-level interface so that if there's something the high-level interface can't handle well, people have some easy recourse.
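As a hedged sketch of that split (the module names and the choice of getpid are made up for illustration; this isn't from any particular package), the low-level module exposes the raw foreign import with its C types, and the public module wraps it in friendlier Haskell types:

    {-# LANGUAGE ForeignFunctionInterface #-}
    -- File 1: the low-level binding, kept separate (and perhaps only
    -- exposed as an escape hatch).
    module Example.LowLevel (c_getpid) where

    import System.Posix.Types (CPid)

    foreign import ccall unsafe "unistd.h getpid"
        c_getpid :: IO CPid

    -- File 2: the high-level API most users see; no C types leak out.
    module Example.Process (getProcessID) where

    import Example.LowLevel (c_getpid)

    getProcessID :: IO Integer
    getProcessID = fmap fromIntegral c_getpid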


On the other hand, the main purpose of packages or libraries is as unit of
distribution, code reuse, and separate compilation. Even with the Haskell
culture of making small libraries, most worthwhile units of
distribution/reuse/compilation tend to be larger than a single
namespace/concern. Thus, it makes sense to have more than one module per
package, because otherwise we'd need some higher level mechanism in order
to manage the collections of package-modules which should be considered a
single unit (i.e., clients will almost always want the whole bunch of them).

This is the part that I'm trying to get a better sense of. I can see how in
some cases, it makes sense for more than one module to form a unit, because
they are tightly coupled semantically or implementation-wise -- so clients
will indeed want the whole bunch. On the other hand, several libraries
provide modules that are all over the place, in a way that doesn't form a
"unit" of any kind (e.g. MissingH), and it's not clear that you would want
any Network stuff when all you need is String utilities.

Yeah, MissingH and similar libraries are just grab-bags full of stuff. Usually grab-bag libraries think of themselves as placeholders, with the intention of breaking things out once something grows large enough to warrant being its own package. (Whether the breaking out actually happens is another matter.) But to get the general sense of things, you should ignore them.

Instead, consider one of the parsing libraries like uu-parsinglib, attoparsec, parsec, frisby. There are lots of pieces to a parsing framework, but it makes sense to distribute them together.
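For instance, even a tiny parser ends up touching several of those pieces at once. This example assumes parsec, but the same point holds for any of the others:

    import Text.Parsec
    import Text.Parsec.String (Parser)

    -- Parse a comma-separated list of integers, e.g. "1, 2, 3": the core
    -- parser type, the character lexers, and the combinators all get used
    -- together, even though this only needs a sliver of the library.
    integerList :: Parser [Integer]
    integerList = fmap read (many1 digit) `sepBy` (char ',' >> spaces)

    main :: IO ()
    main = print (parse integerList "" "1, 2, 3")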

Or, consider one of the base libraries for iteratees, enumerators, pipes, conduits, etc. Like parsing, these offer a whole framework. You won't usually need 100% of it, but everyone needs a different 80%.

Or, to mention some more of my own packages, consider stm-chans, unification-fd, or unix-bytestring. In unification-fd, the stuff outside of Control.Unification.* could be moved elsewhere, but the stuff within there makes sense to be split up yet distributed together. For stm-chans, because of the similarity in interfaces, use cases, etc., it'd be peculiar to want to separate the channel variants into different packages. In unix-bytestring I separated off the Iovec stuff (FFI implementation details) from the main API, but clearly they must go together.


But the way you describe it, it seems that despite centralization having
those disadvantages, it is more or less the way the system works, socially
(egos, bad form, etc.) and technically (because of the lack of compiler
support)

There's a difference between centralization and communalization.

With centralization there's a central authority who makes all the rules and (usually) enforces them. This is the benevolent dictator model common in open source. The problem is: what do you do if the dictator goes missing (gets hit by a bus, is too busy this semester, etc.)?

With communalization, there's no central authority that writes or enforces the laws; instead, the community as a whole comes to agree on the norms. This is the way societies often operate (i.e., societies as cultures, rather than as governments). By virtue of the social interaction, things come to be a particular way, but there isn't necessarily any person or committee that decided it should be that way. Moreover, in order to disrupt the norms it's not enough to depose a dictator; you need some wide-scale way of disrupting the network of social interaction. The problem here is that it can be very hard to steer a community. If you've identified a problem, it's not clear how to get it fixed (whereas a dictator could just issue a fiat).

In practice, every organization has a bit of both models; it's just a question of how much of each, and in what contexts. The Haskell community is more centralized when it comes to things like the Haskell Report and the Haskell Platform, because you really need it there. Whereas Hackage and the Cafe are more of your standard social community.

except that it is ad-hoc instead of mechanically enforced. In
other words, I don't see what the advantages of allowing ambiguity
currently are.

If you mechanically enforce things then you will find clashes. That's not the problem: clashes exist, you find them, whatever. The problem is: now that you've found one, how are you going to resolve it?

You can't just make Hackage refuse packages which would cause a module name conflict. If you try, then you'll get angry developers who just leave or who badmouth Haskell (or both), which does no good for anyone. You have to have an escape hatch, some way for people to raise legitimate issues such as "the conflictor hasn't been maintained in five years and has no users", or "I wrote the old package and this new package is meant to supersede it", etc. But then you need a group of people who work through those issues and make the case-by-case decisions about how conflicts should be resolved.

Allowing clashes saves you from needing that group of people. If you allow clashes, there are no developer complaints to be resolved. A lot of resources are tied up in making those central authority groups, and by not having such a central authority we free up those resources to be used elsewhere.

In cases like Perl's CPAN and Linux distros, they have enough resources that they can afford the overhead cost of creating and maintaining such groups. In addition, they're large enough that the resources for that group don't necessarily diminish the resources for other things. E.g., some members of the Linux developer community are no good at programming, but they're great at social organization. If you have a central authority group, they can contribute to that and thereby provide resources; whereas, if there's no such group, they're unlikely to offer programming time or other resources instead.

For small communities, by contrast, overhead costs are proportionally higher, and small communities aren't able to gather as many resources to cover them. In addition, the person who could offer social organization is probably already offering other resources which she wouldn't be able to offer if she moved over to helping the central authority; so you're closer to a zero-sum game of needing to decide how to allocate your scarce resources.


Ah, interesting. So, perhaps I misunderstand, but this seems like an
argument in favor of having uniquely-named modules (e.g. Foo.FD and
Foo.TF) instead of overlapping ones, right?

Yeah, probably.

I mean, ideally I'd like to see GHC retooled so that both fundeps and type families actually compile down to the same code, and one is just sugar for the other (or both are sugar for some third thing). Then we'd get rid of the real problem of there being multiple incompatible ways of doing the same thing. Until then, it's probably better to just pick one approach for each project, rather than trying to maintain parallel forks for each approach. But if you're going to maintain parallel forks, then it's probably best to not do the module punning thing.
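For concreteness, the Foo.FD / Foo.TF split might look like this (illustrative names only; two separate files): the same class written against fundeps and against type families, which is exactly why the two versions can't share one interface.

    -- File 1: the fundep version.
    {-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies #-}
    module Container.FD where

    class Container c e | c -> e where
        empty  :: c
        insert :: e -> c -> c

    -- File 2: the type-family version.
    {-# LANGUAGE TypeFamilies #-}
    module Container.TF where

    class Container c where
        type Elem c
        empty  :: c
        insert :: Elem c -> c -> c

Giving the two forks distinct module names at least lets a client depend on both without a clash, whereas punning on a single module name forces the choice at the package level.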

--
Live well,
~wren

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
