On 4/23/12 3:06 PM, Alvaro Gutierrez wrote:
I see. The first thing that comes to mind is the notion of module
granularity, which of course is subjective, so whether a single module or
multiple ones should handle e.g. doubles and integrals is a good question;
are there guidelines as to how those choices are made?

I'm not sure if there are any guidelines per se; that's more of a general software engineering problem. If you browse around on Hackage you'll get a fairly good idea of what the norms are, though. Everyone seems to have settled on a common range of scope, with notable exceptions: the containers library has far too many functions per module, and some of Ed Kmett's work on category theory tends towards very few declarations per module.

At any rate, why do these modules, with sufficiently-different
functionality, live in the same library -- is it that they share some
common bits of implementation, or to ease the management of source code?

I contacted Don Stewart (the former maintainer) to see whether he thought I should release the integral stuff on its own, or integrate it into bytestring-lexing. We agreed that it made more sense to try to build up a core library for lexing various common data types, rather than having a bunch of little libraries. He'd just never had time to get around to developing bytestring-lexing further; so I took over.

Eventually I plan to add rendering functions for floating point, and to split up the parsers for different floating point formats[1], so that it more closely resembles the integral stuff. But that won't be until this fall or later, unless someone requests it sooner.


[1] Having an omni-parser can be helpful when you want to be liberal about your input. But when you're writing parsers for a specified format, they're usually not that liberal, so we need to offer restricted lexers in order to enable code reuse.
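To make that concrete, here's a rough sketch of the idea (the names are purely illustrative, not the actual bytestring-lexing API): each restricted lexer accepts exactly one format, and the liberal omni-parser is just a choice over them.

    import qualified Data.ByteString.Char8 as BS
    import Data.Char (isDigit)

    -- Restricted lexer: unsigned decimal digits only, e.g. "123".
    lexDigits :: BS.ByteString -> Maybe (Integer, BS.ByteString)
    lexDigits bs =
        case BS.span isDigit bs of
            (ds, rest)
                | BS.null ds -> Nothing
                | otherwise  -> Just (read (BS.unpack ds), rest)

    -- Restricted lexer: "digits.digits" only, no exponent part.
    lexDecimalFrac :: BS.ByteString -> Maybe (Double, BS.ByteString)
    lexDecimalFrac bs = do
        (whole, rest) <- lexDigits bs
        ('.', rest')  <- BS.uncons rest
        let (fracDs, rest'') = BS.span isDigit rest'
        if BS.null fracDs
            then Nothing
            else Just ( fromInteger whole
                          + read (BS.unpack fracDs) / 10 ^^ BS.length fracDs
                      , rest'' )

    -- The liberal omni-parser is a choice over the restricted pieces,
    -- whereas a parser for a stricter format can reuse only the lexer
    -- its grammar actually allows.
    lexFractional :: BS.ByteString -> Maybe (Double, BS.ByteString)
    lexFractional bs =
        case lexDecimalFrac bs of
            Just r  -> Just r
            Nothing -> fmap (\(i, rest) -> (fromInteger i, rest)) (lexDigits bs)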


When dealing with FFI code, because of the impedance mismatch between
Haskell and imperative languages like C, it's clear that there's going to
be some massaging of the API beyond simply declaring FFI calls. As such,
clearly we'd like to have separate modules for doing the low-level binding
vs presenting a high-level API. Moreover, depending on what you're
interfacing with, you may be forced to have multiple low-level modules.

Ah, that's a good use case. Is the lower-level module usually made "public"
as well, or is it only an implementation detail?

Depends on the project. For ByteStrings, most of that is hidden away as implementation details. For binding to C libraries, I think the current advice is to offer the low-level interface so that if there's something the high-level interface can't handle well, people have some easy recourse.
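As a hedged sketch of that split (the module names and the choice of getpid are made up for illustration; this isn't from any particular package), the low-level module exposes the raw foreign import with its C types, and the public module wraps it in friendlier Haskell types:

    {-# LANGUAGE ForeignFunctionInterface #-}
    -- File 1: the low-level binding, kept separate (and perhaps only
    -- exposed as an escape hatch).
    module Example.LowLevel (c_getpid) where

    import System.Posix.Types (CPid)

    foreign import ccall unsafe "unistd.h getpid"
        c_getpid :: IO CPid

    -- File 2: the high-level API most users see; no C types leak out.
    module Example.Process (getProcessID) where

    import Example.LowLevel (c_getpid)

    getProcessID :: IO Integer
    getProcessID = fmap fromIntegral c_getpid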


On the other hand, the main purpose of packages or libraries is as unit of
distribution, code reuse, and separate compilation. Even with the Haskell
culture of making small libraries, most worthwhile units of
distribution/reuse/compilation tend to be larger than a single
namespace/concern. Thus, it makes sense to have more than one module per
package, because otherwise we'd need some higher level mechanism in order
to manage the collections of package-modules which should be considered a
single unit (i.e., clients will almost always want the whole bunch of them).

This is the part that I'm trying to get a better sense of. I can see how in
some cases, it makes sense for more than one module to form a unit, because
they are tightly coupled semantically or implementation-wise -- so clients
will indeed want the whole bunch. On the other hand, several libraries
provide modules that are all over the place, in a way that doesn't form a
"unit" of any kind (e.g. MissingH), and it's not clear that you would want
any Network stuff when all you need is String utilities.

Yeah, MissingH and similar libraries are just grab-bags full of stuff. Usually grab-bag libraries think of themselves as placeholders, with the intention of breaking things out once something grows large enough to warrant being its own package. (Whether the breaking out actually happens is another matter.) But to get the general sense of things, you should ignore them.

Instead, consider one of the parsing libraries like uu-parsinglib, attoparsec, parsec, frisby. There are lots of pieces to a parsing framework, but it makes sense to distribute them together.
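For instance, even a tiny parser ends up touching several of those pieces at once. This example assumes parsec, but the same point holds for any of the others:

    import Text.Parsec
    import Text.Parsec.String (Parser)

    -- Parse a comma-separated list of integers, e.g. "1, 2, 3": the core
    -- parser type, the character lexers, and the combinators all get used
    -- together, even though this only needs a sliver of the library.
    integerList :: Parser [Integer]
    integerList = fmap read (many1 digit) `sepBy` (char ',' >> spaces)

    main :: IO ()
    main = print (parse integerList "" "1, 2, 3")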

Or, consider one of the base libraries for iteratees, enumerators, pipes, conduits, etc. Like parsing, these offer a whole framework. You won't usually need 100% of it, but everyone needs a different 80%.

Or, to mention some more of my own packages, consider stm-chans, unification-fd, or unix-bytestring. In unification-fd, the stuff outside of Control.Unification.* could be moved elsewhere, but the stuff within there makes sense to be split up yet distributed together. For stm-chans, because of the similarity in interfaces, use cases, etc., it'd be peculiar to want to separate the channel variants into different packages. In unix-bytestring I separated off the Iovec stuff (FFI implementation details) from the main API, but clearly they must go together.


But the way you describe it, it seems that despite centralization having
those disadvantages, it is more or less the way the system works, socially
(egos, bad form, etc.) and technically (because of the lack of compiler
support)

There's a difference between centralization and communalization.

With centralization there's a central authority who makes all the rules and (usually) enforces them. This is the benevolent dictator model common in open source. The problem is: what do you do if the dictator goes missing (gets hit by a bus, is too busy this semester, etc.)?

With communalization, there's no central authority that writes or enforces the laws; instead, the community as a whole comes to agree on the norms. This is the way societies often operate (i.e., societies as cultures, rather than as governments). By virtue of the social interaction, things come to be a particular way, but there isn't necessarily any person or committee that decided it should be that way. Moreover, in order to disrupt the norms it's not enough to depose a dictator; you need some wide-scale way of disrupting the network of social interaction. The problem here is that it can be very hard to steer a community. If you've identified a problem, it's not clear how to get it fixed (whereas a dictator could just issue a fiat).

In practice, every organization has a bit of both models; it's just a question of how much of each, and in what contexts. The Haskell community is more centralized when it comes to things like the Haskell Report and the Haskell Platform, because you really need it there. Whereas Hackage and the Cafe are more of your standard social community.

except that it is ad-hoc instead of mechanically enforced. In
other words, I don't see what the advantages of allowing ambiguity
currently are.

If you mechanically enforce things then you will find clashes. That's not the problem: clashes exist, you find them, whatever. The problem is: now that you've found one, how are you going to resolve it?

You can't just make Hackage refuse packages which would cause a module name conflict. If you try, then you'll get angry developers who just leave or who badmouth Haskell (or both), which does no good for anyone. You have to have an escape hatch, some way for people to raise legitimate issues such as "the conflictor hasn't been maintained in five years and has no users", or "I wrote the old package and this new package is meant to supersede it", etc. But then you need a group of people who work through those issues and make the case-by-case decisions about how conflicts should be resolved.

Allowing clashes saves you from needing that group of people. If you allow clashes, there are no developer complaints to be resolved. A lot of resources are tied up in making those central authority groups, and by not having such a central authority we free up those resources to be used elsewhere.

In cases like Perl's CPAN and Linux distros, they have enough resources that they can afford the overhead cost of creating and maintaining such groups. In addition, they're large enough that the resources for that group don't necessarily diminish the resources for other things. E.g., some members of the Linux developer community are no good at programming, but they're great at social organization. If you have a central authority group, they can contribute to that and thereby provide resources; whereas, if there's no such group, they're unlikely to offer programming time or other resources instead.

For small communities, by contrast, overhead costs are proportionally higher, and small communities aren't able to gather as many resources to cover them. In addition, the person who could offer social organization is probably already offering other resources which she wouldn't be able to offer if she moved over to helping the central authority; so you're closer to a zero-sum game of needing to decide how to allocate your scarce resources.


Ah, interesting. So, perhaps I misunderstand, but this seems like an
argument in favor of having uniquely-named modules (e.g. Foo.FD and
Foo.TF) instead of overlapping ones, right?

Yeah, probably.

I mean, ideally I'd like to see GHC retooled so that both fundeps and type families actually compile down to the same code, and one is just sugar for the other (or both are sugar for some third thing). Then we'd get rid of the real problem of there being multiple incompatible ways of doing the same thing. Until then, it's probably better to just pick one approach for each project, rather than trying to maintain parallel forks for each approach. But if you're going to maintain parallel forks, then it's probably best to not do the module punning thing.
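For concreteness, the Foo.FD / Foo.TF split might look like this (illustrative names only; two separate files): the same class written against fundeps and against type families, which is exactly why the two versions can't share one interface.

    -- File 1: the fundep version.
    {-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies #-}
    module Container.FD where

    class Container c e | c -> e where
        empty  :: c
        insert :: e -> c -> c

    -- File 2: the type-family version.
    {-# LANGUAGE TypeFamilies #-}
    module Container.TF where

    class Container c where
        type Elem c
        empty  :: c
        insert :: Elem c -> c -> c

Giving the two forks distinct module names at least lets a client depend on both without a clash, whereas punning on a single module name forces the choice at the package level.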

--
Live well,
~wren

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
