On 4/23/12 11:39 AM, Gregg Lebovitz wrote:
On 04/23/2012 12:03 AM, wren ng thornton wrote:
However, until better technical support is implemented (not just for
GHC, but also jhc, UHC,...) it's best to follow social practice.

Wren, I am new to Haskell and not aware of all of the conventions. Is
there a place where I can find information on these social practices?
Are they documented some place?

Not that I know of, though they're fairly standard for any open-source programming community. E.g., when it comes to module names: familiarize yourself with what's out there; try to fit in with the patterns you see[1]; don't intentionally clash, steal namespaces[2], or squat on valuable territory[3]; be reasonable and conscientious when interacting with people.


[1] e.g., the use of Data.* for data structures which are predominantly/universally treated as such, vs the use of Control.* for things which are often thought of as control structures (monads, etc). The use of Foo.Bar.Strict and Foo.Bar.Lazy when you provide both strict and lazy versions of some whole API, usually with Foo.Bar re-exporting whichever one seems the sensible default. The use of Foo.Bar.Class to resolve circular import issues when defining a class and a bunch of datatypes with instances. Etc.

[2] I mean things like if some package is providing a bunch of Foo.Bar.* modules, and it's the only one doing so, then you should try to get in touch with the maintainer before you start publishing your own Foo.Bar.* modules--- in order to collaborate, to send patches up-stream, or just to let them know what's going on.

[3] Witness an unintentional breach of this myself a while back. When I was hacking up the exact-combinatorics package for my own use, I put things in Math.Combinatorics.* since that's a reasonable place and wasn't in use; but I didn't think of that fact when I decided to publish the code. When this was pointed out, I promptly moved everything to Math.Combinatorics.Exact.* since that project is only interested in exact combinatorics and I have no intention of codifying all of combinatoric theory; hence using Math.Combinatorics.* would be squatting on very valuable names.


However, centralization is prone to bottlenecks and systemic failure.
As such, while it would be nice to ensure that a given module is
provided by only one package, there is no mechanism in place to
enforce this (except at compile time for the code that links the
conflicting modules together).

From someone new to the community, it seems that yes centralization has
its issues, but it also seems that practices could be put in place that
minimize the bottlenecks and systemic failures.

Unless I greatly misunderstand the challenges, there seem to be a lot of
ways to approach this problem and none of them are new. We all use
systems that are composed of many modules neatly combined into complete
systems. Linux distributions do this well. So does Java. Maybe we should
borrow from their experiences and think about how we put packages
together and what mechanisms we need to resolve inter-package dependencies.

Java attempts to resolve the issue by imposing universal authority (use reverse urls for the first part of your package name). Many Java developers flagrantly ignore that claim to authority. Sun/Oracle has no interest in actually policing these violations, and there's no central repository for leveraging social pressure to do it. Moreover, open-source developers who do not have a commercial/institutional affiliation are specifically placed in a tough spot, and are elided from public discourse because of that fact, which is extremely problematic on too many levels to get into here. Furthermore, many developers ---especially among open-source and academic authors--- have an inherent distrust for ambient authority like this.

To pick another similar namespacing issue, consider the problem of Google Code. In Google Code there's a single namespace for projects, and the Google team spends a lot of effort on maintaining that namespace and resolving conflicts. (I know folks who've worked in the lab next door to that team. So, yes, they do spend a lot of work on it.) Whereas if you consider BitBucket or GitHub, each user is given a separate project namespace, and therefore the only thing that has to be maintained is the user namespace--- which has to be done anyways in order to deal with logins. The models of Google Code, SourceForge, and Java all assume that projects and repositories are scarce resources. Back in the day that may have been true (or may not), but today it is clearly false. Repos are cheap and everyone has a dozen side projects.

If you look at the case of Perl and CPAN, there's the same old story: universal authority. Contrary to Java, CPAN does very much actively police (or rather, vet) the namespace. However, this extreme level of policing requires a great deal of work and serves to drive away a great many developers from publishing their code on CPAN.

I'm not as familiar with the innards of how various Linux distros manage things, but they're also tasked with the additional burden of needing to pull in stuff from places like CPAN, Hackage, etc. Because of that, their namespace situation seems quite different from that of Hackage or CPAN on their own. I do know that Debian at least (and presumably the others as well) devotes a great deal of manpower to all this.

So we have (1) the Java model where there are rules that no one follows; (2) the Google Code, CPAN, and Linux distro model of devoting a great deal of community resources to maintaining the rules; and (3) the BitBucket, GitHub, Hackage model of having few institutionalized rules and leaving it to social factors. The first option buys us nothing over the last, excepting a false sense of security and the ability to alienate private open-source developers.

The second option does arguably give us something, but it's extremely expensive. I don't know if you've been involved in the administrative side of that, but if not then it is far more expensive than you realize. I've worked with CPAN, and many of the folks on this list do packaging for Debian, Arch, and other Linux distros, so we're familiar with what it means to ask for a universal authority. The Perl and Linux distro communities are *huge* and so they can actually afford the cost of setting up this authority, but even they run into limitations of scale. Considering how much difficulty we've had getting someone to officially take over Hackage so that we can finally get to using hackage2, it's fair to say that Haskell has nowhere near a large enough community to sustain the kind of work it would take to police the namespace.

There is no technical solution to this problem, at least not any used by the communities you cite. The only solutions on offer require a great deal of human effort, which is always a social/political/economic matter. The only technical avenues I see are ways of making the problem less problematic, such as GitHub and BitBucket distinguishing the user namespace from each user's project namespace, such as the -XPackageImports extension (which is essentially the same as GitHub/BitBucket), or such as various ideas about using tree-grafting to rearrange the module namespace on a per-project basis thereby allowing clients to resolve the conflicts rather than requiring a global solution. I'm quite interested in that last one, though I don't have any time for it in the foreseeable future.
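
For anyone who hasn't seen -XPackageImports in use, here is a minimal sketch of what a package-qualified import looks like. The "containers" package and Data.Map are used only because they ship with GHC; they stand in for whichever package and module actually clash in your build. The names `names` and `lookupName` are hypothetical, made up for the illustration.

```haskell
{-# LANGUAGE PackageImports #-}
module Main (main) where

-- The string literal names the package that must supply the module, so two
-- installed packages exporting the same module name can be told apart.
-- "containers" is an assumption for this sketch; substitute the package
-- that is actually involved in the clash.
import qualified "containers" Data.Map as Map

-- Hypothetical sample data, for illustration only.
names :: Map.Map Int String
names = Map.fromList [(1, "one"), (2, "two")]

-- Look a key up in the sample map.
lookupName :: Int -> Maybe String
lookupName key = Map.lookup key names

main :: IO ()
main = print (lookupName 1)
```

Note that this only lets the importing client choose between packages at the use site. It does nothing to stop two packages from claiming the same module name in the first place, which is why it makes the problem less problematic rather than solving it.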

--
Live well,
~wren
