Re: [Haskell-cafe] Correspondence between libraries and modules

wren ng thornton Tue, 24 Apr 2012 21:35:11 -0700

On 4/23/12 11:39 AM, Gregg Lebovitz wrote:

> On 04/23/2012 12:03 AM, wren ng thornton wrote:

>> However, until better technical support is implemented (not just for
>> GHC, but also jhc, UHC, ...) it's best to follow social practice.


> Wren, I am new to Haskell and not aware of all of the conventions. Is
> there a place where I can find information on these social practices?
> Are they documented some place?

Not that I know of, though they're fairly standard for any open-source programming community. E.g., when it comes to module names: familiarize yourself with what's out there; try to fit in with the patterns you see[1]; don't intentionally clash, steal namespaces[2], or squat on valuable territory[3]; be reasonable and conscientious when interacting with people.

[1] E.g., the use of Data.* for data structures which are predominantly/universally treated as such, vs the use of Control.* for things which are often thought of as control structures (monads, etc.). The use of Foo.Bar.Strict and Foo.Bar.Lazy when you provide both strict and lazy versions of some whole API, usually with Foo.Bar re-exporting whichever one seems the sensible default. The use of Foo.Bar.Class to resolve circular import issues when defining a class and a bunch of datatypes with instances. Etc.

[2] I mean things like: if some package is providing a bunch of Foo.Bar.* modules, and it's the only one doing so, then you should try to get in touch with the maintainer before you start publishing your own Foo.Bar.* modules, in order to collaborate, to send patches upstream, or just to let them know what's going on.

[3] Witness an unintentional breach of this by me a while back. When I was hacking up the exact-combinatorics package for my own use, I put things in Math.Combinatorics.* since that's a reasonable place and wasn't in use; but I didn't think about that choice when I decided to publish the code. When it was pointed out, I promptly moved everything to Math.Combinatorics.Exact.*, since that project is only interested in exact combinatorics and I have no intention of codifying all of combinatorial theory; hence using Math.Combinatorics.* would be squatting on very valuable names.
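The Strict/Lazy convention from footnote [1] can be seen in the containers package, which ships Data.Map.Strict and Data.Map.Lazy with the same Map type and function names, while plain Data.Map re-exports the lazy API as the default. A minimal sketch, assuming containers is installed (it ships with GHC); countLazy and countStrict are hypothetical helpers for illustration, not part of any library.

```haskell
-- Both modules export the same names, so clients import them qualified
-- and the module name alone says which evaluation strategy is in use.
import qualified Data.Map.Lazy as ML
import qualified Data.Map.Strict as MS

-- Hypothetical helper: word counts, with combined values left as thunks.
countLazy :: [String] -> ML.Map String Int
countLazy = foldr (\w m -> ML.insertWith (+) w 1 m) ML.empty

-- Hypothetical helper: same API, but insertWith forces each value to WHNF.
countStrict :: [String] -> MS.Map String Int
countStrict = foldr (\w m -> MS.insertWith (+) w 1 m) MS.empty

main :: IO ()
main = do
  let ws = words "a b a c b a"
  print (ML.toList (countLazy ws))
  print (MS.toList (countStrict ws))
```

Switching a client from one flavour to the other is then just a change of import line, which is the point of keeping the two APIs name-compatible.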

>> However, centralization is prone to bottlenecks and systemic failure.
>> As such, while it would be nice to ensure that a given module is
>> provided by only one package, there is no mechanism in place to
>> enforce this (except at compile time for the code that links the
>> conflicting modules together).


> From someone new to the community, it seems that yes, centralization has
> its issues, but it also seems that practices could be put in place that
> minimize the bottlenecks and systemic failures.

> Unless I greatly misunderstand the challenges, there seem to be a lot of
> ways to approach this problem and none of them are new. We all use
> systems that are composed of many modules neatly combined into complete
> systems. Linux distributions do this well. So does Java. Maybe we should
> borrow from their experiences and think about how we put packages
> together and what mechanisms we need to resolve inter-package dependencies.

Java attempts to resolve the issue by imposing universal authority (use reversed domain names, such as com.example, for the first part of your package name). Many Java developers flagrantly ignore that claim to authority. Sun/Oracle has no interest in actually policing these violations, and there's no central repository for leveraging social pressure to do it. Moreover, open-source developers who do not have a commercial/institutional affiliation are specifically placed in a tough spot, and are elided from public discourse because of that fact, which is extremely problematic on too many levels to get into here. Furthermore, many developers, especially among open-source and academic authors, have an inherent distrust of ambient authority like this.

To pick another similar namespacing issue, consider the problem of Google Code. In Google Code there's a single namespace for projects, and the Google team spends a lot of effort on maintaining that namespace and resolving conflicts. (I know folks who've worked in the lab next door to that team. So, yes, they do spend a lot of work on it.) Whereas if you consider BitBucket or GitHub, each user is given a separate project namespace, and therefore the only thing that has to be maintained is the user namespace, which has to be done anyway in order to deal with logins. The models of Google Code, SourceForge, and Java all assume that projects and repositories are scarce resources. Back in the day that may have been true (or may not), but today it is clearly false. Repos are cheap and everyone has a dozen side projects.

If you look at the case of Perl and CPAN, it's the same old story: universal authority. Contrary to Java, CPAN does very much actively police (or rather, vet) the namespace. However, this extreme level of policing requires a great deal of work and serves to drive away a great many developers from publishing their code on CPAN.

I'm not as familiar with the innards of how various Linux distros manage things, but they're also tasked with the additional burden of needing to pull in stuff from places like CPAN, Hackage, etc. Because of that, their namespace situation seems quite different from that of Hackage or CPAN on their own. I do know that Debian at least (and presumably the others as well) devotes a great deal of manpower to all this.

So we have (1) the Java model, where there are rules that no one follows; (2) the Google Code, CPAN, and Linux distro model of devoting a great deal of community resources to maintaining the rules; and (3) the BitBucket, GitHub, Hackage model of having few institutionalized rules and leaving it to social factors. The first option buys us nothing over the last, except a false sense of security and the ability to alienate private open-source developers.

The second option does arguably give us something, but it's extremely expensive. I don't know if you've been involved in the administrative side of that, but if not, then it is far more expensive than you realize. I've worked with CPAN, and many of the folks on this list do packaging for Debian, Arch, and other Linux distros, so we're familiar with what it means to ask for a universal authority. The Perl and Linux distro communities are *huge* and so they can actually afford the cost of setting up this authority, but even they run into limitations of scale. Considering how much difficulty we've had getting someone to officially take over Hackage so that we can finally get to using hackage2, it's fair to say that Haskell has nowhere near a large enough community to sustain the kind of work it would take to police the namespace.

There is no technical solution to this problem, at least not any used by the communities you cite. The only solutions on offer require a great deal of human effort, which is always a social/political/economic matter. The only technical avenues I see are ways of making the problem less problematic: GitHub and BitBucket distinguishing the user namespace from each user's project namespace; the -XPackageImports extension (which is essentially the same as GitHub/BitBucket); or various ideas about using tree-grafting to rearrange the module namespace on a per-project basis, thereby allowing clients to resolve the conflicts rather than requiring a global solution. I'm quite interested in that last one, though I don't have any time for it in the foreseeable future.
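The -XPackageImports extension mentioned above lets an import declaration name the package that must supply a module, much as a GitHub path names the user before the project. A minimal sketch, assuming a standard GHC install where the containers package is visible; if two installed packages both exposed a module called Data.Map, the quoted package name would pick one of them.

```haskell
{-# LANGUAGE PackageImports #-}
-- The string literal names the package in which GHC looks for Data.Map,
-- so a same-named module from some other package cannot be picked up.
import qualified "containers" Data.Map as M

main :: IO ()
main = print (M.toList (M.fromList [(2 :: Int, "two"), (1, "one")]))
```

This resolves a clash only at the use site: each client still has to know which package it wants, which is why it eases the problem rather than removing the need for social conventions.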


--
Live well,
~wren

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
