On 12/17/2016 02:34 PM, Chris Wright wrote:
Just looking at this again:

The obvious workaround to the problem that dependencies must be module-
level is to simply define many small modules---in the extreme, one per
declaration.

Andrei works in phobos a lot. Phobos has a lot of large modules. For
instance, std.datetime is 35,000 lines. It's not unusual for a phobos
module to have over 6,000 lines (std.math, std.typecons, std.traits,
std.format, std.conv).

Let's take a look at that hypothesis. The example I chose randomly (and which turned out to be a rat's nest of fuzzy dependencies) was std/array.d, clocking in at 3585 lines. Then looking at the entire project:

wc -l std/*.d std/{algorithm,container,digest,experimental,internal,net,range,regex}/**/*.d | sort --key=1 -n | cat -n

This outputs the modules in the standard library (excluding those that are simple header translations), sorted by LoC and numbered. See the result at http://paste.ofcode.org/Lc5xfcs8GqpT2cabApSSgk . That shows 137 modules, median length 903, average length 2055 --- including full documentation, unittests, and examples. These numbers seem quite reasonable and if anything compare favorably against other projects I've been on.
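
For anyone who wants to recompute the count, median, and average rather than read them off the paste, here is a minimal sketch. The function name loc_stats is made up for illustration; it assumes wc -l style input ("<count> <path>" per line) and skips the trailing "total" row that wc prints when given several files. It is not necessarily how the figures above were obtained.

```shell
# Hypothetical helper: summarize `wc -l` output read from stdin.
# Prints the number of files, the median line count, and the mean line count.
loc_stats() {
    # Drop the summary row, order the counts numerically, then aggregate.
    grep -v ' total$' | sort -n | awk '
        { n[NR] = $1; sum += $1 }
        END {
            if (NR == 0) { print "no input"; exit 1 }
            # Odd count: middle element. Even count: mean of the two middle elements.
            median = (NR % 2) ? n[(NR + 1) / 2] : (n[NR / 2] + n[NR / 2 + 1]) / 2
            # %d truncates any fractional part.
            printf "modules=%d median=%d mean=%d\n", NR, median, sum / NR
        }'
}
```

Usage would be along the lines of "wc -l std/*.d | loc_stats", run from the Phobos source root.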

I'd normally recommend breaking up modules at one fifth that size.

Yeah, std/datetime.d is a monster, from what I can tell owing to a rote and redundant way of handling unittesting. I didn't look at its dependencies, but I doubt they are special. I was quite vocal about breaking it up, but I got mellower with time since (a) someone measured its size without unittests and it was something like one order of magnitude smaller, and (b) there was really no more trouble using or maintaining it than with anything else in Phobos.

I should also add that each large project has a couple of outliers like that. I even recall a switch of a couple thousand lines once :o).

The standard library benefits from low granularity modules. It needs to
implement a variety of related tools for working with particular things.

For the hunting-for-definitions case, you also need:

* a module with more than a few imports, from different libraries or
packages
* ambiguous names, or functions that are widely used
* the user can't use an IDE / ctags / dcd
* the user can't use ddox / dpldocs.info, which turns type references
into links; or the user is using that and needs to find the definition of
a template constraint
* the maintainer cannot use selective imports
* the maintainer cannot break the module up to reduce the number of
dependencies
* the maintainer is willing to spend the effort to convert top-level
imports into tightly scoped imports

For the compilation-speed case, you need:

* large dependencies that this allows you to skip (the module combines
several types of functionality with different dependencies)
* the imported module must be in another compilation unit (incremental
compilation or a separate library)
* the dependencies can't be used by any other module in the compilation
unit
* no selective imports
* the module being compiled depends on something in the same scope

That's a pretty marginal use case.

Most of these have been the case with all C++ and D projects I've been involved with at Facebook.

Please let me know which parts of this information I should include in the DIP to make it better. Thanks.


Andrei
