On Fri, Mar 18, 2022 at 11:16:51AM -0700, Ali Çehreli via Digitalmars-d-learn wrote:
> tldr; I am talking on a soap box with a big question mark hovering over my head: Why can't I accept pulling in dependencies automatically?
Because it's a bad idea for your code to depend on some external resource owned by some anonymous personality somewhere out there on the 'Net that isn't under your control.

> On 3/18/22 07:48, H. S. Teoh wrote:
>
> > As a package manager, dub is OK, it does its job.
>
> As a long-time part of the D community, I am ashamed to admit that I don't use dub. I am ashamed because there is no particular reason, or my reasons may not be rational.

I have only used dub once -- for an experimental vibe.d project -- and that only using a dummy empty project the sole purpose of which was to pull in vibe.d (the real code is compiled by a different build system). And I'm not even ashamed to admit it. :-P

> > As a build system
>
> I have seen and used a number of build systems that were started after make's shortcomings and they ended up with their own shortcomings. Some of them were actually program code that teams would write to build their system. As in steps "compile these, then do these". What? My mind must have been tainted by the beauty of make that writing build steps in a build tool strikes me as unbelievable... But it happened. I don't remember its name but it was in Python. You would modify Python code to build your programs. (?)

Maybe you're referring to SCons? I love SCons... not because it's Python, but because it's mostly declarative (the Python calls don't actually build anything immediately -- they register build actions with the build engine, which are executed later by an opaque scheduler). The procedural part is really for things like creating lists of files and such (though for the most common tasks there are already basically-declarative functions available for use), or for those occasions where the system simply doesn't have the means to express what you want to do, and you need to invent your own build recipe and plug it in.

> I am well aware of make's many shortcomings but love its declarative style where things happen automatically. That's one smart program there. A colleague loves Bazel and is playing with it. Fingers crossed...

Make in its most basic incarnation was on the right path. What came after, however, was a gigantic mess. The macro system, for example, leads to spaghetti code of the C #ifdef-hell kind. Just look at dmd/druntime/phobos' makefiles sometime, and see if you can figure out what exactly they're trying to do, and how.

There are also implementation issues, the worst of which is non-reproducibility: running `make` after making some changes has ZERO guarantees about the consistency of what happens afterwards. It *may* just work, or it may silently link in stale binaries from previous builds that replace some symbols with obsolete versions, leading to heisenbugs that exist in your executable but do not exist in your code. (I'm not making this up; I have seen this with my own eyes in my day job on multiple occasions.)

The usual bludgeon-solution to this is `make clean; make`, which defeats the whole purpose of having a build system in the first place (just write a shell script to recompile everything from scratch, every time). Not to mention that `clean` isn't a built-in rule, and I've encountered far too many projects where `make clean` doesn't *really* clean everything thoroughly. Lately I've been resorting to `git clean -dfx` as a nuke-an-ant solution to this persistent problem. (Warning: do NOT run the above git command unless you know what you're doing. :-P)
> > I much rather prefer Adam's arsd libs[1], where you can literally just copy the module into your own workspace (they are almost all standalone single-file modules
>
> That sounds great but aren't there common needs of those modules to share code from common modules?

Yes and no. The dependencies aren't zero, to be sure. But Adam also doesn't take code reuse to the extreme, in that if some utility function can be written in 2-3 lines, there's really no harm repeating it across modules. Introducing a new module just to reuse 2-3 lines of code is the kind of emperor's-clothes philosophy that leads to Dependency Hell. Unfortunately, since the late 70's/early 80's, code reuse has become the sacred cow of computer science curriculums, and just about everybody has been so indoctrinated that they would not dare copy-n-paste a 2-3 line function for fear that the Reuse Cops would come knocking on their door at night.
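To put the copy-a-module thing in concrete terms, this is roughly what the workflow looks like (a sketch from memory -- the exact arsd.dom API may differ slightly, and dom.d may want characterencodings.d sitting next to it for some of its features):

    // app.d -- depends on exactly one file, arsd/dom.d, copied into
    // the project tree. No registry, no network, no recursive
    // downloads.
    import arsd.dom;

    void main()
    {
        // Parse a fragment and poke at it, just to prove the copied
        // module actually works.
        auto doc = new Document(`<html><body><p id="x">hi</p></body></html>`);
        assert(doc.requireElementById("x").innerText == "hi");
    }

    // Build it directly; the copied module is just another source file:
    //   dmd app.d arsd/dom.d

If upstream ever disappears off the face of the 'Net, the file is still sitting right there in your tree, and you can still read it and fix it.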
> It is ironic that packages being as small as possible reduces the chance of dependencies of those modules and at the same time it increases the total number of dependencies.

IMNSHO, when the global dependency graph becomes non-trivial (e.g., NP-complete Dependency Hell), that's a sign that you've partitioned your code wrong. Dependencies should be simple, i.e., more-or-less like a tree, without diamond dependencies or conflicting dependencies of the kind that makes resolving dependencies NP-complete. The one-module-per-dependency thing about Adam's arsd is an ideal that isn't always attainable. But the point is that one ought to strive in the direction of fewer recursive dependencies rather than more. When importing a single Go or Python module triggers the recursive installation of 50+ modules, 45 of which I've no idea why they're needed, that's a sign that something has gone horribly, horribly wrong with the whole thing; we're losing sight of the forest for the trees. That way be NP-complete dragons.

> > The dependency graph of a project should not be more than 2 levels deep (preferably just 1).
>
> I am fortunate that my programs are command line tools and libraries that so far depended only on system libraries. The only outside dependency is cmake-d to plug into our build system. (I don't understand or agree with all of cmake-d but things are in an acceptable balance at the moment.) The only system tool I lately started using is ssh. (It's a topic for another time but my program copies itself to the remote host over ssh to work as a pair of client and server.)

I live and breathe ssh. :-D I cannot imagine getting anything done at all without ssh. Incidentally, this is why I prefer a vim-compatible programming environment over some heavy-weight IDE any day. Running an IDE over ssh is out of the question.

> > You shouldn't have to download half the world
>
> The first time I learned about pulling in dependencies terrified me.

This is far from the first time I encountered this concept, and it *still* terrifies me. :-D

> (This is the part I realize I am very different from most other programmers.)

I love being different! ;-)

> I am still terrified that my dependency system will pull in a tree of code that I have no idea what it is doing. Has it been modified to be malicious overnight? I thought it was possible. The following story is an example of what I was exactly terrified about:
>
> https://medium.com/hackernoon/im-harvesting-credit-card-numbers-and-passwords-from-your-site-here-s-how-9a8cb347c5b5

EXACTLY!!! This is the sort of thing that gives nightmares to people working in network security. Cf. also the Ken Thompson compiler hack.

> Despite such risks many projects just pull in code. (?) What am I missing?

IMNSHO, it's because of the indoctrination of code reuse. "Why write code when you can reuse something somebody else has already written?" Sounds good, but there are a lot of unintended consequences:

1) You become dependent on code of unknown provenance written by authors of unknown motivation; how do you know you aren't pulling in malicious code? (Review the code, you say? Ha! If you were that diligent, you'd have written the code yourself in the first place. Not likely.) This problem gets compounded with every recursive dependency (it's perhaps imaginable if you carefully reviewed library L before using it -- but L depends on 5 other libraries, each of which in turn depends on 8 others, ad nauseam. Are you seriously going to review ALL of them?)

2) You become dependent on an external resource, the availability of which may not be under your control. E.g., what happens if you're on the road without an internet connection, your local cache has expired, and you really *really* need to recompile something? Or what if, one day, the server on which this dependency was hosted suddenly upped and vanished into the ether? Don't tell me "but it's hosted on XYZ network run by Reputable Company ABC, they'll make sure their servers never go down!" -- try saying that 10 years later when you suddenly really badly need to recompile your old code. Oops, it doesn't compile anymore, because a critical dependency doesn't exist anymore and nobody has a copy of the last ancient version the code compiled with.

3) The external resource is liable to change any time, without notice (the authors don't even know you exist, let alone who you are and why changing some API will seriously break your code). Wake up the day of your important release, and suddenly your project doesn't compile anymore 'cos upstream committed an incompatible change. Try explaining that one to your irate customers. :-P

> I heard about a team at a very high-profile company actually reviewing such dependencies before accepting them to the code base. But reviewing them only at acceptance time! Once the dependency is accepted, the projects would automatically pull in all unreviewed changes and run potentially malicious code on your computer.

Worse yet, at review time library L depended on external packages X, Y, Z. Let's grant that X, Y, Z were reviewed as well (giving the benefit of the doubt here). But are the reviewers seriously going to continue reviewing X, Y, Z on an ongoing basis? Perhaps X, Y, Z depended upon P, Q, R as well; is *anyone* who uses L going to even notice when R's maintainer turns rogue and commits some nasty backdoor into his code?

> I am still trying to understand where I went wrong. I simply cannot understand this. (I want to believe they changed their policy and they don't pull in automatically anymore.)

If said company is anything like the bureaucratic nightmare I have to deal with every day, I'd bet that nobody cares about this 'cos it's not their department. Such menial tasks are owned by the department of ItDoesntGetDone, and nobody ever knows what goes on there -- we're just glad they haven't bothered us about show-stopping security flaws yet. ;-)
> When I (had to) use Go for a year about 4 years ago, it was the same: The project failed to build one morning because there was an API change on one of the dependencies. O... K... They fixed it in a couple of hours but still... Yes, the project should probably have depended on a particular version but then weren't we interested in bug fixes or added functionality? Why should we have decided to hold on to version 1.2.3 instead of 1.3.4? Should teams follow their many dependencies before updating? Maybe that's the part I am missing...

See, this is the fundamental problem I have with today's philosophy of "put it all in `the cloud', that's the hip thing to do". I do *not* trust that code from some external server somewhere out there isn't going to just vanish into the ether suddenly, or keel over and die the day after, or just plain get hacked (very common these days) and have trojan code inserted into the resource I depend on. Or the server just plain becomes unreachable because I'm on the road, or my ISP is acting up (again), or the network it's on got sanctioned overnight and now I'm breaking the law just by downloading it.

I also do *not* trust that upstream isn't going to commit some incompatible change that will fundamentally break my code in ways that are very costly to fix. I mean, they have every right to do so; why should they stop just because some anonymous user out here depended on their code?

I want to *manually* verify every upgrade to make sure that it hasn't broken anything, before I commit to the next version of the dependency. AND I want to keep a copy of the last working version on my *local hard drive* until I'm 100% sure I don't need it anymore. I do NOT trust some automated package manager to do this for me correctly (I mean, software can't possibly ever fail, right?).
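Not that I use dub much, but to be fair to it, it *can* be told to hold still. Here's a rough sketch of a single-file package with the dependency held to a known-good series (the package name and version number are made up for illustration; adjust to whatever you actually depend on):

    /+ dub.sdl:
        name "myapp"
        dependency "vibe-d" version="~>0.9.4"
    +/
    // Run with:  dub run --single app.d
    //
    // "~>0.9.4" means >=0.9.4 and <0.10.0, so a surprise rewrite of the
    // library can't sneak into tomorrow's build.  In a regular dub
    // project you can go further and commit dub.selections.json, which
    // records the exact resolved version of every package in the tree;
    // upgrades then happen only when *you* run `dub upgrade` and have
    // had a chance to review what changed.
    import vibe.core.log;

    void main()
    {
        logInfo("built against a pinned dependency series");
    }

Yes, this still downloads half the world the first time -- but at least it's the *same* half of the world every time, which is the part the Go story above got bitten by.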
And you know what? I've worked with some projects that have lasted for over a decade or two, and on that time scale, the oft-touted advantages of code reuse have taken on a whole new perspective that people these days don't often think about. I've seen several times how, as time goes on, external dependencies become more of a liability than an asset. In the short term, yeah, it lets you get off the ground faster, saves you the effort of reinventing the wheel, blah blah blah. In the long term, however, these advantages don't seem so advantageous anymore:

- You don't *really* understand the code you depend on, which means if upstream moves in an incompatible direction, or just plain abandons the project (the older the project the more likely this happens), you would not have the know-how to replicate the original functionality required by your own code.

- Sometimes the upstream breakage is a subtle one -- it works most of the time, but in this one setting with this one particular customer the behaviour changed. Now your customer is angry and you don't have the know-how to fix it (and upstream isn't going to do it 'cos the old behaviour was a bug).

- You may end up with an irreplaceable dependency on abandoned old code, but since it isn't your code you don't have the know-how to maintain it (e.g., fix bugs, security holes, etc.). This can mean you're stuck with a security flaw that will be very expensive to fix.

- Upstream may not have broken anything, but the performance characteristics may have changed (for the worse). I'm not making this up -- I've seen an actual project where compiling with the newer library caused a 2x reduction in runtime performance. Many months later it was somewhat improved, but still inferior to the original *unoptimized* library. And complaining upstream didn't help -- they insisted their code wasn't intended to be used this way, so the performance issues are the user's fault, not theirs.

- Upstream licensing terms may change, leaving you stuck up the creek without a paddle.

Writing the code yourself may have required more up-front investment (and provoked the ire of the Code Reuse police), but you have the advantage that you own the code, have a copy of it always available, won't have licensing troubles, and understand the code well enough to maintain it over the long term. You become independent of network availability, immune to outages and unwanted breaking changes.

The code reuse emperor has no clothes, but his cronies brand me as heretic scum worthy only to be spat out like a gnat. Such is life. ;-)

> Thanks for listening... Boo hoo... Why am I like this? :)
[...]

'cos you're the smart one. ;-) Most people don't even think about these issues, and then years later it comes back and bites them in the behind.


T

--
It is of the new things that men tire --- of fashions and proposals and improvements and change. It is the old things that startle and intoxicate. It is the old things that are young. -- G.K. Chesterton