On Fri, Mar 18, 2022 at 11:16:51AM -0700, Ali Çehreli via Digitalmars-d-learn wrote:
> tldr; I am talking on a soap box with a big question mark hovering over my head: Why can't I accept pulling in dependencies automatically?
Because it's a bad idea for your code to depend on some external resource owned by some anonymous personality somewhere out there on the 'Net that isn't under your control.

> On 3/18/22 07:48, H. S. Teoh wrote:
>
> > As a package manager, dub is OK, it does its job.
>
> As a long-time part of the D community, I am ashamed to admit that I don't use dub. I am ashamed because there is no particular reason, or my reasons may not be rational.

I have only used dub once -- for an experimental vibe.d project -- and that only using a dummy empty project the sole purpose of which was to pull in vibe.d (the real code is compiled by a different build system). And I'm not even ashamed to admit it. :-P

> > As a build system
>
> I have seen and used a number of build systems that were started after make's shortcomings and they ended up with their own shortcomings. Some of them were actually program code that teams would write to build their system. As in steps "compile these, then do these". What? My mind must have been tainted by the beauty of make that writing build steps in a build tool strikes me as unbelievable... But it happened. I don't remember its name but it was in Python. You would modify Python code to build your programs. (?)

Maybe you're referring to SCons? I love SCons... not because it's Python, but because it's mostly declarative (the Python calls don't actually build anything immediately -- they register build actions with the build engine, which are executed later by an opaque scheduler). The procedural part is really for things like creating lists of files and such (though for the most common tasks there are already basically-declarative functions available for use), or for those occasions where the system simply doesn't have the means to express what you want to do, and you need to invent your own build recipe and plug it in.

> I am well aware of make's many shortcomings but love its declarative style where things happen automatically. That's one smart program there. A colleague loves Bazel and is playing with it. Fingers crossed...

Make in its most basic incarnation was on the right path. What came after, however, was a gigantic mess. The macro system, for example, leads to spaghetti code of the C #ifdef-hell kind. Just look at dmd/druntime/phobos' makefiles sometime, and see if you can figure out what exactly they're trying to do, and how.

There are also implementation issues, the worst of which is non-reproducibility: running `make` after making some changes has ZERO guarantees about the consistency of what happens afterwards. It *may* just work, or it may silently link in stale binaries from previous builds that replace some symbols with obsolete versions, leading to heisenbugs that exist in your executable but do not exist in your code. (I'm not making this up; I have seen this with my own eyes in my day job on multiple occasions.)

The usual bludgeon-solution to this is `make clean; make`, which defeats the whole purpose of having a build system in the first place (just write a shell script to recompile everything from scratch, every time). Not to mention that `clean` isn't a built-in rule, and I've encountered far too many projects where `make clean` doesn't *really* clean everything thoroughly. Lately I've been resorting to `git clean -dfx` as a nuke-an-ant solution to this persistent problem. (Warning: do NOT run the above git command unless you know what you're doing. :-P)
> > I much rather prefer Adam's arsd libs[1], where you can literally just copy the module into your own workspace (they are almost all standalone single-file modules
>
> That sounds great but aren't there common needs of those modules to share code from common modules?

Yes and no. The dependencies aren't zero, to be sure. But Adam also doesn't take code reuse to the extreme, in that if some utility function can be written in 2-3 lines, there's really no harm repeating it across modules. Introducing a new module just to reuse 2-3 lines of code is the kind of emperor's-clothes philosophy that leads to Dependency Hell. Unfortunately, since the late 70's/early 80's, code reuse has become the sacred cow of computer science curriculums, and just about everybody has been so indoctrinated that they would not dare copy-n-paste a 2-3 line function for fear that the Reuse Cops would come knocking on their door at night.
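To put the copy-a-module thing in concrete terms, this is roughly what the workflow looks like (a sketch from memory -- the exact arsd.dom API may differ slightly, and dom.d may want characterencodings.d sitting next to it for some of its features):

    // app.d -- depends on exactly one file, arsd/dom.d, copied into
    // the project tree. No registry, no network, no recursive
    // downloads.
    import arsd.dom;

    void main()
    {
        // Parse a fragment and poke at it, just to prove the copied
        // module actually works.
        auto doc = new Document(`<html><body><p id="x">hi</p></body></html>`);
        assert(doc.requireElementById("x").innerText == "hi");
    }

    // Build it directly; the copied module is just another source file:
    //   dmd app.d arsd/dom.d

If upstream ever disappears off the face of the 'Net, the file is still sitting right there in your tree, and you can still read it and fix it.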
> It is ironic that packages being as small as possible reduces the chance of dependencies of those modules and at the same time it increases the total number of dependencies.

IMNSHO, when the global dependency graph becomes non-trivial (e.g., NP-complete Dependency Hell), that's a sign that you've partitioned your code wrong. Dependencies should be simple, i.e., more-or-less like a tree, without diamond dependencies or conflicting dependencies of the kind that makes resolving dependencies NP-complete. The one-module-per-dependency thing about Adam's arsd is an ideal that isn't always attainable. But the point is that one ought to strive in the direction of fewer recursive dependencies rather than more. When importing a single Go or Python module triggers the recursive installation of 50+ modules, 45 of which I've no idea why they're needed, that's a sign that something has gone horribly, horribly wrong with the whole thing; we're losing sight of the forest for the trees. That way be NP-complete dragons.

> > The dependency graph of a project should not be more than 2 levels deep (preferably just 1).
>
> I am fortunate that my programs are command line tools and libraries that so far depended only on system libraries. The only outside dependency is cmake-d to plug into our build system. (I don't understand or agree with all of cmake-d but things are in an acceptable balance at the moment.) The only system tool I lately started using is ssh. (It's a topic for another time but my program copies itself to the remote host over ssh to work as a pair of client and server.)

I live and breathe ssh. :-D I cannot imagine getting anything done at all without ssh. Incidentally, this is why I prefer a vim-compatible programming environment over some heavy-weight IDE any day. Running an IDE over ssh is out of the question.

> > You shouldn't have to download half the world
>
> The first time I learned about pulling in dependencies terrified me.

This is far from the first time I encountered this concept, and it *still* terrifies me. :-D

> (This is the part I realize I am very different from most other programmers.)

I love being different! ;-)

> I am still terrified that my dependency system will pull in a tree of code that I have no idea what it is doing. Has it been modified to be malicious overnight? I thought it was possible. The following story is an example of what I was exactly terrified about:
>
> https://medium.com/hackernoon/im-harvesting-credit-card-numbers-and-passwords-from-your-site-here-s-how-9a8cb347c5b5

EXACTLY!!! This is the sort of thing that gives nightmares to people working in network security. Cf. also the Ken Thompson compiler hack.

> Despite such risks many projects just pull in code. (?) What am I missing?

IMNSHO, it's because of the indoctrination of code reuse. "Why write code when you can reuse something somebody else has already written?" Sounds good, but there are a lot of unintended consequences:

1) You become dependent on code of unknown provenance written by authors of unknown motivation; how do you know you aren't pulling in malicious code? (Review the code, you say? Ha! If you were that diligent, you'd have written the code yourself in the first place. Not likely.) This problem gets compounded with every recursive dependency (it's perhaps imaginable if you carefully reviewed library L before using it -- but L depends on 5 other libraries, each of which in turn depends on 8 others, ad nauseam. Are you seriously going to review ALL of them?)

2) You become dependent on an external resource, the availability of which may not be under your control. E.g., what happens if you're on the road without an internet connection, your local cache has expired, and you really *really* need to recompile something? Or what if, one day, the server on which this dependency was hosted suddenly upped and vanished into the ether? Don't tell me "but it's hosted on XYZ network run by Reputable Company ABC, they'll make sure their servers never go down!" -- try saying that 10 years later when you suddenly really badly need to recompile your old code. Oops, it doesn't compile anymore, because a critical dependency doesn't exist anymore and nobody has a copy of the last ancient version the code compiled with.

3) The external resource is liable to change any time, without notice (the authors don't even know you exist, let alone who you are and why changing some API will seriously break your code). Wake up the day of your important release, and suddenly your project doesn't compile anymore 'cos upstream committed an incompatible change. Try explaining that one to your irate customers. :-P

> I heard about a team at a very high-profile company actually reviewing such dependencies before accepting them to the code base. But reviewing them only at acceptance time! Once the dependency is accepted, the projects would automatically pull in all unreviewed changes and run potentially malicious code on your computer.

Worse yet, at review time library L depended on external packages X, Y, Z. Let's grant that X, Y, Z were reviewed as well (giving the benefit of the doubt here). But are the reviewers seriously going to continue reviewing X, Y, Z on an ongoing basis? Perhaps X, Y, Z depended upon P, Q, R as well; is *anyone* who uses L going to even notice when R's maintainer turns rogue and commits some nasty backdoor into his code?

> I am still trying to understand where I went wrong. I simply cannot understand this. (I want to believe they changed their policy and they don't pull in automatically anymore.)

If said company is anything like the bureaucratic nightmare I have to deal with every day, I'd bet that nobody cares about this 'cos it's not their department. Such menial tasks are owned by the department of ItDoesntGetDone, and nobody ever knows what goes on there -- we're just glad they haven't bothered us about show-stopping security flaws yet. ;-)
> When I (had to) use Go for a year about 4 years ago, it was the same: The project failed to build one morning because there was an API change on one of the dependencies. O... K... They fixed it in a couple of hours but still... Yes, the project should probably have depended on a particular version but then weren't we interested in bug fixes or added functionality? Why should we have decided to hold on to version 1.2.3 instead of 1.3.4? Should teams follow their many dependencies before updating? Maybe that's the part I am missing...

See, this is the fundamental problem I have with today's philosophy of "put it all in `the cloud', that's the hip thing to do". I do *not* trust that code from some external server somewhere out there isn't going to just vanish into the ether suddenly, or keel over and die the day after, or just plain get hacked (very common these days) and have trojan code inserted into the resource I depend on. Or the server just plain becomes unreachable because I'm on the road, or my ISP is acting up (again), or the network it's on got sanctioned overnight and now I'm breaking the law just by downloading it.

I also do *not* trust that upstream isn't going to commit some incompatible change that will fundamentally break my code in ways that are very costly to fix. I mean, they have every right to do so; why should they stop just because some anonymous user out here depended on their code?

I want to *manually* verify every upgrade to make sure that it hasn't broken anything, before I commit to the next version of the dependency. AND I want to keep a copy of the last working version on my *local hard drive* until I'm 100% sure I don't need it anymore. I do NOT trust some automated package manager to do this for me correctly (I mean, software can't possibly ever fail, right?).
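Not that I use dub much, but to be fair to it, it *can* be told to hold still. Here's a rough sketch of a single-file package with the dependency held to a known-good series (the package name and version number are made up for illustration; adjust to whatever you actually depend on):

    /+ dub.sdl:
        name "myapp"
        dependency "vibe-d" version="~>0.9.4"
    +/
    // Run with:  dub run --single app.d
    //
    // "~>0.9.4" means >=0.9.4 and <0.10.0, so a surprise rewrite of the
    // library can't sneak into tomorrow's build.  In a regular dub
    // project you can go further and commit dub.selections.json, which
    // records the exact resolved version of every package in the tree;
    // upgrades then happen only when *you* run `dub upgrade` and have
    // had a chance to review what changed.
    import vibe.core.log;

    void main()
    {
        logInfo("built against a pinned dependency series");
    }

Yes, this still downloads half the world the first time -- but at least it's the *same* half of the world every time, which is the part the Go story above got bitten by.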
And you know what? I've worked with some projects that have lasted for over a decade or two, and on that time scale, the oft-touted advantages of code reuse have taken on a whole new perspective that people these days don't often think about. I've seen several times how, as time goes on, external dependencies become more of a liability than an asset. In the short term, yeah, it lets you get off the ground faster, saves you the effort of reinventing the wheel, blah blah blah. In the long term, however, these advantages don't seem so advantageous anymore:

- You don't *really* understand the code you depend on, which means if upstream moves in an incompatible direction, or just plain abandons the project (the older the project the more likely this happens), you would not have the know-how to replicate the original functionality required by your own code.

- Sometimes the upstream breakage is a subtle one -- it works most of the time, but in this one setting with this one particular customer the behaviour changed. Now your customer is angry and you don't have the know-how to fix it (and upstream isn't going to do it 'cos the old behaviour was a bug).

- You may end up with an irreplaceable dependency on abandoned old code, but since it isn't your code you don't have the know-how to maintain it (e.g., fix bugs, security holes, etc.). This can mean you're stuck with a security flaw that will be very expensive to fix.

- Upstream may not have broken anything, but the performance characteristics may have changed (for the worse). I'm not making this up -- I've seen an actual project where compiling with the newer library caused a 2x reduction in runtime performance. Many months later it was somewhat improved, but still inferior to the original *unoptimized* library. And complaining upstream didn't help -- they insisted their code wasn't intended to be used this way, so the performance issues are the user's fault, not theirs.

- Upstream licensing terms may change, leaving you stuck up the creek without a paddle.

Writing the code yourself may have required more up-front investment (and provoked the ire of the Code Reuse police), but you have the advantage that you own the code, have a copy of it always available, won't have licensing troubles, and understand the code well enough to maintain it over the long term. You become independent of network availability, immune to outages and unwanted breaking changes.

The code reuse emperor has no clothes, but his cronies brand me as heretic scum worthy only to be spat out like a gnat. Such is life. ;-)

> Thanks for listening... Boo hoo... Why am I like this? :)
[...]

'cos you're the smart one. ;-) Most people don't even think about these issues, and then years later it comes back and bites them in the behind.


T

--
It is of the new things that men tire --- of fashions and proposals and improvements and change. It is the old things that startle and intoxicate. It is the old things that are young. -- G.K. Chesterton