At 10:32 AM 10/17/2008 -0700, Toshio Kuratomi wrote:
So I have a question for all the developers on this list.  Philip thinks
that using symlinks will drive adoption better than an API to access
package data.  I think an API will have better adoption than a symlink
hack.  But the real question is what do people who maintain packages
think?  Since Philip's given his reasoning, here's mine:

1) Philip says that with symlinks distributions will likely have to
submit patches to the build scripts to tag various files as belonging to
certain categories.  If you, as an upstream, are going to accept a patch
to your build scripts to place files in a different place, wouldn't you
also accept a patch to your source code to use a well-defined API to
pull files from a different source?  This is a distribution's bread and
butter and if there's a small, useful, well-liked, standard API for
accessing data files you will start receiving patches from distributions
that want to help you help them.

I'll leave this to the developers, but please note that the real historical answer to this question is "no", or at least "not in the current release". Keep in mind that most "yeses" you get to this question will really mean, "when I can get around to understanding the API and testing it and have time to put it in a new release" -- while the "yeses" for adding spec metadata are more likely to mean, "yes, I'll check it in right now if it looks correct".


2) Symlinks cannot be used universally.  Although it might not be common
to want an FHS-style install in such an environment, it isn't unheard
of.  At one time in the distant past I had to use cygwin so I know that
while this may be a corner case, it does exist.

Cygwin does symlinks, actually.



3) The primary argument for symlinks is that symlinks are compatible
with __file__.  But this compatibility comes at a cost -- symlinks can't
do anything extra.  In a different subthread Philip argues that
setuptools provides more than distutils and that's why people switch and
that the next generation tool needs to provide even more than
setuptools.  Symlinks cannot do that.

I think Ian's already said this, but the API itself has to do something more, and so far nobody's proposed an API that does anything "more" than what setuptools does in this area, from the developer point of view. (Except for the request that such an API be in the stdlib and thus avoid an extra dependency... but that of course introduces yet another implementation delay, if it means a new release of Python.)


4) In contrast, an API can do more:  It can deal with writable files. On
Unix, persistent, per-user storage would go in the user's home
directory; on other OSes it would go somewhere else.  This is
abstractable using an API at runtime but not using symlinks at install time.

This is all well and good, but it's actually quite orthogonal to most uses of __file__ and resources today.
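
As a rough illustration of the runtime abstraction point 4 describes, here is a minimal sketch. The helper name user_data_dir, its signature, and the directory choices are assumptions made for the example; nothing in this thread specifies them.

```python
import os
import sys


def user_data_dir(app_name, platform=None, environ=None):
    """Return a per-user directory where app_name may keep writable files.

    Hypothetical helper: the name, parameters, and directory layout are
    illustrative assumptions, not part of any proposal in this thread.
    """
    platform = sys.platform if platform is None else platform
    environ = os.environ if environ is None else environ
    home = os.path.expanduser("~")

    if platform.startswith("win"):
        # Windows conventionally keeps per-user application data in %APPDATA%.
        return os.path.join(environ.get("APPDATA") or home, app_name)
    if platform == "darwin":
        return os.path.join(home, "Library", "Application Support", app_name)
    # Other Unix-like systems: a dot-directory in the user's home.
    return os.path.join(home, "." + app_name)
```

The location is decided when the program runs, which is something an install-time symlink cannot express. It does not, however, touch the read-only resources that most __file__ usage is about.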


5) Cross-package data.  Using __file__ to detect file location is
inherently not suitable for crossing package boundaries.
EggTranslations would not be able to use a symlink-based backend to do its
work for this reason.

EggTranslations doesn't use __file__, it uses the API, so I don't see how this relates.


6) Zipped eggs.  These require an API.  So moving to symlinks is
actually a regression.

As I mentioned earlier, setuptools marks eggs that use __file__ as needing to be installed unzipped, so it's not a regression; it's simply providing the same level of compatibility that setuptools does.

It's *requiring* the use of an API that would be a regression wrt developer-side features.
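
To make the "marks eggs that use __file__" idea concrete, here is a simplified sketch of such a check. It assumes that looking for the names __file__ and __path__ in a module's compiled code is enough; the helper name references_file_attr is hypothetical, and the scan setuptools actually performs may differ in detail.

```python
import types


def references_file_attr(source, filename="<module>"):
    """Return True if the compiled source refers to __file__ or __path__.

    Hypothetical helper for illustration only.
    """
    pending = [compile(source, filename, "exec")]
    while pending:
        code = pending.pop()
        # Names read at module level or as globals are listed in co_names.
        if "__file__" in code.co_names or "__path__" in code.co_names:
            return True
        # Functions and class bodies are nested code objects in co_consts.
        pending.extend(
            const for const in code.co_consts
            if isinstance(const, types.CodeType)
        )
    return False
```

A project whose modules trip this check would be installed unzipped, so its __file__-relative lookups keep working.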


7) Philip says that the reason pkg_resources does not see widespread
adoption is that the developer cost of using an API is too high compared
to __file__.  I don't believe that the difference between __file__ and an
API is that great.

It isn't; it's the *switching* cost that's high, and that's the cost that needs to be minimized in order to drive adoption quickly.
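
For comparison, here are the two styles side by side in a minimal sketch. It uses the stdlib call pkgutil.get_data as a dependency-free stand-in for a resource API (pkg_resources offers resource_string and resource_filename for the same purpose); the package demo_pkg and its file greeting.txt are made up for the example.

```python
import importlib
import os
import pkgutil
import sys
import tempfile

# Build a throwaway package "demo_pkg" (hypothetical) with one data file.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_pkg")
os.mkdir(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("")
with open(os.path.join(pkg_dir, "greeting.txt"), "w") as f:
    f.write("hello")

sys.path.insert(0, root)
importlib.invalidate_caches()
demo_pkg = importlib.import_module("demo_pkg")

# Style 1: compute the path as an offset from __file__.
data_path = os.path.join(os.path.dirname(demo_pkg.__file__), "greeting.txt")
with open(data_path, "rb") as f:
    via_file = f.read()

# Style 2: ask an API for the data by package name and resource name.
via_api = pkgutil.get_data("demo_pkg", "greeting.txt")
```

Each style is a line or two on its own. The cost is in finding and rewriting every existing instance of the first style, then testing and releasing the result.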


[snip]

I'll just note that the bullets I'm skipping are mostly irrelevant to the issue at hand: i.e., switching cost of using *any* API, AND switching cost for the developers who *are* using pkg_resources presently. Let's not forget that second group of people, because the fact they are using the API shows they are likely early adopters. Make it too hard for them to switch, and you might not have any early adopters left for the new thing. ;-)


* The API isn't flexible enough.  EggTranslations places its data within
the metadata store of eggs instead of within the data store.  This is
because the metadata is able to be read outside of the package in which
it is included while the package data can only be accessed from within
the package.

Actually, this is incorrect. EggTranslations' use of project-level data is so that it's not necessary to include a Python module in the egg, just to have a place to put the data. Access from other packages hasn't got anything to do with it.


8) To a distribution, symlinks are just a hack.  We use them for things
like php web apps when the web application is hardcoded to accept only
one path for things (like the writable state files being intermixed with
the program code).  Managing a symlink farm is not something
distributions are going to get excited over, so distributions won't
adopt this as the way to work with files until upstreams
move on their own.

We need to distinguish between "providing the ability to have a low-cost transition" and "the recommended True Way".

IOW, symlinks and an API are not mutually exclusive; I'm just pointing out that if an API is required, the transition of packages to the new standard will occur *only as quickly as the slowest upstream dependency*.

If the developer of A depends on B, and B hasn't transitioned yet, then A can't transition.
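
The "slowest upstream dependency" effect can be worked through in a small sketch. The function name earliest_transition, the package names, and the readiness numbers (think of them as months from now) are all invented for illustration.

```python
def earliest_transition(ready, depends_on):
    """Map each package to the earliest time it can actually transition.

    ready:      package -> time its own upstream is willing to switch
    depends_on: package -> list of packages it requires
    A package can switch only once it and everything it depends on have.
    """
    resolved = {}

    def resolve(name, stack=()):
        if name in stack:
            raise ValueError("dependency cycle involving %r" % (name,))
        if name not in resolved:
            dep_times = [
                resolve(dep, stack + (name,))
                for dep in depends_on.get(name, ())
            ]
            resolved[name] = max([ready[name]] + dep_times)
        return resolved[name]

    for name in ready:
        resolve(name)
    return resolved


# A is ready almost immediately, but it depends on B, which is slow.
ready = {"A": 1, "B": 6, "C": 2}
depends_on = {"A": ["B"], "B": ["C"]}
schedule = earliest_transition(ready, depends_on)
# schedule == {"A": 6, "B": 6, "C": 2}
```

A was willing at month 1 but is held until month 6 by B, which is the delay a mandatory API would impose on every dependent package.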


Further, since the install tool is being proposed as a separate project
from the metadata to mark files, the expectation is that the
distributions are going to want to write an install tool that manages
this symlink farm.  For that to happen, you have to get distributions to
be much more than simply neutral about the idea of symlinks; you have to
have them enthused enough about using symlinks that they are willing to
spend time writing a tool to do it.

Well, the question is whether they prefer to have a long, drawn-out transition or not. Maybe they don't care about that part, but my assumption was that a replacement for setuptools/easy_install in this space was desired sooner rather than later.

If that's the case, then making it possible for packages to transition without changing their runtime code is a must-have.
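
A minimal sketch of what "without changing their runtime code" could mean in practice, assuming an install step that puts the real file in an FHS-style location and leaves a symlink in the package directory. The directory layout and file names are made up, and a real install tool would be driven by the file-category metadata rather than hardcoded paths.

```python
import os
import tempfile

# Stand-ins for a system prefix and a site-packages directory.
root = tempfile.mkdtemp()
share_dir = os.path.join(root, "usr", "share", "demo_pkg")
pkg_dir = os.path.join(root, "site-packages", "demo_pkg")
os.makedirs(share_dir)
os.makedirs(pkg_dir)

# Install step: the real file lives under share/, and the package
# directory gets a symlink pointing at it.
# (os.symlink may need extra privileges on some platforms.)
real_path = os.path.join(share_dir, "greeting.txt")
with open(real_path, "w") as f:
    f.write("hello")
link_path = os.path.join(pkg_dir, "greeting.txt")
os.symlink(real_path, link_path)

# Runtime code, unchanged: the package still looks next to its own
# __file__ (module_file stands in for that value here).
module_file = os.path.join(pkg_dir, "__init__.py")
with open(os.path.join(os.path.dirname(module_file), "greeting.txt")) as f:
    content = f.read()
```

The package's own lookup is untouched; only the install layout differs.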



So once again, I think this boils down to these questions: if we have a
small library whose sole purpose is to abstract a data store so you can
find out where a particular non-code file lives on this system, will you
use it?  If a distribution packager sends you a patch so the data files
are marked correctly and the code can retrieve their location instead of
hardcoding an offset against __file__, will you commit it?

I think the answer to both questions is "yes... eventually... if the API is in the stdlib for all Python versions I'm targeting and everybody else is doing it." Which is why *requiring* it for transition will prevent the distros from seeing benefits from a new standard for quite some time.

Conversely, if the patch for installation metadata is separated from patches to code, I would expect a *much* faster uptake of the metadata patches. And, once having accepted the metadata patch, a developer is actually more likely to take the second step willingly than if required to do both at once. (See "Influence" by Cialdini.)

To be 100% clear (I hope): I have no objection to an API. It's unequivocally a good idea, and *should* be part of BUILDS. *Requiring* it, on the other hand, is unequivocally a *bad* idea, if you want adoption sooner rather than later.

Now, if you want to establish a transition timetable for phasing out __file__ usage (deprecation and so on), based on when the API will be available in the stdlib, and then publicize and bless that schedule... again, these are all good ideas.

The ONLY thing I object to is requiring it up front from day 1, because then we're just shooting off a giant foot-gun wrt adoption.

_______________________________________________
Distutils-SIG maillist  -  [email protected]
http://mail.python.org/mailman/listinfo/distutils-sig
