On Apr 11, 2016 04:18, "Erik Bray" <erik.m.b...@gmail.com> wrote:
>
> On Fri, Apr 8, 2016 at 5:49 PM, Nathaniel Smith <n...@vorpus.org> wrote:
> > Can you give a tiny concrete example? My questions are basic enough
that I
> > feel like I'm missing something fundamental :-)
>
> Yes, I think you might be missing something, but I'm not sure exactly
> where.  In the issue I provided a tiny example which you can see here:
>
> https://gist.github.com/embray/12a67edb82b213217e31f408007898e6
>
> The C code generated by this example currently does not compile on
> Windows, because of how Cython uses DL_IMPORT incorrectly.  Regardless
> of what it *does* do, Cython should not be using the DL_IMPORT macro
> at all actually since that no longer even exists in Python.

Sure.

For that example, the correct thing to do is to *not* export the function.

Backing up, to make sure we're on the same page: There are three levels of
symbol visibility in C: file-internal, shared library internal (different
.c files that make up the library can see it, but users of the shared
library can't see it), and shared library exported (everyone can see it;
can also carry other consequences, e.g. on Linux then internal calls will
become noticeably slower, and it becomes possible for weird symbol
interposition issues to occur). So the rule of thumb is to make everything
as private as you can get away with.

Making this more interesting:
- vanilla C only includes 2 ways to mark symbol visibility, which is not
enough to make a 3-way distinction. Hence the need for an extra attribute
thingummy.
- everyone agrees that symbols marked 'static' should be file-internal, but
different platforms disagree about what should happen if the extra
attribute thingummy is missing.

So on Windows, the convention is:
'static' -> file internal
no marking -> shared library internal
'dllexport' -> public

And on Linux it's:
'static' -> file internal
'visibility (hidden)' -> shared library internal
no marking -> public

It's generally agreed that Linux got this wrong and that you should always
use '-fvisibility=hidden' to switch it to the windows style, but cython
doesn't control compiler options and thus should probably generate code
that works correctly regardless of such compiler settings. Fortunately,
Linux does provide some markings to explicitly make things public: you can
mark a symbol 'visibility (default)' (which means public), or you can use
the dllexport syntax, just like Windows, because gcc is helpful like that.

OTOH, windows is annoying because of this dllimport thing that started this
whole thread: on other systems just marking symbols as extern is enough to
handle both shared-object-internal and shared-library-exported symbols, and
the linker will sort it out. On Windows, you have to explicitly distinguish
between these. (And annoyingly, if you accidentally leave out the dllimport
making on functions then it will use some fallback hack that works but
silently degrades performance; on other symbols it just doesn't work, and
ditto for if you use it when it isn't needed.)

So final conclusion: for non-static symbols, cython should first decide
whether they are supposed to be shared-library-internal or actually
exported from the shared library.

For shared-library-internal symbols: their definition should be marked
'visibility(hidden)' on Linux, and unmarked on Windows. This is easy using
some preprocessor gunk. (Or maybe simplest is: marked that everywhere
except if using msvc, because I think everyone else will understand
'visibility (hidden)' even if it's a no op.) Their declaration in the
header file should just be 'extern' everywhere.

For shared-library-exported symbols: I am dubious about whether cython
should even support these at all. But if it does, then the definitions
should be marked 'dllexport' (no macro trickery needed, because everyone
understands this syntax), and their declaration in the header file needs
some extra hack that is the subject of this thread.

Now, back to your example: Here the caller and callee are both compiled
into the same shared library, so you don't want dllexport/dllimport at all,
you just want a shared-library-internal symbol, which as we see is much
easier.

NumPy also ran into some problems with this in our experiments with using
cython internally. Our temporary solution was to use the preprocessor to
monkeypatch DL_IMPORT into expanding to the appropriate
shared-library-internal thing :-).

> > My first question is why you even need this, since AFAIK there are no
cases
> > where it is correct to have a cython module dllexporting symbols that
appear
> > in header files. This is what cimport is for, right?
>
> I don't think this has anything to do with cimport.  Could you explain
> what you mean?

We only need to solve the dllimport issue if we want to support
shared-library-exported symbols from cython extensions. This is only useful
if you have different extensions that are directly linking to each other
using the platform linker. But this simply doesn't work in almost any
cases, because platform linkers have all kinds of quirks that don't play
well with python packages -- to start with, your runtime linker probably
does not follow Python's rules for which directories to search to find a
shared library... To solve these problems, the Cython devs invented the
cimport mechanism, which is basically a portable, python-centric way of
doing shared-library-exports while avoiding all the problems caused by
using the platform linker. So my question was, what's your use case that's
better served by linking directly against an extension module and using the
platform's shared-library-export functionality, instead of cython's?

> > My second question is why you would want to do this via the command
line,
> > when compiling the dll means that you are compiling some
cython-generated
> > .c, which means that you can put the #define directly in the source
code,
> > no?
>
> Not necessarily. As you can see in my example the file fooutil.c is
> hand-written C code that was not generated by Cython, but which uses a
> function in the Cython-generated code.  It includes "foo.h".  In
> principle you're right--the hand-written C code could set the proper
> #defines but it would have to do so *before* including "foo.h".  It's
> not very obvious that this would be needed.  Most other code I've seen
> that addresses this issue--including cpython itself--do so by passing
> an appropriate define to the compiler via the command-line, and that
> seems the clearest to me.

I see, right. I guess my suggestion would be that if a symbol really does
need to be marked for shared-library-export *and simultaneously* used by
different files within the same shared library -- which is the only case
where this arises -- then possibly the simplest and most robust thing is to
set up the header file so that external users just do #include "foo.h", and
the internal users do

  #define FOO_INTERNAL
  #include "foo.h"

(This is how numpy's include files work, actually, though for reasons
unrelated to symbol visibility.)

-- bottom line --

I think the obvious thing is that cython should provide a natural way to
mark a function as being shared-library-internal; that covers 99% of real
use cases *and* just works without any further macros needed. Probably the
existing "public" annotation should be changed to mean this. (Obviously it
wasn't quite fully thought through in the first place and has few if any
users, since it got this very wrong without anyone noticing. So fixing it
seems viable to me.)

And then *maybe* there should also be a way to make a symbol
shared-library-exported, if that really is useful, but as a non-default
experts-only kind of thing, and as such it would be OK to require these
rare expert users to be a bit more careful about how they #include the
resulting header within their own project.

-n
_______________________________________________
cython-devel mailing list
cython-devel@python.org
https://mail.python.org/mailman/listinfo/cython-devel

Reply via email to