Vitja Makarov, 07.11.2011 19:28:
2011/11/6 Stefan Behnel:
Vitja Makarov, 06.11.2011 18:10:

When a file encoding is specified, Cython generates two PyObject entries
for string constants: one for the variable name and one for the string
constant.

That's because the content may actually become different after decoding,
even if the encoded byte sequence is identical. Note that decoding is only
done in Py3. In Py2, the byte sequence is used, so both values are
identical.
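
A minimal sketch of the point about decoding, in plain Python rather than Cython's own code. The byte values are made up for illustration, and latin-1 is only an arbitrary second encoding chosen for contrast:

```python
# Illustrative only: the same encoded byte sequence can decode to
# different text depending on which source encoding is assumed.
raw = b"\xd7\xd4\xc6"  # hypothetical non-ASCII bytes, not from the thread

as_koi8r = raw.decode("koi8-r")
as_latin1 = raw.decode("latin-1")

# Identical bytes, different decoded content.
differs = as_koi8r != as_latin1

# For a pure-ASCII constant such as 'wtf' the decoded values agree,
# which is the rare case where a single object would be enough.
ascii_same = b"wtf".decode("koi8-r") == b"wtf".decode("ascii")

print(differs, ascii_same)
```

On a Py2-style byte string nothing is decoded at all, so the two values cannot diverge there.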

If they are identical after decoding, isn't it better to have only
one of them?

Well, yes. That's not trivial, though, because the decision is taken at C compile time. And the benefit tends to be negligible, because this case is really rare and the affected strings tend to be quite short.


Here is a minimal example:
$ cat cplus.pyx
# -*- coding: koi8-r -*-
wtf = 'wtf'

Generates the following code:

/* Implementation of 'cplus' */
static char __pyx_k__wtf[] = "wtf";
static char __pyx_k____main__[] = "__main__";
static char __pyx_k____test__[] = "__test__";
static PyObject *__pyx_n_s____main__;
static PyObject *__pyx_n_s____test__;
static PyObject *__pyx_n_s__wtf;
static PyObject *__pyx_n_s__wtf;

...

static __Pyx_StringTabEntry __pyx_string_tab[] = {
   {&__pyx_n_s____main__, __pyx_k____main__, sizeof(__pyx_k____main__), 0, 0, 1, 1},
   {&__pyx_n_s____test__, __pyx_k____test__, sizeof(__pyx_k____test__), 0, 0, 1, 1},
   {&__pyx_n_s__wtf, __pyx_k__wtf, sizeof(__pyx_k__wtf), "koi8-r", 0, 1, 1},
   {&__pyx_n_s__wtf, __pyx_k__wtf, sizeof(__pyx_k__wtf), 0, 0, 1, 1},
   {0, 0, 0, 0, 0, 0, 0}
};

Both Python object variables should have different cnames.

What about adding an encoding suffix?

Yes, I think that would fix it, although it could be a bit misleading when reading the C code with a Py3 context in mind. But using a counter doesn't make it very readable, either.
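
As a rough sketch of the encoding-suffix idea (not Cython's actual naming code), the cname could be built along these lines. The helper name `string_cname`, its parameters, and the exact suffix format are assumptions made up for illustration:

```python
import re


def string_cname(name, encoding=None, prefix="__pyx_n_s__"):
    """Hypothetical helper: build a C name, suffixed by the encoding if any."""
    cname = prefix + name
    if encoding:
        # C identifiers cannot contain '-', so normalise the encoding name.
        cname += "_" + re.sub(r"[^A-Za-z0-9_]", "_", encoding.lower())
    return cname


plain = string_cname("wtf")               # entry without an encoding
encoded = string_cname("wtf", "koi8-r")   # entry decoded from koi8-r

print(plain)
print(encoded)
```

With such a scheme the two entries in the example above would no longer collide, at the cost of a suffix that says nothing useful when the C code is read in a Py2 context.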

Stefan
_______________________________________________
cython-devel mailing list
cython-devel@python.org
http://mail.python.org/mailman/listinfo/cython-devel
