[Python-Dev] Re: PEP 611: The one million limit.

2019-12-06 Thread David Malcolm
On Thu, 2019-12-05 at 16:38 +, Mark Shannon wrote:
> Hi Everyone,
> 
> Thanks for all your feedback on my proposed PEP. I've edited the PEP
> in light of all your comments, and it is now hopefully more precise
> and better justified.
> 
> https://github.com/python/peps/pull/1249

Other programming languages have limits in their standards.  For example:

Values for #line in the C preprocessor:
"If lineno is 0 or greater than 32767 (until C99) 2147483647 (since
C99), the behavior is undefined."
  https://en.cppreference.com/w/c/preprocessor/line

Similarly for C++'s preprocessor (though there the change came with C++11):
  https://en.cppreference.com/w/cpp/preprocessor/line


(These days I maintain GCC's location-tracking code, and we have a
number of implementation-specific limits and heuristics for packing
file/line/column data into a 32-bit type; see
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libcpp/include/line-map.h 
and in particular LINE_MAP_MAX_LOCATION_WITH_COLS,
LINE_MAP_MAX_LOCATION, LINE_MAP_MAX_COLUMN_NUMBER, etc)


Hope this is constructive
Dave
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/7N3CF4MDOBSPKANRZJSZOY6JVAGOCHXF/
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-Dev] Stop using timeit, use perf.timeit!

2016-06-10 Thread David Malcolm
On Fri, 2016-06-10 at 23:20 +1000, Steven D'Aprano wrote:
> On Fri, Jun 10, 2016 at 01:13:10PM +0200, Victor Stinner wrote:
> > Hi,
> > 
> > In recent weeks, I have been researching how to get stable and
> > reliable benchmarks, especially for the corner case of
> > microbenchmarks. The first result is a series of articles; here are
> > the first three:
> 
> Thank you for this! I am very interested in benchmarking.
> 
> > https://haypo.github.io/journey-to-stable-benchmark-system.html
> > https://haypo.github.io/journey-to-stable-benchmark-deadcode.html
> > https://haypo.github.io/journey-to-stable-benchmark-average.html
> 
> I strongly question your statement in the third:
> 
> [quote]
> But how can we compare performances if results are random? 
> Take the minimum?
> 
> No! You must never (ever again) use the minimum for 
> benchmarking! Compute the average and some statistics like
> the standard deviation:
> [end quote]
> 
> 
> While I'm happy to see a real-world use for the statistics module, I 
> disagree with your logic.
> 
> The problem is that random noise can only ever slow the code down; it
> cannot speed it up.

Consider a workload being benchmarked running on one core, which has a
particular pattern of cache hits and misses.  Now consider another
process running on a sibling core, sharing the same cache.

Isn't it possible that under some circumstances the 2nd process could
prefetch memory into the cache in such a way that the workload under
test actually gets faster than if the 2nd process wasn't running?
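Victor's suggested approach above (compute the mean and standard deviation over many runs, rather than taking the minimum) can be sketched with the stdlib alone. The statement and repeat counts below are illustrative, not the values from his articles:

```python
import timeit
import statistics

# Time the same statement many times; each element of `runs` is the
# total time for `number` loops of one run.
runs = timeit.repeat(stmt="sum(range(1000))", repeat=20, number=1000)
per_loop = [t / 1000 for t in runs]  # seconds per loop, per run

minimum = min(per_loop)
mean = statistics.mean(per_loop)
stdev = statistics.stdev(per_loop)

print(f"min    {minimum * 1e6:.2f} us")
print(f"mean   {mean * 1e6:.2f} us +- {stdev * 1e6:.2f} us")
```

Reporting mean plus stdev exposes exactly the kind of run-to-run variation (cache effects, sibling processes) being debated here, which a bare minimum hides.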

[...snip...]

Hope this is constructive
Dave


Re: [Python-Dev] Update PEP 7 to require curly braces in C

2016-01-19 Thread David Malcolm
On Mon, 2016-01-18 at 19:18 -0500, Terry Reedy wrote:
> On 1/18/2016 6:20 PM, Brett Cannon wrote:
> >
> >
> > On Sun, 17 Jan 2016 at 11:10 Brett Cannon wrote:
> >
> > While doing a review of http://bugs.python.org/review/26129/ I asked
> > to have curly braces put around all `if` statement bodies. Serhiy
> > pointed out that PEP 7 says curly braces are optional:
> > https://www.python.org/dev/peps/pep-0007/#id5. I would like to
> > change that.
> >
> > My argument is to require them to prevent bugs like the one Apple
> > made with OpenSSL about two years ago:
> > https://www.imperialviolet.org/2014/02/22/applebug.html. Skipping
> > the curly braces is purely an aesthetic thing while leaving them out
> > can lead to actual bugs.
> >
> > Anyone object if I update PEP 7 to remove the optionality of curly
> > braces in PEP 7?
> >
> >
> > Currently this thread stands at:
> >
> > +1
> >Brett
> >Ethan
> >Robert
> >Georg
> >Nick
> >Maciej Szulik
> > +0
> >Guido
> > -0
> >Serhiy
> >MAL
> > -1
> >Victor (maybe; didn't specifically vote)
> >Larry
> >Stefan
> 
> Though I don't write C anymore, I occasionally read our C sources.  I 
> dislike mixed bracketing in a multiple clause if/else statement,  and 
> would strongly recommend against that.  On the other hand, to my 
> Python-trained eye, brackets for one line clauses are just noise.  +-0.
> 
> If coverity's scan does not flag the sort of misleading bug bait 
> formatting that at least partly prompted this thread
> 
> if (a):
> b;
> c;
> 
> then I think we should find or write something that does and run it over 
> existing code as well as patches.

FWIW, for the forthcoming gcc 6, I've implemented a new
-Wmisleading-indentation warning that catches this.  It's currently
enabled by -Wall:

sslKeyExchange.c: In function 'SSLVerifySignedServerKeyExchange':
sslKeyExchange.c:631:8: warning: statement is indented as if it were guarded by... [-Wmisleading-indentation]
     goto fail;
     ^~~~
sslKeyExchange.c:629:4: note: ...this 'if' clause, but it is not
   if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
   ^~


(not that I've had time for core Python development lately, but FWIW in
gcc-python-plugin I mandate braces for single-statement clauses).

Dave



Re: [Python-Dev] A macro for easier rich comparisons

2015-04-28 Thread David Malcolm
On Tue, 2015-04-28 at 10:50 -0700, Glenn Linderman wrote:
 On 4/28/2015 2:13 AM, Victor Stinner wrote:
 
  #define Py_RETURN_RICHCOMPARE(val1, val2, op)                               \
      do {                                                                    \
          switch (op) {                                                       \
          case Py_EQ: if ((val1) == (val2)) Py_RETURN_TRUE; Py_RETURN_FALSE;  \
          case Py_NE: if ((val1) != (val2)) Py_RETURN_TRUE; Py_RETURN_FALSE;  \
          case Py_LT: if ((val1) <  (val2)) Py_RETURN_TRUE; Py_RETURN_FALSE;  \
          case Py_GT: if ((val1) >  (val2)) Py_RETURN_TRUE; Py_RETURN_FALSE;  \
          case Py_LE: if ((val1) <= (val2)) Py_RETURN_TRUE; Py_RETURN_FALSE;  \
          case Py_GE: if ((val1) >= (val2)) Py_RETURN_TRUE; Py_RETURN_FALSE;  \
          }                                                                   \
          Py_RETURN_NOTIMPLEMENTED;                                           \
      } while (0)
  I would prefer a function for that:
  
  PyObject *Py_RichCompare(long val1, long val2, int op);
 Why would you prefer a function?  As a macro, when the op is a
 constant, most of the code would be optimized away by a decent
 compiler.
 
 I suppose when the op is not a constant, then a function would save
 code space.
 
 So I suppose it depends on the predominant use cases.

There's also the possibility of wrapping C++ code that uses overloaded
operators: having it as a macro could allow those C++ operators to be
mapped into Python.

Hope this is constructive
Dave



Re: [Python-Dev] Add Gentoo packagers of external modules to Misc/ACKS

2013-12-09 Thread David Malcolm
On Sun, 2013-12-08 at 05:29 -0500, R. David Murray wrote:
 As far as we have been able to determine, Tae Wong is in fact a bot
 (note the 'seo' in the email address...a tip of the hand, as far as
 I can see).  We have removed all access permissions (including email)
 from the related account on the bug tracker already.  IMO this address
 should be blocked from posting to all python lists.

FWIW the address has also been posting to the gcc lists helpfully
asking for spam posts to be removed (with *links* to the posts), plus
some (apparently) random-harvested paragraphs of text from various other
mailing lists, presumably to try to get past filters.

See e.g. the URL obtained by running:
 echo uggc://tpp.tah.bet/zy/tpp/2013-12/zft00097.ugzy | rot13
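The same deobfuscation can be done from Python with the stdlib codecs module:

```python
import codecs

# Decode the rot13-obfuscated URL from the shell one-liner above.
url = codecs.decode("uggc://tpp.tah.bet/zy/tpp/2013-12/zft00097.ugzy", "rot13")
print(url)  # http://gcc.gnu.org/ml/gcc/2013-12/msg00097.html
```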


Hope this is constructive
Dave



Re: [Python-Dev] Building a Faster Python

2013-07-22 Thread David Malcolm
On Mon, 2013-07-22 at 09:32 +0200, Maciej Fijalkowski wrote:
 On Mon, Jul 22, 2013 at 9:32 AM, Maciej Fijalkowski fij...@gmail.com wrote:
  On Mon, Jul 22, 2013 at 8:15 AM, Antoine Pitrou solip...@pitrou.net wrote:
  On Sun, 21 Jul 2013 16:36:35 -0700
  Raymond Hettinger raymond.hettin...@gmail.com wrote:
  Our current Mac OS X builds use GCC-4.2.
 
  On Python2.7, I ran a comparison of gcc-4.2.1 builds
  versus gcc-4.8.1 and found that the latter makes a much
  faster Python.  PyBench2.0 shows the total running time
  dropping from 5653ms to 4571ms.  The code is uniformly
  better in just about every category.
 
  You could try running the benchmarks suite to see what that gives:
  http://hg.python.org/benchmarks/
 
  Regards
 
  Antoine.
 
  or pypy benchmark suite which is more comprehensive for python 2.7
  (http://bitbucket.org/pypy/benchmarks)
 
 Besides, is there any reason not to use clang by default on OS X?

How did this thread go from:
  for OS X, GCC 4.8.1 gives you significantly faster machine code
   than the system GCC 4.2.1
to
  let's just use clang
?

Presumably if you want the fastest possible machine code for the platform,
the correct approach is to benchmark the available compilers trying
various options and to see which does best.  Note that the meaning of
e.g. -O3 varies from compiler to compiler (I see someone else already
made the Apples-to-Apples pun).

(I should declare that I've been hacking on GCC for the last few months,
so I have an interest in this)

Hope this is constructive
Dave



Re: [Python-Dev] Building a Faster Python

2013-07-22 Thread David Malcolm
On Mon, 2013-07-22 at 17:15 +0200, Antoine Pitrou wrote:
 On Mon, 22 Jul 2013 11:08:32 -0400,
 David Malcolm dmalc...@redhat.com wrote:
  
  How did this thread go from:
for OS X, GCC 4.8.1 gives you significantly faster machine code
 than the system GCC 4.2.1
  to
let's just use clang
  ?
  
   Presumably if you want the fastest possible machine code for the
  platform the correct approach is to benchmark the available compilers
  trying various options and to see which does best.
 
 Depends whether the goal is to choose the fastest compiler, or to
 use the platform's official compiler. Linux distributions don't compile
 packages with icc, AFAIK.

If Intel put icc under a Free Software/Open Source license, maybe we
would ;-)

(only my own opinion, of course)
Dave



Re: [Python-Dev] Doubly linked lists in Python core?

2013-06-11 Thread David Malcolm
On Tue, 2013-06-11 at 12:14 -0700, Benjamin Peterson wrote:
 2013/6/11 Skip Montanaro s...@pobox.com:
  I encountered this disconcerting message yesterday on a Linux system
  running Python 2.7.2:
 
  *** glibc detected *** /opt/local/bin/python: corrupted double-linked
  list: 0x03b01c90 ***
 
 I suspect that's a corrupt linked list internal to glibc.

Yes: almost certainly the one inside glibc's implementation of malloc.

Somewhere in the process you have a double-free, or a buffer overrun
that's splatting the links that live in memory between the allocated
buffers.

You may want to try running the process under valgrind.

Hope this is helpful
Dave



Re: [Python-Dev] Validating SSL By Default (aka Including a Cert Bundle in CPython)

2013-06-03 Thread David Malcolm
On Mon, 2013-06-03 at 12:48 -0400, Barry Warsaw wrote:
 On Jun 03, 2013, at 09:05 AM, Ben Darnell wrote:
 
 The data is analogous to the time zone database (PEP 431) in that it may
 need to be updated independently of Python's own release schedule, so we
 may want to use similar techniques to manage both.  Also see certifi (
 https://pypi.python.org/pypi/certifi), which is a copy of the Mozilla list
 in a pip-installable form.
 
 Right, this is very much analogous, except with the additional twist that
 out-of-date certificates can pose a significant security risk.
 
 I'm fairly certain that Debian and Ubuntu would explicitly not use any
 certificates shipped with Python, for two main reasons: 1) our security teams
 already manage the certificate store distro-wide and we want to make sure that
 one update fixes everything; 2) we don't want to duplicate code in multiple
 packages[1].

Fedora/RHEL are in a similar position; I expect we'd rip out the bundled
certs in our builds shortly after unzipping the tarball, and use a
system-wide cert store (I rm -rf bundled libraries in our builds, to
make sure we're not using them).

[...snip...]




Re: [Python-Dev] Mysterious Python pyc file corruption problems

2013-05-17 Thread David Malcolm
On Fri, 2013-05-17 at 12:42 -0400, Barry Warsaw wrote:
 On May 16, 2013, at 04:52 PM, Terry Jan Reedy wrote:
 
 If the corruption only happens on Ubuntu, that would constitute 'rhyme'
 ;-). I realize that asking for reports on other systems is part of the reason
 you posted, but I don't remember seeing any others yet.
 
 Right. :)  It's harder to dig out similar problems in Debian[1] but it's
 pretty clear that there have been *some* similar reports in Debian.  Ubuntu
 and Debian share almost all their Python infrastructure.  It would definitely
 be interesting to know whether Fedora/RedHat or any other Linux distros have seen
 similar problems.

FWIW I don't recall seeing such problems on Fedora/RH, though that could
be due to...

 I don't know how Fedora/RH does package installation.  In Debian/Ubuntu, we do
 not ship pyc files, but instead they are generated in post-installation
 scripts, which boil down to calls to `$PYTHON -m py_compile - < filenames`.

Fedora/RH pregenerate the .pyc files during rpm creation, and they exist
as part of the rpm payload.


Dave



Re: [Python-Dev] Mysterious Python pyc file corruption problems

2013-05-17 Thread David Malcolm
On Fri, 2013-05-17 at 14:23 -0400, Barry Warsaw wrote:
 On May 17, 2013, at 01:19 PM, David Malcolm wrote:
 
 Fedora/RH pregenerate the .pyc files during rpm creation, and they exist
 as part of the rpm payload.
 
 Good to know, thanks.  Do you use `$PYTHON -m py_compile` to generate the pyc
 files at build time?

We use compileall.compile_dir() most of the time, but occasionally use
py_compile.compile().

Specifically, for python 2, the core rpm-build package has a script:
  /usr/lib/rpm/brp-python-bytecompile
run automatically in a postprocessing phase after the upstream source
has installed to a DESTDIR, and this invokes compileall.compile_dir() on
all .py files in the package payload, with various logic to segment the
different parts of the filesystem to be bytecompiled by the appropriate
python binary (since we have duplicate .py files for different python
runtimes).  This is all done sequentially, so I'd be surprised if
different pythons splatted on each other's .pyc files at this time.

In addition, python3-devel contains a:
  /etc/rpm/macros.pybytecompile
which defines a py_byte_compile() macro, which can be used for
overriding these rules (IIRC), and this does use py_compile.compile()
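A minimal sketch of the two byte-compilation entry points mentioned above; the real brp-python-bytecompile script adds the logic for picking the right interpreter per directory tree, which is omitted here:

```python
import compileall
import os
import py_compile
import tempfile

# Stand-in for a package payload installed to a DESTDIR.
destdir = tempfile.mkdtemp()
module = os.path.join(destdir, "example.py")
with open(module, "w") as f:
    f.write("VALUE = 42\n")

# Bulk compilation, as brp-python-bytecompile does for a whole tree:
ok = compileall.compile_dir(destdir, quiet=1)

# Single-file compilation, as used by the py_byte_compile macro:
pyc_path = py_compile.compile(module)
```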

Hope this is helpful
Dave



Re: [Python-Dev] Proposing Argument Clinic, a new way of specifying arguments to builtins for CPython

2012-12-04 Thread David Malcolm
On Mon, 2012-12-03 at 14:29 -0800, Larry Hastings wrote:

[...snip compelling sales pitch...]

I like the idea.

As noted elsewhere, sane generated C code is much easier to step through
in the debugger than preprocessor macros (though "sane" in that sentence
is begging the question, I guess, but the examples you post look good to
me).  It also seems cleaner to split the argument handling from the
implementation of the function (iirc Cython already has an analogous
split and can use this to bypass arg tuple creation).

The proposal potentially also eliminates a source of bugs: mismatches
between the format strings in PyArg_Parse* vs the underlying C types
passed as varargs (which are a major pain for bigendian CPUs where int
vs long screwups can really bite you).

I got worried that this could introduce a bootstrapping issue (given
that the clinic is implemented using python itself), but given that the
generated code is checked in as part of the C source file, you always
have the source you need to regenerate the interpreter.

Presumably 3rd party extension modules could use this also, in which
case the clinic tool could be something that could be installed/packaged
as part of Python 3.4 ?

[...snip...]

 Big unresolved questions:
 
 * How would we convert all the builtins to use Clinic?  I fear any
solution will involve some work by hand.  Even if we can automate
big chunks of it, fully automating it would require parsing arbitrary
C.  This seems like overkill for a one-shot conversion.
(Mark Shannon says he has some ideas.)

Potentially my gcc python plugin could be used to autogenerate things.
FWIW I already have Python code running inside gcc that can parse the
PyArg_* APIs:
http://git.fedorahosted.org/cgit/gcc-python-plugin.git/tree/libcpychecker/PyArg_ParseTuple.py

Though my plugin runs after the C preprocessor has been run, so it may
be fiddly to use this to autogenerate patches.


Hope this is helpful
Dave



Re: [Python-Dev] Bumping autoconf from 2.68 to 2.69

2012-10-16 Thread David Malcolm
On Tue, 2012-10-16 at 10:59 +0200, Stefan Krah wrote:
 Charles-François Natali neolo...@free.fr wrote:
  Well, so I guess all committers will have to use the same
  Linux/FreeBSD/whatever distribution then?
  AFAICT there's no requirement regarding the mercurial version used by
  committers either.
 
 It should be sufficient to install autoconf-x.y into /home/user/bin or
 something similar. Installing autoconf from source really takes about
 3 minutes.
FWIW this was also documented in the devguide, as of:
http://hg.python.org/devguide/rev/5432be4d4e1a

though that description was then removed in:
http://hg.python.org/devguide/rev/27be97280cff





Re: [Python-Dev] docs.python.org pointing to Python 3 by default?

2012-05-21 Thread David Malcolm
On Fri, 2012-05-18 at 14:24 -0400, Barry Warsaw wrote:
 At what point should we cut over docs.python.org to point to the Python 3
 documentation by default?  Wouldn't this be an easy bit to flip in order to
 promote Python 3 more better?

If we do, perhaps we should revisit http://bugs.python.org/issue10446

http://hg.python.org/cpython/rev/b41404a3f7d4/ changed pydoc in the py3k
branch to direct people to http://docs.python.org/X.Y/library/ rather
than to http://docs.python.org/library/

This was applied to the 3.2 and 3.1 branches, but hasn't been backported
to any of the 2.* - so if docs.python.org starts defaulting to python 3,
it makes sense to backport that change to 2.*


Hope this is helpful
Dave




[Python-Dev] Highlighting reference-stealing APIs [was Re: cpython: Fix email post-commit review comments.]

2012-04-18 Thread David Malcolm
On Thu, 2012-04-19 at 10:48 +1200, Greg Ewing wrote:
 Antoine Pitrou wrote:
 
  (and here we see why reference-stealing APIs are a nuisance: because
  you never know in advance whether a function will steal a reference or
  not, and you have to read the docs for each and every C API call you
  make)
 
 Fortunately, they're very rare, so you don't encounter
 them often.
 
 Unfortunately, they're very rare, so you're all the more
 likely to forget about them and get bitten.
 
 Functions with ref-stealing APIs really ought to have
 a naming convention that makes them stand out and remind
 you to consult the documentation.
FWIW my refcount static analyzer adds various new compile-time
attributes to gcc:
http://gcc-python-plugin.readthedocs.org/en/latest/cpychecker.html#marking-functions-that-steal-references-to-their-arguments
so you can write declarations like these:

extern void bar(int i, PyObject *obj, int j, PyObject *other)
  CPYCHECKER_STEALS_REFERENCE_TO_ARG(2)
  CPYCHECKER_STEALS_REFERENCE_TO_ARG(4);

There's a similar attribute for functions that return borrowed
references:

  PyObject *foo(void)
CPYCHECKER_RETURNS_BORROWED_REF;

Perhaps we should add such attributes to the headers for Python 3.3?
(perhaps with a different naming convention?)

Hope this is helpful
Dave



Re: [Python-Dev] Virtualenv not portable from Python 2.7.2 to 2.7.3 (os.urandom missing)

2012-03-29 Thread David Malcolm
On Wed, 2012-03-28 at 18:22 +, Jason R. Coombs wrote:
 I see this was reported as a debian bug.
 http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=665776
 
 To reproduce, using virtualenv 1.7+ on Python 2.7.2 on Ubuntu, create
 a virtualenv. Move that virtualenv to a host with Python 2.7.3RC2
 yields:
 jaraco@vdm-dev:~$ /usr/bin/python2.7 -V
 Python 2.7.3rc2
 jaraco@vdm-dev:~$ env/bin/python -V
 Python 2.7.2
 jaraco@vdm-dev:~$ env/bin/python -c "import os; os.urandom()"
 Traceback (most recent call last):
   File "<string>", line 1, in <module>
 AttributeError: 'module' object has no attribute 'urandom'

It looks like this is a symptom of the move of urandom from os.py to
posixmodule et al.

At first glance, it looks like this specific hunk should be reverted:
http://hg.python.org/cpython/rev/a0f43f4481e0#l7.1
so that if you're running with the new stdlib but an old python binary
the combination can still have a usable os.urandom

Should this be tracked in bugs.python.org?

Hope this is helpful
Dave



Re: [Python-Dev] PEP 416: Add a frozendict builtin type

2012-02-29 Thread David Malcolm
On Wed, 2012-02-29 at 19:21 +0100, Victor Stinner wrote:
 As requested, I create a PEP and a related issue:
 
 http://www.python.org/dev/peps/pep-0416/

[...snip...]

 
 Rationale
 =
 
 A frozendict mapping cannot be changed, but its values can be mutable
 (not hashable). A frozendict is hashable and so immutable if all
 values are hashable (immutable).
The wording of the above seems very unclear to me.

Do you mean "A frozendict has a constant set of keys, and for every key,
d[key] has a specific value for the lifetime of the frozendict.
However, these values *may* be mutable.  The frozendict is hashable iff
all of the values are hashable."?  (or somesuch)
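For illustration only, a minimal pure-Python sketch of those semantics (this is not the PEP 416 implementation, which is proposed as a C type):

```python
# Fixed keys and values; hashable only if every value is hashable.
class FrozenDict:
    def __init__(self, *args, **kwargs):
        self._d = dict(*args, **kwargs)

    def __getitem__(self, key):
        return self._d[key]

    def __iter__(self):
        return iter(self._d)

    def __len__(self):
        return len(self._d)

    def __hash__(self):
        # frozenset() of the items raises TypeError if any value is
        # unhashable, matching "hashable iff all values are hashable".
        return hash(frozenset(self._d.items()))

fd = FrozenDict(a=1, b=2)
assert hash(fd) == hash(FrozenDict(b=2, a=1))

mutable = FrozenDict(a=[1, 2])  # allowed: values may be mutable...
try:
    hash(mutable)               # ...but then the frozendict isn't hashable
except TypeError:
    pass
```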

[...snip...]

  * Register frozendict has a collections.abc.Mapping
s/has/as/ ?

[...snip...]

 If frozendict is used to harden Python (security purpose), it must be
 implemented in C. A type implemented in C is also faster.

You mention security purposes here, but this isn't mentioned in the
Rationale or Use Cases

Hope this is helpful
Dave



Re: [Python-Dev] Counting collisions for the win

2012-01-21 Thread David Malcolm
On Fri, 2012-01-20 at 16:55 +0100, Frank Sievertsen wrote:
 Hello,
 
 I still see at least two ways to create a DOS attack even with the
 collison-counting-patch.

[snip description of two types of attack on the collision counting
approach]

 What to do now?
 I think it's not smart to reduce the number of allowed collisions 
 dramatically
 AND count all slot-collisions at the same time.

Frank: did you see the new approach I proposed in:
http://bugs.python.org/issue13703#msg151735
http://bugs.python.org/file24289/amortized-probe-counting-dmalcolm-2012-01-21-003.patch

(repurposes the ma_smalltable region of large dictionaries to add
tracking of each such dict's average iterations taken per modification,
and raise an exception when it exceeds a particular ratio)

I'm interested in hearing how it holds up against your various test
cases, or what flaws there are in it.

Thanks!
Dave



Re: [Python-Dev] Hash collision security issue (now public)

2012-01-05 Thread David Malcolm
On Thu, 2012-01-05 at 19:34 +0200, Maciej Fijalkowski wrote:
 On Thu, Jan 5, 2012 at 3:39 PM, Antoine Pitrou solip...@pitrou.net wrote:
  On Thu, 5 Jan 2012 15:26:27 +1100
  Andrew Bennetts and...@bemusement.org wrote:
 
  I don't think that's news either.
  http://mail.python.org/pipermail/python-dev/2003-May/035907.html and
  http://twistedmatrix.com/pipermail/twisted-python/2003-June/004339.html for
  instance show that in 2003 it was clearly known to at least be likely to 
  be an
  exploitable DoS in common code (a dict of HTTP headers or HTTP form keys).
 
  There was debate about whether it's the language's responsibility to 
  mitigate
  the problem or if apps should use safer designs for handling untrusted 
  input
  (e.g. limit the number of keys input is allowed to create, or use something
  other than dicts), and debate about just how practical an effective exploit
  would be.  But I think it was understood to be a real concern 8 years ago, 
  so
  not exactly sudden.
 
  That's not news indeed, but that doesn't make it less of a problem,
  especially now that the issue has been widely publicized through a
  conference and announcements on several widely-read Web sites.
 
  That said, only doing the security fix in 3.3 would have the nice side
  effect of pushing people towards Python 3, so perhaps I'm for it after
  all.
 
  Half-jokingly,
 
  Antoine.

 
 Just to make things clear - stdlib itself has 1/64 of tests relying on
 dict order. Changing dict order in *older* pythons will break
 everyone's tests and some peoples code. Making this new 2.6.x release
 would mean that people using new python 2.6 would have to upgrade an
 unspecified amount of their python packages, that does not sound very
 cool. Also consider that new 2.6.x would go as a security fix to old
 ubuntu, but all other packages won't, because they'll not contain
 security fixes. Just so you know

We have similar issues in RHEL, with the Python versions going much
further back (e.g. 2.3)

When backporting the fix to ancient python versions, I'm inclined to
turn the change *off* by default, requiring the change to be enabled via
an environment variable: I want to avoid breaking existing code, even if
such code is technically relying on non-guaranteed behavior.  But we
could potentially tweak mod_python/mod_wsgi so that it defaults to *on*.
That way /usr/bin/python would default to the old behavior, but web apps
would have some protection.   Any such logic here also suggests the need
for an attribute in the sys module so that you can verify the behavior.
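For reference, the mechanism later upstream releases (2.6.8/2.7.3, and 3.x) shipped is close to this proposal: the PYTHONHASHSEED environment variable (or the -R flag) controls randomization, and sys.flags.hash_randomization lets code verify the behavior. A sketch, for Python 3:

```python
import os
import subprocess
import sys

def hash_randomization_enabled(seed_setting):
    """Run a child interpreter with PYTHONHASHSEED=seed_setting and
    report whether hash randomization is active in it."""
    out = subprocess.check_output(
        [sys.executable, "-c",
         "import sys; print(sys.flags.hash_randomization)"],
        env=dict(os.environ, PYTHONHASHSEED=seed_setting),
    )
    return out.strip() != b"0"

print(hash_randomization_enabled("random"))  # True
print(hash_randomization_enabled("0"))       # False: fixed seed, old behavior
```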



Re: [Python-Dev] Hash collision security issue (now public)

2012-01-05 Thread David Malcolm
On Thu, 2012-01-05 at 20:35 +, Paul Moore wrote:
 On 5 January 2012 19:33, David Malcolm dmalc...@redhat.com wrote:
  We have similar issues in RHEL, with the Python versions going much
  further back (e.g. 2.3)
 
  When backporting the fix to ancient python versions, I'm inclined to
  turn the change *off* by default, requiring the change to be enabled via
  an environment variable: I want to avoid breaking existing code, even if
  such code is technically relying on non-guaranteed behavior.  But we
  could potentially tweak mod_python/mod_wsgi so that it defaults to *on*.
  That way /usr/bin/python would default to the old behavior, but web apps
  would have some protection.   Any such logic here also suggests the need
  for an attribute in the sys module so that you can verify the behavior.
 
 Uh, surely no-one is suggesting backporting to ancient versions? I
 couldn't find the statement quickly on the python.org website (so this
 is via google), but isn't it true that 2.6 is in security-only mode
 and 2.5 and earlier will never get the fix? Having a source-only
 release for 2.6 means the fix is off by default in the sense that
 you can choose not to build it. Or add a #ifdef to the source if it
 really matters.
Sorry if I was unclear.  I don't expect python-dev to do this
backporting, but those of us who do maintain such ancient pythons via
Linux distributions may want to do the backport for our users.  My email
was to note that it may make sense to pick more conservative defaults
for such a scenario, as compared to 2.6 onwards.

[snip]

Hope this is helpful
Dave



Re: [Python-Dev] Fwd: Anyone still using Python 2.5?

2011-12-21 Thread David Malcolm
On Wed, 2011-12-21 at 10:42 +0100, Charles-François Natali wrote:
  Do people still have to use this in commercial environments or is
  everyone on 2.6+ nowadays?
 
 RHEL 5.7 ships with Python 2.4.3. So no, not everybody is on 2.6+
 today, and this won't happen before a couple years.

(and RHEL 4.9 with Python 2.3.4, fwiw)




Re: [Python-Dev] Promoting Python 3 [was: PyPy 1.7 - widening the sweet spot]

2011-11-22 Thread David Malcolm
On Tue, 2011-11-22 at 09:13 -0800, Toshio Kuratomi wrote:
 On Wed, Nov 23, 2011 at 01:41:46AM +0900, Stephen J. Turnbull wrote:
  Barry Warsaw writes:
  
Hopefully, we're going to be making a dent in that in the next version of
Ubuntu.
  
  This is still a big mess in Gentoo and MacPorts, though.  MacPorts
  hasn't done anything about creating a transition infrastructure AFAICT.
  Gentoo has its eselect python set VERSION stuff, but it's very
  dangerous to set to a Python 3 version, as many things go permanently
  wonky once you do.  (So far I've been able to work around problems
  this creates, but it's not much fun.)  I have no experience with this
  in Debian, Red Hat (and derivatives) or *BSD, but I have to suspect
  they're no better.  (Well, maybe Red Hat has learned from its 1.5.2
  experience! :-)
  
 For Fedora (and currently, Red Hat is based on Fedora -- a little more about
 that later, though), we have parallel python2 and python3 stacks.  As time
 goes on we've slowly brought more python-3 compatible modules onto the
 python3 stack (I believe someone had the goal a year and a half ago to get
 a complete pylons web development stack running on python3 on Fedora which
 brought a lot of packages forward).

FWIW, current status of Fedora's Python 3 stack can be seen here:
  http://fedoraproject.org/wiki/Python3
and that page may be of interest to other distributions - I know of at
least one other distribution that's screen-scraping it ;)



Re: [Python-Dev] PEP 393 close to pronouncement

2011-09-26 Thread David Malcolm
On Tue, 2011-09-27 at 00:19 +0200, Victor Stinner wrote:
 Hi,
 
 Le lundi 26 septembre 2011 23:00:06, Guido van Rossum a écrit :
  So, if you have the time, please review PEP 393 and/or play with the
  code (the repo is linked from the PEP's References section now).

 
 PEP
 ===

 GDB Debugging Hooks: It's not done yet.
I can do these if need be, but IIRC you (Victor) said on #python-dev
that you were already working on them.



Re: [Python-Dev] is anyone using Misc/RPM?

2011-06-13 Thread David Malcolm
On Sun, 2011-06-12 at 15:37 -0500, Benjamin Peterson wrote:
 If no one is using it, I'd like to delete it. I also don't think we
 should be in business of distributing distribution specific files.

FWIW, Fedora and RHEL don't use this particular .spec file; we roll our
own.

I can't speak for all of the other RPM-using distributions, of course.



Re: [Python-Dev] cpython: Remove some extraneous parentheses and swap the comparison order to

2011-06-07 Thread David Malcolm
On Tue, 2011-06-07 at 11:03 +0200, M.-A. Lemburg wrote:
 Georg Brandl wrote:
  On 06/07/11 05:20, brett.cannon wrote:
  http://hg.python.org/cpython/rev/fc282e375703
  changeset:   70695:fc282e375703
  user:Brett Cannon br...@python.org
  date:Mon Jun 06 20:20:36 2011 -0700
  summary:
Remove some extraneous parentheses and swap the comparison order to
  prevent accidental assignment.
 
  Silences a warning from LLVM/clang 2.9.
  
  Swapping the comparison order here seems a bit inconsistent to me. There are
  lots of others around (e.g. len == 0 in the patch context below). Why is
  this one so special?
 
  I think that another developer even got told off once for these kinds of
  comparisons.
 
  I hope the Clang warning is only about the parentheses.
 
 I agree with Georg: if ('u' == typecode) is not well readable,
 since you usually put the variable part on the left and the constant
 part on the right of an equal comparison.

[FWIW, I'm one of the reprobates who like to put the constant on the
LHS when I'm coding in C, but I see I'm in the minority here]

I know that this style is unpopular, but if it helps, try mentally
pronouncing "==" in C as "is the value of".

In this example, when I read that line, my mind is thinking:
 
   if 'u' is the value of typecode

After ~12 years of doing this, it comes naturally.  I appreciate that
this may come across as weird though :)

[snip]

Hope this is helpful
Dave



Re: [Python-Dev] Make test failed issues for phyton 3.2 on centos5.5

2011-04-11 Thread David Malcolm
On Mon, 2011-04-11 at 07:58 -0400, Tres Seaver wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1
 
 On 04/10/2011 08:00 PM, Stephen Yeng wrote:
 
  11 skips unexpected on linux2:
  test_bz2 test_dbm_gnu test_dbm_ndbm test_gzip test_readline
  test_ssl test_tcl test_tk test_ttk_guionly test_ttk_textonly
  test_zlib
 
 Looks like you are missing a bunch of development headers on the system
 (at the time Python's 'configure' was run).  E.g., on a Debian system,
 
  $ sudo apt-get install zlib1g-dev libbz2-dev libreadline-dev # etc

On RHEL 5 (and therefore presumably CentOS), the corresponding command
looks something like this:

sudo yum install \
  readline-devel openssl-devel gmp-devel \
  ncurses-devel gdbm-devel zlib-devel expat-devel \
  libGL-devel tk tix gcc-c++ libX11-devel glibc-devel \
  bzip2 tar findutils pkgconfig tcl-devel tk-devel \
  tix-devel bzip2-devel sqlite-devel \
  db4-devel \
  libffi-devel

You'll want to rerun configure after installing these dependencies.

FWIW neither the devguide nor
  http://docs.python.org/using/unix.html#building-python
seems to have a handy guide to how to install all useful build-time deps
on various distros.

I added something similar for PyPy here:
http://codespeak.net/pypy/dist/pypy/doc/getting-started-python.html#translating-the-pypy-python-interpreter
at the PyCon sprint.

Hope this is helpful
Dave



Re: [Python-Dev] AST Transformation Hooks for Domain Specific Languages

2011-04-08 Thread David Malcolm
On Fri, 2011-04-08 at 21:29 +1000, Nick Coghlan wrote:
 A few odds and ends from recent discussions finally clicked into
 something potentially interesting earlier this evening. Or possibly
 just something insane. I'm not quite decided on that point as yet (but
 leaning towards the latter).

I too am leaning towards the latter (I'm afraid my first thought was to
check the date on the email); as Michael said, I too don't think it
stands much of a chance in core.

 Anyway, without further ado, I present:
 
 AST Transformation Hooks for Domain Specific Languages
 ==

This reminds me a lot of Mython:
  http://mython.org/
If you haven't seen it, it's well worth a look.

My favourite use case for this kind of thing is having the ability to
embed shell pipelines into Python code, by transforming bash-style
syntax into subprocess calls (it's almost possible to do all this in
regular Python by overloading the | and > operators, but not quite).
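
For illustration only, here is a minimal sketch of that almost-possible
approach; the Cmd class and its methods are invented for this example,
not an existing library:

```python
import subprocess

# Hypothetical sketch: overload | and > so that a bash-like
# pipeline such as  cmd1 | cmd2 > file  expands to subprocess calls.
class Cmd:
    def __init__(self, *argv):
        self.argv = argv
        self.upstream = None

    def __or__(self, other):            # cmd1 | cmd2
        other.upstream = self
        return other

    def __gt__(self, path):             # pipeline > "file"
        with open(path, "wb") as out:
            return self._spawn(stdout=out).wait()

    def _spawn(self, stdout=subprocess.PIPE):
        # Recursively spawn upstream commands, wiring stdout to stdin.
        up = self.upstream._spawn() if self.upstream else None
        return subprocess.Popen(self.argv,
                                stdin=up.stdout if up else None,
                                stdout=stdout)

(Cmd("echo", "hello") | Cmd("tr", "a-z", "A-Z")) > "out.txt"
```

Note that the bare `>` works here only because Python's comparison
operators bind more loosely than `|`; redirections like `2>&1` have no
overloadable spelling, which is part of the "not quite".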

 Consider:
 
 # In some other module
 ast.register_dsl(dsl.sql, dsl.sql.TransformAST)

Where is this registered?   Do you have to import this other module
before importing the module using dsl.sql ?   It sounds like this is
global state for the interpreter.

 # In a module using that DSL

How is this usage expressed?  via the following line?

 import dsl.sql

I see the import dsl.sql here, but surely you have to somehow process
the import in order to handle the rest of the parsing.

This is reminiscent of the "from __future__" specialcasing in the
parser.  But from my understanding of CPython's Python/future.c, you
already have an AST at that point (mod_ty, from Python/compile.c).
There seems to be a chicken-and-egg problem with this proposal.

Though another syntax might read:

  from __dsl__ import sql

to perhaps emphasize that something magical is about to happen.

[...snip example of usage of a DSL, and the AST it gets parsed to...]

Where and how would the bytes of the file using the DSL get converted to
an in-memory tree representation?  

IIRC, manipulating AST nodes in CPython requires some care: the parser
has its own allocator (PyArena), and the entities it allocates have a
shared lifetime that ends when PyArena_Free occurs.
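
For comparison, the transform half of the proposal can already be
prototyped today in pure Python with the ast module (the parser-hook
registration is the missing piece); the SwapAdd transform below is a toy
stand-in for a real DSL translation:

```python
import ast

class SwapAdd(ast.NodeTransformer):
    """Toy 'DSL' transform: rewrite every a + b into a - b."""
    def visit_BinOp(self, node):
        self.generic_visit(node)        # transform children first
        if isinstance(node.op, ast.Add):
            node.op = ast.Sub()
        return node

tree = ast.parse("result = 10 + 3")
tree = ast.fix_missing_locations(SwapAdd().visit(tree))
ns = {}
exec(compile(tree, "<dsl>", "exec"), ns)
assert ns["result"] == 7                # the + was compiled as -
```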

 So there you are, that's the crazy idea. The stoning of the heretic
 may now commence :)

Or, less violently, take it to python-ideas?  (though I'm not subscribed
there, fwiw, make of that what you will)

One exciting aspect of this is that if someone changes the DSL file,
the meaning of all of your code changes from under you.  This may or may
not be a sane approach to software development :)

(I also worry what this means e.g. for people writing text editors,
syntax highlighters, etc; insert usual Alan Perlis quote about syntactic
sugar causing cancer of the semicolon)

Also, insert usual comments about the need to think about how
non-CPython implementations of Python would go about implementing such
ideas.

 Where this idea came from was the various discussions about make
 statement style constructs and a conversation I had with Eric Snow at
 Pycon about function definition time really being *too late* to do
 anything particularly interesting that couldn't already be handled
 better in other ways. Some tricks Dave Malcolm had done to support
 Python level manipulation of the AST during compilation also played a
 big part, as did Eugene Toder's efforts to add an AST optimisation
 step to the compilation process.

Like I said earlier, have a look at Mython

Hope this is helpful
Dave



Re: [Python-Dev] rXXX links in the bug tracker after the migration to Mercurial

2011-03-04 Thread David Malcolm
On Fri, 2011-03-04 at 18:17 +0100, Georg Brandl wrote:
 On 04.03.2011 13:59, Victor Stinner wrote:
  Hi,
  
  Does the bug tracker will continue to support rX links after the
  migration to Mercurial? 
 
 Yes.  They will link to http://hg.python.org/lookup/rX, which uses
 the conversion metadata to find the correct hg revision.

Are these destinations meant to work yet?

I just tried one of these:
  http://hg.python.org/lookup/r81488
and it's giving me an Internal Server Error

FWIW, the above is a commit to release26-maint:
  http://svn.python.org/view?view=revision&revision=81488


Thanks
Dave



Re: [Python-Dev] Support the /usr/bin/python2 symlink upstream

2011-03-03 Thread David Malcolm
On Wed, 2011-03-02 at 01:14 +0100, Martin v. Löwis wrote:
  I think a PEP would help, but in this case I would request that before
  the PEP gets written (it can be a really short one!) somebody actually
  go out and get consensus from a number of important distros. Besides
  Barry, do we have any representatives of distros here?
 
 Matthias Klose represents Debian, Dave Malcolm represents Redhat,
 and Dirkjan Ochtman represents Gentoo.

Current status within RHEL and Fedora:
  The python rpm package has:
- a /usr/bin/python, which is the system build of Python 2
- hardlinked with /usr/bin/python2.N (where N is the appropriate
minor release number; currently 2.7 for Fedora 14 onwards)
- a symlink /usr/bin/python2, pointing at /usr/bin/python

  There are a number of other rpm packages with names matching *py*,
which use the system build of Python 3

  There is a python3 package on Fedora 13 onwards with:
- a /usr/bin/python3, which is the system build of Python 3
- hardlinked with /usr/bin/python3.N (where N is the appropriate
minor release number; will be 3.2 as of Fedora 15)

  There are a number of add-on rpm packages containing 3rd-party Python 3
code with names of the form python3-*.

  Some more status on our pre-packaged Python 3 stack can be seen here:
 https://fedoraproject.org/wiki/Python3

  I've also added python-debug and python3-debug binaries,
containing --with-pydebug builds of the same code.
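
The hardlink/symlink distinction in that layout can be demonstrated with
a small sketch, using a scratch directory and a stub file in place of
the real /usr/bin and interpreter binary:

```python
import os
import tempfile

bindir = tempfile.mkdtemp()
python = os.path.join(bindir, "python")
with open(python, "w") as f:            # stub standing in for the binary
    f.write("#!/bin/sh\n")

# python2.7 is a hardlink: same inode, so the very same file
os.link(python, os.path.join(bindir, "python2.7"))
# python2 is a symlink pointing at "python"
os.symlink("python", os.path.join(bindir, "python2"))

assert os.stat(os.path.join(bindir, "python2.7")).st_ino == os.stat(python).st_ino
assert os.readlink(os.path.join(bindir, "python2")) == "python"
```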

On a related note, we have a number of scripts packaged across the
distributions with a shebang line that reads:
   #!/usr/bin/env python
which AIUI follows upstream recommendations.

There was a proposal to change these when packaging them to hardcode the
specific python binary:

https://fedoraproject.org/wiki/Features/SystemPythonExecutablesUseSystemPython
on the grounds that a packaged system script is expecting (and has been
tested against) a specific python build.

That proposal has not yet been carried out.  Ideally if we did this,
we'd implement it as a postprocessing phase within rpmbuild, rather
than manually patching hundreds of files.
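
A minimal sketch of such a postprocessing step (the function name is
invented; a real rpmbuild hook would run this over every packaged
script):

```python
import re

def fix_shebang(text, interpreter="/usr/bin/python2"):
    """Pin an env-based python shebang to a specific interpreter."""
    return re.sub(r"\A#!\s*/usr/bin/env\s+python\b",
                  "#!" + interpreter, text)

src = "#!/usr/bin/env python\nprint('hello')\n"
assert fix_shebang(src) == "#!/usr/bin/python2\nprint('hello')\n"
```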

Note that this would only cover shebang lines at the tops of scripts.

If a 3rd-party program launches python directly, that could fail, and I
don't see a convenient way of fixing every reference in all code in all
packages (without, say, running a SystemTap script to monitor for
programs exec-ing /usr/bin/python)

For example, I wonder what the automake macro for detecting python would
make of a /usr/bin/python that's python 3:
  http://www.gnu.org/software/hello/manual/automake/Python.html
I've seen a few hand-coded makefiles for Python extension modules that
were broken by the SOABI changes in PEP 3149.  To be fair, those
makefiles were badly written, but I think that changing the meaning
of /usr/bin/python would break a lot of things.

FWIW, I don't see the harm in providing a /usr/bin/python2 symlink, but
I don't plan to change /usr/bin/python at this time.


Hope this is helpful
Dave




Re: [Python-Dev] Support the /usr/bin/python2 symlink upstream

2011-03-03 Thread David Malcolm
On Thu, 2011-03-03 at 14:17 -0500, David Malcolm wrote:
 On Wed, 2011-03-02 at 01:14 +0100, Martin v. Löwis wrote:

   There are a number of other rpm packages with names matching *py*,
 which use the system build of Python 3
Gah; I meant Python 2 here.


(Must proofread my screeds before posting them)



Re: [Python-Dev] PEP 393: Flexible String Representation

2011-01-24 Thread David Malcolm
On Mon, 2011-01-24 at 21:17 +0100, Martin v. Löwis wrote:

... snip ...

 I'd like to propose PEP 393, which takes a different approach,
 addressing both problems simultaneously: by getting a flexible
 representation (one that can be either 1, 2, or 4 bytes), we can
 support the full range of Unicode on all systems, but still use
 only one byte per character for strings that are pure ASCII (which
 will be the majority of strings for the majority of users).

There was some discussion about this at PyCon 2010, where we referred to
it casually as Pay-as-you-go unicode

... snip ...

 - str: shortest-form representation of the unicode string; the lower
   two bits of the pointer indicate the specific form:
   01 = 1 byte (Latin-1); 11 = 2 byte (UCS-2); 11 = 4 byte (UCS-4);
Repetition of "11"; I'm guessing that the 2-byte/UCS-2 case should read "10",
so that they give the width of the char representation.

   00 = null pointer

Naturally this assumes that all pointers are at least 4-byte aligned (so
that they can be masked off).  I assume that this is sane on every
platform that Python supports, but should it be spelled out explicitly
somewhere in the PEP?
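
The tagging scheme being described can be sketched in Python, with plain
integers standing in for 4-byte-aligned pointers (the kind values follow
the PEP excerpt above, assuming "10" was intended for the 2-byte form):

```python
KIND_1BYTE, KIND_2BYTE, KIND_4BYTE = 0b01, 0b10, 0b11

def tag(ptr, kind):
    # Only works because allocations are at least 4-byte aligned,
    # leaving the two low bits of the pointer free for the kind code.
    assert ptr % 4 == 0
    return ptr | kind

def untag(tagged):
    return tagged & ~0b11, tagged & 0b11    # (pointer, kind)

addr = 0x7F3A00001000          # made-up, suitably aligned address
t = tag(addr, KIND_2BYTE)
assert untag(t) == (addr, KIND_2BYTE)
```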

 
   The string is null-terminated (in its respective representation).
 - hash, state: same as in Python 3.2
 - utf8_length, utf8: UTF-8 representation (null-terminated)
If this is to share its buffer with the str representation for the
Latin-1 case, then I take it this ptr will typically be (str & ~4) ?
i.e. only str has the low-order-bit type info.

 - wstr_length, wstr: representation in platform's wchar_t
   (null-terminated). If wchar_t is 16-bit, this form may use surrogate
   pairs (in which case wstr_length differs from length).
 
 All three representations are optional, although the str form is
 considered the canonical representation which can be absent only
 while the string is being created.

Spelling out the meaning of optional:
  does this mean that the relevant ptr is NULL; if so, if utf8 is null,
is utf8_length undefined, or is it some dummy value?  (i.e. is the
pointer the first thing to check before we know if utf8_length is
meaningful?); similar consideration for the wstr representation.


 The Py_UNICODE type is still supported but deprecated. It is always
 defined as a typedef for wchar_t, so the wstr representation can double
 as Py_UNICODE representation.
 
 The str and utf8 pointers point to the same memory if the string uses
 only ASCII characters (using only Latin-1 is not sufficient). The str
...though the ptrs are non-equal for this case, as noted above, as str
has an 0x1 typecode.

 and wstr pointers point to the same memory if the string happens to
 fit exactly to the wchar_t type of the platform (i.e. uses some
 BMP-not-Latin-1 characters if sizeof(wchar_t) is 2, and uses some
 non-BMP characters if sizeof(wchar_t) is 4).
 
 If the string is created directly with the canonical representation
 (see below), this representation doesn't take a separate memory block,
 but is allocated right after the PyUnicodeObject struct.

Is the idea to do pointer arithmetic when deleting the PyUnicodeObject
to determine if the ptr is in that location, and not delete it if it is,
or is there some other way of determining whether the pointers need
deallocating?  If the former, is this embedding an assumption that the
underlying allocator couldn't have allocated a buffer directly adjacent
to the PyUnicodeObject?  I know that GNU libc's malloc/free
implementation has gaps of two machine words between each allocation;
off the top of my head I'm not sure if the optimized Object/obmalloc.c
allocator enforces such gaps.

... snip ...

Extra section:

GDB Debugging Hooks
---
Tools/gdb/libpython.py contains debugging hooks that embed knowledge
about the internals of CPython's data types, including PyUnicodeObject
instances.  It will need to be slightly updated to track the change.

(I can do that change if need be; it shouldn't be too hard).



Hope this is helpful
Dave



Re: [Python-Dev] [Python-checkins] devguide: Add coredev.rst to the index.

2011-01-11 Thread David Malcolm
On Tue, 2011-01-11 at 21:56 +0100, brett.cannon wrote:
 brett.cannon pushed a2d0edc3420e to devguide:
 
 http://hg.python.org/devguide/rev/a2d0edc3420e
 changeset:   83:a2d0edc3420e
 tag: tip
 user:Brett Cannon br...@python.org
 date:Tue Jan 11 12:56:47 2011 -0800
 summary:
   Add coredev.rst to the index.
 
 files:
   faq.rst
   index.rst
 
 diff --git a/faq.rst b/faq.rst
 --- a/faq.rst
 +++ b/faq.rst
 @@ -41,9 +41,6 @@
  Repository  read-only  
 read-write
  --- -- 
 --
  PEPshttp://svn.python.org/projects/peps/trunk  
 svn+ssh://python...@svn.python.org/peps/trunk
 -2.7 http://svn.python.org/projects/python/branches/release27-maint 
 svn+ssh://python...@svn.python.org/python/branches/release27-maint
 -3.1 http://svn.python.org/projects/python/branches/release31-maint 
 svn+ssh://python...@svn.python.org/python/branches/release31-maint
 -3.2 http://svn.python.org/projects/python/branches/py3k
 svn+ssh://python...@svn.python.org/python/branches/py3k
  === == 
 ==
  
 
 diff --git a/index.rst b/index.rst

Was this removal of some of the SVN info from faq.rst an accident?

[...snip addition of coredev.rst to index.rst...]


Hope this is helpful
Dave



Re: [Python-Dev] Possible optimization for LOAD_FAST ?

2011-01-03 Thread David Malcolm
On Sun, 2011-01-02 at 19:18 -0800, Guido van Rossum wrote:
 On Sun, Jan 2, 2011 at 5:50 PM, Alex Gaynor alex.gay...@gmail.com wrote:
  No, it's singularly impossible to prove that any global load will be any 
  given
  value at compile time.  Any optimization based on this premise is wrong.
 
 True.
 
 My proposed way out of this conundrum has been to change the language
 semantics slightly so that global names which (a) coincide with a
 builtin, and (b) have no explicit assignment to them in the current
 module, would be fair game for such optimizations, with the
 understanding that the presence of e.g. len = len anywhere in the
 module (even in dead code!) would be sufficient to disable the
 optimization.
 
 But barring someone interested in implementing something based on this
 rule, the proposal has languished for many years.

Is there a PEP for this?
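
The hazard that rule works around can be shown in a few lines: any
module-level rebinding of a builtin name, live or dead, changes what
every function in the module sees.

```python
def measure(seq):
    return len(seq)           # global lookup: falls back to the builtin

assert measure([1, 2, 3]) == 3
len = lambda seq: -1          # module-level shadowing of the builtin
assert measure([1, 2, 3]) == -1
del len                       # lookup falls through to builtins again
assert measure([1, 2, 3]) == 3
```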

 
 FWIW, this is reminiscent of Fortran's rules for intrinsics (its
 name for builtins), which have a similar optimization behavior (except
 there the potential overrides that the compiler doesn't need to take
 into account are load-time definitions).

I've been attempting another way in:
  http://bugs.python.org/issue10399
using a new JUMP_IF_SPECIALIZABLE opcode.  This compares what a value
is against a compile-time prediction, branching to an optimized
implementation if the guess was correct.  I use this to implement
function-call inlining within the generated bytecode.
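
A Python-level caricature of that guard (the real opcode operates on
bytecode; the names here are invented for illustration):

```python
def add(a, b):
    return a + b

PREDICTED_ADD = add              # prediction recorded at "compile" time

def call_site(x, y):
    if add is PREDICTED_ADD:     # JUMP_IF_SPECIALIZABLE-style guard
        return x + y             # inlined body of add()
    return add(x, y)             # deoptimized generic call

assert call_site(2, 3) == 5
add = lambda a, b: (a, b)        # rebinding invalidates the prediction
assert call_site(2, 3) == (2, 3) # guard fails; generic path taken
```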

Caveat-of-doom: That code's very much a work-in-progress at this stage,
though: sometimes it doesn't segfault :) and the way that I track the
predicted values is taking some other liberties with semantics (see that
URL and the dmalcolm-ast-optimization-branch in SVN).

(There's probably at least 2 PEPs in the above idea, though have yet to
write my first PEP)

Hope this is helpful
Dave



Re: [Python-Dev] PEP 11: Dropping support for ten year old systems

2010-12-07 Thread David Malcolm
On Tue, 2010-12-07 at 00:05 +0100, Martin v. Löwis wrote: 
  So by this policy, RHEL and SuSE users would be off worse than with
  my original proposal (10 years).
  
  Red Hat continues to provide patches for RHEL within the Extended Life
  Cycle (years 8, 9 and 10), but it's an optional add-on.
 
 My understanding is that you keep the patches available - but you
 don't produce any new ones, right?

It typically involves backporting (and testing) security fixes to the
older versions of the various OS packages.  Whether or not the results
of that work count as new patches is debatable.

I don't know if CentOS and the other people who rebuild the RHEL sources
track those final 3 years.

Dave




Re: [Python-Dev] PEP 11: Dropping support for ten year old systems

2010-12-06 Thread David Malcolm
On Mon, 2010-12-06 at 10:18 +0100, Martin v. Löwis wrote:
  EOL dates of prominent Linux distribution :
 
 I think I would need more information than that. Nick's proposal was
 more specific: when does the vendor stop producing patches? This is
 a clear criterion, and one that I support.
 
  RHEL:
  https://access.redhat.com/support/policy/updates/errata/
 
 My interpretation: Python support until end of production phase 3 (7 years).

(...)

 So by this policy, RHEL and SuSE users would be off worse than with
 my original proposal (10 years).

Red Hat continues to provide patches for RHEL within the Extended Life
Cycle (years 8, 9 and 10), but it's an optional add-on.

So another interpretation of the above with Nick's proposal could be 10
years on RHEL.  (though obviously I'm biased in favor of RHEL)

Approaching this from another angle: please do add me to the nosy on
any compatibility bugs with running latest python code on RHEL.  I'm
also looking into getting RHEL buildbot machines, FWIW.


  Considering the nature of the Fedora project, dropping unsupported fedora 
  distributions may or may not be helpful for Python and its users.
 
 Again, for Linux, I think the issue is somewhat less critical: in terms
 of portability and ABI stability, it seems like they manage best (i.e.
 we have least version-dependent code for Linux in Python, probably
 because a Linux version doesn't exist in the first place, so
 distributions must provide source and binary compatibility even
 across vendors, making such support across versions more easy).

The other compat issues are in the toolchain: e.g. very recent versions
of gcc .  In downstream Fedora, we tend to be amongst the first to run
into new compilation warnings (and, occasionally, exciting
code-generation bugs...)

But this tends to be the opposite kind of problem: beginning of life,
rather than end-of-life, and these sorts of things will need fixing for
every Linux build eventually.

FWIW, I'm trying to keep Fedora's system python 2 and python 3 builds
as up-to-date as reasonable, so Fedora users will (I hope) be running
fairly recent python code as is.  We have 2.7 as /usr/bin/python as of
F14, for instance.


Hope this is helpful
Dave



Re: [Python-Dev] Continuing 2.x

2010-10-29 Thread David Malcolm
On Fri, 2010-10-29 at 09:11 +0200, Antoine Pitrou wrote:
 On Fri, 29 Oct 2010 02:55:55 -0400
 Glyph Lefkowitz gl...@twistedmatrix.com wrote:
  
  Let's say that 20% of the code on PyPI is just junk;
  it's unfair to expect 100% of all code ever to get ported.  But,
 still:
  with this back-of-the-envelope estimate of the rate of porting, it
 will
  take over 50 years before a decisive majority of Python code is on
  Python 3.
 
 Well, no. A decisive majority would be much smaller than that. There
 are probably between 2% and 5% of the CheeseShop entries which are
 widely used dependencies. When these 2 to 5% all get ported, you have
 a
 decisive majority.
 
 Yes, perhaps more than 50% of 2.x code will never get ported. But,
 perhaps more than 50% of 1.5.2 code never got upgraded either. That
 doesn't make it any decisive; just dead (or pining for security fixes
 in some old rusty RedHat Enterprise Linux server, if you prefer).

Ouch!  Having spent much of the last week doublechecking fixes for CVEs
in the builds of python 2.2, 2.3 and 2.4 in the various older RHEL
releases, that cuts deep :)

Red Hat's security team monitors vulnerabilities in Python, and we do
continue to support these releases in the context of our products, even
though they're no longer supported by the wider Python development
community.  As with the security work done by python-dev on the more
up-to-date Python releases, it's tedious and painstaking work (we do
have customers paying us to do it, though)

If you have concerns about specific security flaws that may affect the
older releases of python that are no longer supported by python.org but
are within a product supported by Red Hat, please email
secal...@redhat.com

See:
https://www.redhat.com/security/team/contact/
for more information.

Hope this is helpful
Dave



Re: [Python-Dev] [patch] fpconst for python3

2010-10-20 Thread David Malcolm
(my apologies, if necessary, for top-posting)

FWIW Neal asked about this on Fedora's development mailing list as well:
http://lists.fedoraproject.org/pipermail/devel/2010-October/144535.html

If I'm reading:
  http://pypi.python.org/pypi/fpconst/
correctly, that project hasn't had an upstream update in over four
years, and the upstream home page seems to be down.

I believe that there are a number of python 2 modules that are either
mature or good enough, but need porting to work with python 3, and in
some of these cases, upstream may have either disappeared, or lost
interest.

I'm guessing that Neal has already made an effort to contact the
maintainer of fpconst.

IIRC, fpconst is a relatively simple module, and the fixes to port to
python 3 seem simple.
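
The porting changes involved are mostly of the bytes-vs-str kind; here
is a short Python 3 sketch of the endianness check fpconst performs (the
byte patterns are the IEEE-754 encodings of NaN and +Inf):

```python
import struct
import sys

# In Python 3, indexing a bytes object yields an int, so the check
# compares against 1 rather than the Python 2 one-char string '\x01'.
big_endian = struct.pack('i', 1)[0] != 1
assert big_endian == (sys.byteorder == 'big')

nan = struct.unpack('>d', b'\x7f\xf8\x00\x00\x00\x00\x00\x00')[0]
posinf = struct.unpack('>d', b'\x7f\xf0\x00\x00\x00\x00\x00\x00')[0]
assert nan != nan                # NaN never compares equal to itself
assert posinf == float('inf')
```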

So IMHO this is on-topic for python.org (if not necessarily this list),
in that we have an interest in timely porting of Python 2 code to Python
3, and there has been discussion here on how to encourage people to port
their code.

We don't want each Linux distribution to do different patches to port
the code to Python 3: there ought to be some kind of clearing-house for
this kind of simple porting, so that python 3 stacks will work
consistently across the different distributions.

Whether or not that's python-dev, I'm not sure. Perhaps:
  http://mail.python.org/mailman/listinfo/python-porting
is a better list.

FWIW you can see the current status of Python 3 porting within Fedora
here:
  https://fedoraproject.org/wiki/Python3#Porting_status

Hope this is helpful
Dave

On Tue, 2010-10-19 at 20:07 -0500, Benjamin Peterson wrote:
 fpconst developers?
 
 2010/10/19 Neal Becker ndbeck...@gmail.com:
  Where should I send this patch?
 
  diff -u fpconst-0.7.2/fpconst.py fpconst-0.7.2.new/fpconst.py
  --- fpconst-0.7.2/fpconst.py2005-02-24 12:42:03.0 -0500
  +++ fpconst-0.7.2.new/fpconst.py2010-10-19 20:55:07.407765664 -0400
  @@ -40,18 +40,18 @@
   ident = $Id: fpconst.py,v 1.16 2005/02/24 17:42:03 warnes Exp $
 
   import struct, operator
  +from functools import reduce
 
   # check endianess
  -_big_endian = struct.pack('i',1)[0] != '\x01'
  -
  +_big_endian = struct.pack('i',1)[0] != 1
   # and define appropriate constants
   if(_big_endian):
  -NaN= struct.unpack('d', '\x7F\xF8\x00\x00\x00\x00\x00\x00')[0]
  -PosInf = struct.unpack('d', '\x7F\xF0\x00\x00\x00\x00\x00\x00')[0]
  +NaN= struct.unpack('d', b'\x7F\xF8\x00\x00\x00\x00\x00\x00')[0]
  +PosInf = struct.unpack('d', b'\x7F\xF0\x00\x00\x00\x00\x00\x00')[0]
  NegInf = -PosInf
   else:
  -NaN= struct.unpack('d', '\x00\x00\x00\x00\x00\x00\xf8\xff')[0]
  -PosInf = struct.unpack('d', '\x00\x00\x00\x00\x00\x00\xf0\x7f')[0]
  +NaN= struct.unpack('d', b'\x00\x00\x00\x00\x00\x00\xf8\xff')[0]
  +PosInf = struct.unpack('d', b'\x00\x00\x00\x00\x00\x00\xf0\x7f')[0]
  NegInf = -PosInf
 
   def _double_as_bytes(dval):
 



Re: [Python-Dev] Pronouncement needed in issue9675

2010-09-28 Thread David Malcolm
On Tue, 2010-09-28 at 04:44 +0200, Jesus Cea wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1
 
 http://bugs.python.org/issue9675
 
 Long history sort: Python 2.7 backported Capsule support and
 (incorrectly, in my opinion) marked CObject as deprecated.
 
 All C modules in the stdlib were updated to Capsule (with a CObject
 compatibility layer), except BSDDB, because this change was done late in
 the cycle, the proposed patch was buggy (solvable) and a pronouncement
 was done that CObject was not actually deprecated.
 
 But in python 2.7 release, CObject is marked as deprecated (arg!), so
 when executing python with -We (mark warnings as errors), bsddb fails.
 
 Since I think that adopting Capsule in BSDDB for 2.7.1 would break the
 API compatibility (maybe the CObject proxy would solve this), and since
 a previous pronouncement was done abour CObject not-deprecated in 2.7.x,
 I would like comments.
 
 Long history and links to previous pronouncements in
 http://bugs.python.org/issue9675
 
 My proposal: CObject should not be marked as deprecated in 2.7.1.

This breaks quite a few third-party modules, some with segfaults; we (as
in Fedora) ran into this building with 2.7 as the
standard /usr/bin/python for Fedora 14.

We fixed some of them, but are advising people not to turn on warnings
as errors in our release notes [1]

See this tracker bug for a survey of the gory details:
https://bugzilla.redhat.com/showdependencytree.cgi?id=620842&hide_resolved=0

The list includes SWIG, numpy, and the GTK stack.

I'd be more than happy to patch Fedora's python builds to remove the
deprecation warning, but I don't want to stray too far from python.org's
official behavior here.

So, I'm +1 on removing this deprecation warning in 2.7, FWIW (though I'm
not a committer yet)

Hope this is helpful
Dave
[1]
https://fedoraproject.org/wiki/Features/Python_2.7#Caveat:_PyCObject_and_warnings



Re: [Python-Dev] Pronouncement needed in issue9675

2010-09-28 Thread David Malcolm
On Tue, 2010-09-28 at 11:18 -0400, David Malcolm wrote:
 On Tue, 2010-09-28 at 04:44 +0200, Jesus Cea wrote:
[snip]

  Long history and links to previous pronouncements in
  http://bugs.python.org/issue9675

Re-reading my post, I realize that my wording was really unclear; sorry.

  
  My proposal: CObject should not be marked as deprecated in 2.7.1.
 
 This breaks quite a few third-party modules, some with segfaults; we (as
 in Fedora) ran into this building with 2.7 as the
 standard /usr/bin/python for Fedora 14.

Restating: the _deprecation_ breaks quite a few 3rd-party modules, and
I'm in agreement with Jesus' proposal.

[snip]



Re: [Python-Dev] Internal counter to debug leaking file descriptors

2010-08-31 Thread David Malcolm

On Tue, 2010-08-31 at 17:40 +, exar...@twistedmatrix.com wrote:
 On 05:22 pm, gl...@twistedmatrix.com wrote:
 
 On Aug 31, 2010, at 10:03 AM, Guido van Rossum wrote:
 On Linux you can look somewhere in /proc, but I don't know that it
 would help you find where a file was opened.
 
 /dev/fd is actually a somewhat portable way of getting this 
 information.  I don't think it's part of a standard, but on Linux it's 
 usually a symlink to /proc/self/fd, and it's available on MacOS and 
 most BSDs (based on a hasty and completely-not-comprehensive 
 investigation).  But it won't help you find out when the FDs were 
 originally opened, no.
 
 On OS X and Solaris, dtrace and ustack will tell you exactly when and 
 where the FDs were originally opened, though.  On Linux, SystemTap might 
 give you the same information (but I know much less about SystemTap). 
 If http://bugs.python.org/issue4111 is resolved, then this may even be 
 possible without using a patched version of Python.

I believe you can do something like this:
$ cat /tmp/trace-all-syscalls.stp 
/*
  Watch all syscalls in a specified process, dumping a user-space
  backtrace 
*/
probe syscall.* {
  if (pid() == target()) {
  printf("%s(%s)\n", probefunc(), argstr)
  print_ubacktrace();
  }
}

$ sudo stap --ldd -d /usr/bin/python /tmp/trace-all-syscalls.stp -c python -c 
'print 42'

This generates a torrent of debug data like this:
sys_mmap_pgoff(0x0, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 
0)
 0x38f44e17aa : mmap64+0xa/0x30 [libc-2.11.90.so]
 0x38f44673fc : _IO_file_doallocate+0x7c/0x110 [libc-2.11.90.so]
 0x38f447498c : _IO_doallocbuf+0x2c/0x50 [libc-2.11.90.so]
 0x38f4472ef4 : _IO_file_underflow@@GLIBC_2.2.5+0x1b4/0x230 [libc-2.11.90.so]
 0x38f44749ce : _IO_default_uflow+0xe/0x30 [libc-2.11.90.so]
 0x38f446fdcb : getc+0xab/0xf0 [libc-2.11.90.so]
 0x39054f3e13 : r_long+0x23/0x120 [libpython2.6.so.1.0]
 0x39054f3f3b : PyMarshal_ReadLongFromFile+0x2b/0x30 [libpython2.6.so.1.0]
 0x39054f0661 : load_source_module+0x271/0x640 [libpython2.6.so.1.0]
 0x39054f1cc5 : import_submodule+0x155/0x300 [libpython2.6.so.1.0]
 0x39054f1f85 : load_next+0x115/0x2a0 [libpython2.6.so.1.0]
 0x39054f2592 : import_module_level+0x212/0x730 [libpython2.6.so.1.0]
 0x39054f3314 : PyImport_ImportModuleLevel+0x44/0xb0 [libpython2.6.so.1.0]
 0x39054d843f : builtin___import__+0x8f/0xa0 [libpython2.6.so.1.0]
 0x3905443f43 : PyObject_Call+0x53/0x100 [libpython2.6.so.1.0]
 0x39054d89b3 : PyEval_CallObjectWithKeywords+0x43/0xf0 [libpython2.6.so.1.0]
 0x39054db674 : PyEval_EvalFrameEx+0x21b4/0x65b0 [libpython2.6.so.1.0]
 0x39054e03a8 : PyEval_EvalCodeEx+0x938/0x9e0 [libpython2.6.so.1.0]
 0x39054e0482 : PyEval_EvalCode+0x32/0x40 [libpython2.6.so.1.0]
 0x39054f02c2 : PyImport_ExecCodeModuleEx+0xc2/0x1f0 [libpython2.6.so.1.0]
 0x39054f07a6 : load_source_module+0x3b6/0x640 [libpython2.6.so.1.0]


You may want to specify specific syscalls in the above to narrow the
scope.

Issue 4111 patches cpython to statically mark Python frame entry/exit so
that systemtap can directly instrument that; in Fedora 13 onwards I've
built Python with systemtap hooks so that you can add:

probe python.function.entry {
    printf("%s:%s:%d\n", filename, funcname, lineno);
}


(Arguably this is wrong, it's frame entry/exit, rather than function
entry/exit).

Potentially systemtap could be taught how to decipher/prettyprint Python
backtraces in a similar way to how gdb does it (by hooking into
PyEval_EvalFrameEx)
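Complementing the kernel-side tracing above, the /dev/fd (and /proc/self/fd) approach mentioned earlier in the thread can also be used from inside the process itself to spot descriptor leaks; a sketch:

```python
import os

def open_fds():
    """List this process's open file descriptors.

    Uses /proc/self/fd on Linux (the target of the /dev/fd symlink, as
    noted above), falling back to /dev/fd on macOS and most BSDs.
    """
    for d in ("/proc/self/fd", "/dev/fd"):
        if os.path.isdir(d):
            return sorted(int(n) for n in os.listdir(d))
    raise OSError("no fd listing directory available")

f = open(os.devnull)
print(f.fileno() in open_fds())  # → True
f.close()
```

This tells you *which* descriptors are open, though not where they were opened — for that you still need dtrace/SystemTap as discussed.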


Hope this is helpful
Dave


Re: [Python-Dev] regrtest oddify

2010-04-08 Thread David Malcolm
On Wed, 2010-04-07 at 18:20 -0500, Benjamin Peterson wrote:
 2010/4/7 Martin v. Löwis mar...@v.loewis.de:
  I have commented out all tests in test_gdb, yet
 
  http://www.python.org/dev/buildbot/all/builders/sparc%20Ubuntu%20trunk/builds/47/steps/test/logs/stdio
 
  still shows them being run. Can anybody explain that, please?
 
 That's because the buildbot only updated to the revision before your change.

FWIW I've attached a patch [1] to http://bugs.python.org/issue8330 which
I believe may fix the issues seen in that log.

Hope this is helpful
Dave

[1] http://bugs.python.org/file16808



Re: [Python-Dev] Packaging JIT-less versions of Python

2010-03-01 Thread David Malcolm
On Mon, 2010-03-01 at 15:35 -0800, Collin Winter wrote:
 Hey packaging guys,
 
 We recently committed a change to Unladen Swallow [1] that moves all
 the JIT infrastructure into a Python extension module. The theory [2]
 behind this patch was that this would make it easier for downstream
 packagers to ship a JIT-less Python package, with the JIT compiler
 available via an optional add-on package.
 
 Some questions for you, so we're not shooting blind here:
 - Have you guys thought about how a JIT-enabled Python 3 installation
 would be packaged by your respective distros?
I suspect that there may be architecture-specific issues both in getting
LLVM to work, and in getting the Python JIT code running (e.g. I've no
idea how well it works on, say, s390).  The answers may vary between CPU
architecture (e.g. on by default for i386 and x86_64, off by default on
ppc, or whatever).


 - Would you prefer the default python3.x package to have a JIT, or
 would you omit the JIT by default?

I'd be inclined to turn it on by default in Fedora's rawhide
development branch as soon as it's meaningful to test it (with a
suitable warning on our mailing list); we like to try this kind of
thing!  If no major issues are found, I'd leave it turned on.  I'd
expect to run into fun CPU-arch-specific bugs, though.  Multicore
64-bit big-endian springs to mind.

I can't speak for what I'd do for RHEL at this time.  It's likely to be
affected by how experience in Fedora goes, but goes through more formal
QA, and on more enterprisey hardware (e.g. s390).


 - How would you prefer to build the JIT-less package (current options:
 via a ./configure flag; or by deleting _llvmjit.so from the
 JIT-enabled package)?
 - Would the two packages be able to exist side-by-side, or would they
 be mutually exclusive?

I have a particular interest in ABI compatibility: if turning JIT on and
off is going to change the ABI of extension modules, that would be a
major pain, as I hope that we will have dozens of C extension modules
available via RPM for our Python 3 stack by the time of the great
unladen merger.

So I'm keen for the ability to toggle the JIT code in the face of bugs
and have it not affect ABI.  -Xjit will do this at runtime (once
that's renamed), but I think it would be useful to be able to toggle the
JIT on/off default during the build, so that I can fix a broken
architecture for non-technical users, but have individual testers opt
back in with -Xjit whilst tracking down a major bug.

In either case, I don't want to have to recompile 30 extension modules
to try with/without JIT; that would introduce too much change during
bug-hunts, and be no fun at all.


(In the blue-sky nirvana future, I'd love to be able to ship
ahead-of-time compiled versions of the stdlib, pre-optimized based on
realworld workloads.  Back in my reality, though, I have bugs to fix
before I can work on _that_ patch :( )


 My strong preference would be to have the JIT included by default so
 that it receives as much testing as possible.

Sounds reasonable.  Hope the above made sense and is useful.
Dave

 
 Thanks,
 Collin Winter
 
 [1] - http://code.google.com/p/unladen-swallow/source/detail?r=1110
 [2] - http://code.google.com/p/unladen-swallow/issues/detail?id=136



Re: [Python-Dev] PEP 3146: Merge Unladen Swallow into CPython

2010-01-27 Thread David Malcolm
On Thu, 2010-01-21 at 14:46 -0800, Jeffrey Yasskin wrote:
 On Thu, Jan 21, 2010 at 10:09 AM, Hanno Schlichting ha...@hannosch.eu wrote:
  I'm a relative outsider to core development (I'm just a Plone release
  manager), but'll allow myself a couple of questions. Feel free to
  ignore them, if you think they are not relevant at this point :-) I'd
  note that I'm generally enthusiastic and supportive of the proposal :)
  As a data point, I can add that all tests of Zope 2.12 / Plone 4.0 and
  their dependency set run fine under Unladen Swallow.
 
 Hi, thanks for checking that!
 
  On Wed, Jan 20, 2010 at 11:27 PM, Collin Winter collinwin...@google.com 
  wrote:
  We have chosen to reuse a set of existing compiler libraries called LLVM
  [#llvm]_ for code generation and code optimization.
 
  Would it be prudent to ask for more information about the llvm
  project? Especially in terms of its non-code related aspects. I can
  try to hunt down this information myself, but as a complete outsider
  to the llvm project this takes much longer, compared to someone who
  has interacted with the project as closely as you have.
 
  Questions like:
 

[snip]

  Managing LLVM Releases, C++ API Changes
  ---
 
  LLVM is released regularly every six months. This means that LLVM may be
  released two or three times during the course of development of a CPython 
  3.x
  release. Each LLVM release brings newer and more powerful optimizations,
  improved platform support and more sophisticated code generation.
 
  How does the support and maintenance policy of llvm releases look
  like? If a Python version is pegged to a specific llvm release, it
  needs to be able to rely on critical bug fixes and security fixes to
  be made for that release for a rather prolonged time. How does this
  match the llvm policies given their frequent time based releases?
 
 LLVM doesn't currently do dot releases. So, once 2.7 is released,
 it's very unlikely there would be a 2.6.1. They do make release
 branches, and they've said they're open to dot releases if someone
 else does them, so if we need a patch release for some issue we could
 make it ourselves. I recognize that's not ideal, but I also expect
 that we'll be able to work around LLVM bugs with changes in Python,
 rather than needing to change LLVM.

[snip]

(I don't think the following has specifically been asked yet, though
this thread has become large)

As a downstream distributor of Python, a major pain point for me is when
Python embeds a copy of a library's source code, rather than linking
against a system library (zlib, libffi and expat spring to mind): if
bugs (e.g. security issues) arise in a library, I have to go chasing
down all of the embedded copies of the library, rather than having
dynamic linking deal with it for me.

So I have some concerns about having a copy of LLVM embedded in Python's
source tree, which I believe other distributors of Python would echo;
our rough preference ordering is:

   dynamic linking > static linking > source code copy

I would like CPython to be faster, and if it means dynamically linking
against the system LLVM, that's probably OK (though I have some C++
concerns already discussed elsewhere in this thread).  If it means
statically linking, or worse, having a separate copy of the LLVM source
as an implementation detail of CPython, that would be painful.

I see that the u-s developers have been run into issues in LLVM itself,
and fixed them (bravo!), and seem to have done a good job of sending
those fixes back to LLVM for inclusion. [1]

Some questions for the U-S devs:
  - will it be possible to dynamically link against the system LLVM?
(the PEP currently seems to speak of statically linking against it)
  - does the PEP anticipate that the Python source tree will start
embedding a copy of the LLVM source tree?
  - if so, what can be done to mitigate the risk of drift from upstream?
(this is the motivation for some of the following questions)
  - to what extent do you anticipate further changes needed in LLVM for
U-S? (given the work you've already put in, I expect the answer is
probably a lot, but we can't know what those will be yet)
  - do you anticipate all of these changes being accepted by the
upstream LLVM maintainers?
  - to what extent would these changes be likely to break API and/or ABI
compat with other users of LLVM (i.e. would a downstream distributor of
CPython be able to simply apply the necessary patches to the system LLVM
in order to track? if they did so, would it require a recompilation of
all of the other users of the system LLVM?)
  - if Python needed to make a dot-release of LLVM, would LLVM allow
Python to increment the SONAME version identifying the ABI within the
DSO (.so) files, and guarantee not to reuse that SONAME version? (so
that automated ABI dependency tracking in e.g. RPM can identify the ABI
incompatibilities without being stomped on by a future upstream LLVM
release)
  - 

Re: [Python-Dev] PEP 3146: Merge Unladen Swallow into CPython

2010-01-27 Thread David Malcolm
On Wed, 2010-01-27 at 11:34 -0800, Jeffrey Yasskin wrote:
 On Wed, Jan 27, 2010 at 11:16 AM, Collin Winter collinwin...@google.com 
 wrote:
  We absolutely do not want CPython to include a copy of LLVM in its
  source tree. Unladen Swallow has done this to make it easier to pick
  up changes to LLVM's codebase as we make them, but this is not a
  viable model for CPython's long-term development. As mentioned in
  http://www.python.org/dev/peps/pep-3146/#managing-llvm-releases-c-api-changes,
  one of our full-time engineers is tasked with fixing all critical
  issues in LLVM before LLVM's 2.7 release so that CPython can simply
  use that release.
 
 I'm now tracking my to-do list for LLVM 2.7 in
 http://code.google.com/p/unladen-swallow/issues/detail?id=131.

Many thanks for addressing these concerns!

Dave


Re: [Python-Dev] Proposed downstream change to site.py in Fedora (sys.defaultencoding)

2010-01-21 Thread David Malcolm
On Thu, 2010-01-21 at 00:06 +0100, Martin v. Löwis wrote:
  Why only set an encoding on these streams when they're directly
  connected to a tty?
 
 If you are sending data to the terminal, you can be fairly certain
 that the locale's encoding should be used. It's a convenience feature
 for the interactive mode, so that Unicode strings print correctly.
 
 When sending data to a pipe or to a file, God knows what encoding
 should have been used. If it's any XML file (for example), using the
 locale's encoding would be incorrect, and the encoding declared
 in the XML declaration should be used (or UTF-8 if no declaration
 is included). If it's a HTTP socket, it really should be restricted
 to ASCII in the headers, and then to the content-type. And so on.
 
 So in general, the applications should arrange to set the encoding
 or encode themselves when they write to some output stream. If they
 fail to do so, it's a bug in the application, not in Python.
 
  I'll patch things to remove the isatty conditional if that's acceptable.
 
 It will make your Python release incompatible with everybody else's,
 and will probably lead to moji-bake. Otherwise, it's fine.

Thanks (everyone) for your feedback

It's clear that me unilaterally making this change is extremely
unpopular, so I'm no longer planning to do so: maintaining consistency
of behavior between different downstream distributions of CPython 2.* is
the most important concern here.

For reference I filed the tty patch as http://bugs.python.org/issue7745
(I don't seem to have rights to set it closed-rejected myself).


One of my concerns here is the change of behavior between Python
programs when run at a tty versus within a shell pipeline/cronjob/system
daemon/etc, which I know many people find to be a gotcha; I know many
developers who've been burned by this difference between
development/deployment (myself included).

I suspect I'm reinventing the wheel here, but one way of avoiding this
gotcha is to set PYTHONIOENCODING=ascii, to override the tty locale
setting.

Without this, these two cases have different behavior:
[da...@brick ~]$ python -c 'print u"\u03b1\u03b2\u03b3"'
αβγ

[da...@brick ~]$ python -c 'print u"\u03b1\u03b2\u03b3"' | less
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position
0-2: ordinal not in range(128)


With PYTHONIOENCODING=ascii, the two cases have the same behavior:
[da...@brick ~]$ PYTHONIOENCODING=ascii python -c 'print u"\u03b1\u03b2\u03b3"'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position
0-2: ordinal not in range(128)

[da...@brick ~]$ PYTHONIOENCODING=ascii python -c 'print u"\u03b1\u03b2\u03b3"' | less
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position
0-2: ordinal not in range(128)

(this is with my site.py reset back to the svn default)

So I think there's a case for suggesting that developers set
PYTHONIOENCODING=ascii in their environment, so ensure that attempts
to write unicode to a std stream using defaults will fail immediately
during the development cycle, rather than on deployment.   (Though,
alas, that will break the corresponding cases [1] for any python3
processes if they ever inherit that envvar).
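One way to sidestep the tty-versus-pipe difference entirely, as several replies in these threads suggest, is to encode explicitly rather than rely on the stream's default encoding. A sketch that works on both Python 2 and 3 (the helper name is mine, not a stdlib API):

```python
import sys

def write_text(stream, s, encoding="utf-8"):
    # Encoding explicitly means behaviour no longer depends on whether
    # the stream is a tty (locale encoding) or a pipe (ASCII on Python 2).
    data = s.encode(encoding)
    # Python 3 text streams wrap a binary buffer; Python 2 files take bytes.
    out = getattr(stream, "buffer", stream)
    out.write(data)

write_text(sys.stdout, u"\u03b1\u03b2\u03b3\n")  # prints αβγ, tty or not
```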

Hope this is helpful
Dave

[1] in that this works:
[da...@brick ~]$ python3.1 -c 'print("\u03b1\u03b2\u03b3")'
αβγ

but this (naive and contrived) invocation of python3 from python2 fails:
[da...@brick ~]$ PYTHONIOENCODING=ascii python2.6 -c "import os;
os.system('python3.1 -c \'print(\"\\u03b1\\u03b2\\u03b3\")\'')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position
0-2: ordinal not in range(128)


Re: [Python-Dev] PEP 3146: Merge Unladen Swallow into CPython

2010-01-21 Thread David Malcolm
On Wed, 2010-01-20 at 14:27 -0800, Collin Winter wrote:

[snip]

 At a high level, the Unladen Swallow JIT compiler works by translating a
 function's CPython bytecode to platform-specific machine code, using data
 collected at runtime, as well as classical compiler optimizations, to improve
 the quality of the generated machine code. Because we only want to spend
 resources compiling Python code that will actually benefit the runtime of the
 program, an online heuristic is used to assess how hot a given function is. 
 Once
 the hotness value for a function crosses a given threshold, it is selected for
 compilation and optimization. Until a function is judged hot, however, it runs
 in the standard CPython eval loop, which in Unladen Swallow has been
 instrumented to record interesting data about each bytecode executed. This
 runtime data is used to reduce the flexibility of the generated machine code,
 allowing us to optimize for the common case. For example, we collect data on
 
 - Whether a branch was taken/not taken. If a branch is never taken, we will 
 not
   compile it to machine code.
 - Types used by operators. If we find that ``a + b`` is only ever adding
   integers, the generated machine code for that snippet will not support 
 adding
   floats.
 - Functions called at each callsite. If we find that a particular ``foo()``
   callsite is always calling the same ``foo`` function, we can optimize the
   call or inline it away
 
 Refer to [#us-llvm-notes]_ for a complete list of data points gathered and how
 they are used.

[snip]
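A toy illustration of the hotness threshold quoted above — a call counter that flips a function into a "compiled" state once it crosses a threshold. The threshold value and the counting scheme here are assumptions for illustration; the real Unladen Swallow heuristic differs:

```python
import functools

HOTNESS_THRESHOLD = 10  # illustrative value, not Unladen Swallow's

def jit_candidate(func):
    """Count calls; mark the function 'hot' once it crosses the threshold."""
    stats = {"calls": 0, "compiled": False}

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        stats["calls"] += 1
        if not stats["compiled"] and stats["calls"] >= HOTNESS_THRESHOLD:
            stats["compiled"] = True  # a real JIT would emit machine code here
        return func(*args, **kwargs)

    wrapper.stats = stats
    return wrapper

@jit_candidate
def add(a, b):
    return a + b

for _ in range(12):
    add(1, 2)
print(add.stats)  # → {'calls': 12, 'compiled': True}
```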

To what extent would it be possible to use (conditionally) use full
ahead-of-time compilation as well as JIT?

With my downstream distributor of Python hat on, I'm wondering if it
would be feasible to replace the current precompiled .pyc/.pyo files in
marshal format with .so/.dll files in platform-specific shared-library
format, so that the pre-compiled versions of the stdlib could be
memory-mapped and shared between all Python processes on a system.  This
ought to dramatically reduce the whole-system memory load of the various
Python processes, whilst giving a reduction in CPU usage.  Distributors
of Python could build these shared libraries as part of the packaging
process, so that e.g. all of the Fedora python3 rpm packages would
contain .so files for every .py  (and this could apply to packaged
add-ons as well, so that every module you import would typically be
pre-compiled); startup of a python process would then involve
shared-readonly mmap-ing these files (which would typically be already
paged in if you're doing a lot of Python).

Potentially part of the memory bloat you're seeing could be debug data;
if that's the case, then the debug information could be stripped from
those .so files and shipped in a debuginfo package, to be loaded on
demand by the debugger (we do something like this in Fedora with our
RPMs for regular shared libraries and binaries).

(I wonder if to do this well would require adding annotations to the
code with hints about types to expect, since you'd have to lose the
run-time instrumentation, I think).

I did some research into the benefits of mmap-ing the data in .pyc files
to try to share the immutable data between them.  Executive summary is
that a (rather modest) saving of about 200K of heap usage per python
process is possible that way (with a rewrite of PyStringObject), with
higher savings depending on how many modules you import;  see:
http://dmalcolm.livejournal.com/4183.html

I'd expect to see this approach be more worthwhile when the in-memory
sizes of the modules get larger (hence this email).
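The page-sharing effect being described — immutable data mapped read-only so the OS can share physical pages between processes — can be sketched with the `mmap` module (the file and its contents here are purely illustrative):

```python
import mmap
import os
import tempfile

# Write some stand-in "immutable module data" to a file...
fd, path = tempfile.mkstemp()
os.write(fd, b"immutable module data " * 100)
os.close(fd)

# ...then map it read-only.  Every process that maps the same file this
# way shares the physical pages, rather than holding a private heap copy.
with open(path, "rb") as f:
    m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    header = m[:9]
    print(header)  # → b'immutable'
    m.close()
os.unlink(path)
```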

[snip]

Hope this is helpful
Dave


Re: [Python-Dev] PEP 3146: Merge Unladen Swallow into CPython

2010-01-21 Thread David Malcolm
On Thu, 2010-01-21 at 23:42 +0100, Martin v. Löwis wrote:
  With my downstream distributor of Python hat on, I'm wondering if it
  would be feasible to replace the current precompiled .pyc/.pyo files in
  marshal format with .so/.dll files in platform-specific shared-library
  format, so that the pre-compiled versions of the stdlib could be
  memory-mapped and shared between all Python processes on a system.
 
 I don't think replacing the byte code will be feasible, at least not
 without breaking compatibility (else f.func_code.co_code would stop
 working).

co_code would remain a PyObject* referencing a PyBytesObject instance.

 I also think you are overestimating the potential for sharing: much
 of what lives in pyc files are actual Python objects, which need to
 be reference-counted; doing this in a shared fashion is not feasible.
 
The struct PyObject instances themselves wouldn't be shared; my idea
(for 2.*) was to introduce a new ob_sstate value into PyStringObject
indicating a pointer into a shared memory area, so that this large
immutable data can be shared; something like this:

typedef struct {
PyObject_VAR_HEAD
long ob_shash;
int ob_sstate;
union {
  char ob_sval[1];
  char *ob_sdata;
};

...
} PyStringObject;

In Py3k the ob_sstate has gone away (from PyBytesObject), so another
approach would be needed (e.g. add an indirection to PyBytesObject).




Re: [Python-Dev] Proposed downstream change to site.py in Fedora (sys.defaultencoding)

2010-01-21 Thread David Malcolm
On Thu, 2010-01-21 at 22:21 +0100, Martin v. Löwis wrote:
  Where the default *file system encoding* is used (i.e. text files are
  written or read without specifying an encoding)
 
 I think you misunderstand the notion of the *file system encoding*.
 It is *not* a file encoding, but the file *system* encoding, i.e.
 the encoding for file *names*, not for file *content*.
 
 It was used on Windows for Windows 95; it is not used anymore on Windows
 (although it's still used on Unix).
 
 I think there are way too many specific cases where Python 3 will encode
 implicitly to get a complete list from the memory. If you really are
 after a complete list, you'll need to perform a thorough code review.
 For a few examples where some kind of default encoding is applied,
 consider XML and the dbm interfaces.

Thanks for the clarification.

To add to the fun, libraries accessed via wrapper modules may have
their own ideas about filename encodings as well.  For example, GTK's
GLib library uses environment variables G_FILENAME_ENCODING and
G_BROKEN_FILENAMES when converting between strings and OS calls [1].

Dave

[1] http://library.gnome.org/devel/glib/stable/glib-running.html


[Python-Dev] Proposed downstream change to site.py in Fedora (sys.defaultencoding)

2010-01-20 Thread David Malcolm
I'm thinking of making this downstream change to Fedora's site.py (and
possibly in future RHEL releases) so that the default encoding
automatically picks up the encoding from the locale:

 def setencoding():
     """Set the string encoding used by the Unicode implementation.  The
     default is 'ascii', but if you're willing to experiment, you can
     change this."""
     encoding = "ascii" # Default value set by _PyUnicode_Init()
-    if 0:
+    if 1:
         # Enable to support locale aware default string encodings.
         import locale
         loc = locale.getdefaultlocale()
         if loc[1]:
             encoding = loc[1]
     if 0:
         # Enable to switch off string to Unicode coercion and implicit
         # Unicode to string conversion.
         encoding = "undefined"
     if encoding != "ascii":
         # On Non-Unicode builds this will raise an AttributeError...
         sys.setdefaultencoding(encoding) # Needs Python Unicode build !
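What the toggled branch would compute can be previewed without patching site.py at all, using the same py2-era locale API (the result depends on your environment, e.g. $LANG):

```python
import locale

# This is the value the patched site.py would pass to
# sys.setdefaultencoding(); under a UTF-8 locale it is e.g. 'UTF-8'.
loc = locale.getdefaultlocale()
encoding = loc[1] if loc[1] else "ascii"
print(encoding)
```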

I've written up extensive notes on the change and the history of the
issue here:
https://fedoraproject.org/wiki/Features/PythonEncodingUsesSystemLocale

Please let me know if there are any errors on that page!

The aim is to avoid strange behavior changes when running a script
within a shell pipeline/cronjob as opposed to at a tty (and to capture
some of the bizarre cornercases, for example, I found the behavior of
the pango/pygtk modules particularly surprising).

I mention it here as a heads-up about the change:
  - in case other distributions may want to do the same (or already do
so, though in my very brief survey no-one else seemed to), and
  - in case doing so breaks things in a way I'm not expecting; can
anyone see any flaws in my arguments?
  - in case other people find my notes on the issue useful

Hope this is helpful; can anyone see any potential problems with this
change?

Dave



Re: [Python-Dev] Proposed downstream change to site.py in Fedora (sys.defaultencoding)

2010-01-20 Thread David Malcolm
On Wed, 2010-01-20 at 22:37 +0100, M.-A. Lemburg wrote:
 David Malcolm wrote:
  I'm thinking of making this downstream change to Fedora's site.py (and
  possibly in future RHEL releases) so that the default encoding
  automatically picks up the encoding from the locale:
  
 def setencoding():
     """Set the string encoding used by the Unicode implementation.  The
     default is 'ascii', but if you're willing to experiment, you can
     change this."""
     encoding = "ascii" # Default value set by _PyUnicode_Init()
-    if 0:
+    if 1:
         # Enable to support locale aware default string encodings.
         import locale
         loc = locale.getdefaultlocale()
         if loc[1]:
             encoding = loc[1]
     if 0:
         # Enable to switch off string to Unicode coercion and implicit
         # Unicode to string conversion.
         encoding = "undefined"
     if encoding != "ascii":
         # On Non-Unicode builds this will raise an AttributeError...
         sys.setdefaultencoding(encoding) # Needs Python Unicode build !
  
  I've written up extensive notes on the change and the history of the
  issue here:
  https://fedoraproject.org/wiki/Features/PythonEncodingUsesSystemLocale
  
  Please let me know if there are any errors on that page!
  
  The aim is to avoid strange behavior changes when running a script
  within a shell pipeline/cronjob as opposed to at a tty (and to capture
  some of the bizarre cornercases, for example, I found the behavior of
  the pango/pygtk modules particularly surprising).
  
  I mention it here as a heads-up about the change:
- in case other distributions may want to do the same (or already do
  so, though in my very brief survey no-one else seemed to), and
- in case doing so breaks things in a way I'm not expecting; can
  anyone see any flaws in my arguments?
- in case other people find my notes on the issue useful
  
  Hope this is helpful; can anyone see any potential problems with this
  change?
 
 Yes: such a change is unsupported by Python. The code you are
 changing should really have been removed many releases ago -
 it was originally only intended to serve as basis for experimentation
 on choosing the right default encoding.
 
 The only supported default encodings in Python are:
 
  Python 2.x: ASCII
  Python 3.x: UTF-8
 
 If you change these, you are on your own and strange things will
 start to happen. The default encoding does not only affect
 the translation between Python and the outside world, but also
 all internal conversions between 8-bit strings and Unicode.

 Hacks like what's happening in the pango module (setting the
 default encoding to 'utf-8' by reloading the site module in
 order to get the sys.setdefaultencoding() API back) are just
 downright wrong and will cause serious problems since Unicode
 objects cache their default encoded representation.
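
For reference, the two supported defaults listed above can be checked directly; on any Python 3 build the value is pinned to UTF-8 and the setdefaultencoding() escape hatch is gone entirely (an illustrative check, not part of any proposed patch):

```python
import sys

# Illustrative check: Python 3 pins the default encoding to UTF-8 and
# removes sys.setdefaultencoding() entirely (site.py deletes it even in 2.x).
print(sys.getdefaultencoding())            # utf-8
print(hasattr(sys, "setdefaultencoding"))  # False
```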

Thanks for the feedback.

Note that pango isn't even doing the module reload hack; it's written in
C, and going in directly through the C API:
   PyUnicode_SetDefaultEncoding("utf-8");

I should mention that I've seen at least one C module in the wild that
exists merely to do this:

  #include <Python.h>
  void initutf8_please(void) {
     PyUnicode_SetDefaultEncoding("utf-8");
  }

so that the user could do "import utf8_please" at the top of their
scripts.

 If all you want to achieve is getting the encodings of
 stdout and stdin correctly setup for pipes, you should
 instead change the .encoding attribute of those (only).
Currently they are set up, but only when connected to a tty, which leads
to surprising behavior changes inside pipes/cronjobs (e.g. piping a
unicode string to less immediately breaks for code points above 127:
less is expecting locale-encoded bytes, but sys.stdout has encoding
ASCII).

Similarly:
[da...@brick ~]$ python -c "import sys; print sys.stdout.encoding"
UTF-8
[da...@brick ~]$ python -c "import sys; print sys.stdout.encoding" | cat
None

Why only set an encoding on these streams when they're directly
connected to a tty?  I'll patch things to remove the isatty conditional
if that's acceptable.
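
The behaviour I'm arguing for amounts to something like this sketch (hypothetical helper name, written in modern `io` style rather than as an actual patch): wrap the byte stream in the locale's preferred encoding whether or not it is a tty.

```python
import io
import locale

def locale_writer(byte_stream):
    # Hypothetical helper: encode with the locale's preferred encoding,
    # falling back to ASCII, regardless of whether byte_stream.isatty().
    enc = locale.getpreferredencoding(False) or "ascii"
    return io.TextIOWrapper(byte_stream, encoding=enc)

buf = io.BytesIO()
out = locale_writer(buf)
out.write("hello")
out.flush()
print(buf.getvalue())  # b'hello' under any ASCII-compatible locale
```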

(the tty-logic to do it appeared with the initial commit that added
locale-encoding support to sys.std[in|out], in sysmodule.c:
http://svn.python.org/view?view=rev&revision=32719
and was later moved from sysmodule.c to pythonrun.c:
http://svn.python.org/view?view=rev&revision=33817 
it later grew to affect stderr:
http://svn.python.org/view?view=rev&revision=43581
again, only if directly connected to a tty)

Dave



Re: [Python-Dev] Static analysis of CPython using coccinelle/spatch

2009-11-18 Thread David Malcolm
On Tue, 2009-11-17 at 13:03 -0800, Brett Cannon wrote:
 On Mon, Nov 16, 2009 at 12:27, David Malcolm dmalc...@redhat.com wrote:
  Has anyone else looked at using Coccinelle/spatch[1] on CPython source
  code?
[snip]

 Running the tool over the code base and reporting the found bugs would
 be appreciated.

Discounting the false positives, the only issue it finds in python
itself (trunk) is the deliberate mistake in Modules/xxmodule.c

I also ran it on a random sample of extension modules and found some
real bugs (only reported downstream so far, within Fedora's bug
tracker):
- DBus python bindings assume in one place that unsigned long is
32 bits wide: https://bugzilla.redhat.com/show_bug.cgi?id=538225
- MySQL-python assumes in one place that sizeof(int) ==
sizeof(long):
https://bugzilla.redhat.com/show_bug.cgi?id=538234
- rpm.ps.append() uses unrecognized 'N' format specifier:
https://bugzilla.redhat.com/show_bug.cgi?id=538218




[Python-Dev] 2to3 for .c code (was Static analysis of CPython using coccinelle/spatch)

2009-11-18 Thread David Malcolm
On Tue, 2009-11-17 at 19:45 -0500, Terry Reedy wrote:
 A.M. Kuchling wrote:
  On Mon, Nov 16, 2009 at 03:27:53PM -0500, David Malcolm wrote:
  Has anyone else looked at using Coccinelle/spatch[1] on CPython source
  code?
  
  For an excellent explanation of Coccinelle, see
  http://lwn.net/Articles/315686/.
 
 For those who have not looked, "Coccinelle" means ladybug (a bug-eating 
 bug ;-) in French. Its principal use is to take C code and a SmPL file 
 of high-level patch descriptions ("fixers", in 2to3 talk) and produce a 
 standard diff file. I wonder if this could be used to help people 
 migrate C extensions to 3.1, by developing a SmPL file with the needed 
 changes dictated by API changes. This is similar to its motivating 
 application to Linux. From
 
 http://coccinelle.lip6.fr/
 
 Coccinelle is a program matching and transformation engine which 
 provides the language SmPL (Semantic Patch Language) for specifying 
 desired matches and transformations in C code. Coccinelle was initially 
 targeted towards performing collateral evolutions in Linux. Such 
 evolutions comprise the changes that are needed in client code in 
 response to evolutions in library APIs, and may include modifications 
 such as renaming a function, adding a function argument whose value is 
 somehow context-dependent, and reorganizing a data structure. 
 
 As I understand it, the problem with C extensions and 3.1 is the current 
 lack of a collateral evolution tool like 2to3 for Python code.
Indeed; I think it may be possible to use Coccinelle for this.  

Here's a .cocci semantic patch to convert non-PyObject* dereferences of
an ob_type field to use Py_TYPE macro instead. 

@@
PyObject *py_obj_ptr;
type T; 
T non_py_obj_ptr;
@@
(
  py_obj_ptr->ob_type
|
- non_py_obj_ptr->ob_type
+ Py_TYPE(non_py_obj_ptr)
)


I was able to use this to generate the attached patch for the DBus
python bindings.  Note that it leaves dereferences of a PyObject*
untouched, and works inside sub-expressions.  (There's some noise at the
typedef of Server; I don't know why).

Hope this is helpful
Dave
diff --git a/_dbus_bindings/conn.c b/_dbus_bindings/conn.c
index c30f167..157218d 100644
--- a/_dbus_bindings/conn.c
+++ b/_dbus_bindings/conn.c
@@ -389,7 +389,7 @@ static void Connection_tp_dealloc(Connection *self)
 
 DBG("Connection at %p: freeing self", self);
 PyErr_Restore(et, ev, etb);
-(self->ob_type->tp_free)((PyObject *)self);
+(Py_TYPE(self)->tp_free)((PyObject *)self);
 }
 
 /* Connection type object === */
diff --git a/_dbus_bindings/libdbusconn.c b/_dbus_bindings/libdbusconn.c
index 9bd8def..c33bb7c 100644
--- a/_dbus_bindings/libdbusconn.c
+++ b/_dbus_bindings/libdbusconn.c
@@ -73,7 +73,7 @@ DBusPyLibDBusConnection_tp_dealloc(Connection *self)
 }
 
 PyErr_Restore(et, ev, etb);
-(self->ob_type->tp_free)((PyObject *) self);
+(Py_TYPE(self)->tp_free)((PyObject *) self);
 }
 
 PyTypeObject DBusPyLibDBusConnection_Type = {
diff --git a/_dbus_bindings/message.c b/_dbus_bindings/message.c
index a2c04c3..ee0cbd6 100644
--- a/_dbus_bindings/message.c
+++ b/_dbus_bindings/message.c
@@ -53,7 +53,7 @@ static void Message_tp_dealloc(Message *self)
 if (self->msg) {
     dbus_message_unref(self->msg);
 }
-self->ob_type->tp_free((PyObject *)self);
+Py_TYPE(self)->tp_free((PyObject *)self);
 }
 
 static PyObject *
diff --git a/_dbus_bindings/server.c b/_dbus_bindings/server.c
index 7fc4f70..ba39f74 100644
--- a/_dbus_bindings/server.c
+++ b/_dbus_bindings/server.c
@@ -40,7 +40,7 @@ typedef struct {
 PyObject *weaklist;
 
 PyObject *mainloop;
-} Server;
+}Server; 
 
 PyDoc_STRVAR(Server_tp_doc,
"A D-Bus server.\n"
@@ -429,7 +429,7 @@ static void Server_tp_dealloc(Server *self)
 
 DBG("Server at %p: freeing self", self);
 PyErr_Restore(et, ev, etb);
-(self->ob_type->tp_free)((PyObject *)self);
+(Py_TYPE(self)->tp_free)((PyObject *)self);
 }
 
 PyDoc_STRVAR(Server_disconnect__doc__,


[Python-Dev] Static analysis of CPython using coccinelle/spatch

2009-11-16 Thread David Malcolm
Has anyone else looked at using Coccinelle/spatch[1] on CPython source
code?

It's a GPL-licensed tool for matching semantic patterns in C source
code. It's been used on the Linux kernel for detecting and fixing
problems, and for autogenerating patches when refactoring
(http://coccinelle.lip6.fr/impact_linux.php).  Although it's implemented
in OCaml, it is scriptable using Python.

I've been experimenting with using it on CPython code, both on the core
implementation, and on C extension modules.

As a test, I've written a validator for the mini-language used by
PyArg_ParseTuple and its variants.  My code examines the types of the
variables passed as varargs, and attempts to check that they are
correct, according to the rules here
http://docs.python.org/c-api/arg.html (and in Python/getargs.c)

It can detect this old error (fixed in svn r34931):
buggy.c:12:socket_htons:Mismatching type of argument 1 in "i:htons":
expected "int *" but got "unsigned long *"

Similarly, it finds the deliberate error in xxmodule.c:
xxmodule.c:207:xx_roj:unknown format char in "O#:roj": '#'
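
At its core, such a checker is just a mapping from format units to the C argument types they require; here is a toy version (hypothetical, far simpler than the real validate.py, covering only a handful of format units):

```python
# Toy sketch of a PyArg_ParseTuple format validator: map a few format
# units to the C argument types they expect (hypothetical subset).
EXPECTED = {
    "i": "int *",
    "l": "long *",
    "s": "const char **",
    "O": "PyObject **",
}

def check(fmt, arg_types):
    """Yield (index, expected, actual) for each mismatched argument."""
    # Ignore the ":funcname" suffix; keep only units we know about.
    units = [c for c in fmt.split(":")[0] if c in EXPECTED]
    for i, (unit, actual) in enumerate(zip(units, arg_types), 1):
        if actual != EXPECTED[unit]:
            yield (i, EXPECTED[unit], actual)

# The old socketmodule bug: "i:htons" called with an unsigned long *.
print(list(check("i:htons", ["unsigned long *"])))
# [(1, 'int *', 'unsigned long *')]
```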

(Unfortunately, when run on the full source tree, I see numerous
messages, and as far as I can tell, the others are false positives)

You can see the code here:
http://fedorapeople.org/gitweb?p=dmalcolm/public_git/check-cpython.git;a=tree
and download using anonymous git in this manner:
git clone 
git://fedorapeople.org/home/fedora/dmalcolm/public_git/check-cpython.git

The .cocci file detects invocations of PyArg_ParseTuple and determines
the types of the arguments.  At each matching call site it invokes
python code, passing the type information to validate.py's
validate_types.

(I suspect it's possible to use spatch to detect reference counting
antipatterns; I've also attempted 2to3 refactoring of c code using
semantic patches, but so far macros tend to get in the way).

Alternatively, are there any other non-proprietary static analysis tools
for CPython?

Thoughts?
Dave
 
[1] http://coccinelle.lip6.fr/



Re: [Python-Dev] buildtime vs runtime in Distutils

2009-11-16 Thread David Malcolm
On Sun, 2009-11-15 at 12:42 +, Antoine Pitrou wrote:
 Tarek Ziadé ziade.tarek at gmail.com writes:
  
  This cannot work on all platforms, when our Makefile is not shipped
  with python but python-devel. (like Fedora)
 
 This practice is stupid anyway, because it means you have to install
 python-devel even to install pure Python packages with setuptools/distribute.
 Just ask Fedora, Mandriva and friends to change their packaging practice
 (Mandriva already has a bug open for that by the way).
 
 In Debian/Ubuntu, the Makefile is correctly part of the main Python package:
 
 $ dpkg -S /usr/lib/python2.5/config/Makefile
 python2.5: /usr/lib/python2.5/config/Makefile
For what it's worth, this is fixed in Fedora as of python-2.6.4-2.fc13 ;
the bug tracking it was:
https://bugzilla.redhat.com/show_bug.cgi?id=531901


