Re: [Python-Dev] (name := expression) doesn't fit the narrative of PEP 20

2018-04-28 Thread Cesare Di Mauro
Hi,

2018-04-28 5:08 GMT+02:00 Armin Rigo :

> Hi,
>
> On 26 April 2018 at 07:50, Raymond Hettinger
>  wrote:
> >> [Raymond Hettinger ]
> >>> After re-reading all the proposed code samples, I believe that
> >>> adopting the PEP will make the language harder to teach to people
> >>> who are not already software engineers.
> >
> > (...)
> >
> > Python is special, in part, because it is not one of those languages.
> > It has virtues that make it suitable even for elementary school children.
> > We can show well-written Python code to non-computer folks and walk
> > them through what it does without their brains melting (something I can't
> > do with many of the other languages I've used).  There is a virtue
> > in encouraging simple statements that read like English sentences
> > organized into English-like paragraphs, presenting itself like
> > "executable pseudocode".
>
> I must admit that when I heard about this PEP I thought "this April
> 1st joke was already done long ago".  I'm sorry to discover that, this
> time, it is not actually one.  Thank you, Raymond, for an unlikely
> attempt at reminding people what made Python so special---in your
> opinion, and mine.
>
>
> A bientôt,
>
> Armin.
>

Same feeling here. What I have really appreciated about Python for a long time is its
readability: the fact that I can usually read the code as English-like
sentences.

It was nice to see the "as" keyword used in the try/except construct, as
well as in the with one, instead of introducing yet another bunch of
symbols that would make the code harder to decode.
The same goes for the "if/else" ternary operator, which I read as "[give] x if
cond else y" instead of the cryptic "?:" of C-like languages. It was a
nice and wise design decision.
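A tiny sketch of that readability claim (hypothetical function and names; C's equivalent would be the terse `n % 2 == 0 ? "even" : "odd"`):

```python
def parity(n):
    # Reads as: "[give] 'even' if n is divisible by two, else 'odd'".
    return "even" if n % 2 == 0 else "odd"

print(parity(4), parity(7))  # even odd
```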

For similar reasons, I didn't and still don't like the @ operator for matrix
multiplication, because it doesn't convey any immediately useful information
that makes the meaning easier to decode. A "mul" binary operator would have
worked better, for example.

I hope that the Python core developers resist the temptation to introduce new
symbolic operators for new features: it's a cheap way to keep
backward compatibility, for sure, but if the price to pay is
readability, then I don't think it's worth it.

Regarding the assignment operator, I also find it a duplicate of the
assignment statement, and a bad one, since it's not very readable inside
expressions. To be more precise: why should we keep the latter, once the
former can do the same things (and more)? Then drop the assignment statement
and just keep the operator!

BTW, as a Pythonista I've also felt the need for some way to "bind"
values to names in certain contexts, but it's pretty much limited to
comprehensions, for obvious reasons I think.
I would have appreciated an "as" keyword restricted to such constructs, but
I don't see any value in extending it to arbitrary contexts, since we
already have the assignment statement, which works quite well and doesn't
introduce nasty side effects à la C-like languages.
So, IMO it's better to stay as we are, rather than introduce another kludge
into the language, if we cannot maintain good readability.
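For what it's worth, the comprehension-binding use case mentioned above is exactly what PEP 572 covers (spelled := rather than the "as" form I would have preferred). A minimal sketch with a hypothetical `expensive` function:

```python
def expensive(x):
    # Hypothetical stand-in for a costly computation.
    return x * x

data = [1, 2, 3, 4]

# Without a binding, expensive() runs twice per surviving element:
twice = [expensive(x) for x in data if expensive(x) > 4]

# With PEP 572's operator it runs once, bound to y inside the comprehension:
once = [y for x in data if (y := expensive(x)) > 4]

assert twice == once == [9, 16]
```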

Cheers,

Cesare
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Python startup time

2017-07-20 Thread Cesare Di Mauro
2017-07-21 4:52 GMT+02:00 Nick Coghlan :

> On 21 July 2017 at 12:44, Nick Coghlan  wrote:
> > We can separately measure the cost of unmarshalling the code object:
> >
> > $ python3 -m perf timeit -s "import typing; from marshal import loads; from importlib.util import cache_from_source; cache = cache_from_source(typing.__file__); data = open(cache, 'rb').read()[12:]" "loads(data)"
> > .
> > Mean +- std dev: 286 us +- 4 us
>
> Slight adjustment here, as the cost of locating the cached bytecode
> and reading it from disk should really be accounted for in each
> iteration:
>
> $ python3 -m perf timeit -s "import typing; from marshal import loads; from importlib.util import cache_from_source" "cache = cache_from_source(typing.__spec__.origin); data = open(cache, 'rb').read()[12:]; loads(data)"
> .
> Mean +- std dev: 337 us +- 8 us
>
> That will have a bigger impact when loading from spinning disk or a
> network drive, but it's fairly negligible when loading from a local
> SSD or an already primed filesystem cache.
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
>
Thanks for your tests, Nick. It's quite evident that the marshal code
cannot improve the situation, so I withdraw my proposal.

I took a look at the typing module, and there are some small things that
can be optimized, but they won't change the overall situation, unfortunately.
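What Nick measured can also be reproduced without touching the filesystem: a .pyc stores exactly this marshal payload after its header (12 bytes in the Python version used above; 16 bytes since 3.7). A small self-contained sketch:

```python
import marshal

# Compile a tiny module body and round-trip it through marshal, the same
# serialization used for the payload of a .pyc file.
source = "x = sum(range(10))"
code = compile(source, "<demo>", "exec")
data = marshal.dumps(code)    # what importlib writes after the .pyc header
loaded = marshal.loads(data)  # the unmarshalling cost measured above

namespace = {}
exec(loaded, namespace)
print(namespace["x"])  # 45
```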

Code execution can be improved. :) However, it requires a massive amount of
time experimenting...

Bests,
Cesare




Re: [Python-Dev] Python startup time

2017-07-20 Thread Cesare Di Mauro
2017-07-20 19:23 GMT+02:00 Victor Stinner <victor.stin...@gmail.com>:

> 2017-07-20 19:09 GMT+02:00 Cesare Di Mauro <cesare.di.ma...@gmail.com>:
> > I assume that Python loads compiled (.pyc and/or .pyo) from the stdlib.
> That's something that also influences the startup time (compiling source vs
> loading pre-compiled modules).
>
> My benchmark was "python3 -m perf command -- python3 -c pass": I don't
> explicitly remove .pyc files, I expect that Python uses prebuilt .pyc
> files from __pycache__.
>
> Victor
>

OK, that should be the best case.

An idea to improve the situation might be to find an alternative structure
for .pyc/.pyo files, one which allows their loading (not their execution, of
course) to be at least partially parallelized, or otherwise sped up. Maybe a
GSoC project for some student, if no core dev has time to investigate it.
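Any such restructuring would have to start from the current on-disk layout. A hedged sketch of inspecting it (header fields as defined by PEP 552 for Python 3.7+, where the header is 16 bytes):

```python
import importlib.util
import os
import py_compile
import struct
import tempfile

# Write a trivial module, compile it to a .pyc, and parse the header:
# magic number, flags, source mtime, source size (all little-endian).
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "demo.py")
    with open(src, "w") as fh:
        fh.write("VALUE = 42\n")
    pyc = py_compile.compile(src, cfile=os.path.join(tmp, "demo.pyc"))
    with open(pyc, "rb") as fh:
        header = fh.read(16)

magic, flags, mtime, size = struct.unpack("<4sLLL", header)
assert magic == importlib.util.MAGIC_NUMBER
print(flags, size)  # size is the source length in bytes
```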

Cesare



Re: [Python-Dev] Python startup time

2017-07-20 Thread Cesare Di Mauro
2017-07-19 16:26 GMT+02:00 Victor Stinner :

> 2017-07-19 15:22 GMT+02:00 Oleg Broytman :
> > On Wed, Jul 19, 2017 at 02:59:52PM +0200, Victor Stinner <
> victor.stin...@gmail.com> wrote:
> >> "Python is very slow to start on Windows 7"
> >> https://stackoverflow.com/questions/29997274/python-is-
> very-slow-to-start-on-windows-7
> >
> >However hard you are going to optimize Python you cannot fix those
> > "defenders", "guards" and "protectors". :-) This particular link can be
> > excluded from consideration.
>
> Sorry, I didn't read carefully each link I posted. Even for me knowing
> what Python does at startup, it's hard to explain why 3 people have
> different timing: 15 ms, 75 ms and 300 ms for example. In my
> experience, the following things impact Python startup:
>
> * -S option: loading or not the site module
> * Paths in sys.path: PYTHONPATH environment variable for example
> * .pth files files in sys.path
> * Python running in a virtual environment or not
> * Operating system: Python loads different modules at startup
> depending on the OS. Naoki INADA just removed _osx_support from being
> imported in the site module on macOS for example.
>
> My list is likely incomplete.
>
> In the performance benchmark suite, a controlled virtual environment
> is created to have a known set of modules. FYI running Python is a
> virtual environment is slower than "system" python which runs outside
> a virtual environment...
>
> Victor
>
Hi Victor,

I assume that Python loads compiled modules (.pyc and/or .pyo) for the
stdlib. That's something which also influences the startup time (compiling
source vs. loading pre-compiled modules).
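The difference is easy to measure in isolation; a minimal sketch (the exact ratio varies by machine and Python version, but loading pre-compiled bytecode skips parsing and code generation entirely):

```python
import marshal
import timeit

# A moderately sized fake module body.
source = "\n".join(f"def f{i}(x):\n    return x + {i}" for i in range(100))
code = compile(source, "<demo>", "exec")
payload = marshal.dumps(code)  # what a .pyc caches

t_compile = timeit.timeit(lambda: compile(source, "<demo>", "exec"), number=100)
t_load = timeit.timeit(lambda: marshal.loads(payload), number=100)

print(f"compile: {t_compile:.4f}s  loads: {t_load:.4f}s")
```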

Bests,
Cesare




Re: [Python-Dev] Micro-optimizations by adding special-case bytecodes?

2017-05-25 Thread Cesare Di Mauro
Hi Ben,

for what you're interested in, you might take a look at WPython 1.0 (
https://code.google.com/archive/p/wpython/downloads ) and 1.1 (
https://code.google.com/archive/p/wpython2/downloads ), but they cover a
lot of optimizations (as you can see from a brief look at the slides):
RETURN_CONST and fusing some opcodes for binary operations are only some of
them.
For this reason, it's also very difficult to micro-benchmark every single
change... :-/

Cheers,
Cesare



2017-05-25 16:28 GMT+02:00 Ben Hoyt :

> Thanks, Victor. That's very helpful. So RETURN_NONE (and probably
> RETURN_SMALL_CONST) are not worth it, based on your empirical tests. Your
> patch shows how (relatively) straight-forward it is to test out new opcodes.
>
> I'm still optimistic about the value of COMPARE_IS_NONE and
> COMPARE_IS_NOT_NONE, though. Mainly because I've done a quick expansion of
> LOAD_CONST(None) + COMPARE_OP and it's quite a bit more code and many more
> instructions than COMPARE_IS_NONE would be:
>
> LOAD_CONST(None)
> COMPARE_OP
> PyObject *value = ((PyTupleObject *)(consts))->ob_item[oparg];
> value->ob_refcnt++;
> *stack_pointer++ = value;
> FAST_DISPATCH();
> PyObject *right = *--stack_pointer;
> PyObject *left = stack_pointer[-1]
>
> // cmp_outcome(), presumably inlined
> int r = 0;
> switch (compare_oparg) {
> case PyCmp_IS:
> r = (left == right);
> break;
> case PyCmp_IS_NOT:
> r = (left != right);
> break;
> case ...
> }
> PyObject *res = r ? Py_True : Py_False;
> res->ob_refcnt++;
>
> if (--(left)->ob_refcnt == 0)
> _Py_Dealloc(left);
> if (--(right)->ob_refcnt == 0)
> _Py_Dealloc(right);
> stack_pointer[-1] = res;
> if (res == NULL)
> goto error;
>
> PREDICT(POP_JUMP_IF_FALSE);
> PREDICT(POP_JUMP_IF_TRUE);
> DISPATCH();
>
>
> COMPARE_IS_NONE
> PyObject* left = stack_pointer[-1];  // TOP()
> PyObject* res = (left == Py_None) ? Py_True : Py_False;
> res->ob_refcnt++;
> if (--(left)->ob_refcnt == 0)
> _Py_Dealloc(left);
> stack_pointer[-1] = res;  // SET_TOP(res)
> PREDICT(POP_JUMP_IF_FALSE);
> PREDICT(POP_JUMP_IF_TRUE);
> DISPATCH();
>
> You don't have to get the const arg, there are fewer increfs/decrefs, you
> skip a pop, you don't have to test res==NULL (because it's Py_True or
> Py_False, which are never NULL), and if there are separate COMPARE_IS_NONE
> and COMPARE_IS_NOT_NONE you don't have to switch on the compare arg (though
> I'm not sure if that part will be worth it).
>
> For reference, based on a grep, " is None" occurs 2737 times in the
> CPython source tree, and " is not None" 2010 times. And I know personally I
> often use them in loops as well is at the start of functions (for mutable
> default arg handling).
>
> Still, the performance proof will be in the pudding! I might hack these
> two opcodes together and test it at some point.
>
> -Ben
>
> On Thu, May 25, 2017 at 6:47 AM, Victor Stinner 
> wrote:
>
>> Hi Ben,
>>
>> I am not convinced that combining operations will have a significant
>> impact in term of performance. Mark Shanon implemented that in his HotPy
>> project.
>>
>> I proposed a RETURN_NONE opcode to combine LOAD_CONST with RETURN_VALUE.
>> The issue was rejected because I failed to show any speedup.
>>
>> https://bugs.python.org/issue28800
>>
>> I would be interested to restart/finish my registervm project to use
>> register-based bytecode. It allows to implement more optmisations and
>> reduce the number of instructions. In my experience, less instructions =
>> faster code.
>>
>> http://faster-cpython.readthedocs.io/registervm.html
>>
>> Mark's bytecode uses registers but also a stack.
>>
>> Victor
>>
>> Le 24 mai 2017 8:09 PM, "Ben Hoyt"  a écrit :
>>
>>> Hi folks,
>>>
>>> I was looking at some `dis` output today, and I was wondering if anyone
>>> has investigated optimizing Python (slightly) by adding special-case
>>> bytecodes for common expressions or statements involving constants?
>>>
>>> For example, I (and, based on a quick grep of the stdlib, many others)
>>> write "x is None" and "x is not None" very often. Or "return True" or
>>> "return None" or "return 1" and things like that. These all expand into two
>>> bytecodes, which seems pretty non-optimal (LOAD_CONST + COMPARE_OP or
>>> LOAD_CONST + RETURN_VALUE). It seems we could get an easy speedup for these
>>> common cases by adding a peephole optimization and some new opcodes (maybe
>>> COMPARE_IS_SMALL_CONST and RETURN_SMALL_CONST for these cases).
>>>
>>> I'm not proposing to do this yet, as I'd need to 

Re: [Python-Dev] Speeding up CPython 5-10%

2016-05-18 Thread Cesare Di Mauro
If you feel like I've attacked you, I apologize: that wasn't my intention.
Please don't take it personally: I only reported my honest opinion, although
on re-reading it looks too rude, and I'm sorry for that.

Regarding the post-bytecode optimization issues, they are mainly
represented by the constant folding code, which still lives in the peephole
stage. Once it's moved to the proper place (ASDL/AST), such issues with the
stack calculations disappear, whereas the remaining ones can be addressed by
fixing the current stackdepth_walk function.

And just to be clear, I've nothing against your code. I simply think that,
based on my experience, it doesn't fit in CPython.

Regards
Cesare

2016-05-18 18:50 GMT+02:00 <zr...@fastmail.com>:

> Your criticisms may very well be true. IIRC though, I wrote that pass
> because what was available was not general enough. The stackdepth_walk
> function made assumptions that, while true of code generated by the current
> cpython frontend, were not universally true. If a goal is to move this
> calculation after any bytecode optimization, something along these lines
> seems like it will eventually be necessary.
>
> Anyway, just offering things already written. If you don't feel it's
> useful, no worries.
>
>
> On Wed, May 18, 2016, at 11:35 AM, Cesare Di Mauro wrote:
>
> 2016-05-17 8:25 GMT+02:00 <zr...@fastmail.com>:
>
> In the project https://github.com/zachariahreed/byteasm I mentioned on
> the list earlier this month, I have a pass that computes stack usage
> for a given sequence of bytecodes. It seems to be a fair bit more
> aggressive than cpython. Maybe it's more generally useful. It's pure
> python rather than C though.
>
>
> IMO it's too big, resource hungry, and slower, even if you convert it to C.
>
> If you take a look at the current stackdepth_walk function which CPython
> uses, it's much smaller (not even 50 lines in simple C code) and quite
> efficient.
>
> Currently the problem is that it doesn't return the maximum depth of the
> tree; instead it updates the intermediate/current maximum, and *then* uses
> that for the subsequent calculations. So the depth grows artificially, as
> in the reported cases.
>
> It doesn't require a complete rewrite, just some time spent fine-tuning it.
>
> Regards
> Cesare
>
>
>


Re: [Python-Dev] Speeding up CPython 5-10%

2016-05-18 Thread Cesare Di Mauro
2016-05-17 8:25 GMT+02:00 :

> In the project https://github.com/zachariahreed/byteasm I mentioned on
> the list earlier this month, I have a pass that computes stack usage
> for a given sequence of bytecodes. It seems to be a fair bit more
> aggressive than cpython. Maybe it's more generally useful. It's pure
> python rather than C though.
>

IMO it's too big, resource hungry, and slower, even if you convert it to C.

If you take a look at the current stackdepth_walk function which CPython
uses, it's much smaller (not even 50 lines in simple C code) and quite
efficient.

Currently the problem is that it doesn't return the maximum depth of the
tree; instead it updates the intermediate/current maximum, and *then* uses
that for the subsequent calculations. So the depth grows artificially, as
in the reported cases.

It doesn't require a complete rewrite, just some time spent fine-tuning it.

Regards
Cesare


Re: [Python-Dev] Speeding up CPython 5-10%

2016-05-16 Thread Cesare Di Mauro
2016-05-16 17:55 GMT+02:00 Meador Inge <mead...@gmail.com>:

> On Sun, May 15, 2016 at 2:23 AM, Cesare Di Mauro <
> cesare.di.ma...@gmail.com> wrote:
>
>
>> Just one thing that comes to my mind: is the stack depth calculation
>> routine changed? It was suboptimal, and calculating a better number
>> decreases stack allocation, and increases the frame usage.
>>
>
> This is still a problem and came up again recently:
>
> http://bugs.python.org/issue26549
>
> -- Meador
>

I saw the last two comments on the issue: this is what I was talking about
(in particular, the issue opened by Armin applies).

However there's another case where the situation is even worse.

Let me show a small reproducer:

def test(self):
    for i in range(self.count):
        with self: pass

The stack size reported by Python 2.7.11:
>>> test.__code__.co_stacksize
6

Adding another with statement:
>>> test.__code__.co_stacksize
7

But unfortunately with Python 3.5.1 the problem is much worse:

>>> test.__code__.co_stacksize
10

And, with the second with statement added:

>>> test.__code__.co_stacksize
17

Here the situation is exacerbated by the fact that the WITH_CLEANUP
instruction of Python 2.x was split in two (WITH_CLEANUP_START and
WITH_CLEANUP_FINISH) in some Python 3 release.

I don't know why two different instructions were introduced, but IMO it's
better to have a single instruction which handles all the code finalization
of the with statement, at least in this case. If there are other scenarios
where two different instructions are needed, then ad-hoc instructions like
those can be used.
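The growth above can be reproduced on any CPython; the absolute numbers differ per version (and recent releases compute a tighter bound), but adding a with statement should never shrink the reported stack size. A small version-independent sketch:

```python
def stacksize(n_with):
    # Build a test() function whose for loop contains n_with with-statements,
    # then report the stack size the compiler computed for it.
    body = "".join("        with self: pass\n" for _ in range(n_with))
    src = "def test(self):\n    for i in range(self.count):\n" + body
    namespace = {}
    exec(compile(src, "<demo>", "exec"), namespace)
    return namespace["test"].__code__.co_stacksize

one, two = stacksize(1), stacksize(2)
print(one, two)  # e.g. 6 and 7 on 2.7.11; 10 and 17 on 3.5.1
```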

Regards,
Cesare


Re: [Python-Dev] Speeding up CPython 5-10%

2016-05-15 Thread Cesare Di Mauro
2016-02-01 17:54 GMT+01:00 Yury Selivanov :

> Thanks for bringing this up!
>
> IIRC wpython was about using "fat" bytecodes, i.e. using 64bits per
> bytecode instead of 8.


No, it used 16, 32, or 48 bits per opcode (1, 2, or 3 16-bit words).


> That allows to minimize the number of bytecodes, thus having some
> performance increase.  TBH, I don't think it was "significantly faster".
>

Please, take a look at the benchmarks, or compile it and check yourself. ;-)

> If I were to do some big refactoring of the ceval loop, I'd probably
> consider implementing a register VM.  While register VMs are a bit faster
> than stack VMs (up to 20-30%), they would also allow us to apply more
> optimizations, and even bolt on a simple JIT compiler.
>
> Yury


WPython was a hybrid VM: it supported both a stack-based and a
register-based approach.

I think that's needed, given the nature of Python, because you can have
operations with intermixed operands: constants, locals, globals, names.
It's quite difficult to handle all possible cases with a purely register-based VM.
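The intermixing is easy to see with dis: in a single expression the operands come from different namespaces (a local, a builtin, a global, a constant), which is what makes a pure register design awkward. A quick illustration:

```python
import dis

THRESHOLD = 10  # a module global

def f(items):
    # local (items), builtin (len), global (THRESHOLD), constant (1)
    return len(items) + THRESHOLD + 1

# Each operand kind gets its own LOAD_* opcode family.
ops = {instr.opname for instr in dis.get_instructions(f)}
print(sorted(ops))
```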

Regards,
Cesare


Re: [Python-Dev] Wordcode: new regular bytecode using 16-bit units

2016-05-15 Thread Cesare Di Mauro
2016-04-13 23:23 GMT+02:00 Victor Stinner :

> Hopefully, I don't expect 32-bit parameters in the wild, only 24-bit
> parameter for function with annotation.
>

I never found 32-bit parameters, and not even 24-bit ones. I think their
usage is about as rare as a complete planetary alignment. ;-)

That's why in WPython I supported only 8-, 16-, and 32-bit parameters
(the latter making for 6-byte instructions).

Regards,
Cesare


Re: [Python-Dev] Wordcode: new regular bytecode using 16-bit units

2016-05-15 Thread Cesare Di Mauro
2016-04-13 18:24 GMT+02:00 Victor Stinner :

> Demur Rumed proposes a different change to use a regular bytecode
> using 16-bit units: an instruction has always one 8-bit argument, it's
> zero if the instruction doesn't have an argument:
>
>http://bugs.python.org/issue26647
>
> According to benchmarks, it looks faster:
>
>   http://bugs.python.org/issue26647#msg263339
>
> IMHO it's a nice enhancement: it makes the code simpler. The most
> interesting change is made in Python/ceval.c:
>
> -if (HAS_ARG(opcode))
> -oparg = NEXTARG();
> +oparg = NEXTARG();
>
> This code is the very hot loop evaluating Python bytecode. I expect
> that removing a conditional branch here can reduce the CPU branch
> misprediction.
>

Correct. The old bytecode format wasn't very predictable for the CPU.

>
> Right now, ceval.c still fetchs opcode and then oparg with two 8-bit
> instructions. Later, we can discuss if it would be possible to ensure
> that the bytecode is always aligned to 16-bit in memory to fetch the
> two bytes using a uint16_t* pointer.
>
> Maybe we can overallocate 1 byte in codeobject.c and align manually
> the memory block if needed. Or ceval.c should maybe copy the code if
> it's not aligned?
>
> Raymond Hettinger proposes something like that, but it looks like
> there are concerns about non-aligned memory accesses:
>
>http://bugs.python.org/issue25823
>
> The cost of non-aligned memory accesses depends on the CPU
> architecture, but it can raise a SIGBUS on some arch (MIPS and
> SPARC?).
>
> Victor
>

It should not be a problem, since every PyObject is allocated with PyAlloc
(though I don't remember if that's the correct name), which AFAIK guarantees
a base 8-byte alignment.

So it's safe to use an unsigned int for keeping/referencing one word at a
time.

The only problem with such an approach is the processor endianness, but it
can be solved with proper macros (as I did with WPython).

Regards,
Cesare


Re: [Python-Dev] Wordcode v2

2016-05-15 Thread Cesare Di Mauro
2016-02-17 12:04 GMT+01:00 Antoine Pitrou :

> Demur Rumed  gmail.com> writes:
> > I've personally benchmarked this fork with positive results.
>
> I'm skeptical of claims like this. What did you benchmark exactly, and with
> which results?
> I don't think changing the opcode encoding per se will bring any large
> benefit...
>
> Regards
>
> Antoine.


With WPython I introduced several optimizations which improved execution
speed a lot (+25% with PyStone at the time, compared to CPython 2.6), but
most of the benefit came from the new opcode format.

Regards,
Cesare


Re: [Python-Dev] Wordcode v2

2016-05-15 Thread Cesare Di Mauro
2016-02-15 8:14 GMT+01:00 Andrew Barnert via Python-Dev <
python-dev@python.org>:

> Despite the name (and inspiration), my fork has very little to do with
> WPython. I'm just focused on simpler (hopefully = faster) fetch code; he
> started with that, but ended up going the exact opposite direction,
> accepting more complicated (and much slower) fetch code as a reasonable
> cost for drastically reducing the number of instructions. (If you double
> the 30% fetch-and-parse overhead per instruction, but cut the number of
> instructions to 40%, the net is a huge win.)


I don't know why you consider WPython's code for fetching the more
complicated instructions slower. On the contrary, I structured such
"superinstructions" to simplify their decoding. Arguments are decoded
exactly when they are needed, in order to reduce or completely avoid the
use of temporary variables to hold such values.

Can you provide some example to support your claim?

Regarding the WPython goal, it wasn't only about introducing simpler
instructions. As I wrote in my presentation, it's a hybrid VM:
stack- and register-based. I introduced a new instruction format for the
existing CPython instructions, which are now easier to fetch, decode, and
execute, and which provide better density too (for the most common case:
arguments with a maximum value/index of 255).
However, I also added superinstructions to pack in more "useful work";
they provide more code density and are primarily responsible for the
improved execution speed.

Regards,
Cesare


Re: [Python-Dev] Wordcode v2

2016-05-15 Thread Cesare Di Mauro
2016-02-15 1:20 GMT+01:00 Demur Rumed :

> Saw recent discussion:
> https://mail.python.org/pipermail/python-dev/2016-February/143013.html
>
> I remember trying WPython; it was fast. Unfortunately it feels it came at
> the wrong time when development was invested in getting py3k out the door.
>

Not only that. IMO the primary problem was that the "patch" was too big to
be reviewed. Unfortunately it was my first attempt and, having worked
alone, I introduced too many optimizations and (heavy) changes to the code.
An incremental approach would have worked better, although I believe that
such a drastic move from the consolidated bytecodes to the new wordcodes
would have met strong resistance anyway.


> It also had a lot of other ideas like *_INT instructions which allowed
> having oparg to be a constant int rather than needing to LOAD_CONST one.
>

This, specifically, was an experiment that I made with WPython 1.1, and I
recommend not following it. There are other, more general, ways to speed up
execution when dealing with integers.


Re: [Python-Dev] Speeding up CPython 5-10%

2016-05-15 Thread Cesare Di Mauro
2016-02-02 10:28 GMT+01:00 Victor Stinner :

> 2016-01-27 19:25 GMT+01:00 Yury Selivanov :
> > tl;dr The summary is that I have a patch that improves CPython
> performance
> > up to 5-10% on macro benchmarks.  Benchmarks results on Macbook Pro/Mac
> OS
> > X, desktop CPU/Linux, server CPU/Linux are available at [1].  There are
> no
> > slowdowns that I could reproduce consistently.
>
> That's really impressive, great job Yury :-) Getting non-negligible
> speedup on large macrobenchmarks became really hard in CPython.
> CPython is already well optimized in all corners.


It's been a long time since I last took a look at CPython (3.2), but if it
hasn't changed a lot, then there might be some corner cases still waiting
to be optimized. ;-)

Just one thing that comes to mind: has the stack depth calculation
routine changed? It was suboptimal, and calculating a better number
decreases stack allocation and improves frame usage.


> It looks like the
> overall Python performance still depends heavily on the performance of
> dictionary and attribute lookups. Even if it was well known, I didn't
> expect up to 10% speedup on *macro* benchmarks.
>

True, but it might be mitigated in some ways, at least for built-in types.
There are ideas about that, but they are a bit complicated to implement.

The problem is with functions like len, which IMO should become attribute
lookups ('foo'.len) or method calls ('foo'.len()). Then it'll be easier to
accelerate their execution with one of the above ideas.

However, such changes belong to Guido, who defines the language's
structure/philosophy. IMO something like len should be part of the
attributes exposed by an object: it's more "object-oriented". Whereas other
things, like open, file, sum, etc., are "general facilities".

Regards,
Cesare


Re: [Python-Dev] A new JIT compiler for a faster CPython?

2012-07-18 Thread Cesare Di Mauro
2012/7/18 Steven D'Aprano st...@pearwood.info

 WPython in particular seems to be very promising, and quite fast. I don't
 understand why it doesn't get more attention (although I admit I can't
 criticise, since I haven't installed or used it myself).


 http://www.pycon.it/media/stuff/slides/beyond-bytecode-a-wordcode-based-python.pdf


Yes, that was the reason that brought me to stop the project: lack of
interest from the Python community. But at the last EuroPython I had the
opportunity to talk to Guido, so I think that I can try to port WPython
(and check some ideas).

However, the fault with WPython was mine: it wasn't a simple patch, so it
was very difficult to review. My bad.

In the Java world, there are byte-code optimizers such as Soot, BLOAT and
 ProGuard which apparently can speed up Java significantly. As far as I can
 tell, in the Python world byte-code optimization is a severely neglected
 area. For good reason? No idea.

 --
  Steven


I think that the Python case is different. You can't spend too much time
optimizing the generated code, because the code is usually compiled at
execution time. Startup time is an issue for Python, and it is influenced
very much by the source-to-bytecode compilation.

Java is statically compiled and then executed, so you can afford better
optimizers before the code finally runs.

Regards,
Cesare


Re: [Python-Dev] A new JIT compiler for a faster CPython?

2012-07-17 Thread Cesare Di Mauro
2012/7/18 Victor Stinner victor.stin...@gmail.com

 I don't expect to run a program 10x faster, but I would be happy if I
 can run arbitrary Python code 25% faster.


If that's your target, you don't need to resort to a
bytecode-to-binary-equivalent compiler: WPython already gave similar
results with Python 2.6.

The idea behind it is that, using a hybrid stack/register VM, you'll spend
less time on the ceval loop's constant overhead (checking for events, GIL
release, etc.). That's because superinstructions aggregate several bytecodes
into a single wordcode, which requires only one decoding phase, avoids
many pushes/pops, and skips some unnecessary reference-count
increments/decrements. A better peephole optimizer is provided, and some
other optimizations as well.

There's also room for more optimizations. I have many ideas to improve both
WPython and the plain ceval loop. For example, at the last EuroPython sprint
I was working on a ceval optimization that gave about a 10% speed improvement
on the CPython 3.3 beta trunk (on my old MacBook Air, running the 32-bit
Windows 8 preview), but it still needs to be checked for correctness (I'm
spending much more time running and checking the standard tests than on
its implementation ;-)

In the end, I think that a lot can be done to improve the good old CPython
VM without resorting to a JIT compiler. Lack of time is the enemy...

Regards,
Cesare


 --

 Specialization / tracing JIT can be seen as another project, or at
 least added later.

 Victor
 ___
 Python-Dev mailing list
 Python-Dev@python.org
 http://mail.python.org/mailman/listinfo/python-dev
 Unsubscribe:
 http://mail.python.org/mailman/options/python-dev/cesare.di.mauro%40gmail.com

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Python 3 optimizations continued...

2011-09-01 Thread Cesare Di Mauro
2011/9/1 Ned Batchelder n...@nedbatchelder.com

 When the switchover to the new instruction format happens, what happens to
 sys.settrace() tracing?  Will it report the same sequence of line numbers?
  For a small but important class of program executions, this is more
 important than speed.

  --Ned


A simple solution: when tracing is enabled, the new instruction format will
never be executed (and information tracking is disabled as well).

Regards,
Cesare
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Python 3 optimizations continued...

2011-09-01 Thread Cesare Di Mauro
2011/9/1 Mark Shannon m...@hotpy.org

 Cesare Di Mauro wrote:

 2011/9/1 Ned Batchelder n...@nedbatchelder.com mailto:
 n...@nedbatchelder.com


When the switchover to the new instruction format happens, what
happens to sys.settrace() tracing?  Will it report the same sequence
of line numbers?  For a small but important class of program
executions, this is more important than speed.

 --Ned


 A simple solution: when tracing is enabled, the new instruction format
 will never be executed (and information tracking disabled as well).

  What happens if tracing is enabled *during* the execution of the new
 instruction format?
 Some sort of deoptimisation will be required in order to recover the
 correct VM state.

 Cheers,
 Mark.


Sure. I don't think that the regular ceval.c loop will be dropped when
executing the new instruction format, so we can intercept a change like
this using the "why" variable, for example, or something similar that is
normally used to break out of the regular loop.

Anyway, we need to take a look at the code.

Cheers,
Cesare
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Python 3 optimizations continued...

2011-08-31 Thread Cesare Di Mauro
2011/8/31 stefan brunthaler ste...@brunthaler.net

  I think that you must deal with big endianess because some RISC can't
 handle
  at all data in little endian format.
 
  In WPython I have wrote some macros which handle both endianess, but
 lacking
  big endian machines I never had the opportunity to verify if something
 was
  wrong.
 
 I am sorry for the temporal lapse of not getting back to this directly
 yesterday, we were just heading out for lunch and I figured it only
 out then but immediately forgot it on our way back to the lab...

 So, as I have already said, I evaluated my optimizations on x86
 (little-endian) and PowerPC 970 (big-endian) and I did not have to
 change any of my instruction decoding during interpretation. (The only
 nasty bug I still remember vividly was that while on gcc for x86 the
 data type char defaults to signed, whereas it defaults to unsigned on
 PowerPC's gcc.) When I have time and access to a PowerPC machine again
 (an ARM might be interesting, too), I will take a look at the
 generated assembly code to figure out why this is working. (I have
 some ideas why it might work without changing the code.)

 If I run into any problems, I'll gladly contact you :)

 BTW: AFAIR, we emailed last year regarding wpython and IIRC your
 optimizations could primarily be summarized as clever
 superinstructions. I have not implemented anything in that area at all
 (and have in fact not even touched the compiler and its peephole
 optimizer), but if parts my implementation gets in, I am sure that you
 could add some of your work on top of that, too.

  Cheers,
 --stefan


You're right. I took a look at our old e-mails and found more details
about your work. It's definitely not affected by processor endianness, so you
don't need any check: it just works, because you produce the new opcodes
in memory and consume them in memory as well.

Looking at your examples, I think that WPython-style wordcodes can be useful
only for the simplest ones. That's because superinstructions group
several actions together, which would need to be split back into simpler ones
by a tracing JIT/compiler like yours, if you want to keep it simple. You said
that last year you added about 400 specialized instructions on top of the usual
bytecodes; wordcodes would require quite a few more (which can compromise
performance on CPUs with small data caches).

So I think it's better to finish your work, with all tests passing, before
thinking about adding something on top (which, to me, sounds like a
machine-code JIT O:-)).

Regards,
Cesare
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Python 3 optimizations continued...

2011-08-31 Thread Cesare Di Mauro
2011/8/31 Guido van Rossum gu...@python.org

 On Tue, Aug 30, 2011 at 10:04 PM, Cesare Di Mauro
 cesare.di.ma...@gmail.com wrote:
  It isn't, because motivation to do something new with CPython vanishes,
 at
  least on some areas (virtual machine / ceval.c), even having some ideas
 to
  experiment with. That's why in my last talk on EuroPython I decided to
 move
  on other areas (Python objects).

 Cesare, I'm really sorry that you became so disillusioned that you
 abandoned wordcode. I agree that we were too optimistic about Unladen
 Swallow. Also that the existence of PyPy and its PR machine (:-)
 should not stop us from improving CPython.


I never stopped thinking about new optimizations. A lot can be done on
CPython, even without resorting to anything like a JIT.


 I'm wondering if, with your experience in creating WPython, you could
 review Stefan Brunthaler's code and approach (once he's put it up for
 review) and possibly the two of you could even work on a joint
 project?

 --
 --Guido van Rossum (python.org/~guido)



Yes, I can. I'll wait for Stefan to update his source (to at least Python 3.2),
as he intends to do, and for everything to be published, and then review the
code.

I also agree with you that right now it doesn't need to look state-of-the-art.
First make it work, then make it nicer. ;)

Regards,
Cesare
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Python 3 optimizations continued...

2011-08-30 Thread Cesare Di Mauro
2011/8/30 Antoine Pitrou solip...@pitrou.net


 Changing the bytecode width wouldn't make the interpreter more complex.


It depends on the kind of changes. :)

WPython introduced a very different intermediate code representation, which
required a big change to the peephole optimizer in the 1.0 alpha version.
In the 1.1 final I decided to move that code entirely into ast.c (mostly for
constant folding) and compile.c (for the usual peephole work: looking for
patterns to substitute with better ones), because I found it simpler
and more convenient.

In the end, leaving aside some new optimizations that I implemented along the
way, the interpreter code is a bit more complex.


 Some years ago we were waiting for Unladen Swallow to improve itself
 and be ported to Python 3. Now it seems we are waiting for PyPy to be
 ported to Python 3. I'm not sure how let's just wait is a good
 trade-off if someone proposes interesting patches (which, of course,
 remains to be seen).

 Regards

 Antoine.

It isn't, because the motivation to do something new with CPython vanishes, at
least in some areas (virtual machine / ceval.c), even when one has ideas to
experiment with. That's why in my last EuroPython talk I decided to move
to other areas (Python objects).

Regards

Cesare
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Python 3 optimizations continued...

2011-08-30 Thread Cesare Di Mauro
2011/8/30 Nick Coghlan ncogh...@gmail.com


 Yeah, it's definitely a trade-off - the point I was trying to make is
 that there *is* a trade-off being made between complexity and speed.

 I think the computed-gotos stuff struck a nice balance - the macro-fu
 involved means that you can still understand what the main eval loop
 is *doing*, even if you don't know exactly what's hidden behind the
 target macros. Ditto for the older opcode prediction feature and the
 peephole optimiser - separation of concerns means that you can
 understand the overall flow of events without needing to understand
 every little detail.

 This is where the request to extract individual orthogonal changes and
 submit separate patches comes from - it makes it clear that the
 independent changes *can* be separated cleanly, and aren't a giant
 ball of incomprehensible mud. It's the difference between complex
 (lots of moving parts, that can each be understood on their own and
 are then composed into a meaningful whole) and complicated (massive
 patches that don't work at all if any one component is delayed)

 Eugene Toder's AST optimiser work that I still hope to get into 3.3
 will have to undergo a similar process - the current patch covers a
 bit too much ground and needs to be broken up into smaller steps
 before we can seriously consider pushing it into the core.

 Regards,
 Nick.

Sometimes it cannot be done, because big changes produce big patches as
well.

I don't see a problem here if the code is well written (as required by
the Python community :) and the developer is available to talk about his
work and clear up any doubts.

Regards

Cesare
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Python 3 optimizations continued...

2011-08-30 Thread Cesare Di Mauro
2011/8/30 stefan brunthaler ste...@brunthaler.net

 Yes, indeed I have a more straightforward instruction format to allow
 for more efficient decoding. Just going from bytecode size to
 word-code size without changing the instruction format is going to
 require 8 (or word-size) times more memory on a 64bit system. From an
 optimization perspective, the irregular instruction format was the
 biggest problem, because checking for HAS_ARG is always on the fast
 path and mostly unpredictable. Hence, I chose to extend the
 instruction format to have word-size and use the additional space to
 have the upper half be used for the argument and the lower half for
 the actual opcode. Encoding is more efficient, and *not* more complex.
 Using profiling to indicate what code is hot, I don't waste too much
 memory on encoding this regular instruction format.

 Regards,
 --stefan

That sounds exactly like the WPython approach, although I used the new
wordcodes in place of the old bytecodes. Take a look at it. ;)
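The regular-decode idea can be sketched like this (a 16-bit wordcode as in WPython; Stefan's variant uses a full machine word with the argument in the upper half — the values here are illustrative):

```python
def encode(opcode, arg=0):
    """Pack a regular fixed-width instruction:
    argument in the upper half, opcode in the lower half."""
    assert 0 <= opcode < 256 and 0 <= arg < 256
    return (arg << 8) | opcode

def decode(word):
    # no HAS_ARG branch on the fast path: every instruction decodes the
    # same way, whether or not it actually uses its argument
    return word & 0xFF, word >> 8

word = encode(100, 5)   # a hypothetical opcode 100 with argument 5
print(decode(word))     # → (100, 5)
```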

Regards

Cesare
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Python 3 optimizations continued...

2011-08-30 Thread Cesare Di Mauro
2011/8/30 stefan brunthaler ste...@brunthaler.net

  Do I sense that the bytecode format is no longer platform-independent?
  That will need a bit of discussion. I bet there are some things around
  that depend on that.
 
 Hm, I haven't really thought about that in detail and for longer, I
 ran it on PowerPC 970 and Intel Atom  i7 without problems (the latter
 ones are a non-issue) and think that it can be portable. I just stuff
 argument and opcode into one word for regular instruction decoding
 like a RISC CPU, and I realize there might be little/big endian
 issues, but they surely can be conditionally compiled...

 --stefan

I think that you must deal with big endianness, because some RISC CPUs can't
handle little-endian data at all.

In WPython I wrote some macros which handle both byte orders, but lacking
a big-endian machine I never had the opportunity to verify whether anything
was wrong.

Regards

Cesare
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Python 3 optimizations continued...

2011-08-30 Thread Cesare Di Mauro
2011/8/31 Terry Reedy tjre...@udel.edu

 I find myself more comfortable with the Cesare Di Mauro's idea of expanding
 to 16 bits as the code unit. His basic idea was using 2, 4, or 6 bytes
 instead of 1, 3, or 6.


It can be extended to opcodes longer than 6 bytes, if needed. The format is
designed to be flexible enough to accommodate such changes painlessly.


 It actually tended to save space because many ops with small ints (which
 are very common) contract from 3 bytes to 2 bytes or from 9(?) (two
 instructions) to 6.


It can pack up to 4 (old) opcodes into one wordcode (superinstruction).
Wordcodes are designed to favor instruction grouping.


 I am sorry he was not able to followup on the initial promising results.


In a few words: lack of interest. Why spend (so much) time on a project
when you see that the community is oriented in other directions
(Unladen Swallow at first, then PyPy, given the former's substantial
decline)?

Also, Guido seems to dislike what he regards as hacks, and never showed
interest.

In WPython 1.1 I rolled back the hack that I had introduced in the PyObject
types (a couple of fields) in the 1.0 alpha, to make the code more polished
(but with a noticeable drop in performance). But again, I saw no interest
in WPython, so I decided to put a stop to it, shelving my initial idea of
moving to Python 3.


 The dis output was probably easier to read than the current output.

 Perhaps he made a mistake in combining the above idea with a shift from
 stack to hybrid stack+register design.

 --
 Terry Jan Reedy

As I already said, wordcodes are designed to favor grouping, so it was
quite natural to end up with a hybrid VM. Anyway, both space and performance
gained from this wordcode property. ;)

Regards

Cesare
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] CPython optimization: storing reference counters outside of objects

2011-05-24 Thread Cesare Di Mauro
2011/5/24 Stefan Behnel stefan...@behnel.de

 Maciej Fijalkowski, 24.05.2011 13:31:

  CPython was not designed for CPU cache usage as far as I'm aware.


  That's a pretty bold statement to make on this list. Even if it wasn't
 originally designed for (efficient?) CPU cache usage, it's certainly been
 around for long enough to have received numerous performance tweaks in that
 regard.

 Stefan


Maybe a change in memory allocation granularity can help here.

Raising it to 16 and 32 bytes on 32- and 64-bit systems respectively
guarantees that an access to ob_refcnt and/or ob_type will also bring into the
cache line some other data of the same object, which is usually needed
anyway (except for very simple objects, such as PyNone, PyEllipsis, etc.).

Think about a long, a tuple, a list, a dictionary, etc.: all of them have
critical data right after these fields, which will most likely be accessed
after the INCREF or the type check.
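The arithmetic behind the suggestion can be sketched as follows (the sizes are illustrative, not CPython's actual allocator constants):

```python
CACHE_LINE = 64  # bytes, typical for x86

def round_up(size, granularity):
    """Round an allocation size up to the next multiple of granularity."""
    return (size + granularity - 1) // granularity * granularity

# on a 32-bit build, an 8-byte header (ob_refcnt + ob_type) rounded up to a
# 16-byte granule shares its granule with the first payload bytes, so the
# load that touches the header also warms part of the object's body
assert round_up(8, 16) == 16
assert round_up(20, 16) == 32    # a 20-byte object occupies two granules
assert CACHE_LINE // 16 == 4     # four 16-byte granules per cache line
```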

Regards,
Cesare
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] CPython optimization: storing reference counters outside of objects

2011-05-22 Thread Cesare Di Mauro
2011/5/23 Martin v. Löwis mar...@v.loewis.de

  I'm not a compiler/profiling expert so the main question is if such
  design can work, and maybe someone was thinking about something
  similar?

 My expectation is that your approach would likely make the issues
 worse in a multi-CPU setting. If you put multiple reference counters
 into a contiguous block of memory, unrelated reference counters will
 live in the same cache line. Consequentially, changing one reference
 counter on one CPU will invalidate the cached reference counters of
 that cache line on other CPU, making your problem a) actually worse.

 Regards,
 Martin


I don't think that moving ob_refcnt to a separate memory pool will solve the
cache-pollution problem anyway.

ob_refcnt is obviously the most stressed field in PyObject, but it's not the
only one. We also have ob_type, which is needed to model each object's
(instance) behavior and is massively accessed too, so a cache line will be
loaded anyway when the object is used.

Also, only a few simple objects have just ob_refcnt and ob_type. Most of
them have other fields too, and accessing those means another cache-line load.

Regards,
Cesare

P.S. Memory allocation granularity can sometimes help here too, leaving some
data (ob_refcnt and/or ob_type) on one cache line and the rest on the next one.
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] PyLongObject safe changes?

2011-03-20 Thread Cesare Di Mauro
Hi

I have little knowledge of some Python (3.2) internals regarding the handling
of objects' internal structures.

Suppose that I have any PyLongObject object (even an internal / shared one)
and that:
- I need to change some or all of its internal values (size, sign, digits)
in a critical section;
- the critical section is C code only;
- no CPython APIs will be called (all work is on local vars and object's
internal values);
- the object internal structure is completely restored before exiting from
the critical section.

Is it thread / interpreter safe, or can something dirty happen?

Cesare
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Suggest reverting today's checkin (recursive constant folding in the peephole optimizer)

2011-03-14 Thread Cesare Di Mauro
2011/3/14 Alexander Belopolsky alexander.belopol...@gmail.com

 On Sat, Mar 12, 2011 at 1:08 PM, Raymond Hettinger
 raymond.hettin...@gmail.com wrote:
  I would like to withdraw my suggestion for the recursive constant folding
 patch to be reverted.

 So what is the status of peephole optimization now?  Is it back under
 active development?   Let me quote a tracker comment that I posted two
 years ago and go no response to (''-quote are from Raymond's
 message):

 
 Constant folding promotes more readable code: 24*60*60 is more obvious
 than 86400, prefixing positive numbers with + leads to better visual
 alignment, etc.  Users should not be required to think twice about
 which constant expressions are folded and which are not.

 Here is another surprise with the current peepholer:

  dis(lambda:1+2*3)
   1   0 LOAD_CONST   0 (1)
  3 LOAD_CONST   3 (6)
  6 BINARY_ADD
  7 RETURN_VALUE

  dis(lambda:2*3+1)
  1   0 LOAD_CONST   4 (7)
  3 RETURN_VALUE

 I have a fix in the works, but I will wait for your further comments
 before submitting it.

 
   More importantly, we decided that the peepholer is the wrong place to
   do much of this work.  Most of the peepholer is going to be migrated
   up the chain, after the AST is generated, but before the opcodes are
   generated.  That is a faster, more reliable, and more general
   approach.
 

 I agree.   Constant folding, is an interesting case because peepholer
 has to duplicate a subset of eval logic.  I wonder if the new approach
 could eliminate that.


I followed a different approach: constant folding in WPython is done between
ASDL evaluation and AST building.

The idea is to intercept constant values and apply the operations,
generating a new value instead of the classic AST node (a BinOp
for a binary operation, for example).

This way there's no need to walk the AST looking for places to
apply the constant-folding logic.

It's faster, because you don't need an additional pass over the AST:
you do the folding while building the AST...

It consumes less memory too, since you don't need to generate complex AST
nodes that must be discarded after the folding (which generates new
nodes). Think about a tuple of constant values, for example: you'd have to
generate a Tuple AST structure from the ASDL, and then an AST constant folder
would build the tuple object. In WPython the tuple is generated immediately,
directly from the ASDL seq structure.

It's also more effective, since expressions such as 1 + 2 * 3 can be completely
folded to 7, instead of the 1 + 6 of the (classic) peepholer. That's
because, when parsing the ASDL structures, nodes are evaluated in
operator-precedence order: first we evaluate 2 * 3, which folds to 6,
and then 1 + 6, which finally yields 7.
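The fold-while-building scheme can be sketched in a few lines (a simplified illustration with invented node shapes; WPython does this in C, between ASDL evaluation and AST construction):

```python
import operator

OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul}

def make_binop(left, op, right):
    # intercept constant operands and evaluate immediately, instead of
    # emitting a BinOp node for a later folding pass to find and rewrite
    if left[0] == 'const' and right[0] == 'const':
        return ('const', OPS[op](left[1], right[1]))
    return ('binop', op, left, right)

# building "1 + 2 * 3" bottom-up, in operator-precedence order:
inner = make_binop(('const', 2), '*', ('const', 3))   # folds to ('const', 6)
outer = make_binop(('const', 1), '+', inner)          # folds to ('const', 7)
print(outer)  # → ('const', 7)
```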

Cesare
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Suggest reverting today's checkin (recursive constant folding in the peephole optimizer)

2011-03-11 Thread Cesare Di Mauro
2011/3/12 Benjamin Peterson benja...@python.org

 2011/3/11 Raymond Hettinger raymond.hettin...@gmail.com:
  Today, there was a significant check-in to the peephole optimizer that I
  think should be reverted:
 http://hg.python.org/cpython/rev/14205d0fee45/
  The peephole optimizer pre-dated the introduction of the abstract syntax
  tree.  Now that we have an AST, the preferred way to implement additional
  optimizations is with AST manipulation, upstream from code generation.
  This
  approach is faster and much more reliable than the more brittle approach
  of disassembling, analyzing, and rewriting the bytecode created by the
  compiler.

 The problem is no such AST optimizer exists. It's one thing avoid
 changing old code because an improved version is in the works or
 available (say argparse in lieu of getopt) and quite another when no
 replacement code exists. At the moment, there is little reason not to
 accept progressive improvements (with sights on overall design as
 usual) to the code.

 IMO, Ast or not ast statically optimizing python in any meaningful way
 is a impossible task anyway. So, a better solution would be to just
  rip the thing out.

 --
 Regards,
 Benjamin


It's not true. I already moved almost all peephole optimizations
(introducing others as well) from peephole.c to ast.c
(http://code.google.com/p/wpython2/source/browse/Python/ast.c?repo=wpython11)
and compile.c
(http://code.google.com/p/wpython2/source/browse/Python/compile.c?repo=wpython11)
in WPython 1.1.

Take a look at pages 21-23 of this:
http://wpython2.googlecode.com/files/Cleanup%20and%20new%20optimizations%20in%20WPython%201.1.pdf

Also, optimizations can be done not only for numbers, but also for tuples,
lists, dictionaries, and... slices (pag. 22). See pages 21-24 of this:
http://wpython2.googlecode.com/files/Beyond%20Bytecode%20-%20A%20Wordcode-based%20Python.pdf

Cesare
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] short fetch for NEXTARG macro (was: one byte byte code arguments)

2011-01-31 Thread Cesare Di Mauro
2011/1/31 Antoine Pitrou solip...@pitrou.net

 On Mon, 31 Jan 2011 13:28:39 +0100
 Jurjen N.E. Bos jurjen@hetnet.nl wrote:
  I just did it: my first python source code hack.
  I replaced the NEXTARG and PEEKARG macros in ceval.c using a cast to
  short pointer, and lo and behold, a crude measurement indicates one
  to two percent speed increase.
  That isn't much, but it is virtually for free!
 
  Here are the macro's I used:
 #define NEXTARG() (next_instr += 2, *(short*)&next_instr[-2])
 #define PEEKARG() (*(short*)&next_instr[1])

 Some architectures forbid unaligned access, so this can't be used as-is.

 Regards

 Antoine.


WPython already addressed it (
http://code.google.com/p/wpython2/source/browse/Python/ceval.c?repo=wpython11):

#ifdef WORDS_BIGENDIAN
#define NEXTOPCODE() oparg = *next_instr++; \
opcode = oparg >> 8; oparg &= 0xff
#else
#define NEXTOPCODE() oparg = *next_instr++; \
opcode = oparg & 0xff; oparg >>= 8
#endif

Short alignment is also guaranteed thanks to wordcodes
(http://wpython2.googlecode.com/files/Beyond%20Bytecode%20-%20A%20Wordcode-based%20Python.pdf,
pag. 12).
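The equivalence of the two macro branches is easy to check: the same in-memory byte pair decodes to the same (opcode, oparg) on either byte order. A small sketch:

```python
import struct

def decode_le(word):   # little-endian host: opcode sits in the low byte
    return word & 0xFF, word >> 8

def decode_be(word):   # big-endian host: opcode sits in the high byte
    return word >> 8, word & 0xFF

opcode, oparg = 0x17, 0x2A
raw = bytes([opcode, oparg])          # the wordcode as laid out in memory
le_word, = struct.unpack('<H', raw)   # what a little-endian 16-bit load sees
be_word, = struct.unpack('>H', raw)   # what a big-endian 16-bit load sees
assert decode_le(le_word) == decode_be(be_word) == (opcode, oparg)
print("both byte orders decode to", (opcode, oparg))
```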

Cesare
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Byte code arguments from two to one byte: did anyone try this?

2011-01-31 Thread Cesare Di Mauro
2011/1/31 Terry Reedy tjre...@udel.edu

 On 1/31/2011 5:31 AM, Steven D'Aprano wrote:

 Jurjen N.E. Bos wrote:

 I was impressed by the optimizations already in there, but I still
 dare to suggest an optimization that from my estimates might shave off
 a few cycles, speeding up Python about 5%.
 The idea is simple: change the byte code argument values from two
 bytes to one.



 Interesting. Have you seem Cesare Di Mauro's WPython project, which
 takes the opposite strategy?

 http://code.google.com/p/wpython2/


 The two strategies could be mixed. Some 'word codes' could consist of a
 bytecode + byte arg, and others a real word code. Maybe WPython does that
 already. Might end up being slower though.

 --
  Terry Jan Reedy


Yes, WPython already does it
(http://wpython2.googlecode.com/files/Beyond%20Bytecode%20-%20A%20Wordcode-based%20Python.pdf,
pag. 7), but on average it was faster (pag. 28).

Cesare


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Possible optimization for LOAD_FAST ?

2011-01-02 Thread Cesare Di Mauro
2011/1/1 Ned Batchelder n...@nedbatchelder.com

  On 12/31/2010 12:51 PM, Cesare Di Mauro wrote:

 Aggressive optimizations can be enabled with explicit options, in order
 to leave normal debugger-prone code.

 I wish the Python compiler would adopt a strategy of being able to disable
 optimizations.  I wrote a bug about a leaky abstraction optimization
 messing up coverage testing 2.5 years ago, and it was closed as won't fix:
 http://bugs.python.org/issue2506.  The debate there centered around, but
 that line isn't executed, because it's been optimized away.  It's common in
 sophisticated compilers (as in, any C compiler) to be able to choose whether
 you want optimizations for speed, or disabling optimizations for debugging
 and reasoning about the code.  Python would benefit from the same choice.

   --Ned.


Command-line parameters and/or environment variables are suitable for this,
but they aren't immediate and also have a global effect.

I would like explicit ("Explicit is better than implicit") and finer control
over optimizations, with per-module usage:

from __compiler__ import disable_peepholer, strict_syntax, static_builtins,
globals_as_fasts

Cesare
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Possible optimization for LOAD_FAST ?

2011-01-02 Thread Cesare Di Mauro
2011/1/3 Alex Gaynor alex.gay...@gmail.com

 No, it's singularly impossible to prove that any global load will be any
 given
 value at compile time.  Any optimization based on this premise is wrong.

 Alex


That's your opinion, but I have very different ideas.

Of course we can't leave the problem on the compiler's shoulders alone, but I
think there can be ways to treat builtins as static variables, and globals
like local (fast) variables too, while taking into account changes to the
builtins' and modules' dictionaries.

But it doesn't make much sense to invest time in these things: JITs are
becoming a good alternative, and maybe they will soon be ready to take
CPython's place as the mainstream implementation.
Cesare
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Possible optimization for LOAD_FAST ?

2010-12-31 Thread Cesare Di Mauro
2010/12/31 Maciej Fijalkowski fij...@gmail.com

 On Fri, Dec 31, 2010 at 12:00 PM, Maciej Fijalkowski fij...@gmail.com
 wrote:
  OK, but is it mandatory? For example, in the above code, I can unroll
 the
  loop because I found that range is the usual built-in, 5 is a low-enough
  constant,
 
  How do you know xrange is xrange and not something else?
 
  Cheers,
  fijal
 

 Err, misread. How do you know that range is a builtin you're thinking
 about and not some other object?

 Cheers,
 fijal


By a special opcode which could do this work. ]:-)

Cesare
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Possible optimization for LOAD_FAST ?

2010-12-31 Thread Cesare Di Mauro
2010/12/31 Ethan Furman et...@stoneleaf.us

 Cesare Di Mauro wrote:

 

 2010/12/29 Martin v. Löwis wrote:

 

 Am 28.12.2010 18:08, schrieb Lukas Lueg:

 Also, the load_fast in lne 22 to reference x could be taken out of the

  loop as x will always point to the same object


  That's not true; a debugger may change the value of x.


 Another example. I can totally remove the variable i, just using the
 stack, so a debugger (or, in general, having the tracing enabled) cannot
 even find something to change about it.


 -1

 Debugging is challenging enough as it is -- why would you want to make it
 even more difficult?

 ~Ethan~


With a good test suite you can mostly forget about debuggers.

In more than six years of Python programming, I have used one only twice
(to debug an ANTLR-generated parser).

Cesare
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Possible optimization for LOAD_FAST ?

2010-12-31 Thread Cesare Di Mauro
2010/12/31  s...@pobox.com


 Another example. I can totally remove the variable i, just using the
 stack, so a debugger (or, in general, having the tracing enabled)
   cannot even find something to change about it.

   Ethan -1

Ethan Debugging is challenging enough as it is -- why would you want to
Ethan make it even more difficult?

 snarky
 I don't know.  Maybe he wants his program to run faster.
  /snarky


:D

Aggressive optimizations can be enabled with explicit options, in order to
keep the normal code debugger-friendly.


 If you use print statements for the bulk of your debugging (many people
 do),
 unrolling loops doesn't affect your debugging ability.

  Skip


It's a common practice. Also, IDEs help a lot, and so do advanced
interactive shells (such as DreamPie).

Cesare


Re: [Python-Dev] Possible optimization for LOAD_FAST ?

2010-12-30 Thread Cesare Di Mauro
2010/12/28 Lukas Lueg lukas.l...@googlemail.com

 Consider the following code:

 def foobar(x):
for i in range(5):
x[i] = i

 The bytecode in python 2.7 is the following:

   2           0 SETUP_LOOP              30 (to 33)
               3 LOAD_GLOBAL              0 (range)
               6 LOAD_CONST               1 (5)
               9 CALL_FUNCTION            1
              12 GET_ITER
              13 FOR_ITER                16 (to 32)
              16 STORE_FAST               1 (i)
 
   3          19 LOAD_FAST                1 (i)
              22 LOAD_FAST                0 (x)
              25 LOAD_FAST                1 (i)
              28 STORE_SUBSCR
              29 JUMP_ABSOLUTE           13
              32 POP_BLOCK
              33 LOAD_CONST               0 (None)
              36 RETURN_VALUE

 Can't we optimize the LOAD_FAST in lines 19 and 25 to a single load
 and put the reference twice on the stack? There is no way that the
 reference of i might change in between the two lines. Also, the
 load_fast in lne 22 to reference x could be taken out of the loop as x
  will always point to the same object


Yes, you can, but you need:
- a better AST evaluator (to mark symbols/variables with proper attributes);
- a better optimizer (usually located on compile.c) which has a global
vision (not limited to single instructions and/or single expressions).

It's not that simple, and the results aren't guaranteed to be good.

Also, consider that Python, as a dynamic and not statically compiled
language, needs to find a good trade-off between compilation time and
execution speed.

Just to be clear, a C program is usually compiled once, then executed, so
you can spend even *hours* to better optimize the final binary code.

With a dynamic language, usually the code is compiled and then executed as
needed, at runtime. So it isn't practical nor desirable to wait too long
before execution begins (the startup problem).

Python stays in a gray area, because modules are usually compiled once
(when they are first used), and executed many times, but it isn't the only
case.

You cannot assume that optimization techniques used in other (static)
languages can be used in, or ported to, Python.
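
For reference, the redundancy being discussed is easy to observe from Python itself with the `dis` module (a quick sketch; exact opcode names and counts vary across CPython versions):

```python
import dis

def foobar(x):
    for i in range(5):
        x[i] = i

# Count the LOAD_FAST-family instructions in the compiled body: the
# repeated loads of "i" and the loop-invariant load of "x" are exactly
# the redundancy discussed in this thread.
fast_loads = [ins for ins in dis.get_instructions(foobar)
              if ins.opname.startswith("LOAD_FAST")]
print(len(fast_loads))
```

(Much later CPython versions did experiment with exactly this kind of fusion, via superinstructions such as LOAD_FAST__LOAD_FAST.)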

Cesare


Re: [Python-Dev] Possible optimization for LOAD_FAST ?

2010-12-30 Thread Cesare Di Mauro
2010/12/29 Martin v. Löwis mar...@v.loewis.de

 Am 28.12.2010 18:08, schrieb Lukas Lueg:
  Also, the
  load_fast in lne 22 to reference x could be taken out of the loop as x
  will always point to the same object

 That's not true; a debugger may change the value of x.

 Regards,
 Martin


OK, but is it mandatory? For example, in the above code, I can unroll the
loop because I found that range is the usual built-in, 5 is a low-enough
constant, and the body is made by a simple statement.

Another example. I can totally remove the variable i, just using the stack,
so a debugger (or, in general, having the tracing enabled) cannot even find
something to change about it.

And so on with other optimization examples that can be possible.

Are they legal in Python? I think we need to make it clear what happens in
such cases.

My idea is that it should be made implementation-specific. What happens with
local variables and the generated code must depend on the specific compiler
and virtual machine, in order to have greater flexibility.

IMHO the most important thing should be that, under normal conditions, the
executed code has the expected behavior.

Cesare


Re: [Python-Dev] mingw support?

2010-08-13 Thread Cesare Di Mauro
2010/8/13 Greg Ewing greg.ew...@canterbury.ac.nz

 Cesare Di Mauro wrote:

  You must suggest at least an equivalent free alternative to make the
 switch convenient.

 Otherwise we are talking about philosophy or religion, and nobody will
 change his ideas.




I think the point is that *because* people don't want to change
 their ideas, it would be good to have a mingw-based alternative.
 Otherwise everyone is forced to convert to the Windows religion.

 --
 Greg


I like to use Windows because it's a comfortable and productive environment,
certainly not because someone forced me to use it.

Also, I have limited time, so I want to spend it as best I can, focusing
on solving real problems. Setup, Next, Next, Finish: I want it working
without having to think about anything else.

It's a philosophy similar to Python's: you don't need to know whether the
platform it's running on is 32- or 64-bit, little- or big-endian, which
operating system, and so on. Just launch it and start typing code: it'll work.

It can also be a matter of taste. I've liked graphical environments since
the old Amiga days. If I need a shell, I greatly prefer Python.

Anyway, for Windows there's Cygwin too, and Python works on it. But after
some months I replaced it with native Windows tools (with Visual Studio on
top): I work much, much better this way.

If someone is interested in a mingw port, they should consider providing
decent alternatives to what a Windows user can find on that platform;
otherwise it'll be just a pure exercise or a matter of faith, since nobody
will use it concretely in daily work.

Give users a better choice, and I see no logical reason why they wouldn't
change their minds.

My 2 cents.

Cesare


Re: [Python-Dev] mingw support?

2010-08-12 Thread Cesare Di Mauro
2010/8/12  li...@gabriel-striewe.de

 On Thu, Aug 12, 2010 at 05:38:52PM +0900, David Cournapeau wrote:
  On Wed, Aug 11, 2010 at 10:21 PM, Sturla Molden stu...@molden.no
 wrote:
  
   David Cournapeau:
   Autotools only help for posix-like platforms. They are certainly a big
   hindrance on windows platform in general,
  
   That is why mingw has MSYS.
 
  I know of MSYS, but it is not very pleasant to use, if only because it
  is extremely slow. When I need to build things for windows, I much
  prefer cross compiling to using MSYS. I also think that cross
  compilation is more useful than native mingw build alone - there are
  patches for cross compilation, but I don't know their current status,
 
  cheers,
 
  David

 My argument goes that one of the biggest differences between the
 GNU/Linux and the Windows way of computing is the barrier between user
 and programmer. In the Windows way, you are either a user or a
 programmer. On Linux, just by the way you can download software and
 run ./configure && make && make install, you are encouraged to look at the
 source code and by this you might in the not-so-long run start
 reporting bugs to mailing lists and see that there are actually people
 who might be able to sort out the bugs and that you might become one
 of them.

 The Windows way, you think those bugs are unavoidable and start making
 jokes out of a feeling of frustration and helplessness.

 That's where msys/mingw is supposed to come in, if only it was easier
 to install, so that new Programmers don't get locked in in the
 Microsoft compiler products and thereby the divide between the
 software communities gets wider and wider.

 Don't get me wrong, I think the python project is doing a great job in
 terms of cross-platform portability, but things would be easier if
 there was an easy combination of msys, mingw and autotools.

 And by the way, I think the way the big linux distros like fedora and
 mandrake distribute software is more similar to the windows way of
 computing.

 Gabriel




Anyway, Visual Studio, even in the Express Edition, is simpler and more
productive for a Windows programmer.

You must suggest at least an equivalent free alternative to make the
switch convenient.

Otherwise we are talking about philosophy or religion, and nobody will
change his ideas.

Cesare


Re: [Python-Dev] Drive suffix

2010-08-05 Thread Cesare Di Mauro
2010/8/5 Greg Ewing greg.ew...@canterbury.ac.nz

 James Mills wrote:

 Windows
 is one of the only Operating Systems with a File system that requires
 this [A-Z]:\ syntax.


 There's also VMS, but it uses a colon too. Also its
 pathnames are funky enough in other ways that it
 needs its own os-specific pathname routines.

 I'm not aware of any system that's just like Windows
 except that it uses something other than colons.

 --
 Greg


AmigaOS / AROS / MorphOS use a colon too as the volume (or device) separator:

dir "Ram Disk:System/Local Preferences"

Cesare


Re: [Python-Dev] Python 3 optimizations...

2010-07-23 Thread Cesare Di Mauro
2010/7/23 Stefan Behnel stefan...@behnel.de

 stefan brunthaler, 23.07.2010 08:48:

 If we take for instance the BINARY_ADD instruction, the interpreter
 evaluates the actual operand types and chooses the matching operation
 implementation at runtime, i.e., operands that are unicode strings
 will be concatenated via unicode_concatenate, for float operands on
 the other hand, the interpreter would end up invoking float_add via
 binary_op1. Now, a very efficient way to achieve purely interpretative
 inline caching is to quicken the type-generic BINARY_ADD instruction
 to a type-dependent FLOAT_ADD instruction (this technique, i.e.,
 inline caching via quickening, is the primary contribution of my ECOOP
 paper). Hence, I have a very simple code generator, that generates
 type-dependent interpreter instructions in a pre-compile step of the
 interpreter, and uses runtime type information to quicken/rewrite
 instructions.
 Aside of the operators, I have implemented this quickening technique
 for FOR_ITER, COMPARE_OP and CALL_FUNCTION instructions.

 This sounds like wpython (a CPython derivative with a wider set of byte
 code commands) could benefit from it.


WPython 1.1 does it at compile time, if you enable the new experimental
integer opcodes flag.

Similar optimizations were introduced with new opcodes for specialized
string interpolation and joins, which are common operations in Python.

It also added a new opcode, GET_GENERATOR, which internally uses a faster
function call; this is used also by (the modified) BUILD_CLASS for the same
reason (cutting some unnecessary checks and code).
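
The quickening technique described above can be illustrated with a toy, purely hypothetical sketch in Python (this is not wpython's or CPython's actual code): a generic add instruction rewrites itself into a float-specialized variant after observing its operand types once.

```python
def generic_add(a, b):
    # Generic path: works for any operand types supporting "+".
    return a + b

def float_add(a, b):
    # Specialized fast path: only installed once both operands were floats.
    # (In a real VM this would skip the generic type dispatch entirely.)
    return a + b

class Instruction:
    """A mutable instruction slot, so it can be rewritten in place."""
    def __init__(self, op):
        self.op = op

def execute_add(ins, a, b):
    if ins.op is generic_add and type(a) is float and type(b) is float:
        ins.op = float_add  # quicken: rewrite BINARY_ADD -> FLOAT_ADD
    return ins.op(a, b)

ins = Instruction(generic_add)
execute_add(ins, 1.5, 2.5)   # first execution observes float operands...
print(ins.op is float_add)   # ...and the instruction is now specialized
```

Rewriting the instruction slot in place mirrors how real quickening patches the bytecode stream, so subsequent executions dispatch directly to the specialized implementation.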

Cesare


Re: [Python-Dev] Python 3 optimizations...

2010-07-23 Thread Cesare Di Mauro
2010/7/23 stefan brunthaler ste...@brunthaler.net

  This sounds like wpython (a CPython derivative with a wider set of byte
 code
  commands) could benefit from it.
 
 I am aware of the wpython project of Cesare di Mauro.


wpython has reached 1.1 final version. If you are interested, you can find
it here: http://code.google.com/p/wpython2/ and you can download the new
slides that cover the improvements over 1.0 alpha.


 I change the
 instruction format from bytecode to wordcode, too (because it allows
  for more efficient instruction decoding).


Did you use wpython's wordcode format, or a new one?


 Contrary to his approach,
 however, I do not change the instruction encoding to pack in
 additional optimizations. (I hope to have put that correctly; I have
  seen his slides about a year ago.)


Yes, you're right. wpython's approach is to encode as much information as it
can, to save space, reduce decoding time, specialize some opcodes, etc.

Cesare


Re: [Python-Dev] Can Python implementations reject semantically invalid expressions?

2010-07-02 Thread Cesare Di Mauro
2010/7/2 Guido van Rossum gu...@python.org

 On Fri, Jul 2, 2010 at 10:28 AM, Nick Coghlan ncogh...@gmail.com wrote:
  On Sat, Jul 3, 2010 at 3:13 AM, Craig Citro craigci...@gmail.com
 wrote:
  1/0 is much faster to type than raise SomeError and serves the same
  purpose sometimes for debugging purposes.  Let's not forget that not
  all code is written for eternity :)
 
 
  Doesn't raise do the same thing for just two extra characters?
 
  No, raise on its own is only valid in an exception handler. Writing
  1/0 is at least somewhat common as an idiom for forcing a
  ZeroDivisionError in examples and in test harnesses (I know I have
  used it for both of those things many times).
 
  Given the diverse range of uses Python is put to, moving things from
  runtime to compile time can definitely have significant unexpected
  consequences (hence why many of us would be hesitant to consider an
  implementation that made such changes to be an actual Python
  implementation).

 +1 on not changing this.

 For one, this will most likely break a large amount of 3rd party and
 stdlib software -- there are tons of statements like this that are
 practically unreachable or intentional.

 Second, I don't think it's going to make the kind of difference the OP
 is thinking of. Since Python is totally dynamic, and doesn't have
 macros, the only cases that would be caught would be things you are
 unlikely to type by accident -- like 1/0 or 1+1. In other languages
 that have this behavior, there is usually a benefit where the
 arguments involved are *variables* whose type is known to the
 compiler, so it will catch things like (simple C example)

 #define FOO 0
 main() {
   printf("%d\n", 1/FOO);
 }

 However the equivalent Python

 FOO = 0
 def main():
  print 1/FOO

 cannot be rejected at compile time because there is insufficient
 evidence that the value of FOO won't be changed before main() is
 called.

 I even reject the substitution of raise ZeroDivisionError for 1/0
 since (a) nobody cares about such an optimization, and (b) it would
 break introspection and invalidate tests. (We have a long history of
 constant propagation in expressions causing subtle bugs. This could be
 worse.)

 --
 --Guido van Rossum (python.org/~guido)
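
(For reference, the late binding Guido describes is directly visible with the `dis` module; a quick sketch, added here for illustration:)

```python
import dis

FOO = 0

def main():
    return 1 / FOO

# FOO is looked up at call time via LOAD_GLOBAL, so the compiler cannot
# prove the division fails: FOO may be rebound before main() runs.
ops = [ins.opname for ins in dis.get_instructions(main)]
print("LOAD_GLOBAL" in ops)
```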


from __future__ import compile_checks

Cesare Di Mauro


Re: [Python-Dev] WPython 1.1 was released

2010-06-23 Thread Cesare Di Mauro
2010/6/23 Steven D'Aprano st...@pearwood.info

 On Wed, 23 Jun 2010 08:12:36 pm Cesare Di Mauro wrote:
  I've released WPython 1.1, which brings many optimizations and
  refactorings.

 For those of us who don't know what WPython is, and are too lazy, too
 busy, or reading their email off-line, could you give us a one short
 paragraph description of what it is?

 Actually, since I'm none of the above, I'll answer my own question:
 WPython is an implementation of Python that uses 16-bit wordcodes
 instead of byte code, and claims to have various performance benefits
 from doing so.

 It looks like good work, thank you.

 --
 Steven D'Aprano


Hi Steven,

sorry, I made a mistake, assuming that the project was known.

WPython is a CPython 2.6.4 implementation that uses wordcodes instead of
bytecodes. A wordcode is a word (16 bits, i.e. two bytes, in this case) used
to represent VM opcodes. This new encoding made it possible to simplify the
virtual machine's main loop, improving understandability, maintenance, and
extensibility; less space is required on average, and execution speed is
improved too.
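
As a rough illustration of the idea (a sketch with a made-up instruction stream; wpython's actual encoding details may differ), decoding a 16-bit wordcode stream is a simple fixed-width loop:

```python
import struct

def decode(words: bytes):
    """Decode a little-endian 16-bit wordcode stream into (opcode, arg) pairs."""
    for (word,) in struct.iter_unpack("<H", words):
        # Opcode in the low byte, argument in the high byte.
        yield word & 0xFF, word >> 8

# A made-up three-instruction stream.
stream = struct.pack("<3H", 0x0164, 0x0253, 0x0000)
print(list(decode(stream)))  # [(100, 1), (83, 2), (0, 0)]
```

Incidentally, CPython itself later switched to a similar fixed-width 16-bit wordcode format, in version 3.6.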

Cesare


Re: [Python-Dev] WPython 1.1 was released

2010-06-23 Thread Cesare Di Mauro
2010/6/23 Terry Reedy tjre...@udel.edu

 On 6/23/2010 7:28 AM, Cesare Di Mauro wrote:
 WPython is a CPython 2.6.4 implementation that uses wordcodes instead
 of bytecodes. A wordcode is a word (16 bits, two bytes, in this case)

 I suggest you specify the base version (2.6.4) on the project page as that
 would be very relevant to many who visit. One should not have to download
 and look at the source to discover if they should bother
 downloading the code. Perhaps also add a sentence as to the choice (why not
 3.1?).

 --
 Terry Jan Reedy


Thanks for the suggestions. I've updated the main project accordingly. :)

Cesare


Re: [Python-Dev] Scope object (Re: nonlocals() function?)

2010-04-08 Thread Cesare Di Mauro
2010/4/7 Steven D'Aprano st...@pearwood.info

 On Tue, 6 Apr 2010 04:25:08 pm Cesare Di Mauro wrote:

  It will certainly. There's MUCH that can be optimized to let CPython
  squeeze more performance from static analysis (even a gross one) on
  locals.
 [...]
  They are just dummy examples, but can make it clear how far
  optimizations can go with static analysis on locals. Python is a
  language that make it possible to use such analysis at compile time,
  and I think it is a very good thing.

 I'm not opposed to the idea of optimisations in general (far from it!)
 but in case anyone is thinking about doing any work in this area,
 please be careful about floating point optimisations. E.g. given a float
 x, you can't assume that x*0 == 0. Nor can you assume that 0-x is the
 same as -x. (The second is *almost* always correct, except for one
 float value.)

 See, for example, the various writings by Professor Kahan:

 http://www.drdobbs.com/184410314
 http://www.cs.berkeley.edu/~wkahan/

 Most of the issues discussed apply to languages that deal with floats at
 a lower level than Python does, but still, simple minded optimizations
 will break corner cases no matter what language you use.

 --
 Steven D'Aprano


Thanks for the useful links.

I have never applied that kind of optimization, and I think I never will
anyway. :)

Cesare


Re: [Python-Dev] Scope object (Re: nonlocals() function?)

2010-04-06 Thread Cesare Di Mauro
2010/4/6 Antoine Pitrou solip...@pitrou.net

 Greg Ewing greg.ewing at canterbury.ac.nz writes:
 
  Maybe it would be better to deprecate globals() and locals()
  and replace them with another function called something like
  scope().

 It is useful to distinguish between globals (i.e., module-level variables)
 and
 locals, so replacing them with scope() would not be better IMO.

  It would return a mapping object that looks up
  names in the current scope. It could also improve on locals()
  by being writable.

 If you can prove that making locals() (or its replacement) writable doesn't
 complicate the interpreter core too much, then why not. Otherwise -1 :-)

 Regards

 Antoine.


It certainly will. There's MUCH that can be optimized to let CPython squeeze
more performance out of static analysis (even a coarse one) on locals.

Example:

def f():
a = 1
b = 2
return a + b

can be reduced to something similar to:

def f():
a = 1
b = 2
return 3

and, more aggressively, like:

def f():
return 3

They are just dummy examples, but they make it clear how far optimizations
can go with static analysis on locals. Python is a language that makes it
possible to use such analysis at compile time, and I think that is a very
good thing.
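
As a rough sketch of what such static analysis could look like (hypothetical code, not CPython's or wpython's optimizer; it naively ignores reassignment, scoping, and tracing concerns), the first example can be prototyped on the AST:

```python
import ast

src = """
def f():
    a = 1
    b = 2
    return a + b
"""

tree = ast.parse(src)
fn = tree.body[0]

# Collect locals bound to a literal constant (naively assumes one binding).
consts = {}
for stmt in fn.body:
    if (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
            and isinstance(stmt.targets[0], ast.Name)
            and isinstance(stmt.value, ast.Constant)):
        consts[stmt.targets[0].id] = stmt.value.value

class Propagate(ast.NodeTransformer):
    def visit_Name(self, node):
        # Replace reads of known-constant locals with the constant itself.
        if isinstance(node.ctx, ast.Load) and node.id in consts:
            return ast.copy_location(ast.Constant(consts[node.id]), node)
        return node

fn.body[-1] = ast.fix_missing_locations(Propagate().visit(fn.body[-1]))
print(ast.unparse(fn.body[-1]))  # return 1 + 2
```

A subsequent constant-folding pass could then reduce `1 + 2` to `3`, giving the second form shown above.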

Obviously the last example raises questions regarding the language semantics:
is it right to suppress unused or useless local variables? A conservative
answer is clearly NO. But I hope that a future language specification will
fix some of these aspects, making clear what you can expect from the
language itself, and what is closer to the implementation.

Cesare


Re: [Python-Dev] Scope object (Re: nonlocals() function?)

2010-04-06 Thread Cesare Di Mauro
2010/4/6 Greg Ewing greg.ew...@canterbury.ac.nz

 Cesare Di Mauro wrote:

  It will certainly. There's MUCH that can be optimized to let CPython
 squeeze more performance from static analysis (even a gross one) on locals.


 But can the existing locals() function be implemented in
 the face of such optimisations?

 If it can, then a locals view object shouldn't be too much
 harder.

 If it can't, then you have already given up full CPython
 compatibility.

 --
 Greg


A read-only locals view can be a good compromise, because at least the first
example I showed can be approached well.

For the second example, there's no full compatibility with the current
CPython implementation.

But implementations can change over time: we can clearly state that in
future CPython versions no assumptions must be made about locals usage,
and in general about instruction generation.
The most important thing is that the function f() does what it is called to
do: return the numeric constant 3.

This gives us the opportunity to schedule more efficient optimizations,
without losing generality about the language (only some weird tricks will
not be supported).

Cesare


Re: [Python-Dev] PEP 3146: Merge Unladen Swallow into CPython

2010-02-01 Thread Cesare Di Mauro
2010/2/1 Collin Winter collinwin...@google.com

 I believe these VMs would have little overlap. I cannot imagine that
 Unladen Swallow's needs have much in common with Stackless's, or with
 those of a hypothetical register machine to replace the current stack
 machine.

 Let's consider that last example in more detail: a register machine
 would require completely different bytecode. This would require
 replacing the bytecode compiler, the peephole optimizer, and the
 bytecode eval loop. The frame object would need to be changed to hold
 the registers and a new blockstack design; the code object would have
 to potentially hold a new bytecode layout.

 I suppose making all this pluggable would be possible, but I don't see
 the point. This kind of experimentation is ideal for a branch: go off,
 test your idea, report your findings, merge back. Let the branch be
 long-lived, if need be. The Mercurial migration will make all this
 easier.

  Getting the right would certainly require a major effort, but it
  would also reduce the need to have several branches of C-based
  Python implementations.

 If such a restrictive plugin-based scheme had been available when we
 began Unladen Swallow, I do not doubt that we would have ignored it
 entirely. I do not like the idea of artificially tying the hands of
 people trying to make CPython faster. I do not see any part of Unladen
 Swallow that would have been made easier by such a scheme. If
 anything, it would have made our project more difficult.

  Collin Winter


I completely agree. Working with wpython I have changed a lot of code
ranging from the ASDL grammar to the eval loop, including some library
modules and tests (primarily the Python-based parser and the disassembly
tools; the module finder required work, too).
I haven't changed the Python objects or the object model (except in the
alpha release; then I dropped this invasive change), but I've added some
helper functions in object.c, dict.c, etc.

A pluggable VM isn't feasible because we are talking about a brand new
CPython (library included), to be chosen each time.

If approved, this model will greatly limit the optimizations that can be
implemented to make CPython run faster.

Cesare Di Mauro


Re: [Python-Dev] PEP 3146: Merge Unladen Swallow into CPython

2010-01-30 Thread Cesare Di Mauro
I'm back with some tests that I made with the U-S test suite.

2010/1/30 Scott Dial scott+python-...@scottdial.com


 Cesare, just FYI, your Hg repository has lost the execute bits on some
 files (namely ./configure and ./Parser/asdl_c.py), so it does not
 quite build out-of-the-box.


Unfortunately, I haven't found a solution to this problem. If somebody
working with Windows and Mercurial (I use the TortoiseHg graphical client)
can help with this issue, I'll release wpython 1.1 final.


 I took the liberty of cloning your repo into my laptop's VirtualBox
 instance of Ubuntu. I ran the default performance tests from the U-S
 repo, with VirtualBox at highest priority. As a sanity check, I ran it
 against the U-S trunk. I think the numbers speak for themselves.

  --
 Scott Dial
 sc...@scottdial.com
 scod...@cs.indiana.edu


I downloaded U-S test suite, and made some benchmarks with my machine.
Django and Spambayes tests didn't run:

Running django...
INFO:root:Running D:\Projects\wpython\wpython10_test\PCbuild\python
performance/bm_django.py -n 100
Traceback (most recent call last):
  File "perf.py", line 1938, in <module>
    main(sys.argv[1:])
  File "perf.py", line 1918, in main
    options)))
  File "perf.py", line 1193, in BM_Django
    return SimpleBenchmark(MeasureDjango, *args, **kwargs)
  File "perf.py", line 590, in SimpleBenchmark
    *args, **kwargs)
  File "perf.py", line 1189, in MeasureDjango
    return MeasureGeneric(python, options, bm_path, bm_env)
  File "perf.py", line 960, in MeasureGeneric
    inherit_env=options.inherit_env)
  File "perf.py", line 916, in CallAndCaptureOutput
    raise RuntimeError("Benchmark died: " + err)
RuntimeError: Benchmark died: Traceback (most recent call last):
  File "performance/bm_django.py", line 25, in <module>
    from django.template import Context, Template
ImportError: No module named template

Running spambayes...
INFO:root:Running D:\Projects\wpython\wpython10_test\PCbuild\python
performance/bm_spambayes.py -n 50
Traceback (most recent call last):
  File "perf.py", line 1938, in <module>
    main(sys.argv[1:])
  File "perf.py", line 1918, in main
    options)))
  File "perf.py", line 1666, in BM_spambayes
    return SimpleBenchmark(MeasureSpamBayes, *args, **kwargs)
  File "perf.py", line 590, in SimpleBenchmark
    *args, **kwargs)
  File "perf.py", line 1662, in MeasureSpamBayes
    return MeasureGeneric(python, options, bm_path, bm_env)
  File "perf.py", line 960, in MeasureGeneric
    inherit_env=options.inherit_env)
  File "perf.py", line 916, in CallAndCaptureOutput
    raise RuntimeError("Benchmark died: " + err)
RuntimeError: Benchmark died: Traceback (most recent call last):
  File "performance/bm_spambayes.py", line 18, in <module>
    from spambayes import hammie, mboxutils
ImportError: No module named spambayes

Anyway, I run all others with wpython 1.0 final:

C:\Temp\unladen-swallow-tests> C:\temp\Python-2.6.4\PCbuild\python perf.py -r
-b default,-django,-spambayes C:\temp\Python-2.6.4\PCbuild\python
D:\Projects\wpython\wpython10_test\PCbuild\python

Report on Windows Conan post2008Server 6.1.7600 x86 AMD64 Family 15 Model 12
Stepping 0, AuthenticAMD
Total CPU cores: 1

### 2to3 ###
Min: 43.408000 -> 38.528000: 1.1267x faster
Avg: 44.448600 -> 39.391000: 1.1284x faster
Significant (t=10.582185)
Stddev: 0.84415 -> 0.65538: 1.2880x smaller
Timeline: http://tinyurl.com/ybdwese

### nbody ###
Min: 1.124000 -> 1.109000: 1.0135x faster
Avg: 1.167630 -> 1.148190: 1.0169x faster
Not significant
Stddev: 0.09607 -> 0.09544: 1.0065x smaller
Timeline: http://tinyurl.com/yex7dfv

### slowpickle ###
Min: 1.237000 -> 1.067000: 1.1593x faster
Avg: 1.283800 -> 1.109070: 1.1575x faster
Significant (t=11.393574)
Stddev: 0.11086 -> 0.10596: 1.0462x smaller
Timeline: http://tinyurl.com/y8t5ess

### slowspitfire ###
Min: 2.079000 -> 1.928000: 1.0783x faster
Avg: 2.148920 -> 1.987540: 1.0812x faster
Significant (t=7.731224)
Stddev: 0.15384 -> 0.14108: 1.0904x smaller
Timeline: http://tinyurl.com/yzexcqa

### slowunpickle ###
Min: 0.617000 -> 0.568000: 1.0863x faster
Avg: 0.645420 -> 0.590790: 1.0925x faster
Significant (t=7.087322)
Stddev: 0.05478 -> 0.05422: 1.0103x smaller
Timeline: http://tinyurl.com/ycsoouq


I also made some tests with wpython 1.1, leaving bytecode peepholer enabled:

C:\Temp\unladen-swallow-tests> C:\temp\Python-2.6.4\PCbuild\python perf.py -r
-b default,-django,-spambayes C:\temp\Python-2.6.4\PCbuild\python
D:\Projects\wpython\wpython_test\PCbuild\python

Report on Windows Conan post2008Server 6.1.7600 x86 AMD64 Family 15 Model 12
Stepping 0, AuthenticAMD
Total CPU cores: 1

### 2to3 ###
Min: 43.454000 -> 39.912000: 1.0887x faster
Avg: 44.301000 -> 40.766800: 1.0867x faster
Significant (t=8.188533)
Stddev: 0.65325 -> 0.71041: 1.0875x larger
Timeline: http://tinyurl.com/ya5z9mg

### nbody ###
Min: 1.125000 -> 1.07: 1.0514x faster
Avg: 1.169270 -> 1.105530: 1.0577x faster
Significant (t=4.774702)
Stddev: 0.09655 -> 0.09219: 1.0473x smaller
Timeline: http://tinyurl.com/y8udjmk

### slowpickle ###
Min: 1.235000 -> 1.094000: 1.1289x faster
Avg: 1.275860 -> 1.132740: 

Re: [Python-Dev] default of returning None hurts performance?

2010-01-29 Thread Cesare Di Mauro
Hi Antoine,

Wpython already addressed this with two new opcodes.

RETURN_CONST constant_index

which is an equivalent of:

LOAD_CONST constant_index
RETURN_VALUE

and with:

CALL_PROC_RETURN_CONST function_arguments, constant_index

which is an equivalent of:

CALL_FUNCTION function_arguments
POP_TOP
LOAD_CONST constant_index
RETURN_VALUE

The new opcodes handle common patterns that I found with a stats tool I
wrote, and they help with both code size and execution speed.
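
The collapsed pattern is easy to see in any function with an implicit return (a quick check with the `dis` module; incidentally, mainline CPython added a RETURN_CONST opcode of its own much later, in 3.12):

```python
import dis

def proc():
    pass  # implicit "return None"

# On most CPython versions the function ends with LOAD_CONST (None)
# followed by RETURN_VALUE -- exactly the pair that a RETURN_CONST
# superinstruction folds into a single opcode.
ops = [ins.opname for ins in dis.get_instructions(proc)]
print(ops)
```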

Regards,

Cesare

2009/9/1 Antoine Pitrou solip...@pitrou.net

 Gregory P. Smith greg at krypto.org writes:
 
  I was just wondering if a bytecode for a superinstruction of the common
 sequence:
  6 POP_TOP
  7 LOAD_CONST   0 (None)
  10 RETURN_VALUE
  might be worth it.

 I think superinstructions in general would be a good thing to experiment,
 as
 wpython showed. Direct addressing (via a pseudo register file combining
 locals
 and constants) would eliminate many bookkeeping-related opcodes in common
 bytecode.

 Regards

 Antoine.




Re: [Python-Dev] PEP 3146: Merge Unladen Swallow into CPython

2010-01-29 Thread Cesare Di Mauro
2010/1/29  s...@pobox.com


Cesare I think that wpython as a proof-of-concept have done its work,
Cesare showing its potentials.

 If you haven't already, is there any chance you can run the Unladen Swallow
 performance test suite and post the results?  The code is separate from U-S
 and should work with wpython:

http://unladen-swallow.googlecode.com/svn/tests

 --
 Skip Montanaro - s...@pobox.com - http://www.smontanaro.net/


I work on a Windows machine, so I don't know if I can run the U-S test suite
on it (the first time I tried, it failed since U-S used a module available
on Unix machines only).

If it works now, I can provide results with wpython 1.0 final and the
current 1.1 I'm working on (which has additional optimizations; I've also
moved all the peephole optimizer code into compile.c).

Anyway, Mart Sõmermaa provided some results
(http://www.mail-archive.com/python-dev@python.org/msg43294.html) based
on wpython 1.0 alpha (you can find the wpython 1.0 final here:
http://code.google.com/p/wpython2/downloads/list).

Cesare


Re: [Python-Dev] PEP 3146: Merge Unladen Swallow into CPython

2010-01-29 Thread Cesare Di Mauro
2010/1/29  s...@pobox.com


Cesare ... (you can find the wpython 1.0 final here
Cesare http://code.google.com/p/wpython2/downloads/list).

 I tried downloading it.  Something about wpython10.7z and wpython10_fix.7z.
 What's a 7z file?  What tool on my Mac will unpack that?  Can I build and
 run wpython on my Mac or is it Windows only?

 Thx,

 Skip


You can find 7-Zip tools here: http://www.7-zip.org/download.html

If you use Mercurial, you can grab a local copy this way:

hg clone https://wpython10.wpython2.googlecode.com/hg/ wpython2-wpython10

Wpython is intended to run on any platform where CPython 2.6.4 runs.

Cesare


Re: [Python-Dev] PEP 3146: Merge Unladen Swallow into CPython

2010-01-29 Thread Cesare Di Mauro
2010/1/29 Antoine Pitrou solip...@pitrou.net

 Cesare Di Mauro cesare.di.mauro at gmail.com writes:
 
  If python dev community is interested, I can work on a 3.x branch,
 porting
  all optimizations I made (and many others that I've planned to implement)
 one
  step at the time, in order to carefully check and validate any change
 with
  expert people monitoring it.

 We are certainly more interested in a 3.x branch than in a 2.x one ;-)
 You can start by cloning http://code.python.org/hg/branches/py3k/

 Or you could submit patches piecewise on http://bugs.python.org


I prefer to make a branch with Mercurial, which I find a comfortable tool.
:)


 I think the first step would be to switch to 16-bit bytecodes. It would be
 uncontroversial (the increase in code size probably has no negative effect)
 and
 would provide the foundation for all of your optimizations.


I agree. At the beginning I'll need to disable the peepholer, so performance
is best compared once all the peephole optimizations have been ported to
the wordcode model.

I'll make the branch after I release wpython 1.1, which I'll do ASAP.


 Are you going to PyCon?

  Antoine.


No, I'm not. But if there's a python-dev meeting, I can make a (long) jump.
Maybe it would be easier to talk about the superinstruction model there, and
I could show and comment on all the optimizations I made.

Cesare


Re: [Python-Dev] PEP 3146: Merge Unladen Swallow into CPython

2010-01-29 Thread Cesare Di Mauro
2010/1/29  s...@pobox.com


 One strong suggestion for future releases: Please put a top-level directory
 in your archives.  It is annoying to expect that only to have an archive
 expand into the current directory without creating a directory of its own.
 I've been burned often enough that I always check before expanding source
 archives from new (to me) sources, so no harm, no foul in this case.

 Skip


You're right. Excuse me. I'll do it next time.

Cesare


Re: [Python-Dev] PEP 3146: Merge Unladen Swallow into CPython

2010-01-29 Thread Cesare Di Mauro
2010/1/29 Nick Coghlan ncogh...@gmail.com

 I wouldn't consider changing from bytecode to wordcode uncontroversial -
 the potential to have an effect on cache hit ratios means it needs to be
 benchmarked (the U-S performance tests should be helpful there).


It's quite strange, but from the tests made it seems that wpython performs
better on older architectures (such as my Athlon64 socket 754), which have
fewer resources, like smaller caches.

It'll be interesting to check how it works on more limited ISAs. I'm
especially curious about ARMs.


 It's the same basic problem where any changes to the ceval loop can have
 surprising performance effects due to the way they affect the compiled
 switch statements ability to fit into the cache and other low level
 processor weirdness.

 Cheers,
 Nick.


Sure, but consider that with wpython wordcodes require less space on
average. Also, fewer instructions are executed inside the ceval loop, thanks
to some natural instruction grouping.

For example, I recently introduced in wpython 1.1 a new opcode to handle
generator expressions more efficiently. It's mapped as a unary operator, so
it exposes interesting properties, which I'll show with an example.

def f(a):
return sum(x for x in a)

With CPython 2.6.4 it generates:

  0 LOAD_GLOBAL 0 (sum)
  3 LOAD_CONST 1 (code object genexpr at 00512EC8, file stdin, line
1)
  6 MAKE_FUNCTION 0
  9 LOAD_FAST 0 (a)
12 GET_ITER
13 CALL_FUNCTION 1
16 CALL_FUNCTION 1
19 RETURN_VALUE

With wpython 1.1:

0 LOAD_GLOBAL 0 (sum)
1 LOAD_CONST 1 (code object genexpr at 01F13208, file stdin, line 1)
2 MAKE_FUNCTION 0
3 FAST_BINOP get_generator a
5 QUICK_CALL_FUNCTION 1
6 RETURN_VALUE

The new opcode is GET_GENERATOR, which is equivalent (but more efficient,
using a faster internal function call) to:

GET_ITER
CALL_FUNCTION 1

The compiler initially generated the following opcodes:

LOAD_FAST 0 (a)
GET_GENERATOR

then the peepholer recognized the pattern UNARY(FAST), and produced the
single opcode:

FAST_BINOP get_generator a

In the end, the ceval loop executes a single instruction instead of three.
The wordcode requires 14 bytes to be stored instead of 20, so it will use 1
data cache line instead of 2 on CPUs with 16-byte data cache lines.

The same grouping behavior happens with binary operators as well. Opcodes
aggregation is a natural and useful concept with the new wordcode structure.
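For reference, the CPython side of the example can still be reproduced with `dis` (opcode names vary by version; GET_GENERATOR and FAST_BINOP exist only in wpython):

```python
import dis

def f(a):
    return sum(x for x in a)

# CPython emits MAKE_FUNCTION / LOAD_FAST a / GET_ITER / call opcodes here;
# wpython fuses the iterator setup and call into FAST_BINOP get_generator a.
ops = [i.opname for i in dis.get_instructions(f)]
print(ops)
```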

Cheers,
Cesare


Re: [Python-Dev] PEP 3146: Merge Unladen Swallow into CPython

2010-01-29 Thread Cesare Di Mauro
2010/1/29 Antoine Pitrou solip...@pitrou.net

 Actually, wordcode could allow accesses in the eval loop to be done on
 aligned words, so as to fetch operands in one step on little-endian CPUs
 (instead of recombining bytes manually).

  Regards

 Antoine.


I think that big-endian CPUs can benefit too, since a single word load
is needed, followed by an instruction such as ROL #8 to adjust the result
(assuming the compiler is smart enough to recognize the pattern).

Using bytecodes, two loads are needed to retrieve the two bytes, plus some
SHIFT and OR instructions to combine them into the correct word. Loads are
generally more expensive / limited.

All that not counting the operations needed to advance the instruction
pointer.
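A minimal sketch of the idea in Python, assuming 16-bit little-endian words with the opcode in the low byte (the opcode numbers below are invented for illustration):

```python
import struct

def assemble(pairs):
    """Pack (opcode, argument) pairs into little-endian 16-bit words."""
    return b''.join(struct.pack('<H', (arg << 8) | op) for op, arg in pairs)

def fetch(code, i):
    """Fetch instruction i with a single aligned word load."""
    word = struct.unpack_from('<H', code, 2 * i)[0]
    return word & 0xFF, word >> 8  # opcode, argument

# Invented opcode numbers, just to show the round trip.
code = assemble([(100, 3), (83, 0)])
assert fetch(code, 0) == (100, 3)
assert fetch(code, 1) == (83, 0)
```

On a little-endian CPU the `unpack_from` step corresponds to one aligned load; a big-endian CPU would need the byte-swap adjustment discussed above.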

Regards,

Cesare


Re: [Python-Dev] PEP 3146: Merge Unladen Swallow into CPython

2010-01-29 Thread Cesare Di Mauro
2010/1/30  exar...@twistedmatrix.com

 On 10:55 pm, collinwin...@google.com wrote:


 That people are directly munging CPython
 bytecode means that CPython should provide a better, more abstract way
 to do the same thing that's more resistant to these kinds of changes.


 It might be helpful to hear more about how the wordcode implementation
 differs from the bytecode implementation.  It's challenging to abstract from
 a single data point. :)

  Jean-Paul


Wordcode structure is simple. You can find information here:
http://wpython2.googlecode.com/files/Beyond%20Bytecode%20-%20A%20Wordcode-based%20Python.pdf
Slide 6 provides a description, slide 7 gives details about the structure
and some examples, and slide 9 explains how I mapped most bytecodes,
grouping them into 6 families.

However, wordcode internals can be complicated to pretty-print, because
they may carry a lot of information. You can take a look at opcode.h
(http://code.google.com/p/wpython2/source/browse/Include/opcode.h?repo=wpython10)
and dis.py
(http://code.google.com/p/wpython2/source/browse/Lib/dis.py?repo=wpython10,
function common_disassemble) to understand why this happens.

Cesare


Re: [Python-Dev] PEP 3146: Merge Unladen Swallow into CPython

2010-01-29 Thread Cesare Di Mauro
2010/1/30 Scott Dial scott+python-...@scottdial.com


 Cesare, just FYI, your Hg repository has lost the execute bits on some
 files (namely ./configure and ./Parser/asdl_c.py), so it does not
 quite build out-of-the-box.


That's probably because I worked on Windows. I have to address this issue.
Thanks.


 I took the liberty of cloning your repo into my laptop's VirtualBox
 instance of Ubuntu. I ran the default performance tests from the U-S
 repo, with VirtualBox at highest priority. As a sanity check, I ran it
 against the U-S trunk. I think the numbers speak for themselves.

  --
  Scott Dial


I see. I don't know why you got those numbers. Until now, what I've seen is
better performance on average with wpython.

In a previous mail, Collin stated that when they implemented wordcode in
U-S they got benefits from the new opcode structure.

Maybe more tests on different hardware / platforms will help.

Cesare


Re: [Python-Dev] PEP 3146: Merge Unladen Swallow into CPython

2010-01-28 Thread Cesare Di Mauro
Hi Collin,

Thanks for the useful links.

I think that superinstructions require a bit more work, because they aren't
just a rearrangement of opcode arguments. For example, in wpython 1.1 (which
I'll release next month) I've introduced a CALL_SUB opcode to handle all
kinds of function types, so the 2 words pack together:
- the opcode (CALL_SUB);
- the function type and flags (normal, VAR, KW, procedure);
- the number of arguments;
- the number of keyword arguments.

Superinstructions aren't intended to be a simple drop-in replacement for
existing bytecodes. They can carry new ideas and implement them in a
versatile and efficient way.
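As a purely hypothetical sketch of such field packing (the opcode number, flag values, and field widths below are invented for illustration, not wpython's actual encoding):

```python
# Two 16-bit words: word 0 carries the opcode plus function type/flags,
# word 1 carries the positional and keyword argument counts.
CALL_SUB = 0x6F                           # invented opcode number
NORMAL, VAR, KW, PROCEDURE = 0, 1, 2, 4   # invented flag values

def pack_call_sub(flags, nargs, nkwargs):
    return (CALL_SUB | (flags << 8), nargs | (nkwargs << 8))

def unpack_call_sub(word0, word1):
    assert word0 & 0xFF == CALL_SUB
    return word0 >> 8, word1 & 0xFF, word1 >> 8  # flags, nargs, nkwargs

assert unpack_call_sub(*pack_call_sub(PROCEDURE, 2, 1)) == (PROCEDURE, 2, 1)
```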

Anyway, I don't plan to continue with wpython: 1.1 will be the last version
I'll release (although I initially planned 1.2 and 1.3 for this year, and
2.0 for 2011), for several reasons.

2.7 is the last planned 2.x release, and now that it has reached alpha
state, there's no chance to introduce the wordcode model into it.

3.2 or later would be good candidates, but I don't want to make a new
project and fork again. Forking is a waste of time and resources (I spent
over 1 year of my spare time just to prove an idea).

I think that wpython as a proof of concept has done its work, showing its
potential.

If the python-dev community is interested, I can work on a 3.x branch,
porting all the optimizations I made (and many others that I've planned to
implement) one step at a time, in order to carefully check and validate
every change, with expert people monitoring it.

Cesare

2010/1/26 Collin Winter collinwin...@google.com

 Hi Cesare,

 On Tue, Jan 26, 2010 at 12:29 AM, Cesare Di Mauro
 cesare.di.ma...@gmail.com wrote:
  Hi Collin,
 
  One more question: is it easy to support more opcodes, or a different
 opcode
  structure, in Unladen Swallow project?

 I assume you're asking about integrating WPython. Yes, adding new
 opcodes to Unladen Swallow is still pretty easy. The PEP includes a
 section on this,

 http://www.python.org/dev/peps/pep-3146/#experimenting-with-changes-to-python-or-cpython-bytecode
 ,
 though it doesn't cover something more complex like converting from
 bytecode to wordcode, as a purely hypothetical example ;) Let me know
 if that section is unclear or needs more data.

 Converting from bytecode to wordcode should be relatively
 straightforward, assuming that the arrangement of opcode arguments is
 the main change. I believe the only real place you would need to
 update is the JIT compiler's bytecode iterator (see

 http://code.google.com/p/unladen-swallow/source/browse/trunk/Util/PyBytecodeIterator.cc
 ).
 Depending on the nature of the changes, the runtime feedback system
 might need to be updated, too, but it wouldn't be too difficult, and
 the changes should be localized.

 Collin Winter



Re: [Python-Dev] PEP 3146: Merge Unladen Swallow into CPython

2010-01-26 Thread Cesare Di Mauro
Hi Collin,

2010/1/25 Collin Winter collinwin...@google.com

 Hi Cesare,

 On Sat, Jan 23, 2010 at 1:09 PM, Cesare Di Mauro
 cesare.di.ma...@gmail.com wrote:
  Hi Collin
 
  IMO it'll be better to make Unladen Swallow project a module, to be
  installed and used if needed, so demanding to users the choice of having
 it
  or not. The same way psyco does, indeed.
  Nowadays it requires too much memory, longer loading time, and fat
 binaries
  for not-so-great performances. I know that some issues have being worked
 on,
  but I don't think that they'll show something comparable to the current
  CPython status.

 You're proposing that, even once the issues of memory usage and
 startup time are addressed, Unladen Swallow should still be an
  extension module? I don't see why.


Absolutely not, of course.


 You're assuming that these issues
 cannot be fixed, which I disagree with.


No, it's my belief, from what I've seen so far, that it'll be very
difficult to reach a situation comparable to the current one, in the terms
that I've talked about (memory usage, load time, and binary sizes).

I hope I'm mistaken. :)



 I think maintaining something like a JIT compiler out-of-line, as
 Psyco is, causes long-term maintainability problems. Such extension
 modules are forever playing catchup with the CPython code, depending
 on implementation details that the CPython developers are right to
  regard as open to change.


I agree (especially for psyco), but ceval.c is relatively stable code
(not mine, however :D).

It also limits what kind of optimizations
 you can implement or forces those optimizations to be implemented with
 workarounds that might be suboptimal or fragile. I'd recommend reading
 the Psyco codebase, if you haven't yet.


Optimizations are surely a point in favor of integrating the U-S project
into the main core.

Psyco, as I said before, is quite a mess. It's hard to add new back-ends for
other architectures. It's a bit less difficult to keep it in sync with
opcode changes (except for big changes), and a port to Python 3.x may be
suitable (but I don't know if the effort makes sense).

As others have requested, we are working hard to minimize the impact
 of the JIT so that it can be turned off entirely at runtime. We have
 an active issue tracking our progress at
 http://code.google.com/p/unladen-swallow/issues/detail?id=123.


I see, thanks.


  Introducing C++ is a big step, also. Aside the problems it can bring on
 some
  platforms, it means that C++ can now be used by CPython developers.

 Which platforms, specifically? What is it about C++ on those platforms
 that is problematic? Can you please provide details?


Others have talked about it.


  It
  doesn't make sense to force people use C for everything but the JIT part.
 In
  the end, CPython could become a mix of C and C++ code, so a bit more
  difficult to understand and manage.

 Whether CPython should allow wider usage of C++ or whether developer
 should be force[d] to use C is not our decision, and is not part of
 this PEP. With the exception of Python/eval.c, we deliberately have
 not converted any CPython code to C++ so that if you're not working on
 the JIT, python-dev's workflow remains the same. Even within eval.cc,
 the only C++ parts are related to the JIT, and so disappear completely
 with configured with --without-llvm (or if you're not working on the
 JIT).

 In any case, developers can easily tell which language to use based on
 file extension. The compiler errors that would result from compiling
 C++ with a C compiler would be a good indication as well.


OK, if CPython will be compilable without using C++ at all, I retract what
I said.


  What I see is that LLVM is a too big project for the goal of having
 just a
  JIT-ed Python VM. It can be surely easier to use and integrate into
 CPython,
  but requires too much resources

 Which resources do you feel that LLVM would tax, machine resources or
 developer resources? Are you referring to the portions of LLVM used by
 Unladen Swallow, or the entire wider LLVM project, including the
 pieces Unladen Swallow doesn't use at runtime?


No, I'm referring to the portions of LLVM used by U-S.

Regarding resources, I was talking about memory, loading time, and binary
sizes (both with static and dynamic compilation).



  (on the contrary, Psyco demands little
  resources, give very good performances, but seems to be like a mess to
  manage and extend).

 This is not my experience. For the workloads I have experience with,
 Psyco doubles memory usage while only providing a 15-30% speed
 improvement. Psyco's benefits are not uniform.


I have only run computation-intensive (integer, float) tasks with Psyco,
and it worked fine.

I haven't run tests with the U-S benchmark suite.



 Unladen Swallow has been designed to be much more maintainable and
 easier to extend and modify than Psyco: the compiler and its attendant
 optimizations are well-tested (see Lib/test/test_llvm.py

Re: [Python-Dev] PEP 3146: Merge Unladen Swallow into CPython

2010-01-26 Thread Cesare Di Mauro
Hi Collin,

One more question: is it easy to support more opcodes, or a different opcode
structure, in Unladen Swallow project?

Thanks,
Cesare


Re: [Python-Dev] PEP 3146: Merge Unladen Swallow into CPython

2010-01-26 Thread Cesare Di Mauro
Hi Skip

By "relatively stable code" I mean in recent years.

My experience with CPython is limited, of course.

Cesare

2010/1/26 s...@pobox.com


Cesare ... but ceval.c has a relatively stable code ...

 I believe you are mistaken on several counts:

* The names of the functions in there have changed over time.

* The suite of byte code operations have changed dramatically over the
  past ten years or so.

* The relationship between the code in ceval.c and the Python threading
  model has changed.

 Any or all of these aspects of the virtual machine, as well I'm sure as
 many
 other things I've missed would have to be tracked by any extension module
 which hoped to supplant or augment its function in some way.

 Skip



Re: [Python-Dev] PEP 3146: Merge Unladen Swallow into CPython

2010-01-24 Thread Cesare Di Mauro
2010/1/24 Floris Bruynooghe floris.bruynoo...@gmail.com

 Introducing C++ is a big step, but I disagree that it means C++ should
 be allowed in the other CPython code.  C++ can be problematic on more
 obscure platforms (certainly when static initialisers are used) and
 being able to build a python without C++ (no JIT/LLVM) would be a huge
 benefit, effectively having the option to build an old-style CPython
 at compile time.  (This is why I asked about --without-llvm being able
 not to link with libstdc++).

 Regards
 Floris


That's why I suggested the use of an external module, but if I have
understood correctly, some parts of ceval.c need to be changed to C++.

If no C++ is required to compile the classic, non-JITted CPython, my
thought was wrong, of course.

Cesare


Re: [Python-Dev] PEP 3146: Merge Unladen Swallow into CPython

2010-01-23 Thread Cesare Di Mauro
Hi Collin

IMO it would be better to make the Unladen Swallow project a module, to be
installed and used if needed, leaving to users the choice of having it
or not. The same way psyco does, indeed.

Nowadays it requires too much memory, longer loading times, and fat
binaries for not-so-great performance. I know that some issues are being
worked on, but I don't think that they'll show something comparable to the
current CPython status.

Introducing C++ is a big step, too. Aside from the problems it can bring on
some platforms, it means that C++ can now be used by CPython developers. It
doesn't make sense to force people to use C for everything but the JIT
part. In the end, CPython could become a mix of C and C++ code, and so a
bit more difficult to understand and manage.

What I see is that LLVM is too big a project for the goal of having just a
JIT-ed Python VM. It can surely be easier to use and integrate into
CPython, but it requires too many resources (on the contrary, Psyco demands
few resources and gives very good performance, but seems to be a mess to
manage and extend).

I know that a new, custom JIT is a hard project to work on, requiring a
long time, but the hurry to have something faster than the current CPython
can lead to a mammoth that runs just a bit better.

Anyway, it seems that performance is a sensitive topic for the Python
community. I think that a lot can be done to squeeze out more speed,
working both on CPython internals and on the JIT side.

Best regards,
Cesare

2010/1/20 Collin Winter collinwin...@google.com

 Hello python-dev,

 I've just committed the initial draft of PEP 3146, proposing to merge
 Unladen Swallow into CPython's source tree and roadmap. The initial
 draft is included below. I've also uploaded the PEP to Rietveld at
 http://codereview.appspot.com/186247, where individual fine-grained
 updates will be tracked. Feel free to comment either in this thread or
 on the Rietveld issue. I'll post periodic summaries of the
 discussion-to-date.

 We're looking forward to discussing this with everyone.

 Thanks,
 Collin Winter

 [snip...]


Re: [Python-Dev] wpython is back

2009-12-03 Thread Cesare Di Mauro
2009/11/27 Christian Heimes li...@cheimes.de

 Cesare Di Mauro wrote:
 
  You'll find some at page 28
  here
 http://wpython.googlecode.com/files/Beyond%20Bytecode%20-%20A%20Wordcode-based%20Python.pdf
 
  ..
 
  Mart made more interesting ones
  (http://www.mail-archive.com/python-dev@python.org/msg43282.html) with
  Unladen benchmarks.

 The PDF document sounded interesting and I was tempted to test WPython.
 Unfortunately it doesn't compile on my box:

 $ make
 gcc -pthread -c -fno-strict-aliasing -DNDEBUG -g -fwrapv -O3 -Wall
 -Wstrict-prototypes  -I. -IInclude -I./Include   -DPy_BUILD_CORE -o
 Python/ast.o Python/ast.c


 Python/ast.c:30: warning: ‘enum _expr_const’ declared inside parameter
 list
 Python/ast.c:30: warning: its scope is only this definition or
 declaration, which is probably not what you want

 Python/ast.c:335: warning: ‘enum _expr_const’ declared inside parameter
 list
 Python/ast.c:335: error: parameter 2 (‘constant’) has incomplete type

 Python/ast.c: In function ‘Const’:

 Python/ast.c:341: error: ‘Const_kind’ undeclared (first use in this
 function)

 Python/ast.c:341: error: (Each undeclared identifier is reported only
 once
 Python/ast.c:341: error: for each function it appears in.)

 Python/ast.c:342: error: ‘union anonymous’ has no member named ‘Const’

 Python/ast.c:343: error: ‘union anonymous’ has no member named ‘Const’

 Python/ast.c: In function ‘set_context’:

 Python/ast.c:457: error: ‘Const_kind’ undeclared (first use in this
 function)

 Python/ast.c: At top level:

 Python/ast.c:591: warning: ‘enum _expr_const’ declared inside parameter
 list
 Python/ast.c:590: error: conflicting types for ‘seq_for_testlist’

 Python/ast.c:29: note: previous declaration of ‘seq_for_testlist’ was here
 [...]

 $ gcc --version
 gcc (Ubuntu 4.4.1-4ubuntu8) 4.4.1
 $ uname -a
 Linux hamiller 2.6.31-14-generic #48-Ubuntu SMP Fri Oct 16 14:05:01 UTC
 2009 x86_64 GNU/Linux


I have created a new project at Google Code:
http://code.google.com/p/wpython2/ using Mercurial for the repository.

The master (Python 2.6.4) code is located into the default repository:
https://wpython2.googlecode.com/hg/

The wpython (version 1.0) clone is in:
https://wpython10.wpython2.googlecode.com/hg/

Sources are available in:
http://code.google.com/p/wpython2/downloads/list

wpython 1.0 is an almost complete replacement for Python 2.6.4 (except for
Doc/library/dis.rst, which I'll update later, when I stop adding or
changing opcodes).

I have changed the ASDL grammar (in Parser/Python.asdl) so that there's no
need to overwrite Include/Python-ast.h, and I've added full support for
constants to the AST code (I left Num_kind and Str_kind untouched right now,
but I plan to remove them in the next release, since Const_kind is able to
hold any kind of constant object).

Now you shouldn't have problems compiling it.

Cesare


Re: [Python-Dev] wpython is back

2009-11-27 Thread Cesare Di Mauro
2009/11/27 Christian Heimes li...@cheimes.de

 Cesare Di Mauro wrote:
 
  You'll find some at page 28
  here
 http://wpython.googlecode.com/files/Beyond%20Bytecode%20-%20A%20Wordcode-based%20Python.pdf
 
  ..
 
  Mart made more interesting ones
  (http://www.mail-archive.com/python-dev@python.org/msg43282.html) with
  Unladen benchmarks.

 The PDF document sounded interesting and I was tempted to test WPython.
 Unfortunately it doesn't compile on my box:

  [...]


That's because the Include/Python-ast.h file is autogenerated from the ASDL
grammar file the first time you try to compile wpython.

You need to replace it with the one bundled with wpython.

It's a known problem that I'll address ASAP.

Cesare


[Python-Dev] wpython is back

2009-11-26 Thread Cesare Di Mauro
Hi Mart

I'm back with some news about wpython. I completed all the work that I had
committed to doing by the end of the year. I made a lot of changes to the
code, which I'll report here.

First, I added several conditional compilation sections that enable or
disable almost every optimization I introduced into the project. Everything
is controlled by a new include file, wpython.h, which holds a lot of
#DEFINEs for each one of them.
Every #DEFINE has a brief explanation, and some report an example with
Python code disassembled, showing what happens.
It can be useful both to document the code (and to find the relevant
parts), and to let people test the effect of all optimizations. There are
also a couple of #DEFINEs which are useful to enable or disable all
superinstructions, or to make wpython work like CPython (with all new
optimizations and superinstructions disabled).

Full tracing support required a big effort, due to the missing
SETUP_LOOP/POP_BLOCK instructions in FOR_ITER blocks. It was a pain in
the neck to get them working, but I think I have found a good solution for
it. If I remember correctly, Collin asked in the past about performance
with tracing enabled. I believe that speed is comparable to CPython, since
I can trace FOR_ITER block enter/exit with very little time spent
intercepting them; stack unwinding (for forward-jump cases) is fast too.

Restoring the Python object model required much of the work. I reverted all
the changes that I had made to many PyObjects, and just added some
accessory code to a few of them. There are no more hacks, and the code is
quite polite; only CodeObject required a one-line change in the hash
function, to let it calculate the hash correctly for the constants tuple
(because it can now hold lists and dictionaries, which usually aren't
hashable).
Every file in Include/ and Objects/ that I modified has only 1 diff (except
frameobject.c, for the tracing code), so it's easy to see what has changed
and the extra helper functions that I added to introduce lists and
dictionaries into the consts tuple.

In the meanwhile I've added a little optimization for lists and dictionaries
used in for loops. Writing this:

def f():
for x in ['a', 'b', 'c']: print x

generates the following (word)code with the previous wpython:

LOAD_CONST (['a', 'b', 'c'])
DEEP_LIST_COPY
GET_ITER
FOR_ITER

because ['a', 'b', 'c'] is a mutable object, and a copy must be made before
using it.

Now it'll be:

LOAD_CONST (['a', 'b', 'c'])
GET_ITER
FOR_ITER

So code size is reduced, and memory consumption too, because there's no
need to clone the list. The trick works only for lists and dictionaries
that hold immutable objects, but I found it's a common pattern in Python
code.
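Modern CPython ended up applying a similar trick: a list literal that is only iterated over is stored as a constant tuple, so no copy is ever made. A quick check (assuming a reasonably recent CPython):

```python
def f():
    for x in ['a', 'b', 'c']:
        print(x)

# The list literal never escapes the loop, so the compiler can store it as
# a constant tuple in co_consts instead of rebuilding a list on each call.
print(f.__code__.co_consts)
```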

I've also updated the source to the latest Python 2.x version, 2.6.4.

All tests pass, both with Debug and Release code, on Visual Studio Express
with 32-bit code (I can't compile 64-bit versions with it).

There are only a few open issues.

test_syntax.py required some changes in the doctest (adding full filesystem
path) to let them pass correctly. It's really strange, but... works now!

test_compile.py has 2 tests disabled in test_compile_ast:

#['forblock', for n in [1, 2, 3]:\n print n\n],
#[fname, fcontents],

that's because there's no support for constants (except Num_kind and
Str_kind) in the current ASTs code. However code compiles well, except that
it cannot make use of the new constant folding code.

I haven't updated Doc/library/dis.rst, which is exactly the same as
CPython's. I'll do it when I stop introducing or changing opcodes.

Right now wpython requires manual patching of Include/Python-ast.h, with the
following lines:

enum _expr_kind {BoolOp_kind=1, BinOp_kind=2, UnaryOp_kind=3, Lambda_kind=4,
[...]
List_kind=18, Tuple_kind=19, Const_kind=20};

enum _expr_const {no_const=0, mutable_const=1, content_const=3,
pure_const=7};

struct _expr {

enum _expr_kind kind;
union {
[...]

struct {
object c;
enum _expr_const constant;
} Const;

} v;
int lineno;
int col_offset;
};

They are needed to let ast.c handle constants for the new constant folding
code.
I'd greatly appreciate any help in getting them generated automatically from
the ASDL grammar.


That's all about the new code. Now the weird and stupid part. A few days ago
I got a new gmail account, but I accidentally removed the google account that
I had used to create wpython at Google Code. I definitely lost project
ownership, so I can't tag the old code and put the new one in trunk.
I'd be very thankful if someone who works at or has contacts with Google
could ask for moving ownership from my old account (cesare at pronto do it)
to my new one (the one I'm using to write this mail), so I can commit ASAP.
Alternatively, I'll need to create a new project at Google Code.

I hope that the community will appreciate the work (once I upload it :-).
I know that it's a young project, but I think it's mature enough to take a
look at.

Re: [Python-Dev] wpython is back

2009-11-26 Thread Cesare Di Mauro
2009/11/27 Guido van Rossum gu...@python.org

 It's a Python implementation that uses wordcode instead of bytecode.

 http://code.google.com/p/wpython/

 I don't see any benchmarks though.


You'll find some at page 28 here:
http://wpython.googlecode.com/files/Beyond%20Bytecode%20-%20A%20Wordcode-based%20Python.pdf

Mart made more interesting ones with the Unladen benchmarks:
http://www.mail-archive.com/python-dev@python.org/msg43282.html

Cesare
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] A wordcode-based Python

2009-11-04 Thread Cesare Di Mauro
Hi Mart

I had some problems and little time to dedicate to wpython in the last
period, but I restarted again with it in the last month.

Currently I'm working on changing and documenting the code so that almost
every optimization can be selected. So you'll compile it enabling only the
ones you are interested in.

I've also investigated about some ideas which Antoine told me on grouping
together FASTs and CONSTs in order to reduce bytecodes, but I've found that
the suggested solution brings some problems with the current function call
implementation that can hurt performance on some situations (mostly with
recursive ones, because usually they need to create new frames, and
constants references must be copied and INCREFed).
Since it would require huge changes to the current code base, I don't know if
it's worth the effort just to verify the idea. I'll think about it when the
project is finalized.

My plan is to finish the current work in a few days, and then remove the
(maybe ugly) hacks that I made to the Python object model that were needed
to let tuples, lists and dictionaries be loaded as CONSTs.
Maybe at the end of the month it'll be fixed (and the diffs against CPython
will be reduced a lot, since only a few files are changed).

Next, I need to change the trace code (in frameobject.c) to let
test_trace.py pass (at this time two tests are disabled because the VM
crashes).

Finally, I plan to update the code base to 2.6.4.

I plan to release everything at the end of the year, but if someone is
interested I can do a partial release at the end of November.

Regarding your tests, they are very interesting, particularly regex_v8,
which showed an unexpected result for me. I'll investigate it after
I release wpython.

If you have any questions, I'm at your disposal (thanks for your tests!)

Cesare

2009/11/4 Mart Sõmermaa mrts.py...@gmail.com

 On Tue, May 12, 2009 at 8:54 AM, Cesare Di Mauro
 cesare.dima...@a-tono.com wrote:
  Also, I checked out wpython at head to run Unladen Swallow's
  benchmarks against it, but it refuses to compile with either gcc 4.0.1
  or 4.3.1 on Linux (fails in Python/ast.c). I can send you the build
  failures off-list, if you're interested.
 
  Thanks,
  Collin Winter
 
  I'm very interested, thanks. That's because I worked only on Windows
  machines, so I definitely need to test and fix it to let it run on any
 other
  platform.
 
  Cesare

 Re-animating an old discussion -- Cesare, any news on the wpython front?

 I did a checkout from http://wpython.googlecode.com/svn/trunk and
 was able to ./configure and make successfully on my 64-bit Linux box
 as well as to run the Unladen benchmarks.

 Given svn co http://svn.python.org/projects/python/tags/r261 in py261
 and svn co http://wpython.googlecode.com/svn/trunk in wpy,

 $ python unladen-tests/perf.py -rm --benchmarks=-2to3,all py261/python
 wpy/python

 gives the following results:

 Report on Linux foo 2.6.31-14-generic #48-Ubuntu SMP Fri Oct 16
 14:05:01 UTC 2009 x86_64
 Total CPU cores: 2

 ai:
 Min: 0.640516 - 0.586532: 9.20% faster
 Avg: 0.677346 - 0.632785: 7.04% faster
 Significant (t=4.336740, a=0.95)
 Stddev: 0.05839 - 0.08455: 30.94% larger

 Mem max: 7412.000 - 6768.000: 9.52% smaller
 Usage over time: http://tinyurl.com/ykwhmcc


 call_simple:
 Min: 1.880816 - 1.701622: 10.53% faster
 Avg: 1.944320 - 1.778701: 9.31% faster
 Significant (t=14.323045, a=0.95)
 Stddev: 0.09885 - 0.06000: 64.74% smaller

 Mem max: 8100.000 - 6636.000: 22.06% smaller
 Usage over time: http://tinyurl.com/yzsswgp


 django:
 Min: 1.287158 - 1.315700: 2.17% slower
 Avg: 1.330423 - 1.366978: 2.67% slower
 Significant (t=-4.475769, a=0.95)
 Stddev: 0.05663 - 0.05885: 3.78% larger

 Mem max: 15508.000 - 16228.000: 4.44% larger
 Usage over time: http://tinyurl.com/yfpbmjn


 iterative_count:
 Min: 0.211620 - 0.124646: 69.78% faster
 Avg: 0.222778 - 0.159868: 39.35% faster
 Significant (t=9.291635, a=0.95)
 Stddev: 0.04239 - 0.05279: 19.69% larger

 Mem max: 7388.000 - 6680.000: 10.60% smaller
 Usage over time: http://tinyurl.com/yj7s8h4


 normal_startup:
 Min: 1.060017 - 0.991366: 6.92% faster
 Avg: 1.189612 - 1.170067: 1.67% faster
 Significant (t=2.002086, a=0.95)
 Stddev: 0.06942 - 0.06864: 1.13% smaller

 Mem max: 3252.000 - 4648.000: 30.03% larger
 Usage over time: http://tinyurl.com/ygo3bwt


 pickle:
 Min: 2.027566 - 1.948784: 4.04% faster
 Avg: 2.051633 - 2.043656: 0.39% faster
 Not significant
 Stddev: 0.03095 - 0.07348: 57.88% larger

 Mem max: 8544.000 - 7340.000: 16.40% smaller
 Usage over time: http://tinyurl.com/ykg9dn2


 pickle_dict:
 Min: 1.658693 - 1.656844: 0.11% faster
 Avg: 1.689483 - 1.698176: 0.51% slower
 Not significant
 Stddev: 0.16945 - 0.09403: 80.20% smaller

 Mem max: 6716.000 - 7636.000: 12.05% larger
 Usage over time: http://tinyurl.com/yjhyame


 pickle_list:
 Min: 0.919083 - 0.894758: 2.72% faster
 Avg: 0.956513 - 0.921314: 3.82% faster
 Significant (t=2.131237, a=0.95)
 Stddev: 0.12744

Re: [Python-Dev] special method lookup: how much do we care?

2009-05-14 Thread Cesare Di Mauro
On Sun, May 10, 2009 11:51PM, Nick Coghlan wrote:
 However lots of developers rely on CPython ref counting as well, no
 matter how many times they're told not to do that if they want to
 support alternative interpreters.

 Cheers,
 Nick.

From socket.py:

# Wrapper around platform socket objects. This implements
# a platform-independent dup() functionality. The
# implementation currently relies on reference counting
# to close the underlying socket object.
class _socketobject(object):


You don't know how much time I've spent trying to understand why
test_httpserver.py hanged indefinitely when I was experimenting with new
opcodes in my VM, until I traced it back to this reliance on refcounting.
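A minimal sketch of the pattern in question: on CPython, dropping the last reference runs __del__ immediately, which is exactly what such code implicitly relies on (other implementations may delay finalization arbitrarily):

```python
closed = []

class Resource:
    def __del__(self):
        closed.append(True)   # the cleanup the code implicitly relies on

r = Resource()
dup = r                       # a second reference, as in socket dup()
del r
assert closed == []           # one reference remains: no cleanup yet
del dup                       # last reference dropped...
assert closed == [True]       # ...and CPython finalizes immediately
```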

Cheers,
Cesare


Re: [Python-Dev] A wordcode-based Python

2009-05-12 Thread Cesare Di Mauro
On Mon, May 11, 2009 10:27PM, Antoine Pitrou wrote:

Hi Antoine

 Hi,

 WPython is a re-implementation of (some parts of) Python, which drops
 support for bytecode in favour of a wordcode-based model (where a word
 is 16 bits wide).

 This is great!
 Have you planned to port in to the py3k branch? Or, at least, to trunk?

It was my idea too, but first I need to take a deep look at what parts
of code are changed from 2.6 to 3.0.
That's because I don't know how much work is required for this
forward port.

 Some opcode and VM optimizations have gone in after 2.6 was released,
 although
 nothing as invasive as you did.

:-D Interesting.

 About the CISC-y instructions, have you tried merging the fast and const
 arrays
 in frame objects? That way, you need less opcode space (since e.g.
 BINARY_ADD_FAST_FAST will cater with constants as well as local
 variables).

 Regards

 Antoine.

It's an excellent idea that needs exploration.

Running my stats tools against all .py files found in Lib and Tools
folders, I discovered that the maximum index used for fast/locals
is 79, and 1853 for constants.

So if I find a way to easily map locals first, with constants following, in
the same array, your great idea can be implemented, saving A LOT of opcodes
and reducing ceval.c source code.
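For reference, the two arrays that the proposal would merge are visible from Python code today (a small illustrative sketch):

```python
def f(x):
    return x + 1

code = f.__code__
# LOAD_FAST indexes into co_varnames' slots, LOAD_CONST into co_consts:
# two separate arrays today, one merged array under Antoine's proposal
assert 'x' in code.co_varnames
assert 1 in code.co_consts
```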

I'll work on that after the two releases that I planned.

Thanks for your precious suggestions!

Cesare



Re: [Python-Dev] A wordcode-based Python

2009-05-12 Thread Cesare Di Mauro
Hi Collin

On Mon, May 11, 2009 11:14PM, Collin Winter wrote:
 Hi Cesare,

 On Mon, May 11, 2009 at 11:00 AM, Cesare Di Mauro
 cesare.dima...@a-tono.com wrote:
 At the last PyCon3 at Italy I've presented a new Python implementation,
 which you'll find at http://code.google.com/p/wpython/

 Good to see some more attention on Python performance! There's quite a
 bit going on in your changes; do you have an
 optimization-by-optimization breakdown, to give an idea about how much
 performance each optimization gives?

I've planned it for the next release, which may come next week.

I'll introduce some #DEFINEs and #IFs in the code, so that
only specific optimizations will be enabled.

 Looking over the slides, I see that you still need to implement
 functionality to make test_trace pass, for example; do you have a
 notion of how much performance it will cost to implement the rest of
 Python's semantics in these areas?

Very little. That's because there are only two tests in test_trace that
don't pass.

I think that the reason lies in the changes that I made to the loops.
With my code SETUP_LOOP and POP_BREAK are completely
removed, so the code in settrace fails to recognize the loop and
the virtual machine crashes.

I'll fix it in the second release that I have planned.

 Also, I checked out wpython at head to run Unladen Swallow's
 benchmarks against it, but it refuses to compile with either gcc 4.0.1
 or 4.3.1 on Linux (fails in Python/ast.c). I can send you the build
 failures off-list, if you're interested.

 Thanks,
 Collin Winter

I'm very interested, thanks. That's because I worked only on Windows
machines, so I definitely need to test and fix it to let it run on any other
platform.

Cesare


Re: [Python-Dev] A wordcode-based Python

2009-05-12 Thread Cesare Di Mauro
On Thu, May 12, 2009 01:40PM, Antoine Pitrou wrote:

 Hi Cesare,

 Cesare Di Mauro cesare.dimauro at a-tono.com writes:

 It was my idea too, but first I need to take a deep look at what parts
 of code are changed from 2.6 to 3.0.
 That's because I don't know how much work is required for this
 forward port.

 If you have some questions or need some help, send me a message.

 Regards

 Antoine.

OK, thanks. :)

Another note. Fredrik Johansson pointed out to me just a few minutes ago that
I compiled my sources without PGO optimizations enabled.

That's because I used Visual Studio Express Edition.

So another performance gain can be obtained. :)

cheers
Cesare


Re: [Python-Dev] A wordcode-based Python

2009-05-12 Thread Cesare Di Mauro
On Tue, May 12, 2009 05:27 PM, Collin Winter wrote:
 On Tue, May 12, 2009 at 4:45 AM, Cesare Di Mauro
 cesare.dima...@a-tono.com wrote:
 Another note. Fredrik Johansson let me note just few minutes ago that
 I've
 compiled my sources without PGO optimizations enabled.

 That's because I used Visual Studio Express Edition.

 So another gain in performances can be obtained. :)

 FWIW, Unladen Swallow experimented with gcc 4.4's FDO and got an
 additional 10-30% (depending on the benchmark). The training load is
 important, though: some training sets offered better performance than
 others. I'd be interested in how MSVC's PGO compares to gcc's FDO in
 terms of overall effectiveness. The results for gcc FDO with our
 2009Q1 release are at the bottom of
 http://code.google.com/p/unladen-swallow/wiki/Releases.

 Collin Winter

Unfortunately I can't test PGO, since I use the Express Edition of VS.
Maybe Martin or other maintainers of the Windows versions can help here.

However, it'll be difficult to find a good enough profile for the binaries
distributed for the official Python. FDO brings quite different results
depending on the profile selected.

cheers,
Cesare


[Python-Dev] A wordcode-based Python

2009-05-11 Thread Cesare Di Mauro
At the last PyCon3 at Italy I've presented a new Python implementation,
which you'll find at http://code.google.com/p/wpython/

WPython is a re-implementation of (some parts of) Python, which drops
support for bytecode in favour of a wordcode-based model (where a word
is 16 bits wide).

It also implements a hybrid stack-register virtual machine, and adds a
lot of other optimizations.

The slides are available in the download area, and explain the concept of
wordcode, also showing how some optimizations work, comparing them with
the current Python (2.6.1).

Unfortunately I had no time to make extensive benchmarks with real code,
so I've included some that I made with PyStone, PyBench, and a couple of
simple recursive function calls (Fibonacci and Factorial).

This is the first release, and another two are scheduled; the first one to
make it possible to select (almost) any optimization to be compiled (so
fine-grained tests will be possible).

The latter will be a rewrite of the constant folding code (specifically
for tuples, lists and dicts), removing a current hack to the python type
system to make them hashable for the constants dictionary used by
compile.c.

Then I'll start writing some documentation that will explain what parts of
code are related to a specific optimization, so that it'll be easier to
create patches for other Python implementations, if needed.

You'll find a bit more information in the README FIRST! file in
the project's repository.

I made so many changes to the source of Python 2.6.1, so feel free to ask
me for any information about them.

Cheers
Cesare


Re: [Python-Dev] pyc files, constant folding and borderline portability issues

2009-04-07 Thread Cesare Di Mauro
On Apr 07, 2009 at 02:10AM, Steven D'Aprano st...@pearwood.info wrote:

 On the other hand, I'm with Guido when he wrote it is certainly not
 right to choose speed over correctness. This is especially a problem
 for floating point optimizations, and I urge Cesare to be conservative
 in any f.p. optimizations he introduces, including constant folding.

The principle that I followed in doing constant folding was: do what Python
would do without constant folding enabled.

So if Python will generate

LOAD_CONST  1
LOAD_CONST  2
BINARY_ADD

the constant folding code will simply replace them with a single

LOAD_CONST  3
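This behaviour can be checked from Python itself; CPython's own optimizer already folds the simple integer case, storing the result directly in the constants table (a quick sketch):

```python
def f():
    return 1 + 2

# the peephole optimizer (and, in recent CPython, the AST optimizer)
# stores the folded result 3 in the code object's constants table
assert 3 in f.__code__.co_consts
assert f() == 3
```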

When working with such kind of optimizations, the temptation is to
apply them at any situation possible. For example, in other languages
this

a = b * 2 * 3

will be replaced by

a = b * 6

In Python I can't do that, because b can be an object which overloaded
the * operator, so it *must* be called two times, one for 2 and one for 3.
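A small sketch that demonstrates why: with an object that counts its __mul__ calls, folding 2 * 3 first would change observable behaviour:

```python
class Counting:
    """Toy object that records how many times it is multiplied."""
    def __init__(self):
        self.calls = 0
    def __mul__(self, other):
        self.calls += 1
        return self

b = Counting()
a = b * 2 * 3          # evaluates as (b * 2) * 3
assert b.calls == 2    # both multiplications really ran; folding b * 6
                       # would have produced only one call
```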

That's the way I choose to implement constant folding.

The only difference at this time regards invalid operations, which will
raise exceptions at compile time, not at run time.

So if you write:

a = 1 / 0

an exception will be raised at compile time.

I decided to let the exception be raised immediately, because I think that
it's better to detect an error at compile time than at execution time.

However, this can lead to incompatibilities with existing code, so in the
final implementation I will add a flag to struct compiling (in ast.c) so that
this behaviour can be controlled programmatically (enabling or disabling the
exception raising).
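For contrast, a sketch of the standard CPython behaviour, where the expression compiles without complaint and the exception appears only at run time:

```python
def f():
    return 1 / 0   # standard CPython compiles this just fine

raised = False
try:
    f()
except ZeroDivisionError:
    raised = True
assert raised      # the error surfaces only when the code actually runs
```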

I already introduced a flag in struct compiling to control the constant
folding, that can be completely disabled, if desired.

 So... +1 on the general principle of constant folding, -0.5 on any such
 optimizations which change the semantics of a f.p. operation. The only
 reason it's -0.5 rather than -1 is that (presumably) anyone who cares
 about floating point correctness already knows to never trust the
 compiler.

As Raymond stated, there's no loss in precision working with constant
folding code on float datas. That's because there will be a rounding and
a store of computed values each time that a result is calculated.

Other languages will use FPU registers to hold results as long as
possible, keeping the full 80-bit x87 precision (1 sign bit + 15-bit
exponent + 64-bit mantissa).
That's not the Python case.
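A short sketch of the point: Python floats are always 64-bit IEEE doubles, so each intermediate result is rounded on store (the specific constant below is just an illustration):

```python
import struct

# a Python float is always a 64-bit IEEE-754 double; struct confirms it
assert struct.calcsize('d') == 8

# near 1e16 the spacing between doubles is exactly 2, so this sum is exact
# and identical whether computed at compile time or at run time
result = 1e16 + 2.0
assert result == 10000000000000002.0
```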

Cesare



Re: [Python-Dev] pyc files, constant folding and borderline portability issues

2009-04-07 Thread Cesare Di Mauro
On 07 April 2009 at 17:19:25, s...@pobox.com wrote:


 Cesare The only difference at this time is regards invalid operations,
 Cesare which will raise exceptions at compile time, not at running
 Cesare time.

 Cesare So if you write:

 Cesare a = 1 / 0

 Cesare an exception will be raised at compile time.

 I think I have to call *bt* here.  This is a common technique used
 during debugging.  Insert a 1/0 to force an exception (possibly causing the
 running program to drop into pdb).  I think you have to leave that in.

 Skip

Many tests rely on this, and I have changed them from something like:

try:
   1 / 0
except:
  

to

try:
  a = 1; a / 0
except:
  

But I know that it's a major source of incompatibilities, and in the final
code I'll enable it only if the user demands it (through a flag).

Cesare


Re: [Python-Dev] pyc files, constant folding and borderline portability issues

2009-04-07 Thread Cesare Di Mauro
On Tue, Apr 7, 2009 06:25PM, Guido van Rossum wrote:
 Well I'm sorry Cesare but this is unacceptable. As Skip points out
 there is plenty of code that relies on this.

Guido, as I already said, in the final code the normal Python behaviour
will be kept, and the stricter one will be enabled solely due to a flag
set by the user.

 Also, consider what
 problem you are trying to solve here. What is the benefit to the
 user of moving this error to compile time? I cannot see any.

 --Guido

In my experience it's better to discover a bug at compile time rather than
at run time.

Cesare

 On Tue, Apr 7, 2009 at 8:19 AM, Cesare Di Mauro
 cesare.dima...@a-tono.com wrote:
 In data 07 aprile 2009 alle ore 17:19:25, s...@pobox.com ha scritto:


     Cesare The only difference at this time is regards invalid
 operations,
     Cesare which will raise exceptions at compile time, not at running
     Cesare time.

     Cesare So if you write:

     Cesare a = 1 / 0

     Cesare an exception will be raised at compile time.

 I think I have to call *bt* here.  This is a common technique used
 during debugging.  Insert a 1/0 to force an exception (possibly causing
 the
 running program to drop into pdb).  I think you have to leave that in.

 Skip

 Many tests rely on this, and I have changed them from something like:

 try:
   1 / 0
 except:
  

 to

 try:
  a = 1; a / 0
 except:
  

 But I know that it's a major source of incompatibilities, and in the
 final
 code I'll enabled it only if user demanded it (through a flag).

 Cesare




 --
 --Guido van Rossum (home page: http://www.python.org/~guido/)




Re: [Python-Dev] pyc files, constant folding and borderline portability issues

2009-04-07 Thread Cesare Di Mauro
On Tue, Apr 7, 2009 07:22PM, Guido van Rossum wrote:
 In my experience it's better to discover a bug at compile time rather
 than
 at running time.

 That's my point though, which you seem to be ignoring: if the user
 explicitly writes 1/0 it is not likely to be a bug. That's very
 different than 1/x where x happens to take on zero at runtime --
 *that* is likely a bug, but a constant folder can't detect that (at
 least not for Python).

 --
 --Guido van Rossum (home page: http://www.python.org/~guido/)

I agree. My only concern was about user typos that could lead to an
error caught by a stricter constant folder.

But I admit that it's a rarer case compared to an explicit exception
raising such as the one you showed.

Cesare


Re: [Python-Dev] pyc files, constant folding and borderline portability issues

2009-04-06 Thread Cesare Di Mauro
On Mar 29, 2009 at 05:36PM, Guido van Rossum gu...@python.org wrote:

 - Issue #5593: code like 1e16+2. is optimized away and its result stored 
 as
 a constant (again), but the result can vary slightly depending on the 
 internal
 FPU precision.

 I would just not bother constant folding involving FP, or only if the
 values involved have an exact representation in IEEE binary FP format.

The Language Reference says nothing about the effects of code optimizations.
I think it's a very good thing, because we can do some work here with constant
folding.

If someone wants to preserve precision with floats, they can always use a
temporary variable, like in many other languages.

 These problems have probably been there for a long time and almost no one 
 seems
 to complain, but I thought I'd report them here just in case.

 I would expect that constant folding isn't nearly effective in Python
 as in other (less dynamic) languages because it doesn't do anything
 for NAMED constants. E.g.

 MINUTE = 60

 def half_hour():
 return MINUTE*30

 This should be folded to return 1800 but doesn't because the
 compiler doesn't know that MINUTE is a constant.

I completely agree. We can't say anything about MINUTE at the time half_hour
is executed. The code here must never be changed.
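A sketch showing why folding MINUTE * 30 would be wrong: the global can be rebound after the function is compiled:

```python
MINUTE = 60

def half_hour():
    return MINUTE * 30    # MINUTE is looked up at call time

assert half_hour() == 1800
MINUTE = 1                # rebinding the "constant"...
assert half_hour() == 30  # ...changes the result, so the compiler could
                          # not have safely folded MINUTE * 30 to 1800
```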

 Has anyone ever profiled the effectiveness of constant folding on
 real-world code? The only kind of constant folding that I expect to be
 making a diference is things like unary operators, since e.g. x = -2
 is technically an expression involving a unary minus.

At this time with Python 2.6.1 we have these results:
def f(): return 1 + 2 * 3 + 4j
dis(f)

  1   0 LOAD_CONST   1 (1)
  3 LOAD_CONST   5 (6)
  6 BINARY_ADD
  7 LOAD_CONST   4 (4j)
 10 BINARY_ADD
 11 RETURN_VALUE

def f(): return ['a', ('b', 'c')] * (1 + 2 * 3)
dis(f)

  1   0 LOAD_CONST   1 ('a')
  3 LOAD_CONST   7 (('b', 'c'))
  6 BUILD_LIST   2
  9 LOAD_CONST   4 (1)
 12 LOAD_CONST   8 (6)
 15 BINARY_ADD
 16 BINARY_MULTIPLY
 17 RETURN_VALUE

With proper constant folding code, both functions can be reduced
to a single LOAD_CONST and a RETURN_VALUE (or even to a single
instruction with an advanced peephole optimizer).

I'll show you it at PyCon in Florence, next month.

 ISTM that historically, almost every time we attempted some new form
 of constant folding, we introduced a bug.

Python ships with a very rich test battery, which helped me a lot in my
work of changing the AST, compiler, peephole optimizer, and VM.
If the existing tests aren't enough, we can expand them with more cases.

But, again, the Language Reference says nothing about optimizations.

Cheers,
Cesare


Re: [Python-Dev] pyc files, constant folding and borderline portability issues

2009-04-06 Thread Cesare Di Mauro
On Mon, Apr 6, 2009 16:43, Antoine Pitrou wrote:
 Cesare Di Mauro cesare.dimauro at a-tono.com writes:
 def f(): return ['a', ('b', 'c')] * (1 + 2 * 3)
 [...]

 With proper constant folding code, both functions can be reduced
 to a single LOAD_CONST and a RETURN_VALUE (or, definitely, by
 a single instruction at all with an advanced peephole optimizer).

 Lists are mutable, you can't optimize the creation of list literals by
 storing
 them as singleton constants.

 Regards

 Antoine.

You are right, I've mistyped the example.

def f(): return ('a', ('b', 'c')) * (1 + 2 * 3)

generates a single instruction (depending on the threshold used to limit
folding of sequences), whereas

def f(): return ['a', ('b', 'c')] * (1 + 2 * 3)

needs three.
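The difference is easy to verify with the dis module in a modern CPython (a sketch; exact opcode names vary a little between versions):

```python
import dis

def t():
    return ('a', ('b', 'c'))   # all-constant tuple: foldable

def l():
    return ['a', ('b', 'c')]   # list literal: mutable, cannot be folded

t_ops = {i.opname for i in dis.get_instructions(t)}
l_ops = {i.opname for i in dis.get_instructions(l)}
assert 'BUILD_TUPLE' not in t_ops   # folded into a single constant load
assert 'BUILD_LIST' in l_ops        # the list is rebuilt on every call
```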

Sorry for the mistake.

Cheers,
Cesare


Re: [Python-Dev] pyc files, constant folding and borderline portability issues

2009-04-06 Thread Cesare Di Mauro
On Mon, Apr 6, 2009 18:57, s...@pobox.com wrote:

 Cesare At this time with Python 2.6.1 we have these results:
 Cesare def f(): return 1 + 2 * 3 + 4j
 ...
 Cesare def f(): return ['a', ('b', 'c')] * (1 + 2 * 3)

 Guido can certainly correct me if I'm wrong, but I believe the main point
 of
 his message was that you aren't going to encounter a lot of code in Python
 which is amenable to traditional constant folding.  For the most part,
 they
 will be assigned to symbolic constants, which, unlike C preprocessor
 macros aren't really constants at all.  Consequently, the opportunity for
 constant folding is minimal and probably introduces more opportunities for
 bugs than performance improvements.

 Skip

I can understand Guido's concern, but you have worked on constant
folding as well, and you know that there's room for optimization here.

peephole.c has some code for unary, binary, and tuple/list folding; it
works fine. Why maintain useless and dangerous code, otherwise?

I know that bugs can come out of such optimizations, but Python has a
good test battery that can help find them. Obviously tests can't give us
100% assurance that everything works as expected, but they are a very
good starting point.

Bugs can happen with every change to the code base, but the code base changes anyway...

Cesare


Re: [Python-Dev] test_io fails on test_1686475

2009-03-02 Thread Cesare Di Mauro
On Mar, 02 2009 at 00:13AM, Amaury Forgeot d'Arc amaur...@gmail.com wrote:

 Hello,

 On Sun, Mar 1, 2009 at 23:04, Cesare Di Mauro cesare.dima...@a-tono.com 
 wrote:
 Running the test suite with Python 2.6.1 32 bit (compiled in DEBUG mode
 with Visual Studio Express Edition 2008) on Vista x64, I've got an assert
 error:

 test_1686475 (__main__.StatAttributeTests) ... Assertion failed:
 (__int64)(int)((in / 1000) - secs_between_epochs) == ((in / 1000)
 - secs_between_epochs), file ..\Modules\posixmodule.c, line 790

 I have no idea about this failure. Any hint?

 The failing assertion comes from this code in posixmodule.c:

   /* XXX Win32 supports time stamps past 2038; we currently don't */
   *time_out = Py_SAFE_DOWNCAST((in / 1000) - secs_between_epochs,
 __int64, int);

 the test (btw, it's in test_os.py) is trying
 os.stat(rc:\pagefile.sys)

 Can you please check the three time stamps of this file (creation,
 update, access)?


You are right. The last-modified timestamp had 2099 as its year (the maximum
that I can set on Windows), because of some tests with dates that I made at
the time.

However, they are correct timestamps for Windows files, so I think that at least
the API on posixmodule.c should not fail when working with them. I don't know if
there's a way to handle them correctly.

Also, maybe test_os.py needs to be changed in some way, because the
pagefile can be put on any partition, and there are Windows installations
that lack it entirely, since virtual memory can be disabled too.

Cheers
Cesare


[Python-Dev] test_io fails on test_1686475

2009-03-01 Thread Cesare Di Mauro
Running the test suite with Python 2.6.1 32 bit (compiled in DEBUG mode
with Visual Studio Express Edition 2008) on Vista x64, I've got an assert
error:

test_1686475 (__main__.StatAttributeTests) ... Assertion failed:
(__int64)(int)((in / 1000) - secs_between_epochs) == ((in / 1000)
- secs_between_epochs), file ..\Modules\posixmodule.c, line 790

I have no idea about this failure. Any hint?

Cheers,
Cesare


Re: [Python-Dev] Challenge: Please break this! [Now with blog post]

2009-02-24 Thread Cesare Di Mauro
On Feb, 24 2009 at 12:11PM, Antoine Pitrou solip...@pitrou.net wrote:

 tav tav at espians.com writes:

 I've fixed this hole in safelite.py, but would be interested to know
 if there are other non-user-initiated dynamically imported modules?

 You'd better make __builtins__ read-only, it will plug a whole class of 
 attacks
 like this.

I've found it very useful to add objects to the builtins namespace, but I'd
prefer a standard and controlled way to do so: something like a built-in
function install, like the following one which I use:
import __builtin__, functools, types

_ValidBuiltinTypes = (types.BuiltinFunctionType, types.ClassType,
  types.FunctionType, types.GeneratorType,
  types.TypeType, functools.partial)

def install(*Args, **Keys):
  '''Installs the given parameters in the builtins namespace.
  From Args will be installed only valid types (classes, functions and types),
  taking their __name__ attribute.
  Every keyword-value cuple from Keys will be installed as is.'''

  _NameSpace = __builtin__.__dict__

  for Arg in Args:
if isinstance(Arg, _ValidBuiltinTypes):
  _NameSpace[Arg.__name__] = Arg

  for Key, Value in Keys.iteritems():
_NameSpace[Key] = Value


With a built-in install function a granular control can be implemented by
the running Python implementation.

Also, having builtins read only by default can be used in future compiler
and virtual machine implementations to gain interesting optimizations.
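For illustration, the same idea could be ported to Python 3 roughly as follows. This is only a sketch, not a standard API: the helper name install and the set of accepted types are carried over from the snippet above as assumptions.

```python
import builtins
import types

# Rough Python 3 sketch of the install() idea above (hypothetical helper,
# not a standard API): put objects into the builtins namespace.
_VALID_TYPES = (types.BuiltinFunctionType, types.FunctionType, type)

def install(*args, **keys):
    """Install the given objects in the builtins namespace.

    Positional arguments are installed under their __name__ if they are
    classes or functions; keyword arguments are installed as given.
    """
    namespace = vars(builtins)
    for arg in args:
        if isinstance(arg, _VALID_TYPES):
            namespace[arg.__name__] = arg
    for key, value in keys.items():
        namespace[key] = value

def greet():
    return "hi"

install(greet, answer=42)
print(builtins.greet is greet, builtins.answer)  # True 42
```

After the call, greet and answer are reachable from any module without an import, which is exactly the kind of global visibility the message proposes to put under a controlled interface.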

Cheers,
Cesare


Re: [Python-Dev] Python-acceleration instructions on ARM

2009-02-11 Thread Cesare Di Mauro
On Feb 11, 2009 at 04:11 AM, Benjamin M. Schwartz bmsch...@fas.harvard.edu 
wrote:

 Brett Cannon wrote:
 On Tue, Feb 10, 2009 at 18:45, Benjamin Schwartz
 bmsch...@fas.harvard.eduwrote:

 ...
 According to ARM [4]:

 Jazelle RCT can be used to significantly reduce the code bloat
 associated
 with AOT and JIT compilation, making AOT technology viable on mass-market
 devices. It can also be used to support execution environments beyond Java,
 such as Microsoft .NET Compact Framework, Python and others.

 Jazelle RCT provides an excellent target for any run-time compilation
 technology, including JIT and AOT for .NET MSIL, Python and Perl as well as
 Java. ARM is working with leading software providers to enable solutions
 ready for market with Jazelle RCT.
 ...
 Question:
 ARM is specifically claiming that these instructions can be used to
 accelerate Python interpretation.

 Wow, really? One of the links below mention that?

 Yes.  The quotes above from [4], as well as the white paper [6].  No
 specific data, just these broad claims.

 What would the process be to incorporate the use of ThumbEE instructions
 into CPython?


 Well, this all depends on how you try to integrate the instructions. If you
 hide it behind the macro or in a clean way that does not penalize skipping
 the instructions then you write a patch. But if this can't be done it would
 be better to maintain an external set of patches against trunk for this.

 Interesting.  Sugar Labs will probably not attempt this if we would have
 to maintain a patched interpreter forever.  However, I hope it will be
 possible to integrate into CPython in a manner that does not uglify the
 code or affect other architectures.

 Anyone else interested in ARM?  ThumbEE support would benefit anyone
 running Python on recent ARM chips.  Maybe we need to create a working
 group/project team/whatever.

 [4] http://www.arm.com/products/multimedia/java/jazelle_architecture.html
 [6] http://www.arm.com/pdfs/JazelleRCTWhitePaper_final1-0_.pdf

It's not useful for CPython, since it's based on a loop which evaluates one
bytecode at a time.
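As a rough illustration of what "one bytecode at a time" means, here is a toy dispatch loop in the spirit of ceval.c. The opcodes are invented for the sketch and are not CPython's.

```python
# Toy interpreter loop: fetch one (invented) opcode at a time and dispatch
# on it, the way ceval.c steps through CPython bytecode.
def run(code):
    stack = []
    pc = 0
    while pc < len(code):
        op, arg = code[pc]
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "RETURN":
            return stack.pop()
        pc += 1

result = run([("PUSH", 2), ("PUSH", 3), ("ADD", None), ("RETURN", None)])
print(result)  # 5
```

A JIT removes exactly this fetch/dispatch overhead by compiling the opcode sequence to native instructions, which is why ThumbEE would only pay off with a very different VM design.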

You would have to rewrite the virtual machine, implementing a JIT compiler that
generates ThumbEE instructions. But that is a big effort, since ceval.c works in a
completely different manner.

I don't know whether a form of JIT will be implemented in future CPython
versions, but if a step in this direction is made, writing a 
back-end
that uses ThumbEE will be much easier.

Cheers,
Cesare


[Python-Dev] Expression optimizations

2009-02-10 Thread Cesare Di Mauro
In peephole.c I noticed some expression optimizations:

/* not a is b -->  a is not b
   not a in b -->  a not in b
   not a is not b -->  a is b
   not a not in b -->  a in b
*/

So, it seems that an operation can be changed to another one which is logically 
equivalent.

Could it be applicable to other operations as well? So, if I wrote:

  c = not(a < b)

could the compiler and/or peephole optimizer generate bytecode instructions 
which, instead, execute the following operation:

  c = a >= b

Is that right?

Thanks a lot
Cesare


Re: [Python-Dev] Expression optimizations

2009-02-10 Thread Cesare Di Mauro
On Tue, Feb 10, 2009 at 05:38 PM, Daniel Stutzbach wrote:
 On Tue, Feb 10, 2009 at 10:24 AM, Cesare Di Mauro
 cesare.dima...@a-tono.com
 wrote:

 Could it be applicable to other operations as well? So, if I wrote:
  c = not(a < b)
 could the compiler and/or peephole optimizer generate bytecode
 instructions
 which, instead, execute the following operation:
  c = a >= b


 Those two expressions are equivalent for integers, but not necessarily
 equivalent for objects that define their own comparison operator.

OK, so I can make assumptions only for built-in types.
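A minimal sketch of why the rewrite is unsafe for arbitrary objects: a class (invented here purely for illustration) may define __lt__ and __ge__ so that they are not logical opposites.

```python
# Hypothetical class whose rich comparisons are deliberately inconsistent:
# __ge__ is NOT the negation of __lt__, so the rewrite would change results.
class Weird:
    def __lt__(self, other):
        return False
    def __ge__(self, other):
        return False

a, b = Weird(), Weird()
print(not (a < b))  # True
print(a >= b)       # False: not equivalent, so the optimizer must not rewrite
```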

Thank you

Cesare


Re: [Python-Dev] Expression optimizations

2009-02-10 Thread Cesare Di Mauro
On Tue, Feb 10, 2009 06:24 PM, Daniel Stutzbach wrote:
 On Tue, Feb 10, 2009 at 11:16 AM, Steve Holden st...@holdenweb.com
 wrote:

 That's true, but the same *could* be said about the existing
 optimizations for objects that define their own __contains__.


 No, because there isn't a __not_contains__, so you cannot define the
 inverse operation differently.  "not a in b" and "a not in b" have
 exactly the same effects.


Interesting. So at least for the "is" and "in" operators it is possible to
play with the "not" operator.
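This can be checked directly: both membership forms dispatch to the same __contains__ hook, so negating one always yields the other. A sketch with an invented class:

```python
# Membership and its negation share a single hook: there is no
# __not_contains__, so `not (a in b)` and `a not in b` must always agree.
class Box:
    def __contains__(self, item):
        return item == "x"

box = Box()
results = [(not (item in box)) == (item not in box) for item in ("x", "y")]
print(results)  # [True, True]
```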

Thanks
Cesare


Re: [Python-Dev] Expression optimizations

2009-02-10 Thread Cesare Di Mauro
On Tue, Feb 10, 2009 08:15 PM, Raymond Hettinger wrote:

 - Original Message -
 From: Cesare Di Mauro cesare.dima...@a-tono.com
 To: Python-Dev python-dev@python.org
 Sent: Tuesday, February 10, 2009 8:24 AM
 Subject: [Python-Dev] Expression optimizations


 In peephole.c I noticed some expression optimizations:

  /* not a is b -->  a is not b
 not a in b -->  a not in b
 not a is not b -->  a is b
 not a not in b -->  a in b
  */

 So, it seems that an operation can be changed to another one which is
 logically equivalent.

 Could it be applicable to other operations as well? So, if I wrote:

  c = not(a < b)

 could the compiler and/or peephole optimizer generate bytecode
 instructions which, instead, execute the following operation:

  c = a >= b

 Is it right?

 We've only done conservative transformations that do not change which
 magic methods get called.  The is / isnot transformations are
 invisible to the programmer and always semantically neutral.

OK, and the same applies to the "in" operator, if I have understood
the other messages correctly.

 Your
 proposed transformation changes which methods get called and
 assumes that the usual relationships between comparison
 operators hold (but they might not, given rich comparisons; for
 example, sets use the comparison operators for subset/superset tests).


Raymond, I'm not proposing any changes to the language.

I'm playing with the virtual machine and I have some ideas about possible
optimizations that could be applied. But I need to verify them, so
understanding what is possible and what is not is a primary goal for me.
;)

Thanks for your patience
Cesare


Re: [Python-Dev] Expression optimizations

2009-02-10 Thread Cesare Di Mauro
On Tue, Feb 10, 2009 09:42 PM, Daniel Stutzbach wrote:
 On Tue, Feb 10, 2009 at 2:36 PM, Cesare Di Mauro
 cesare.dima...@a-tono.comwrote:

 OK, so I can make assumptions only for built-in types.


 Yes, but even there you have to be careful of odd corner-cases, such as:

 >>> nan = float('nan')
 >>> nan < nan
 False
 >>> nan >= nan
 False

Ah, I missed that. OK, and the same applies to decimals, I suppose.
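The float corner case can be restated as a couple of assertions: with NaN both orderings are False, so the proposed rewrite would flip the result.

```python
# NaN breaks trichotomy: both orderings are False, so `not (a < b)`
# and `a >= b` disagree -- exactly the corner case noted above.
nan = float('nan')
print(nan < nan)        # False
print(nan >= nan)       # False
print(not (nan < nan))  # True: differs from nan >= nan
```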

Thanks
Cesare


Re: [Python-Dev] Expression optimizations

2009-02-10 Thread Cesare Di Mauro
It's bizarre enough, since I got a different result (with Python
2.6.1, 32 bit):

>>> x = 1e6
>>> y = x/x
>>> x
inf
>>> y
nan
>>> cmp(y, y)
0
>>> cmp(x/x, x/x)
1

:D

Cesare
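Python 3 has dropped cmp(), but the identity shortcut behind these inconsistent results can still be observed in container operations; a small sketch:

```python
# Containers check identity before equality, so a NaN that is unequal to
# itself is still "found" in a list that holds that very object.
nan = float('nan')
print(nan == nan)             # False: equality fails
print(nan in [nan])           # True: identity shortcut succeeds
print(float('nan') in [nan])  # False: a distinct NaN object fails both
```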

On Tue, Feb 10, 2009 10:02 PM, Dino Viehland wrote:
 And slightly unrelated, but just showing how bizarre floats are:

 >>> x = 1e6
 >>> y = x/x
 >>> cmp(y, y)
 0
 >>> cmp(x/x, x/x)
 -1

 Yeah object identity checks!

 From: python-dev-bounces+dinov=microsoft@python.org
 [mailto:python-dev-bounces+dinov=microsoft@python.org] On Behalf Of
 Daniel Stutzbach
 Sent: Tuesday, February 10, 2009 12:43 PM
 To: cesare.dima...@a-tono.com
 Cc: Python-Dev
 Subject: Re: [Python-Dev] Expression optimizations

 On Tue, Feb 10, 2009 at 2:36 PM, Cesare Di Mauro
 cesare.dima...@a-tono.com wrote:
 OK, so I can make assumptions only for built-in types.

 Yes, but even there you have to be careful of odd corner-cases, such as:

 >>> nan = float('nan')
 >>> nan < nan
 False
 >>> nan >= nan
 False
 --
 Daniel Stutzbach, Ph.D.
 President, Stutzbach Enterprises, LLC




Re: [Python-Dev] Expression optimizations

2009-02-10 Thread Cesare Di Mauro
On Tue, Feb 10, 2009 10:20 PM, Raymond Hettinger wrote:
 [Cesare Di Mauro]
 I'm playing with the virtual machine and I have some ideas about
 possible
 optimizations that could be applied. But I need to verify them, so
 understanding what is possible and what is not, is a primary goal for
 me.

 The best way to understand what is possible is to disassemble bytecode
 and the look at *exactly* how those are executed by ceval.c.  That makes
 it possible to identify which transformations are semantically neutral.

I've already done it, but ceval.c isn't enough. It makes use of external
functions to do some work, like PyObject_RichCompare, for example. So
I've asked for some information here.

 FWIW, I think the path of peephole optimizing been mostly exhausted.

I think so too, but there's room for a few optimizations (introducing some
new opcodes).

 A much more fertile field of search is to examine what can be done with
 the AST.  Sections of the tree may provide more context so that a broader
 range of simplifications and transformations are possible.

I completely agree.

 This discussion should probably be taken off python-dev and moved
 to comp.lang.python until the exercise has grown beyond playing with
 the virtual machine.

 Raymond

I've already rewritten ceval.c, opcode.h and some other files of Python
2.6.1 to implement my ideas. I think it's a little bit beyond playing
with the VM, and I hope to present my work at the next PyCon in Florence,
Italy, if my paper is accepted.

Now I'm trying to understand how compiler.c works, to fit my changes in (I
took a look at peephole.c, and that part will be easier, fortunately, but
I'll do it later).

It's not easy for me, since I'm working alone, on code which is not
mine, and which is a bit complicated too.

But if you think that this mailing list is not the correct place to ask,
I'll move to comp.lang.python, as you suggested.

Thanks,
Cesare


[Python-Dev] Why a STACKADJ(-1) in UNARY_NOT on ceval.c?

2009-02-05 Thread Cesare Di Mauro
Looking at the UNARY_NOT case in ceval.c:

case UNARY_NOT:
v = TOP();
err = PyObject_IsTrue(v);
Py_DECREF(v);
if (err == 0) {
Py_INCREF(Py_True);
SET_TOP(Py_True);
continue;
}
else if (err > 0) {
Py_INCREF(Py_False);
SET_TOP(Py_False);
err = 0;
continue;
}
STACKADJ(-1);
break;


I don't understand why there's a STACKADJ(-1) at its end. Looking at the
code, we know that if execution reaches the STACKADJ, it's because of an
error condition in PyObject_IsTrue that set err to a value < 0, so on
exiting the big switch statement an error will be raised.

So the question is: why is there a need to pop the top-of-stack PyObject?
It's a different behaviour compared to all the other unary operators.
For example:

case UNARY_NEGATIVE:
v = TOP();
x = PyNumber_Negative(v);
Py_DECREF(v);
SET_TOP(x);
if (x != NULL) continue;
break;

There's no STACKADJ instruction on errors.

Can someone explain it?
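For context, the err < 0 branch corresponds to the truth test itself raising an exception. At the Python level this can be sketched with an invented class whose truth value cannot be computed:

```python
# The error path of UNARY_NOT: PyObject_IsTrue returning < 0 means the
# truth test raised, and `not x` propagates that exception to the caller.
class Untestable:
    def __bool__(self):
        raise RuntimeError("no truth value")

try:
    not Untestable()
    outcome = "no error"
except RuntimeError as exc:
    outcome = "propagated: %s" % exc
print(outcome)  # propagated: no truth value
```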

Thanks a lot

Cesare


Re: [Python-Dev] progress: compiling python2.5 under msys (specifically but not exclusively under wine) with msvcr80

2009-01-21 Thread Cesare Di Mauro
Have you run any benchmarks, like pystone?

Cheers,
Cesare

On Wed, Jan 21, 2009 08:50PM, Luke Kenneth Casson Leighton wrote:
 this is a progress report on compiling python using entirely free
 software tools, no proprietary compilers or operating systems
 involved, yet still linking and successfully running with msvcr80
 assemblies.  manifests and rc files, which are compiled to internal
 resources, have been added.
 various sections which are uniquely identified by _MSC_VER >= 1400 etc.
 have had to be enabled with corresponding MSVCRT_VERSION >= 0x0800 -
 in particular, signal handling (PyOS_getsig()).

 currently, under wine with msvcr80, there looks like there is a bug
 with a common theme related to threads, but here's a short list:
 test_array.py is blocking, test_bz2.py is hanging and test_cmd_line.py
 causes a segfault; test_ctypes is _still_ a bundle of fun. for those
 people who use native win32 platforms who are compiling up this code,
 you should have better luck.

 significantly, the wine developers have been absolutely fantastic, and
 have fixed several bugs in wine, sometimes within hours, that were
 found as a result of running the extremely comprehensive python
 regression tests.

 the python regression tests are a credit to the collaborative
 incremental improvement process of free software development.

 i look forward to seeing the same incremental improvement applied to
 the development of python, evidence of which would be clearly seen by
 the acceptance of one of the following patches, one of which is dated
 2003:
 http://bugs.python.org/issue3754
 http://bugs.python.org/issue841454
 http://bugs.python.org/issue3871
 http://bugs.python.org/issue4954
 http://bugs.python.org/issue5010

 for those people wishing to track and contribute to the development of
 python for win32 using entirely free software tools, either under wine
 or native windows, there is a git repository, here, slightly
 illogically named pythonwine because that's where i started from
 (cross-compiling python under wine, so i could get at the wine
 registry from python).  obviously, since then, things have... moved on
 :)

 http://github.com/lkcl/pythonwine/tree/python_2.5.2_wine

 l.


Re: [Python-Dev] Add python.exe to PATH environment variable

2008-09-03 Thread Cesare Di Mauro
On 03 Sep 2008 at 00:50:13, M.-A. Lemburg [EMAIL PROTECTED] wrote:

 There already is a menu entry that starts the Python interpreter
 on Windows, so why not use that ?

Because I need to start Python from folders which contain
files that define a specific environment.

I have several servers and applications that I develop and test this way.

 Also .py files are automatically associated with the last installed
 Python interpreter, so the double-clicking on .py files works and is
 probably the most common way of starting a Python file on Windows.

99% of the time I run Python from a command prompt (in specific
directories).

I use the default menu entry only when I have to play with Python to test some
pieces of code.

 Adding paths to the PATH variable is not easy on Windows, esp. if
 you want to support multiple Windows versions. The global PATH
 settings are not read from autoexec.bat anymore (only once at boot
 time). Instead those environment variables are managed via the
 registry.

 See e.g.

 http://agiletesting.blogspot.com/2005/06/handling-path-windows-registry-value.html

 for how to setup PATH to your liking using Python.

 The problem is: how to undo those changes without accidentally
 undoing an explicit change made by the user ?

 BTW: Adding the Python dir to the PATH per default would cause
 problems for users who regularly have multiple different
 Python installations on a machine. If this is done, it should
 be an install option and not forced.

Let the user decide whether or not to update the PATH environment variable by
checking a checkbox during setup, with a notice that the change will NOT be
reverted when uninstalling.

Cheers,
Cesare

