Re: [pypy-dev] NumPyPy vs NumPy

2016-08-05 Thread Papa, Florin
> We usually hang out on IRC; you can find me there most evenings, European
> time.
>
> The zip file is not a very iteration-friendly format for improving the
> benchmarks. How can I contribute to your work?
> - There should be some kind of shell script that downloads and installs
>   the packages from a known source, so that anyone else can reproduce the
>   results.
> - You should add np.__config__.show() to the scripts, so that the output
>   reflects the external libraries used.
> - Examine other suites to see how they display the basic
>   python/computer/environment variables in use when you run the benchmarks.
> - How many cores are in use? How much memory?
> - You should check the results; for instance, AFAICT the dsums benchmark
>   does not run to completion on numpypy: bincount is not implemented.
>
> Matti

Thank you for your feedback.

I started the process of making the benchmarks open source, so that we can 
easily collaborate. Until then, I will make the modifications you suggested.

Regards,
Florin

___
pypy-dev mailing list
pypy-dev@python.org
https://mail.python.org/mailman/listinfo/pypy-dev


Re: [pypy-dev] NumPyPy vs NumPy

2016-08-01 Thread Armin Rigo
Hi,

On 1 August 2016 at 10:24, Maciej Fijalkowski  wrote:
>> Does this mean that the main direction is to support NumPy (through
>> improving cpyext) instead of maintaining NumPyPy? Is NumPy (with cpyext)
>> fully supported in PyPy, or are there any known compatibility issues?
>
> The main progress is to merge the two - we want to support NumPy (via
> cpyext) and we want things that are fast in numpypy (array access
> predominantly) to be used via numpypy

Yes, and your benchmarks reinforce the impression that
numpy-via-cpyext is faster in a lot of cases.  Moreover it is more
compatible with CPython's numpy, because supporting it fully is "only"
a matter of us improving the general cpyext compatibility layer.  Some
benchmarks like "extractint" show cases where numpy-via-cpyext suffers
from high levels of crossing the cpyext boundary.  As fijal says we
want to ultimately add some things from numpypy into numpy-via-cpyext,
maybe by patching or special-casing some methods like
ndarray.__getitem__ after the module is imported.

By the way, it would make a cool project for someone new to the pypy
code base (<= still trying to recruit help in making numpy, although
it turned out to be very difficult in the past).


A bientôt,

Armin.


Re: [pypy-dev] NumPyPy vs NumPy

2016-08-01 Thread Maciej Fijalkowski
On Mon, Aug 1, 2016 at 10:02 AM, Papa, Florin  wrote:
> Hi Armin,
>
>> The table also shows that PyPy NumPyPy is really slower, even with
>> vectorization enabled. It seems that the current focus of our work, on
>> continuing to improve cpyext instead of numpypy, is a good idea.
>
> Does this mean that the main direction is to support NumPy (through
> improving cpyext) instead of maintaining NumPyPy? Is NumPy (with cpyext)
> fully supported in PyPy, or are there any known compatibility issues?
>
> Regards,
> Florin

Hi Florin

The main progress is to merge the two - we want to support NumPy (via
cpyext) and we want things that are fast in numpypy (array access
predominantly) to be used via numpypy


Re: [pypy-dev] NumPyPy vs NumPy

2016-08-01 Thread Papa, Florin
Hi Armin,

> The table also shows that PyPy NumPyPy is really slower, even with
> vectorization enabled. It seems that the current focus of our work, on
> continuing to improve cpyext instead of numpypy, is a good idea.

Does this mean that the main direction is to support NumPy (through
improving cpyext) instead of maintaining NumPyPy? Is NumPy (with cpyext)
fully supported in PyPy, or are there any known compatibility issues?

Regards,
Florin


Re: [pypy-dev] NumPyPy vs NumPy

2016-07-29 Thread Armin Rigo
Hi,

On 27 July 2016 at 10:35, Papa, Florin  wrote:
> I am sorry, I mistakenly switched the header of the table, the middle column 
> is actually the result for PyPy NumPyPy.

The resulting table makes sense to me: it shows that PyPy NumPy (with
cpyext) is, in most cases, running at the same speed as CPython NumPy;
and the rare exceptions can be guessed to be because these benchmarks
happen to invoke a much larger number of CPython C API calls than all
the others.

The table also shows that PyPy NumPyPy is really slower, even with
vectorization enabled.  It seems that the current focus of our work,
on continuing to improve cpyext instead of numpypy, is a good idea.
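
As a rough cross-check of this reading, the sketch below counts, for each
PyPy column of Florin's corrected table in this thread, how many benchmarks
ran clearly slower than CPython NumPy. The 1.5 cutoff and the helper name
slower_than are assumptions made for illustration only; they are not part of
the original measurements.

```python
# Normalized run times from the corrected table in this thread
# (CPython NumPy == 1.0). Columns: benchmark, PyPy NumPyPy, PyPy NumPy.
# "cauchy" appears twice in the original table, so rows are kept as a list.
ROWS = [
    ("cauchy", 5.838852812, 4.866947551),
    ("pointbypoint", 4.922654347, 0.981008211),
    ("numrand", 2.478997019, 1.082185897),
    ("rowmean", 2.512893263, 1.062233015),
    ("dsums", 33.58240465, 1.013388981),
    ("vectsum", 1.738446611, 0.771660704),
    ("cauchy", 2.168377906, 0.887388291),
    ("polarcoords", 1.030962402, 0.500905427),
    ("vectsort", 2.214586698, 0.973727924),
    ("arange", 2.045342386, 0.69941044),
    ("vectoradd", 5.447667037, 1.513217941),
    ("extractint", 1.655717606, 2.671712185),
    ("float2int", 3.1688, 0.905406988),
    ("insertzeros", 2.375043445, 1.037504453),
]

THRESHOLD = 1.5  # assumed cutoff for "clearly slower"; not from the thread


def slower_than(rows, column, threshold=THRESHOLD):
    """Return the names of benchmarks whose ratio in `column` exceeds threshold."""
    return [row[0] for row in rows if row[column] > threshold]


numpypy_slow = slower_than(ROWS, 1)
cpyext_slow = slower_than(ROWS, 2)
print("PyPy NumPyPy: %d of %d clearly slower" % (len(numpypy_slow), len(ROWS)))
print("PyPy NumPy (cpyext): %d of %d clearly slower: %s"
      % (len(cpyext_slow), len(ROWS), ", ".join(cpyext_slow)))
```

With this (arbitrary) cutoff only three rows of the cpyext column are flagged:
cauchy, vectoradd and extractint.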


A bientôt,

Armin.


Re: [pypy-dev] NumPyPy vs NumPy

2016-07-28 Thread Matti Picus

On 28/07/2016 8:05 AM, Papa, Florin wrote:

> Hi Matti,
>
> Thank you for your reply and for indicating additional numpy benchmarks.
>
> ...
>
> We can continue this discussion any place you consider suitable (if the
> mailing list is not the place for this).
>
> Regards,
> Florin

We usually hang out on IRC; you can find me there most evenings, European
time.

The zip file is not a very iteration-friendly format for improving the
benchmarks. How can I contribute to your work?
- There should be some kind of shell script that downloads and installs
  the packages from a known source, so that anyone else can reproduce the
  results.
- You should add np.__config__.show() to the scripts, so that the output
  reflects the external libraries used.
- Examine other suites to see how they display the basic
  python/computer/environment variables in use when you run the benchmarks.
- How many cores are in use? How much memory?
- You should check the results; for instance, AFAICT the dsums benchmark
  does not run to completion on numpypy: bincount is not implemented.
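
A minimal sketch of the kind of environment header the points above ask for,
using only the standard library plus an optional numpy import. The function
name environment_report and the choice of fields are assumptions, not
something taken from an existing suite. Installed memory is left out because
the standard library has no portable call for it.

```python
import os
import platform
import sys


def environment_report():
    """Collect basic interpreter and machine facts as (label, value) pairs."""
    report = [
        ("implementation", platform.python_implementation()),
        ("python version", platform.python_version()),
        ("executable", sys.executable),
        ("platform", platform.platform()),
        ("machine", platform.machine()),
        ("logical cores", os.cpu_count()),
    ]
    try:
        import numpy as np
    except ImportError:
        report.append(("numpy", "not installed"))
    else:
        report.append(("numpy version", np.__version__))
    return report


if __name__ == "__main__":
    for label, value in environment_report():
        print("%-16s %s" % (label, value))
    try:
        import numpy as np
        np.__config__.show()  # prints the BLAS/LAPACK build configuration
    except ImportError:
        pass
```

Printing this header at the top of every benchmark run keeps the results tied
to the interpreter and libraries that produced them.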

Matti



Re: [pypy-dev] NumPyPy vs NumPy

2016-07-27 Thread Matti Picus

On 27/07/2016 3:35 AM, Papa, Florin wrote:


> I am sorry, I mistakenly switched the header of the table; the middle
> column is actually the result for PyPy NumPyPy. The correct table is this:
>
> Benchmark      CPython NumPy   PyPy NumPyPy   PyPy NumPy
> cauchy         1               5.838852812    4.866947551
> pointbypoint   1               4.922654347    0.981008211
> numrand        1               2.478997019    1.082185897
> rowmean        1               2.512893263    1.062233015
> dsums          1               33.58240465    1.013388981
> vectsum        1               1.738446611    0.771660704
> cauchy         1               2.168377906    0.887388291
> polarcoords    1               1.030962402    0.500905427
> vectsort       1               2.214586698    0.973727924
> arange         1               2.045342386    0.69941044
> vectoradd      1               5.447667037    1.513217941
> extractint     1               1.655717606    2.671712185
> float2int      1               3.1688         0.905406988
> insertzeros    1               2.375043445    1.037504453
>
> The results were gathered without vectorization; I will provide the
> results with vectorization as soon as I have them.
>
> Sorry again for the mistake.
>
> Regards,
> Florin

Thanks for taking the time to test this. You asked in the first message 
"Is there an official benchmark suite for NumPy or a more relevant 
workload to compare against CPython? What is NumPyPy's maturity / 
adoption rate from your knowledge?"


There is no official numpy benchmark, since there is really no "typical"
numpy workload. Numpy is used as a common container for data processing,
and each field has its own cases that interest it. For instance, a
workload done by CAFFE for neural network processing is much different
than one done by OpenCV for image processing, which is different than
the natural language processing done in NLTK, even though for the most
part all three of these use numpy. There are a few numpy benchmarks
available:


https://github.com/serge-sans-paille/numpy-benchmarks (needs to be 
adapted to pypy's slow warmup time)

http://yarikoptic.github.io/numpy-vbench  (also AFAICT never run on PyPy)
https://bitbucket.org/mikefc/numpy-benchmark.git

I would expect numpypy to shine in cases where there is heavy use of 
python together with numpy. Your benchmarks are at the other extreme; 
they demonstrate that our reimplementation of the numpy looping ufuncs 
is slower than C, but do not test the python-numpy interaction nor how 
well the JIT can optimize python code using numpy. For your tests 
Richard's suggestion of turning on vectorization may show a large 
improvement, as it brings numpypy's optimizations closer to the ones 
done by a good C compiler. But even so, it is impressive that without 
vectorization we are only 2-4 times slower than the heavily vectorized C
implementation, and that the cpyext emulation layer seems not to matter 
that much in your benchmarks.


In general, timeit does a bad job for pypy benchmarks since it does not 
allow for warmup time and is geared to measure a minimum. Your data 
demonstrates some of the pitfalls of benchmarking - note that you show 
two very different results for your "cauchy" benchmark. You may want to
check out the perf module (http://perf.readthedocs.io) for a more
sophisticated way of running benchmarks, or read
https://arxiv.org/abs/1602.00602, which summarizes the problems of
benchmarking.
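
To make the warmup point concrete, here is a minimal hand-rolled sketch that
discards a few warmup calls and then reports several statistics instead of
only the minimum. It is not the perf module's API; the names bench, summarize
and workload, as well as the warmup and repeat counts, are assumptions chosen
for illustration. A fixed warmup count is only a heuristic.

```python
import statistics
import time


def bench(func, warmup=5, repeats=20):
    """Time func() after warmup calls; return per-call timings in seconds."""
    for _ in range(warmup):
        func()  # discarded: gives a JIT the chance to compile hot loops
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Report more than the minimum, so run-to-run spread stays visible."""
    return {
        "min": min(timings),
        "median": statistics.median(timings),
        "mean": statistics.mean(timings),
        "stdev": statistics.stdev(timings) if len(timings) > 1 else 0.0,
    }


def workload():
    # Placeholder workload, purely illustrative.
    return sum(i * i for i in range(10000))


print(summarize(bench(workload)))
```

Keeping the individual timings around, rather than a single number, also makes
it easier to spot a benchmark that produces two very different results.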


In order to continue this discussion, could you create a repository with
these benchmarks and a set of instructions on how to reproduce them? You do
not say what platform you use, what machine you ran the tests on,
whether you used MKL/BLAS, what versions of pypy and cpython you used,
... Once we have a conveniently reproducible way to have this
conversation we may be able to make progress toward reaching some
operative conclusions, but I'm not sure a mailing list is the best place
these days.


Matti


Re: [pypy-dev] NumPyPy vs NumPy

2016-07-27 Thread Papa, Florin
I am sorry, I mistakenly switched the header of the table, the middle column is 
actually the result for PyPy NumPyPy. The correct table is this:

Benchmark      CPython NumPy   PyPy NumPyPy   PyPy NumPy
cauchy         1               5.838852812    4.866947551
pointbypoint   1               4.922654347    0.981008211
numrand        1               2.478997019    1.082185897
rowmean        1               2.512893263    1.062233015
dsums          1               33.58240465    1.013388981
vectsum        1               1.738446611    0.771660704
cauchy         1               2.168377906    0.887388291
polarcoords    1               1.030962402    0.500905427
vectsort       1               2.214586698    0.973727924
arange         1               2.045342386    0.69941044
vectoradd      1               5.447667037    1.513217941
extractint     1               1.655717606    2.671712185
float2int      1               3.1688         0.905406988
insertzeros    1               2.375043445    1.037504453

The results were gathered without vectorization, I will provide the results 
with vectorization as soon as I have them. 

Sorry again for the mistake.

Regards,
Florin


Re: [pypy-dev] NumPyPy vs NumPy

2016-07-27 Thread Yury V. Zaytsev

Hi Florin,

On Wed, 27 Jul 2016, Papa, Florin wrote:

> The table contains run time values, normalized to the CPython NumPy
> results. This means that a value of 1 is equal to the CPython NumPy
> result, less than 1 means faster than CPython NumPy and more than 1 is
> slower than CPython NumPy.


Thank you for the explanation!

I think this supports my assessment though, as I can't see how your 
conclusion can be justified on the basis of this table:


"NumPyPy performance seems to be significantly slower compared to CPython 
NumPy or even PyPy NumPy"


In fact, NumPyPy performance seems to be significantly *faster* compared 
to CPython NumPy and, in any case, PyPy NumPy (with the exception of a few 
benchmarks, such as "cauchy", which should be investigated).


I'd also be very curious as to whether you've tried the vectorizer 
already, or these results were obtained without it.


--
Sincerely yours,
Yury V. Zaytsev


Re: [pypy-dev] NumPyPy vs NumPy

2016-07-27 Thread Papa, Florin
Hi Yury,

The table contains run time values, normalized to the CPython Numpy results. 
This means that a value of 1 is equal to the CPython NumPy result, less than 1 
means faster than CPython NumPy and more than 1 is slower than CPython NumPy.

Let's consider the following line in the table:
Benchmark      CPython NumPy   PyPy NumPy     PyPy NumPyPy
cauchy         1               5.838852812    4.866947551

Here, PyPy NumPy is 5.83 times slower than CPython NumPy and PyPy NumPyPy is 
4.86 times slower than CPython NumPy.
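
The normalization described above can be sketched as follows. The raw timings
in the example are hypothetical values invented purely to show the arithmetic;
they are not measurements from these benchmarks, and the helper names
normalize and describe are likewise assumptions.

```python
def normalize(run_times, baseline):
    """Divide every run time by the baseline's run time."""
    base = run_times[baseline]
    return {name: seconds / base for name, seconds in run_times.items()}


def describe(ratio):
    """Turn a normalized run time into the wording used in this thread."""
    if ratio > 1:
        return "%.2f times slower" % ratio
    if ratio < 1:
        return "%.2f times faster" % (1 / ratio)
    return "same speed"


# Hypothetical raw timings in seconds, invented for illustration only.
raw = {"CPython NumPy": 2.0, "PyPy NumPy": 3.0, "PyPy NumPyPy": 1.0}
for name, ratio in normalize(raw, "CPython NumPy").items():
    print("%-14s %.2f  (%s)" % (name, ratio, describe(ratio)))
```

The baseline always normalizes to 1, which is why the CPython NumPy column of
the table contains only ones.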

I hope this makes the results table clearer.

Regards,
Florin


Re: [pypy-dev] NumPyPy vs NumPy

2016-07-27 Thread Richard Plangger
Hi,

> Is there an official benchmark suite for NumPy or a more relevant workload to 
> compare against CPython? What is NumPyPy's maturity / adoption rate from your 
> knowledge?

I do not think there is. I have been looking for something similar for
over a year. It seems though people tend to make their own benchmarks
for their own jit compiler, stressing their optimization. (Well, I did
too for my thesis)

That said, it would be a beneficial task to sit down and extract
such a benchmark set, not targeting a special jit/aot compiler, but
rather thinking about real world application workloads.

> I have been working with NumPyPy to evaluate its performance and it
> seems significantly slower compared to CPython NumPy or even PyPy NumPy
> (installed with pip).

I agree with Yury, there are 2-3 benchmarks for NumPyPy where it
performs worse than cpython. All others are not significant.

Have you tried turning on the beta version of the vectorizer in NumPyPy?
(command is $ pypy --jit vec=1 program.py args)
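
For a side-by-side run, a small sketch along these lines could build both
command lines, with and without the vectorizer flag quoted above. The helper
names pypy_commands and time_command are hypothetical, and the sketch assumes
a pypy executable on PATH. Timing a whole process this way includes
interpreter startup and JIT warmup, so it is only a coarse comparison.

```python
import subprocess
import time


def pypy_commands(script, args=(), pypy="pypy"):
    """Build the two command lines: default JIT and JIT with vec=1."""
    base = [pypy]
    tail = [script] + list(args)
    return {
        "default": base + tail,
        "vec=1": base + ["--jit", "vec=1"] + tail,
    }


def time_command(cmd):
    """Run cmd once and return wall-clock seconds (includes startup)."""
    start = time.perf_counter()
    subprocess.check_call(cmd)
    return time.perf_counter() - start


# Example (requires a pypy executable on PATH and a real script):
# for label, cmd in pypy_commands("program.py", ["args"]).items():
#     print(label, time_command(cmd))
```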

Cheers,
Richard





Re: [pypy-dev] NumPyPy vs NumPy

2016-07-27 Thread Yury V. Zaytsev

Hi,

On Wed, 27 Jul 2016, Papa, Florin wrote:

> I have been working with NumPyPy to evaluate its performance and it
> seems significantly slower compared to CPython NumPy or even PyPy NumPy
> (installed with pip).


After having a brief look at your table, I'm very confused by this
assessment:


To me, it seems that PyPy NumPyPy is equal to or significantly faster than
CPython NumPy on most benchmarks, but substantially slower on just a few
of them.


PyPy NumPy is slower than CPython NumPy on all benchmarks, with some being 
not that bad, and some pretty bad, but this is absolutely to be expected, 
and in fact nevertheless very impressive, considering that it runs via 
CPyExt...


Am I completely misinterpreting your numbers?!

--
Sincerely yours,
Yury V. Zaytsev