Re: [Numpy-discussion] ufunc for sum of squared difference

2016-11-04 Thread Sebastian Berg
On Fr, 2016-11-04 at 15:42 -0400, Matthew Harrigan wrote:
> I didn't notice identity before.  Seems like frompyfunc always sets
> it to None.  If it were zero maybe it would work as desired here.
> 
> In the writing your own ufunc doc, I was wondering if the pointer to
> data could be used to get a constant at runtime.  If not, what could
> that be used for?
> static void double_logit(char **args, npy_intp *dimensions,
> npy_intp* steps, void* data)
> Why would the numerical accuracy be any different?  The subtraction
> and square operations look identical and I thought np.sum just calls
> np.add.reduce, so the reduction step uses the same code and would
> therefore have the same accuracy.
> 

Sorry, did not read it carefully, I guess `c` is the mean, so you are
doing the two pass method.
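
For reference, a minimal sketch of the distinction, assuming `c` is the mean. The test data (a large offset plus unit noise) and the textbook single-pass formula are my own illustration and were not posted in the thread:

```python
import numpy as np

# Illustrative data (an assumption): large offset, small spread.
rng = np.random.default_rng(0)
x = 1e9 + rng.standard_normal(100_000)
n = x.size

# Two-pass method: first pass computes the mean, second pass sums
# the squared deviations from it.
mean = x.sum() / n
two_pass = np.sum((x - mean) ** 2)

# Textbook single-pass formula: sum(x**2) - n * mean**2.
# Both terms are around 1e23, so their difference loses nearly all
# significant digits to cancellation.
one_pass = np.sum(x * x) - n * mean * mean

print("two pass:", two_pass)
print("one pass:", one_pass)
```

The two-pass result stays close to n (the noise has unit variance), while the single-pass formula is dominated by rounding error on data like this.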

- Sebastian


> Thanks
> 
> On Fri, Nov 4, 2016 at 1:56 PM, Sebastian Berg wrote:
> > On Fr, 2016-11-04 at 13:11 -0400, Matthew Harrigan wrote:
> > > I was reading this and got thinking about if a ufunc could compute
> > > the sum of squared differences in a single pass without a temporary
> > > array.  The python code below demonstrates a possible approach.
> > >
> > > import numpy as np
> > > x = np.arange(10)
> > > c = 1.0
> > > def add_square_diff(x1, x2):
> > >     return x1 + (x2-c)**2
> > > ufunc = np.frompyfunc(add_square_diff, 2, 1)
> > > print(ufunc.reduce(x) - x[0] + (x[0]-c)**2)
> > > print(np.sum(np.square(x-c)))
> > >
> > > I have (at least) 4 questions:
> > > 1. Is it possible to pass run time constants to a ufunc written in C
> > > for use in its inner loop, and if so how?
> > 
> > I don't think it's anticipated, since a ufunc could in most cases use a
> > third argument, but a 3 arg ufunc can't be reduced. Not sure if there
> > might be some trickery possible.
> > 
> > > 2. Is it possible to pass an initial value to reduce to avoid the
> > > clean up required for the first element?
> > 
> > This is the identity normally. But the identity can only be 0, 1 or -1
> > right now I think. The identity is what the output array gets
> > initialized with (which effectively makes it the first value passed
> > into the inner loop).
> > 
> > > 3. Does that ufunc work, or are there special cases which cause it to
> > > fall apart?
> > > 4. Would a very specialized ufunc such as this be considered for
> > > incorporating in numpy since it would help reduce time and memory of
> > > functions already in numpy?
> > >
> > 
> > Might be mixing up things, however, IIRC the single pass approach has a
> > bad numerical accuracy, so that I doubt that it is a good default
> > algorithm.
> > 
> > - Sebastian
> > 
> > 
> > > Thank you,
> > > Matt
> > > ___
> > > NumPy-Discussion mailing list
> > > NumPy-Discussion@scipy.org
> > > https://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] array comprehension

2016-11-04 Thread Chris Barker
On Fri, Nov 4, 2016 at 10:36 AM, Nathaniel Smith  wrote:

> On Nov 4, 2016 10:32 AM, "Stephan Hoyer"  wrote:
> > fromiter dynamically resizes a NumPy array, like a Python list, except
> > with a growth factor of 1.5
>


> Oh, right, and the dtype argument is mandatory, which is what makes this
> possible.
>
Couldn't it determine the dtype from the first element, and then barf later
if an incompatible one shows up?
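
A rough sketch of that idea as a wrapper around the existing function. The helper name `fromiter_infer` is hypothetical and not part of NumPy. Note that this sketch adds no check of its own: whether a later incompatible element raises or is converted depends on NumPy's conversion rules for the inferred dtype.

```python
import itertools

import numpy as np


def fromiter_infer(iterable):
    """Hypothetical helper: infer the dtype from the first element."""
    it = iter(iterable)
    try:
        first = next(it)
    except StopIteration:
        raise ValueError("cannot infer a dtype from an empty iterable")
    # Let NumPy decide which dtype the first element maps to.
    dtype = np.asarray(first).dtype
    # Put the peeked element back in front of the remaining ones.
    return np.fromiter(itertools.chain([first], it), dtype=dtype)


halves = fromiter_infer(i / 2 for i in range(4))
print(halves, halves.dtype)
```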

And then we could adapt this code to np.array() and get nice performance
with no extra functions to think about calling...

And off the top of my head, I can't think of why it couldn't be generalized
to the nd case as well.

-CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Numpy-discussion] ufunc for sum of squared difference

2016-11-04 Thread Matthew Harrigan
I didn't notice identity before.  Seems like frompyfunc always sets it to
None.  If it were zero maybe it would work as desired here.
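
A quick check of the attribute in question, as a sketch. The lambda is only a stand-in for any Python function wrapped without an explicit identity:

```python
import numpy as np

# A ufunc built from a Python callable, with no identity specified.
wrapped = np.frompyfunc(lambda a, b: a + b, 2, 1)

print(wrapped.identity)      # None
print(np.add.identity)       # 0
print(np.multiply.identity)  # 1
```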

In the writing your own ufunc doc, I was wondering if the pointer to data
could be used to get a constant at runtime.  If not, what could that be
used for?

static void double_logit(char **args, npy_intp *dimensions,
npy_intp* steps, void* data)

Why would the numerical accuracy be any different?  The subtraction and
square operations look identical and I thought np.sum just calls
np.add.reduce, so the reduction step uses the same code and would therefore
have the same accuracy.

Thanks

On Fri, Nov 4, 2016 at 1:56 PM, Sebastian Berg wrote:

> On Fr, 2016-11-04 at 13:11 -0400, Matthew Harrigan wrote:
> > I was reading this and got thinking about if a ufunc could compute
> > the sum of squared differences in a single pass without a temporary
> > array.  The python code below demonstrates a possible approach.
> >
> > import numpy as np
> > x = np.arange(10)
> > c = 1.0
> > def add_square_diff(x1, x2):
> >     return x1 + (x2-c)**2
> > ufunc = np.frompyfunc(add_square_diff, 2, 1)
> > print(ufunc.reduce(x) - x[0] + (x[0]-c)**2)
> > print(np.sum(np.square(x-c)))
> >
> > I have (at least) 4 questions:
> > 1. Is it possible to pass run time constants to a ufunc written in C
> > for use in its inner loop, and if so how?
>
> I don't think it's anticipated, since a ufunc could in most cases use a
> third argument, but a 3 arg ufunc can't be reduced. Not sure if there
> might be some trickery possible.
>
> > 2. Is it possible to pass an initial value to reduce to avoid the
> > clean up required for the first element?
>
> This is the identity normally. But the identity can only be 0, 1 or -1
> right now I think. The identity is what the output array gets
> initialized with (which effectively makes it the first value passed
> into the inner loop).
>
> > 3. Does that ufunc work, or are there special cases which cause it to
> > fall apart?
> > 4. Would a very specialized ufunc such as this be considered for
> > incorporating in numpy since it would help reduce time and memory of
> > functions already in numpy?
> >
>
> Might be mixing up things, however, IIRC the single pass approach has a
> bad numerical accuracy, so that I doubt that it is a good default
> algorithm.
>
> - Sebastian
>
>
> > Thank you,
> > Matt


Re: [Numpy-discussion] ufunc for sum of squared difference

2016-11-04 Thread Sebastian Berg
On Fr, 2016-11-04 at 13:11 -0400, Matthew Harrigan wrote:
> I was reading this and got thinking about if a ufunc could compute
> the sum of squared differences in a single pass without a temporary
> array.  The python code below demonstrates a possible approach.
> 
> import numpy as np
> x = np.arange(10)
> c = 1.0
> def add_square_diff(x1, x2):
>     return x1 + (x2-c)**2
> ufunc = np.frompyfunc(add_square_diff, 2, 1)
> print(ufunc.reduce(x) - x[0] + (x[0]-c)**2)
> print(np.sum(np.square(x-c)))
> 
> I have (at least) 4 questions:
> 1. Is it possible to pass run time constants to a ufunc written in C
> for use in its inner loop, and if so how?

I don't think it's anticipated, since a ufunc could in most cases use a
third argument, but a 3 arg ufunc can't be reduced. Not sure if there
might be some trickery possible.

> 2. Is it possible to pass an initial value to reduce to avoid the
> clean up required for the first element?

This is the identity normally. But the identity can only be 0, 1 or -1
right now I think. The identity is what the output array gets
initialized with (which effectively makes it the first value passed
into the inner loop).

> 3. Does that ufunc work, or are there special cases which cause it to
> fall apart?
> 4. Would a very specialized ufunc such as this be considered for
> incorporating in numpy since it would help reduce time and memory of
> functions already in numpy?
> 

Might be mixing up things, however, IIRC the single pass approach has a
bad numerical accuracy, so that I doubt that it is a good default
algorithm.

- Sebastian


> Thank you,
> Matt


Re: [Numpy-discussion] array comprehension

2016-11-04 Thread Nathaniel Smith
On Nov 4, 2016 10:32 AM, "Stephan Hoyer"  wrote:
>
> On Fri, Nov 4, 2016 at 10:24 AM, Nathaniel Smith  wrote:
>>
>> Are you sure fromiter doesn't make an intermediate list or equivalent?
>> It has to collect all the values before it can know the shape or dtype of
>> the array to put them in.
>
> fromiter dynamically resizes a NumPy array, like a Python list, except
> with a growth factor of 1.5 (rather than roughly 1.125):
> https://github.com/numpy/numpy/blob/bb59409abf5237c155a1dc4c4d5b31e4acf32fbe/numpy/core/src/multiarray/ctors.c#L3721


Oh, right, and the dtype argument is mandatory, which is what makes this
possible.
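
For illustration, a short sketch of the call with the mandatory dtype. The generator of squares is an arbitrary example; `count` is fromiter's optional argument for a length known in advance:

```python
import numpy as np

n = 5

# dtype is required, so the output buffer can be allocated and grown
# without inspecting the values first.
squares = np.fromiter((i * i for i in range(n)), dtype=np.int64)

# If the length is known, count lets fromiter allocate once up front.
squares_sized = np.fromiter((i * i for i in range(n)), dtype=np.int64, count=n)

print(squares)
print(squares_sized)
```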

-n


Re: [Numpy-discussion] array comprehension

2016-11-04 Thread Stephan Hoyer
On Fri, Nov 4, 2016 at 10:24 AM, Nathaniel Smith  wrote:

> Are you sure fromiter doesn't make an intermediate list or equivalent? It
> has to collect all the values before it can know the shape or dtype of the
> array to put them in.
>
fromiter dynamically resizes a NumPy array, like a Python list, except with
a growth factor of 1.5 (rather than roughly 1.125):
https://github.com/numpy/numpy/blob/bb59409abf5237c155a1dc4c4d5b31e4acf32fbe/numpy/core/src/multiarray/ctors.c#L3721


Re: [Numpy-discussion] array comprehension

2016-11-04 Thread Nathaniel Smith
Are you sure fromiter doesn't make an intermediate list or equivalent? It
has to collect all the values before it can know the shape or dtype of the
array to put them in.

On Nov 4, 2016 5:26 AM, "Francesc Alted"  wrote:



2016-11-04 13:06 GMT+01:00 Neal Becker :

> I find I often write:
> np.array ([some list comprehension])
>
> mainly because list comprehensions are just so sweet.
>
> But I imagine this isn't particularly efficient.
>

Right.  Using a generator and np.fromiter() will avoid the creation of the
intermediate list.  Something like:

np.fromiter((i for i in range(x)), dtype=int)  # use xrange for Python 2


>
> I wonder if numpy has a "better" way, and if not, maybe it would be a nice
> addition?
>
>



-- 
Francesc Alted



[Numpy-discussion] ANN: pandas v0.19.1 released!

2016-11-04 Thread Joris Van den Bossche
Hi all,

I'm pleased to announce the release of pandas 0.19.1.
This is a bug-fix release from 0.19.0 and includes some small regression
fixes, bug fixes and performance improvements. We recommend that all users
upgrade to this version.

See the v0.19.1 Whatsnew page for an overview of all bugs that have been
fixed in 0.19.1.

Thanks to all contributors!

Joris

---

*How to get it:*

Source tarballs and windows/mac/linux wheels are available on PyPI (thanks
to Christoph Gohlke for the windows wheels, and to Matthew Brett for
setting up the mac/linux wheels).
Conda packages are already available via the conda-forge channel (conda
install pandas -c conda-forge). It will be available on the main channel
shortly.

*Issues:*

Please report any issues on our issue tracker:
https://github.com/pydata/pandas/issues

*Thanks to all the contributors of the 0.19.1 release:*

   - Adam Chainz
   - Anthonios Partheniou
   - Arash Rouhani
   - Ben Kandel
   - Brandon M. Burroughs
   - Chris
   - chris-b1
   - Chris Warth
   - David Krych
   - dubourg
   - gfyoung
   - Iván Vallés Pérez
   - Jeff Reback
   - Joe Jevnik
   - Jon M. Mease
   - Joris Van den Bossche
   - Josh Owen
   - Keshav Ramaswamy
   - Larry Ren
   - mattrijk
   - Michael Felt
   - paul-mannino
   - Piotr Chromiec
   - Robert Bradshaw
   - Sinhrks
   - Thiago Serafim
   - Tom Bird


[Numpy-discussion] ufunc for sum of squared difference

2016-11-04 Thread Matthew Harrigan
I was reading this and got thinking about whether a ufunc could compute the
sum of squared differences in a single pass without a temporary array.  The
Python code below demonstrates a possible approach.

import numpy as np
x = np.arange(10)
c = 1.0
def add_square_diff(x1, x2):
    return x1 + (x2-c)**2
ufunc = np.frompyfunc(add_square_diff, 2, 1)
print(ufunc.reduce(x) - x[0] + (x[0]-c)**2)
print(np.sum(np.square(x-c)))

I have (at least) 4 questions:
1. Is it possible to pass run time constants to a ufunc written in C for
use in its inner loop, and if so how?
2. Is it possible to pass an initial value to reduce to avoid the clean up
required for the first element?
3. Does that ufunc work, or are there special cases which cause it to fall
apart?
4. Would a very specialized ufunc such as this be considered for
incorporating in numpy since it would help reduce time and memory of
functions already in numpy?
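
Regarding question 2, a sketch of what an initial value looks like in practice. It assumes a NumPy release newer than this thread, where `ufunc.reduce` accepts an `initial` keyword. The cast to object is an assumption made so that frompyfunc's object loop is selected; it creates a temporary itself, so this only illustrates the semantics and not the memory saving.

```python
import numpy as np

c = 1.0


def add_square_diff(acc, value):
    # acc is the running sum, value is the next array element.
    return acc + (value - c) ** 2


ufunc = np.frompyfunc(add_square_diff, 2, 1)
x = np.arange(10)

# Seeding the reduction with 0.0 means x[0] goes through the same
# formula as every other element, so no clean-up term is needed.
single_pass = ufunc.reduce(x.astype(object), initial=0.0)

# Reference result using a temporary array.
reference = np.sum(np.square(x - c))

print(single_pass, reference)
```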

Thank you,
Matt


Re: [Numpy-discussion] array comprehension

2016-11-04 Thread Robert Kern
On Fri, Nov 4, 2016 at 6:36 AM, Neal Becker  wrote:
>
> Francesc Alted wrote:
>
> > 2016-11-04 13:06 GMT+01:00 Neal Becker :
> >
> >> I find I often write:
> >> np.array ([some list comprehension])
> >>
> >> mainly because list comprehensions are just so sweet.
> >>
> >> But I imagine this isn't particularly efficient.
> >>
> >
> > Right.  Using a generator and np.fromiter() will avoid the creation of
> > the intermediate list.  Something like:
> >
> > np.fromiter((i for i in range(x)), dtype=int)  # use xrange for Python 2
> >
> >
> Does this generalize to >1 dimensions?

No.

--
Robert Kern


Re: [Numpy-discussion] array comprehension

2016-11-04 Thread Neal Becker
Francesc Alted wrote:

> 2016-11-04 14:36 GMT+01:00 Neal Becker :
> 
>> Francesc Alted wrote:
>>
>> > 2016-11-04 13:06 GMT+01:00 Neal Becker :
>> >
>> >> I find I often write:
>> >> np.array ([some list comprehension])
>> >>
>> >> mainly because list comprehensions are just so sweet.
>> >>
>> >> But I imagine this isn't particularly efficient.
>> >>
>> >
>> > Right.  Using a generator and np.fromiter() will avoid the creation of
>> > the intermediate list.  Something like:
>> >
>> > np.fromiter((i for i in range(x)), dtype=int)  # use xrange for Python 2
>> >
>> >
>> Does this generalize to >1 dimensions?
>>
> 
> A reshape() is not enough?  What do you want to do exactly?
> 

I was thinking about:
x = np.array ([[L1] L2]) where L1,L2 take the form of a list comprehension,
as a means to create a 2-D array (in this example)
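
A concrete instance of that pattern, as a sketch. The element formula and the sizes are placeholders chosen for illustration:

```python
import numpy as np

rows, cols = 2, 3

# The outer comprehension builds the rows, the inner one the columns.
# np.array converts the nested list into a 2-D array.
x = np.array([[10 * i + j for j in range(cols)] for i in range(rows)])

print(x)
print(x.shape)
```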



Re: [Numpy-discussion] array comprehension

2016-11-04 Thread Ryan May
On Fri, Nov 4, 2016 at 9:04 AM, Stephan Hoyer  wrote:

> On Fri, Nov 4, 2016 at 7:12 AM, Francesc Alted  wrote:
>
>> Does this generalize to >1 dimensions?
>>>
>>
>> A reshape() is not enough?  What do you want to do exactly?
>>
>
> np.fromiter takes scalar input and only builds a 1D array. So it actually
> can't combine multiple values at once unless they are flattened out in
> Python. It could be nice to add support for non-scalar inputs, stacking
> them similarly to np.array. Likewise, it could be nice to add an axis
> argument, so it can work similarly to np.stack.
>

itertools.product, itertools.permutations, etc. with np.fromiter (and
reshape) is probably also useful here, though it doesn't solve the
non-scalar problem.
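
A small sketch of that combination. The multiplication table is just a placeholder computation:

```python
import itertools

import numpy as np

rows, cols = 3, 4

# itertools.product yields (i, j) pairs in row-major order, so the flat
# result can be reshaped directly into a (rows, cols) array.
flat = np.fromiter(
    (i * j for i, j in itertools.product(range(rows), range(cols))),
    dtype=np.int64,
    count=rows * cols,
)
table = flat.reshape(rows, cols)

print(table)
```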

Ryan

-- 
Ryan May


Re: [Numpy-discussion] array comprehension

2016-11-04 Thread Daπid
On 4 November 2016 at 16:04, Stephan Hoyer  wrote:
>
> But, we also don't have an unstack function. This would mostly be syntactic
> sugar, but I think it would be a nice addition. Such a function actually
> exists in TensorFlow:
> https://g3doc.corp.google.com/third_party/tensorflow/g3doc/api_docs/python/array_ops.md?cl=head#unstack

That link is behind a login wall. This is the public version:

https://github.com/tensorflow/tensorflow/blob/master/tensorflow/g3doc/api_docs/python/array_ops.md


Re: [Numpy-discussion] array comprehension

2016-11-04 Thread Stephan Hoyer
On Fri, Nov 4, 2016 at 7:12 AM, Francesc Alted  wrote:

> Does this generalize to >1 dimensions?
>>
>
> A reshape() is not enough?  What do you want to do exactly?
>

np.fromiter takes scalar input and only builds a 1D array. So it actually
can't combine multiple values at once unless they are flattened out in
Python. It could be nice to add support for non-scalar inputs, stacking
them similarly to np.array. Likewise, it could be nice to add an axis
argument, so it can work similarly to np.stack.

More generally, you might want to iterate and rebuild over arbitrary
dimension(s) of an array. Something like
np.stack([x for x in np.unstack(y, axis)], axis)

But, we also don't have an unstack function. This would mostly be syntactic
sugar, but I think it would be a nice addition. Such a function actually
exists in TensorFlow:
https://g3doc.corp.google.com/third_party/tensorflow/g3doc/api_docs/python/array_ops.md?cl=head#unstack
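
A minimal sketch of what such a function could look like. The local helper `unstack` below is a stand-in written for illustration, built on np.moveaxis; it is an assumption about the intended behaviour, not an existing API from this thread:

```python
import numpy as np


def unstack(a, axis=0):
    """Illustrative helper: split `a` into a list of sub-arrays along `axis`."""
    # Move the chosen axis to the front, then iterate over it.
    return list(np.moveaxis(a, axis, 0))


y = np.arange(24).reshape(2, 3, 4)

# Iterate over axis 1 and rebuild the array along the same axis.
pieces = unstack(y, axis=1)
rebuilt = np.stack([x for x in pieces], axis=1)

print(len(pieces), pieces[0].shape)
print(np.array_equal(rebuilt, y))
```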


Re: [Numpy-discussion] array comprehension

2016-11-04 Thread Francesc Alted
2016-11-04 14:36 GMT+01:00 Neal Becker :

> Francesc Alted wrote:
>
> > 2016-11-04 13:06 GMT+01:00 Neal Becker :
> >
> >> I find I often write:
> >> np.array ([some list comprehension])
> >>
> >> mainly because list comprehensions are just so sweet.
> >>
> >> But I imagine this isn't particularly efficient.
> >>
> >
> > Right.  Using a generator and np.fromiter() will avoid the creation of
> > the intermediate list.  Something like:
> >
> > np.fromiter((i for i in range(x)), dtype=int)  # use xrange for Python 2
> >
> >
> Does this generalize to >1 dimensions?
>

A reshape() is not enough?  What do you want to do exactly?


>
>



-- 
Francesc Alted


Re: [Numpy-discussion] array comprehension

2016-11-04 Thread Neal Becker
Francesc Alted wrote:

> 2016-11-04 13:06 GMT+01:00 Neal Becker :
> 
>> I find I often write:
>> np.array ([some list comprehension])
>>
>> mainly because list comprehensions are just so sweet.
>>
>> But I imagine this isn't particularly efficient.
>>
> 
> Right.  Using a generator and np.fromiter() will avoid the creation of the
> intermediate list.  Something like:
> 
> np.fromiter((i for i in range(x)), dtype=int)  # use xrange for Python 2
> 
> 
Does this generalize to >1 dimensions?



Re: [Numpy-discussion] array comprehension

2016-11-04 Thread Francesc Alted
2016-11-04 13:06 GMT+01:00 Neal Becker :

> I find I often write:
> np.array ([some list comprehension])
>
> mainly because list comprehensions are just so sweet.
>
> But I imagine this isn't particularly efficient.
>

Right.  Using a generator and np.fromiter() will avoid the creation of the
intermediate list.  Something like:

np.fromiter((i for i in range(x)), dtype=int)  # use xrange for Python 2


>
> I wonder if numpy has a "better" way, and if not, maybe it would be a nice
> addition?
>
>



-- 
Francesc Alted


[Numpy-discussion] array comprehension

2016-11-04 Thread Neal Becker
I find I often write:
np.array ([some list comprehension])

mainly because list comprehensions are just so sweet.

But I imagine this isn't particularly efficient.

I wonder if numpy has a "better" way, and if not, maybe it would be a nice 
addition?



Re: [Numpy-discussion] Branching NumPy 1.12.x

2016-11-04 Thread Ralf Gommers
On Fri, Nov 4, 2016 at 6:04 AM, Charles R Harris wrote:

> Hi All,
>
> I'm thinking that it is time to branch NumPy 1.12.x. I haven't got
> everything in it that I would have liked, in particular __numpy_ufunc__,
> but I think there is plenty of material and not branching is holding up
> some of the more risky stuff.  My current thoughts on __numpy_ufunc__ is
> that it would be best to work it out over the 1.13.0 release cycle,
> starting with enabling it again right after the branch. Julian's work on
> avoiding temporary copies and Pauli's overlap handling PR are two other
> changes I've been putting off but don't want to delay further. There are
> some other smaller things that I had scheduled for 1.12.0, but would like
> to spend more time looking at. If there are some things that you think just
> have to be in 1.12.0, please mention them, but I'd rather aim at getting
> 1.13.0 out in a timely manner.
>
> Thoughts?
>

That's a really good plan I think.

Ralf