Re: [Numpy-discussion] Rebinning numpy array

2011-11-14 Thread Sturla Molden
Fit a Poisson distribution (radioactive decay is a Poisson process),
recompute lambda for whatever bin size you need, and compute
the new (estimated) bin counts by maximum likelihood. It basically
becomes a constrained optimization problem.
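
A minimal sketch of that idea, under the simplifying assumption of a
piecewise-constant rate per source bin (the helper below is my own
illustration, not a numpy routine). The ML estimate of each rate is then
just counts/length, and the estimated counts in a new bin are the
integral of that rate over the bin:

import numpy as np

bounds = np.array([0.0, 44, 120, 140, 200])  # source bin edges (seconds)
counts = np.array([10.0, 10, 10, 10])        # counts per source bin
rates = counts / np.diff(bounds)             # ML rate per bin (counts/s)

def estimated_counts(a, b):
    # Integrate the piecewise-constant rate over the new bin [a, b].
    lo = np.clip(bounds[:-1], a, b)
    hi = np.clip(bounds[1:], a, b)
    return np.sum(rates * (hi - lo))

new_bounds = np.linspace(0, 200, 6)
print([estimated_counts(a, b)
       for a, b in zip(new_bounds[:-1], new_bounds[1:])])
# -> [9.09, 5.65, 5.26, 13.33, 6.67] (approximately)

Note that this particular model reduces to linear interpolation of the
cumulative counts; the Poisson machinery only starts to matter once you
impose smoothness or a physical decay law on lambda.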

Sturla


On 13.11.2011 17:04, Johannes Bauer wrote:
 Hi group,

 I have a rather simple problem, or so it would seem. However I cannot
 seem to find the right solution. Here's the problem:

 A Geiger counter measures counts in distinct time intervals. The time
 intervals are not of constant length. Imagine for example that the
 counter would always create a table entry when the counts reach 10. Then
 we would have the following bins (made-up data for illustration):

 Seconds     Counts  Len  CPS
 0 - 44      10      44   0.23
 44 - 120    10      76   0.13
 120 - 140   10      20   0.5
 140 - 200   10      60   0.16

 So we have n bins (in this example 4), but they're not equidistant. I
 want to rebin samples to make them equidistant. For example, I would
 like to rebin into 5 bins of 40 seconds time each. Then the rebinned
 example (I calculate by hand so this might contain errors):

 0-40       9.09
 40-80      5.65
 80-120     5.26
 120-160   13.33
 160-200    6.66

 That means, if a destination bin completely overlaps a source bin, its
 complete value is taken. If it overlaps partially, linear interpolation
 of bin sizes should be used.
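 (As a worked check of this rule: destination bin 0-40 covers 40/44 of
 source bin 0-44, so it gets 10 * 40/44 = 9.09 counts; bin 40-80 gets the
 remaining 10 * 4/44 = 0.91 plus 10 * 36/76 = 4.74 from source bin
 44-120, i.e. 5.65 in total.)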

 It is very important that the total count stays the same (in
 this case 40, so my numbers seem to be correct, I checked that). In this
 example I increased the bin size, but usually I will want to decrease
 bin size (even dramatically).

 Now my pathetic attempts look something like this:

 interpolation_points = 4000
 xpts = [ time.mktime(x.timetuple()) for x in self.getx() ]

 interpolatedx = numpy.linspace(xpts[0], xpts[-1], interpolation_points)
 interpolatedy = numpy.interp(interpolatedx, xpts, self.gety())

 self._xreformatted = [ datetime.datetime.fromtimestamp(x) for x in
 interpolatedx ]
 self._yreformatted = interpolatedy

 This works somewhat; however, I see artifacts depending on the
 destination sample size: for example, when I have a spike in the sample
 input and slowly reduce the number of interpolation points (i.e.
 increase the destination bin size), the spike gets smaller and smaller
 (expected behaviour). Past some point, however, the spike magically
 reappears. I believe this to be an interpolation artifact.

 Is there some standard way to get from a non-uniformly distributed bin
 distribution to a uniformly distributed bin distribution of arbitrary
 bin width?

 Best regards,
 Joe



Re: [Numpy-discussion] Rebinning numpy array

2011-11-14 Thread Chris.Barker
On 11/13/11 9:55 AM, Olivier Delalleau wrote:
 It actually seems to me that using a linear interpolation is not a good
 idea, since it will throw out a lot of information if you decrease the
 number of bins.

I agree -- I'd think about looking at a smooth interpolation -- maybe 
kernel density estimation?
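
A rough sketch of how that could look (my own illustration, untested
against real data; the weights argument to scipy.stats.gaussian_kde needs
a reasonably recent SciPy):

import numpy as np
from scipy.stats import gaussian_kde

bounds = np.array([0.0, 44, 120, 140, 200])
counts = np.array([10.0, 10, 10, 10])

# Crudely represent each source bin by its midpoint, weighted by its
# count, and smooth with a Gaussian KDE (this ignores the bin widths).
midpoints = 0.5 * (bounds[:-1] + bounds[1:])
kde = gaussian_kde(midpoints, weights=counts)

# Counts per new bin = total counts times the density mass in the bin.
new_bounds = np.linspace(0, 200, 6)
new_counts = counts.sum() * np.array(
    [kde.integrate_box_1d(a, b)
     for a, b in zip(new_bounds[:-1], new_bounds[1:])])

Note the smoothing is not exactly conservative: the KDE puts a little
mass outside [0, 200], so the rebinned total is only approximately 40.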

On 11/14/11 8:12 AM, Sturla Molden wrote:
 Fit a Poisson distribution (radioactive decay is a Poisson process),

even better -- if you have a physical process that fits a given
functional form -- use it!

-Chris


-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR             (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Numpy-discussion] Rebinning numpy array

2011-11-13 Thread Robert Kern
On Sun, Nov 13, 2011 at 16:04, Johannes Bauer dfnsonfsdu...@gmx.de wrote:
 [problem statement snipped; it is quoted in full in Sturla's reply above]

What you want to do is set up a linear interpolation based on the
boundaries of the uneven bins.

Seconds  Value
0        0
44       10
120      20
140      30
200      40


Then evaluate that linear interpolation on the boundaries of the uniform bins.

>>> import numpy as np
>>> bin_bounds = np.array([0.0, 44.0, 120, 140, 200])
>>> bin_values = np.array([0.0, 10, 10, 10, 10])
>>> cum_bin_values = bin_values.cumsum()
>>> new_bounds = np.array([0.0, 40, 80, 120, 160, 200])
>>> ecdf = np.interp(new_bounds, bin_bounds, cum_bin_values)
>>> ecdf
array([  0.        ,   9.09090909,  14.73684211,  20.        ,
        33.33333333,  40.        ])
>>> uniform_histogram = np.diff(ecdf)
>>> uniform_histogram
array([  9.09090909,   5.64593301,   5.26315789,  13.33333333,   6.66666667])
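
Wrapped up as a small reusable function (a sketch of the same recipe; the
name rebin_counts is my own, not an existing numpy routine):

import numpy as np

def rebin_counts(old_bounds, counts, new_bounds):
    # Redistribute `counts` from bins delimited by `old_bounds` onto bins
    # delimited by `new_bounds`, conserving the total count. Both boundary
    # arrays must be increasing, and `new_bounds` should stay within the
    # range covered by `old_bounds` (np.interp clamps at the end values).
    cum = np.concatenate(([0.0], np.cumsum(counts)))
    return np.diff(np.interp(new_bounds, old_bounds, cum))

rebin_counts([0.0, 44, 120, 140, 200], [10, 10, 10, 10],
             [0.0, 40, 80, 120, 160, 200])
# -> array([ 9.09090909,  5.64593301,  5.26315789, 13.33333333,  6.66666667])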


This may be what you are doing already -- I'm not sure what is in your
getx() and gety() methods. If it is, then I think you are on the right
track. If you still have problems, then we might need to see some of
the problematic data and results.

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco


Re: [Numpy-discussion] Rebinning numpy array

2011-11-13 Thread Olivier Delalleau
Just one thing: numpy.interp says it doesn't check that the x coordinates
are increasing, so make sure it's the case.

Assuming this is ok, I could still see how you may get some non-smooth
behavior: this may be because your spike can either be split between two
bins (which dilutes it somehow), or be included in a single bin (which
would make it stand out more). And as you increase your bin size, you will
switch between these two situations.
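
A tiny made-up demonstration of that aliasing effect:

import numpy as np

x = np.arange(10, dtype=float)
y = np.zeros(10)
y[4] = 100.0                  # a single spike

for n in (10, 5, 4):
    xi = np.linspace(0, 9, n)
    print(n, np.interp(xi, x, y).max())
# 10 -> 100.0  (a sample lands exactly on the spike)
#  5 ->  50.0  (the nearest sample is halfway up the spike)
#  4 ->   0.0  (the samples straddle the spike and it vanishes)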

-=- Olivier

2011/11/13 Johannes Bauer dfnsonfsdu...@gmx.de

 [original message snipped; quoted in full above]



Re: [Numpy-discussion] Rebinning numpy array

2011-11-13 Thread Olivier Delalleau
Also: it seems like you are using values at the boundaries of the bins,
while I think it would make more sense to compute interpolated values at
the middle point of a bin. I'm not sure it'll make a big difference
visually, but it may be more appropriate.

-=- Olivier

2011/11/13 Olivier Delalleau sh...@keba.be

 [earlier messages snipped; quoted in full above]





Re: [Numpy-discussion] Rebinning numpy array

2011-11-13 Thread Olivier Delalleau
(Sorry for the spam, I should have given more thought to this before
hitting reply).

It actually seems to me that using a linear interpolation is not a good
idea, since it will throw out a lot of information if you decrease the
number of bins: to compute the value at time t, it will only use the
closest bins (t_k and t_{k+1} such that t_k < t < t_{k+1}), so that data
stored in many of the bins will not be used at all.
I haven't looked closely at the suggestion from Robert but it may be a
better way to achieve what you want.

-=- Olivier

2011/11/13 Olivier Delalleau sh...@keba.be

 [earlier messages snipped; quoted in full above]






Re: [Numpy-discussion] Rebinning numpy array

2011-11-13 Thread Robert Kern
On Sun, Nov 13, 2011 at 17:48, Olivier Delalleau sh...@keba.be wrote:
 Also: it seems like you are using values at the boundaries of the bins,
 while I think it would make more sense to compute interpolated values at the
 middle point of a bin. I'm not sure it'll make a big difference visually,
 but it may be more appropriate.

No, you do want to compute the interpolated values at the boundaries
of the new bins. Then differencing the values at the boundaries will
give you the correct values for the mass between the bounds.
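
For example (a quick check, reusing the numbers from my earlier message),
the differencing conserves the total count by construction:

import numpy as np

bin_bounds = np.array([0.0, 44, 120, 140, 200])
cum_bin_values = np.array([0.0, 10, 20, 30, 40])
new_bounds = np.array([0.0, 40, 80, 120, 160, 200])

rebinned = np.diff(np.interp(new_bounds, bin_bounds, cum_bin_values))
assert np.isclose(rebinned.sum(), cum_bin_values[-1])  # all 40 counts kept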

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco


Re: [Numpy-discussion] Rebinning numpy array

2011-11-13 Thread Olivier Delalleau
2011/11/13 Robert Kern robert.k...@gmail.com

 On Sun, Nov 13, 2011 at 17:48, Olivier Delalleau sh...@keba.be wrote:
  Also: it seems like you are using values at the boundaries of the bins,
  while I think it would make more sense to compute interpolated values
  at the middle point of a bin. I'm not sure it'll make a big difference
  visually, but it may be more appropriate.

 No, you do want to compute the interpolated values at the boundaries
 of the new bins. Then differencing the values at the boundaries will
 give you the correct values for the mass between the bounds.


I wrote this with non-cumulative data in mind. However, I just looked at
your suggestion, which is to accumulate the data first, and that does
seem a better way to achieve what the OP is trying to do; in that case,
computing interpolated values at the boundaries is indeed the right way
to go. Sorry for the confusion,

-=- Olivier