Re: [Numpy-discussion] Rebinning numpy array
Fit a Poisson distribution (radioactive decay is a Poisson process), recompute lambda for whatever bin size you need, and compute the new (estimated) bin counts by maximum likelihood. It basically becomes a constrained optimization problem.

Sturla

Den 13.11.2011 17:04, skrev Johannes Bauer:
> Hi group,
>
> I have a rather simple problem, or so it would seem. However I cannot
> seem to find the right solution. Here's the problem:
>
> A Geiger counter measures counts in distinct time intervals. The time
> intervals are not of constant length. Imagine for example that the
> counter would always create a table entry when the counts reach 10.
> Then we would have the following bins (made-up data for illustration):
>
> Seconds     Counts  Len  CPS
> 0 - 44      10      44   0.23
> 44 - 120    10      76   0.13
> 120 - 140   10      20   0.5
> 140 - 200   10      60   0.16
>
> So we have n bins (in this example 4), but they're not equidistant. I
> want to rebin the samples to make them equidistant, for example into
> 5 bins of 40 seconds each. The rebinned example (calculated by hand,
> so it might contain errors):
>
> 0 - 40      9.09
> 40 - 80     5.65
> 80 - 120    5.26
> 120 - 160   13.33
> 160 - 200   6.66
>
> That means: if a destination bin completely overlaps a source bin, the
> source bin's complete value is taken; if it overlaps only partially,
> linear interpolation over the bin lengths should be used. It is very
> important that the overall count stays the same (here 40, so my
> numbers seem to be correct, I checked that). In this example I
> increased the bin size, but usually I will want to decrease it (even
> dramatically).
> Now my pathetic attempts look something like this:
>
> interpolation_points = 4000
> xpts = [ time.mktime(x.timetuple()) for x in self.getx() ]
> interpolatedx = numpy.linspace(xpts[0], xpts[-1], interpolation_points)
> interpolatedy = numpy.interp(interpolatedx, xpts, self.gety())
> self._xreformatted = [ datetime.datetime.fromtimestamp(x) for x in interpolatedx ]
> self._yreformatted = interpolatedy
>
> This works somewhat, but I see artifacts depending on the destination
> sample size: for example, when I have a spike in the input and slowly
> reduce the number of interpolation points (i.e. increase the
> destination bin size), the spike gets smaller and smaller (expected
> behaviour). After some amount of increasing, however, the spike
> magically reappears. I believe this to be an interpolation artifact.
>
> Is there some standard way to get from a non-uniformly distributed bin
> distribution to a uniformly distributed bin distribution of arbitrary
> bin width?
>
> Best regards,
> Joe

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
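[Editor's note: Sturla's suggestion can be sketched roughly as follows. This is a minimal illustration, not his code, and it assumes the simplest possible model: a single homogeneous Poisson rate, whose maximum-likelihood estimate is just total counts divided by total observation time; the expected count in each new bin is then rate times bin width. A constrained or piecewise fit, as he suggests, would refine this.]

```python
import numpy as np

# The made-up edges/counts from the example in this thread.
old_edges = np.array([0.0, 44.0, 120.0, 140.0, 200.0])  # seconds
counts = np.array([10.0, 10.0, 10.0, 10.0])

# For a homogeneous Poisson process, the ML estimate of the rate is
# simply total counts / total observation time.
rate = counts.sum() / (old_edges[-1] - old_edges[0])  # counts per second

# Expected counts in uniform 40-second bins under that single-rate model.
new_edges = np.arange(0.0, 200.0 + 1.0, 40.0)
expected = rate * np.diff(new_edges)

print(rate)      # 0.2
print(expected)  # [8. 8. 8. 8. 8.] -- the total of 40 counts is conserved
```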
Re: [Numpy-discussion] Rebinning numpy array
On 11/13/11 9:55 AM, Olivier Delalleau wrote:
> ... using a linear interpolation is not a good idea, since it will
> throw out a lot of information if you decrease the number of bins ...

I agree -- I'd think about looking at a smooth interpolation -- maybe kernel density estimation?

On 11/14/11 8:12 AM, Sturla Molden wrote:
> Fit a Poisson distribution (radioactive decay is a Poisson process),

Even better -- if you have a physical process that fits a given functional form -- use it!

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR            (206) 526-6959 voice
7600 Sand Point Way NE  (206) 526-6329 fax
Seattle, WA 98115       (206) 526-6317 main reception

chris.bar...@noaa.gov
[Numpy-discussion] Rebinning numpy array
Hi group,

I have a rather simple problem, or so it would seem. However I cannot seem to find the right solution. Here's the problem:

A Geiger counter measures counts in distinct time intervals. The time intervals are not of constant length. Imagine for example that the counter would always create a table entry when the counts reach 10. Then we would have the following bins (made-up data for illustration):

Seconds     Counts  Len  CPS
0 - 44      10      44   0.23
44 - 120    10      76   0.13
120 - 140   10      20   0.5
140 - 200   10      60   0.16

So we have n bins (in this example 4), but they're not equidistant. I want to rebin the samples to make them equidistant, for example into 5 bins of 40 seconds each. The rebinned example (calculated by hand, so it might contain errors):

0 - 40      9.09
40 - 80     5.65
80 - 120    5.26
120 - 160   13.33
160 - 200   6.66

That means: if a destination bin completely overlaps a source bin, the source bin's complete value is taken; if it overlaps only partially, linear interpolation over the bin lengths should be used. It is very important that the overall count stays the same (here 40, so my numbers seem to be correct, I checked that). In this example I increased the bin size, but usually I will want to decrease it (even dramatically).

Now my pathetic attempts look something like this:

interpolation_points = 4000
xpts = [ time.mktime(x.timetuple()) for x in self.getx() ]
interpolatedx = numpy.linspace(xpts[0], xpts[-1], interpolation_points)
interpolatedy = numpy.interp(interpolatedx, xpts, self.gety())
self._xreformatted = [ datetime.datetime.fromtimestamp(x) for x in interpolatedx ]
self._yreformatted = interpolatedy

This works somewhat, but I see artifacts depending on the destination sample size: for example, when I have a spike in the input and slowly reduce the number of interpolation points (i.e. increase the destination bin size), the spike gets smaller and smaller (expected behaviour). After some amount of increasing, however, the spike magically reappears. I believe this to be an interpolation artifact.

Is there some standard way to get from a non-uniformly distributed bin distribution to a uniformly distributed bin distribution of arbitrary bin width?

Best regards,
Joe
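[Editor's note: the overlap rule described above can also be implemented directly, without interpolation. The following sketch (the function name and nested loops are illustrative, not from the thread) gives each new bin a share of every old bin proportional to the length of their overlap, which conserves the total count by construction and reproduces the hand-calculated numbers.]

```python
import numpy as np

def rebin(old_edges, counts, new_edges):
    """Redistribute `counts` from bins `old_edges` onto bins `new_edges`,
    giving each new bin the fraction of every old bin it overlaps.
    The total count is conserved by construction."""
    old_edges = np.asarray(old_edges, dtype=float)
    counts = np.asarray(counts, dtype=float)
    new = np.zeros(len(new_edges) - 1)
    for i in range(len(counts)):
        lo, hi = old_edges[i], old_edges[i + 1]
        for j in range(len(new)):
            # Length of the overlap between old bin i and new bin j.
            overlap = max(0.0, min(hi, new_edges[j + 1]) - max(lo, new_edges[j]))
            new[j] += counts[i] * overlap / (hi - lo)
    return new

result = rebin([0, 44, 120, 140, 200], [10, 10, 10, 10],
               [0, 40, 80, 120, 160, 200])
print(result)  # approx. [9.09  5.65  5.26  13.33  6.67], summing to 40
```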
Re: [Numpy-discussion] Rebinning numpy array
On Sun, Nov 13, 2011 at 16:04, Johannes Bauer dfnsonfsdu...@gmx.de wrote:
> [...]
> That means, if a destination bin completely overlaps a source bin, its
> complete value is taken. If it overlaps partially, linear
> interpolation of bin sizes should be used.

What you want to do is set up a linear interpolation based on the boundaries of the uneven bins:

Seconds  Value
0        0
44       10
120      20
140      30
200      40

Then evaluate that linear interpolation on the boundaries of the uniform bins.

|18> bin_bounds = np.array([0.0, 44.0, 120, 140, 200])
|19> bin_values = np.array([0.0, 10, 10, 10, 10])
|20> cum_bin_values = bin_values.cumsum()
|21> new_bounds = np.array([0.0, 40, 80, 120, 160, 200])
|22> ecdf = np.interp(new_bounds, bin_bounds, cum_bin_values)
|23> ecdf
array([  0.        ,   9.09090909,  14.73684211,  20.        ,
        33.33333333,  40.        ])
|24> uniform_histogram = np.diff(ecdf)
|25> uniform_histogram
array([  9.09090909,   5.64593301,   5.26315789,  13.33333333,
         6.66666667])

This may be what you are doing already; I'm not sure what is in your getx() and gety() methods. If so, then I think you are on the right track.
If you still have problems, then we might need to see some of the problematic data and results.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
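[Editor's note: Robert's cumulative-sum recipe fits in a few lines as a reusable function. The wrapper below (the function name is illustrative) reproduces the numbers from his session.]

```python
import numpy as np

def rebin_cumsum(old_edges, counts, new_edges):
    # Cumulative counts at the old boundaries form an empirical CDF;
    # interpolate it at the new boundaries, then difference to get the
    # mass in each new bin. The total is conserved as long as the new
    # boundaries span the old ones.
    cum = np.concatenate(([0.0], np.cumsum(counts, dtype=float)))
    return np.diff(np.interp(new_edges, old_edges, cum))

hist = rebin_cumsum([0.0, 44, 120, 140, 200],
                    [10, 10, 10, 10],
                    [0.0, 40, 80, 120, 160, 200])
print(hist)  # [ 9.09090909  5.64593301  5.26315789 13.33333333  6.66666667]
```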
Re: [Numpy-discussion] Rebinning numpy array
Just one thing: numpy.interp says it doesn't check that the x coordinates are increasing, so make sure that's the case.

Assuming this is ok, I can still see how you may get some non-smooth behavior: your spike can either be split between two bins (which dilutes it somewhat) or be contained in a single bin (which makes it stand out more), and as you increase your bin size you will switch between these two situations.

-=- Olivier

2011/11/13 Johannes Bauer dfnsonfsdu...@gmx.de
> Hi group,
>
> I have a rather simple problem, or so it would seem. However I cannot
> seem to find the right solution. [...]
>
> Is there some standard way to get from a non-uniformly distributed bin
> distribution to a uniformly distributed bin distribution of arbitrary
> bin width?
>
> Best regards,
> Joe
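[Editor's note: the split-vs-contained effect Olivier describes can be seen numerically even with a mass-conserving rebinning. In this made-up example (all names and numbers are illustrative), a one-second spike keeps its full height when a coarse bin happens to contain it, but its peak halves when a bin boundary lands inside it.]

```python
import numpy as np

# A made-up "spike": 100 counts in the one-second bin [10, 11), zero elsewhere.
fine_edges = np.arange(0.0, 21.0, 1.0)          # 20 one-second bins on [0, 20)
counts = np.zeros(20)
counts[10] = 100.0
cum = np.concatenate(([0.0], counts.cumsum()))  # cumulative counts at the edges

def rebin(new_edges):
    # Interpolate the cumulative counts at the new boundaries, then difference.
    return np.diff(np.interp(new_edges, fine_edges, cum))

contained = rebin(np.arange(0.0, 21.0, 2.0))  # boundaries at 10 and 12: spike intact
split = rebin(np.arange(0.0, 20.0, 1.5))      # boundary at 10.5: spike halved

print(contained.max())  # 100.0
print(split.max())      # 50.0
```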
Re: [Numpy-discussion] Rebinning numpy array
Also: it seems like you are using values at the boundaries of the bins, while I think it would make more sense to compute interpolated values at the midpoint of each bin. I'm not sure it'll make a big difference visually, but it may be more appropriate.

-=- Olivier

2011/11/13 Olivier Delalleau sh...@keba.be
> Just one thing: numpy.interp says it doesn't check that the x
> coordinates are increasing, so make sure that's the case. [...]
Re: [Numpy-discussion] Rebinning numpy array
(Sorry for the spam, I should have given more thought to this before hitting reply.)

It actually seems to me that using a linear interpolation is not a good idea, since it will throw out a lot of information if you decrease the number of bins: to compute the value at time t, it will only use the closest bins (t_k and t_{k+1} such that t_k < t < t_{k+1}), so data stored in many of the bins will not be used at all. I haven't looked closely at the suggestion from Robert, but it may be a better way to achieve what you want.

-=- Olivier

2011/11/13 Olivier Delalleau sh...@keba.be
> Also: it seems like you are using values at the boundaries of the
> bins, while I think it would make more sense to compute interpolated
> values at the midpoint of each bin. [...]
Re: [Numpy-discussion] Rebinning numpy array
On Sun, Nov 13, 2011 at 17:48, Olivier Delalleau sh...@keba.be wrote:
> Also: it seems like you are using values at the boundaries of the
> bins, while I think it would make more sense to compute interpolated
> values at the middle point of a bin. I'm not sure it'll make a big
> difference visually, but it may be more appropriate.

No, you do want to compute the interpolated values at the boundaries of the new bins. Then differencing the values at the boundaries will give you the correct values for the mass between the bounds.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
Re: [Numpy-discussion] Rebinning numpy array
2011/11/13 Robert Kern robert.k...@gmail.com
> No, you do want to compute the interpolated values at the boundaries
> of the new bins. Then differencing the values at the boundaries will
> give you the correct values for the mass between the bounds.

I wrote this with non-cumulative data in mind. I just looked at your suggestion, which is to accumulate the data first, and I agree it seems a better way to achieve what the OP is trying to do; in that case, computing the interpolated values at the boundaries is indeed the right way to go.

Sorry for the confusion,

-=- Olivier