On Sun, Nov 21, 2010 at 12:30 PM, <josef.p...@gmail.com> wrote:
> On Sun, Nov 21, 2010 at 2:48 PM, Keith Goodman <kwgood...@gmail.com> wrote:
>> On Sun, Nov 21, 2010 at 10:25 AM, Wes McKinney <wesmck...@gmail.com> wrote:
>>> On Sat, Nov 20, 2010 at 7:24 PM, Keith Goodman <kwgood...@gmail.com> wrote:
>>>> On Sat, Nov 20, 2010 at 3:54 PM, Wes McKinney <wesmck...@gmail.com> wrote:
>>>>
>>>>> Keith (and others),
>>>>>
>>>>> What would you think about creating a library of mostly Cython-based
>>>>> "domain specific functions"? So stuff like rolling statistical
>>>>> moments, nan* functions like you have here, and all that-- NumPy-array
>>>>> only functions that don't necessarily belong in NumPy or SciPy (but
>>>>> could be included on down the road). You were already talking about
>>>>> this on the statsmodels mailing list for larry. I spent a lot of time
>>>>> writing a bunch of these for pandas over the last couple of years, and
>>>>> I would have relatively few qualms about moving these outside of
>>>>> pandas and introducing a dependency. You could do the same for larry--
>>>>> then we'd all be relying on the same well-vetted and tested codebase.
>>>>
>>>> I've started working on moving window statistics cython functions. I
>>>> plan to make it into a package called Roly (for rolling). The
>>>> signatures are: mov_sum(arr, window, axis=-1) and mov_nansum(arr,
>>>> window, axis=-1), etc.
>>>>
>>>> I think of Nanny and Roly as two separate packages. A narrow focus is
>>>> good for a new package. But maybe each package could be a subpackage
>>>> in a super package?
>>>>
>>>> Would the function signatures in Nanny (exact duplicates of the
>>>> corresponding functions in Numpy and Scipy) work for pandas? I plan to
>>>> use Nanny in larry. I'll try to get the structure of the Nanny package
>>>> in place. But if it doesn't attract any interest after that then I may
>>>> fold it into larry.
>>>> _______________________________________________
>>>> NumPy-Discussion mailing list
>>>> NumPy-Discussion@scipy.org
>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>>
>>>
>>> Why make multiple packages? It seems like all these functions are
>>> somewhat related: practical tools for real-world data analysis (where
>>> observations are often missing). I suspect having everything under one
>>> hood would create more interest than chopping things up-- would be
>>> very useful to folks in many different disciplines (finance,
>>> economics, statistics, etc.). In R, for example, NA-handling is just a
>>> part of every day life. Of course in R there is a special NA value
>>> which is distinct from NaN-- many folks object to the use of NaN for
>>> missing values. The alternative is masked arrays, but in my case I
>>> wasn't willing to sacrifice so much performance for purity's sake.
>>>
>>> I could certainly use the nan* functions to replace code in pandas
>>> where I've handled things in a somewhat ad hoc way.
>>
>> A package focused on NaN-aware functions sounds like a good idea. I
>> think a good plan would be to start by making faster, drop-in
>> replacements for the NaN functions that are already in numpy and
>> scipy. That is already a lot of work. After that, one possibility is
>> to add stuff like nancumsum, nanprod, etc. After that moving window
>> stuff?
>
> and maybe group functions after that?

Yes, group functions are on my list.

> If there is a lot of repetition, you could use templating. Even simple
> string substitution, if it is only replacing the dtype, works pretty
> well. It would at least reduce some copy-paste.

Unit test coverage should be good enough to mess around with trying
templating. What's a good way to go? Write my own script that creates
the .pyx file and call it from the make file? Or are there packages
for doing the templating?

I added nanmean (the first scipy function to enter nanny) and nanmin.
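
One possible shape for the "write my own script" option, using only
string.Template from the Python standard library. This is a minimal sketch
of dtype-only string substitution, not Nanny's actual build: the function
name nansum_1d_<dtype>, the template body, the dtype list, and the helper
render_pyx are all hypothetical, chosen only to illustrate the idea.

```python
import sys
from string import Template

# Hypothetical Cython template: only the dtype differs between copies.
PYX_TEMPLATE = Template('''\
def nansum_1d_${dtype}(np.ndarray[np.${dtype}_t, ndim=1] a):
    "Sum of a 1d ${dtype} array, ignoring NaNs."
    cdef Py_ssize_t i
    cdef np.${dtype}_t ai, asum = 0
    for i in range(a.shape[0]):
        ai = a[i]
        if ai == ai:  # the comparison is False only when ai is NaN
            asum += ai
    return asum
''')

# Lines every generated .pyx file needs once, before the function copies.
HEADER = "import numpy as np\ncimport numpy as np\n\n"

# Assumed dtype list; extend as needed.
DTYPES = ("float32", "float64")


def render_pyx(dtypes=DTYPES):
    """Return Cython source text with one function copy per dtype."""
    body = "\n\n".join(PYX_TEMPLATE.substitute(dtype=dt) for dt in dtypes)
    return HEADER + body


if __name__ == "__main__":
    # Write to stdout so the caller decides where the .pyx file goes.
    sys.stdout.write(render_pyx())
```

A make rule could then redirect the output into a .pyx file (for example
"python gen_pyx.py > nansum_generated.pyx", both names assumed) before
running Cython on it. Template.substitute raises KeyError on any
placeholder left unfilled, so a typo in the template fails at generation
time rather than at compile time.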