On Sat, Nov 20, 2010 at 7:24 PM, Keith Goodman <kwgood...@gmail.com> wrote:
> On Sat, Nov 20, 2010 at 3:54 PM, Wes McKinney <wesmck...@gmail.com> wrote:
>
>> Keith (and others),
>>
>> What would you think about creating a library of mostly Cython-based
>> "domain specific functions"? So stuff like rolling statistical
>> moments, nan* functions like you have here, and all that-- NumPy-array
>> only functions that don't necessarily belong in NumPy or SciPy (but
>> could be included on down the road). You were already talking about
>> this on the statsmodels mailing list for larry. I spent a lot of time
>> writing a bunch of these for pandas over the last couple of years, and
>> I would have relatively few qualms about moving these outside of
>> pandas and introducing a dependency. You could do the same for larry--
>> then we'd all be relying on the same well-vetted and tested codebase.
>
> I've started working on moving window statistics cython functions. I
> plan to make it into a package called Roly (for rolling). The
> signatures are: mov_sum(arr, window, axis=-1) and mov_nansum(arr,
> window, axis=-1), etc.
>
> I think of Nanny and Roly as two separate packages. A narrow focus is
> good for a new package. But maybe each package could be a subpackage
> in a super package?
>
> Would the function signatures in Nanny (exact duplicates of the
> corresponding functions in Numpy and Scipy) work for pandas? I plan to
> use Nanny in larry. I'll try to get the structure of the Nanny package
> in place. But if it doesn't attract any interest after that then I may
> fold it into larry.
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
Why make multiple packages? It seems like all these functions are
somewhat related: practical tools for real-world data analysis (where
observations are often missing). I suspect having everything under one
hood would create more interest than chopping things up; it would be
very useful to folks in many different disciplines (finance, economics,
statistics, etc.).

In R, for example, NA-handling is just a part of everyday life. Of
course in R there is a special NA value which is distinct from NaN, and
many folks object to the use of NaN for missing values. The alternative
is masked arrays, but in my case I wasn't willing to sacrifice so much
performance for purity's sake.

I could certainly use the nan* functions to replace code in pandas
where I've handled things in a somewhat ad hoc way.
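
For concreteness, here is a rough pure-NumPy sketch of what a function
with the quoted signature mov_nansum(arr, window, axis=-1) might compute.
It is not the Roly or Nanny code (those are described as Cython); the
cumulative-sum approach, the choice to count NaN as zero, and the NaN
padding of the first window - 1 positions are all assumptions made for
illustration.

```python
import numpy as np

def mov_nansum(arr, window, axis=-1):
    """Moving-window sum along `axis`, counting NaN entries as zero.

    Illustrative sketch only. Assumptions (not from the thread):
    - the first `window - 1` positions along `axis` are set to NaN
      because their window is incomplete;
    - a window made up entirely of NaN sums to 0.0.
    """
    arr = np.asarray(arr, dtype=float)
    n = arr.shape[axis]
    if window < 1 or window > n:
        raise ValueError("window must be between 1 and the axis length")

    # Work with the target axis last, then move it back at the end.
    a = np.moveaxis(arr, axis, -1)
    filled = np.where(np.isnan(a), 0.0, a)   # missing values contribute 0
    csum = np.cumsum(filled, axis=-1)

    out = np.full(a.shape, np.nan)
    # First complete window is just the running total so far.
    out[..., window - 1] = csum[..., window - 1]
    # Later windows: difference of running totals `window` steps apart.
    out[..., window:] = csum[..., window:] - csum[..., :-window]
    return np.moveaxis(out, -1, axis)

x = np.array([1.0, np.nan, 3.0, 4.0])
print(mov_nansum(x, window=2))
```

Differencing a cumulative sum keeps the sketch short but can accumulate
floating-point error on long arrays, which a compiled running-sum loop
would avoid.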