Can you do something like df["ColA"] = f(df)?

 -- John

On Jan 21, 2014, at 8:48 AM, Blake Johnson wrote:

> I use within! pretty frequently. What should I be using instead if that is on
> the chopping block?
>
> --Blake
>
> On Tuesday, January 21, 2014 7:42:39 AM UTC-5, tshort wrote:
> I also agree with your approach, John. Based on your criteria, here
> are some other things to consider for the chopping block.
>
> - expression-based indexing
> - NamedArray (you already have an issue on this)
> - with, within, based_on and variants
> - @transform, @DataFrame
> - select, filter
> - DataStream
>
> Many of these were attempts to ease syntax via delayed evaluation. We
> can either do without or try to implement something like LINQ.
>
>
>
> On Mon, Jan 20, 2014 at 7:02 PM, Kevin Squire wrote:
> > Hi John,
> >
> > I agree with pretty much everything you have written here, and really
> > appreciate that you've taken the lead in cleaning things up and getting us
> > on track.
> >
> > Cheers!
> > Kevin
> >
> >
> > On Mon, Jan 20, 2014 at 1:57 PM, John Myles White wrote:
> >>
> >> As I said in another thread recently, I am currently the lead maintainer
> >> of more packages than I can keep up with. I think it’s been useful for me
> >> to start so many different projects, but I can’t keep maintaining most of
> >> my packages given my current work schedule.
> >>
> >> Without Simon Kornblith, Kevin Squire, Sean Garborg and several others
> >> doing amazing work to keep DataArrays and DataFrames going, much of our
> >> basic data infrastructure would have already become completely unusable.
> >> But even with the great work that’s been done on those packages recently,
> >> there’s still a lot of additional design work required. I’d like to free
> >> up some of my time to do that work.
> >>
> >> To keep things moving forward, I’d like to propose a couple of radical New
> >> Year’s resolutions for the packages I work on.
> >>
> >> (1) We need to stop adding functionality and focus entirely on improving
> >> the quality and documentation of our existing functionality. We have way
> >> too much prototype code in DataFrames that I can’t keep up with. I’m about
> >> to make a pull request for DataFrames that will remove everything related
> >> to column groupings, database-style indexing and Blocks.jl support. I
> >> absolutely want to see us push all of those ideas forward in the future,
> >> but they need to happen in unmerged forks or separate packages until we
> >> have the resources needed to support them. Right now, they make an
> >> overwhelming maintenance challenge even more onerous.
> >>
> >> (2) We can’t support anything other than the master branch of most
> >> JuliaStats packages except possibly for Distributions. I personally don’t
> >> have the time to simultaneously keep stuff working with Julia 0.2 and
> >> Julia 0.3. Moreover, many of our basic packages aren’t mature enough to
> >> justify supporting older versions. We should do a better job of supporting
> >> our master releases and not invest precious time trying to support older
> >> releases.
> >>
> >> (3) We need to make more of DataArrays and DataFrames reflect the Julian
> >> worldview. Lots of our code uses an interface that is incongruous with the
> >> interfaces found in Base. Even worse, a large chunk of code has
> >> type-stability problems that make it very slow, when comparable code that
> >> uses normal Arrays is 100x faster. We need to develop new idioms and new
> >> strategies for making code that interacts with type-destabilizing NA’s
> >> faster. More generally, we need to make DataArrays and DataFrames fit in
> >> better with Julia when Julia and R disagree. Following R’s lead has often
> >> led us astray because R doesn’t share Julia’s strengths or weaknesses.
> >>
> >> (4) Going forward, there should be exactly one way to do most things.
> >> The worst part of our current codebase is that there are multiple ways to
> >> express the same computation, but (a) some of them are unusably slow and
> >> (b) some of them don’t ever get tested or maintained properly. This is
> >> closely linked to the excess proliferation of functionality described in
> >> Resolution 1 above. We need to start removing stuff from our packages and
> >> making the parts we keep both reliable and fast.
> >>
> >> I think we can push DataArrays and DataFrames to 1.0 status by the end of
> >> this year. But I think we need to adopt a new approach if we’re going to
> >> get there. Lots of stuff needs to get deprecated and what remains needs a
> >> lot more testing, benchmarking and documentation.
> >>
> >> -- John
> >>
> >
