Sure, but the resulting expression is *much* more verbose. I just noticed that all expression-based indexing was on the chopping block. What is left after all this?
I can see how axing these features would make DataFrames.jl easier to maintain, but I found the expression stuff to present a rather nice interface.

--Blake

On Tuesday, January 21, 2014 11:51:03 AM UTC-5, John Myles White wrote:
>
> Can you do something like df["ColA"] = f(df)?
>
> -- John
>
> On Jan 21, 2014, at 8:48 AM, Blake Johnson wrote:
>
> I use within! pretty frequently. What should I be using instead if that is
> on the chopping block?
>
> --Blake
>
> On Tuesday, January 21, 2014 7:42:39 AM UTC-5, tshort wrote:
>>
>> I also agree with your approach, John. Based on your criteria, here
>> are some other things to consider for the chopping block.
>>
>> - expression-based indexing
>> - NamedArray (you already have an issue on this)
>> - with, within, based_on and variants
>> - @transform, @DataFrame
>> - select, filter
>> - DataStream
>>
>> Many of these were attempts to ease syntax via delayed evaluation. We
>> can either do without or try to implement something like LINQ.
>>
>>
>> On Mon, Jan 20, 2014 at 7:02 PM, Kevin Squire wrote:
>> > Hi John,
>> >
>> > I agree with pretty much everything you have written here, and really
>> > appreciate that you've taken the lead in cleaning things up and
>> > getting us on track.
>> >
>> > Cheers!
>> > Kevin
>> >
>> >
>> > On Mon, Jan 20, 2014 at 1:57 PM, John Myles White wrote:
>> >>
>> >> As I said in another thread recently, I am currently the lead
>> >> maintainer of more packages than I can keep up with. I think it’s
>> >> been useful for me to start so many different projects, but I can’t
>> >> keep maintaining most of my packages given my current work schedule.
>> >>
>> >> Without Simon Kornblith, Kevin Squire, Sean Garborg and several
>> >> others doing amazing work to keep DataArrays and DataFrames going,
>> >> much of our basic data infrastructure would have already become
>> >> completely unusable. But even with the great work that’s been done
>> >> on those packages recently, there’s still a lot of additional design
>> >> work required. I’d like to free up some of my time to do that work.
>> >>
>> >> To keep things moving forward, I’d like to propose a couple of
>> >> radical New Year’s resolutions for the packages I work on.
>> >>
>> >> (1) We need to stop adding functionality and focus entirely on
>> >> improving the quality and documentation of our existing
>> >> functionality. We have way too much prototype code in DataFrames
>> >> that I can’t keep up with. I’m about to make a pull request for
>> >> DataFrames that will remove everything related to column groupings,
>> >> database-style indexing and Blocks.jl support. I absolutely want to
>> >> see us push all of those ideas forward in the future, but they need
>> >> to happen in unmerged forks or separate packages until we have the
>> >> resources needed to support them. Right now, they make an
>> >> overwhelming maintenance challenge even more onerous.
>> >>
>> >> (2) We can’t support anything other than the master branch of most
>> >> JuliaStats packages except possibly for Distributions. I personally
>> >> don’t have the time to simultaneously keep stuff working with Julia
>> >> 0.2 and Julia 0.3. Moreover, many of our basic packages aren’t
>> >> mature enough to justify supporting older versions. We should do a
>> >> better job of supporting our master releases and not invest precious
>> >> time trying to support older releases.
>> >>
>> >> (3) We need to make more of DataArrays and DataFrames reflect the
>> >> Julian worldview. Lots of our code uses an interface that is
>> >> incongruous with the interfaces found in Base. Even worse, a large
>> >> chunk of code has type-stability problems that make it very slow,
>> >> when comparable code that uses normal Arrays is 100x faster. We need
>> >> to develop new idioms and new strategies for making code that
>> >> interacts with type-destabilizing NA’s faster. More generally, we
>> >> need to make DataArrays and DataFrames fit in better with Julia when
>> >> Julia and R disagree. Following R’s lead has often led us astray
>> >> because R doesn’t share Julia’s strengths or weaknesses.
>> >>
>> >> (4) Going forward, there should be exactly one way to do most
>> >> things. The worst part of our current codebase is that there are
>> >> multiple ways to express the same computation, but (a) some of them
>> >> are unusably slow and (b) some of them don’t ever get tested or
>> >> maintained properly. This is closely linked to the excess
>> >> proliferation of functionality described in Resolution 1 above. We
>> >> need to start removing stuff from our packages and making the parts
>> >> we keep both reliable and fast.
>> >>
>> >> I think we can push DataArrays and DataFrames to 1.0 status by the
>> >> end of this year. But I think we need to adopt a new approach if
>> >> we’re going to get there. Lots of stuff needs to get deprecated and
>> >> what remains needs a lot more testing, benchmarking and
>> >> documentation.
>> >>
>> >> -- John
