Got it. I was thinking of the more verbose (but still useful) df[(df["colA"] > 4) & !isna(df["colB"]), :]
Kevin

On Wed, Jan 22, 2014 at 3:10 PM, John Myles White <[email protected]> wrote:

> The idealized expression interface offers things like (up to reordering):
>
> with(df, a + b * x)
>
> where a and b are variables in the caller's scope and x is a column of df.
>
> In practice, we've had to hack this sort of thing together to offer things
> like
>
> with(df, :($a + $b * x))
>
> That's because we need to pass quoted strings and we also need to tell the
> system which variables are in the caller's scope.
>
> More generally, I'd refer to any operation that passes expressions around
> and asks other functions to evaluate them with an ad hoc scope as
> expression-based operations.
>
> R offers very deep support for this in the language.
>
> -- John
>
> On Jan 22, 2014, at 2:48 PM, Kevin Squire <[email protected]> wrote:
>
> Maybe I misinterpreted the term "expression-based interface".
>
>
> On Wed, Jan 22, 2014 at 2:33 PM, John Myles White <[email protected]> wrote:
>
>> My impression is that Pandas didn't support anything like delayed
>> evaluation. Is that wrong?
>>
>> I'm aware that the resulting expressions are a lot more verbose. That
>> definitely sucks.
>>
>> I'd love to see strong proposals for how we're going to do a better job
>> of making code shorter going forward. But too much of our current codebase
>> is buggy, unable to handle edge cases, slow and undocumented. I think it's
>> much more important that we have one way of doing things that actually
>> works as advertised for every Julia user than two ways of doing things,
>> each of which is slightly broken and performs worse than R and Pandas.
>>
>> As I've been saying lately, I'm burning out on maintaining so much Julia
>> code. If someone else wants to take charge of my projects, I'm ok with
>> that. But if I'm going to be doing the work going forward, I need to devote
>> my energies to making a small number of things work really well.
>> Once we get our core functionality solid, I'll be comfortable getting
>> fancier stuff working again.
>>
>> -- John
>>
>> On Jan 22, 2014, at 1:06 PM, Kevin Squire <[email protected]> wrote:
>>
>> I'm also a fan of the expression-based interface (mostly because I'm used
>> to similar things in Pandas). I haven't looked at that code, though, so I
>> can't comment on the complexity.
>>
>> Kevin
>>
>>
>> On Wed, Jan 22, 2014 at 11:18 AM, Blake Johnson <[email protected]> wrote:
>>
>>> Sure, but the resulting expression is *much* more verbose. I just
>>> noticed that all expression-based indexing was on the chopping block. What
>>> is left after all this?
>>>
>>> I can see how axing these features would make DataFrames.jl easier to
>>> maintain, but I found the expression stuff to present a rather nice
>>> interface.
>>>
>>> --Blake
>>>
>>>
>>> On Tuesday, January 21, 2014 11:51:03 AM UTC-5, John Myles White wrote:
>>>
>>>> Can you do something like df["ColA"] = f(df)?
>>>>
>>>> -- John
>>>>
>>>>
>>>> On Jan 21, 2014, at 8:48 AM, Blake Johnson <[email protected]> wrote:
>>>>
>>>> I use within! pretty frequently. What should I be using instead if that
>>>> is on the chopping block?
>>>>
>>>> --Blake
>>>>
>>>> On Tuesday, January 21, 2014 7:42:39 AM UTC-5, tshort wrote:
>>>>>
>>>>> I also agree with your approach, John. Based on your criteria, here
>>>>> are some other things to consider for the chopping block.
>>>>>
>>>>> - expression-based indexing
>>>>> - NamedArray (you already have an issue on this)
>>>>> - with, within, based_on and variants
>>>>> - @transform, @DataFrame
>>>>> - select, filter
>>>>> - DataStream
>>>>>
>>>>> Many of these were attempts to ease syntax via delayed evaluation. We
>>>>> can either do without or try to implement something like LINQ.
>>>>>
>>>>>
>>>>> On Mon, Jan 20, 2014 at 7:02 PM, Kevin Squire <[email protected]> wrote:
>>>>> > Hi John,
>>>>> >
>>>>> > I agree with pretty much everything you have written here, and really
>>>>> > appreciate that you've taken the lead in cleaning things up and
>>>>> > getting us on track.
>>>>> >
>>>>> > Cheers!
>>>>> > Kevin
>>>>> >
>>>>> >
>>>>> > On Mon, Jan 20, 2014 at 1:57 PM, John Myles White <johnmyl...@gmail.com> wrote:
>>>>> >>
>>>>> >> As I said in another thread recently, I am currently the lead maintainer
>>>>> >> of more packages than I can keep up with. I think it’s been useful for me
>>>>> >> to start so many different projects, but I can’t keep maintaining most of
>>>>> >> my packages given my current work schedule.
>>>>> >>
>>>>> >> Without Simon Kornblith, Kevin Squire, Sean Garborg and several others
>>>>> >> doing amazing work to keep DataArrays and DataFrames going, much of our
>>>>> >> basic data infrastructure would have already become completely unusable.
>>>>> >> But even with the great work that’s been done on those packages recently,
>>>>> >> there’s still a lot of additional design work required. I’d like to free
>>>>> >> up some of my time to do that work.
>>>>> >>
>>>>> >> To keep things moving forward, I’d like to propose a couple of radical
>>>>> >> New Year’s resolutions for the packages I work on.
>>>>> >>
>>>>> >> (1) We need to stop adding functionality and focus entirely on improving
>>>>> >> the quality and documentation of our existing functionality. We have way
>>>>> >> too much prototype code in DataFrames that I can’t keep up with. I’m about
>>>>> >> to make a pull request for DataFrames that will remove everything related
>>>>> >> to column groupings, database-style indexing and Blocks.jl support.
>>>>> >> I absolutely want to see us push all of those ideas forward in the future,
>>>>> >> but they need to happen in unmerged forks or separate packages until we
>>>>> >> have the resources needed to support them. Right now, they make an
>>>>> >> overwhelming maintenance challenge even more onerous.
>>>>> >>
>>>>> >> (2) We can’t support anything other than the master branch of most
>>>>> >> JuliaStats packages except possibly for Distributions. I personally don’t
>>>>> >> have the time to simultaneously keep stuff working with Julia 0.2 and
>>>>> >> Julia 0.3. Moreover, many of our basic packages aren’t mature enough to
>>>>> >> justify supporting older versions. We should do a better job of supporting
>>>>> >> our master releases and not invest precious time trying to support older
>>>>> >> releases.
>>>>> >>
>>>>> >> (3) We need to make more of DataArrays and DataFrames reflect the Julian
>>>>> >> worldview. Lots of our code uses an interface that is incongruous with the
>>>>> >> interfaces found in Base. Even worse, a large chunk of code has
>>>>> >> type-stability problems that make it very slow, when comparable code that
>>>>> >> uses normal Arrays is 100x faster. We need to develop new idioms and new
>>>>> >> strategies for making code that interacts with type-destabilizing NA’s
>>>>> >> faster. More generally, we need to make DataArrays and DataFrames fit in
>>>>> >> better with Julia when Julia and R disagree. Following R’s lead has often
>>>>> >> led us astray because R doesn’t share Julia’s strengths or weaknesses.
>>>>> >>
>>>>> >> (4) Going forward, there should be exactly one way to do most things. The
>>>>> >> worst part of our current codebase is that there are multiple ways to
>>>>> >> express the same computation, but (a) some of them are unusably slow and
>>>>> >> (b) some of them don’t ever get tested or maintained properly.
>>>>> >> This is closely linked to the excess proliferation of functionality
>>>>> >> described in Resolution 1 above. We need to start removing stuff from our
>>>>> >> packages and making the parts we keep both reliable and fast.
>>>>> >>
>>>>> >> I think we can push DataArrays and DataFrames to 1.0 status by the end of
>>>>> >> this year. But I think we need to adopt a new approach if we’re going to
>>>>> >> get there. Lots of stuff needs to get deprecated and what remains needs a
>>>>> >> lot more testing, benchmarking and documentation.
>>>>> >>
>>>>> >> -- John
