Hi David,

Thank you for your detailed answer and for writing a package for general queries. That is great! I will keep the package in mind if I need something more complex.

I am currently looking for a lightweight solution within DataFrames, since filtering is a very common operation. Right now, I am considering converting the DataFrame to an array and looping over the rows. I wonder if there is any syntactic sugar for this loop.

-Júlio

2016-10-12 17:48 GMT-07:00 David Anthoff <anth...@berkeley.edu>:

> Hi Julio,
>
> you can use the Query package for the first part. To filter a DataFrame
> using some arbitrary julia expression, use something like this:
>
> using DataFrames, Query, NamedTuples
>
> q = @from i in df begin
>     @where <filter expression>
>     @select i
> end
>
> You can use any julia code in <filter expression>. Say your DataFrame has
> a column called price, then you could filter like this:
>
> @where i.price > 30.
>
> The i will be a NamedTuple type, so you can access the columns either by
> their name, or also by their index, e.g.
>
> @where i[1] > 30.
>
> if you want to filter by the first column. You can also just call some
> function that you have defined somewhere else:
>
> @where foo(i)
>
> As long as the <filter expression> returns a Bool, you should be good.
>
> If you run a query like this, q will be a standard julia iterator. Right
> now you can't just say length(q), although that is something I should
> probably enable at some point (I'm also looking into the VB LINQ syntax
> that supports things like counting in the query expression itself).
>
> But you could materialize the query as an array and then look at the
> length of that:
>
> q = @from i in df begin
>     @where <filter expression>
>     @select i
>     @collect
> end
> count = length(q)
>
> The @collect statement means that the query will return an array of a
> NamedTuple type (you can also materialize it into a whole bunch of other
> data structures, take a look at the documentation).
>
> Let me know if this works, or if you have any other feedback on Query.jl,
> I'm much in need of some user feedback for the package at this point. Best
> way for that is to open issues here https://github.com/davidanthoff/Query.jl.
>
> Best,
> David
>
> *From:* julia-users@googlegroups.com [mailto:julia-users@googlegroups.com]
> *On Behalf Of* Júlio Hoffimann
> *Sent:* Wednesday, October 12, 2016 5:20 PM
> *To:* julia-users <julia-users@googlegroups.com>
> *Subject:* [julia-users] Filtering DataFrame with a function
>
> Hi,
>
> I have a DataFrame for which I want to filter rows that match given
> criteria. I don't have the number of columns beforehand, so I cannot
> explicitly list the criteria with the :symbol syntax or write down a fixed
> number of indices.
>
> Is there any way to filter with a lambda expression? Or even better, is
> there any efficient way to count the number of occurrences of a specific
> row of observations?
>
> -Júlio
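
For reference, the row-wise filtering and counting asked about in this thread can be sketched without Query, assuming a recent DataFrames.jl release in which `filter` accepts a row predicate and `eachrow` iterates over rows (this API may differ from the version available when the thread was written). The data, the column names, and the helper `matches` are hypothetical and only serve the illustration.

```julia
using DataFrames

# Hypothetical example data; the column names are made up for illustration.
df = DataFrame(price = [10.0, 45.0, 45.0, 80.0], qty = [1, 2, 2, 3])

# Keep the rows for which an arbitrary predicate on the whole row is true.
# The predicate indexes columns by position, so it needs no column names.
expensive = filter(row -> row[1] > 30, df)

# Count the rows equal to a given observation, for any number of columns.
target = (45.0, 2)
matches(row) = all(row[j] == target[j] for j in 1:ncol(df))
n = count(matches, eachrow(df))
```

Because the predicate receives the whole row, it works the same way whether the DataFrame has two columns or twenty, which was the constraint in the original question.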