Thank you very much, David. The queries you showed are really nice. I meant that, ideally, I wouldn't need to install another package for a simple filter operation on the rows.
-Júlio

2016-10-12 22:14 GMT-07:00 <anth...@berkeley.edu>:

> Were you worried about Query being not lightweight enough in terms of
> overhead, or in terms of syntax?
>
> I just added a more lightweight syntax for this scenario to Query. You can
> now do the following two things:
>
> q = @where(df, i->i.price > 30.)
>
> that will return a filtered iterator. You can materialize that into a
> DataFrame with collect(q, DataFrame).
>
> I also added a counting option. Turns out that is actually a LINQ query
> operator, and the goal is to implement all of those in Query. The syntax is
> simple:
>
> @count(df, i->i.price > 30.)
>
> returns the number of rows for which the filter condition is true.
>
> Under the hood both of these new syntax options use the normal Query
> machinery, this just provides a simpler syntax relative to the more
> elaborate things I've posted earlier. In terms of LINQ, this corresponds to
> the method invocation API that LINQ has. I'm still figuring out how to
> surface something like @count in the query expression syntax, but for now
> one can use it via this macro.
>
> All of this is on master right now, so you would have to do
> Pkg.checkout("Query") to get these macros.
>
> Best,
> David
>
> On Wednesday, October 12, 2016 at 6:47:15 PM UTC-7, Júlio Hoffimann wrote:
>>
>> Hi David,
>>
>> Thank you for your elaborated answer and for writing a package for
>> general queries, that is great! I will keep the package in mind if I need
>> something more complex.
>>
>> I am currently looking for a lightweight solution within DataFrames,
>> filtering is a very common operation. Right now, I am considering
>> converting the DataFrame to an array and looping over the rows. I wonder if
>> there is a syntactic sugar for this loop.
>>
>> -Júlio
>>
>> 2016-10-12 17:48 GMT-07:00 David Anthoff <ant...@berkeley.edu>:
>>
>>> Hi Julio,
>>>
>>> you can use the Query package for the first part.
>>> To filter a DataFrame using some arbitrary julia expression, use
>>> something like this:
>>>
>>> using DataFrames, Query, NamedTuples
>>>
>>> q = @from i in df begin
>>>     @where <filter expression>
>>>     @select i
>>> end
>>>
>>> You can use any julia code in <filter expression>. Say your DataFrame
>>> has a column called price, then you could filter like this:
>>>
>>> @where i.price > 30.
>>>
>>> The i will be a NamedTuple type, so you can access the columns either by
>>> their name, or also by their index, e.g.
>>>
>>> @where i[1] > 30.
>>>
>>> if you want to filter by the first column. You can also just call some
>>> function that you have defined somewhere else:
>>>
>>> @where foo(i)
>>>
>>> As long as the <filter expression> returns a Bool, you should be good.
>>>
>>> If you run a query like this, q will be a standard julia iterator. Right
>>> now you can’t just say length(q), although that is something I should
>>> probably enable at some point (I’m also looking into the VB LINQ syntax
>>> that supports things like counting in the query expression itself).
>>>
>>> But you could materialize the query as an array and then look at the
>>> length of that:
>>>
>>> q = @from i in df begin
>>>     @where <filter expression>
>>>     @select i
>>>     @collect
>>> end
>>>
>>> count = length(q)
>>>
>>> The @collect statement means that the query will return an array of a
>>> NamedTuple type (you can also materialize it into a whole bunch of other
>>> data structures, take a look at the documentation).
>>>
>>> Let me know if this works, or if you have any other feedback on
>>> Query.jl, I’m much in need of some user feedback for the package at this
>>> point. Best way for that is to open issues here
>>> https://github.com/davidanthoff/Query.jl.
>>>
>>> Best,
>>>
>>> David
>>>
>>> *From:* julia...@googlegroups.com [mailto:julia...@googlegroups.com]
>>> *On Behalf Of* Júlio Hoffimann
>>> *Sent:* Wednesday, October 12, 2016 5:20 PM
>>> *To:* julia-users <julia...@googlegroups.com>
>>> *Subject:* [julia-users] Filtering DataFrame with a function
>>>
>>> Hi,
>>>
>>> I have a DataFrame for which I want to filter rows that match a given
>>> criteria. I don't have the number of columns beforehand, so I cannot
>>> explicitly list the criteria with the :symbol syntax or write down a fixed
>>> number of indices.
>>>
>>> Is there any way to filter with a lambda expression? Or even better, is
>>> there any efficient way to count the number of occurrences of a specific
>>> row of observations?
>>>
>>> -Júlio
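
For the package-free route asked about in this thread, here is a minimal sketch in plain Julia. It is not taken from any of the messages: the `rows` data, the `expensive` predicate, and the `target` row are hypothetical, and the rows are stored as a vector of NamedTuples standing in for DataFrame rows. It assumes Julia 1.0 or later, which is newer than the version in use when the thread was written.

```julia
# Hypothetical data: each row is a NamedTuple, standing in for a DataFrame row.
rows = [
    (item = "a", price = 12.5),
    (item = "b", price = 45.0),
    (item = "c", price = 30.0),
    (item = "b", price = 45.0),  # deliberate duplicate of the second row
    (item = "d", price = 99.9),
]

# The filter criterion is an ordinary function from a row to Bool.
expensive(row) = row.price > 30.0

# Keep only the rows that satisfy the predicate.
selected = filter(expensive, rows)

# Count matching rows without building the filtered collection first.
n_expensive = count(expensive, rows)

# Count how often one specific row of observations occurs.
# Comparing whole rows does not require listing the columns one by one.
target = (item = "b", price = 45.0)
n_target = count(row -> row == target, rows)

println(length(selected), " ", n_expensive, " ", n_target)
```

The same pattern should carry over to a real DataFrame in recent DataFrames.jl releases (an assumption relative to the 2016 versions discussed here), for example `filter(row -> row.price > 30, df)` and `count(expensive, eachrow(df))`.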