Hi Júlio,

If you're just interested in using an arbitrary function to filter on rows you can do something like:

    using DataFrames
    df = DataFrame(Fish = ["Amir", "Betty", "Clyde"], Mass = [1.2, 3.3, 0.4])
    # Keep rows whose Fish name does not start with 'A' (a Char, hence single quotes) and whose Mass exceeds 1.
    # Named `keep` rather than `filter` so it does not shadow Base.filter.
    keep(row) = (row[:Fish][1] != 'A') && (row[:Mass] > 1)
    df = df[[keep(r) for r in eachrow(df)], :]

Is that what you're looking for? If not, can you give an example of what you want to do?

Best,

Alex

On Wednesday, October 12, 2016 at 10:20:52 PM UTC-7, Júlio Hoffimann wrote:
>
> Thank you very much David, these queries you showed are really nice. I
> meant that ideally I wouldn't need to install another package for a simple
> filter operation on the rows.
>
> -Júlio
>
> 2016-10-12 22:14 GMT-07:00 <ant...@berkeley.edu>:
>
>> Were you worried about Query being not lightweight enough in terms of
>> overhead, or in terms of syntax?
>>
>> I just added a more lightweight syntax for this scenario to Query. You
>> can now do the following two things:
>>
>> q = @where(df, i->i.price > 30.)
>>
>> that will return a filtered iterator. You can materialize that into a
>> DataFrame with collect(q, DataFrame).
>>
>> I also added a counting option. Turns out that is actually a LINQ query
>> operator, and the goal is to implement all of those in Query. The syntax is
>> simple:
>>
>> @count(df, i->i.price > 30.)
>>
>> returns the number of rows for which the filter condition is true.
>>
>> Under the hood both of these new syntax options use the normal Query
>> machinery, this just provides a simpler syntax relative to the more
>> elaborate things I've posted earlier. In terms of LINQ, this corresponds to
>> the method invocation API that LINQ has. I'm still figuring out how to
>> surface something like @count in the query expression syntax, but for now
>> one can use it via this macro.
>>
>> All of this is on master right now, so you would have to do
>> Pkg.checkout("Query") to get these macros.
>>
>> Best,
>> David
>>
>> On Wednesday, October 12, 2016 at 6:47:15 PM UTC-7, Júlio Hoffimann wrote:
>>>
>>> Hi David,
>>>
>>> Thank you for your elaborated answer and for writing a package for
>>> general queries, that is great! I will keep the package in mind if I need
>>> something more complex.
>>>
>>> I am currently looking for a lightweight solution within DataFrames,
>>> filtering is a very common operation. Right now, I am considering
>>> converting the DataFrame to an array and looping over the rows. I wonder if
>>> there is a syntactic sugar for this loop.
>>>
>>> -Júlio
>>>
>>> 2016-10-12 17:48 GMT-07:00 David Anthoff <ant...@berkeley.edu>:
>>>
>>>> Hi Julio,
>>>>
>>>> you can use the Query package for the first part. To filter a DataFrame
>>>> using some arbitrary julia expression, use something like this:
>>>>
>>>> using DataFrames, Query, NamedTuples
>>>>
>>>> q = @from i in df begin
>>>>     @where <filter expression>
>>>>     @select i
>>>> end
>>>>
>>>> You can use any julia code in <filter expression>. Say your DataFrame
>>>> has a column called price, then you could filter like this:
>>>>
>>>> @where i.price > 30.
>>>>
>>>> The i will be a NamedTuple type, so you can access the columns either
>>>> by their name, or also by their index, e.g.
>>>>
>>>> @where i[1] > 30.
>>>>
>>>> if you want to filter by the first column. You can also just call some
>>>> function that you have defined somewhere else:
>>>>
>>>> @where foo(i)
>>>>
>>>> As long as the <julia expression> returns a Bool, you should be good.
>>>>
>>>> If you run a query like this, q will be a standard julia iterator.
>>>> Right now you can’t just say length(q), although that is something I
>>>> should probably enable at some point (I’m also looking into the VB LINQ
>>>> syntax that supports things like counting in the query expression itself).
>>>>
>>>> But you could materialize the query as an array and then look at the
>>>> length of that:
>>>>
>>>> q = @from i in df begin
>>>>     @where <filter expression>
>>>>     @select i
>>>>     @collect
>>>> end
>>>>
>>>> count = length(q)
>>>>
>>>> The @collect statement means that the query will return an array of a
>>>> NamedTuple type (you can also materialize it into a whole bunch of other
>>>> data structures, take a look at the documentation).
>>>>
>>>> Let me know if this works, or if you have any other feedback on
>>>> Query.jl, I’m much in need of some user feedback for the package at this
>>>> point. Best way for that is to open issues here
>>>> https://github.com/davidanthoff/Query.jl.
>>>>
>>>> Best,
>>>>
>>>> David
>>>>
>>>> *From:* julia...@googlegroups.com [mailto:julia...@googlegroups.com] *On
>>>> Behalf Of *Júlio Hoffimann
>>>> *Sent:* Wednesday, October 12, 2016 5:20 PM
>>>> *To:* julia-users <julia...@googlegroups.com>
>>>> *Subject:* [julia-users] Filtering DataFrame with a function
>>>>
>>>> Hi,
>>>>
>>>> I have a DataFrame for which I want to filter rows that match a given
>>>> criteria. I don't have the number of columns beforehand, so I cannot
>>>> explicitly list the criteria with the :symbol syntax or write down a fixed
>>>> number of indices.
>>>>
>>>> Is there any way to filter with a lambda expression? Or even better, is
>>>> there any efficient way to count the number of occurrences of a specific
>>>> row of observations?
>>>>
>>>> -Júlio
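
P.S. On the counting part of the original question: here is a minimal sketch that stays within DataFrames. It assumes a recent DataFrames.jl (1.x), where filter(f, df) passes each row to f and eachrow(df) can be handed to Base's count; that API is newer than what was available when this thread was written. The extra "Betty" row and the names keep, target, nmatch and ntarget are made up for illustration.

```julia
using DataFrames

# Illustrative data; the duplicated "Betty" row is an assumption added for the count below.
df = DataFrame(Fish = ["Amir", "Betty", "Clyde", "Betty"],
               Mass = [1.2, 3.3, 0.4, 3.3])

# Row predicate in the spirit of the one at the top of this message.
# first(row.Fish) is a Char, so it is compared against the Char 'A'.
keep(row) = first(row.Fish) != 'A' && row.Mass > 1

# Rows for which the predicate holds, as a new DataFrame.
filtered = filter(keep, df)

# Number of matching rows, without building the filtered DataFrame.
nmatch = count(keep, eachrow(df))

# Occurrences of one specific row, compared positionally so that no
# column names or fixed number of indices need to be written down.
target = ("Betty", 3.3)
ntarget = count(r -> Tuple(r) == target, eachrow(df))
```

Comparing Tuple(r) against the target keeps the predicate independent of how many columns the DataFrame has, which was the constraint in the original question. Whether this is fast enough for a large table is something I have not measured.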