Hi Julio,


You can use the Query package for the first part. To filter a DataFrame using 
an arbitrary Julia expression, use something like this:


using DataFrames, Query, NamedTuples


q = @from i in df begin
    @where <filter expression>
    @select i
end



You can use any Julia code in <filter expression>. For example, if your 
DataFrame has a column called price, you could filter like this:


@where i.price > 30.


Each i will be a NamedTuple, so you can access the columns either by name or 
by index, e.g.


@where i[1] > 30.


if you want to filter by the first column. You can also just call some function 
that you have defined somewhere else:


@where foo(i)


As long as the <filter expression> returns a Bool, you should be good.
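If you want to try the three filter styles without Query.jl installed, here is a minimal plain-Julia sketch of the same idea. It is not Query.jl itself: it assumes the rows are already a Vector of NamedTuples (the hypothetical `rows` value stands in for the DataFrame), uses Base's `filter`, and `foo` is a hypothetical predicate.

```julia
# Hypothetical data: each row is a NamedTuple, like the `i` inside a query.
rows = [(name = "a", price = 10.0),
        (name = "b", price = 45.0),
        (name = "c", price = 31.5)]

# 1. Access a column by name.
by_name = filter(i -> i.price > 30.0, rows)

# 2. Access a column by position (price is the 2nd field in these rows).
by_index = filter(i -> i[2] > 30.0, rows)

# 3. Call a predicate defined elsewhere; it only has to return a Bool.
foo(i) = startswith(i.name, "b")
by_function = filter(foo, rows)
```

Because the predicate receives the whole row, it never needs to know how many columns there are unless it chooses to look at specific ones.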


If you run a query like this, q will be a standard julia iterator. Right now 
you can’t just say length(q), although that is something I should probably 
enable at some point (I’m also looking into the VB LINQ syntax that supports 
things like counting in the query expression itself).
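Base's lazy iterators have the same limitation, which may help illustrate it. In this plain-Julia sketch (not Query.jl; `rows` and `target` are hypothetical), `Iterators.filter` returns a lazy iterator with no `length` method, but `count` can still walk it once without building an array. Since whole NamedTuples compare with `==` field by field, this also counts occurrences of one specific row without listing its columns.

```julia
# Hypothetical rows; note that row 1 and row 3 are identical.
rows = [(a = 1, b = "x"), (a = 2, b = "y"), (a = 1, b = "x")]
target = (a = 1, b = "x")

# Lazy filter: nothing is evaluated until the iterator is consumed.
lazy = Iterators.filter(r -> r == target, rows)

# The filtered size is not known in advance, so length(lazy) throws a
# MethodError; Base reports this through the IteratorSize trait.
size_trait = Base.IteratorSize(typeof(lazy))

# count walks the iterator once and tallies every element it yields.
n_lazy = count(_ -> true, lazy)

# Equivalent: count matching rows directly, without a separate iterator.
n_direct = count(==(target), rows)
```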


But you could materialize the query as an array and then look at its length:


q = @from i in df begin
    @where <filter expression>
    @select i
    @collect
end

n = length(q)  # number of rows that passed the filter


The @collect statement makes the query return an array of NamedTuples (you can 
also materialize it into many other data structures; take a look at the 
documentation).
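Materializing is what makes `length` available. The following plain-Julia sketch (again not Query.jl; `rows` is hypothetical) calls `collect` on a lazy filter, which plays roughly the role that @collect plays at the end of a query: the result is a concrete Vector of NamedTuples whose length is known.

```julia
# Hypothetical single-column rows.
rows = [(price = 10.0,), (price = 45.0,), (price = 31.5,)]

# Lazy: no length method yet.
lazy = Iterators.filter(i -> i.price > 30.0, rows)

# Materialize into a Vector; the element type comes from the rows.
arr = collect(lazy)

n = length(arr)          # length now works
first_match = arr[1]     # arrays also support indexing
```

The trade-off is memory: the matching rows are copied into a new array, so if you only need the count, counting the iterator directly avoids that allocation.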


Let me know if this works. If you have any other feedback on Query.jl, I'd love 
to hear it; I'm very much in need of user feedback for the package at this 
point. The best way to send it is to open issues at 
https://github.com/davidanthoff/Query.jl.





From: julia-users@googlegroups.com [mailto:julia-users@googlegroups.com] On 
Behalf Of Júlio Hoffimann
Sent: Wednesday, October 12, 2016 5:20 PM
To: julia-users <julia-users@googlegroups.com>
Subject: [julia-users] Filtering DataFrame with a function




I have a DataFrame for which I want to filter rows that match a given criterion. 
I don't have the number of columns beforehand, so I cannot explicitly list the 
criteria with the :symbol syntax or write down a fixed number of indices.


Is there any way to filter with a lambda expression? Or even better, is there 
any efficient way to count the number of occurrences of a specific row of 


