Were you worried about Query not being lightweight enough in terms of 
overhead, or in terms of syntax?

I just added a more lightweight syntax for this scenario to Query. You can 
now do the following two things:

q = @where(df, i->i.price > 30.)

This returns a filtered iterator. You can materialize that into a 
DataFrame with collect(q, DataFrame).
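
As a rough illustration of what "filtered iterator" means here, this is a sketch in plain Julia, not Query itself. It assumes a Julia version with built-in named tuples, and the rows vector is a made-up stand-in for df:

```julia
# Hypothetical stand-in for a DataFrame: one named tuple per row.
rows = [(name = "a", price = 25.0),
        (name = "b", price = 35.0),
        (name = "c", price = 40.0)]

# Lazy filter: the predicate is only evaluated when the iterator is consumed.
q = Iterators.filter(i -> i.price > 30.0, rows)

# Materialize the iterator, here into a plain Vector rather than a DataFrame.
result = collect(q)
```

The point is only that q does no work until something iterates over it or collects it.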

I also added a counting option. Turns out that is actually a LINQ query 
operator, and the goal is to implement all of those in Query. The syntax is 
simple:

@count(df, i->i.price > 30.)

returns the number of rows for which the filter condition is true.
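
For comparison, a sketch of the same counting semantics in plain Julia (again not Query itself). It assumes built-in named tuples, and rows and target are hypothetical names used only for illustration:

```julia
# Hypothetical stand-in for a DataFrame: one named tuple per row.
rows = [(name = "a", price = 25.0),
        (name = "b", price = 35.0),
        (name = "b", price = 35.0)]

# Number of rows for which the filter condition is true.
n_expensive = count(i -> i.price > 30.0, rows)

# Number of occurrences of one specific row of observations.
target = (name = "b", price = 35.0)
n_target = count(i -> i == target, rows)
```

The second call is one way to read the original question about counting occurrences of a specific row, since named tuples compare field by field.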

Under the hood, both of these new syntax options use the normal Query 
machinery; they just provide a simpler syntax than the more 
elaborate things I've posted earlier. In terms of LINQ, this corresponds to 
the method invocation API that LINQ has. I'm still figuring out how to 
surface something like @count in the query expression syntax, but for now 
one can use it via this macro.

All of this is on master right now, so you would have to do 
Pkg.checkout("Query") to get these macros.

Best,
David

On Wednesday, October 12, 2016 at 6:47:15 PM UTC-7, Júlio Hoffimann wrote:
>
> Hi David,
>
> Thank you for your detailed answer and for writing a package for general 
> queries, that is great! I will keep the package in mind if I need something 
> more complex.
>
> I am currently looking for a lightweight solution within DataFrames, since 
> filtering is a very common operation. Right now, I am considering 
> converting the DataFrame to an array and looping over the rows. I wonder if 
> there is syntactic sugar for this loop.
>
> -Júlio
>
> 2016-10-12 17:48 GMT-07:00 David Anthoff <ant...@berkeley.edu>:
>
>> Hi Julio,
>>
>>  
>>
>> You can use the Query package for the first part. To filter a DataFrame 
>> using some arbitrary Julia expression, use something like this:
>>
>>  
>>
>> using DataFrames, Query, NamedTuples
>>
>>  
>>
>> q = @from i in df begin
>>     @where <filter expression>
>>     @select i
>> end
>>
>>  
>>
>> You can use any Julia code in <filter expression>. Say your DataFrame has 
>> a column called price, then you could filter like this:
>>
>>  
>>
>> @where i.price > 30.
>>
>>  
>>
>> The i will be a NamedTuple type, so you can access the columns either by 
>> name or by index, e.g.
>>
>>  
>>
>> @where i[1] > 30.
>>
>>  
>>
>> if you want to filter by the first column. You can also just call some 
>> function that you have defined somewhere else:
>>
>>  
>>
>> @where foo(i)
>>
>>  
>>
>> As long as the <filter expression> returns a Bool, you should be good.
>>
>>  
>>
>> If you run a query like this, q will be a standard Julia iterator. Right 
>> now you can’t just say length(q), although that is something I should 
>> probably enable at some point (I’m also looking into the VB LINQ syntax 
>> that supports things like counting in the query expression itself).
>>
>>  
>>
>> But you could materialize the query as an array and then look at the 
>> length of that:
>>
>>  
>>
>> q = @from i in df begin
>>     @where <filter expression>
>>     @select i
>>     @collect
>> end
>>
>> count = length(q)
>>
>>  
>>
>> The @collect statement means that the query will return an array of a 
>> NamedTuple type (you can also materialize it into a whole bunch of other 
>> data structures, take a look at the documentation).
>>
>>  
>>
>> Let me know if this works, or if you have any other feedback on Query.jl. 
>> I could really use some user feedback on the package at this point. The 
>> best way to give it is to open issues here: 
>> https://github.com/davidanthoff/Query.jl.
>>
>>  
>>
>> Best,
>>
>> David
>>
>>  
>>
>> *From:* julia...@googlegroups.com [mailto:julia...@googlegroups.com] 
>> *On Behalf Of *Júlio Hoffimann
>> *Sent:* Wednesday, October 12, 2016 5:20 PM
>> *To:* julia-users <julia...@googlegroups.com>
>> *Subject:* [julia-users] Filtering DataFrame with a function
>>
>>  
>>
>> Hi,
>>
>>  
>>
>> I have a DataFrame for which I want to filter rows that match a given 
>> criterion. I don't know the number of columns beforehand, so I cannot 
>> explicitly list the criteria with the :symbol syntax or write down a fixed 
>> number of indices.
>>
>>  
>>
>> Is there any way to filter with a lambda expression? Or even better, is 
>> there any efficient way to count the number of occurrences of a specific 
>> row of observations?
>>
>>  
>>
>> -Júlio
>>
>
>
