Hi Júlio,

If you're just interested in using an arbitrary function to filter rows, 
you can do something like this:

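using DataFrames
df = DataFrame(Fish = ["Amir", "Betty", "Clyde"], Mass = [1.2, 3.3, 0.4])
# Compare the first character with the Char 'A' (not the String "A");
# the name `keep` avoids shadowing Base.filter.
keep(row) = (row[:Fish][1] != 'A') && (row[:Mass] > 1)
df = df[[keep(r) for r in eachrow(df)], :]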

Is that what you're looking for?  If not, can you give an example of what 
you want to do?

Best,

Alex

On Wednesday, October 12, 2016 at 10:20:52 PM UTC-7, Júlio Hoffimann wrote:
>
> Thank you very much, David; these queries you showed are really nice. I 
> meant that ideally I wouldn't need to install another package for a simple 
> filter operation on the rows.
>
> -Júlio
>
> 2016-10-12 22:14 GMT-07:00 <ant...@berkeley.edu>:
>
>> Were you worried about Query not being lightweight enough in terms of 
>> overhead, or in terms of syntax?
>>
>> I just added a more lightweight syntax for this scenario to Query. You 
>> can now do the following two things:
>>
>> q = @where(df, i->i.price > 30.)
>>
>> that will return a filtered iterator. You can materialize that into a 
>> DataFrame with collect(q, DataFrame).
>>
>> I also added a counting option. It turns out that counting is actually a 
>> LINQ query operator, and the goal is to implement all of those in Query. 
>> The syntax is simple:
>>
>> @count(df, i->i.price > 30.)
>>
>> returns the number of rows for which the filter condition is true.
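For comparison, Base Julia has a plain-function counterpart to @count: `count(pred, itr)` applies a predicate to every element and counts the `true` results. The sketch below assumes Julia 1.0 or later (where NamedTuple is built in) and uses a hypothetical `rows` vector of NamedTuples as a stand-in for the DataFrame; it is not Query's implementation.

```julia
# Hypothetical rows standing in for the DataFrame; each row has a `price` field.
rows = [(item = "a", price = 12.0),
        (item = "b", price = 45.5),
        (item = "c", price = 31.0)]

# count(pred, itr) counts the elements for which pred returns true,
# without allocating an intermediate filtered collection.
n = count(i -> i.price > 30.0, rows)
println(n)  # prints 2
```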
>>
>> Under the hood, both of these new syntax options use the normal Query 
>> machinery; this just provides a simpler syntax relative to the more 
>> elaborate things I've posted earlier. In terms of LINQ, this corresponds to 
>> the method invocation API that LINQ has. I'm still figuring out how to 
>> surface something like @count in the query expression syntax, but for now 
>> one can use it via this macro.
>>
>> All of this is on master right now, so you would have to do 
>> Pkg.checkout("Query") to get these macros.
>>
>> Best,
>> David
>>
>> On Wednesday, October 12, 2016 at 6:47:15 PM UTC-7, Júlio Hoffimann wrote:
>>>
>>> Hi David,
>>>
>>> Thank you for your elaborate answer and for writing a package for 
>>> general queries; that is great! I will keep the package in mind if I need 
>>> something more complex.
>>>
>>> I am currently looking for a lightweight solution within DataFrames; 
>>> filtering is a very common operation. Right now, I am considering 
>>> converting the DataFrame to an array and looping over the rows. I wonder if 
>>> there is syntactic sugar for this loop.
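If the rows are already in an array, Base's `filter(pred, collection)` is essentially that sugar. A minimal sketch, assuming Julia 1.0 or later and a hypothetical vector of NamedTuples (reusing the fish data from the reply above), showing that the explicit loop and `filter` give the same result; `pred` is a hypothetical predicate name.

```julia
rows = [(name = "Amir", mass = 1.2),
        (name = "Betty", mass = 3.3),
        (name = "Clyde", mass = 0.4)]

pred(r) = r.mass > 1.0   # hypothetical row predicate

# Explicit loop: start from an empty vector with the same element type.
kept_loop = similar(rows, 0)
for r in rows
    pred(r) && push!(kept_loop, r)
end

# The same thing with Base.filter.
kept = filter(pred, rows)

@assert kept == kept_loop
println([r.name for r in kept])  # prints ["Amir", "Betty"]
```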
>>>
>>> -Júlio
>>>
>>> 2016-10-12 17:48 GMT-07:00 David Anthoff <ant...@berkeley.edu>:
>>>
>>>> Hi Julio,
>>>>
>>>> you can use the Query package for the first part. To filter a DataFrame 
>>>> using some arbitrary Julia expression, use something like this:
>>>>
>>>> using DataFrames, Query, NamedTuples
>>>>
>>>> q = @from i in df begin
>>>>     @where <filter expression>
>>>>     @select i
>>>> end
>>>>
>>>> You can use any Julia code in <filter expression>. Say your DataFrame 
>>>> has a column called price; then you could filter like this:
>>>>
>>>> @where i.price > 30.
>>>>
>>>> Each i will be of a NamedTuple type, so you can access the columns 
>>>> either by name or by index, e.g.
>>>>
>>>> @where i[1] > 30.
>>>>
>>>> if you want to filter by the first column. You can also just call some 
>>>> function that you have defined somewhere else:
>>>>
>>>> @where foo(i)
>>>>
>>>> As long as the <filter expression> returns a Bool, you should be good.
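The same three predicate styles (field name, position, or a separately defined function) carry over to plain NamedTuples in Base Julia. A minimal sketch, assuming Julia 1.0 or later and hypothetical data; `in_stock` is a hypothetical helper, and Base `filter` is used in place of Query's @where.

```julia
rows = [(price = 25.0, qty = 3),
        (price = 40.0, qty = 1),
        (price = 35.0, qty = 0)]

by_name  = filter(i -> i.price > 30.0, rows)  # access by field name
by_index = filter(i -> i[1] > 30.0, rows)     # access by position (first field)

# Any function returning a Bool works as the predicate; `in_stock` is hypothetical.
in_stock(i) = i.qty > 0
by_func = filter(in_stock, rows)

@assert by_name == by_index
println(length(by_name), " ", length(by_func))  # prints 2 2
```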
>>>>
>>>> If you run a query like this, q will be a standard Julia iterator. 
>>>> Right now you can’t just say length(q), although that is something I 
>>>> should probably enable at some point (I’m also looking into the VB LINQ 
>>>> syntax that supports things like counting in the query expression itself).
>>>>
>>>> But you could materialize the query as an array and then look at the 
>>>> length of that:
>>>>
>>>> q = @from i in df begin
>>>>     @where <filter expression>
>>>>     @select i
>>>>     @collect
>>>> end
>>>>
>>>> n = length(q)
>>>>
>>>> The @collect statement means that the query will return an array of a 
>>>> NamedTuple type (you can also materialize it into a whole bunch of other 
>>>> data structures, take a look at the documentation).
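The lazy-versus-materialized distinction can be seen in Base Julia too: `Iterators.filter` returns a lazy iterator whose length is unknown until it is consumed, while `collect` materializes it into a Vector that supports `length`. A minimal sketch, assuming Julia 1.0 or later and hypothetical NamedTuple rows; this illustrates the general idea, not Query's own iterator type.

```julia
rows = [(price = 25.0,), (price = 40.0,), (price = 35.0,)]

q = Iterators.filter(i -> i.price > 30.0, rows)  # lazy, nothing evaluated yet

# A lazy filter cannot know its length in advance; IteratorSize reports this,
# which is why length(q) is not available.
@assert Base.IteratorSize(q) isa Base.SizeUnknown

v = collect(q)       # materialize into a Vector of NamedTuples
println(length(v))   # prints 2
```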
>>>>
>>>> Let me know if this works, or if you have any other feedback on 
>>>> Query.jl; I’m very much in need of user feedback for the package at this 
>>>> point. The best way to give it is to open issues at 
>>>> https://github.com/davidanthoff/Query.jl.
>>>>
>>>> Best,
>>>>
>>>> David
>>>>
>>>> *From:* julia...@googlegroups.com [mailto:julia...@googlegroups.com] *On 
>>>> Behalf Of *Júlio Hoffimann
>>>> *Sent:* Wednesday, October 12, 2016 5:20 PM
>>>> *To:* julia-users <julia...@googlegroups.com>
>>>> *Subject:* [julia-users] Filtering DataFrame with a function
>>>>
>>>> Hi,
>>>>
>>>> I have a DataFrame from which I want to filter the rows that match a 
>>>> given criterion. I don't know the number of columns beforehand, so I 
>>>> cannot explicitly list the criteria with the :symbol syntax or write down 
>>>> a fixed number of indices.
>>>>
>>>> Is there any way to filter with a lambda expression? Or even better, is 
>>>> there any efficient way to count the number of occurrences of a specific 
>>>> row of observations?
>>>>
>>>> -Júlio
>>>>
>>>
>>>
>
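On the original question about counting occurrences of a specific row: NamedTuples compare field by field with `==` regardless of how many fields they have, so a whole row can be matched directly, and a Dict can tally every distinct row in one pass. A minimal sketch, assuming Julia 1.0 or later and rows already held as a hypothetical vector of NamedTuples (Base only, no DataFrames or Query calls).

```julia
rows = [(a = 1, b = "x"),
        (a = 2, b = "y"),
        (a = 1, b = "x")]

target = (a = 1, b = "x")

# Count rows equal to `target`; works for any number of fields.
n_target = count(r -> r == target, rows)

# Tally every distinct row in a single pass over the data.
tally = Dict{eltype(rows), Int}()
for r in rows
    tally[r] = get(tally, r, 0) + 1
end

println(n_target)                 # prints 2
println(tally[(a = 2, b = "y")])  # prints 1
```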
