Thank you very much, David; the queries you showed are really nice. What I
meant is that, ideally, I wouldn't need to install another package for a
simple filter operation on the rows.
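
For illustration, here is a minimal sketch of the kind of package-free row
filter I have in mind. It assumes a Julia version where NamedTuples are part
of Base (0.7 or later, so newer than what this thread uses), and the `rows`
vector, the `item`/`price` fields, and the `is_expensive` helper are made up
for the example; they only stand in for a DataFrame and are not part of any
package's API.

```julia
# Hypothetical stand-in for a table: one NamedTuple per row (Base Julia only).
rows = [
    (item = "a", price = 25.0),
    (item = "b", price = 35.0),
    (item = "c", price = 42.5),
    (item = "b", price = 35.0),
]

# Any function that maps a row to a Bool can serve as the filter.
is_expensive(row) = row.price > 30.0

kept = filter(is_expensive, rows)        # rows for which the predicate holds
n_expensive = count(is_expensive, rows)  # number of matches, no intermediate array

# Counting occurrences of one specific row of observations.
target = (item = "b", price = 35.0)
n_target = count(row -> row == target, rows)
```

The predicate never names a fixed set of columns, so the same pattern should
work when the number of columns is not known in advance.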

-Júlio

2016-10-12 22:14 GMT-07:00 <anth...@berkeley.edu>:

> Were you worried about Query being not lightweight enough in terms of
> overhead, or in terms of syntax?
>
> I just added a more lightweight syntax for this scenario to Query. You can
> now do the following two things:
>
> q = @where(df, i->i.price > 30.)
>
> that will return a filtered iterator. You can materialize that into a
> DataFrame with collect(q, DataFrame).
>
> I also added a counting option. Turns out that is actually a LINQ query
> operator, and the goal is to implement all of those in Query. The syntax is
> simple:
>
> @count(df, i->i.price > 30.)
>
> returns the number of rows for which the filter condition is true.
>
> Under the hood, both of these new syntax options use the normal Query
> machinery; they just provide a simpler syntax than the more elaborate
> things I've posted earlier. In terms of LINQ, this corresponds to
> the method invocation API that LINQ has. I'm still figuring out how to
> surface something like @count in the query expression syntax, but for now
> one can use it via this macro.
>
> All of this is on master right now, so you would have to do
> Pkg.checkout("Query") to get these macros.
>
> Best,
> David
>
> On Wednesday, October 12, 2016 at 6:47:15 PM UTC-7, Júlio Hoffimann wrote:
>>
>> Hi David,
>>
>> Thank you for your detailed answer and for writing a package for
>> general queries; that is great! I will keep the package in mind if I need
>> something more complex.
>>
>> I am currently looking for a lightweight solution within DataFrames, since
>> filtering is a very common operation. Right now, I am considering
>> converting the DataFrame to an array and looping over the rows. I wonder if
>> there is syntactic sugar for this loop.
>>
>> -Júlio
>>
>> 2016-10-12 17:48 GMT-07:00 David Anthoff <ant...@berkeley.edu>:
>>
>>> Hi Julio,
>>>
>>>
>>>
>>> You can use the Query package for the first part. To filter a DataFrame
>>> using some arbitrary Julia expression, use something like this:
>>>
>>>
>>>
>>> using DataFrames, Query, NamedTuples
>>>
>>>
>>>
>>> q = @from i in df begin
>>>     @where <filter expression>
>>>     @select i
>>> end
>>>
>>>
>>>
>>> You can use any julia code in <filter expression>. Say your DataFrame
>>> has a column called price, then you could filter like this:
>>>
>>>
>>>
>>> @where i.price > 30.
>>>
>>>
>>>
>>> The i will be a NamedTuple type, so you can access the columns either by
>>> their name, or also by their index, e.g.
>>>
>>>
>>>
>>> @where i[1] > 30.
>>>
>>>
>>>
>>> if you want to filter by the first column. You can also just call some
>>> function that you have defined somewhere else:
>>>
>>>
>>>
>>> @where foo(i)
>>>
>>>
>>>
>>> As long as the <filter expression> returns a Bool, you should be good.
>>>
>>>
>>>
>>> If you run a query like this, q will be a standard julia iterator. Right
>>> now you can’t just say length(q), although that is something I should
>>> probably enable at some point (I’m also looking into the VB LINQ syntax
>>> that supports things like counting in the query expression itself).
>>>
>>>
>>>
>>> But you could materialize the query as an array and then look at the
>>> length of that:
>>>
>>>
>>>
>>> q = @from i in df begin
>>>     @where <filter expression>
>>>     @select i
>>>     @collect
>>> end
>>>
>>> count = length(q)
>>>
>>>
>>>
>>> The @collect statement means that the query will return an array of a
>>> NamedTuple type (you can also materialize it into a whole bunch of other
>>> data structures, take a look at the documentation).
>>>
>>>
>>>
>>> Let me know if this works, or if you have any other feedback on
>>> Query.jl. I very much need user feedback for the package at this
>>> point. The best way to give it is to open issues at
>>> https://github.com/davidanthoff/Query.jl.
>>>
>>>
>>>
>>> Best,
>>>
>>> David
>>>
>>>
>>>
>>> *From:* julia...@googlegroups.com [mailto:julia...@googlegroups.com] *On
>>> Behalf Of *Júlio Hoffimann
>>> *Sent:* Wednesday, October 12, 2016 5:20 PM
>>> *To:* julia-users <julia...@googlegroups.com>
>>> *Subject:* [julia-users] Filtering DataFrame with a function
>>>
>>>
>>>
>>> Hi,
>>>
>>>
>>>
>>> I have a DataFrame from which I want to select the rows that match given
>>> criteria. I don't know the number of columns beforehand, so I cannot
>>> explicitly list the criteria with the :symbol syntax or write down a fixed
>>> number of indices.
>>>
>>>
>>>
>>> Is there any way to filter with a lambda expression? Or even better, is
>>> there any efficient way to count the number of occurrences of a specific
>>> row of observations?
>>>
>>>
>>>
>>> -Júlio
>>>
>>
>>
