Re: Aggregate over a column: the proper way to do

sam smith Sat, 09 Apr 2022 08:06:53 -0700

Yes, it returns the number of rows in the Dataset as a *long*, but in my case
the aggregation returns a table of two columns.


On Fri, Apr 8, 2022 at 2:12 PM, Sean Owen <sro...@gmail.com> wrote:

> Dataset.count() returns one value directly?
>
> On Thu, Apr 7, 2022 at 11:25 PM sam smith <qustacksm2123...@gmail.com>
> wrote:
>
>> My bad, yes, of course! Still, I don't like the
>> select("count(myCol)") part of my line. Is there a replacement for it?
>>
>> On Fri, Apr 8, 2022 at 6:13 AM, Sean Owen <sro...@gmail.com> wrote:
>>
>>> Just do an average then? Most of my point is that filtering to one group
>>> and then grouping is pointless.
>>>
>>> On Thu, Apr 7, 2022, 11:10 PM sam smith <qustacksm2123...@gmail.com>
>>> wrote:
>>>
>>>> What if I do avg instead of count?
>>>>
>>>> On Fri, Apr 8, 2022 at 5:32 AM, Sean Owen <sro...@gmail.com> wrote:
>>>>
>>>>> Wait, why groupBy at all? After the filter, only rows with myCol equal
>>>>> to your target are left. There is only one group. Don't group; just
>>>>> count after the filter.
>>>>>
>>>>> On Thu, Apr 7, 2022, 10:27 PM sam smith <qustacksm2123...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I want to aggregate a column by counting the number of rows having
>>>>>> the value "myTargetValue" and return the result.
>>>>>> I am doing it as follows, in Java:
>>>>>>
>>>>>>> long result =
>>>>>>> dataset.filter(dataset.col("myCol").equalTo("myTargetVal")).groupBy(col("myCol")).agg(count(dataset.col("myCol"))).select("count(myCol)").first().getLong(0);
>>>>>>
>>>>>>
>>>>>> Is that the right way? If not, what is a more optimized way to do that
>>>>>> (still in Java)?
>>>>>> Thanks for the help.
>>>>>>
>>>>>
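
The point made in the thread (after filtering to a single value there is only one group, so grouping adds nothing) can be checked on a small in-memory list. The sketch below uses plain Java streams as a stand-in for the Dataset; the class name, the method names and the sample values are hypothetical and are not part of the original code. In Spark itself, the filter-then-count suggestion would correspond to something like dataset.filter(col("myCol").equalTo("myTargetVal")).count(), which returns a long directly.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class FilterThenCount {
    // Hypothetical sample data standing in for the values of column "myCol".
    static final List<String> MY_COL =
            List.of("myTargetVal", "other", "myTargetVal", "somethingElse");

    // Filter, then count. Rough analogue of dataset.filter(...).count().
    static long countAfterFilter(List<String> values, String target) {
        return values.stream()
                .filter(target::equals)
                .count();
    }

    // Filter, then group, then count. Mirrors the shape of the original pipeline.
    static Map<String, Long> countAfterFilterAndGroup(List<String> values, String target) {
        return values.stream()
                .filter(target::equals)
                .collect(Collectors.groupingBy(v -> v, Collectors.counting()));
    }

    public static void main(String[] args) {
        long direct = countAfterFilter(MY_COL, "myTargetVal");
        Map<String, Long> grouped = countAfterFilterAndGroup(MY_COL, "myTargetVal");

        // The grouped result holds a single entry whose count equals the direct count.
        System.out.println("direct  = " + direct);
        System.out.println("grouped = " + grouped);
    }
}
```

The grouped variant yields a key/count pair, which is the same shape as the two-column table mentioned at the top of the thread, while the direct variant yields a single number. If an aggregate other than a count is needed in Spark, one option (an assumption about the use case, not something confirmed in the thread) is to call agg(...) without groupBy and give the aggregate column an alias, so the result can be selected by that name rather than by "count(myCol)".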
