[jira] [Commented] (BEAM-12181) Implement parallelized (approximate) mode

Svetak Vihaan Sundhar (Jira) Mon, 28 Feb 2022 14:21:05 -0800


    [ 
https://issues.apache.org/jira/browse/BEAM-12181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17499204#comment-17499204
 ]


Svetak Vihaan Sundhar commented on BEAM-12181:
----------------------------------------------

Agreed on the points above.

 

Also, we might consider providing our own kde() method that does the 
kernel density estimation (and building approximate mode on that). Pandas 
doesn't have this, but it does have series.plot.kde. Probably pandas doesn't 
have it just because it's easy enough for their users to use the scipy one.

 

>>My thoughts on this were to just use {{scipy.stats.gaussian_kde}}.

I have been looking at online implementations such as 

[http://rmflight.github.io/posts/2018-07-19-finding-modes-using-kernel-density-estimates/]

 

Could you elaborate on why it may be necessary to create our own implementation 
of kde()? Is the scipy impl incompatible with the series format?

 

I'm thinking it may make more sense to use the scipy function rather than 
creating our own.
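
For reference, here is a minimal sketch of what building approximate mode on the scipy function could look like. The helper name approximate_mode and the grid_size parameter are hypothetical and not part of Beam or pandas. The sketch assumes a numeric series that fits in memory, so it only illustrates the KDE step and does not address how the computation would be parallelized.

```python
import numpy as np
import pandas as pd
from scipy.stats import gaussian_kde


def approximate_mode(series: pd.Series, grid_size: int = 512) -> float:
    """Estimate the mode of a numeric Series as the peak of a Gaussian KDE.

    Hypothetical helper for illustration only.
    """
    # gaussian_kde works on array-likes, so convert the Series after
    # dropping missing values.
    values = series.dropna().to_numpy(dtype=float)
    if values.size < 2 or values.max() == values.min():
        # The KDE needs a non-degenerate sample to estimate a bandwidth.
        raise ValueError("need at least two distinct values to fit a KDE")

    kde = gaussian_kde(values)  # bandwidth defaults to Scott's rule

    # Evaluate the density on an evenly spaced grid over the observed
    # range and take the grid point with the highest density.
    grid = np.linspace(values.min(), values.max(), grid_size)
    density = kde(grid)
    return float(grid[np.argmax(density)])


# Example: samples drawn around 5.0 should give an estimate near 5.0.
rng = np.random.default_rng(0)
sample = pd.Series(rng.normal(loc=5.0, scale=1.0, size=2000))
estimate = approximate_mode(sample)
```

The result is a point on the evaluation grid, so its resolution depends on grid_size and on the bandwidth scipy picks. It is an estimate for continuous data and will generally not equal the exact value returned by Series.mode().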

> Implement parallelized (approximate) mode
> -----------------------------------------
>
>                 Key: BEAM-12181
>                 URL: https://issues.apache.org/jira/browse/BEAM-12181
>             Project: Beam
>          Issue Type: Improvement
>          Components: dsl-dataframe, sdk-py-core
>            Reporter: Brian Hulette
>            Priority: P3
>              Labels: dataframe-api
>
> Currently we require Singleton partitioning to compute mode(). We should 
> provide an option to compute approximate mode() which can be parallelized.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
