[
https://issues.apache.org/jira/browse/BEAM-12181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17499204#comment-17499204
]
Svetak Vihaan Sundhar commented on BEAM-12181:
----------------------------------------------
Agreed on the points above.
Also, we might consider providing our own kde() method that does the
kernel density estimation (and building approximate mode on that). Pandas
doesn't have this, but it does have series.plot.kde. Probably pandas doesn't
have it just because it's easy enough for their users to use the scipy one.
>>My thoughts on this were to just use {{scipy.stats.gaussian_kde}}.
I have been looking at online implementations such as
[http://rmflight.github.io/posts/2018-07-19-finding-modes-using-kernel-density-estimates/]
Could you elaborate on why it may be necessary to create our own implementation
of kde()? Is the scipy implementation incompatible with the Series format?
I'm thinking it may make more sense to use the scipy function rather than
creating our own.
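For reference, here is a rough sketch of what I mean by building an approximate mode on top of the scipy function. It is not parallelized and is not a proposed Beam implementation. The helper name {{approximate_mode}} and the {{num_points}} parameter are made up for illustration, and it assumes a numeric Series with at least two distinct values (otherwise {{gaussian_kde}} fails on a singular covariance).

```python
import numpy as np
import pandas as pd
from scipy.stats import gaussian_kde


def approximate_mode(series: pd.Series, num_points: int = 512) -> float:
    """Hypothetical helper: estimate the mode of a numeric Series via KDE.

    Fits a Gaussian KDE with scipy's default bandwidth, evaluates it on an
    evenly spaced grid over the observed range, and returns the grid point
    with the highest estimated density.
    """
    # Drop missing values and hand scipy a plain float array.
    values = series.dropna().to_numpy(dtype=float)
    kde = gaussian_kde(values)
    grid = np.linspace(values.min(), values.max(), num_points)
    density = kde(grid)
    return float(grid[np.argmax(density)])


# Illustrative data only: a tight cluster near 5 plus a wider one near 0.
rng = np.random.default_rng(0)
s = pd.Series(
    np.concatenate([rng.normal(5.0, 0.5, 1000), rng.normal(0.0, 2.0, 300)])
)
print(approximate_mode(s))  # expected to land close to 5
```

The Series only needs to be converted to a NumPy array before it is handed to scipy, so at least in this simple case the Series format does not seem to be a blocker.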
> Implement parallelized (approximate) mode
> -----------------------------------------
>
> Key: BEAM-12181
> URL: https://issues.apache.org/jira/browse/BEAM-12181
> Project: Beam
> Issue Type: Improvement
> Components: dsl-dataframe, sdk-py-core
> Reporter: Brian Hulette
> Priority: P3
> Labels: dataframe-api
>
> Currently we require Singleton partitioning to compute mode(). We should
> provide an option to compute approximate mode() which can be parallelized.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)