[ https://issues.apache.org/jira/browse/BEAM-2516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16163818#comment-16163818 ]
Kenneth Knowles edited comment on BEAM-2516 at 9/12/17 11:14 PM:
-----------------------------------------------------------------
Went through the commit history to be sure: runs are fast before
fa3a5abbc94db629feae8d7d73a31e7dda06bf76 and slow afterwards, so the slowdown
is isolated to the use of dehydration-insensitive APIs in the ParDo evaluator,
as suspected.
was (Author: kenn):
Went through the commit history to be sure: runs are fast before
4b355844a4920bc9faba75f7cd61008bedebaf29 and slow afterwards, so the slowdown
is isolated to the use of dehydration-insensitive APIs in the ParDo evaluator,
as suspected.
> User reports 4 minutes to process 1 million line CSV in DirectRunner
> --------------------------------------------------------------------
>
> Key: BEAM-2516
> URL: https://issues.apache.org/jira/browse/BEAM-2516
> Project: Beam
> Issue Type: Bug
> Components: runner-direct
> Reporter: Kenneth Knowles
> Assignee: Kenneth Knowles
> Priority: Minor
> Fix For: 2.2.0
>
>
> https://stackoverflow.com/questions/44736414/simple-apache-beam-manipulations-work-very-slow
> I don't know what the expectations are here, so I wasn't ready to say this is
> WAI. Low priority since it isn't what the runner is for anyhow, but this
> seems like the scale of data that should be snappy. Worth investigating, or
> maybe you can quickly indicate why it is expected?