[ https://issues.apache.org/jira/browse/BEAM-7029?focusedWorklogId=297385&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-297385 ]
ASF GitHub Bot logged work on BEAM-7029:
----------------------------------------
Author: ASF GitHub Bot
Created on: 19/Aug/19 20:11
Start Date: 19/Aug/19 20:11
Worklog Time Spent: 10m
Work Description: manuelaguilar commented on issue #8251: [BEAM-7029] Add
KafkaIO.Read as external transform
URL: https://github.com/apache/beam/pull/8251#issuecomment-522734720
@mxm It seems performance depends mostly on the sink. I've been able to get
3000 msg/sec with a file sink (which doesn't complete the final write when I
cancel the job via Flink), and 2300 msg/sec with a recently patched version of
the MongoDB sink. The pipeline had a map transform to get the value from every
KV element. This was done on a quad-core Intel(R) Xeon(R) CPU E5-2695 v4 @
2.10GHz virtual machine with 8GB memory, using a standalone Flink Docker image
as the runner endpoint.
I have observed that our Google Dataflow n1-standard-1 instance (1 vCPU, 3.75GB
memory) can process 2000 msg/sec per worker (using the Dataflow runner). This
was using a Java Dataflow template.
Issue Time Tracking
-------------------
Worklog Id: (was: 297385)
Time Spent: 16h 20m (was: 16h 10m)
> Support KafkaIO to be configured externally for use with other SDKs
> -------------------------------------------------------------------
>
> Key: BEAM-7029
> URL: https://issues.apache.org/jira/browse/BEAM-7029
> Project: Beam
> Issue Type: New Feature
> Components: io-java-kafka, runner-flink, sdk-py-core
> Reporter: Maximilian Michels
> Assignee: Maximilian Michels
> Priority: Major
> Fix For: 2.13.0
>
> Time Spent: 16h 20m
> Remaining Estimate: 0h
>
> As of BEAM-6730, we can externally configure existing transforms from SDKs.
> We should add more useful transforms than just {{GenerateSequence}}.
> {{KafkaIO}} is a good candidate.