Hi, I am doing some performance testing. I submitted two Beam jobs: one is running on Dataflow and the other on the Samza runner.
My Samza deployment is standalone. I have 10 workers for each job. My DAG is very basic: *Read From Kafka -> BeamSQL filter -> Write GCS*. However, I have the same DAG three times in one job. The jobs read from 3 different topics with 280 partitions.

When I compare Samza and Dataflow performance for the exact same Beam job, one Samza worker cannot process more than *2.5k* messages, but a Dataflow worker can process *10K* messages per worker.

Can you help me? Am I missing something, or are Samza Beam jobs really this slow?

My Samza Beam job settings:

> app.runner.class=org.apache.samza.runtime.LocalApplicationRunner
> job.coordinator.factory=org.apache.samza.zk.ZkJobCoordinatorFactory
> job.coordinator.zk.connect=10.64.2.78:2181
>
> task.name.grouper.factory=org.apache.samza.container.grouper.task.GroupByContainerIdsFactory
> task.commit.ms=60000
> job.default.system=default
>
> systems.default.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory
> systems.default.producer.bootstrap.servers=
> job.systemstreampartition.grouper.factory=org.apache.samza.container.grouper.stream.GroupBySystemStreamPartitionFactory
>
> metrics.reporter.jmx.class=org.apache.samza.metrics.reporter.JmxReporterFactory
> metrics.reporters=jmx

My Samza runner params:

> --runner=SamzaRunner --samzaExecutionEnvironment=STANDALONE
> --maxSourceParallelism=300 --maxBundleSize=10000 --maxBundleTimeMs=10000
> --systemBufferSize=10000

Thanks in advance.