[jira] [Work logged] (BEAM-11657) Kafka read performance regression due to added header support

ASF GitHub Bot (Jira) Mon, 01 Feb 2021 02:37:23 -0800


     [ 
https://issues.apache.org/jira/browse/BEAM-11657?focusedWorklogId=545248&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-545248
 ]


ASF GitHub Bot logged work on BEAM-11657:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 01/Feb/21 10:36
            Start Date: 01/Feb/21 10:36
    Worklog Time Spent: 10m 
      Work Description: scwhittle commented on pull request #13782:
URL: https://github.com/apache/beam/pull/13782#issuecomment-770755221


   I'm not sure whether there are dashboards, but the perf results from the 
console output were:
   16:21:21 org.apache.beam.sdk.io.kafka.KafkaIOIT > testKafkaIOReadsAndWritesCorrectlyInStreaming STANDARD_OUT
   16:21:21     Load test results for test (ID): 539adecb-21ea-4aaf-935c-95c9bb9b91c5 and timestamp: 2021-01-28T15:02:34.094000000Z:
   16:21:21                      Metric:                    Value:
   16:21:21                    read_time                     1.385
   16:21:21                   write_time                     9.316
   16:21:21                     run_time                    10.701
   
   The subsequent run, 
https://ci-beam.apache.org/job/beam_PerformanceTests_Kafka_IO/1871/console, 
had results:
   20:34:52 org.apache.beam.sdk.io.kafka.KafkaIOIT > testKafkaIOReadsAndWritesCorrectlyInStreaming STANDARD_OUT
   20:34:52     Load test results for test (ID): cb47f5d5-0102-49ce-8fd1-73eb3e6bbe40 and timestamp: 2021-01-28T19:16:14.211000000Z:
   20:34:52                      Metric:                    Value:
   20:34:52                    read_time                     2.873
   20:34:52                   write_time                     14.97
   20:34:52                     run_time                    17.843
   
   I'm not sure how stable these numbers are. write_time shouldn't be directly 
affected, but if the pipeline reads and writes in parallel, the CPU wasted on 
reading could hurt write performance as well.
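
To put the two console outputs above side by side, here is a minimal sketch that computes the ratio of each metric between the two runs. The class and method names (`PerfDelta`, `ratio`) are hypothetical and not part of Beam; the numbers are copied from the logs above. Since this compares a single pair of runs, the ratios are only a rough indication and say nothing about run-to-run variance.

```java
// Hypothetical helper, not part of Beam: compares the metrics reported by
// the two KafkaIOIT runs quoted above.
public class PerfDelta {

  /** Returns how many times larger the later value is than the earlier one. */
  public static double ratio(double before, double after) {
    return after / before;
  }

  public static void main(String[] args) {
    String[] metrics = {"read_time", "write_time", "run_time"};
    // Values copied from the console output of the two runs.
    double[] before = {1.385, 9.316, 10.701};
    double[] after = {2.873, 14.97, 17.843};

    for (int i = 0; i < metrics.length; i++) {
      System.out.printf(
          "%-10s %8.3f -> %8.3f  (x%.2f)%n",
          metrics[i], before[i], after[i], ratio(before[i], after[i]));
    }
  }
}
```

On these numbers read_time roughly doubles, while write_time and run_time grow by a smaller factor.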


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 545248)
    Time Spent: 1h 40m  (was: 1.5h)

> Kafka read performance regression due to added header support
> -------------------------------------------------------------
>
>                 Key: BEAM-11657
>                 URL: https://issues.apache.org/jira/browse/BEAM-11657
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-kafka
>            Reporter: Sam Whittle
>            Assignee: Sam Whittle
>            Priority: P2
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Support for headers in KafkaIO reads was recently added:
> https://issues.apache.org/jira/browse/BEAM-10865
> This introduced several reflection calls into the path of advancing 
> KafkaUnboundedReader.  While separately running benchmarks, I noticed this 
> regression.  
> Calls currently come from:
> - ConsumerSpEL.hasHeaders -> can be cached similar to other booleans
> - deserialize key and value methods -> could be avoided in cases where 
>   headers are not being examined (at a minimum can be avoided for known 
>   coders like ByteArrayDeserializer)
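
The first suggestion in the description, caching the result of a reflective check, can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual ConsumerSpEL code or the change made in the pull request: `HeaderSupportProbe`, `RecordWithHeaders`, and `RecordWithoutHeaders` are hypothetical stand-ins, and the probe simply looks for a public no-argument method named `headers`.

```java
import java.lang.reflect.Method;

// Hypothetical sketch, not Beam's ConsumerSpEL: run the reflective check
// once and serve the cached boolean on the per-record hot path.
public class HeaderSupportProbe {

  // Counts reflective lookups so the caching effect can be observed.
  private int lookups = 0;

  // Computed once in the constructor, then only read.
  private final boolean hasHeaders;

  public HeaderSupportProbe(Class<?> recordClass) {
    this.hasHeaders = probe(recordClass);
  }

  // The expensive part: scans the public methods of the class via reflection.
  private boolean probe(Class<?> recordClass) {
    lookups++;
    for (Method m : recordClass.getMethods()) {
      if (m.getName().equals("headers") && m.getParameterCount() == 0) {
        return true;
      }
    }
    return false;
  }

  /** Cheap accessor; performs no reflection. */
  public boolean hasHeaders() {
    return hasHeaders;
  }

  public int lookupCount() {
    return lookups;
  }

  // Stand-ins for a record type with and without header support.
  public static class RecordWithHeaders {
    public Object headers() {
      return null;
    }
  }

  public static class RecordWithoutHeaders {}

  public static void main(String[] args) {
    HeaderSupportProbe probe = new HeaderSupportProbe(RecordWithHeaders.class);
    for (int i = 0; i < 1_000_000; i++) {
      probe.hasHeaders(); // hot path: field read only
    }
    System.out.println("hasHeaders=" + probe.hasHeaders()
        + ", reflective lookups=" + probe.lookupCount());
  }
}
```

The point of the design is that the reflective lookup happens once per reader rather than once per record; how the real code stores its other cached booleans is not shown in this message.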



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
