[ https://issues.apache.org/jira/browse/KAFKA-5515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068546#comment-16068546 ]
Neil Avery commented on KAFKA-5515:
-----------------------------------

I've taken a look at dropping SimpleDateFormat and replacing it with commons-lang3 FastDateFormat (available in the project, but not a dependency of this module). Microbenchmarks show SDF starts at 800ms per million calls and then, once HotSpot has warmed up, drops to 250ms. Interestingly, FDF starts at 400ms per million and only gets down to 350ms (not very convincing). Calendar usage hurts performance, and there is a degree of caching inside both implementations.

Looking at this a different way: "Segments" is a time-series slice/bucketing function used to group, allocate and look up segments. Does a real-world calendar matter? I've knocked together a simple math alternative that breaks time into slices where all months/years are of equal size. The time formatting is identical, but day/month will be incorrect because there is no calendar. This gets down to 150ms pretty much straight away (SDF is still used for parsing).

All tests pass, the system runs fine, etc., but I'm not sure of the gravity of this as a possible change. Will it break things? Any advice or feedback?

> Consider removing date formatting from Segments class
> -----------------------------------------------------
>
>                 Key: KAFKA-5515
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5515
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Bill Bejeck
>            Assignee: Neil Avery
>              Labels: performance
>
> Currently the {{Segments}} class uses a date when calculating the segment id
> and uses {{SimpleDateFormat}} for formatting the segment id. However this is
> a high volume code path and creating a new {{SimpleDateFormat}} and
> formatting each segment id is expensive. We should look into removing the
> date from the segment id or at a minimum use a faster alternative to
> {{SimpleDateFormat}}. We should also consider keeping a lookup of existing
> segments to avoid as many string operations as possible.
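For illustration, here is a minimal Java sketch of the options discussed in the comment and the issue. The class `SegmentNaming` and all of its methods are hypothetical and are not Kafka's `Segments` API. The 30-day month and 12-month year sizes are an assumption, since the comment only says months and years are of equal size. `DateTimeFormatter` from `java.time` is a swapped-in alternative (it is neither SDF nor FastDateFormat), shown because it is immutable and thread-safe, so one shared instance can be reused. No performance claims are made for this sketch.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical helper, not Kafka's Segments class: maps a timestamp to a
 * fixed-width time slice and names the slice in "yyyyMMddHHmm" style.
 * Assumes non-negative timestamps.
 */
final class SegmentNaming {

    // java.time formatters are immutable and thread-safe, so a single shared
    // instance can replace creating a new SimpleDateFormat per call.
    private static final DateTimeFormatter UTC_MINUTES =
            DateTimeFormatter.ofPattern("yyyyMMddHHmm").withZone(ZoneOffset.UTC);

    private final String storeName;
    private final long segmentIntervalMs;
    // Hypothetical lookup of existing segment names; unbounded for brevity,
    // a real store would drop entries as segments expire.
    private final ConcurrentHashMap<Long, String> nameCache = new ConcurrentHashMap<>();

    SegmentNaming(String storeName, long segmentIntervalMs) {
        this.storeName = storeName;
        this.segmentIntervalMs = segmentIntervalMs;
    }

    /** Time-series bucketing: every segment covers exactly segmentIntervalMs. */
    long segmentId(long timestampMs) {
        return timestampMs / segmentIntervalMs;
    }

    /** Real-calendar name (UTC) using the shared java.time formatter. */
    String calendarName(long segmentId) {
        return storeName + "-" + UTC_MINUTES.format(Instant.ofEpochMilli(segmentId * segmentIntervalMs));
    }

    /**
     * Calendar-free name: integer math only, with equal-size months and years.
     * Assumed sizes: 30-day months, 12-month years. Hours and minutes match the
     * UTC clock; day, month and year drift away from the real calendar.
     */
    String fixedSizeName(long segmentId) {
        long minutes = segmentId * segmentIntervalMs / 60_000L;
        long hours = minutes / 60;
        long days = hours / 24;
        long months = days / 30;
        StringBuilder sb = new StringBuilder(storeName.length() + 13).append(storeName).append('-');
        appendPadded(sb, 1970 + months / 12, 4); // year
        appendPadded(sb, months % 12 + 1, 2);    // month, 1-based
        appendPadded(sb, days % 30 + 1, 2);      // day, 1-based
        appendPadded(sb, hours % 24, 2);         // hour
        appendPadded(sb, minutes % 60, 2);       // minute
        return sb.toString();
    }

    /** Cached variant: repeated lookups of the same segment skip string building. */
    String cachedName(long segmentId) {
        return nameCache.computeIfAbsent(segmentId, this::fixedSizeName);
    }

    // Appends value left-padded with zeros to the given width.
    private static void appendPadded(StringBuilder sb, long value, int width) {
        String digits = Long.toString(value);
        for (int i = digits.length(); i < width; i++) {
            sb.append('0');
        }
        sb.append(digits);
    }
}
```

With a one-minute interval, a timestamp of 1970-02-01T00:00Z (2,678,400,000 ms) maps to segment 44640; `calendarName` yields `store-197002010000` while `fixedSizeName` yields `store-197002020000`, which is the day/month drift the comment describes. The segment id itself, and therefore bucketing, is the same in both cases; only the name differs.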
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)