ryantam626 commented on PR #27924: URL: https://github.com/apache/beam/pull/27924#issuecomment-1672082225
> ... generally object creation of one or two should not make noticable difference, and Timestamp is light weight.

Yes, I am surprised it showed up that prominently in the flamegraph we obtained as well.

> This suggest the involved code path is a hot path, not limited to Timestamp creation.

Smells like it!

> Would you mind sharing more about the benchmark / codes that doing the test ?

I can try - the pipeline we have is not really a benchmark, but rather a production-ish pipeline that we use for analysing our data. The first few steps go like this:

1. Read from a BigQuery query which returns a sparse minutely summary from data collection devices.
2. Assign timestamps using the summary's timestamp (already at minute-level resolution).
3. Window into SlidingWindows with a 30-day window size and a 1-day increment.
4. Steps unrelated to this ticket, which analyse patterns, follow.

--------

I am curious to know if you have spotted any inefficiency in these first few steps.

We have since made some improvements to our pipeline which incidentally side-stepped this slowness [1], but regardless I thought this change would be a nice addition, seeing as it's an almost risk-free change that should improve performance.

[1] We went for a sliding window specification of 90 days window size, 30 days increment, trading extra memory usage (and a small amount of correctness in our pipeline) for speed - each minutely summary now only needs to go into 3 buckets instead of 30 under this sliding window specification.
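For anyone wanting to sanity-check the bucket counts in [1], here is a minimal pure-Python sketch. It does not use Beam at all: `sliding_window_starts` and `windows_per_element` are hypothetical helper names, the sample timestamp is arbitrary, and the assignment rule (windows aligned to multiples of the increment, each covering `[start, start + size)`) is an assumption about how sliding windows are usually assigned, not a copy of the SDK's code.

```python
DAY = 24 * 60 * 60  # one day in seconds


def sliding_window_starts(timestamp, size, period, offset=0):
    """Return the start time of every sliding window that contains `timestamp`.

    Assumption: windows start at offset + k * period and cover
    the half-open interval [start, start + size).
    """
    # Latest window start at or before the timestamp, aligned to `period`.
    last_start = timestamp - (timestamp - offset) % period
    # Step back one period at a time while the window still covers the timestamp.
    return list(range(last_start, last_start - size, -period))


def windows_per_element(size, period, timestamp=1_700_000_000):
    """Count how many windows a single element is assigned to."""
    return len(sliding_window_starts(timestamp, size, period))


# Original specification: 30-day windows, 1-day increment.
before = windows_per_element(size=30 * DAY, period=1 * DAY)
# Revised specification from [1]: 90-day windows, 30-day increment.
after = windows_per_element(size=90 * DAY, period=30 * DAY)

print(before, after)  # 30 3
```

Under these assumptions the per-element fan-out is simply size divided by increment, so any per-window cost on the hot path (such as constructing timestamp objects) is paid 30 times per element in the original specification and 3 times in the revised one.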
