Thanks both for the replies. It's understandable that those kernels might not be fully optimized right now, considering the current state of Arrow compute.
> The temporal rounding operations operate on localized times taking into
> account the timestamp's timezone, which is why they're more
> computationally intensive than raw floating point operations.

The examples we ran are on UTC timestamps without any timezone
complications, so perhaps there is room for a short circuit when there are
no timezone complications...

> Which operation in particular did you benchmark? Is it part of a
> significant workload for you or did you just try it out of curiosity?

We are trying to evaluate baseline performance of "Temporal Round" +
"GroupBy Aggregation" (round a stream of time series data to 5-minute
intervals and aggregate) and noticed this issue. This is not an urgent
issue; I am asking here mostly because I'd like to understand what might
be going on. The responses have been helpful. Thank you!

On Wed, Apr 13, 2022 at 3:28 AM Antoine Pitrou <anto...@python.org> wrote:

>
> Hello Li,
>
> The temporal rounding operations operate on localized times taking into
> account the timestamp's timezone, which is why they're more
> computationally intensive than raw floating point operations.
>
> Which operation in particular did you benchmark? Is it part of a
> significant workload for you or did you just try it out of curiosity?
>
> Regards
>
> Antoine.
>
>
>
>
> On 12/04/2022 at 22:31, Li Jin wrote:
> > Thanks David!
> >
> > I am not yet familiar with the implementation of this kernel, so I am
> > hoping someone more familiar with the kernels can shed some light on
> > this. I wonder if this is the kind of performance to expect (compared
> > to similar kernels) or whether something about the RoundTemporal
> > implementation seems off.
> >
> > Steven (who ran the test) computed around 500 CPU cycles / value, which
> > seems like more than what is needed, but I am not an expert on the
> > kernels so I want to hear more thoughts from the dev list.
> >
> > Li
> >
> > On Tue, Apr 12, 2022 at 4:19 PM David Li <lidav...@apache.org> wrote:
> >
> >> While we do track benchmarks for each commit on Conbench [1], it seems
> >> we lack benchmarks for the temporal operations. I filed ARROW-16173 [2].
> >>
> >> They do a bit more work than just a round (especially if they need to
> >> handle time zones).
> >>
> >> [1]: https://conbench.ursa.dev/
> >> [2]: https://issues.apache.org/jira/browse/ARROW-16173
> >>
> >> -David
> >>
> >> On Tue, Apr 12, 2022, at 15:40, Li Jin wrote:
> >>> Sorry, I should have mentioned this is about the Arrow C++ compute
> >>> kernels.
> >>>
> >>> On Tue, Apr 12, 2022 at 3:39 PM Li Jin <ice.xell...@gmail.com> wrote:
> >>>
> >>>> Hello!
> >>>>
> >>>> We recently noticed unexpected performance with Arrow's temporal
> >>>> operation kernels (in particular, CeilTemporal). The perf we see is
> >>>> around 1.4-1.8 Gb/s. This seems to be much lower than adding a
> >>>> constant to a float column (~9 Gb/s). This is a bit unexpected
> >>>> because CeilTemporal is similar to a numeric round operation, so we
> >>>> are wondering whether there are benchmarks around this and where the
> >>>> issue might be?
> >>>>
> >>>> Thanks!
> >>>> Li
> >>>>
> >>
> >
>
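For concreteness, here is a minimal PyArrow sketch of the workload described above (round timestamps to 5-minute buckets with ceil_temporal, then run a group-by aggregation), along with the float-add baseline used for comparison. The data, column names, and the "mean" aggregation are illustrative assumptions rather than the actual benchmark code.

import pyarrow as pa
import pyarrow.compute as pc
from datetime import datetime, timedelta, timezone

# Synthetic data: one hour of per-second UTC timestamps plus a float value column.
n = 3600
start = datetime(2022, 4, 13, tzinfo=timezone.utc)
table = pa.table({
    "ts": pa.array([start + timedelta(seconds=i) for i in range(n)],
                   type=pa.timestamp("us", tz="UTC")),
    "value": pa.array([float(i) for i in range(n)]),
})

# CeilTemporal kernel: round each timestamp up to the next 5-minute boundary.
bucket = pc.ceil_temporal(table["ts"], multiple=5, unit="minute")
table = table.append_column("bucket", bucket)

# GroupBy aggregation over the rounded 5-minute buckets.
result = table.group_by("bucket").aggregate([("value", "mean")])
print(result)

# Baseline comparison mentioned in the thread: adding a constant to a float column.
baseline = pc.add(table["value"], 1.0)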