Hi Li,

I've implemented most of the temporal rounding logic. The kernels have
not really been optimized yet, as they are fairly new and not
completely finished (ambiguous behaviour due to DST [1], rounding
origin point [2], etc.). Most of the effort so far went into building
test sets and getting the right results. Given that, I'm actually
pleasantly surprised by the 1:5 ratio compared to float addition.

David's proposal for benchmarking is a great starting point. I'll look
into it next. An easy optimization right now would be better
templating [3] and perhaps simplification of rounding for sub-hour
units.
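
To illustrate why sub-hour units are a candidate for simplification,
here is a minimal sketch. It is not the Arrow kernel code. It assumes
timestamps are plain integer seconds since the epoch with no time zone,
that the rounding origin is the epoch itself, and that exact multiples
are left unchanged by ceil. The function names are hypothetical. Under
those assumptions, rounding to a fixed-width unit reduces to integer
arithmetic:

```python
def floor_to_multiple(ts: int, unit: int) -> int:
    """Round ts down to a multiple of unit, counted from the epoch (origin 0)."""
    # Python's // floors toward negative infinity, so pre-epoch values work too.
    return (ts // unit) * unit


def ceil_to_multiple(ts: int, unit: int) -> int:
    """Round ts up to a multiple of unit; exact multiples are left unchanged."""
    # Negating before and after the floor division turns floor into ceil.
    return -((-ts) // unit) * unit


FIFTEEN_MINUTES = 15 * 60  # unit width in seconds (assumed resolution)

ts = 1_000_000_007
print(floor_to_multiple(ts, FIFTEEN_MINUTES))  # 999999900
print(ceil_to_multiple(ts, FIFTEEN_MINUTES))   # 1000000800
```

This shortcut does not cover calendar units (months, years) or zones
whose UTC offset is not a multiple of the rounding unit, which is where
the extra work in the general path comes from.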

[1] https://github.com/apache/arrow/pull/12528
[2] https://github.com/apache/arrow/pull/12657
[3] https://issues.apache.org/jira/browse/ARROW-15787

Rok

On Tue, Apr 12, 2022 at 10:32 PM Li Jin <ice.xell...@gmail.com> wrote:
>
> Thanks David!
>
> I am not yet familiar with the implementation of this kernel so I am
> hoping someone more familiar with kernels can shed some light on this. I
> wonder if this is kind of expected performance (comparing to similar kernel
> perf) or maybe something with the RoundTemporal implementation seems off?
>
> Steven (who ran the test) computed around 500 CPU cycles / value which
> seems more than what is needed but I am not an expert on the kernels so
> want to hear more thoughts from the dev.
>
> Li
>
> On Tue, Apr 12, 2022 at 4:19 PM David Li <lidav...@apache.org> wrote:
>
> > While we do track benchmarks for each commit on Conbench [1] it seems we
> > lack benchmarks for the temporal operations. I filed ARROW-16173 [2].
> >
> > They do do a bit more work than just a round (especially if they need to
> > handle time zones).
> >
> > [1]: https://conbench.ursa.dev/
> > [2]: https://issues.apache.org/jira/browse/ARROW-16173
> >
> > -David
> >
> > On Tue, Apr 12, 2022, at 15:40, Li Jin wrote:
> > > Sorry I should have mentioned this is the Arrow C++ compute kernels.
> > >
> > > On Tue, Apr 12, 2022 at 3:39 PM Li Jin <ice.xell...@gmail.com> wrote:
> > >
> > >> Hello!
> > >>
> > >> We recently noticed unexpected performance with Arrow's temporal
> > >> operation kernels (in particular, CeilTemporal). The perf we see is
> > >> around 1.4-1.8 Gb/s. This seems to be much lower than adding a
> > >> constant to a float column (~9 Gb/s). This is a bit unexpected because
> > >> CeilTemporal is similar to a numeric round operation, so we are
> > >> wondering if there are some benchmarks around this and where the
> > >> issue might be?
> > >>
> > >> Thanks!
> > >> Li
> > >>
> >
