Having some kind of calibration that could run would be nice :) I
suppose single-block read times on HDDs are likely stable, but we
don't really know where the data is coming from. It could be an HDD, an
SSD, or even a network service with variable latency. So I'm not
convinced we'll ever get estimates that are really accurate. That's
why I'm suggesting we just call things cost units and leave it at
that. If we want to calibrate, what really matters is the ratio
between CPU time and data read time.
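
A minimal sketch of how such etalon-based calibration could look (all class
and method names here are hypothetical, not Calcite API; a real harness
would want JMH-style warmup and multiple iterations rather than a single
wall-clock measurement):

```java
// Hypothetical sketch: time a reference ("etalon") task once, then
// report other operations as multiples of its time, i.e. in abstract
// cost units. Names are illustrative only.
import java.util.Arrays;
import java.util.Random;

public class CostUnitSketch {
    static volatile long sink; // keep the JIT from eliding the work

    /** Wall-clock nanoseconds for a task. */
    static long timeNanos(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    /** Cost of op expressed in multiples of the etalon's time. */
    static double costUnits(Runnable etalon, Runnable op) {
        long etalonNanos = Math.max(timeNanos(etalon), 1); // avoid div by 0
        return (double) timeNanos(op) / etalonNanos;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        int[] small = rnd.ints(100_000).toArray();
        int[] big = rnd.ints(1_000_000).toArray();

        // Etalon: a trivial scan-like pass (a stand-in for a real
        // EnumerableCalc(EnumerableTableScan) benchmark).
        Runnable etalon = () -> {
            long sum = 0;
            for (int v : small) sum += v;
            sink = sum;
        };
        // Operation to cost: an N log N sort.
        Runnable sort = () -> Arrays.sort(big.clone());

        System.out.printf("sort ~= %.1f cost units%n", costUnits(etalon, sort));
    }
}
```

The point of dividing by the etalon is that machine speed cancels out: a
faster box makes both numerator and denominator smaller, so the ratio is
comparable across machines in a way raw seconds are not.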
--
Michael Mior
mm...@apache.org

On Sat, Jan 4, 2020 at 15:10, Vladimir Sitnikov
<sitnikov.vladi...@gmail.com> wrote:
>
> Technically speaking, single-block read time for HDDs is pretty much
> stable, so using seconds might not be that bad.
> However, seconds make it complicated to measure CPU-like activity (e.g.
> different machines might execute EnumerableJoin at different rates :( )
>
>
> What if we benchmark a trivial EnumerableCalc(EnumerableTableScan) for a
> table of 100 rows and 10 columns
> and call it a single cost unit?
>
> In other words, we could have an etalon benchmark that takes X seconds and
> we could call it a single cost unit.
>
> For instance, org.apache.calcite.rel.core.Sort#computeSelfCost returns a
> cost.
> Of course, it has NLogN assumption, but which multiplier should it use?
>
> One could measure the wallclock time for the sort, and divide it by the
> time it takes to execute the etalon cost benchmark.
>
> WDYT?
>
> Vladimir
