Alexey,

sorry for the confusion then. Let me explain this better once more:

1. IO tests:

In IO tests we do not use the Synthetic Sources that generate the records.
We use the GenerateSequence class, which generates a sequence of long
values, and then map those to some records, write them to a data sink,
read them back and assert that everything is correct. This approach makes
estimating the size of each test's data cumbersome - saying "I have
1 000 000 records from the GenerateSequence transform" is not enough to
estimate the size, because it is the Map step (different for some tests)
that creates the actual records to be saved. So each IO integration test
is different here.
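
Roughly, the "write" half of such a test has this shape (just a sketch -
the mapping step below is a placeholder, every IO test maps the longs to
its own record type, which is exactly why the per-element size differs
between tests):

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.GenerateSequence;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.transforms.MapElements;
    import org.apache.beam.sdk.values.TypeDescriptors;

    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.create());
    pipeline
        .apply("Generate sequence", GenerateSequence.from(0).to(1_000_000))
        // placeholder mapping - the real tests produce IO-specific records:
        .apply("Map to records",
            MapElements.into(TypeDescriptors.strings())
                .via((Long i) -> String.format("record-%010d", i)));
    // ...the records are then written to the IO under test, read back and
    // asserted for correctness.
    pipeline.run().waitUntilFinish();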

What I am looking for (in the case of IO tests) is a measurement of the
total size so that:
 - it is easier to say exactly how much data we have,
 - we can calculate the throughput based on that size measurement in a
universal way that could be used in all IO tests.

To have such a universal solution we figured that using the ByteMonitor
to measure the size of elements of type T (generic) would be fine. It
wasn't, and now we want to fix this. The counters Robert mentioned look
tempting because they would give us both of the above and let us
calculate the throughput (am I right here?).
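
If those counters are queryable through the standard metrics API, the
throughput calculation could be as simple as the sketch below. The
counter name and the runtime variable are made up for illustration - I
don't know yet what the real counter is called or whether it is exposed
this way on every runner:

    import org.apache.beam.sdk.PipelineResult;
    import org.apache.beam.sdk.metrics.MetricNameFilter;
    import org.apache.beam.sdk.metrics.MetricQueryResults;
    import org.apache.beam.sdk.metrics.MetricsFilter;

    PipelineResult result = pipeline.run();
    result.waitUntilFinish();
    MetricQueryResults metrics = result.metrics().queryMetrics(
        MetricsFilter.builder()
            // hypothetical counter name, just for the example:
            .addNameFilter(MetricNameFilter.named("io_tests", "totalBytes"))
            .build());
    long totalBytes = metrics.getCounters().iterator().next().getAttempted();
    // runTimeInSeconds would be measured separately (we can easily get time):
    double throughput = (double) totalBytes / runTimeInSeconds;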

2. Load tests of Core Beam Operations:

Here the situation is slightly different. We use Synthetic Sources to
generate data of the KV<byte[], byte[]> shape, so by passing appropriate
pipeline options (keySize, valueSize, numOfRecords) we can determine the
total size of the load that we put into the pipeline. However, there is
no assertion at the end of the test - those tests focus only on measuring
performance. They also fan out and form different DAGs, so joining the
data back together to check that it didn't get lost is cumbersome too. To
check that "there is no data loss or corruption" we wanted to measure
bytes here (as is done in Nexmark, as far as I know).
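
So the expected "net" size is fully determined by the options, e.g. (the
numbers below are made up, just to show the arithmetic):

    long numRecords = 20_000_000L;  // numOfRecords option
    int keySizeBytes = 10;          // keySize option
    int valueSizeBytes = 90;        // valueSize option
    long expectedTotalBytes = numRecords * (keySizeBytes + valueSizeBytes);
    // = 2_000_000_000 bytes, i.e. roughly the 2 GB mentioned in point 3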

At first, we thought that we could use the same ByteMonitor to have the
proper value registered in the dashboards. It seems, however, that we
were wrong: the performance dropped [1] and the reported value is
definitely not the size we initially defined [2], so this is not the way
to go. Again, maybe the counters are the better idea?

We don't want to measure throughput in these load tests. Only in IO tests.

3. Finally, "the comparison":

I figured that the difference between the expected size (we expect 2 GB
here) and the one shown on the dashboards may be due to some memory
allocation overheads or other things happening in Beam. I thought it
could be interesting to track that too, to know "the overhead". On the
other hand, when I double-checked, I'm not even sure now whether the
measurement in ByteMonitor is correct (I might have been too hasty
proposing this - if so, sorry about that).
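
To make the distinction concrete: what I'd call the "net" size is what
the coder writes, while ObjectSizeCalculator (the utility ByteMonitor
uses) walks the in-memory object graph (headers, references, padding),
which is why its numbers come out several times larger. A rough sketch of
the "net" measurement for a single synthetic element:

    import org.apache.beam.sdk.coders.ByteArrayCoder;
    import org.apache.beam.sdk.coders.KvCoder;
    import org.apache.beam.sdk.util.CoderUtils;
    import org.apache.beam.sdk.values.KV;

    KvCoder<byte[], byte[]> coder =
        KvCoder.of(ByteArrayCoder.of(), ByteArrayCoder.of());
    KV<byte[], byte[]> element = KV.of(new byte[10], new byte[90]);
    // encoded ("net") size; encodeToByteArray throws CoderException:
    long netSize = CoderUtils.encodeToByteArray(coder, element).length;
    // ObjectSizeCalculator would report the JVM footprint of the whole
    // object graph instead (the "gross" size).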

Thanks!

[1]
https://apache-beam-testing.appspot.com/explore?dashboard=5643144871804928
[2]
https://apache-beam-testing.appspot.com/explore?dashboard=5701325169885184




Tue, 28 May 2019 at 18:08 Alexey Romanenko <[email protected]>
wrote:

> On 28 May 2019, at 17:31, Łukasz Gajowy <[email protected]> wrote:
>
>
> I'm not quite following what these sizes are needed for--aren't the
> benchmarks already tuned to be specific, known sizes?
>
> Maybe I wasn't clear enough. Such metric is useful mostly in IO tests -
> different IOs generate records of different size. It would be ideal for us
> to have a universal way to get total size so that we could provide some
> throughput measurement (we can easily get time). In Load tests we indeed
> have known sizes but as I said above in point 2 - maybe it's worthy to look
> at the other size as well (to compare)?
>
>
> Łukasz, I’m sorry but it’s still not clear for me - what is a point to
> compare these sizes? I want to say that If we already have a size of
> generated load (like expected data size) and processing time after the test
> run, then we can calculate throughput. In addition, we compute and check a
> hash of all processed data and compare it with expected hash to make sure
> that there is no data loss or corruption.  Do I miss something?
>
>
>
> especially for benchmarking purposes a 5x
> overhead means you're benchmarking the sizing code, not the pipeline
> itself.
>
> Exactly. We don't want to do this.
>
> Beam computes estimates for PCollection sizes by using coder and
> sampling and publishes these as counters. It'd be best IMHO to reuse
> this. Are these counters not sufficient?
>
> I didn't know that and this should do the trick! Is such counter available
> for all sdks (or at least Python and Java)? Is it supported for all runners
> (or at least Flink and Dataflow)? Where can I find it to see if it fits?
>
> Thanks!
>
>
> Tue, 28 May 2019 at 16:46 Robert Bradshaw <[email protected]> wrote:
>
>> I'm not quite following what these sizes are needed for--aren't the
>> benchmarks already tuned to be specific, known sizes? I agree that
>> this can be expensive; especially for benchmarking purposes a 5x
>> overhead means you're benchmarking the sizing code, not the pipeline
>> itself.
>>
>> Beam computes estimates for PCollection sizes by using coder and
>> sampling, and publishes these as counters. It'd be best IMHO to reuse
>> this. Are these counters not sufficient?
>>
>> On Tue, May 28, 2019 at 12:55 PM Łukasz Gajowy <[email protected]>
>> wrote:
>> >
>> > Hi all,
>> >
>> > part of our work while creating benchmarks for Beam is to collect total
>> data size (bytes) that was put inside the testing pipeline. We need that in
>> load tests of core beam operations (to see how big was the load really) and
>> IO tests (to calculate throughput). The "not so good" way we're doing it
>> right now is that we add a DoFn step called "ByteMonitor" to the pipeline
>> to get the size of every element using a utility called
>> "ObjectSizeCalculator [1].
>> >
>> > Problems with this approach:
>> > 1. It's computationally expensive. After introducing this change, tests
>> are 5x slower than before. This is due to the fact that now the size of
>> each record is calculated separately.
>> > 2. Naturally, the size of a particular record measured this way is
>> greater than the size of the generated key+values itself. Eg. if a
>> synthetic source generates key + value that has 10 bytes total, after
>> collecting the total bytes metric it's 8x greater (due to wrapping the
>> value in richer objects, allocating more memory than needed, etc).
>> >
>> > The main question here is: which size of particular records is more
>> interesting in benchmarks? The, let's call it, "net" size (key + value
>> size, and nothing else), or the "gross" size (including all allocated
>> memory for a particular element in PCollection and all the overhead of
>> wrapping it in richer objects)? Maybe both sizes are good to be measured?
>> >
>> > For the "net" size we probably could (should?) do something similar to
>> what Nexmark suites have: pre-define size per each element type and read it
>> once the element is spotted in the pipeline [3].
>> >
>> > What do you think? Is there any other (efficient + reliable) way of
>> measuring the total load size that I missed?
>> >
>> > Thanks for opinions!
>> >
>> > Best,
>> > Łukasz
>> >
>> > [1]
>> https://github.com/apache/beam/blob/a16a5b71cf8d399070a72b0f062693180d56b5ed/sdks/java/testing/test-utils/src/main/java/org/apache/beam/sdk/testutils/metrics/ByteMonitor.java
>> > [2] https://issues.apache.org/jira/browse/BEAM-7431
>> > [3]
>> https://github.com/apache/beam/blob/eb3b57554d9dc4057ad79bdd56c4239bd4204656/sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/model/KnownSize.java
>>
>
>
