On Tuesday, 16 January 2018 at 22:04:52 UTC, Johan Engelen wrote:
Because PGO optimizes for the given profile, it would help a lot if you clarified how you do your PGO benchmarking. What kind of test load profile you used for optimization and what test load you use for the time measurement.

The profiling used is checked into the repo and run as part of a PGO build, it is available for inspection. The benchmarks used for deltas are also documented, they the ones used in the benchmark comparison to similar tools done in March 2017. This report is in the repo (https://github.com/eBay/tsv-utils-dlang/blob/master/docs/Performance.md).

However, it's hard to imagine anyone perusing the repo for this stuff, so I'll try to summarize what I did below.

Benchmarks - Six different tests of rather different but common operations run on large data files. The six tests were chosen because for each I was able to find at least three other tools, written in native compiled languages, with similar functionality. There are other valuable benchmarks, but I haven't published them.

Profiling - Profiling was developed separately for each tool. For each I generated several data files with data representative of typical uses cases. Generally numeric or text data in several forms and distributions. The data was unrelated to the data used in benchmarks, which is from publicly available machine learning data sets. However, personal judgement was used in the generation of the data sets, so it's not free from bias.

After generating the data, I generated a set of run options specific to each tool. As an example, tsv-filter selects data file lines based on various numeric and text criteria (e.g. less-than). There are a bit over 50 comparison operations, plus a few meta operations. The profiling runs ensure all the operations are run at least once, but that the most important overweighted. The ldc.profile.resetAll call was used to exclude all the initial setup code (command line argument processing). This was nice because it meant the data files could be small relative to real-world sets, and it runs fast enough to do at part of the build step (ie. on Travis-CI).

Look at https://github.com/eBay/tsv-utils-dlang/tree/master/tsv-filter/profile_data to see a concrete example (tsv-filter). In that directory are five data files and a shell script that runs the commands and collects the data.

This was done for four of the tools covering five of the benchmarks. I skipped one of the tools (tsv-join), as it's harder to come up with a concise set of profile operations for it.

I then ran the standard benchmarks I usually report on in various D venues.

Clearly personal judgment played a role. However, the tools are reasonably task focused, and I did take basic steps to ensure the benchmark data and tests were separate from the training data/tests. For these reasons, my confidence is good that the results are reasonable and well founded.


Reply via email to