On Tuesday, 16 January 2018 at 22:04:52 UTC, Johan Engelen wrote:
> Because PGO optimizes for the given profile, it would help a lot if you clarified how you do your PGO benchmarking. What kind of test load profile you used for optimization and what test load you use for the time measurement.

The profiling setup is checked into the repo and run as part of a PGO build, so it is available for inspection. The benchmarks used for the deltas are also documented; they are the ones used in the benchmark comparison against similar tools done in March 2017. That report is in the repo (https://github.com/eBay/tsv-utils-dlang/blob/master/docs/Performance.md).
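
For context, the PGO build itself is the usual LDC instrument/run/merge/rebuild cycle. A rough sketch of that cycle is below; the file names and the collection script name are placeholders, not the repo's actual build setup:

    # 1. Build an instrumented binary; it writes raw counts when run.
    ldc2 -O -release -fprofile-instr-generate=tsv-filter.profraw -of=tsv-filter-instr <sources>

    # 2. Exercise it with the checked-in profile data and commands.
    ./run_profile_data.sh    # placeholder for the repo's collection script

    # 3. Merge the raw counts into an indexed profile.
    ldc-profdata merge -output=tsv-filter.profdata tsv-filter.profraw

    # 4. Rebuild with optimizations guided by the collected profile.
    ldc2 -O -release -fprofile-instr-use=tsv-filter.profdata -of=tsv-filter <sources>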

However, it's hard to imagine anyone perusing the repo for this stuff, so I'll try to summarize what I did below.

Benchmarks - Six tests of rather different but common operations, run on large data files. The six tests were chosen because for each I was able to find at least three other tools, written in native compiled languages, with similar functionality. There are other valuable benchmarks, but I haven't published them.

Profiling - Profiling was developed separately for each tool. For each I generated several data files representative of typical use cases: generally numeric or text data in several forms and distributions. This data was unrelated to the data used in the benchmarks, which comes from publicly available machine learning data sets. However, personal judgement was used in generating the data sets, so it's not free from bias.

After generating the data, I put together a set of run options specific to each tool. As an example, tsv-filter selects data file lines based on various numeric and text criteria (e.g. less-than). There are a bit over 50 comparison operations, plus a few meta operations. The profiling runs ensure all the operations are run at least once, with the most important ones weighted more heavily. The ldc.profile.resetAll call was used to exclude the initial setup code (command line argument processing) from the profile. This was nice because it meant the data files could be small relative to real-world data sets, and profiling runs fast enough to do as part of the build step (i.e. on Travis-CI).
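
To illustrate where resetAll sits, here's a schematic example (not the actual tsv-filter code; the setup and line-processing logic are invented stand-ins, only the placement of resetAll matters):

    import std.stdio;
    import ldc.profile : resetAll;   // LDC-only; used in instrumented (-fprofile-instr-generate) builds

    int main(string[] args)
    {
        // One-time setup: stand-in for the real command line argument processing.
        immutable skipComments = (args.length > 1 && args[1] == "--skip-comments");

        // Zero the counters accumulated so far, so the recorded profile
        // reflects only the per-line processing loop below, not the setup.
        resetAll();

        size_t kept = 0;
        foreach (line; stdin.byLine)   // the hot path the profile should capture
        {
            if (!skipComments || line.length == 0 || line[0] != '#')
                ++kept;
        }
        writeln(kept);
        return 0;
    }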

Look at https://github.com/eBay/tsv-utils-dlang/tree/master/tsv-filter/profile_data to see a concrete example (tsv-filter). In that directory are five data files and a shell script that runs the commands and collects the data.

This was done for four of the tools covering five of the benchmarks. I skipped one of the tools (tsv-join), as it's harder to come up with a concise set of profile operations for it.

I then ran the standard benchmarks I usually report on in various D venues.

Clearly personal judgment played a role. However, the tools are reasonably task focused, and I did take basic steps to keep the benchmark data and tests separate from the training data/tests. For these reasons, I'm fairly confident the results are reasonable and well founded.

--Jon
