Re: ttfautohint's functionality from the removal of the infinality patch
>> It's not clear what you are actually asking.  ttfautohint is an
>> incarnation of FreeType's auto-hinter, translated into TrueType
>> bytecode, more or less.  This is not related to interpreting
>> bytecode.
>
> But does it contain hidden assumptions about how bytecodes are
> interpreted (by a specific renderer, and a specific version of the
> now-simplified renderer)?

No hidden assumptions – everything is documented :-)

  https://freetype.org/ttfautohint/doc/ttfautohint.html#stem-width-and-positioning-mode


    Werner
Re: ttfautohint's functionality from the removal of the infinality patch
On Friday, 18 August 2023 at 22:27:26 BST, Werner LEMBERG wrote:

>> Are we getting into a situation where ttfautohint is hinting for the
>> (limited) "good enough" light hinting model of recent freetype?
>
> It's not clear what you are actually asking.  ttfautohint is an
> incarnation of FreeType's auto-hinter, translated into TrueType
> bytecode, more or less.  This is not related to interpreting bytecode.

But does it contain hidden assumptions about how bytecodes are
interpreted (by a specific renderer, and a specific version of the
now-simplified renderer)?

> Well, the Infinality stuff was basically unmaintained; it essentially
> consisted of a large bunch of exceptions for some special fonts.
> Alexei and I agree that we no longer need this.  There was not a
> single voice objecting to its removal, by the way...

I agree there is no general use for it.  For the purpose of FontVal's
backend, though, until a genuine Microsoft backend happens (if ever),
I suspect that the Infinality code behaves more like the Microsoft
renderer when given Microsoft-hinted fonts.  So if the goal is not
"good enough hinting" but "matching Microsoft behavior", it has to
stay.

So FWIW, I am just mentioning that I will carry a reverse diff in
future versions of FontVal, if another release ever happens...
Re: ttfautohint's functionality from the removal of the infinality patch
> Yesterday's opentype font meeting touched upon hinting and
> ttfautohint briefly.  I see the infinality patch is already gone
> (next release, 2.13.2 I guess - bits of it were removed in 2.13.1
> already).  Question is, does its removal impact the functionality of
> ttfautohint?

No.

> Are we getting into a situation where ttfautohint is hinting for the
> (limited) "good enough" light hinting model of recent freetype?

It's not clear what you are actually asking.  ttfautohint is an
incarnation of FreeType's auto-hinter, translated into TrueType
bytecode, more or less.  This is not related to interpreting bytecode.

> And dropping support, and/or compatibility/awareness of intended
> usages of the outcome in other *cough* Microsoft *cough* hinting
> models?

Well, the Infinality stuff was basically unmaintained; it essentially
consisted of a large bunch of exceptions for some special fonts.
Alexei and I agree that we no longer need this.  There was not a
single voice objecting to its removal, by the way...


    Werner
ttfautohint's functionality from the removal of the infinality patch
Hi,

Yesterday's opentype font meeting touched upon hinting and ttfautohint
briefly.  I see the infinality patch is already gone (next release,
2.13.2 I guess - bits of it were removed in 2.13.1 already).  Question
is, does its removal impact the functionality of ttfautohint?

Are we getting into a situation where ttfautohint is hinting for the
(limited) "good enough" light hinting model of recent freetype?  And
dropping support, and/or compatibility/awareness of intended usages of
the outcome in other *cough* Microsoft *cough* hinting models?

FWIW, FontVal will carry the reverse diff.  "Good enough" hinting
isn't good enough - some people just want familiarity, so nothing
except Microsoft releasing a (binary-only) backend is good enough
:-(.  Unless/until that happens, variety (even if it is "poorer", for
some value of "poorer") is important.
Re: -warmup
Hi,

I have edited the code in line with Hin-Tak's suggestion.  Here are
the two results pages, also pushed to GitLab.

Best,
Goksu
goksu.in

On 18 Aug 2023 14:02 +0300, Werner LEMBERG wrote:

>>> What happens if you use, say, `-c 10', just running the
>>> `Get_Char_Index` test?  Are the percental timing differences then
>>> still that large?
>
>> Actually Get_Char_Index, on the three pages I have sent in the
>> previous mail, is higher than 6% only 4 times out of 15 in total
>> (which can be seen on other tests as well).
>
> Well, the thing is that IMHO the difference should be *much* smaller –
> your HTML pages show the execution of identical code on an identical
> machine, right?
>
>> About outliers: I split every test into chunks of size 100, made
>> IQR calculations, and calculated the average time over the valid
>> chunks.  You can find the result in the attachment, also pushed to
>> GitLab.
>
> Thanks.  Hin-Tak gave additional suggestions how to possibly improve
> the removal of outliers.
>
>> Also, since statistics and benchmarking are sciences in themselves,
>> I am struggling a bit while approaching the problem, and it feels
>> like it is out of the GSoC project's scope.
>
> Indeed, the focus lately shifted from a representational aspect to a
> more thorough approach to handling benchmarks in general.  You are
> done with the first part, more or less, and it looks fine.  The
> latter, however, is definitely part of the GSoC project, too, and
> I'm surprised that you think this might not be so: What are
> benchmark timings good for if the returned values are completely
> meaningless?
>
> In most cases, a small performance optimization in FreeType might
> yield, say, an improvement of 1%.  Right now, such a change would
> not be detectable at all if using the framework you are working on –
> it would be completely hidden by noise.
>
> To summarize: Benchmark comparisons only work if there is a sound
> mathematical foundation to reduce the noise.  I don't ask you to
> reinvent the wheel, but please do some more internet research and
> check existing code on how to tackle such problems.  I'm 100% sure
> that such code already exists (for example, the Google benchmark
> stuff mentioned in a previous e-mail, scientific papers on arXiv,
> etc.) and can be easily used, adapted, and simplified for our
> purposes.
>
>
>     Werner


Freetype Benchmark Results

Warning: Baseline and Benchmark have the same commit ID!

Info         Baseline                    Benchmark
Parameters   -c 550 -w 50                -c 550 -w 50
Commit ID    35531481                    35531481
Commit Date  2023-08-18 02:04:38 +0300   2023-08-18 02:04:38 +0300
Branch       GSoC-2023-Ahmet             GSoC-2023-Ahmet

*  Average time for all iterations.  Smaller values are better.
** An N count in (x | y) format shows the baseline and benchmark N
   counts separately when they differ.

Total Results

Test                      N       Baseline (µs)  Benchmark (µs)  Difference (%)
Load                      25      178180         190899          -7.1
Load_Advances (Normal)    25      161654         159430           1.4
Load_Advances (Fast)      25      1047           1130            -7.9
Load_Advances (Unscaled)  25      996            1006            -1.0
Render                    25      278410         276644           0.6
Get_Glyph                 25      213564         209314           2.0
Get_Char_Index            235000  1012           992              1.9
Iterate CMap              2500    873            908             -4.0
New_Face                  2500    12862          13004           -1.1
Embolden                  25      229042         226156           1.3
Stroke                    25      956152         955616           0.1
Get_BBox                  25      180922         177108           2.1
Get_CBox                  25      211855         211493           0.2
New_Face & load glyph(s)  25      29090          28778            1.1
TOTAL                     299     2455658        2452479          0.1

Results for Roboto_subset.ttf

Test                      N       * Baseline (µs)  * Benchmark (µs)  Difference (%)
Load                      6       29964            31568            -5.3
Load_Advances (Normal)    6       29942            27342             8.7
Load_Advances (Fast)      6       233              232               0.6
Load_Advances (Unscaled)  6       220              220               0.0
Render                    6       53179            53842            -1.2
Get_Glyph                 6       38820            39532            -1.8
Get_Char_Index            47000   197              198              -0.3
Iterate CMap              500     185              194              -4.6
New_Face                  500     2308             2267              1.8
Embolden                  6       41109            41912            -2.0
Stroke                    6       213674           213932           -0.1
Get_BBox                  6       17216            16035             6.9
Get_CBox                  6       38976            40102            -2.9
New_Face & load glyph(s)  6       5564             5554              0.2
TOTAL                     141600  471587           472930            0.3

Results for Arial_subset.ttf

Test                      N       * Baseline (µs)  * Benchmark (µs)  Difference (%)
Load                      47500   37834            43165            -14.1
Load_Advances (Normal)    47500   37650            35215              6.5
Load_Advances (Fast)      47500   202              284              -40.6
Load_Advances (Unscaled)  47500   192              198               -3.1
Render                    47500   [remainder of table truncated in the archive]
Re: -warmup
On Fri, 18 Aug 2023 11:02:49 +0000 (UTC), Werner LEMBERG wrote:

> To summarize: Benchmark comparisons only work if there is a sound
> mathematical foundation to reduce the noise.

I am probably not qualified, but I have been following the discussion
for some time, and I think there is a problem with the benchmarking
itself.  If I understand correctly, the nice tables show the same code
on the same machine, so a difference of 40% or so is not OK.

I had a quick look at ftbench.c, and I have the impression that
clock_gettime is called twice for every single iteration.  I would
have expected a single clock_gettime before and after N iterations.
If the benchmarked code is short, the per-iteration approach
accumulates errors that cannot be removed afterwards.  But I may be
wrong...

Greetings,
chris
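chris's point can be sketched as follows.  This is an illustration of the principle in Python, not the actual C code of ftbench.c (which uses clock_gettime); `work` is a hypothetical stand-in for a short benchmarked call such as `FT_Get_Char_Index`:

```python
import time

def work():
    # Hypothetical stand-in for a short benchmarked FreeType call.
    return sum(range(10))

def per_iteration(n):
    # Timer read twice inside the loop: the timer's own overhead and
    # jitter are added to every single measurement.
    total = 0
    for _ in range(n):
        t0 = time.perf_counter_ns()
        work()
        t1 = time.perf_counter_ns()
        total += t1 - t0
    return total

def batched(n):
    # One timer pair around all n iterations: the timer overhead is
    # paid once and then divided by n.
    t0 = time.perf_counter_ns()
    for _ in range(n):
        work()
    t1 = time.perf_counter_ns()
    return t1 - t0

if __name__ == "__main__":
    n = 100_000
    print(f"per-iteration: {per_iteration(n) / n:.1f} ns/iter")
    print(f"batched:       {batched(n) / n:.1f} ns/iter")
```

The shorter the benchmarked call, the larger the share of each per-iteration sample that is timer overhead rather than the code under test.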
Re: -warmup
>> What happens if you use, say, `-c 10', just running the
>> `Get_Char_Index` test?  Are the percental timing differences then
>> still that large?
>
> Actually Get_Char_Index, on the three pages I have sent in the
> previous mail, is higher than 6% only 4 times out of 15 in total
> (which can be seen on other tests as well).

Well, the thing is that IMHO the difference should be *much* smaller –
your HTML pages show the execution of identical code on an identical
machine, right?

> About outliers: I split every test into chunks of size 100, made IQR
> calculations, and calculated the average time over the valid chunks.
> You can find the result in the attachment, also pushed to GitLab.

Thanks.  Hin-Tak gave additional suggestions how to possibly improve
the removal of outliers.

> Also, since statistics and benchmarking are sciences in themselves,
> I am struggling a bit while approaching the problem, and it feels
> like it is out of the GSoC project's scope.

Indeed, the focus lately shifted from a representational aspect to a
more thorough approach to handling benchmarks in general.  You are
done with the first part, more or less, and it looks fine.  The
latter, however, is definitely part of the GSoC project, too, and I'm
surprised that you think this might not be so: What are benchmark
timings good for if the returned values are completely meaningless?

In most cases, a small performance optimization in FreeType might
yield, say, an improvement of 1%.  Right now, such a change would not
be detectable at all if using the framework you are working on – it
would be completely hidden by noise.

To summarize: Benchmark comparisons only work if there is a sound
mathematical foundation to reduce the noise.  I don't ask you to
reinvent the wheel, but please do some more internet research and
check existing code on how to tackle such problems.  I'm 100% sure
that such code already exists (for example, the Google benchmark stuff
mentioned in a previous e-mail, scientific papers on arXiv, etc.) and
can be easily used, adapted, and simplified for our purposes.


    Werner
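Werner's point that a 1% improvement is invisible under noise can be illustrated numerically.  This is a standalone simulation, not project code; the 7% noise level is taken from the scatter visible in the posted result tables:

```python
import random
from statistics import mean, stdev

random.seed(42)

def run(base_us, noise_frac, n=100):
    # Simulated per-test timings with multiplicative Gaussian noise.
    return [base_us * random.gauss(1.0, noise_frac) for _ in range(n)]

baseline  = run(100.0, 0.07)   # noisy runs of the unmodified code
optimized = run(99.0, 0.07)    # the same code with a real 1% speedup

diff = (mean(baseline) - mean(optimized)) / mean(baseline) * 100
scatter = stdev(baseline) / mean(baseline) * 100
print(f"measured difference: {diff:+.1f}%  (scatter ~{scatter:.0f}%)")
```

With per-sample scatter several times larger than the effect being measured, the sign of the reported difference can easily flip from run to run, which is exactly why a noise-reduction step has to come before any baseline/benchmark comparison.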
Re: -warmup
Hi,

The approach we initially took was, in fact, based on the principle of
the interquartile range (IQR) – a method that excludes outliers by
determining the range between the first and third quartiles.  However,
I understand from your feedback that directly focusing on the median
and quantiles offers a clearer representation.  I will adapt the code
to align with your suggestion.

Best,
Goksu
goksu.in

On 18 Aug 2023 1:04 PM +0300, Hin-Tak Leung wrote:

> On Friday, 18 August 2023 at 00:21:41 BST, Ahmet Göksu wrote:
>
>> About outliers: I split every test into chunks of size 100, made
>> IQR calculations, and calculated the average time over the valid
>> chunks.  You can find the result in the attachment, also pushed to
>> GitLab.
>
>> Also, since statistics and benchmarking are sciences in themselves,
>> I am struggling a bit while approaching the problem, and it feels
>> like it is out of the GSoC project's scope.  I would like to share
>> this with your indulgence; yet, of course, I will move in
>> accordance with your instructions.
>
> Hmm, this is lacking basic maths skills... cutting into chunks and
> recombining them isn't going to deal with outliers.  Read about
> "median" and "quantile" on Wikipedia or via Google.  Anyway, you
> want to calculate the "median" time.  E.g. sort 100 numbers by size
> and take the average of the 50th and 51st; your error is the
> difference between the 91st and the 10th values in that sorted
> order.  If you can do that for the entire set, do it for the whole
> set; if not, use a running median, i.e. the median of every chunk of
> 100, and then combine the running medians.
>
> This way, the top 9 and bottom 9 values of each 100 contribute
> nothing at all to your outcome.  This is dealing with outliers.
Re: -warmup
On Friday, 18 August 2023 at 00:21:41 BST, Ahmet Göksu wrote:

> About outliers: I split every test into chunks of size 100, made IQR
> calculations, and calculated the average time over the valid chunks.
> You can find the result in the attachment, also pushed to GitLab.

> Also, since statistics and benchmarking are sciences in themselves,
> I am struggling a bit while approaching the problem, and it feels
> like it is out of the GSoC project's scope.  I would like to share
> this with your indulgence; yet, of course, I will move in accordance
> with your instructions.

Hmm, this is lacking basic maths skills... cutting into chunks and
recombining them isn't going to deal with outliers.  Read about
"median" and "quantile" on Wikipedia or via Google.  Anyway, you want
to calculate the "median" time.  E.g. sort 100 numbers by size and
take the average of the 50th and 51st; your error is the difference
between the 91st and the 10th values in that sorted order.  If you can
do that for the entire set, do it for the whole set; if not, use a
running median, i.e. the median of every chunk of 100, and then
combine the running medians.

This way, the top 9 and bottom 9 values of each 100 contribute nothing
at all to your outcome.  This is dealing with outliers.
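Hin-Tak's recipe can be sketched in a few lines of Python.  The helper names (`chunk_medians`, `robust_timing`) are hypothetical, not code from the benchmark repository; the error estimate here is taken from a single sorted chunk of 100, as a simplification of the 10th/91st-value spread he describes:

```python
import random
from statistics import median

def chunk_medians(samples, chunk=100):
    # Running median: the median of every chunk of `chunk` samples.
    return [median(samples[i:i + chunk])
            for i in range(0, len(samples), chunk)]

def robust_timing(samples, chunk=100):
    """Combine running medians into one value, with the spread between
    the 10th and 91st values of a sorted chunk as the error estimate.
    The top 9 and bottom 9 values of each 100 contribute nothing."""
    meds = chunk_medians(samples, chunk)
    s = sorted(samples[:chunk])
    err = s[90] - s[9]          # 91st minus 10th value (1-based)
    return median(meds), err

# 1000 well-behaved timings around 100 µs, plus a few huge outliers
random.seed(1)
times = [random.gauss(100.0, 2.0) for _ in range(1000)]
times[::97] = [10_000.0] * len(times[::97])  # inject outliers
value, err = robust_timing(times)
print(f"{value:.1f} ± {err:.1f} µs")
```

Despite outliers two orders of magnitude above the true timing, the reported value stays near 100 µs with a small error bar, whereas a plain mean of the same data would be pulled far off.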