Re: [patch] Simplify ftconfig.h

2020-06-29 Thread Werner LEMBERG


> We should also decide whether to chose a dash or an underscore as
> the word separator for header and source file names :-) I have no
> favorites here, but being consistent will be less ambiguous for
> everyone.

I definitely prefer '-' over '_'.


Werner



Re: Overlap oversampling

2020-06-29 Thread Alexei Podtelezhnikov
David,

On Mon, Jun 29, 2020 at 6:58 PM David Turner  wrote:
> So, could have a deep look at the patches here. They're pretty neat. I'll 
> just recommend documenting the subtle computations in ft_smooth_slow_spans() 
> a little better, and avoid branches altogether, by using bit twiddling to 
> perform saturated addition instead (removing branches from loops is always 
> best for performance). I.e. something like the following:

I completely agree with sum | -(sum>>8). Another idea is sum - (sum>>8).

> Can you tell me how to actually test that the code works as expected though?

The proof-of-concept patch is set up to replace normal rendering with
oversampling when SCALE is 2 or 4. I was using this font:
https://github.com/adobe-fonts/source-serif-pro/tree/release/VAR

I suggested FT_RENDER_MODE_SLOW to explicitly discourage its use for
good fonts. We can do FT_RENDER_MODE_OVERSAMPLE instead. Werner also
suggested using OVERLAP_SIMPLE and OVERLAP_COMPOUND flags to trigger
this mode but they may be unset or unavailable. I think this mode
should be explicit.

Regards,
Alexei



Re: I'm back

2020-06-29 Thread Behdad Esfahbod
Hi David,

Thanks for the detailed response.

On Mon, Jun 29, 2020 at 5:07 PM David Turner  wrote:

>
>
> Le dim. 28 juin 2020 à 04:19, Behdad Esfahbod  a
> écrit :
>
>> Hi David,
>>
>> I've been meaning to reply to your request but the list of things I
>> wanted to communicate kept growing exponentially and so I have not been
>> able to say.
>>
>> But here, from a technical perspective this is my biggest issue, which
>> you can also see as a roadmap:
>>
>>   https://gitlab.gnome.org/GNOME/pango/-/issues/404#note_851881
>>
>>
> Thanks for the link Bhedad, this gives useful history about your woes and
> considerations. I'd like to address some of these here.
>
> First of all, the reason why FreeType has never been designed to be
> "thread-safe" (in the sense where it supports multiple threads operating on
> its objects concurrently [1]) is *entirely and very very intentional*.
>

I fully understand it was intentional.


> The library was designed, first and foremost to be as efficient as
> possible on embedded systems with limited memory and operating system
> primitives.
> Any kind of thread-safety scheme, even atomic refcounts, would have been
> detrimental to its portability, and most likely performance.
>

But that doesn't have to be the case. For example, HarfBuzz could always be
compiled with HB_NO_MT to remove all that overhead. See hb-atomic.hh We
even enable that in our HB_TINY configuration. See CONFIG.md and
hb-config.hh.


> Reducing memory usage means using in-place modification of existing
> objects and aggressive caching of state within various objects under a root
> FT_Library instance.
>

I'm not convinced that you lose *any* efficiency within the model that I
propose. I hold HarfBuzz to the same if not more extreme standards and we
have pretty much always been able to find a solution without compromising.
We continue to do that. In fact, last night I started a major overhaul of
the codebase.

That's why we have FT_Size and FT_GlyphSlot objects, very visible in the
> API, but there are also various levels of caching internally (most of them
> simple LRU lists that only change occasionally, but still a potential
> source of thread races).
>

Umm. I removed all of those a few years ago. Again, without any side effect.


> And as far as I know, this has always been pointed out explicitly in the
> FT_Library documentation.
>

Yeah I'm not claiming any of it is a surprise (there was one surprise I ran
into the other day that I'll write about separately).


> I understand that if you expected otherwise, you may have been
> disappointed (to keep it politely).
>

I came to this field with zero expectations. Everything I learned was from
how people in projects large and small needed to show text.


> However I do not consider the API to be "broken" regarding thread-safety,
> it's just not designed for this use case, at all. It solves a very
> different problem.
>

If you wish so. I'm not crazy about words. :-)


> Thus there is no way to "fix all thread-safety" issues in it, without
> changing the internals and the API tremendously.
>

Agree about the internals, but not the API. I think new API can be added
and old API kept. It will be a complete overhaul, I agree, but then again,
the project is way overdue for one. So I'm suggesting that instead of a
freetype3 that has incompatible ABI, just evolve freetype2 into freetype3,
and freetype4, ... and remove old cruft after ten years or twenty years or
never. That's what I'm leaving HarfBuzz with.

You may have made some modifications to the library to get rid of the most
> annoying thread-related issues you've encountered,
> but I assure you there are still by-design thread-races in many places.
> Also, some of the fixes seem a little dubious, i.e. the raster pool is now
> a fixed 16 kiB now, while it used to be resizable by the user.
>

In twenty years I never ran into *any* FreeType client who would get
anywhere like configuring anything like that. If they did, they'd just
modify the source.

Again, I understand everything I point out can be countered with a "some
might need". That's a difference in library design philosophy.


> This was useful when rendering large vector graphics (not fonts) with the
> FreeType rasterizers. Now the algorithm are essentially quadratic, which
> makes it painful or unusable for this use case.
>

Again, if there was a mission statement to FreeType, we could deduce
whether that should remain a goal.


> The root of your problem seems to be a vast impedance mismatch between the
> Cairo / Pango / Harfbuzz APIs which expect, or encourage, several threads
> (probably from clients, which would be worse) to use FT_Face objects
> liberally.
> This simply cannot work reliably.
>

We agree on the root cause indeed.


> My only immediate recommendation would be to use a different "face"
> abstraction object for these, that would either reimplement FreeType, or
> parts of it (the route Pango seems to have chosen), or wrap 

Re: [patch] Simplify ftconfig.h

2020-06-29 Thread David Turner
Le mer. 24 juin 2020 à 13:49, Alexei Podtelezhnikov  a
écrit :

> Hello David,
>
> >  create mode 100644 include/freetype/config/compiler_macros.h
> >  create mode 100644 include/freetype/config/integer_types.h
> >  create mode 100644 include/freetype/config/mac_support.h
> >  create mode 100644 include/freetype/internal/compiler_macros.h
>
> The same filename albeit in different folders might be confusing and
> prone to errors.


Oh, very good point, I'll change this.

Also _macros.h is a bit redundant.


I don't think so, the header is really about defining macros that are
compiler-specific.


> How about
> config/ftconfig-compiler.h and internal/ftcompiler.h?


First, we should avoid our cryptic short names now, the use of prefixes
like "ft" was due to the fact that some people didn't even have proper
directories when developing for embedded systems (a long long time ago).
So any public header began with "ft", all TrueType headers, began with
"tt", etc... We really don't need this anymore. We need to keep the public
header names consistent to avoid breaking all kinds of scripts that operate
on them, but for anything new, or under internal/ or even src/, we are now
free to rename everything to something vastly more meaningful.
The good thing is we don't have to do this all at once, progressively will
be good.

Also ftconfig-compiler.h makes me think about a compiler for configuration
files, I find "compiler_macros.h" or "compiler_defines.h" more explicit.



> Personally, I
> would rather reflect custody in the names: ftconfig-types.h
> ftconfig-mac.h. It might just be me but I am not a big fan of
> verbosity in C including filenames.
>
>
We should also decide whether to chose a dash or an underscore as the word
separator for header and source file names :-)
I have no favorites here, but being consistent will be less ambiguous for
everyone.



> Alexei
>


Re: I'm back

2020-06-29 Thread David Turner
Le dim. 28 juin 2020 à 04:19, Behdad Esfahbod  a écrit :

> Hi David,
>
> I've been meaning to reply to your request but the list of things I wanted
> to communicate kept growing exponentially and so I have not been able to
> say.
>
> But here, from a technical perspective this is my biggest issue, which you
> can also see as a roadmap:
>
>   https://gitlab.gnome.org/GNOME/pango/-/issues/404#note_851881
>
>
Thanks for the link Bhedad, this gives useful history about your woes and
considerations. I'd like to address some of these here.

First of all, the reason why FreeType has never been designed to be
"thread-safe" (in the sense where it supports multiple threads operating on
its objects concurrently [1]) is *entirely and very very intentional*.
The library was designed, first and foremost to be as efficient as possible
on embedded systems with limited memory and operating system primitives.
Any kind of thread-safety scheme, even atomic refcounts, would have been
detrimental to its portability, and most likely performance.
Reducing memory usage means using in-place modification of existing objects
and aggressive caching of state within various objects under a root
FT_Library instance.
That's why we have FT_Size and FT_GlyphSlot objects, very visible in the
API, but there are also various levels of caching internally (most of them
simple LRU lists that only change occasionally, but still a potential
source of thread races).
And as far as I know, this has always been pointed out explicitly in the
FT_Library documentation.

I understand that if you expected otherwise, you may have been disappointed
(to keep it politely). However I do not consider the API to be "broken"
regarding thread-safety, it's just not designed for this use case, at all.
It solves a very different problem.
Thus there is no way to "fix all thread-safety" issues in it, without
changing the internals and the API tremendously. You may have made some
modifications to the library to get rid of the most annoying thread-related
issues you've encountered,
but I assure you there are still by-design thread-races in many places.
Also, some of the fixes seem a little dubious, i.e. the raster pool is now
a fixed 16 kiB now, while it used to be resizable by the user.
This was useful when rendering large vector graphics (not fonts) with the
FreeType rasterizers. Now the algorithm are essentially quadratic, which
makes it painful or unusable for this use case.

The root of your problem seems to be a vast impedance mismatch between the
Cairo / Pango / Harfbuzz APIs which expect, or encourage, several threads
(probably from clients, which would be worse) to use FT_Face objects
liberally.
This simply cannot work reliably.

My only immediate recommendation would be to use a different "face"
abstraction object for these, that would either reimplement FreeType, or
parts of it (the route Pango seems to have chosen), or wrap FreeType
objects behind the right amount of
caching and thread-specific synchronization to make it work (like other
libraries like Skia do). I can't tell you which way is best. It would be
interesting to discuss which properties these "thread-safe font objects"
could have.
Maybe we can write a sane and small library on top of FreeType to provide
just that. I just don't think there is a way to do that within it.

Another alternative would be to refactor the FreeType library considerably
into a new version (with a different thread-safe API where it matters), but
that's a far harder problem, is likely to result in something that will be
larger, more complex internally, use more memory, and likely be slower.
At least we could rewrite the internals with a better language than C89 :)
But I'd rather have an extensive unit-test and regression test suite before
trying this.

- David

[1] Except if each thread has its own FT_Library instance (and
corresponding set of FT_Face/FT_Size/FT_GlyphSlot objects.

I will keep trying to comment on the two GSoC projects, and eventually get
> to write about the health of the freetype as a community project.
>
> Cheers,
> behdad
>
> On Fri, Apr 24, 2020 at 1:03 PM David Turner  wrote:
>
>> Hello freetype-devel@ list members,
>>
>> It's been a very very long time, but I have some free time in the coming
>> weeks to work on FreeType. Werner invited me to write a small announcement
>> here and I'm currently looking at the official bugs list.
>>
>> I'd like to know what are, in your opinion, the most pressing issues to
>> work on at that point?
>>
>> Apart from that, I had the following things in mind:
>>
>> - Improving / refactoring the build system a little. E.g. it should be
>> possible to simplify the rules.mk/module.mk files considerably, and
>> auto-generate most of the Makefiles / Jamfiles / CMakefiles from a single
>> source of truth (exact format to be defined), at least the parts that deal
>> with the headers / sources / configuration headers and the module
>> dependencies.
>>
>> - Improve testing (unit 

Re: Overlap oversampling

2020-06-29 Thread David Turner
So, could have a deep look at the patches here. They're pretty neat. I'll
just recommend documenting the subtle computations in
ft_smooth_slow_spans() a little better, and avoid branches altogether, by
using bit twiddling to perform saturated addition instead (removing
branches from loops is always best for performance). I.e. something like
the following:

  /* This function averages inflated spans in direct rendering mode.
   * It assumes that coverage spans are rendered in a SCALE*SCALE
   * inflated pixel space, and computes the contribution of each
   * span 'sub-pixel' to the target bitmap's pixel. I.e.:
   *
   *  If (x, y) are a pixel coordinates in inflated space, then
   *  (xt := x/SCALE, yt := y/SCALE) are the pixel coordinates in the target
   *  bitmap, where '/' denotes integer division.
   *
   *  Let's define GRIDSIZE := SCALE * SCALE, then if `c` is the 8-bit
coverage
   *  for (x, y) in inflated space, then its contribution to (xt, yt) would
be
   *  ct := c // GRIDSIZE, where '//' denotes division of real numbers (i.e.
   *  without truncation to a lower fixed or floating point precision).
   *
   *  Since these can only be stored on 8-bit target bitmap pixels, there
are
   *  at least two ways to approximate the sum:
   *
   * 1) Compute `ct := FLOOR(c // GRIDSIZE)`, which means that if all
   *pixels in inflated space have full coverage (i.e. value 255),
then
   *their contribution sums will be GRIDSIZE * FLOOR(255 /
GRIDSIZE),
   *which will be 252 (for SCALE == 2), or 240 (for SCALE == 4).
   *
   *A later passe will be needed to scale the values to the 0..255
   *range.
   *
   * 2) Compute `ct := ROUND(c // GRIDSIZE)`, in which case the total
   *contribution sum may reach 256 for both `SCALE == 2` and
   *`SCALE == 4`, which cannot be stored in an 8-bit pixel byte of
the
   *target bitmap. To deal with this, perform saturated arithmetic
to
   *ensure that the value never goes over 255. This avoids an
   *additional rescaling step, and is implemented below.
   */
  static void
  ft_smooth_slow_spans( int y,
int count,
const FT_Span*  spans,
TOrigin*target )
  {
unsigned char*  dst = target->origin - ( y / SCALE ) * target->pitch;
unsigned intx;

for ( ; count--; spans++ )
{
  unsigned coverage = (spans->coverage + GRIDSIZE / 2) / GRIDSIZE;


  for ( x = 0; x < spans->len; x++ )
  {
/* The following performs a saturated addition of d[0] + coverage */
unsigned char*  d = [(spans->x + x) / SCALE];
unsigned int  sum = d[0] + coverage;


d[0] = (FT_Byte)(d | -(sum >> 8));
  }
}
  }

Here's a Compiler Explorer link  that
compares the two implementations.


Can you tell me how to actually test that the code works as expected though?

Thanks

- David

Le mar. 23 juin 2020 à 20:16, David Turner  a écrit :

>
>
> Le mar. 23 juin 2020 à 05:42, Alexei Podtelezhnikov 
> a écrit :
>
>> Hi again,
>>
>> The oversampling is implemented though inflating the outline and then
>> averaging the increased number of cells using FT_RASTER_FLAG_DIRECT
>> mechanism. The first two patches set the stage by splitting the code
>> paths for LCD rendering out of the way and trying
>> FT_RASTER_FLAG_DIRECT for FT_RENDER_MODE_LCD. The third one implements
>> oversampling by replacing the normal rendering with oversampling if
>> SCALE is 2 or 4 (as opposed to 1). Again the proposal is to have it as
>> FT_RENDER_MODE_SLOW eventually. The slightly complicated averaging of
>> cells is due to 255/4+255/4+255/4+255/4 = 252 instead of 255, so we
>> have to do rounding, yet avoid overflowing.
>>
>> Thanks, I'll take a look at your patches.
>
> However, please don't call it FT_RENDER_MODE_SLOW, the fact that it is
> slow is an implementation detail, and we could very well replace this with
> a different algorithm in the future (maybe slow, maybe not). So something
> like FT_RENDER_MODE_OVERLAPPED_OUTLINES seems more appropriate, since it
> describes why you would want to use this mode, instead of what its
> performance profile is :-)
>
> Comments?
>>
>> Alexei
>>
>