On Sat, Apr 30, 2016 at 11:26 PM, L. David Baron <dba...@dbaron.org> wrote:
> On Friday 2016-04-29 10:43 +0300, Henri Sivonen wrote:
> I still find it sad that ECMAScript Intl came (as I understand it)
> very close to just standardizing on a piece of software (ICU),

Looking at the standard, it seems intentionally vague about what data
sources are supported, and that's not good for a Web standard.
However, it seems to me that in practice there is no standardized
dependency on ICU but on the CLDR database maintained by Unicode.org.
In a C or C++ program, the easiest and least-NIH way to expose CLDR is
to use ICU like Google and Apple do and like we do on desktop. I'm not
sure what Microsoft does, considering that these days they are no
longer opposed to using open source software, but I believe that Edge
exposes CLDR via some non-ICU Microsoft-developed mechanism. So it
seems like there are two independent interoperable implementations as
far as code goes.
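
To make the ICU route concrete, here is a minimal sketch of exposing
one CLDR-backed operation (locale-aware collation) through ICU4C. This
assumes a bundled ICU with its C++ headers available and icuuc/icui18n
linked in; the locale tag and the strings are illustrative only:

    // Minimal sketch: CLDR-backed, locale-aware collation via ICU4C.
    // Assumes a bundled ICU; locale and strings are illustrative.
    #include <unicode/coll.h>
    #include <unicode/locid.h>
    #include <unicode/unistr.h>
    #include <unicode/utypes.h>
    #include <cstdio>
    #include <memory>

    int main() {
      UErrorCode status = U_ZERO_ERROR;
      // The collation rules for "sv" come from CLDR data compiled into ICU.
      std::unique_ptr<icu::Collator> coll(
          icu::Collator::createInstance(icu::Locale("sv"), status));
      if (U_FAILURE(status)) {
        std::fprintf(stderr, "collator: %s\n", u_errorName(status));
        return 1;
      }
      // "\xC3\xA5" is the UTF-8 encoding of U+00E5 (a with ring above).
      icu::UnicodeString a =
          icu::UnicodeString::fromUTF8("\xC3\xA5ngstr\xC3\xB6m");
      icu::UnicodeString b = icu::UnicodeString::fromUTF8("zebra");
      // In Swedish collation, the first word sorts after "zebra",
      // unlike in a plain code point comparison.
      UCollationResult result = coll->compare(a, b, status);
      std::printf("compare: %d\n", static_cast<int>(result));
      return U_FAILURE(status) ? 1 : 0;
    }

The point is simply that the CLDR data and the algorithms come as one
package and that the locale travels with the Collator object rather
than living in any process-global state.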

> and
> also find it disturbing that we're going to extend that
> standardization on a particular piece of software (possibly even
> more rigidly) into other areas.

As noted, these other areas are why I care about having ICU
unconditionally available on all platforms that Gecko runs on and why
I think it's harmful when ICU is blocked on one platform. Also, as
noted, I don't care about ICU per se, but I care about being able to
treat operations like Unicode normalization and locale-aware collation
as foundational operations whose availability or correctness is not
optional. I think it would be ideal if we had a library or set of
libraries written in Rust that provided this functionality, but until
such a library written in Rust shows up, ICU is the only option on the
table today that is (if bundled in Gecko in its latest version)
correct and cross-platform consistent.

I think it is harmful that we have to maintain abstractions for
foundational operations to support a configuration where the back end
isn't correct (to latest Unicode data) and cross-platform consistent.
Until Rust-based replacements show up, the most reasonable way to
perform operations that depend on Unicode.org data is to bundle ICU
and to call its APIs directly without abstraction layers in between.

Again, talking about ICU as just an enabler of the ECMAScript
Internationalization API is a bad framing for the issue, because it
makes it seem like blocking ICU "just" turns off one fairly recent Web
API. Yet, Gecko has needs for functionality exposed by ICU in various
places. For example:

 * Hyphenation, spellchecking, layout, gfx and JavaScript parsing need
access to the character properties of the Unicode database. Currently,
we duplicate ICU functionality (in an out-of-date manner, I believe)
to implement these in libxul.

 * Internationalized domain names and text shaping need Unicode
normalization. Currently, we duplicate ICU functionality (in an
out-of-date manner, I believe) to implement these in libxul.

 * IndexedDB, XPath, XUL, SQL storage and the history search UI use
locale-sensitive sorting. Currently, we duplicate ICU functionality on
Android for these by calling into the thread-unsafe C standard library
API. This is fundamentally broken, because the design of the C
standard library is broken: there is no way to ask for a comparison
according to a given locale. Instead, we set the locale process-wide
(all threads!), then ask the C standard library to do a comparison,
and then unset the locale process-wide (see the sketch after this
list).

 * Parts of the Firefox UI do locale-sensitive datetime formatting by
calling legacy platform APIs that duplicate ICU functionality in a
manner that imports system-specific bugs.

 * Based on open bugs, it seems we duplicate ICU functionality for
bidi, but it's not clear to me whether we're already building that
part of ICU anyway, nor how the two implementations compare in
correctness.
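
For clarity, the C standard library pattern referred to in the sorting
bullet above looks roughly like the following. This is a sketch with
an illustrative locale name and is not the actual Gecko code:

    // Sketch of the C standard library pattern described above; the
    // locale name is illustrative, not the actual Gecko code.
    #include <clocale>
    #include <cstring>
    #include <string>

    int CompareInLocale(const char* a, const char* b, const char* locale) {
      // setlocale() changes the locale for the WHOLE process (all
      // threads), so this races with any other locale-sensitive work.
      std::string previous = std::setlocale(LC_COLLATE, nullptr);
      std::setlocale(LC_COLLATE, locale);
      int result = std::strcoll(a, b);
      // Restore the previous process-wide locale.
      std::setlocale(LC_COLLATE, previous.c_str());
      return result;
    }

With ICU, by contrast, the locale is a property of the Collator object
(as in the sketch earlier in this message), so no process-wide state
needs to be touched at all.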

I think it's neither good use of developer time nor holistic
management of product size in bytes to have this duplication sprinkled
around. (Though I don't believe that getting rid of the above
duplication of ICU functionality would add up to the size of ICU
itself: We should expect ICU to be a net addition to APK size in any
case.)

It's worth noting that the above items split into, on one hand, the
Unicode character property database and associated algorithms
(normalization, bidi, line breaking, script run identification) and,
on the other hand, the CLDR database and associated algorithms
(locale-sensitive sorting, date formatting, number formatting, etc.).
We have more foundational dependency needs on the former than on the
latter, but the discussion about ICU size as well as the ECMAScript
Internationalization API exposure is mainly about the latter.
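
To make the split concrete, the former category needs only the Unicode
character property data, not CLDR. A minimal sketch (again assuming a
bundled ICU4C; the code points and strings are illustrative):

    // Sketch of the Unicode-character-property side of ICU: character
    // properties and NFC normalization. Illustrative only.
    #include <unicode/normalizer2.h>
    #include <unicode/uchar.h>
    #include <unicode/unistr.h>
    #include <cstdio>

    int main() {
      UErrorCode status = U_ZERO_ERROR;

      // Character properties from the UCD: general category and script.
      UChar32 c = 0x0915;  // DEVANAGARI LETTER KA
      std::printf("category=%d script=%d\n",
                  static_cast<int>(u_charType(c)),
                  static_cast<int>(u_getIntPropertyValue(c, UCHAR_SCRIPT)));

      // NFC normalization: "e" + U+0301 (combining acute) composes to U+00E9.
      const icu::Normalizer2* nfc = icu::Normalizer2::getNFCInstance(status);
      if (U_FAILURE(status)) {
        return 1;
      }
      icu::UnicodeString decomposed = icu::UnicodeString::fromUTF8("e\xCC\x81");
      icu::UnicodeString composed = nfc->normalize(decomposed, status);
      std::printf("length before=%d after=%d\n",
                  static_cast<int>(decomposed.length()),
                  static_cast<int>(composed.length()));
      return U_FAILURE(status) ? 1 : 0;
    }

Operations like these depend only on the character property data,
whereas the collation sketch earlier in this message depends on CLDR.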

Again, ideally, we'd have an actively-maintained Rust library for the
Unicode character property database and associated algorithms and
another actively-maintained Rust library for the CLDR database and
associated algorithms. But absent such libraries showing up, ICU is
what we have available and should use until then. That is, I think we
should systematically use a single source of Gecko-bundled
foundational functionality for the Unicode character property database
and associated algorithms and I think we should systematically use a
single source of Gecko-bundled foundational functionality for the CLDR
database and associated algorithms. (And we've already turned off a
bunch of stuff in ICU that doesn't appear on the above list.)

> I think many of the arguments we
> made against standardizing on SQLite seem to apply to ICU as well,
> such as security risk and needing to reverse-engineer when writing
> future implementations of parts of the Web platform.

Is there any actual evidence that non-ICU exposure of the CLDR
data-based operations wasn't interoperable with ICU exposure of the
CLDR data-based operations?

> While I expect that some of the features that Intl provides (from
> ICU data) are worthwhile in terms of codesize, I'm certainly not
> confident that they all are.  I have similar worries about other
> large chunks of code that land in our tree...

What other chunks?

What bothers me the most regarding the size of what we ship is:

 * Failure to make the most out of compression (i.e. Zopfli) before
objecting to the addition of new stuff. I've brought this up before,
but just now, I downloaded the (en-US API level 15) APK for Fennec 46
and ran ImageOptim (https://imageoptim.com/mac) on the PNG files
included directly in the APK (i.e. not the ones hidden inside
omni.ja). ImageOptim says: "Saved 311KB out of 1.7MB. 28.6% per file
on average (up to 94.3%)." (There wasn't a single already-optimal PNG
there!) Additionally, the same exercise could be repeated for the
images in omni.ja. Then all the XML and JS could be Zopflified. The
bundled .ttf files could be turned into Brotli-compressed WOFF2 files.
All as a matter of being smarter about how we use lossless compression
on stuff that we already losslessly compress, without adding any novel
decompression capability to the product and without revisiting the
decision not to compress .so files. (Though maybe we should look into
building ICU as a separate .so and seeing how well Zopfli compresses
the CLDR data.)

 * Failure to remove old mostly useless cruft like nsIEntityConverter
before blocking new useful stuff like ICU. (Not the same size, sure.)

 * Landing JS polyfills for platform features that we implement
natively when checking in front end code that imports cross-browser JS
libraries. (I was particularly saddened to see Hello land a JS
polyfill for TextDecoder at the time when I was working on removing
dead weight from our Gecko-level encoding converters.)

> And when I say worthwhile, I'm talking not just about whether the
> feature is intrinsically valuable, but whether it's actually going
> to be used by Web developers to get that value to users.

Sadly, exposure of CLDR data-based operations to the Web (whether via
the ECMAScript Internationalization API or IndexedDB) is the sort of
thing that is unlikely to deliver value until Web app developers feel
that it's OK to unconditionally rely on the browser providing that
functionality. In that sense, refusal to ship the feature on some
platform citing lack of use on the Web is exactly the sort of thing
that will significantly lower the probability that use on the Web
ever takes off, thereby making the feature dead weight even on the
platforms where it is shipped.

> I think enough of our users are on Windows XP that decisions about
> dropping Windows XP are not a purely engineering decision.

I don't recall it ever being explicitly articulated why having x
number or percentage of users on XP leads to the conclusion that XP
must remain supported. Having x users on XP is quite different from
having x users on a more current platform, because XP itself is no
longer getting security patches and because our competitors are no
longer delivering security patches for their browsers on XP.

If the concern is that it's wrong to stop delivering security patches
for x users, surely it can't be Mozilla's responsibility to
indefinitely continue to deliver browser security patches to users who
already tolerate running a system that doesn't get security patches
and for which other browsers have already stopped delivering
security patches. One might argue that it's wrong to give these users
the impression that it's OK to continue to stay on a system that
doesn't get security patches.

If the concern is that the users would switch to another browser,
making a dent in Firefox's overall observed market share, as long as
the users stick to XP, what would they switch to? Security patches
for IE and Chrome on XP have already stopped, and considering that XP
support was actually ripped out of Chromium, Opera and other Chromium
re-packagers probably won't be able to continue to support XP either.

If the concern is that users would switch to another browser when
they eventually move off XP, having felt abandoned on XP, why should
we expect that to be the case, considering that IE and Chrome
abandoned them earlier?

As far as I can tell, the main effect that we should expect from
dropping XP or putting XP on ESR is the market share of the *latest*
Firefox dropping (as opposed to the market share of Firefox in general
dropping). Maybe that's the concern and maybe everyone is supposed to
know that that's the concern, but I think it would still help to
articulate what the concern is instead of just pointing to having lots
of users on XP.

> (Do we
> still have more Windows XP users than we have on all non-Windows
> platforms combined?)  Pushing those users to ESR without buy-in from
> all parts of the organization will likely lead to worse engineering
> problems than having to support XP (e.g., having to support 45ESR
> substantially longer).

That's certainly a risk of pushing XP to ESR.

>  (It also seems to be that we need to answer the question,
> already raised in this thread, about whether the parts that are
> expensive for them to support intersect at all with the parts that
> we use.)

Indeed, but we should remember to tally every thread like this as
part of the cost of supporting XP on our side.

-- 
Henri Sivonen
hsivo...@hsivonen.fi
https://hsivonen.fi/