On Sat, Apr 30, 2016 at 11:26 PM, L. David Baron <dba...@dbaron.org> wrote:
> On Friday 2016-04-29 10:43 +0300, Henri Sivonen wrote:
> I still find it sad that ECMAScript Intl came (as I understand it)
> very close to just standardizing on a piece of software (ICU),
Looking at the standard, it seems intentionally vague about what data sources are supported, and that's not good for a Web standard. However, it seems to me that in practice there is no standardized dependency on ICU but rather on the CLDR database maintained by Unicode.org. In a C or C++ program, the easiest and least-NIH way to expose CLDR is to use ICU, as Google and Apple do and as we do on desktop. I'm not sure what Microsoft does, considering that these days they are no longer opposed to using open source software, but I believe that Edge exposes CLDR via some non-ICU Microsoft-developed mechanism. So it seems like there are two independent interoperable implementations as far as code goes.

> and also find it disturbing that we're going to extend that
> standardization on a particular piece of software (possibly even
> more rigidly) into other areas.

As noted, these other areas are why I care about having ICU unconditionally available on all platforms that Gecko runs on and why I think it's harmful when ICU is blocked on one platform. Also, as noted, I don't care about ICU per se, but I care about being able to treat operations like Unicode normalization and locale-aware collation as foundational operations whose availability and correctness are not optional.

I think it would be ideal if we had a library or set of libraries written in Rust that provided this functionality, but until such a library written in Rust shows up, ICU is the only option on the table today that is (if bundled in Gecko in its latest version) correct and cross-platform consistent. I think it is harmful that we have to maintain abstractions for foundational operations to support a configuration where the back end isn't correct (up to the latest Unicode data) and cross-platform consistent. Until Rust-based replacements show up, the most reasonable way to perform operations that depend on Unicode.org data is to bundle ICU and to call its APIs directly, without abstraction layers in between.
Again, talking about ICU as just an enabler of the ECMAScript Internationalization API is a bad framing for the issue, because it makes it seem like blocking ICU "just" turns off one fairly recent Web API. Yet, Gecko needs functionality exposed by ICU in various places. For example:

 * Hyphenation, spellchecking, layout, gfx and JavaScript parsing need access to the character properties of the Unicode database. Currently, we duplicate ICU functionality (in an out-of-date manner, I believe) to implement these in libxul.
 * Internationalized domain names and text shaping need Unicode normalization. Currently, we duplicate ICU functionality (in an out-of-date manner, I believe) to implement these in libxul.
 * IndexedDB, XPath, XUL, SQL storage and the history search UI use locale-sensitive sorting. Currently, on Android, we duplicate ICU functionality for these by calling into the thread-unsafe C standard library API for this stuff. This is fundamentally broken, because the design of the C standard library is fundamentally broken: in the C standard library, there's no way to ask for a comparison according to a given locale. Instead, we set the locale process-wide (all threads!), then ask the C standard library to do a comparison, and then unset the locale process-wide.
 * Parts of the Firefox UI do locale-sensitive datetime formatting by calling legacy platform APIs, duplicating ICU functionality in a manner that imports system-specific bugs.
 * Based on open bugs, it seems we duplicate ICU functionality for bidi, but it's not clear to me whether we're already building that part of ICU anyway, and the relative correctness is unclear to me.

I think it's neither a good use of developer time nor holistic management of product size in bytes to have this duplication sprinkled around. (Though I don't believe that getting rid of the above duplication of ICU functionality would add up to the size of ICU itself: we should expect ICU to be a net addition to APK size in any case.)
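To make the collation point concrete, the thread-unsafe pattern that the C standard library forces can be sketched roughly like this (a minimal illustration, not Gecko's actual code; the function name is made up):

```c
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Sketch of the broken pattern described above: the C standard library
 * has no way to ask "compare a and b under locale L". The only option
 * is to mutate the process-wide locale around the strcoll() call, which
 * races with every other thread in the process that touches locale
 * state or does locale-sensitive work. */
static int collate_in_locale(const char *a, const char *b,
                             const char *locale_name) {
    char saved[64];
    const char *current = setlocale(LC_COLLATE, NULL); /* query current */
    snprintf(saved, sizeof saved, "%s", current ? current : "C");

    setlocale(LC_COLLATE, locale_name); /* process-wide: affects ALL threads */
    int result = strcoll(a, b);
    setlocale(LC_COLLATE, saved);       /* restore; the race window remains */
    return result;
}
```

ICU avoids this by making the locale a parameter of the collator object rather than hidden process state.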
It's worth noting that the above items split into, on one hand, the Unicode character property database and associated algorithms (normalization, bidi, line breaking, script run identification) and, on the other hand, the CLDR database and associated algorithms (locale-sensitive sorting, date formatting, number formatting, etc.). We have more foundational dependency needs on the former than on the latter, but the discussion about ICU size as well as the ECMAScript Internationalization API exposure is mainly about the latter.

Again, ideally, we'd have an actively-maintained Rust library for the Unicode character property database and associated algorithms and another actively-maintained Rust library for the CLDR database and associated algorithms. But absent such libraries showing up, ICU is what we have available and should use until then. That is, I think we should systematically use a single source of Gecko-bundled foundational functionality for the Unicode character property database and associated algorithms, and likewise a single source of Gecko-bundled foundational functionality for the CLDR database and associated algorithms. (And we've already turned off a bunch of stuff in ICU that doesn't appear on the above list.)

> I think many of the arguments we
> made against standardizing on SQLite seem to apply to ICU as well,
> such as security risk and needing to reverse-engineer when writing
> future implementations of parts of the Web platform.

Is there any actual evidence that non-ICU exposure of the CLDR data-based operations wasn't interoperable with ICU exposure of the CLDR data-based operations?

> While I expect that some of the features that Intl provides (from
> ICU data) are worthwhile in terms of codesize, I'm certainly not
> confident that they all are. I have similar worries about other
> large chunks of code that land in our tree...

What other chunks?
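As an aside on the character-property half of that split: the C standard library is not a substitute there either, because its classification functions answer according to the process-wide locale rather than the Unicode character database, and in the default "C" locale they typically only know about ASCII. A minimal sketch (the helper name is made up, and the behavior described assumes a common libc such as glibc):

```c
#include <locale.h>
#include <wctype.h>

/* Illustrative helper (not a real Gecko or ICU API): asks the C library
 * whether a code point is a letter under the default "C" locale. Unlike
 * a lookup in the Unicode character database, the answer depends on
 * process-wide locale state, and in the "C" locale a letter like U+00E9
 * is typically not recognized at all. */
static int c_locale_says_alpha(wint_t c) {
    setlocale(LC_CTYPE, "C"); /* process-wide, like all setlocale() use */
    return iswalpha(c) != 0;
}
```

This is the kind of gap that keeps forcing us to carry our own (out-of-date) copies of Unicode property tables in libxul.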
What bothers me the most regarding the size of what we ship is:

 * Failure to make the most out of compression (i.e. Zopfli) before objecting to the addition of new stuff. I've brought this up before, but just now, I downloaded the (en-US, API level 15) APK for Fennec 46 and ran ImageOptim (https://imageoptim.com/mac) on the PNG files included directly in the APK (i.e. not the ones hidden inside omni.ja). ImageOptim says: "Saved 311KB out of 1.7MB. 28.6% per file on average (up to 94.3%)." (There wasn't a single already-optimal PNG there!) Additionally, the same exercise could be repeated for the images in omni.ja. Then all the XML and JS could be Zopflified. The bundled .ttf files could be turned into Brotli-compressed WOFF2 files. All as a matter of being smarter about how we use lossless compression on stuff that we already losslessly compress, without adding any novel decompression capability to the product and without revisiting the decision not to compress .so files. (Though maybe we should look into building ICU as a separate .so and seeing how well Zopfli compresses the CLDR data.)
 * Failure to remove old, mostly useless cruft like nsIEntityConverter before blocking new, useful stuff like ICU. (Not the same size, sure.)
 * Landing JS polyfills for platform features that we implement natively when checking in front-end code that imports cross-browser JS libraries. (I was particularly saddened to see Hello land a JS polyfill for TextDecoder at the time when I was working on removing dead weight from our Gecko-level encoding converters.)

> And when I say worthwhile, I'm talking not just about whether the
> feature is intrinsically valuable, but whether it's actually going
> to be used by Web developers to get that value to users.
Sadly, exposure of CLDR data-based operations to the Web (whether via the ECMAScript Internationalization API or IndexedDB) is the sort of thing that is unlikely to deliver value until Web app developers feel that it's OK to unconditionally rely on the browser providing that functionality. In that sense, refusal to ship the feature on some platform, citing lack of use on the Web, is exactly the sort of thing that will significantly lower the probability that use on the Web takes off, thereby making the feature dead weight even on platforms where it is shipped.

> I think enough of our users are on Windows XP that decisions about
> dropping Windows XP are not a purely engineering decision.

I don't recall it ever being explicitly articulated why having x number or percentage of users on XP leads to the conclusion that XP must remain supported. Having x users on XP is quite different from having x users on a more current platform, because XP itself is no longer getting security patches and because our competitors are no longer delivering security patches for their browsers on XP.

If the concern is that it's wrong to stop delivering security patches for x users, surely it can't be Mozilla's responsibility to indefinitely continue to deliver browser security patches to users who already tolerate running a system that doesn't get security patches and for which other browsers have already stopped delivering security patches earlier. One might argue that it's wrong to give these users the impression that it's OK to continue to stay on a system that doesn't get security patches.

If the concern is that the users would switch to another browser, making a dent in Firefox's overall observed market share: as long as the users stick to XP, what would they switch to?
(Security patches for IE and Chrome have already stopped earlier, and considering that XP support was actually ripped out of Chromium, Opera and other Chromium re-packagers probably won't be able to continue to support XP, either.)

If the concern is that users would switch to another browser when they switch off XP, having felt abandoned on XP, why should we expect that to be the case considering that IE and Chrome abandoned them earlier?

As far as I can tell, the main effect that we should expect from dropping XP or putting XP on ESR is the market share of the *latest* Firefox dropping (as opposed to the market share of Firefox in general dropping). Maybe that's the concern, and maybe everyone is supposed to know that that's the concern, but I think it would still help to articulate what the concern is instead of just pointing to having lots of users on XP.

> (Do we
> still have more Windows XP users than we have on all non-Windows
> platforms combined?) Pushing those users to ESR without buy-in from
> all parts of the organization will likely lead to worse engineering
> problems than having to support XP (e.g., having to support 45ESR
> substantially longer).

That's certainly a risk of pushing XP to ESR.

> (It also seems to be that we need to answer the question,
> already raised in this thread, about whether the parts that are
> expensive for them to support intersect at all with the parts that
> we use.)

Indeed, but we should remember to tally every thread like this as part of the cost of supporting XP on our side.

-- 
Henri Sivonen
hsivo...@hsivonen.fi
https://hsivonen.fi/

_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform