Hello,

Markus Scherer <markus....@gmail.com> wrote:
|On Thu, Apr 24, 2014 at 12:56 PM, Steffen Nurpmeso <sdaode\
|n...@yandex.com> wrote:
|> Markus Scherer <markus....@gmail.com> wrote:
|>|I strongly recommend you parse the derived properties rather than trying to
|>|follow the derivation formula, because that can change over time.
|>
|> ..this file includes only those core properties that have
|> themselves a derivation-may-change property?
|
|I don't know what that means.
|What I tried to say is, if you need ID_Start, then parse ID_Start from
|DerivedCoreProperties.txt. That's more stable (and easier than parsing the
|pieces and deriving
|
|# Lu + Ll + Lt + Lm + Lo + Nl
|# + Other_ID_Start
|# - Pattern_Syntax
|# - Pattern_White_Space
|
|yourself.

But I *do* need to parse many of the pieces anyway, since I'm hardly interested in ID_Start alone! Unicode has DerivedAge.txt (I don't know what that is derived from), and I need to parse PropList.txt anyway (to get the full list of whitespace characters, for example). So IMHO it's all a bit «Kraut und Rüben» («higgledy-piggledy», as <http://www.dict.cc/?s=Kraut+und+R%C3%BCben> says).

|For example, at least one of the derivation formulas (for Alphabetic) is
|changing from 6.3 to 7.0.

That is interesting or frightening, I don't know yet. Wouldn't it make sense to introduce a single PropListsJoined.txt that does it all? Or, for the sake of small and possibly space-constrained projects...

  ?0[steffen@sherwood ]$ (cd ~/arena/docs.coding/unicode/data;
  > ll DerivedCore* PropList*)
   100 [.]  99531 25 Sep 2013 PropList.txt
   820 [.] 836985 25 Sep 2013 DerivedCoreProperties.txt

...this is what I would do: offer a new file, say, Formula.txt, which defines exactly the necessary formula. For example, to quote your ID_Start example:

  ID_Start
  < UnicodeData.txt
  < PropList.txt
  + Lu
  + Ll
  + Lt
  + Lm
  + Lo
  + Nl
  + Other_ID_Start
  - Pattern_Syntax
  - Pattern_White_Space
  =

That concept seems scalable at first glance. If I understood correctly, old parsers that hard-code a formula will no longer generate correct data once that formula changes? At the very least, a formula-compatibility version tag should be added somewhere, so that parsers can detect such a change automatically instead of generating incorrect data. I don't know why there need to be megabytes of duplicated data.

Oh well; and I'm not going to start dreaming of better support for ISO C / POSIX character classes. (Oh. ...It's surely sapless.)
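To make the Formula.txt idea a bit more concrete, here is a minimal Python sketch of how a parser might evaluate such an entry. Formula.txt, its line syntax, and the helper names parse_ucd_ranges, evaluate_formula and formula_drift are all hypothetical. The sketch assumes every operand is read from files in the "range ; value" layout that PropList.txt and DerivedCoreProperties.txt use (for general categories that would be something like extracted/DerivedGeneralCategory.txt, since UnicodeData.txt has a different multi-field layout). The sample lines in the demo are tiny excerpts, not full data.

```python
# Hypothetical sketch: evaluate a Formula.txt-style derivation over UCD data.
# Formula.txt does not exist in the UCD; the '<', '+', '-', '=' syntax follows
# the proposal in this mail. Operands come from "range ; value" style lines.
from collections import defaultdict


def parse_ucd_ranges(text):
    """Map value -> set of code points for lines like '0041..005A ; Lu # ...'."""
    props = defaultdict(set)
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = [f.strip() for f in line.split(";")]
        if len(fields) < 2:
            continue
        lo, _, hi = fields[0].partition("..")  # single code point or a range
        props[fields[1]].update(range(int(lo, 16), int(hi or lo, 16) + 1))
    return props


def evaluate_formula(formula, props):
    """Evaluate one entry: the first line names the derived property, '<' lines
    list source files (informational only here), '+'/'-' lines add or remove a
    property's code points left to right, and '=' ends the entry."""
    lines = [l.strip() for l in formula.strip().splitlines() if l.strip()]
    name, result = lines[0], set()
    for line in lines[1:]:
        op, _, operand = line.partition(" ")
        if op == "+":
            result |= props.get(operand.strip(), set())
        elif op == "-":
            result -= props.get(operand.strip(), set())
        elif op == "=":
            break
    return name, result


def formula_drift(name, derived, published):
    """Code points where the formula result and the published
    DerivedCoreProperties.txt data disagree; empty means they still match."""
    return derived ^ published.get(name, set())


FORMULA = """
ID_Start
< UnicodeData.txt
< PropList.txt
+ Lu
+ Ll
+ Lt
+ Lm
+ Lo
+ Nl
+ Other_ID_Start
- Pattern_Syntax
- Pattern_White_Space
=
"""

if __name__ == "__main__":
    # Tiny excerpts only; a real run would read the complete UCD files.
    props = {}
    props.update(parse_ucd_ranges("0041..005A ; Lu\n0061..007A ; Ll"))
    props.update(parse_ucd_ranges("2118 ; Other_ID_Start\n0021..0023 ; Pattern_Syntax"))
    name, cps = evaluate_formula(FORMULA, props)
    print(name, len(cps))  # ID_Start 53
    published = parse_ucd_ranges(
        "0041..005A ; ID_Start\n0061..007A ; ID_Start\n2118 ; ID_Start")
    print(sorted(formula_drift(name, cps, published)))  # []
```

Comparing the computed set against the published ID_Start ranges, as formula_drift does, is one way a parser could notice that a derivation formula has changed between Unicode versions, which is the risk Markus points out.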
Ciao,

--steffen
_______________________________________________
Unicode mailing list
Unicode@unicode.org
http://unicode.org/mailman/listinfo/unicode