Hi Petr and darcs-users,

2009/12/25 Petr Rockai <[email protected]>:
> Eric Kow <[email protected]> writes:
>> 5. Despite our confidence about UTF-8 detection, tagging is still a good idea
>>    [a] for sheer conservatism and [b] for potential efficiency gains [c]
>>    because it lets us reliably distinguish NFC UTF-8 vs the other one.
> it may make sense to think of proper metadata formatting before the
> Ignore-This madness spins completely out of control (it was already bad
> with the random junk, and it's only getting worse).
>
> Btw. as for [b] I don't think that scanning for the utf8 tag is
> substantially faster than checking whether a string is utf-8 or not.

True.

> For [c], excuse my ignorance (I am still in the process of catching up
> on darcs-users@) but where exactly do we care whether we have NFC or
> not? I.e. is this a matter of correctness for darcs, or is it just a
> matter of unexpectedly non-matched --match/--patch?

Just a matter of matching.

> In the latter case, can we make this optional, and maybe issue a warning
> when a non-ASCII matcher is used and we don't have ICU, instead of
> having a hard dependency?

We could do that, but I would prefer to have darcs "just work" and do
the right thing, instead of issuing a warning. Let me reiterate that a
lot of packages use ICU these days; a quick Google search turns up
CouchDB and OpenOffice.org.

> Additionally, do we believe that not having to run the metadata being
> matched through normalisation is a substantial performance boost? (In
> pseudocode) Is "(normalise metadata) `match` (normalise matcher)"
> substantially slower than "metadata `match` (normalise matcher)"? I
> understand that the tagging is needed for the latter to work reliably.

I'd think it is, when you're doing a linear search through a potentially
large patch history.

> The catch is that for vast body of the existing patches, this
> optimisation is not going to work at all anyway (since even if it is
> accidentally in the right format, we don't know -- there is no tag --
> and have to re-normalise). So I would say that this comes down to
> asking, whether we expect to stick to this particular patch (metadata)
> format (free text with Ignore-This tags, maybe utf8 and maybe arbitrary
> other encoding) for long enough to make this optimisation pay off. (I am
> more leaned toward saying no, and just normalising everything for a good
> measure, if we can, and issuing warnings when we cannot).

I for one hope to have designed something that can last quite a while,
so that this optimization will pay off someday.
Perhaps for all those folks that are going to convert their huge-ass git
repos to darcs one day ;-).

Reinier
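
To make the two matching strategies from the pseudocode above concrete, here is a minimal sketch. It is written in Python rather than darcs's Haskell, and it uses the standard library's `unicodedata` in place of ICU. The `Patch` type, the `tagged_nfc` flag and the plain substring match are all assumptions made for illustration; none of them is darcs's actual patch format or matcher.

```python
import unicodedata
from typing import List, NamedTuple


class Patch(NamedTuple):
    name: str         # patch metadata being matched
    tagged_nfc: bool  # hypothetical flag: metadata is known to be NFC already


def nfc(s: str) -> str:
    """Normalise a string to Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", s)


def match_always_normalise(patches: List[Patch], matcher: str) -> List[Patch]:
    # "(normalise metadata) `match` (normalise matcher)":
    # every patch name is normalised on every search.
    m = nfc(matcher)
    return [p for p in patches if m in nfc(p.name)]


def match_trusting_tag(patches: List[Patch], matcher: str) -> List[Patch]:
    # "metadata `match` (normalise matcher)" for tagged patches;
    # untagged (older) patches still have to be re-normalised.
    m = nfc(matcher)
    return [p for p in patches
            if m in (p.name if p.tagged_nfc else nfc(p.name))]
```

Both functions return the same patches. The second only skips the per-patch normalisation where a tag is present, which is why the saving depends on how much of the history ends up tagged.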
