On Thu, Jul 13, 2006 at 01:17:39PM -0700, Zack Weinberg wrote:
> Currently the knowledge of which characters are not allowed in a
> pathname is split between paths.cc and constants.cc.
> paths.cc:has_bad_chars is the sole user of
> constants.cc:illegal_path_bytes, but adds more to the set (notably
> backslash).  I note also that this code is all marked as "must be
> super fast" but has_bad_chars uses a relatively inefficient algorithm.
> This patch deletes illegal_path_bytes and reduces has_bad_chars to a
> simple loop with the forbidden bytes expressed in code, rather than
> looked up in a table.  The LIKELY and UNLIKELY coerce gcc 4.1 into
> generating code which is, um, not actively stupid (bug filed).

Seems fine to me.

> +// ??? Ensure use of UTF8 encoding internally, validate encoding here.
^^ Hmm?

>        u8 x = (u8)*c;
> -      if (x < sizeof(bad_table) && bad_table[x])
> -        return true;
> +      // 0x5c is '\\'; we use the hex constant to make the dependency on
> +      // ASCII encoding explicit.
> +      if (UNLIKELY(x <= 0x1f || x == 0x5c || x == 0x7f))

This could do with a comment about how the innocent looking "u8" there
is critical to the "<=" doing the right thing on machines with signed
chars...

-- Nathaniel

-- 
"Of course, the entire effort is to put oneself
 Outside the ordinary range
 Of what are called statistics."
  -- Stephan Spender

_______________________________________________
Monotone-devel mailing list
http://lists.nongnu.org/mailman/listinfo/monotone-devel
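
To illustrate the signed-char point above, here is a minimal sketch, not
the actual patch. The function names has_bad_chars_sketch and
has_bad_chars_signed are hypothetical, the u8 typedef is assumed to be an
unsigned 8-bit type, and the UNLIKELY hint is left out. The second
function forces signed char to show what the comparison would do without
the conversion to u8: any byte >= 0x80 (every byte of a multi-byte UTF-8
sequence) turns negative and so satisfies "<= 0x1f".

```cpp
#include <cstdint>
#include <string>

// Assumption: u8 is an unsigned 8-bit type, as the cast in the patch implies.
typedef std::uint8_t u8;

// Hypothetical stand-in for the patched loop: rejects ASCII control
// characters (0x00-0x1f), backslash (0x5c) and DEL (0x7f).
inline bool has_bad_chars_sketch(std::string const & path)
{
  for (std::string::const_iterator c = path.begin(); c != path.end(); ++c)
    {
      // Converting to an unsigned type first keeps bytes >= 0x80 positive,
      // so they cannot fall into the "<= 0x1f" range.
      u8 x = (u8)*c;
      if (x <= 0x1f || x == 0x5c || x == 0x7f)
        return true;
    }
  return false;
}

// The same test with the byte held as signed char, which is what plain
// char behaves like on many platforms. Shown only as the failure mode.
inline bool has_bad_chars_signed(std::string const & path)
{
  for (std::string::const_iterator c = path.begin(); c != path.end(); ++c)
    {
      // A byte such as 0xc3 becomes negative here, so "<= 0x1f" is true.
      signed char x = (signed char)*c;
      if (x <= 0x1f || x == 0x5c || x == 0x7f)
        return true;
    }
  return false;
}
```

With the u8 conversion a UTF-8 path such as "caf\xc3\xa9" passes, while
the signed variant rejects it. A one-line comment next to the cast in the
real code would make that dependency visible.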
