Re: [Monotone-devel] rfc: small simplification to paths.cc/constants.cc

Nathaniel Smith Fri, 14 Jul 2006 23:42:19 -0700

On Thu, Jul 13, 2006 at 01:17:39PM -0700, Zack Weinberg wrote:
> Currently the knowledge of which characters are not allowed in a
> pathname is split between paths.cc and constants.cc.
> paths.cc:has_bad_chars is the sole user of
> constants.cc:illegal_path_bytes, but adds more to the set (notably
> backslash).  I note also that this code is all marked as "must be
> super fast" but has_bad_chars uses a relatively inefficient algorithm.
> This patch deletes illegal_path_bytes and reduces has_bad_chars to a
> simple loop with the forbidden bytes expressed in code, rather than
> looked up in a table.  The LIKELY and UNLIKELY coerce gcc 4.1 into
> generating code which is, um, not actively stupid (bug filed).
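
For readers without the patch to hand, the shape Zack describes might look roughly like the sketch below. It is an illustration, not the actual patch: the name has_bad_chars_sketch, the macro definitions, and the std::string parameter are assumptions. Only the byte test itself is taken from the diff quoted further down. Both hint macros are defined for completeness, but the sketch only uses UNLIKELY, since the quoted hunk does not show where LIKELY is applied.

```cpp
#include <cassert>
#include <string>

// Branch-prediction hints. __builtin_expect is a GCC/Clang builtin;
// on other compilers the macros fall back to the plain expression.
// (These definitions are an assumption, not copied from monotone.)
#if defined(__GNUC__)
#  define LIKELY(x)   (__builtin_expect(!!(x), 1))
#  define UNLIKELY(x) (__builtin_expect(!!(x), 0))
#else
#  define LIKELY(x)   (x)
#  define UNLIKELY(x) (x)
#endif

typedef unsigned char u8;

// Hypothetical stand-in for paths.cc:has_bad_chars. The forbidden bytes
// are expressed directly in code instead of being looked up in a table.
bool has_bad_chars_sketch(std::string const & path)
{
  for (std::string::const_iterator c = path.begin(); c != path.end(); ++c)
    {
      // The cast to u8 keeps bytes >= 0x80 from comparing as negative.
      u8 x = (u8)*c;
      // Reject control characters (0x00-0x1f), backslash (0x5c) and DEL (0x7f).
      if (UNLIKELY(x <= 0x1f || x == 0x5c || x == 0x7f))
        return true;
    }
  return false;
}

// Small self-check for the sketch above.
void check_has_bad_chars_sketch()
{
  assert(!has_bad_chars_sketch("src/paths.cc"));
  assert(has_bad_chars_sketch("src\\paths.cc"));
}
```

A single pass with three comparisons per byte needs no table and no data dependency beyond the byte itself, which is presumably what makes it a better fit for code marked "must be super fast".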


Seems fine to me.

> +// ??? Ensure use of UTF8 encoding internally, validate encoding here.

^^ Hmm?

>       u8 x = (u8)*c;
> -      if (x < sizeof(bad_table) && bad_table[x])
> -          return true;
> +      // 0x5c is '\\'; we use the hex constant to make the dependency on
> +      // ASCII encoding explicit.
> +      if (UNLIKELY(x <= 0x1f || x == 0x5c || x == 0x7f))

This could do with a comment about how the innocent-looking "u8" there
is critical to the "<=" doing the right thing on machines with signed
chars...
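
To make the concern concrete, here is a minimal sketch (the helper names is_bad_signed and is_bad_u8 are made up for illustration, and the parameter is declared "signed char" explicitly so the behaviour does not depend on the platform's plain char). A byte such as 0xc3, the lead byte of many UTF-8 sequences, has the value -61 as a signed char, so without the cast it satisfies "<= 0x1f" and is wrongly rejected:

```cpp
#include <cassert>

typedef unsigned char u8;

// The patch's test applied directly to a signed char: any byte >= 0x80
// is negative here, so "c <= 0x1f" is true for it.
bool is_bad_signed(signed char c)
{
  return c <= 0x1f || c == 0x5c || c == 0x7f;
}

// The same test after converting to u8 first, as the patch does:
// bytes >= 0x80 become 128..255 and fall outside every forbidden range.
bool is_bad_u8(signed char c)
{
  u8 x = (u8)c;
  return x <= 0x1f || x == 0x5c || x == 0x7f;
}

// Small self-check: -61 is the signed-char value of the byte 0xc3.
void check_signedness()
{
  signed char high = static_cast<signed char>(-61);
  assert(is_bad_signed(high));   // false positive without the cast
  assert(!is_bad_u8(high));      // correct with the cast
}
```

Both helpers agree on genuine ASCII input; they only differ on bytes with the high bit set, which is exactly the case the comment should call out.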

-- Nathaniel

-- 
"Of course, the entire effort is to put oneself
 Outside the ordinary range
 Of what are called statistics."
  -- Stephen Spender


_______________________________________________
Monotone-devel mailing list
http://lists.nongnu.org/mailman/listinfo/monotone-devel