On Sat, Jan 26, 2013 at 09:30:00PM -0800, Junio C Hamano wrote:
> Michael Haggerty <[email protected]> writes:
>
> > This will still fail under Python 2.x if repo.path is a byte string that
> > contains non-ASCII characters. And it will fail under Python 3.1 and
> > later if repo.path contains characters using the surrogateescape
> > encoding option [1],...
> > Here you don't really need byte-for-byte correctness; it would be enough
> > to get *some* byte string that is unique for a given input ...
>
> Yeek.
>
> As we do not care about the actual value at all, how about doing
> something like this instead?
>
> + hasher.update(".".join([str(ord(c)) for c in repo.path]))
This doesn't solve the original problem since we're still ending up with
a Unicode string. If we wanted something like this it would need to be:
hasher.update(b'.'.join([b'%X' % ord(c) for c in repo.path]))
which limits us to Python 2.6 and later and seems to me to be less clear
than introducing an "encode_filepath" helper function using Michael's
suggestion.
John
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html