Your message dated Wed, 3 Apr 2013 13:27:36 +0200 (CEST)
with message-id <[email protected]>
has caused the Debian Bug report #704182,
regarding diffutils: Diff -r will confusion between asian characters in 
filenames, when locale are non asian - UTF-8.
to be marked as having been forwarded to the upstream software
author(s) [email protected]

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact [email protected]
immediately.)


-- 
704182: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=704182
Debian Bug Tracking System
Contact [email protected] with problems
--- Begin Message ---
Hello.

Received this report from the Debian bug system. I initially believed
this to be a duplicate of Debian Bug#633978, but it's not.

Here is a way to reproduce it, provided by the submitter after the
initial report:

--------------------------------------------------------
Here are a few commands you may use to reproduce the bug:

mkdir d1 d2
echo azerty > "d1/エンドカード1"
echo qsdfgh > "d2/ブックレット1"

If the bug is present, diff will print:
> LANG=some_non_asian_LOCALE.utf8 diff -r d1 d2
1c1
< azerty
---
> qsdfgh

If the bug is not present, you will get something like:
> LANG=C diff -r d1 d2
Only in d1: エンドカード1
Only in d2: ブックレット1
--------------------------------------------------------

I can also reproduce it with diffutils 3.3; this is the output in that case:

diff "d1/\343\202\250\343\203\263\343\203\211\343\202\253\343\203\274\343\203\2111" "d2/\343\203\226\343\203\203\343\202\257\343\203\254\343\203\203\343\203\2101"
1c1
< azerty
---
> qsdfgh

The initial report follows:

---------- Forwarded message ----------
From: Philippe Errembault
To: Debian Bug Tracking System <[email protected]>
Date: Fri, 29 Mar 2013 03:10:46 +0100
Subject: Bug#704182: diffutils: Diff -r will confusion between asian characters
    in filenames, when locale are non asian - UTF-8.

Package: diffutils
Version: 1:3.0-1
Severity: normal


I don't know whether this bug is caused by diff or by strcoll().
When file names are compared with strcoll() under non-Asian UTF-8
locales, Asian characters are considered identical, which leads to
confusion between files that are different.

For example, suppose you run diff -r on two directories whose files are
listed in different orders, because they were on different file systems
written by different operating systems (in my case, I wanted to compare
a copy on a server with a directory from an NTFS disk), or simply
because the file lists are not the same and the sort happens
differently. Diff may then treat two different files as the same file
and report differences, because it is comparing different files. In my
situation, it believed that "エンドカード1.jpg" and "ブックレット1.jpg"
were files with the same name and reported differences between them.
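A minimal sketch of the failure mode described above, assuming (as the report implies) that diff -r sorts both directory listings and then walks them in lockstep, treating any two names that compare equal as the same file. The names walk_dirs and weak_cmp are hypothetical; weak_cmp only simulates a collation that gives every name starting with a non-ASCII byte the same weight. It is not the real locale behaviour and not the diffutils source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef int (*name_cmp)(const char *, const char *);

/* Hypothetical stand-in for a broken collation: any two names whose
   first byte is non-ASCII compare as equal (simulating the reported bug). */
static int weak_cmp(const char *a, const char *b)
{
    if ((unsigned char)a[0] >= 0x80 && (unsigned char)b[0] >= 0x80)
        return 0;
    return strcmp(a, b);
}

/* Walk two sorted name lists in lockstep. Names that compare equal are
   paired and their contents would be compared; the rest are reported as
   "Only in" one side. Returns the number of pairs found. */
static size_t walk_dirs(const char **d1, size_t n1,
                        const char **d2, size_t n2, name_cmp cmp)
{
    size_t i = 0, j = 0, pairs = 0;

    assert(cmp != NULL);
    while (i < n1 || j < n2) {
        int c;
        if (i == n1)
            c = 1;              /* d1 exhausted: remaining names are d2-only */
        else if (j == n2)
            c = -1;             /* d2 exhausted: remaining names are d1-only */
        else
            c = cmp(d1[i], d2[j]);

        if (c < 0) {
            printf("Only in d1: %s\n", d1[i++]);
        } else if (c > 0) {
            printf("Only in d2: %s\n", d2[j++]);
        } else {
            printf("compare d1/%s with d2/%s\n", d1[i], d2[j]);
            i++;
            j++;
            pairs++;
        }
    }
    return pairs;
}
```

With strcmp as the comparator, the two example names are reported as "Only in d1" and "Only in d2"; with weak_cmp they are paired and their contents compared, which matches the symptom in the report.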

The point is that I don't know whether it is normal for
strcoll("エンドカード1.jpg", "ブックレット1.jpg") to return 0 when the
locale is anything_non_asian.utf-8.
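One possible mitigation, sketched here as an assumption rather than as the fix diffutils actually adopted, is to keep strcoll() for locale-aware ordering but fall back to a byte-wise strcmp() whenever strcoll() reports a tie, so that only byte-identical names compare equal. The helper name names_cmp is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical comparator: order names by the current locale's collation,
   but break strcoll() ties with strcmp() so distinct names never compare
   equal, even in a locale that gives them identical collation weights. */
static int names_cmp(const char *a, const char *b)
{
    int r;

    assert(a != NULL && b != NULL);
    r = strcoll(a, b);
    if (r != 0)
        return r;
    return strcmp(a, b);   /* 0 only if the names are byte-identical */
}
```

A caller would normally run setlocale(LC_ALL, "") first so that strcoll() follows the user's locale; the strcmp() tie-breaker then keeps distinct names distinct regardless of how that locale collates them.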

[...]

--- End Message ---
