Re: [rfc] lstrcmpi: order still wrong (was Re: Regression in lstrcmpiA (occurred in late June, NLS related) from 2003 year)

2009-07-06 Thread Yuriy Kaminskiy
On 04.07.2009 23:55, Yuriy Kaminskiy wrote:
> Yuriy Kaminskiy wrote:
> I'm wrong - I don't have a working Windows installation at hand and cannot
> check that.
Well, no answer so far. I thought I should write a test (code is more welcome
than just words), and noticed that such a test is already present, but
disabled :-E.
That's wrong. If a test reports breakage, it should not simply be silenced and
forgotten for 6 years.
See "[rfc] [kernel32/tests] enable sort order test series" in *-patches





Re: [rfc] lstrcmpi: order still wrong (was Re: Regression in lstrcmpiA (occurred in late June, NLS related) from 2003 year)

2009-07-04 Thread Yuriy Kaminskiy
Yuriy Kaminskiy wrote:
> I've stumbled over a problem: lstrcmpi sorting is still wrong. Some
> Japanese game engine uses binary search on a presorted array, and fails
> with errors a la "object not found".
[...]
> proper order should be _ < 0 (ok) and . < _ (fails with vanilla
> wine).
  Well, after a private email, I think I should stress that while I /believe/
that sort order in WinXP does not depend on locale, it is also /possible/ that
I'm wrong - I don't have a working Windows installation at hand and cannot check
that.
  [Nevertheless, I ran that game in a Japanese locale [ja_JP.UTF-8], and /if/
sort order in WinXP depends on locale, sort order in Wine should be fixed to
depend on locale too: still a bug, but a slightly different one ;-)].





[rfc] lstrcmpi: order still wrong (was Re: Regression in lstrcmpiA (occurred in late June, NLS related) from 2003 year)

2009-07-03 Thread Yuriy Kaminskiy
Hello!
   Previous thread on this topic:
http://www.mail-archive.com/wine-devel@winehq.org/msg01080.html
   I've stumbled over a problem: lstrcmpi sorting is still wrong. Some
Japanese game engine uses binary search on a presorted array, and fails
with errors a la "object not found".
   Judging by object order in archive,
=== cut ===
...
conf_p.MGD- (would fail with strcasecmp, ok with wine)
conf01.MGD--/
...
title.MGD-- fails with vanilla wine
title_p.MGD--/
...
=== cut ===
proper order should be _ < 0 (ok) and . < _ (fails with vanilla
wine).
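
A minimal sketch of why a collation mismatch breaks a binary search like this one. Everything here is an assumption made for illustration: `toy_rank` is a hypothetical stand-in that only encodes `.` < `_` < `0` (it is not the real Windows table), and `find` is a hand-written binary search, not the engine's code. The four names are stored in the archive order quoted above.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical rank: '_' is placed right after '.', so '.' < '_' < '0'.
 * Every other byte keeps its case-folded ASCII order. Not the real table. */
static int toy_rank(unsigned char c)
{
    return c == '_' ? 2 * '.' + 1 : 2 * tolower(c);
}

/* Case-insensitive compare using toy_rank (the order the archive is sorted in). */
int cmp_archive_order(const char *a, const char *b)
{
    for (;; a++, b++) {
        int ra = toy_rank((unsigned char)*a), rb = toy_rank((unsigned char)*b);
        if (ra != rb) return ra < rb ? -1 : 1;
        if (*a == '\0') return 0;
    }
}

/* Case-insensitive compare on raw byte values, as strcasecmp would do. */
int cmp_bytes_nocase(const char *a, const char *b)
{
    for (;; a++, b++) {
        int ca = tolower((unsigned char)*a), cb = tolower((unsigned char)*b);
        if (ca != cb) return ca < cb ? -1 : 1;
        if (ca == 0) return 0;
    }
}

/* Plain binary search; returns the index of key, or -1 if it is not found. */
int find(const char *const *names, size_t n, const char *key,
         int (*cmp)(const char *, const char *))
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int c = cmp(key, names[mid]);
        if (c == 0) return (int)mid;
        if (c < 0) hi = mid; else lo = mid + 1;
    }
    return -1;
}

/* Object names in the order they appear in the archive. */
const char *const archive[] = {
    "conf_p.MGD", "conf01.MGD", "title.MGD", "title_p.MGD"
};
```

With these definitions, `find(archive, 4, "conf_p.MGD", cmp_archive_order)` returns 0, while the same call with `cmp_bytes_nocase` returns -1: the byte comparison puts `_` after `0`, so the search moves away from the entry it is looking for.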
   I've replaced the collation weight of '_' with 0x02560111, and now these
games run fine; but that's a dirty hack, of course, and should not be
applied upstream: 1) it modifies a generated file; 2) the weight for _
was chosen arbitrarily and can cause conflicts somewhere else (or, rather, not
can, but certainly will - there are other symbols with weight
0x0256???); 3) the weights for other _-like chars should be modified too.
   I hope you can suggest a better solution.
   FWIW, I've checked the unicode-2.1.9d8 tables mentioned in the previous
thread - same mismatch, they will not work either.
   I think the only proper way is to somehow extract this table from Windows
(either directly with LCMapStringW(LCMAP_SORTKEY), or by sorting an array of
a[i]=i; with CompareStringW and using that order). I'm not a lawyer, but I
really doubt that a table reproduced this way can be considered copyrightable
anywhere. How can anyone make a compatible reimplementation without
reproducing this table in some way?
-- 





Re: Regression in lstrcmpiA (occurred in late June, NLS related)

2003-10-03 Thread Shachar Shemesh
Troy Rollo wrote:

> The 2.1.9d8 file seems after a quick look to be closer to the Crossover
> version of the table - for example, it has many of the different types of
> space characters sorted near 0020, which is an aspect of the Crossover table
> not present in the table based on allkeys.txt (3.1.1), so the theory that
> Microsoft's results are just based on an earlier version of the standard
> table is starting to look like it has merit.

Logically, it doesn't make sense that they did anything else. After all
- why would they?

Even if it's not the case, there may be several possible workarounds for
this issue. I have a lawyer I can consult about this matter, but let's
rule out the Unicode 2.0 theory first. I have access to the Unicode 2.0
(printed) book, if that's any help to anyone.

Shachar

--
Shachar Shemesh
Open Source integration consultant
Home page & resume - http://www.shemesh.biz/




Re: Regression in lstrcmpiA (occurred in late June, NLS related)

2003-10-02 Thread Dmitry Timoshkov
Troy Rollo [EMAIL PROTECTED] wrote:

> Well right now it's not using any table at all - it's just going through to
> strncmpiW, which is essentially a word-by-word comparison. Presumably the
> issue now is copyright on the MS version of the table. Do you have anything
> written down on the differences that you can give me so I can look for
> work-arounds?

I'm attaching the current diff between CX Office and WineHQ CVS, manually edited
to remove unrelated parts, ignoring that some parts of
dlls/kernel/tests/locale.c that are missing in the CX Office CVS got removed.
The diff is provided solely to demonstrate exactly what fixes were made and for
testing; it's not yet ready for inclusion into WineHQ for the reasons explained
earlier.

Some areas of interest are the CompareString test suite, the changes to the
Unicode collation table, and the changes in the CompareString implementation.

P.S.
Sorry, I compressed the diff since only a few of you might be interested
in looking at the really boring details...

-- 
Dmitry.


compare_string.diff.gz
Description: GNU Zip compressed data


Re: Regression in lstrcmpiA (occurred in late June, NLS related)

2003-10-02 Thread Jakob Eriksson
Uwe Bonnes wrote:

>    Dmitry> The source of all of this is the difference between MS and
>    Dmitry> unicode.org sort weight tables. There is no easy way to make
>    Dmitry> unicode.org database look like the MS one unfortunately...
> Can we perhaps write a tool that dumps those tables on a running MS system
> as header files that wine can use? Would this be allowable?

Wouldn't the clean-room way be to write regression tests that pass on
Windows?

regards,
Jakob




Re: Regression in lstrcmpiA (occurred in late June, NLS related)

2003-10-02 Thread Jeff Smith
--- Dmitry Timoshkov [EMAIL PROTECTED] wrote:
> Jakob Eriksson [EMAIL PROTECTED] wrote:
>
> > Dmitry> The source of all of this is the difference between MS and
> > Dmitry> unicode.org sort weight tables. There is no easy way to make
> > Dmitry> unicode.org database look like the MS one unfortunately...
> >
> > Can we perhaps write a tool that dumps those tables on a running MS system
> > as header files that wine can use? Would this be allowable?
> >
> > Wouldn't the clean-room way be to write regression tests that pass on
> > Windows?
>
> That's the approach we have chosen so far.
>
> --
> Dmitry.

You mean something like:

===
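#include <windows.h>
#include <stdio.h>
#include <stdlib.h>

/* One single-character, NUL-terminated string per code 0x20..0x7f. */
char test_strings[96][2];

/* qsort() callback: order two strings the way lstrcmpi does. */
int xyz (const void * y, const void * z)
{
    return lstrcmpi(y, z);
}

int main(int argc, char *argv[])
{
    int i;

    for (i=0; i<96; i++)
        sprintf (test_strings[i], "%c", i+0x20);
    qsort (&test_strings[0][0], 96, 2, xyz);
    /* Characters that compare equal are printed on the same line. */
    for (i=0; i<96; i++) {
        printf ("  0x%02x '%s'", (unsigned char)test_strings[i][0], test_strings[i]);
        if ((i == 95) || (lstrcmpi(test_strings[i], test_strings[i+1])))
            printf ("\n");
    }

    return 0;
}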
===
[On Windows 2000 Pro]
  0x7f '⌂'
  0x27 '''
  0x2d '-'
  0x20 ' '
  0x21 '!'
  0x22 '"'
  0x23 '#'
  0x24 '$'
  0x25 '%'
  0x26 '&'
  0x28 '('
  0x29 ')'
  0x2a '*'
  0x2c ','
  0x2e '.'
  0x2f '/'
  0x3a ':'
  0x3b ';'
  0x3f '?'
  0x40 '@'
  0x5b '['
  0x5c '\'
  0x5d ']'
  0x5e '^'
  0x5f '_'
  0x60 '`'
  0x7b '{'
  0x7c '|'
  0x7d '}'
  0x7e '~'
  0x2b '+'
  0x3c '<'
  0x3d '='
  0x3e '>'
  0x30 '0'
  0x31 '1'
  0x32 '2'
  0x33 '3'
  0x34 '4'
  0x35 '5'
  0x36 '6'
  0x37 '7'
  0x38 '8'
  0x39 '9'
  0x61 'a'  0x41 'A'
  0x62 'b'  0x42 'B'
  0x43 'C'  0x63 'c'
  0x44 'D'  0x64 'd'
  0x45 'E'  0x65 'e'
  0x66 'f'  0x46 'F'
  0x47 'G'  0x67 'g'
  0x48 'H'  0x68 'h'
  0x69 'i'  0x49 'I'
  0x4a 'J'  0x6a 'j'
  0x6b 'k'  0x4b 'K'
  0x6c 'l'  0x4c 'L'
  0x6d 'm'  0x4d 'M'
  0x6e 'n'  0x4e 'N'
  0x6f 'o'  0x4f 'O'
  0x50 'P'  0x70 'p'
  0x51 'Q'  0x71 'q'
  0x72 'r'  0x52 'R'
  0x53 'S'  0x73 's'
  0x74 't'  0x54 'T'
  0x75 'u'  0x55 'U'
  0x76 'v'  0x56 'V'
  0x77 'w'  0x57 'W'
  0x58 'X'  0x78 'x'
  0x59 'Y'  0x79 'y'
  0x5a 'Z'  0x7a 'z'
===

 -- Jeff Smith






Re: Regression in lstrcmpiA (occurred in late June, NLS related)

2003-10-02 Thread Dmitry Timoshkov
Jeff Smith [EMAIL PROTECTED] wrote:

> You mean something like:

[skipped]

Exactly. I have something like that here; the only difference is that
I'm dumping the full Unicode range 0-0x, not only the first 96 characters.

-- 
Dmitry.





Re: Regression in lstrcmpiA (occurred in late June, NLS related)

2003-10-02 Thread Dimitrie O. Paun
On October 2, 2003 10:19 am, Dmitry Timoshkov wrote:
> That's the approach we have chosen so far.

So, what's the problem with doing something like this:

For all x,y in Unicode
    print x,y,lstrcmpi(x,y)

(It will generate maybe close to 30GB of output, but that's OK)

Run this on Windows and Wine, compare the result, and generate
a sort of patch file to apply to the unicode.org tables. For
added points, we can run this on multiple versions of Windows,
and only look at things that are immutable between versions...
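
The size estimate can be checked with a little arithmetic. This sketch assumes that x and y each range over the 65,536 values of a 16-bit WCHAR and that one output record costs about 7 bytes; both numbers are assumptions made for illustration, not figures from the thread.

```c
/* Number of ordered pairs (x, y) when both range over n code units. */
unsigned long long pair_count(unsigned long long n)
{
    return n * n;
}

/* Output size in bytes if every pair costs bytes_per_record bytes. */
unsigned long long dump_bytes(unsigned long long n, unsigned bytes_per_record)
{
    return pair_count(n) * bytes_per_record;
}
```

Under those assumptions `pair_count(65536)` is 4,294,967,296 pairs and `dump_bytes(65536, 7)` is 30,064,771,072 bytes, which is in the same ballpark as the 30GB mentioned. A longer text record per pair scales the total linearly.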

-- 
Dimi.




Re: Regression in lstrcmpiA (occurred in late June, NLS related)

2003-10-02 Thread Shachar Shemesh
Dmitry Timoshkov wrote:

> Jeff Smith [EMAIL PROTECTED] wrote:
>
> > You mean something like:
>
> [skipped]
>
> Exactly. I have something like that here, the only difference is that
> I'm dumping full unicode range 0-0x, not only first 96 characters.

Isn't the full Unicode range significantly larger than 0-0x? What
about aggregates? CJK etc.?

Shachar

--
Shachar Shemesh
Open Source integration consultant
Home page & resume - http://www.shemesh.biz/




Re: Regression in lstrcmpiA (occurred in late June, NLS related)

2003-10-02 Thread Troy Rollo
On Thu, 2 Oct 2003 19:34, Dmitry Timoshkov wrote:
> > Can we perhaps write a tool that dumps those tables on a running MS
> > system as header files that wine can use? Would this be allowable?
>
> I really hope that we could find a solution without doing that.

Indeed - since doing that would compromise redistribution in Australia. There
is a seminal case in which a table contained in a computer program was held
to have copyright separately from the computer program itself. Thus to be
distributable here (at least), the table either needs to be capable of
generation or computation from established objective rules (which would tend
to negate copyright), or a method of reproducing the result without the table
would need to be devised.




Re: Regression in lstrcmpiA (occurred in late June, NLS related)

2003-10-02 Thread Troy Rollo
On Thu, 2 Oct 2003 21:49, Jakob Eriksson wrote:
> Wouldn't the clean-room way be to write regression tests that pass on
> Windows?

This doesn't help avoid the copyright on the table if you in fact reproduce 
the table.




Re: Regression in lstrcmpiA (occurred in late June, NLS related)

2003-10-02 Thread Dimitrie O. Paun
On Fri, 3 Oct 2003, Troy Rollo wrote:

> This doesn't help avoid the copyright on the table if you in fact reproduce
> the table.

Why is that? We're talking here about lstrcmpiA() behaviour, why would a
test for

For all x,y in Unicode:
    print x,y,lstrcmpiA(x,y)

violate the copyright?

-- 
Dimi.




Re: Regression in lstrcmpiA (occurred in late June, NLS related)

2003-10-02 Thread Troy Rollo
On Fri, 3 Oct 2003 08:21, Dimitrie O. Paun wrote:
> Why is that? We're talking here about lstrcmpiA() behaviour, why would a
> test for
>
> For all x,y in Unicode:
>     print x,y,lstrcmpiA(x,y)
>
> violate the copyright?

I think the suggestion was that the regression tests be used to fabricate the 
table and then include the resulting fabricated table in Wine. If so, the 
result would still be copied, although by an indirect means.




Re: Regression in lstrcmpiA (occurred in late June, NLS related)

2003-10-02 Thread Dimitrie O. Paun
On Fri, 3 Oct 2003, Troy Rollo wrote:

> On Fri, 3 Oct 2003 08:21, Dimitrie O. Paun wrote:
> > Why is that? We're talking here about lstrcmpiA() behaviour, why would a
> > test for
> >
> > For all x,y in Unicode:
> >     print x,y,lstrcmpiA(x,y)
> >
> > violate the copyright?
>
> I think the suggestion was that the regression tests be used to fabricate the
> table and then include the resulting fabricated table in Wine. If so, the
> result would still be copied, although by an indirect means.

I don't think the result is still copied; if so, then you would never be
able to run tests. But this is not what I suggested anyway. I said to run
the above on Windows and on Wine (which is based on the unicode.org tables).
Compare the results, and generate the differences. Use that as a 'patch'
to future unicode.org table updates.
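
The compare-and-diff step can be sketched on a small range. Assumptions: `cmp_reference` and `cmp_observed` are hypothetical stand-ins for the two systems (in practice each side would come from the lstrcmpiA dump), and the range is limited to 0x20..0x7f to keep the sketch short.

```c
#include <ctype.h>

typedef int (*char_cmp)(int, int);

static int sign(int v) { return (v > 0) - (v < 0); }

/* Stand-in for the reference order: case-folded byte values. */
int cmp_reference(int x, int y)
{
    return sign(tolower(x) - tolower(y));
}

/* Stand-in for the order observed on the other system: identical,
 * except that '_' is moved to sit right after '.'. Hypothetical. */
int cmp_observed(int x, int y)
{
    int rx = x == '_' ? 2 * '.' + 1 : 2 * tolower(x);
    int ry = y == '_' ? 2 * '.' + 1 : 2 * tolower(y);
    return sign(rx - ry);
}

/* Count ordered pairs (x, y) in [lo, hi] on which the two orders disagree. */
int count_disagreements(char_cmp a, char_cmp b, int lo, int hi)
{
    int x, y, n = 0;
    for (x = lo; x <= hi; x++)
        for (y = lo; y <= hi; y++)
            if (sign(a(x, y)) != sign(b(x, y)))
                n++;
    return n;
}
```

For these two stand-ins, `count_disagreements(cmp_reference, cmp_observed, 0x20, 0x7f)` reports 44 ordered pairs, all of them involving `_`. A short list of differing pairs like that is the kind of thing a 'patch' against the reference table would record.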

-- 
Dimi.




Re: Regression in lstrcmpiA (occurred in late June, NLS related)

2003-10-02 Thread Troy Rollo
On Fri, 3 Oct 2003 08:47, Dimitrie O. Paun wrote:

> I said to run
> the above on Windows and on Wine (which is based on the unicode.org
> tables). Compare the results, and generate the differences. Use that as a
> 'patch' to future unicode.org table updates.

Yes, this is a problem for copyright. The result still counts as copied, at 
least in Australia, the UK and New Zealand. It's arguable in the United 
States that given Microsoft's position you could bring it within Feist, but 
if you're using a mechanism that relies on the contents of the table and will 
necessarily produce the same table, it counts as copying.

Incidentally, going through the differences, is the value for character code 
0x34 correct in the Crossover version? All the other characters in the Basic 
Latin range that have differences are punctuation characters (in fact all the 
Basic Latin range punctuation characters have differences). 0x34, however is 
the digit '4', and it would seem odd that it would differ in ways the other 
digits don't.




Re: Regression in lstrcmpiA (occurred in late June, NLS related)

2003-10-02 Thread Dimitrie O. Paun
On October 2, 2003 07:30 pm, Troy Rollo wrote:
> Yes, this is a problem for copyright. The result still counts as copied, at
> least in Australia, the UK and New Zealand.

This doesn't make any sense. It means that we can _never_ have correct
behaviour, no matter what we do, even if we magically come up with the
same table. This is insane.

-- 
Dimi.




Re: Regression in lstrcmpiA (occurred in late June, NLS related)

2003-10-02 Thread Troy Rollo
On Fri, 3 Oct 2003 14:02, Dimitrie O. Paun wrote:
> This doesn't make any sense.

Well when the High Court of Australia considered it they said it was
"unsatisfactory", which is their way of saying it sucks, but that's the way it
is.

> It means that we can _never_ have correct
> behaviour, no matter what we do, even if we magically come up with the
> same table. This is insane.

In some cases it amounts to that.  This is why it's important to try to come 
up with some way of expressing the contents of the table without the table, 
or of finding objective rules that can generate the table.

Having compared a few versions of the allkeys database it seems that there 
have been some changes to the ordering of characters between versions, which 
leads me to wonder if Microsoft were just using an earlier version of the 
table. Microsoft's documentation suggests they adhere to version 2.0 of the 
Unicode standard, whereas the allkeys.txt file immediately accessible on the 
unicode.org web site is version 3.1.1.

Here's the versions I can find: 

2.1.9d8 http://www.unicode.org/reports/tr10/basekeys.txt
2.1.9d8 http://www.unicode.org/reports/tr10/compkeys.txt
3.1.1   http://www.unicode.org/reports/tr10/allkeys-3.1.1.txt
3.1.1d3 http://www.unicode.org/reports/tr10/allkeys-3.1.1d3.txt
4.0.0d5 http://www.unicode.org/reports/tr10/allkeys-4.0.0d5.txt

The 2.1.9d8 file seems after a quick look to be closer to the Crossover 
version of the table - for example, it has many of the different types of 
space characters sorted near 0020, which is an aspect of the Crossover table 
not present in the table based on allkeys.txt (3.1.1), so the theory that 
Microsoft's results are just based on an earlier version of the standard 
table is starting to look like it has merit.




Regression in lstrcmpiA (occurred in late June, NLS related)

2003-10-01 Thread Troy Rollo
When lstrcmpiA was moved from ole2nls.c to locale.c, (around 28th June) the 
results of comparisons in some cases became reversed. For example, the 
underscore now returns as greater than alphabetic characters, whereas it used 
to return as less than alphabetic characters. The older behaviour was 
consistent with Win2k.

The output below is from the following source:

---begin test program---
#include <windows.h>
#include <stdio.h>

char *test_strings[] =
{
	"_",
	"A",
	"a",
	"z",
	"Z",
	0
};


void
test_string(char *pch)
{
	char **ppch = test_strings;

	while (*ppch)
	{
		printf("%s\t%s\t%d\n", pch, *ppch, lstrcmpiA(pch, *ppch));
		++ppch;
	}
}

int
main(int argc, char **argv)
{
	char **ppch = test_strings;

	while (*ppch)
		test_string(*ppch++);
	return 0;
}
---end test program---

---Wine output from immediately before the change---
_   _   0
_   A   -1
_   a   -1
_   z   -1
_   Z   -1
A   _   1
A   A   0
A   a   0
A   z   -1
A   Z   -1
a   _   1
a   A   0
a   a   0
a   z   -1
a   Z   -1
z   _   1
z   A   1
z   a   1
z   z   0
z   Z   0
Z   _   1
Z   A   1
Z   a   1
Z   z   0
Z   Z   0
---End---

---Wine output from immediately after the change---
_   _   0
_   A   1
_   a   1
_   z   1
_   Z   1
A   _   -1
A   A   0
A   a   0
A   z   -1
A   Z   -1
a   _   -1
a   A   0
a   a   0
a   z   -1
a   Z   -1
z   _   -1
z   A   1
z   a   1
z   z   0
z   Z   0
Z   _   -1
Z   A   1
Z   a   1
Z   z   0
Z   Z   0
---End---




Re: Regression in lstrcmpiA (occurred in late June, NLS related)

2003-10-01 Thread Troy Rollo
Further investigation reveals another problem in lstrcmpiA: MSDN documents
this function as executing what it describes as a "word sort", which results
in the words "co-op" and "coop" sorting to the same place. This is almost a
correct description of what happens (if the strings come out to be the same
after the word sort it appears that it does a regular comparison as well).
The attached files demonstrate the divergence of wine in this regard as well
as the original regression.
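
The two-step behaviour described in this message can be sketched as follows. This is a toy model under stated assumptions, not the Windows algorithm: the first pass skips only hyphens and apostrophes and compares case-folded bytes instead of real collation weights, and the tie-break rule (the string with more skipped characters sorts later) is a hypothetical choice made so that "coop" sorts before "co-op".

```c
#include <ctype.h>

/* Characters the first pass skips in this toy model. */
static int ignored(unsigned char c) { return c == '-' || c == '\''; }

static int sign(int v) { return (v > 0) - (v < 0); }

/* Returns -1, 0 or 1, like the values printed by the test program. */
int word_cmp(const char *a, const char *b)
{
    const unsigned char *p = (const unsigned char *)a;
    const unsigned char *q = (const unsigned char *)b;
    int skipped_a = 0, skipped_b = 0;

    /* Pass 1: compare as if hyphens and apostrophes were not there. */
    for (;;) {
        while (ignored(*p)) { p++; skipped_a++; }
        while (ignored(*q)) { q++; skipped_b++; }
        int ca = tolower(*p), cb = tolower(*q);
        if (ca != cb) return sign(ca - cb);
        if (ca == 0) break;
        p++; q++;
    }

    /* Pass 2 (assumed tie-break): more skipped characters sorts later. */
    return sign(skipped_a - skipped_b);
}
```

In this model `word_cmp("coop", "co-op")` is -1 and `word_cmp("co-op a", "coop b")` is also -1, because the trailing letter decides before the hyphen is considered at all.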


#include <windows.h>
#include <stdio.h>

char *test_strings1[] =
{
	"_",
	"A",
	"a",
	"z",
	"Z",
	0
};

char *test_strings2[] =
{
	"coop",
	"co-op",
	"co-op a",
	"coop a",
	"co-op b",
	"coop b",
	0
};


void
test_string(char *pch, char **test_strings)
{
	char **ppch = test_strings;

	while (*ppch)
	{
		printf("%s\t%s\t%d\n", pch, *ppch, lstrcmpiA(pch, *ppch));
		++ppch;
	}
}

void
do_test(char **test_strings)
{
	char **ppch = test_strings;

	while (*ppch)
		test_string(*ppch++, test_strings);
}

int
main(int argc, char **argv)
{
	do_test(test_strings1);
	do_test(test_strings2);

	return 0;
}
_   _   0
_   A   -1
_   a   -1
_   z   -1
_   Z   -1
A   _   1
A   A   0
A   a   0
A   z   -1
A   Z   -1
a   _   1
a   A   0
a   a   0
a   z   -1
a   Z   -1
z   _   1
z   A   1
z   a   1
z   z   0
z   Z   0
Z   _   1
Z   A   1
Z   a   1
Z   z   0
Z   Z   0
coop    coop    0
coop    co-op   -1
coop    co-op a -1
coop    coop a  -1
coop    co-op b -1
coop    coop b  -1
co-op   coop    1
co-op   co-op   0
co-op   co-op a -1
co-op   coop a  -1
co-op   co-op b -1
co-op   coop b  -1
co-op a coop    1
co-op a co-op   1
co-op a co-op a 0
co-op a coop a  1
co-op a co-op b -1
co-op a coop b  -1
coop a  coop    1
coop a  co-op   1
coop a  co-op a -1
coop a  coop a  0
coop a  co-op b -1
coop a  coop b  -1
co-op b coop    1
co-op b co-op   1
co-op b co-op a 1
co-op b coop a  1
co-op b co-op b 0
co-op b coop b  1
coop b  coop    1
coop b  co-op   1
coop b  co-op a 1
coop b  coop a  1
coop b  co-op b -1
coop b  coop b  0
_   _   0
_   A   1
_   a   1
_   z   1
_   Z   1
A   _   -1
A   A   0
A   a   0
A   z   -1
A   Z   -1
a   _   -1
a   A   0
a   a   0
a   z   -1
a   Z   -1
z   _   -1
z   A   1
z   a   1
z   z   0
z   Z   0
Z   _   -1
Z   A   1
Z   a   1
Z   z   0
Z   Z   0
coop    coop    0
coop    co-op   1
coop    co-op a 1
coop    coop a  -1
coop    co-op b 1
coop    coop b  -1
co-op   coop    -1
co-op   co-op   0
co-op   co-op a -1
co-op   coop a  -1
co-op   co-op b -1
co-op   coop b  -1
co-op a coop    -1
co-op a co-op   1
co-op a co-op a 0
co-op a coop a  -1
co-op a co-op b -1
co-op a coop b  -1
coop a  coop    1
coop a  co-op   1
coop a  co-op a 1
coop a  coop a  0
coop a  co-op b 1
coop a  coop b  -1
co-op b coop    -1
co-op b co-op   1
co-op b co-op a 1
co-op b coop a  -1
co-op b co-op b 0
co-op b coop b  -1
coop b  coop    1
coop b  co-op   1
coop b  co-op a 1
coop b  coop a  1
coop b  co-op b 1
coop b  coop b  0


Re: Regression in lstrcmpiA (occurred in late June, NLS related)

2003-10-01 Thread Troy Rollo
On Wed, 1 Oct 2003 18:25, Dmitry Timoshkov wrote:
> > The older behaviour was
> > consistent with Win2k.
>
> ... and only with Latin1 locale, failing with others.

Yes, but this also means it worked for ASCII-7. Right now it doesn't even
work for that. This creates problems for some applications, such as those 
that incorrectly use lstrcmpA to do binary searches on internal ordered 
keyword tables where the keywords can include punctuation characters or 
underscores. It means they fail to find some of their keywords, the result 
being spurious error results. Since the ASCII-7 range is the same regardless 
of character set, this wrong use of lstrcmpA happens to work on Windows if 
all the keywords in such a table are limited to that range.