On Fri, 3 Dec 2004 01:41:41 +0100 Christian Biere <[EMAIL PROTECTED]> wrote:
> Does that mean, there's a problem with such results when not using ICU? ICU
> is actually only used to match queries and canonicalize (outgoing) queries
> if I remember correctly. It shouldn't affect viewing at all.

It may just be my misunderstanding. I can't pin it down, since there are
four combinations to consider: GTK1 with or without ICU, and GTK2 with or
without ICU. I also can't remember exactly which version of gtkg it was,
but formerly everything that seemed to be Japanese (or Chinese) was shown
as underscores. Then one day it suddenly changed: "Oh, I can read these
characters, they're Japanese!"

Following your remark, I recompiled gtkg without the ICU library and ran it
for a while, and I can confirm that ICU doesn't affect viewing. I must say
my guess was wrong. I used to wonder why the ICU library was required when
libiconv and libintl were already there.

Where outgoing queries are concerned, however, gtkg with GTK2 and ICU
automatically drops my search queries (some Japanese, Chinese, Russian)
with '(WARNING): dropping invalid local query ""'.

> Which GUI did you use, GTK1 or GTK2?

Mainly GTK2, but I get similar results with GTK1 as well. At the risk of
complicating this subject, there is another problem in GTK1: when I pull up
"Information about selected file" from the bottom of the search results
pane, even entries that are displayed completely in the search results pane
come up blank (file name, SHA1, size...) in the file details panel.

> All peers must use UTF-8 and only UTF-8 encoded queries and results. There's
> probably still quite an amount of improperly encoded of those on the network

Indeed, and I don't know how many there are either. That is why I asked the
question: I worry about the next stage, i.e. "The time has come, we need to
ban...".

> The underscore is probably created by gtk-gnutella itself due to a conversion
> problem (invalid or unexpected encoding).
> We don't use the official unicode
> replacement character there because it would often unnecessarily enforce
> UTF-8 (instead of plain ASCII) encoding of string and it's much more
> inconvenient to handle in filenames (at least in a terminal).

OK, I see.

> If string is not UTF-8 encoded, gtk-gnutella can only guess the used encoded
> which means it falls back to used locale character set boldly assuming that
> the user is rather interested in search results from users/machines using the
> same locale settings.

Ah, my problem might lie around here. If there were a feature that let me
check the names of the files I currently share (I know the number of files
and their size are shown, and LimeWire has all these features), or one that
emitted a notification when I try to share a file whose name is invalidly
encoded, encoding problems would be less frequent than now for people who,
like me, are annoyed by bogus strings. Admittedly, that would only apply to
outgoing hits from the local DB in gtkg...

> What means "unreadable"? Only underscores and question marks, or what?

Almost all underscores, plus a few ASCII characters produced by the invalid
conversion and some ordinary ASCII characters; there are no question marks.
I've also noticed that these underscored search results come from Shareaza,
which is available only for Windows (95, 98, ME, NT, 2000, XP). Of course
there are exceptions: some of them come from LimeWire as well. These
exceptions confuse me all the more :-(

> LimeWire (due to Java) uses UTF-16 internally and emits only UTF-8 encoded
> search results - I'm not sure whether composed or decomposed. During my
> tests I didn't notice too many broken results that is most results with
> (probably) Japanese filenames don't contain any characters that imply a
> conversion error.

My search queries are 'limewire' and 'japanese', which bring up many
Japanese filenames.

> gtk-gnutella will only convert strings that are not valid UTF-8 encoded. I
> don't know your locale settings.
> If you used EUC and the remote peer sends
> ShiftJIS (which is illegal and a bug in the remote peer), the conversion
> fails and you'll see a broken string (with a lot of underscores or
> question marks).
> For gtk-gnutella it's optimal to use a locale with UTF-8 encoding
> (and if necessary override the language setting).

My locale is ja_JP.EUC (LANG, LC_ALL). There are two versions of libiconv
on my system; I'm using a locally installed GNU libiconv 1.9.2 so that the
extra encodings are enabled. I'm a bit hesitant to force my whole
environment to UTF-8, since my system doesn't have it. Well, OK, I'll have
to write a wrapper script.

Thank you.

-- 
Daichi
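
For reference, the behaviour discussed in this thread (valid UTF-8 is left alone, anything else is guessed to be in the locale character set, and unconvertible bytes become underscores) can be sketched as follows. This is only an illustration in Python, not gtk-gnutella's actual code; the function name `to_display` and the default of EUC-JP as the locale charset are assumptions made for the example.

```python
def to_display(raw: bytes, locale_charset: str = "euc_jp") -> str:
    """Decode a filename received from the network for display.

    Hypothetical helper, not part of gtk-gnutella.
    """
    try:
        # Valid UTF-8 is accepted as-is; no conversion is attempted.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass
    # Not valid UTF-8: guess that the peer used our locale charset.
    # Bytes that do not fit that charset are decoded to U+FFFD, which
    # is then swapped for a plain ASCII underscore.
    text = raw.decode(locale_charset, errors="replace")
    return text.replace("\ufffd", "_")


if __name__ == "__main__":
    name = "日本語"
    print(to_display(name.encode("utf-8")))      # valid UTF-8, unchanged
    print(to_display(name.encode("euc_jp")))     # matches the assumed locale
    print(to_display(name.encode("shift_jis")))  # wrong guess, underscores
```

With an EUC-JP locale, a ShiftJIS name from a misbehaving peer is not valid UTF-8 and does not fit EUC-JP either, so the sketch yields mostly underscores, possibly with stray ASCII characters, which is consistent with the results described above.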
