https://bugs.freedesktop.org/show_bug.cgi?id=40665
[email protected] changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|LibO 3.4.3 release          |LibO 3.5.0 Beta2

--- Comment #5 from [email protected] 2011-12-26 19:35:32 PST ---
LO: 3.5 Beta2 (in Chinese or English UI) with zh-TW help installed
OS: Windows XP SP3

I had limited access, so I tested as many kinds of cases with Beta2 as I could.
To help developers who do not read Chinese, let "BC" and "DEF" stand for
meaningful Chinese phrases of two and three characters respectively, as they
might be entered in a search.

A. One meaningful phrase, or two or more meaningful phrases separated by
space character(s): "BC", "DEF", "BC DEF"
Real samples: "手動", "編號", "段落", "按一下"
Result: usually the same message: "No topics found" (the message is given
here in English translation).

If "conditional text" is searched in the English help file, both
"conditional" and "text" are marked in the returned topic text. In contrast,
when "縮排" is searched, only "縮", but not "排", is marked in the returned
topic text. The same occurs when searching "快鍵". It seems that only the
first character of the search term is marked, and only the first character
is recognised as the search term. The result of searching "縮排", "表單",
and "錨點" is the same as that of searching "縮" only.

Some words in the index list of the Help UI can be found. Most cannot. (Were
the words in the index list generated by the Lucene engine?)

Searching for one-character words causes no problem.

B. The same phrases as in A, but with every character separated from the
others by space characters: "B C", "D E F", "B C D E F". Also the same
characters in a changed order: "C B", "F E D", "E F D", etc.
Real samples: "手 號", "手 號 段", "手 號 段 按", "按 一 下", "下 一 按",
"動 手", "號 編"
Result: every topic containing those characters is shown, with many
redundant ones, of course. (But this is not unimportant, because it is the
only workaround for the moment.)

C. Adding one character to the phrases in A gives "A BC", "A DEF", "BC A",
etc. to test.
Real samples: "手 編號"
Result: No topics found.

So the search engine handles two or more uncombined Chinese characters well.
It often has difficulties in all the other cases: it often cannot handle two
or more combined Chinese characters. (In contrast, the engine handles a set
of combined English characters, for example "style", correctly.)

List of single keywords which may be entered for searching:
印 縮排 快鍵 錨點 摘要 大綱 手動 編號 段落 名片 標籤 顯示 還原 按一下
功能表 印表機 記憶體 定位點 控制項 資料庫 項目符號 編號類型 向左對齊
自動儲存 檔案特性 保護記錄 直接格式 同義詞詞典 合併列印精靈
頁數的有條件的文字

List of multiple keywords which are separated by a space character and may
be entered for searching:
頁數 有條件文字 手動 編號

(What I say in the following may be wrong.)
I surmise that Lucene has already tried to solve the problem of more
satisfactory segmentation. I am not a programmer. Please examine the
following:

CJKTokenizer:
http://lucene.apache.org/java/3_0_2/api/contrib-analyzers/org/apache/lucene/analysis/cjk/package-summary.html#package_description

Three kinds of analyzer are presented on the above webpage. A (simplified)
Chinese sample sentence is segmented well with SmartChineseAnalyzer. In
contrast, the ChineseAnalyzer "[i]ndex unigrams (individual Chinese
characters) as a token". I am not sure whether only the latter analyzer is
used in LibreOffice, so that it does not generate an index containing more
satisfactorily tokenized items like "BC", "DEF", etc., but only "A", "B",
"C", "D", etc. There is indeed an index on the Help UI in which tokens like
"BC", "DEF", etc. appear. I don't know whether this is the one generated by
the Lucene engine.

Would the same problem occur in the Korean or Japanese help UI?

27.12.2011

Blogs in which CJKTokenizer was discussed (I am not sure whether they
contributed to the current version of CJKTokenizer):
http://tw.myblog.yahoo.com/ys-blog/article?mid=966&sc=1
http://blog.csdn.net/liangjian103103103/article/details/6547611
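The behaviour in cases A and B would be consistent with an index built from single characters while the query is split only on spaces. The following Python sketch illustrates that guess and nothing more: it is not LibreOffice or Lucene code, the sample topics are invented, and every function name is hypothetical. It also shows how overlapping two-character tokens (the general idea behind a bigram tokenizer) applied to both the index and the query would let "縮排" match as a phrase.

```python
from collections import defaultdict


def unigrams(text):
    """One token per non-space character."""
    return [ch for ch in text if not ch.isspace()]


def bigrams(text):
    """Overlapping two-character tokens per space-free run; a lone character stays as it is."""
    tokens = []
    for run in text.split():
        if len(run) == 1:
            tokens.append(run)
        else:
            tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
    return tokens


def build_index(docs, tokenize):
    """Map every token to the set of topic ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query, tokenize_query):
    """AND search: return the topics that contain every query token."""
    tokens = tokenize_query(query)
    if not tokens:
        return set()
    result = None
    for token in tokens:
        hits = index.get(token, set())
        result = hits if result is None else result & hits
    return result


# Invented sample topics (assumption, not taken from the real help files).
docs = {
    "indent": "增加縮排",
    "numbering": "手動編號",
    "paragraph": "段落縮小排列",
}

uni_index = build_index(docs, unigrams)
bi_index = build_index(docs, bigrams)

# Case A: the query stays one token, which a single-character index never contains.
print(search(uni_index, "縮排", str.split))
# Case B: space-separated characters match, including a redundant topic.
print(search(uni_index, "縮 排", str.split))
# Bigram tokens on both sides: only the topic with the real phrase matches.
print(search(bi_index, "縮排", bigrams))
```

With these sample topics, the first search finds nothing, the second finds both "indent" and "paragraph" (the second being a redundant hit of the kind described in B), and the third finds only "indent". Whether the help search actually works this way would have to be confirmed by a developer.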
