https://bugs.freedesktop.org/show_bug.cgi?id=40665

[email protected] changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|LibO 3.4.3 release          |LibO 3.5.0 Beta2

--- Comment #5 from [email protected] 2011-12-26 19:35:32 PST ---
LO: 3.5 Beta2 (in Chinese or English UI) with zh-TW help installed
OS: Windows XP-sp3
I had only limited means to test as many cases as possible with Beta2.

To help non-Chinese-speaking developers picture this, let "BC" and "DEF" stand
for meaningful two- and three-character Chinese phrases, respectively, that
may be entered as search terms.

A. One meaningful phrase, or two or more meaningful phrases separated by
space(s): "BC", "DEF", "BC DEF"
Real samples: "手動" (manual), "編號" (numbering), "段落" (paragraph),
"按一下" (click)
Result: usually the same message, "No topics found" (translated here into
English).

When "conditional text" is searched in the English help, both "conditional"
and "text" are highlighted in the returned topic text. By contrast, when
"縮排" (indent) is searched, only "縮" is highlighted, not "排". The same
happens when searching "快鍵" (shortcut key). It seems that only the first
character of the search word is highlighted, and only that first character is
recognized as the search term. The results for "縮排", "表單" (form) and "錨點"
(anchor) are likewise the same as those for their first character alone (e.g.
"縮" for "縮排").

Some words from the index list in the Help UI can be found by searching, but
most cannot. (Were the words in the index list generated by the Lucene engine?)

Searching for one-character words causes no problems.

B. The same phrases as in A, but with each character separated from the others
by spaces: "B C", "D E F", "B C D E F". Also the same characters in a different
order: "C B", "F E D", "E F D", etc.
Real samples: "手 號", "手 號 段", "手 號 段 按", "按 一 下", "下 一 按", "動 手", "號 編"
Result: topics containing those characters are shown, including, of course,
many unwanted ones.
(This still matters, because it is currently the only workaround.)
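The workaround in B can be applied mechanically by inserting a space between
adjacent Chinese characters before searching. Below is a minimal Python
sketch; space_out_cjk is a hypothetical helper (not part of LibreOffice), and
it assumes that only the CJK Unified Ideographs block and Extension A need
splitting.

```python
def is_cjk(ch: str) -> bool:
    # Assumption: only CJK Unified Ideographs (U+4E00..U+9FFF) and
    # Extension A (U+3400..U+4DBF) count; other blocks are ignored.
    cp = ord(ch)
    return 0x4E00 <= cp <= 0x9FFF or 0x3400 <= cp <= 0x4DBF


def space_out_cjk(query: str) -> str:
    """Insert a space between adjacent CJK characters; leave other text alone."""
    out = []
    prev_cjk = False
    for ch in query:
        cur_cjk = is_cjk(ch)
        if cur_cjk and prev_cjk:
            out.append(" ")
        out.append(ch)
        prev_cjk = cur_cjk
    return "".join(out)
```

For example, "按一下" becomes "按 一 下" and "手動 編號" becomes "手 動 編 號",
while English queries pass through unchanged. This trades precision for
recall: topics in which the characters appear far apart are returned too.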

C. Adding one separate character to the phrases in A gives queries such as
"A BC", "A DEF", "BC A", etc.
Real samples: "手 編號"
Result: No topics found. 

In summary, the search engine handles two or more separated (space-delimited)
Chinese characters well, but it often fails in all other cases; in particular,
it often cannot handle two or more adjacent Chinese characters that form a
word. (By contrast, it handles a run of English letters, for example "style",
correctly.)
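One hypothesis consistent with results A to C (not confirmed against the
LibreOffice sources) is a mismatch between how the index and the query are
tokenized: the index stores single Chinese characters, while the query is only
split on spaces and each piece is looked up as one term, with all terms
required. The Python sketch below models that assumption; naive_search and
the toy topics are invented for illustration, and it does not try to explain
the first-character highlighting.

```python
import re

# Assumed index tokenizer: one token per CJK ideograph (unigrams, as the
# Lucene ChineseAnalyzer is documented to do) and lower-cased runs of Latin
# letters for English text.
TOKEN_RE = re.compile(r"[\u3400-\u4dbf\u4e00-\u9fff]|[A-Za-z]+")


def index_tokens(text: str) -> set[str]:
    """Return the set of index terms for one help topic."""
    return {t.lower() for t in TOKEN_RE.findall(text)}


def naive_search(topics: dict[str, str], query: str) -> list[str]:
    """Hypothetical query handling: split on whitespace only, treat each piece
    as a single term, and require every term to be present (AND)."""
    terms = [t.lower() for t in query.split()]
    hits = [
        name
        for name, text in topics.items()
        if terms and all(term in index_tokens(text) for term in terms)
    ]
    return sorted(hits)


# Invented toy topics, for illustration only.
TOY_TOPICS = {
    "indent": "設定縮排",
    "scattered": "縮小後排列",
    "numbering": "手動編號",
    "style": "Applying a style",
}
```

Under this model, "縮排" finds nothing (case A) because no index term "縮排"
exists; "縮 排" finds both "indent" and the unrelated "scattered" topic (case
B); "手 編號" finds nothing (case C) because "編號" is again not an index term;
and "style" works as in English. If this hypothesis holds, analyzing the query
with the same tokenizer as the index would remove the mismatch.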


List of single keywords that may be entered as search terms:
印
縮排
快鍵
錨點
摘要
大綱
手動
編號
段落
名片
標籤
顯示
還原
按一下
功能表
印表機
記憶體
定位點
控制項
資料庫
項目符號
編號類型
向左對齊
自動儲存
檔案特性
保護記錄
直接格式
同義詞詞典
合併列印精靈
頁數的有條件的文字 

List of multi-keyword queries, with keywords separated by spaces, that may be
entered as search terms:
頁數 有條件文字
手動 編號

(What follows may be wrong; I am not a programmer.) I surmise that the Lucene
project has already tried to achieve more satisfactory word segmentation.
Please examine the following:
CJKTokenizer:
http://lucene.apache.org/java/3_0_2/api/contrib-analyzers/org/apache/lucene/analysis/cjk/package-summary.html#package_description

The page above presents three kinds of analyzer. A (Simplified) Chinese sample
sentence is segmented well by SmartChineseAnalyzer, whereas ChineseAnalyzer
"[i]ndex unigrams (individual Chinese characters) as a token". I wonder
whether LibreOffice uses only the latter analyzer, so that its index contains
only single-character tokens such as "A", "B", "C", "D", etc., rather than
better-segmented tokens such as "BC" and "DEF".
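To make the difference between these strategies concrete, here is a minimal
Python sketch (not Lucene code) of unigram tokenization as ChineseAnalyzer is
described, overlapping-bigram tokenization in the style of the page's third
analyzer, CJKAnalyzer, and a phrase match over unigram positions. It assumes
pure CJK input separated by spaces; the function names are hypothetical, and
keeping a lone character as its own bigram token is a simplification.

```python
def unigrams(text: str) -> list[str]:
    """One token per non-space character (ChineseAnalyzer-style)."""
    return [ch for ch in text if not ch.isspace()]


def bigrams(text: str) -> list[str]:
    """Overlapping two-character tokens within each space-separated run
    (CJKAnalyzer-style). A single-character run is kept as one token."""
    out: list[str] = []
    for run in text.split():
        if len(run) == 1:
            out.append(run)
        else:
            out.extend(run[i:i + 2] for i in range(len(run) - 1))
    return out


def phrase_match(doc: str, phrase: str) -> bool:
    """True if the phrase's unigrams occur at consecutive positions in doc."""
    d, p = unigrams(doc), unigrams(phrase)
    if not p:
        return False
    return any(d[i:i + len(p)] == p for i in range(len(d) - len(p) + 1))
```

With these helpers, "按一下" yields the unigrams 按, 一, 下 and the bigrams 按一,
一下. The phrase match suggests that even a unigram-only index could find "BC"
if the query were run as a phrase of consecutive characters rather than as a
single term: it accepts "請按一下確定" but rejects "下一頁按鈕", which the
space-separated workaround would also return.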

The Help UI does show an index that contains tokens like "BC" and "DEF". I
don't know whether that index is the one generated by the Lucene engine.

Does the same problem occur in the Korean or Japanese help UI?

27.12.2011

Blog posts in which CJKTokenizer was discussed (I am not sure whether they
contributed to the current version of CJKTokenizer):
http://tw.myblog.yahoo.com/ys-blog/article?mid=966&sc=1
http://blog.csdn.net/liangjian103103103/article/details/6547611

_______________________________________________
Libreoffice-bugs mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/libreoffice-bugs
