http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=13064
Bug ID: 13064
Summary: Indexing problem with ICU on control characters
Change sponsored?: ---
Product: Koha
Version: master
Hardware: All
OS: All
Status: NEW
Severity: normal
Priority: P5 - low
Component: Searching
Assignee: [email protected]
Reporter: [email protected]
QA Contact: [email protected]
The ICU configuration files contains a rule to remove control characters :
<transform rule="[:Control:] Any-Remove"/>
This rule is before tokenization.
The problem is that "[:Control:]" regex contains linefeed, carriage return and
tab. See http://www.regular-expressions.info/posixbrackets.html.
So when several lines are indexed, last word of line is joined with first line
of next line. Thoses words are then not searchable.
For example :
First line
Second line
This will become "First lineSecond line", tokenized as "First", "lineSecond"
and "line".
--
You are receiving this mail because:
You are watching all bug changes.
_______________________________________________
Koha-bugs mailing list
[email protected]
http://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-bugs
website : http://www.koha-community.org/
git : http://git.koha-community.org/
bugs : http://bugs.koha-community.org/