I tried it with both the latest version of PREG_CLASS_SEARCH_EXCLUDE from Drupal 6 and the value of PREG_CLASS_UNICODE_WORD_BOUNDARY from Drupal 7. The latest Drupal 6 version resolves the bug. With the new Drupal 7 constant the bug still occurs because, as Robert noted, it doesn't include the Prolonged Sound mark.
So, I've pushed a patch to gerrit with the latest Drupal 6 version: https://reviews.mahara.org/2394

Here's how to test it:

1. Go to Administration/Configure Site.
2. Under "Search Settings", make sure you've got the "internal" search plugin activated.
3. Under "General Settings", tick the "Enable Profile Search" feature.
4. Create a journal entry whose text contains the string サーバー, which means "server".
5. Create another journal entry whose text contains the string サバ, which means "mackerel".
6. Navigate to Portfolio->Pages.
7. You should have a sideblock called "Search my portfolio". Search for サバ

Expected Result: You will only find the journal entry containing サバ

Erroneous Result: You will find both journal entries, the one with サーバー and the one with サバ

https://bugs.launchpad.net/bugs/1072972

Title:
  Internal search ignores 'KATAKANA-HIRAGANA PROLONGED SOUND MARK'

Status in Mahara ePortfolio:
  In Progress
Status in Mahara 1.7 series:
  New

Bug description:
  Mahara's (1.5.6) internal search cannot handle the Japanese character 'KATAKANA-HIRAGANA PROLONGED SOUND MARK'. This character, 'ー', is used frequently, for example in 'データ (data)', 'サーバー (server)' or 'ポートフォリオ (portfolio)'.

  The cause of the problem is line 1102 in search/internal/lib.php.

  1102: $text = preg_replace('/['. PREG_CLASS_SEARCH_EXCLUDE . ']+/u', ' ', $text);

  In this line, Mahara replaces the special characters specified by PREG_CLASS_SEARCH_EXCLUDE with ' ', and 'KATAKANA-HIRAGANA PROLONGED SOUND MARK' is included in PREG_CLASS_SEARCH_EXCLUDE.

  The solution to this problem is very simple: just remove 'KATAKANA-HIRAGANA PROLONGED SOUND MARK' (code 0x30fc) from PREG_CLASS_SEARCH_EXCLUDE. The definition is on lines 1198-1225.

  1221: '\x{3099}-\x{309e}\x{30a0}\x{30fb}-\x{30fe}\x{3190}-\x{319f}\x{31c0}-\x{31cf}'.

  should be replaced with

  1221: '\x{3099}-\x{309e}\x{30a0}\x{30fb}\x{30fd}\x{30fe}\x{3190}-\x{319f}\x{31c0}-\x{31cf}'.

  P.S. The definition of PREG_CLASS_SEARCH_EXCLUDE is originally from Drupal, and this fix has already been applied there.
  http://api.drupal.org/api/drupal/modules!search!search.module/constant/PREG_CLASS_SEARCH_EXCLUDE/6
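
For anyone who wants to see the effect of the line 1221 change without setting up a site, here is a rough sketch in Python (not Mahara's PHP). It assumes only the fragment of the character class quoted on line 1221, not the full PREG_CLASS_SEARCH_EXCLUDE constant, and the helper name strip_excluded is made up for illustration. It mimics the preg_replace on line 1102 by replacing runs of excluded characters with a single space.

```python
import re

# Assumption: only the fragment of the class quoted on line 1221 is used here;
# the real PREG_CLASS_SEARCH_EXCLUDE constant contains many more ranges.
ORIGINAL_FRAGMENT = "\u3099-\u309e\u30a0\u30fb-\u30fe\u3190-\u319f\u31c0-\u31cf"
# Patched fragment: the range U+30FB-U+30FE is split so U+30FC is no longer in it.
PATCHED_FRAGMENT = "\u3099-\u309e\u30a0\u30fb\u30fd\u30fe\u3190-\u319f\u31c0-\u31cf"


def strip_excluded(text, char_class):
    """Hypothetical helper: replace runs of excluded characters with one space,
    in the same spirit as the preg_replace call on line 1102."""
    return re.sub("[" + char_class + "]+", " ", text)


server = "サーバー"    # contains U+30FC twice
mackerel = "サバ"      # contains no excluded characters

original_result = strip_excluded(server, ORIGINAL_FRAGMENT)
patched_result = strip_excluded(server, PATCHED_FRAGMENT)

print(repr(original_result))  # the prolonged sound marks become spaces
print(repr(patched_result))   # the word is left intact
print(repr(strip_excluded(mackerel, ORIGINAL_FRAGMENT)))
```

With the original fragment, each ー in サーバー is replaced by a space, leaving only サ and バ in the simplified text. With the patched fragment, the word is left untouched, while neighbouring characters such as U+30FB (the katakana middle dot) are still stripped.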

