https://bugs.documentfoundation.org/show_bug.cgi?id=163652

            Bug ID: 163652
           Summary: Properties of the com.sun.star.util.SearchDescriptor
                    do not cover Matchdiacritics
           Product: LibreOffice
           Version: 24.8.2.1 release
          Hardware: All
                OS: All
            Status: UNCONFIRMED
          Severity: normal
          Priority: medium
         Component: sdk
          Assignee: [email protected]
          Reporter: [email protected]

In the predefined properties of com.sun.star.util.SearchDescriptor, there is
no property named "SearchDiacriticSensitive" for searching with sensitivity to
diacritics, analogous to "SearchCaseSensitive" for searching with sensitivity
to case.

The reasons it should be there are:
1. Match Case and Match Diacritics are very similar in implementation. With
SearchCaseSensitive off, "s" matches "s", "S" and "ß" (sharp S, used in the
German language), whereas with SearchDiacriticSensitive off, "s" would match
"s", "ś", "ṣ", etc.

2. Right now LO's implementation doesn't follow the collation strength rules.
We can search while ignoring both case and accents, but not while ignoring case
only (since accents are ignored by default). Ideally, IMHO, we should have a
simple option to toggle between the first three collation strengths:

[Quote]
The Strength attribute determines whether accent or case is taken into account
when collating or comparing text strings. In writing systems without case or
accent, the Strength attribute controls similarly important features.
The possible values are: primary (1), secondary (2), tertiary (3), quaternary
(4), and identity (I). 

To ignore:

    —accent and case, use the primary strength level
    —case only, use the secondary strength level
    —neither accent nor case, use the tertiary strength level

Almost all characters can be distinguished by the first three strength levels,
therefore in most locales the default Strength attribute is set at the tertiary
level. However if the Alternate attribute (described in a following row) is set
to shifted, then the quaternary strength level can be used to break ties among
white space characters, punctuation marks, and symbols that would otherwise be
ignored.
[End of Quote]
https://www.ibm.com/docs/en/db2/11.5?topic=collation-unicode-algorithm-based-collations
https://www.php.net/manual/en/collator.setstrength.php

With the SearchCaseSensitive property we can switch between strength levels 2
and 3, and by implementing a SearchDiacriticSensitive property we could switch
between strength levels 1 and 2.

3. The Microsoft Office API, Apple's Foundation framework and OpenSearch all
provide a feature for diacritic sensitivity or ASCII folding:
https://learn.microsoft.com/en-us/office/vba/api/word.find.matchdiacritics
https://developer.apple.com/documentation/foundation/nsstring/compareoptions/1412313-diacriticinsensitive
https://opensearch.org/docs/latest/analyzers/token-filters/asciifolding/

4. The idea should be that if we can type it, we should be able to find it.
With today's keyboard layouts (whether on Android or Apple phones or on
computers), it is very easy to type diacritics and accents, so we should have a
property to find them as well.
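Points 1 and 2 can be illustrated outside LibreOffice with the Java standard
library. This is a minimal sketch, not LO's search code: foldDiacritics
approximates diacritic-insensitive matching by NFD decomposition followed by
removal of combining marks, and matches maps the existing SearchCaseSensitive
flag and the proposed SearchDiacriticSensitive flag (modeled here as two
hypothetical booleans) onto java.text.Collator strengths, following the
three-level scheme quoted above.

```java
import java.text.Collator;
import java.text.Normalizer;
import java.util.Locale;

public class DiacriticSearchSketch {

    // Diacritic-insensitive folding: decompose to NFD, then drop every
    // combining mark (Unicode general category M), so "ś" and "ṣ" become "s".
    // Letters with no canonical decomposition (e.g. "ß") are left unchanged.
    static String foldDiacritics(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    // Hypothetical mapping of the two search flags onto collation strengths.
    static int strengthFor(boolean caseSensitive, boolean diacriticSensitive) {
        if (!diacriticSensitive) {
            return Collator.PRIMARY;            // ignore accents and case
        }
        return caseSensitive ? Collator.TERTIARY   // ignore neither
                             : Collator.SECONDARY; // ignore case only
    }

    static boolean matches(String pattern, String candidate,
                           boolean caseSensitive, boolean diacriticSensitive) {
        // getInstance returns a fresh copy, so changing its settings is safe.
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        // Decompose precomposed letters such as "ś" so the accent is
        // compared as a separate, secondary-level difference.
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        collator.setStrength(strengthFor(caseSensitive, diacriticSensitive));
        return collator.equals(pattern, candidate);
    }

    public static void main(String[] args) {
        String[] candidates = {"s", "S", "\u015B"}; // s, S, ś
        boolean[][] modes = {{false, false}, {false, true}, {true, true}};
        for (boolean[] mode : modes) {
            StringBuilder line = new StringBuilder(
                "case=" + mode[0] + ", diacritics=" + mode[1] + ":");
            for (String c : candidates) {
                if (matches("s", c, mode[0], mode[1])) {
                    line.append(' ').append(c);
                }
            }
            System.out.println(line);
        }
        System.out.println(foldDiacritics("\u015B\u1E63")); // ś, ṣ folded
    }
}
```

In this sketch, turning diacritic sensitivity off selects PRIMARY, which also
ignores case, so strength alone cannot express "match case but ignore
diacritics"; ICU provides a separate case-level attribute for that combination.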

It would be great if LO also had an equivalent, as it would help with macros
and other search-and-replace features.
