[Koha-bugs] [Bug 38729] Linker should consider diacritics

bugzilla-daemon--- via Koha-bugs Fri, 31 Jan 2025 05:50:47 -0800

https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=38729


--- Comment #13 from Janusz Kaczmarek <[email protected]> ---
(In reply to Marcel de Rooy from comment #12)
> (In reply to David Cook from comment #7)
> > This is an interesting one for sure, and it seems like a very real problem.
> > 
> > However, I think that you need to make this feature optional with your
> > current implementation. Your patch would unexpectedly change existing
> > behaviour that people rely upon.
> > 
> > For instance, there are lots of cases where your bib heading doesn't
> > perfectly match the authority (e.g. punctuation difference, minor spelling
> > difference in one of the words in the heading, whitespace difference, etc),
> > but you want it to still match and use the authorized form.
> 
> Good points. Janusz, could you address that concern please?
> Changing status to reflect need for feedback.

Well, let's try. The main aim of this patch is to distinguish between
Latin-based letters that should not be equated with each other. I am aware that
the English alphabet has just 26 basic letters, and that from an anglophone
point of view 'Å', 'Ä', 'Ą' and 'Á' (to name only a few letters based on the
Latin letter 'A') may all seem equal to 'A', but they are not: they are separate
letters in their respective alphabets (Swedish, German, Polish, Hungarian...):
https://en.wikipedia.org/wiki/Swedish_alphabet,
https://en.wikipedia.org/wiki/Hungarian_alphabet, etc.
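To make the point concrete, here is a small Python sketch (an illustration, not Koha code) of the usual "accent folding" technique: canonical decomposition (NFD) followed by dropping combining marks. It shows how such folding collapses four distinct letters into a plain 'A'. The helper name fold_diacritics is hypothetical.

```python
import unicodedata

def fold_diacritics(s: str) -> str:
    """Hypothetical helper: decompose to NFD, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

letters = ["Å", "Ä", "Ą", "Á"]
print(len(set(letters)))                     # 4 distinct letters...
print([fold_diacritics(c) for c in letters])  # ...all folded to 'A'
print(fold_diacritics("Ł"))                  # 'Ł' has no decomposition, so it survives
```

Note that NFD-based folding only removes marks that Unicode treats as combining, which is why 'Ł' is left untouched while 'Ą' loses its ogonek.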

Also, IMO this is not a local issue but a general one. It will arise in every
catalogue that collects international literature. You should definitely
distinguish names like 'Jamroz', 'Jamroż' and 'Jamróz'. I agree it will not
come up in even one case in ten, but that is no reason to ignore the issue. Its
relative rarity explains why it went undetected for so long. (BTW, the issue
shows up more readily in large-scale catalogues with several hundred thousand
or several million records.)

At the same time, when searching we traditionally expect to find all three
names ('Jamroz', 'Jamroż' and 'Jamróz') by searching without diacritical marks,
i.e. 'Jamroz' (important especially for those who do not have keyboards with
the appropriate diacritics). And this applies not just to Polish catalogues but
to every catalogue. So IMO Elasticsearch works correctly with the current
default settings.
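The search-side behaviour described above can be sketched in Python as a diacritic-insensitive match. This is only a rough stand-in for what an accent-folding analyzer in the search engine does; it does not reproduce Koha's actual Elasticsearch mappings, and the function names are hypothetical.

```python
import unicodedata

def fold_diacritics(s: str) -> str:
    """Hypothetical helper: decompose to NFD, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def search(records: list[str], query: str) -> list[str]:
    """Return records whose folded, case-folded form equals the folded query."""
    key = fold_diacritics(query).casefold()
    return [r for r in records if fold_diacritics(r).casefold() == key]

records = ["Jamroz", "Jamroż", "Jamróz", "Jamrozik"]
print(search(records, "Jamroz"))  # all three name variants, but not 'Jamrozik'
```

This is exactly the behaviour one wants for retrieval; the argument in this comment is only that the linker should not stop at this folded comparison.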

As for the inaccuracy of the record: as a librarian, I would rather have a
controlled field left unlinked than linked to the wrong authority record and
then perhaps modified incorrectly as a consequence (e.g. Änkor linked to Ankor
will become Ankor the first time the Ankor authority record is edited, cf. Bug
33401).

Finally, regarding punctuation: here I am comparing the search_form, which is
constructed by _get_search_heading. Among other things, this method removes
parentheses and punctuation from the end of each subfield, so it largely
standardizes the notation and sidesteps some of the problems caused by
inaccurate notation.
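To illustrate why punctuation differences need not block a diacritic-sensitive match, here is a hypothetical Python approximation of the kind of normalization described for _get_search_heading. It is not a port of Koha's Perl method; the assumed rules (drop parentheses, strip trailing punctuation and whitespace from each subfield, collapse internal whitespace, join subfields with a space) are guesses for illustration only.

```python
import re

# Assumed set of trailing punctuation to strip from each subfield.
TRAILING_PUNCT = re.compile(r"[\s.,;:/]+$")

def search_heading(subfields: list[tuple[str, str]]) -> str:
    """Hypothetical approximation of a normalized 'search form' of a heading."""
    parts = []
    for _code, value in subfields:
        value = value.replace("(", "").replace(")", "")  # drop parentheses
        value = TRAILING_PUNCT.sub("", value)             # strip trailing punctuation
        value = " ".join(value.split())                   # collapse whitespace
        if value:
            parts.append(value)
    return " ".join(parts)

print(search_heading([("a", "Kowalski, Jan,"), ("d", "(1900-1980).")]))
```

Under these assumptions "Jamróz, Jan." and "Jamróz, Jan" normalize to the same string, while "Jamróz, Jan" and "Jamroz, Jan" still differ, which is the distinction the patch relies on.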

-- 
You are receiving this mail because:
You are watching all bug changes.
_______________________________________________
Koha-bugs mailing list
[email protected]
https://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-bugs
website : http://www.koha-community.org/
git : http://git.koha-community.org/
bugs : http://bugs.koha-community.org/
