https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=24720
--- Comment #11 from David Gustafsson &lt;[email protected]&gt; ---

(In reply to David Cook from comment #6)
> I will say though that my own manual testing of the regular expression looks
> good.
>
> I've plugged in English, French, and Chinese into my tests and it's all
> working as expected. (I tried Arabic as well and I think that worked too,
> although I imagine Séverine could confirm that better than me.)
>
> If we can just get those automated unit tests to prove that, then I think
> we'd be good to go.

I added one simple test. I first attempted to test multiple variants with different scripts, but it turned out to be more effort than I think is warranted for this one line of code.

Firstly, since sort fields are concatenated, you have to add several different fields with mappings for different strings for them to be testable (multiple values on the same field are concatenated). That is no big deal, but is it really worth it? We could break the stripping out into a separate method and test that, but then we would no longer be verifying that it is actually called in Koha::SearchEngine::Elasticsearch::marc_records_to_documents, so we would not get the same coverage, and we would be creating a method just to be able to test this piece of code in isolation. If we were to be consistent and treat the whole code base this way, it would create a mess.

I don't know if it is an issue with my terminal/editor or something else, but when testing other scripts, such as Chinese, I also get error messages like "UTF-8 "\xC3" does not map to Unicode at /usr/share/perl5/MARC/File/Encode.pm line 35." before even reaching the regular expression.

What we are testing is effectively the trivial regular expression "s/^\W+//u", which replaces all non-word Unicode characters at the beginning of the string with the empty string (that is, removes them). How the Perl core distinguishes word from non-word characters across different scripts is not something we should be writing tests for; I would trust Perl in this case.
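For anyone following along, the behaviour of that one-liner can be checked outside Koha with a quick sketch like the following. The sample strings are mine, not taken from the bug, and this is just an illustration of "s/^\W+//u", not of the Koha code path:

```shell
# Demonstrate s/^\W+//u on a few sample strings (hypothetical examples).
# -CSD enables UTF-8 on stdin/stdout; the \x{...} escapes keep the
# one-liner ASCII-safe.
perl -CSD -e '
  for my $s ("\x{201C}Hello\x{201D}", "\x{AB}L\x{2019}\x{E9}tranger\x{BB}", "[Anonymous]", "!!!") {
      (my $stripped = $s) =~ s/^\W+//u;   # remove leading non-word chars
      print "$stripped\n";
  }
'
```

Note that only the leading run is removed (so the trailing quote in the first sample survives), and a string made up entirely of non-word characters, such as '!!!', strips down to the empty string, which is exactly why such strings would all end up with the same sort weight.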
But sure, making sure we reach that piece of code, and that the regexp does not do something completely unexpected, doesn't hurt.

(In reply to Nick Clemens from comment #7)
> I agree with the ask for unit tests - they can be added to tests for
> marc_records_to_documents in t/db_dependent/Koha/SearchEngine/Elasticsearch.t
>
> Additionally, I think this should be optional in search field configs:
> 1 - Title already has options for 'non-filing' characters - so if you want
> to ignore a leading '[' you can
> 2 - Author names may contain real punctuation, consider the band '!!!'
> https://en.wikipedia.org/wiki/!!!

(In reply to David Cook from comment #8)
> Good point, Nick!
>
> There's nothing more frustrating for a library user than the search not
> retrieving records that you know it has...

I made a mistake describing the bug as "Strip special chars from search fields", so I can understand the confusion. It should be sort fields, not search fields. I have corrected this in the commit message.

This fix doesn't have any impact on search fields, so it will not affect what is searched on (the stripping of special characters for search fields is already handled in Elasticsearch). The band name '!!!' would still be searchable, but it would probably have the same sort weight as other strings containing only non-word characters. That is an edge case that will probably never have any major impact in real-life situations. Allowing some special characters and not others could, I think, be a potentially larger source of confusion, and it is also hard to decide which characters should be exempted.

--
You are receiving this mail because:
You are watching all bug changes.
_______________________________________________
Koha-bugs mailing list
[email protected]
https://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-bugs
website : http://www.koha-community.org/
git : http://git.koha-community.org/
bugs : http://bugs.koha-community.org/
