On 2017/07/05 15:28, Michael Paquier wrote:
I have finally been able to look at this patch.
Thanks for reviewing and the new version of the patch.
(Surprised to see that generate_unaccent_rules.py is inconsistent on
MacOS, runs fine on Linux).
def get_plain_letter(codepoint, table):
"""Return the base codepoint without marks."""
if is_letter_with_marks(codepoint, table):
- return table[codepoint.combining_ids[0]]
+ if len(table[codepoint.combining_ids[0]].combining_ids) > 1:
+ # Recursive to find the plain letter
+ return get_plain_letter(table[codepoint.combining_ids[0]],table)
+ elif is_plain_letter(table[codepoint.combining_ids[0]]):
+ return table[codepoint.combining_ids[0]]
+ else:
+ return None
elif is_plain_letter(codepoint):
return codepoint
else:
- raise "mu"
+ return None
The code paths returning None should not be reached, so I would
suggest adding an assertion instead. Callers of get_plain_letter would
blow up on None, still that would make future debugging harder.
def is_letter_with_marks(codepoint, table):
- """Returns true for plain letters combined with one or more marks."""
+ """Returns true for letters combined with one or more marks."""
# See
http://www.unicode.org/reports/tr44/tr44-14.html#General_Category_Values
return len(codepoint.combining_ids) > 1 and \
- is_plain_letter(table[codepoint.combining_ids[0]]) and \
+ (is_plain_letter(table[codepoint.combining_ids[0]]) or\
+ is_letter_with_marks(table[codepoint.combining_ids[0]],table))
and \
all(is_mark(table[i]) for i in codepoint.combining_ids[1:]
This was already hard to follow, and this patch makes its harder. I
think that the thing should be refactored with multiple conditions.
if is_letter_with_marks(codepoint, table):
- charactersSet.add((codepoint.id,
+ if get_plain_letter(codepoint, table) <> None:
+ charactersSet.add((codepoint.id,
This change is not necessary as a letter with marks is not a plain
character anyway.
Testing with characters having two accents, the results are produced
as wanted. I am attaching an updated patch with all those
simplifications. Thoughts?
Thanks, so pretty. The patch is fine to me.
---
Thanks and best regards,
Dang Minh Huong
---
このEメールはアバスト アンチウイルスによりウイルススキャンされています。
https://www.avast.com/antivirus
--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers