On 24/01/2026 14:57, Pádraig Brady wrote:
> On 24/01/2026 14:17, [email protected] wrote:
>> On 24-01-2026 at 13:45, Pádraig Brady wrote:
>>> On 22/01/2026 16:39, [email protected] wrote:
>>>> If you come up with a script, you'll have to send it out to
>>>> each of the translators. Don't use me as an intermediary.
>>>
>>> Ok I'll have a look at doing that,
>>> and possibly proposing updated translations.
>>> It may not be in place for this release,
>>> but should be in place for the next.
>>
>> Well... what should I do meanwhile with the current POT file?
>> If I announce it to the translators now, there will be some
>> who will do all the work to update the strings, only to find
>> out upon the next release that most of that work could have
>> been a lot easier.
>>
>> How about postponing the 9.10 release for two weeks and working
>> together on a script in these last days of January?
>
> That does sound reasonable.

I asked Claude Sonnet 4.5 to write a script that splits
the combined options in the existing po files
into a separate translation per option.
It basically one-shotted it in about 10 seconds.
After the split, only whitespace differences remain against the new pot,
and msgmerge is able to fuzzy match those fine.
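
To illustrate the splitting idea, here is a simplified sketch (not the attached script itself): the quoted lines of a combined msgid and of its msgstr are each cut at the lines that start a new option, and the resulting groups are paired up in order. The `starts_option` test, the two-space indent it assumes, and the sample strings are all made up for this example.

```python
def starts_option(po_line):
    # Assumption: an option line looks like '"  -a, ...' or '"  --all ...'
    return po_line.startswith('"  -')


def split_groups(po_lines):
    """Cut a list of quoted po lines into groups, one group per option."""
    groups = []
    for po_line in po_lines:
        # Open a new group at each option line (and for the very first line).
        if starts_option(po_line) or not groups:
            groups.append([])
        groups[-1].append(po_line)
    return groups


# Hypothetical sample entry: two options, the second with a continuation line.
msgid = [
    '"  -a, --all         first option description\\n"',
    '"  -A, --almost-all  second option description\\n"',
    '"                      continuation of the description\\n"',
]
msgstr = [
    '"  -a, --all         (translated text for -a)\\n"',
    '"  -A, --almost-all  (translated text for -A)\\n"',
    '"                      (translated continuation)\\n"',
]

# Pair the n-th msgid group with the n-th msgstr group.
pairs = list(zip(split_groups(msgid), split_groups(msgstr)))
for id_group, str_group in pairs:
    print('msgid ""')
    print("\n".join(id_group))
    print('msgstr ""')
    print("\n".join(str_group))
    print()
```

Note that zip() stops at the shorter list, so if a translation has a different number of option lines than the original, the surplus groups are silently dropped. Entries like that would deserve a manual look.
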
Testing the attached script with the latest pl.po for example:
$ mkdir -p po/test
$ cd po/test
$ wget https://translationproject.org/latest/coreutils/pl.po
$ ./split.py pl.po | msguniq --no-wrap > pl-split.po
I tested that this merged in well with the latest pot:
$ msgmerge pl-split.po ../coreutils.pot > pl-new.po
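
One rough way to see how many entries msgmerge had to fuzzy match is to count the entries flagged fuzzy in the merged file. A minimal sketch, assuming the usual po layout where flags sit on a "#," comment line; `count_fuzzy` and the sample text are made up for this example.

```python
def count_fuzzy(po_text):
    """Count entries whose '#,' flag line carries the 'fuzzy' flag."""
    count = 0
    for line in po_text.splitlines():
        if line.startswith("#,"):
            flags = [flag.strip() for flag in line[2:].split(",")]
            if "fuzzy" in flags:
                count += 1
    return count


# Hypothetical two-entry sample: only the first entry is flagged fuzzy.
sample = '''#, fuzzy, c-format
msgid "a"
msgstr "b"

#, c-format
msgid "c"
msgstr "d"
'''
print(count_fuzzy(sample))
```

Running it on the contents of pl-new.po would give the number of fuzzy entries; `msgfmt --statistics` reports translated, fuzzy and untranslated counts as well.
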
split.py will keep the older single line layout in translations,
but that can be adjusted over time if desired,
and the highlighting code works fine with that layout anyway.
So I presume we could just update all latest po files with this script,
then upload all these "split" po files without involving translators?
Then before release we can download and merge as usual.
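
Applying the script to every po file could be sketched roughly as below. This is only an illustration under assumptions: the helper names, the "-split.po" naming for all languages, and the directory layout are made up here, and uploading is not covered. It runs the same `./split.py FILE | msguniq --no-wrap` pipeline as the pl.po test above.

```python
import subprocess
from pathlib import Path


def split_name(po_path):
    """Map e.g. pl.po to pl-split.po in the same directory."""
    po_path = Path(po_path)
    return po_path.with_name(po_path.stem + "-split.po")


def split_one(po_path, script="./split.py"):
    """Run: split.py FILE | msguniq --no-wrap > FILE-split.po"""
    split = subprocess.run([script, str(po_path)],
                           check=True, capture_output=True)
    uniq = subprocess.run(["msguniq", "--no-wrap"], input=split.stdout,
                          check=True, capture_output=True)
    out_path = split_name(po_path)
    out_path.write_bytes(uniq.stdout)
    return out_path


def split_all(po_dir):
    """Split every *.po in po_dir, skipping earlier *-split.po outputs."""
    done = []
    for po_path in sorted(Path(po_dir).glob("*.po")):
        if po_path.stem.endswith("-split"):
            continue
        done.append(split_one(po_path))
    return done
```

For example, `split_all("po/test")` would write an XX-split.po next to each XX.po in that directory, ready for the msgmerge step.
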
cheers,
Padraig
#!/usr/bin/env python3
"""Split po entries that combine several options into one entry per option."""
import sys
import re


def is_option(line):
    """Return True if a quoted po line starts the description of an option."""
    if line.startswith('" --'):
        return True
    if line.startswith('" -'):
        text = line[4:]
        if text.startswith('M '):
            return False
        if len(text) > 0 and text[0] != ' ':
            return True
    if re.match(r'^" \S+ -\S\S \S+ ', line):
        return True
    if re.match(r'^" [a-zA-Z0-9_]+=[a-zA-Z0-9_]+ ', line):
        return True
    return False


def split_po_entries(lines):
    i = 0
    while i < len(lines):
        line = lines[i]

        # Only multi-line entries (msgid "" followed by quoted lines) are split.
        if line.strip() == 'msgid ""':
            start_i = i
            msgid_lines = []
            i += 1
            while i < len(lines) and lines[i].startswith('"'):
                msgid_lines.append(lines[i])
                i += 1

            if i < len(lines) and lines[i].strip() == 'msgstr ""':
                msgstr_lines = []
                i += 1
                while i < len(lines) and lines[i].startswith('"'):
                    msgstr_lines.append(lines[i])
                    i += 1

                has_options = any(is_option(line) for line in msgid_lines)

                if has_options:
                    # Drop leading empty lines if the first real line is an option.
                    first_non_empty = None
                    for j, line in enumerate(msgid_lines):
                        if line.strip() not in ('""', '"\\n"'):
                            first_non_empty = j
                            break

                    if first_non_empty is not None and is_option(msgid_lines[first_non_empty]):
                        msgid_lines = msgid_lines[first_non_empty:]
                        msgstr_lines = msgstr_lines[first_non_empty:] if first_non_empty < len(msgstr_lines) else msgstr_lines

                    msgid_groups = []
                    msgstr_groups = []

                    # Record the index of each line that starts a new option.
                    msgid_indices = [0]
                    for j in range(1, len(msgid_lines)):
                        if is_option(msgid_lines[j]):
                            msgid_indices.append(j)
                    msgid_indices.append(len(msgid_lines))

                    msgstr_indices = [0]
                    for j in range(1, len(msgstr_lines)):
                        if is_option(msgstr_lines[j]):
                            msgstr_indices.append(j)
                    msgstr_indices.append(len(msgstr_lines))

                    # Slice both sides into per-option groups.
                    for k in range(len(msgid_indices) - 1):
                        msgid_groups.append(msgid_lines[msgid_indices[k]:msgid_indices[k+1]])

                    for k in range(len(msgstr_indices) - 1):
                        msgstr_groups.append(msgstr_lines[msgstr_indices[k]:msgstr_indices[k+1]])

                    # Emit one entry per option, pairing the groups in order.
                    for msgid_group, msgstr_group in zip(msgid_groups, msgstr_groups):
                        print('msgid ""')
                        for line in msgid_group:
                            print(line, end='')
                        print('msgstr ""')
                        for line in msgstr_group:
                            print(line, end='')
                        print()
                    continue

            # No msgstr or no options found: pass the entry through unchanged.
            for j in range(start_i, i):
                print(lines[j], end='')
            continue

        # Everything else (comments, single-line entries) is copied as-is.
        print(line, end='')
        i += 1


if __name__ == '__main__':
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as f:
            split_po_entries(f.readlines())
    else:
        split_po_entries(sys.stdin.readlines())