Re: [bitcoin-dev] BIP 39: Add language identifier strings for wordlists

Sjors Provoost via bitcoin-dev Fri, 05 Jan 2018 08:09:45 -0800

I’m not a fan of language-specific word lists within the current BIP-39 
standard. Very few wallets support anything other than English, which can lead 
to vendor lock-in and long-term loss of funds if a rare non-English wallet 
disappears.


However, because people can memorize things better in their native tongue, 
supporting multiple languages seems quite useful.

I would prefer a new standard where words are mapped to integers rather than to 
a literal string. For each language a mapping from words to integers would be 
published. In addition to that, there would be a mapping from original language 
words to matching (in terms of integer value, not meaning) English words that 
people can print on an A4 paper. This would allow them to enter a mnemonic into 
e.g. a hardware wallet that only supports English. Such lists are more likely to 
be around 100 years from now than some ancient piece of software.
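
As an illustration of the integer mapping described above, here is a minimal 
Python sketch. The two four-word lists are hypothetical placeholders, not real 
wordlists, and the function names are invented for this example. The words at 
the same index deliberately do not share a meaning.

```python
# Hypothetical word lists for illustration only. In a real scheme every
# language's list would have the same length (BIP-39 lists have 2048 words).
ENGLISH = ["apple", "bridge", "candle", "desert"]
DUTCH = ["fiets", "kaas", "molen", "tulp"]

def words_to_indices(mnemonic, wordlist):
    """Map each word of a mnemonic to its integer position in wordlist."""
    index = {word: i for i, word in enumerate(wordlist)}
    return [index[word] for word in mnemonic.split()]

def indices_to_words(indices, wordlist):
    """Map integers back to the words at those positions in wordlist."""
    return " ".join(wordlist[i] for i in indices)

def translation_sheet(source, target):
    """Printable table: source word -> target word with the same index."""
    return dict(zip(source, target))

native = "molen fiets tulp"
indices = words_to_indices(native, DUTCH)      # the integers are the secret
english = indices_to_words(indices, ENGLISH)   # what an English-only wallet accepts
sheet = translation_sheet(DUTCH, ENGLISH)
```

With a printed copy of `sheet`, a user could look up each native word by hand 
and type the English word with the same index into an English-only device.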

This would not work with the current BIP-39 (duress) password, but this feature 
could be replaced by appending words (with or without a checksum for that 
addition).

A replacement for BIP-39 would be a good opportunity to produce a better 
English dictionary as Nic Johnson suggested a while ago:
        • all words are 4-8 characters
        • all 4-character prefixes are unique (very useful for hardware wallets)
        • no two words have edit distance < 2
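
The three criteria above could be checked mechanically. The following Python 
sketch assumes "edit distance" means Levenshtein distance, which the 
suggestion does not state explicitly; `check_wordlist` is a hypothetical 
helper name.

```python
from itertools import combinations

def edit_distance(a, b):
    """Levenshtein distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def check_wordlist(words):
    """Return a list of human-readable violations of the three criteria."""
    problems = []
    # Criterion 1: every word is 4-8 characters long.
    for w in words:
        if not 4 <= len(w) <= 8:
            problems.append(f"{w}: length {len(w)} outside 4-8")
    # Criterion 2: all 4-character prefixes are unique.
    seen = {}
    for w in words:
        prefix = w[:4]
        if prefix in seen:
            problems.append(f"{w}: shares prefix {prefix!r} with {seen[prefix]}")
        else:
            seen[prefix] = w
    # Criterion 3: no two words have edit distance below 2.
    for a, b in combinations(words, 2):
        if edit_distance(a, b) < 2:
            problems.append(f"{a}/{b}: edit distance below 2")
    return problems
```

The pairwise check is quadratic in the list size, which is acceptable for a 
one-off validation of a list of a few thousand words.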

Wallets need to be able to distinguish between the old and new standard, so 
un-upgraded BIP 39 wallets should consider all new mnemonics invalid. At the 
same time, some new wallets may not wish to support BIP39. They shouldn't be 
burdened with storing the old word list.

A solution is to sort the new word list such that reused words appear first. 
When generating a mnemonic, at least one word unique to the new list must be 
present. A wallet only needs to know the index of the last BIP39 overlapping 
word. They reject a proposed mnemonic if none of the elements use a word with a 
higher index.
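
A rough Python sketch of the sorting and rejection rule above. The word lists 
and the function names are hypothetical and chosen only to show the mechanics; 
they are not part of any published proposal.

```python
def build_new_list(old_words, new_words):
    """Order new_words so that words shared with old_words come first.

    Returns (ordered_list, last_overlap_index); the index is -1 when the
    two lists share no words.
    """
    old = set(old_words)
    reused = sorted(w for w in new_words if w in old)
    fresh = sorted(w for w in new_words if w not in old)
    return reused + fresh, len(reused) - 1

def is_new_standard(indices, last_overlap_index):
    """Accept only if at least one word lies beyond the overlapping region."""
    return any(i > last_overlap_index for i in indices)

# Hypothetical miniature lists for illustration only.
OLD = ["abandon", "ability", "zoo"]
CANDIDATES = ["zoo", "lantern", "ability", "meadow"]

new_list, last_overlap = build_new_list(OLD, CANDIDATES)
# new_list is ["ability", "zoo", "lantern", "meadow"]; last_overlap is 1

only_reused = [0, 1, 1]   # every word also exists in the old list: reject
with_unique = [0, 3]      # "meadow" is unique to the new list: accept
```

A new wallet only has to store `last_overlap`, not the old list. A generator 
would discard and redraw any mnemonic for which `is_new_standard` returns 
`False`, which slightly reduces the number of usable mnemonics.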

For my above point and some related ideas, see: 
https://github.com/satoshilabs/slips/issues/103

Sjors

> On 5 Jan 2018, at 14:58, nullius via bitcoin-dev 
> <bitcoin-dev@lists.linuxfoundation.org> wrote:
> 
> I propose and request as an enhancement that the BIP 39 wordlist set should 
> specify canonical native language strings to identify each wordlist, as well 
> as short ASCII language codes.  At present, the languages are identified only 
> by their names in English.
> 
> Strings properly vetted and recommended by native speakers should facilitate 
> language identification in user interface options or menus.  Specification of 
> language identifier strings would also promote interface consistency between 
> implementations; this may be important if a user creates a mnemonic in 
> Implementation A, then restores a wallet using that mnemonic in 
> Implementation B.
> 
> As an independent implementer who does not know *all* these different 
> languages, I monkey-pasted language-native strings from a popular wiki site.  
> I cannot guarantee that they be all accurate, sensible, or even 
> non-embarrassing.
> 
> https://github.com/nym-zone/easyseed/blob/1a6e48bbdac9366d9d5d1912dc062dfc3f0db2c6/easyseed.c#L99
> ```
>       LANG(english,             u8"English",  "en",    ascii_space ),
>       LANG(chinese_simplified,  u8"汉语",     "zh-CN", ascii_space ),
>       LANG(chinese_traditional, u8"漢語",     "zh-TW", ascii_space ),
>       LANG(french,              u8"Français", "fr",    ascii_space ),
>       LANG(italian,             u8"Italiano", "it",    ascii_space ),
>       LANG(japanese,            u8"日本語",   "ja",    u8"\u3000"  ),
>       LANG(korean,              u8"한국어",   "ko",    ascii_space ),
>       LANG(spanish,             u8"Español",  "es",    ascii_space )
> ```
> 
> Per the comment at #L85 of the quoted file, I also know that for my short 
> identifiers for Chinese, “zh-CN” and “zh-TW”, are imprecise at best—insofar 
> as Hong Kong uses Traditional; and overseas Chinese may use either.  For 
> differentiating the two Chinese writing variants, are there any appropriate 
> standardized or customary short ASCII language IDs similar to ISO 3166-1 
> alpha-2 which are purely linguistic, and not fit to present-day political 
> boundaries?
> 
> My general suggestion is that the specification of appropriate strings in 
> bitcoin:bips/bip-0039/bip-0039-wordlists.md be made part of the process for 
> accepting new wordlists.  My specific request is that such strings be 
> ascertained for the wordlists already existing, preferably from the persons 
> involved in the original pull requests therefor.
> 
> Should this proposal be “concept ACKed” by appropriate parties, then I may 
> open a pull request suggesting an appropriate format for specifying this 
> information in the repository.  However, I must needs leave the vetting 
> of appropriate strings to native speakers or experts in the respective 
> languages.
> 
> Prior references:  The wordlist additions at PRs #92, #130 (Japanese); #100 
> (Spanish); #114 (Chinese, both variants); #152 (French); #306 (Italian); #570 
> (Korean); #621 (Indonesian, *proposed*, open).
> _______________________________________________
> bitcoin-dev mailing list
> bitcoin-dev@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev

