I've been asked by Dr. Curr and Professor Snead to forward this message to
this mailing list, as it may be of general interest to Bitcoin users.

Dear Colleague:

In 1967, during excavation for the construction of a new shopping center in
Monroeville, Pennsylvania, workers uncovered a vault containing a cache of
ancient scrolls[1].  Most were severely damaged, but those that could be
recovered confirmed the existence of a secret society long suspected to
been active in the region around the year 200 BC.

Based on a translation of these documents, we now know that the society,
Cult of the Bound Variable, was devoted to the careful study of
over two millennia before the invention of the digital computer.

While the Monroeville scrolls make reference to computing machines made of
sandstone, most researchers believed this to be a poetic metaphor and that
"computers" were in fact the initiates themselves, carrying out the
unimaginably tedious steps of their computations with reed pens on

Within the vault, a collection of sandstone wheels marked in a language
consisting of 32 glyphs was found. After 15 years of study, we have
completed the translation of what is known as "Codex32," a document that
describes the functions of the wheels. It was discovered that the wheels
a system of cryptographic computations that was used by cult members to
safeguard their most valuable secrets.

The Codex32 system allows secrets to be carved into multiple tablets and
scattered to the far corners of the earth. When a sufficient number of
tablets are
brought together the stone wheels are manipulated in a manner to recover the
secrets. This finding may be of particular interest to the Bitcoin

Below we provide a summary of the cult's secret sharing system, which is
graciously hosted at
We are requesting a record assignment in the Bibliography of Immemorial
Philosophy (BIP) repository.

Thank you for your consideration.

Dr. Leon O. Curr and Professor Pearlwort Snead
Department of Archaeocryptography
Harry Q. Bovik Institute for the Advancement

[1] http://www.boundvariable.org/task.shtml

-----BEGIN BIP-----

  BIP: ????
  Layer: Applications
  Title: codex32
  Author: Leon Olsson Curr and Pearlwort Sneed <pearlw...@wpsoftware.net>
  Comments-URI: https://github.com/bitcoin/bips/wiki/Comments:BIP-????
  Status: Draft
  Type: ????
  Created: 2023-02-13
  License: BSD-3-Clause
  Post-History: FIXME



This document describes a standard for backing up and restoring the master
seed of a
[https://github.com/bitcoin/bips/blob/master/bip-0032.mediawiki BIP-0032]
hierarchical deterministic wallet, using Shamir's secret sharing.
It includes an encoding format, a BCH error-correcting checksum, and
algorithms for share generation and secret recovery.
Secret data can be split into up to 31 shares.
A minimum threshold of shares, which can be between 1 and 9, is needed to
recover the secret, whereas without sufficient shares, no information about
the secret is recoverable.


This document is licensed under the 3-clause BSD license.


BIP-0032 master seed data is the source entropy used to derive all private
keys in an HD wallet.
Safely storing this secret data is the hardest and most important part of
However, there is a tension between security, which demands limiting the
number of backups, and resilience, which demands widely replicated backups.
Encrypting the seed does not change this fundamental tradeoff, since it
leaves essentially the same problem of how to back up the encryption key(s).

To allow users freedom to make this tradeoff, we use Shamir's secret
sharing, which guarantees that any number of shares less than the threshold
leaks no information about the secret.
This approach allows increasing safety by widely distributing the generated
shares, while also providing security against the compromise of one or more
shares (as long as fewer than the threshold have been compromised).

[https://github.com/satoshilabs/slips/blob/master/slip-0039.md SLIP-0039]
has essentially the same motivations as this standard.
However, unlike SLIP-0039, this standard also aims to be simple enough for
hand computation.
Users who demand a higher level of security for particular secrets, or have
a general distrust in digital electronic devices, have the option of using
hand computation to backup and restore secret data in an interoperable
Note that hand computation is optional, the particular details of hand
computation are outside the scope of this standard, and implementers do not
need to be concerned with this possibility.

[https://github.com/bitcoin/bips/blob/master/bip-0039.mediawiki BIP-0039]
serves the same purpose as this standard: encoding master seeds for storage
by users.
However, BIP-0039 has no error-correcting ability, cannot sensibly be
extended to support secret sharing, has no support for versioning or other
metadata, and has many technical design decisions that make implementation
and interoperability difficult (for example, the use of SHA-512 to derive
seeds, or the use of 11-bit words).



A codex32 string is similar to a Bech32 string defined in [
https://github.com/bitcoin/bips/blob/master/bip-0173.mediawiki BIP-0173].
It reuses the base32 character set from BIP-0173, and consists of:

* A human-readable part, which is the string "ms" (or "MS").
* A separator, which is always "1".
* A data part which is in turn subdivided into:
** A threshold parameter, which MUST be a single digit between "2" and "9",
or the digit "0".
*** If the threshold parameter is "0" then the share index, defined below,
MUST have a value of "s" (or "S").
** An identifier consisting of 4 Bech32 characters.
** A share index, which is any Bech32 character. Note that a share index
value of "s" (or "S") is special and denotes the unshared secret (see
section "Unshared Secret").
** A payload which is a sequence of up to 74 Bech32 characters. (However,
see '''Long codex32 Strings''' below for an exception to this limit.)
** A checksum which consists of 13 Bech32 characters as described below.

As with Bech32 strings, a codex32 string MUST be entirely uppercase or
entirely lowercase.
The lowercase form is used when determining a character's value for
checksum purposes.
For presentation, lowercase is usually preferable, but uppercase SHOULD be
used for handwritten codex32 strings.


The last thirteen characters of the data part form a checksum and contain
no information.
Valid strings MUST pass the criteria for validity specified by the Python3
code snippet below.
The function <code>ms32_verify_checksum</code> must return true when its
argument is the data part as a list of integers representing the characters
converted using the bech32 character table from BIP-0173.

To construct a valid checksum given the data-part characters (excluding the
checksum), the <code>ms32_create_checksum</code> function can be used.

<source lang="python">
MS32_CONST = 0x10ce0795c2fd1e62a

def ms32_polymod(values):
    GEN = [
    residue = 0x23181b3
    for v in values:
        b = (residue >> 60)
        residue = (residue & 0x0fffffffffffffff) << 5 ^ v
        for i in range(5):
            residue ^= GEN[i] if ((b >> i) & 1) else 0
    return residue

def ms32_verify_checksum(data):
    if len(data) >= 96:                      # See Long codex32 Strings
        return ms32_verify_long_checksum(data)
    if len(data) <= 93:
        return ms32_polymod(data) == MS32_CONST
    return False

def ms32_create_checksum(data):
    if len(data) > 80:                       # See Long codex32 Strings
        return ms32_create_long_checksum(data)
    values = data
    polymod = ms32_polymod(values + [0] * 13) ^ MS32_CONST
    return [(polymod >> 5 * (12 - i)) & 31 for i in range(13)]

===Error Correction===

A codex32 string without a valid checksum MUST NOT be used.
The checksum is designed to be an error correcting code that can correct up
to 4 character substitutions, up to 8 unreadable characters (called
erasures), or up to 13 consecutive erasures.
Implementations SHOULD provide the user with a corrected valid codex32
string if possible.
However, implementations SHOULD NOT automatically proceed with a corrected
codex32 string without user confirmation of the corrected string, either by
prompting the user, or returning a corrected string in an error message and
allowing the user to repeat their action.
We do not specify how an implementation should implement error correction.
However, we recommend that:

* Implementations make suggestions to substitute non-bech32 characters with
bech32 characters in some situations, such as replacing "B" with "8", "O"
with "0", "I" with "l", etc.
* Implementations interpret "?" as an erasure.
* Implementations optionally interpret other non-bech32 characters, or
characters with incorrect case, as erasures.
* If a string with 8 or fewer erasures can have those erasures filled in to
make a valid codex32 string, then the implementation suggests such a string
as a correction.
* If a string consisting of valid Bech32 characters in the proper case can
be made valid by substituting 4 or fewer characters, then the
implementation suggests such a string as a correction.

===Unshared Secret===

When the share index of a valid codex32 string (converted to lowercase) is
the letter "s", we call the string a codex32 secret.
The subsequent data characters in a codex32 secret, excluding the final
checksum of 13 characters, is a direct encoding of a BIP-0032 HD master

The master seed is decoded by converting the data to bytes:

* Translate the characters to 5 bits values using the bech32 character
table from BIP-0173, most significant bit first.
* Re-arrange those bits into groups of 8 bits. Any incomplete group at the
end MUST be 4 bits or less, and is discarded.

Note that unlike the decoding process in BIP-0173, we do NOT require that
the incomplete group be all zeros.

For an unshared secret, the threshold parameter (the first character of the
data part) is ignored (beyond the fact it must be a digit for the codex32
string to be valid).
We recommend using the digit "0" for the threshold parameter in this case.
The 4 character identifier also has no effect beyond aiding users in
distinguishing between multiple different master seeds in cases where they
have more than one.

===Recovering Master Seed===

When the share index of a valid codex32 string (converted to lowercase) is
not the letter "s", we call the string an codex32 share.
The first character of the data part indicates the threshold of the share,
and it is required to be a non-"0" digit.

In order to recover a master seed, one needs a set of valid codex32 shares
such that:

* All shares have the same threshold value, the same identifier, and the
same length.
* All of the share index values are distinct.
* The number of codex32 shares is exactly equal to the (common) threshold

If all the above conditions are satisfied, the <code>ms32_recover</code>
function will return a codex32 secret when its argument is the list of
codex32 shares with each share represented as a list of integers
representing the characters converted using the bech32 character table from

<source lang="python">
bech32_inv = [
    0, 1, 20, 24, 10, 8, 12, 29, 5, 11, 4, 9, 6, 28, 26, 31,
    22, 18, 17, 23, 2, 25, 16, 19, 3, 21, 14, 30, 13, 7, 27, 15,

def bech32_mul(a, b):
    res = 0
    for i in range(5):
        res ^= a if ((b >> i) & 1) else 0
        a *= 2
        a ^= 41 if (32 <= a) else 0
    return res

def bech32_lagrange(l, x):
    n = 1
    c = []
    for i in l:
        n = bech32_mul(n, i ^ x)
        m = 1
        for j in l:
            m = bech32_mul(m, (x if i == j else i) ^ j)
    return [bech32_mul(n, bech32_inv[i]) for i in c]

def ms32_interpolate(l, x):
    w = bech32_lagrange([s[5] for s in l], x)
    res = []
    for i in range(len(l[0])):
        n = 0
        for j in range(len(l)):
            n ^= bech32_mul(w[j], l[j][i])
    return res

def ms32_recover(l):
    return ms32_interpolate(l, 16)

===Generating Shares===

If we already have ''t'' valid codex32 strings such that:

* All strings have the same threshold value ''t'', the same identifier, and
the same length
* All of the share index values are distinct

Then we can derive additional shares with the <code>ms32_interpolate</code>
function by passing it a list of exactly ''t'' of these codex32 strings,
together with a fresh share index distinct from all of the existing share
The newly derived share will have the provided share index.

Once a user has generated ''n'' codex32 shares, they may discard the
codex32 secret (if it exists).
The ''n'' shares form a ''t'' of ''n'' Shamir's secret sharing scheme of a
codex32 secret.

There are two ways to create an initial set of ''t'' valid codex32 strings,
depending on whether the user already has an existing master seed to split.

====For an existing master seed====

Before generating shares for an existing master seed, it first must be
converted into a codex32 secret, as described above.
The conversion process consists of:

* Choosing a threshold value ''t'' between 2 and 9, inclusive
* Choosing a 4 bech32 character identifier
** We do not define how to choose the identifier, beyond noting that it
SHOULD be distinct for every master seed the user may need to disambiguate.
* Setting the share index to "s"
* Setting the payload to a Bech32 encoding of the master seed, padded with
arbitrary bits
* Generating a valid checksum in accordance with the Checksum section

Along with the codex32 secret, the user must generate ''t''-1 other codex32
shares, each with the same threshold value, the same identifier, and a
distinct share index.
The set of share indexes may be chosen arbitrarily.
The payload of each of these codex32 shares is chosen uniformly at random
such that it has the same length as the payload of the codex32 secret.
For each share, a valid checksum must be generated in accordance with the
Checksum section.

The codex32 secret and the ''t''-1 codex32 shares form a set of ''t'' valid
codex32 strings from which additional shares can be derived as described

====For a fresh master seed====

In the case that the user wishes to generate a fresh master seed, the user
chooses a threshold value ''t'' and an identifier, then generates ''t''
random codex32 shares, using the generation procedure from the previous
As before, each share must have the same threshold value ''t'', the same
identifier, and a distinct share index.

With this set of ''t'' codex32 shares, new shares can be derived as
discussed above. This process generates a fresh master seed, whose value
can be retrieved by running the recovery process on any ''t'' of these

===Long codex32 Strings===

The 13 character checksum design only supports up to 80 data characters.
Excluding the threshold, identifier and index characters, this limits the
payload to 74 characters or 46 bytes.
While this is enough to support the 32-byte advised size of BIP-0032 master
seeds, BIP-0032 allows seeds to be up to 64 bytes in size.
We define a long codex32 string format to support these longer seeds by
defining an alternative checksum.

<source lang="python">
MS32_LONG_CONST = 0x43381e570bf4798ab26

def ms32_long_polymod(values):
    GEN = [
    residue = 0x23181b3
    for v in values:
        b = (residue >> 70)
        residue = (residue & 0x3fffffffffffffffff) << 5 ^ v
        for i in range(5):
            residue ^= GEN[i] if ((b >> i) & 1) else 0
    return residue

def ms32_verify_long_checksum(data):
    return ms32_long_polymod(data) == MS32_LONG_CONST

def ms32_create_long_checksum(data):
    values = data
    polymod = ms32_long_polymod(values + [0] * 15) ^ MS32_LONG_CONST
    return [(polymod >> 5 * (14 - i)) & 31 for i in range(15)]

A long codex32 string follows the same specification as a regular codex32
string with the following changes.

* The payload is a sequence of between 75 and 103 Bech32 characters.
* The checksum consists of 15 Bech32 characters as defined above.

A codex32 string with a data part of 94 or 95 characters is never legal as
a regular codex32 string is limited to 93 data characters and a long
codex32 string is at least 96 characters.

Generation of long shares and recovery of the master seed from long shares
proceeds in exactly the same way as for regular shares with the
<code>ms32_interpolate</code> function.

The long checksum is designed to be an error correcting code that can
correct up to 4 character substitutions, up to 8 unreadable characters
(called erasures), or up to 15 consecutive erasures.
As with regular checksums we do not specify how an implementation should
implement error correction, and all our recommendations for error
correction of regular codex32 strings also apply to long codex32 strings.


This scheme is based on the observation that the Lagrange interpolation of
valid codewords in a BCH code will always be a valid codeword.
This means that derived shares will always have valid checksum, and a
sufficient threshold of shares with valid checksums will derive a secret
with a valid checksum.

The header system is also compatible with Lagrange interpolation, meaning
all derived shares will have the same identifier and will have the
appropriate share index.
This fact allows the header data to be covered by the checksum.

The checksum size and identifier size have been chosen so that the encoding
of 128-bit seeds and shares fit within 48 characters.
This is a standard size for many common seed storage formats, which has
been popularized by the 12 four-letter word format of the BIP-0039 mnemonic.

The 13 character checksum is adequate to correct 4 errors in up to 93
characters (80 characters of data and 13 characters of the checksum). This
is somewhat better quality than the checksum used in SLIP-0039.

For 256-bit seeds and shares our strings are 74 characters, which fits into
the 96 character format of the 24 four-letter word format of the BIP-0039
mnemonic, with plenty of room to spare.

A longer checksum is needed to support up to 512-bit seeds, the longest
seed length specified in BIP-0032, as the 13 character checksum isn't
adequate for more than 80 data characters.
While we could use the 15 character checksum for both cases, we prefer to
keep the strings as short as possible for the more common cases of 128-bit
and 256-bit master seeds.
We only guarantee to correct 4 characters no matter how long the string is.
Longer strings mean more chances for transcription errors, so shorter
strings are better.

The longest data part using the regular 13 character checksum is 93
characters and corresponds to a 400-bit secret.
At this length, the prefix <code>MS1</code> is not covered by the checksum.
This is acceptable because the checksum scheme itself requires you to know
that the <code>MS1</code> prefix is being used in the first place.
If the prefix is damaged and a user is guessing that the data might be
using this scheme, then the user can enter the available data explicitly
using the suspected <code>MS1</code> prefix.

==Backwards Compatibility==

codex32 is an alternative to BIP-0039 and SLIP-0039.
It is technically possible to derive the BIP32 master seed from seed words
encoded in one of these schemes, and then to encode this seed in codex32.
For BIP-0039 this process is irreversible, since it involves hashing the
original words.
Furthermore, the resulting seed will be 512 bits long, which may be too
large to be safely and conveniently handled.

SLIP-0039 seed words can be reversibly converted to master seeds, so it is
possible to interconvert between SLIP-0039 and codex32.
However, SLIP-0039 '''shares''' cannot be converted to codex32 shares
because the two schemes use a different underlying field.

The authors of this BIP do not recommend interconversion.
Instead, users who wish to switch to codex32 should generate a fresh seed
and sweep their coins.

==Reference Implementation==

* [https://secretcodex32.com/docs/2023-02-14--bw.ps Reference PostScript
* FIXME add Python implementation
* FIXME add Rust implementation

==Test Vectors==

===Test vector 1===

This example shows the codex32 format, when used without splitting the
secret into any shares.
The data part contains 26 Bech32 characters, which corresponds to 130 bits.
We truncate the last two bits in order to obtain a 128-bit master seed.

codex32 secret (Bech32):

Master secret (hex): <code>318c6318c6318c6318c6318c6318c631</code>

* human-readable part: <code>ms</code>
* separator: <code>1</code>
* k value: <code>0</code> (no secret splitting)
* identifier: <code>test</code>
* share index: <code>s</code> (the secret)
* data: <code>xxxxxxxxxxxxxxxxxxxxxxxxxx</code>
* checksum: <code>4nzvca9cmczlw</code>

===Test vector 2===

This example shows generating a new master seed using "random" codex32
shares, as well as deriving an additional codex32 share, using ''k''=2 and
an identifier of <code>NAME</code>.
Although codex32 strings are canonically all lowercase, it's also valid to
use all uppercase.

Share with index <code>A</code>:

Share with index <code>C</code>:

* Derived share with index <code>D</code>:
* Secret share with index <code>S</code>:
* Master secret (hex): <code>d1808e096b35b209ca12132b264662a5</code>

Note that per BIP-0173, the lowercase form is used when determining a
character's value for checksum purposes.
In particular, given an all uppercase codex32 string, we still use
lowercase <code>ms</code> as the human-readable part during checksum

===Test vector 3===

This example shows splitting an existing 128-bit master seed into "random"
codex32 shares, using ''k''=3 and an identifier of <code>cash</code>.
We appended two zero bits in order to obtain 26 Bech32 characters (130 bits
of data) from the 128-bit master seed.

Master secret (hex): <code>ffeeddccbbaa99887766554433221100</code>

Secret share with index <code>s</code>:

Share with index <code>a</code>:

Share with index <code>c</code>:

* Derived share with index <code>d</code>:
* Derived share with index <code>e</code>:
* Derived share with index <code>f</code>:

Any three of the five shares among <code>acdef</code> can be used to
recover the secret.

Note that the choice to append two zero bits was arbitrary, and any of the
following four secret shares would have been valid choices.
However, each choice would have resulted in a different set of derived

* <code>ms13cashsllhdmn9m42vcsamx24zrxgs3qqjzqud4m0d6nln</code>
* <code>ms13cashsllhdmn9m42vcsamx24zrxgs3qpte35dvzkjpt0r</code>
* <code>ms13cashsllhdmn9m42vcsamx24zrxgs3qzfatvdwq5692k6</code>
* <code>ms13cashsllhdmn9m42vcsamx24zrxgs3qrsx6ydhed97jx2</code>

===Test vector 4===

This example shows converting a 256-bit secret into a codex32 secret,
without splitting the secret into any shares.
We appended four zero bits in order to obtain 52 Bech32 characters (260
bits of data) from the 256-bit secret.

256-bit secret (hex):

* codex32 secret:

Note that the choice to append four zero bits was arbitrary, and any of the
following sixteen codex32 secrets would have been valid:


===Test vector 5===

This example shows generating a new 512-bit master seed using "random"
codex32 characters and appending a checksum.
The payload contains 103 Bech32 characters, which corresponds to 515 bits.
The last three bits are discarded when converting to a 512-bit master seed.

This is an example of a '''Long codex32 String'''.

* Secret share with index <code>S</code>:
* Master secret (hex):


===Mathematical Companion===

Below we use the Bech32 character set to denote values in GF[32].
In Bech32, the letter <code>Q</code> denotes zero and the letter
<code>P</code> denotes one.
The digits <code>0</code> and <code>2</code> through <code>9</code> do
''not'' denote their numeric values.
They are simply elements of GF[32].

The generating polynomial for our BCH code is as follows.

We extend GF[32] to GF[1024] by adjoining a primitive cube root of unity,
<code>ζ</code>, satisfying <code>ζ^2 = ζ + P</code>.

We select <code>β := G ζ</code> which has order 93, and construct the
product <code>(x - β^i)</code> for <code>i</code> in <code>{17, 20, 46, 49,
52, 77, 78, 79, 80, 81, 82, 83, 84}</code>.
The resulting polynomial is our generating polynomial for our 13 character

    x^13 + E x^12 + M x^11 + 3 x^10 + G x^9 + Q x^8 + E x^7 + E x^6 + E x^5
+ L x^4 + M x^3 + C x^2 + S x + S

For our long checksum, we select <code>γ := E + X ζ</code>, which has order
1023, and construct the product <code>(x - γ^i)</code> for <code>i</code>
in <code>{32, 64, 96, 895, 927, 959, 991, 1019, 1020, 1021, 1022, 1023,
1024, 1025, 1026}</code>.
The resulting polynomial is our generating polynomial for our 15 character
checksum for long strings:

    x^15 + 0 x^14 + 2 x^13 + E x^12 + 6 x^11 + F x^10 + E x^9 + 4 x^8 + X
x^7 + H x^6 + 4 x^5 + X x^4 + 9 x^3 + K x^2 + Y x^1 + H

(Reminder: the character <code>0</code> does ''not'' denote the zero of the

-----END BIP-----
bitcoin-dev mailing list

Reply via email to