Hi Pádraig,

Pádraig Brady <p...@draigbrady.com> writes:

> A 58 character encoding that:
>  - avoids visually ambiguous 0OIl characters
>  - uses only alphanumeric characters
> Described at:
>  - https://tools.ietf.org/html/draft-msporny-base58-03
>
> This implementation uses GMP (or gnulib's gmp fallback).
> Performance is good in comparison to other implementations.
> For example when using libgmp, encoding is 6 times faster,
> and decoding 28 times faster than the implementation
> using arbitrary precision ints in cpython 3.13.
>
> Memory use is proportional to the size of input.
>
> Encoding benchmarks:
>
>   $ time yes | head -c65535 | src/basenc --base58 -w0 >file.enc
>   real    0m1.533s
>
>   $ ./configure --without-libgmp && make  # gnulib gmp
>   $ time yes | head -c65535 | src/basenc --base58 -w0 >file.enc
>   real    0m3.587s
>
>   # dnf install python3-base58
>   $ time yes | head -c65535 | base58 >file.enc  # cpython 3.13
>   real    0m9.700s
>
> Decoding benchmarks:
>
>   $ time src/basenc --base58 -d <file.enc >/dev/null
>   real    0m0.299s
>
>   $ ./configure --without-libgmp && make  # gnulib gmp
>   $ time src/basenc --base58 -d <file.enc >/dev/null
>   real    0m1.469s
>
>   $ time base58 -d <file.enc >/dev/null  # cpython 3.13
>   real    0m8.302s
>
> * src/basenc.c (base_decode_ctx_finalize, base_encode_ctx_init,
> base_encode_ctx, base_encode_ctx_finalize): New functions to
> provide more general processing functionality.
> (base58_{de,en}code_ctx{_init,,_finalize}): New functions to
> accumulate all input before calling ...
> (base58_{de,en}code): ... the GMP based encoding/decoding routines.
> (do_encode, do_decode): Call the ctx variants if enabled.
> * doc/coreutils.texi (basenc invocation): Describe the new option,
> and indicate the main use case being interactive user use.
> * src/local.mk: Link basenc with GMP.
> * tests/basenc/basenc.pl: Add test cases.
> * NEWS: Mention the new feature.
> ---
>  NEWS                   |   5 +
>  doc/coreutils.texi     |   9 +
>  src/basenc.c           | 361 ++++++++++++++++++++++++++++++++++++++++-
>  src/local.mk           |   1 +
>  tests/basenc/basenc.pl |  42 +++++
>  5 files changed, 413 insertions(+), 5 deletions(-)

Interesting. Is this encoding used anywhere outside of Bitcoin?

Just curious; the encoding seems interesting regardless.

> +static void
> +base58_encode (char const* data, size_t data_len,
> +               char *out, idx_t *outlen)
> +{
> +    affirm (base_length (data_len) <= *outlen);
> +
> +    size_t leading_zeros = 0;
> +    while (leading_zeros < data_len && data[leading_zeros] == 0)
> +      leading_zeros++;
> +
> +    /* Init GMP integer from binary (base 256) data.  */
> +    mpz_t num;
> +    mpz_init (num);
> +    mpz_import (num, data_len, 1, 1, 0, 0, data);
> +
> +    char *ptr = out + *outlen;  /* Start just beyond end.  */
> +
> +    /* Convert to base 58 by repeatedly dividing by 58.  */
> +    mpz_t quotient, remainder;
> +    mpz_init (quotient);
> +    mpz_init (remainder);
> +    while (mpz_cmp_ui (num, 0) > 0)
> +      {
> +        mpz_fdiv_qr_ui (quotient, remainder, num, 58);
> +        unsigned long rem_val = mpz_get_ui (remainder);
> +        *(--ptr) = base58_alphabet[rem_val];
> +        mpz_set (num, quotient);
> +      }
> +
> +    /* Account for leading zeros.  */
> +    ptr -= leading_zeros;
> +    memset (ptr, '1', leading_zeros);
> +
> +    /* Prepare return.  */
> +    *outlen -= (ptr - out);
> +    memmove (out, ptr, *outlen);
> +
> +    mpz_clear (num);
> +    mpz_clear (quotient);
> +    mpz_clear (remainder);
> +
> +    return;
> +}

Did you mean to use 4 spaces for indentation here?
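
For anyone following along without GMP at hand, here is a minimal
sketch of the same repeated-division idea using plain byte-wise long
division. It is not the code from the patch: the name
base58_encode_sketch, its signature and its return convention are made
up for illustration, and it is quadratic in the input size, so it says
nothing about the performance numbers above. The alphabet is the one
from the draft, which omits 0, O, I and l.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static char const base58_alphabet[] =
  "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

/* Hypothetical helper, not part of the patch.  Encode DATA_LEN bytes
   of DATA into OUT (capacity OUT_SIZE, including the trailing NUL).
   Return the encoded length, or -1 if OUT is too small or memory
   allocation fails.  */
static long
base58_encode_sketch (unsigned char const *data, size_t data_len,
                      char *out, size_t out_size)
{
  assert (out != NULL);

  /* Each leading zero byte maps to one leading '1'.  */
  size_t leading_zeros = 0;
  while (leading_zeros < data_len && data[leading_zeros] == 0)
    leading_zeros++;

  /* Working copy of the big-endian number, without the leading zeros.  */
  size_t n = data_len - leading_zeros;
  unsigned char *num = malloc (n ? n : 1);
  /* log(256)/log(58) is about 1.366, so n * 1.38 + 1 digits suffice.  */
  size_t max_digits = n * 138 / 100 + 1;
  char *digits = malloc (max_digits);
  if (!num || !digits)
    {
      free (num);
      free (digits);
      return -1;
    }
  memcpy (num, data + leading_zeros, n);

  /* Repeatedly divide the number by 58, collecting the remainders
     least significant digit first.  */
  size_t ndigits = 0;
  size_t start = 0;
  while (start < n)
    {
      unsigned int rem = 0;
      for (size_t i = start; i < n; i++)
        {
          unsigned int cur = rem * 256 + num[i];
          num[i] = cur / 58;
          rem = cur % 58;
        }
      digits[ndigits++] = base58_alphabet[rem];
      /* Skip bytes of the quotient that have become zero.  */
      while (start < n && num[start] == 0)
        start++;
    }

  long ret = -1;
  if (leading_zeros + ndigits < out_size)
    {
      memset (out, '1', leading_zeros);
      for (size_t i = 0; i < ndigits; i++)
        out[leading_zeros + i] = digits[ndigits - 1 - i];
      out[leading_zeros + ndigits] = '\0';
      ret = leading_zeros + ndigits;
    }

  free (num);
  free (digits);
  return ret;
}
```

With this sketch, the six bytes 00 00 28 7f b4 cd should encode to
"11233QC4": two '1' characters for the two leading zero bytes, then
the base 58 digits of 0x287fb4cd.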

Collin