Those all sound like good things to add to the testsuite. (The tests in the 
metamath-exe and metamath-knife repos are I think better developed, although in 
principle some of their tests could be fed back to a testsuite which could be 
run on any verifier).

On September 16, 2025 12:44:47 PM PDT, Matthew House <[email protected]> 
wrote:
>A few issues I spotted, many of which are nitpicks and some of which are
>more serious:
>
>   - The verifier does not check that each file only uses printable ASCII +
>   "\t\n\f\r". This can be problematic if not checked, since differences in
>   how characters outside this set are handled (e.g., what counts as white
>   space) can lead to discrepancies between verifiers.
>   - The verifier does not check that keyword tokens are separated by white
>   space, e.g., it accepts $($) as a valid comment, or foo$a|- bar$. as a
>   valid $a statement.
>   - The verifier does not check that comments do not contain $( or $).
>   - The verifier does not check that ${ $} delimiters for blocks are
>   balanced within each file.
>   - The verifier does not check that math symbols do not contain $.
>   - The verifier accepts a meaningless dangling $ character at the end of
>   a file.
>   - The verifier does not check that active constants and variables cannot
>   be redeclared, nor that variables cannot be declared as constants or vice
>   versa. It has no support for variables becoming inactive at the end of
>   their containing block.
>   - The verifier does not check that math symbols in $d statements are
>   distinct active variables.
>   - The verifier does not check that all labels are unique, nor that all
>   labels are distinct from all declared math symbols.
>   - The verifier does not check that typecodes are active constants.
>   - The verifier does not check that $f statements use an active variable,
>   nor that all variables in an $e, $a, or $p statement have an active $f
>   statement.
>   - The verifier does not check that a variable has at most one active $f
>   statement at a time, nor that every $f statement for a variable has the
>   same typecode.
>   - The verifier does not ignore repeated inclusions of the same file. For
>   instance, a file test.mm containing $[ test.mm $] will lead to a runaway
>   memory leak.
>   - The verifier does not ignore comments in certain contexts in
>   statements. For instance, it will interpret foo $a $( baz $) |- bar $.
>   as a statement with typecode $( and expression $( baz $) |- bar, which
>   is nonsense and will lead to discrepancies between verifiers.
>   - The verifier does not check for integer overflow in compressed-label
>   steps. This allows multiple ways to write step numbers, as well as ways to
>   write the special code Z as a number, which will lead to discrepancies
>   between verifiers.
>   - The verifier has no support for unknown steps ?. After reading a proof
>   with a ? step, a typical verifier will continue verifying the rest of
>   the database as usual, but warn the user that it contains incomplete proofs.
>   - *The verifier does not check that a proof cannot use its own label in
>   a step.* This allows incorrect proofs such as $[ set.mm $] evil $p |- ph
>   $= ( evil ) AB $. which insist upon themselves.
>   - *The verifier does not check that typecodes match when substituting
>   variables.* This allows incorrect proofs such as $[ set.mm $] evil $p |-
>   ph $= ( vx wn weq wal equid ax-gen ax6 pm2.24ii ) BCZJDZBEAKBJFGBJHI $.
>   which exploit syntax ambiguities.
>
>Writing a correct Metamath verifier requires very careful attention to
>detail, which it seems that LLMs still struggle with, short of directed
>hand-holding or extraordinarily thorough test suites. The original
>mmverify.py code is quite compact for how many details it captures.
>
>Matthew House
>
>On Tue, Sep 16, 2025 at 5:03 AM Zarathustra Goertzel <[email protected]>
>wrote:
>
>> I was curious if Codex could write a Metamath verifier, so asked ChatGPT
>> for a list of verifiers and what languages it thought would be both *easy
>> and interesting*: it chose Go.
>>
>> I started from my mmverify.py repo (where iIhave a too-slow verifier in
>> another weird language, MeTTa <https://metta-lang.dev/>), so it had
>> access to reference implementations.
>>
>> The task was completed pretty easily (perhaps in 40-60min of AI-work) with
>> my feeding in some errors.  It seems to pass David Wheeler's Metamath-test
>> suite <https://github.com/zariuq/metamath-test>.
>>
>> It's here if anyone's curious:
>> https://github.com/zariuq/mmverify.py/tree/mmverifyGo/go
>>
>> FWIW, I decided to try to see if it could do Metamath Zero, and I must say
>> it struggles (even to just do .mm0 + .mmu verification).
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Metamath" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion visit
>> https://groups.google.com/d/msgid/metamath/e55ea1ba-b3c2-4018-80a1-a46149ead54fn%40googlegroups.com
>> <https://groups.google.com/d/msgid/metamath/e55ea1ba-b3c2-4018-80a1-a46149ead54fn%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>>
>
>-- 
>You received this message because you are subscribed to the Google Groups 
>"Metamath" group.
>To unsubscribe from this group and stop receiving emails from it, send an 
>email to [email protected].
>To view this discussion visit 
>https://groups.google.com/d/msgid/metamath/CADBQv9aKA%3DMBJY6SvJNqneUhm54jP1FQ186WK8VSgSbCENt57w%40mail.gmail.com.

-- 
You received this message because you are subscribed to the Google Groups 
"Metamath" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion visit 
https://groups.google.com/d/msgid/metamath/836103D7-290E-4EF9-93C2-7F1CFE6620CD%40panix.com.

Reply via email to