Re: Two questions: quality, understanding

Edward K. Ream Sun, 30 Oct 2011 11:47:55 -0700

On Sun, Oct 30, 2011 at 9:19 AM, Edward K. Ream <[email protected]> wrote:


> Leo needs some "subsystem" tests.

These tests can be packaged as unit tests, but they are different from
typical unit tests that verify that particular methods work as
advertised.

Such tests have several advantages:

1. Subsystem tests are an expanded form of "double-entry accounting".
As such, they test that a subsystem *as a whole* works as expected,
*regardless* of how the subsystem is designed, constructed, or
redesigned and revised.  Thus, subsystem tests are safer and more
stable than tests tied to particular methods.

2. Subsystem tests are a solution to a problem that has bedeviled me
ever since I learned about unit testing: namely the fact that great
hunks of Leo's code have no unit tests, and likely never will.
Subsystem tests could be said to cover many methods even if there are
no specific tests for them.

Let's look now at the first two subsystems that come to mind: syntax
coloring and importing.

Syntax coloring
===============

Leo's syntax coloring code is extremely complex.  This complexity
involves how pieces work together.  The fact that individual pieces
*appear* to work does not mean much.

Let **L** be the set of languages that Leo's colorizer supports.  For
each language lang in L, let **T(lang)** be the set of all colorizing
tags supported by the language, and let **M(lang)** be the set of
pattern matchers in leo\modes\lang.py.

I plan to write "covering" tests that verify the following, for each
language lang in L:

A. the test generates every tag in T(lang).

B. the test calls every matcher in M(lang).

C. there is a color setting in leoSettings.leo for every tag in T(lang).

The intention is simply that Leo's colorizer tests will find and
report any possible syntax problem in any of the languages in L.
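
Conditions A, B and C are all set-difference checks, so a covering test
can report exactly what is missing.  Here is a minimal sketch of that
idea.  All names in it (coverage_gaps, is_covered, the toy tags and
matcher names) are hypothetical; none of them come from Leo's code, and
how a real test would collect the generated tags and called matchers is
left open.

```python
def coverage_gaps(tags, matchers, generated, called, settings):
    """Return what is missing for one language, keyed by condition.

    tags      -- T(lang): all colorizing tags the language supports
    matchers  -- M(lang): all pattern matchers for the language
    generated -- tags the test actually produced while colorizing
    called    -- matchers the test actually invoked
    settings  -- tags that have a color setting
    """
    return {
        "A": set(tags) - set(generated),    # tags never generated
        "B": set(matchers) - set(called),   # matchers never called
        "C": set(tags) - set(settings),     # tags without a color setting
    }


def is_covered(gaps):
    """True only if conditions A, B and C all hold."""
    return not any(gaps.values())


# Toy data for one imaginary language (not taken from Leo).
gaps = coverage_gaps(
    tags={"comment", "keyword", "string"},
    matchers={"match_comment", "match_keyword"},
    generated={"comment", "keyword"},
    called={"match_comment", "match_keyword"},
    settings={"comment", "keyword", "string"},
)
```

With this toy data, gaps["A"] contains "string", so the test would fail
and name the tag that the test input never exercised.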

Importers
=========

This idea arose yesterday as I struggled to find a way to verify that
the newly-revised html importer worked properly.

In fact, there is a fatal flaw in the present scheme that tests the
html import process.  Let me digress to show what the flaw is.

The recently reported bug is, in essence, that the present
import-checking code cannot deduce that the differences in the
following two lines are *not* a real failure of the import process.
Original::

    <table ...>\t<tr valign="top">\t<td width="377">

Imported::

    <table ...>\n<tr valign="top">\n<td width="377">

That is, tabs in the original have been replaced by newlines.

Yesterday I spent several hours working on "comparison" code that
would somehow deal with this difference.  I came up with a "fancy"
comparison that would deal with the fact that the imported version of
the code has many more lines than the original.

But in fact, the fancy comparison fails in exactly the same places
that the original comparison code failed!

After much thought, and talking out loud to Rebecca, I suddenly saw
that *string-oriented* comparison has *no chance* of working, and even
if it could be made to work, it would be unsatisfactory.

Happily, there is a much better way:  *scanner-based* comparisons.
This is, in concept, very close to a *parser-based* comparison, that
is, verifying that the parse trees of the original and imported
versions are identical.

However, scanner-based comparisons have several *big* advantages over
parser-based comparisons:

1.  Leo's importers *already contain* all the essentials to define a
language-specific scanner for every language known to the importers.
This is the key advantage.

2. Unlike parsers, scanners explicitly recognize whitespace, comments
and any other "irrelevant" details that would be ignored by typical
parsers.

3. The "import scanners" would be tailored towards comparisons, so
that real import errors can be reported as clearly as possible.  Each
scanner will know (via language-specific settings), exactly which
constructs can, and cannot, be ignored.

4.  Import scanners are a *direct* embodiment of *language-specific*
double-entry accounting.  If the scans of the original and imported
file return the same sequence of tokens, then *by definition* the
import process has been successful.  The present character-oriented
comparison is a series of hacks; scanner-based comparison will be
fundamentally sound.
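
To make the idea concrete, here is a minimal sketch of a scanner-based
comparison applied to the html example above.  It is not Leo's importer
code: the regular expression, the token kinds and the function names
(scan, same_tokens) are assumptions made for illustration, and the rule
"any run of whitespace between tokens is equivalent to any other" stands
in for the language-specific settings a real import scanner would use.

```python
import re

# A deliberately tiny html-ish scanner: tags, whitespace runs, text,
# and a fallback for a stray '<' so that no character is skipped.
TOKEN_RE = re.compile(
    r"(?P<tag><[^>]*>)|(?P<ws>\s+)|(?P<text>[^<\s]+)|(?P<stray><)"
)


def scan(s):
    """Return a list of (kind, value) tokens for the string s.

    Whitespace is recognized explicitly, but its exact content is
    normalized to a single space, so tab-vs-newline differences vanish.
    """
    tokens = []
    for m in TOKEN_RE.finditer(s):
        kind = m.lastgroup
        value = " " if kind == "ws" else m.group()
        tokens.append((kind, value))
    return tokens


def same_tokens(original, imported):
    """The import check: both texts must scan to the same token sequence."""
    return scan(original) == scan(imported)


original = '<table ...>\t<tr valign="top">\t<td width="377">'
imported = '<table ...>\n<tr valign="top">\n<td width="377">'
damaged = '<table ...>\n<tr valign="top">\n<td width="378">'

strings_differ = original != imported           # string comparison fails
tokens_match = same_tokens(original, imported)  # scanner comparison passes
real_error = not same_tokens(original, damaged) # a real change is caught
```

The string comparison reports a difference between original and
imported, while the token comparison does not; a genuine change, such
as the altered width in damaged, still shows up as a token mismatch.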

Edward
