I use spell a lot, but I often get seemingly random misspelled words in
its output that I cannot find in the source.
Today I looked closer and saw that deroff -w was corrupting my non-roff
input.
$ jot 1000 | number | grep '[a-z]' | xargs | tee ~/tmp/long-line | spell
hree
hund
nty
sevent
thi
tw
ty
With the same input run through deroff -w directly:
$ cat ~/tmp/long-line | deroff -w | \
    egrep -5 '^hree|^hund$|^nty|^sevent$|^thi$|^tw$|^ty$'
The output shows that the corruption comes from deroff itself.
The problem first happens at around 750 words (6147 characters), then
again at 10248 characters, then at 14344 characters, ...
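
The spacing between those offsets can be checked with a little shell
arithmetic. This is only a sketch; the function name offset_gaps is made
up for this example. The gaps come out at or near 4096, which might hint
at a fixed-size buffer boundary inside deroff, but that is a guess until
someone reads the source.

```shell
# offset_gaps is a hypothetical helper: it prints the difference between
# each pair of consecutive offsets given as arguments.
offset_gaps() {
    prev=$1
    shift
    out=
    for off in "$@"; do
        # Append the gap, separated by a single space.
        out="${out}${out:+ }$((off - prev))"
        prev=$off
    done
    echo "$out"
}

# Offsets (in characters) where the chopped words were seen.
offset_gaps 6147 10248 14344    # prints: 4101 4096
```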
By default, spell uses deroff -w. The deroff manual says:

  -w  Output a word list, one `word' (string of letters, digits, and
      apostrophes, beginning with a letter; apostrophes are removed)
      per line, and all other characters ignored. Normally, the output
      follows the original, with the deletions mentioned above.

(In practice, apostrophes are not removed, but other punctuation and
spaces are. deroff -w removes punctuation at the end of a word but not
in the middle of a word, unless it is a dash or underscore, which splits
the word.)
I didn't install delatex or detex, but wrote a simple wrapper and added
a -n option to the spell script that uses it to make the word list. The
deroff manual is not complete, and I haven't yet read the source in
order to fully mimic -w.
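
For illustration, a wrapper along these lines could look like the
minimal sketch below. It is not the actual wrapper and the name wordlist
is made up for this example. It follows the manual's definition of a
word (letters, digits, and apostrophes, beginning with a letter), keeps
apostrophes as observed above, and does not try to reproduce what
deroff -w does with punctuation in the middle of a word.

```shell
# wordlist is a hypothetical stand-in for "deroff -w" on plain text:
# reads stdin, writes one word per line.
wordlist() {
    # Turn every run of characters other than letters, digits, and
    # apostrophes into a single newline, then keep only the tokens
    # that begin with a letter (so "42" or "'quoted" are dropped).
    LC_ALL=C tr -cs "A-Za-z0-9'" '\n' | LC_ALL=C grep '^[A-Za-z]'
}

# Example: split a line the way the manual describes.
printf '%s\n' "It's a long-line test_case, 42 times." | wordlist
```

Because tr works on a character stream rather than on lines, the length
of the input line should not matter here, which is the property that the
long-line test case needs.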
Is this a known spell or deroff -w bug, where long lines get chopped and
the fragments are then reported as misspellings? Should I open a gnats
ticket for this?