(I am posting this to chat because it is obviously not a bug in
LFS/BLFS. But it is amusing and puzzling and I hope someone here will
comment.)
(If you are wondering what the title means then see here[1].
Sometimes reviewing legacy code feels the same way.)
I often grep /usr/share/dict/words when I am not sure how to spell a
word. This usually works very well, but recently I have found a few
unexpected results.
When I wanted to know the correct number of 'c's and 'r's in the word
"occur" and its variants I ended up grepping for '^occurre'. Try it!
Some of the first few results are sensible but most of the rest look
like fake Latin. (Or maybe it is real Latin; I wouldn't know.)
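
For anyone who wants to try the lookup without trusting their own
dictionary, here is a minimal sketch. It assumes a tiny made-up word
list in a temporary file standing in for /usr/share/dict/words, so
the entries are hypothetical and the real file will give different
results.

```shell
# Hypothetical sample standing in for /usr/share/dict/words.
words=$(mktemp)
printf '%s\n' occur occurred occurrence occurring occurs > "$words"

# Anchor at the start of the line to list entries beginning with the prefix.
grep '^occurr' "$words"

# -c prints only the number of matching lines.
count=$(grep -c '^occurr' "$words")
echo "entries starting with occurr: $count"

# -x matches whole lines only and -q suppresses output,
# which gives a yes/no answer for one candidate spelling.
if grep -qx 'occurred' "$words"; then
    verdict=listed
else
    verdict=missing
fi
echo "occurred: $verdict"

rm -f "$words"
```

The -x check is probably the safer habit: a prefix match shows
everything the file contains, sensible or not, while a whole-line
match only confirms the one spelling you asked about.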
It gets worse if you want to know how many consecutive 's's a word
should have. Grepping for 'sss$' reports many strange things. I
wondered if some algorithm had automatically appended an 's' to many
words ending in "ss". But no! If you grep for 'ss$' (and page the
results because there are lots of them) you will see that most words
ending in "ss" do not have an "sss" variant. So where are these
relatively few 'sss$' matches coming from? I have no idea.
And if you really want a laugh, grep for '^sss'. Seriously?
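
The "was an 's' appended automatically?" question can be tested
mechanically rather than by paging through output. The following is
only a sketch under assumptions: the word list is a made-up sample in
a temporary file, and the variable names are mine. It strips the
final 's' from each 'sss$' entry and checks whether the stem is
itself an entry.

```shell
# Hypothetical sample standing in for /usr/share/dict/words.
words=$(mktemp)
printf '%s\n' boss bosss class glass glasss sss > "$words"

total=0
derived=0
while IFS= read -r w; do
    # Skip the empty line the here-document yields when nothing matches.
    [ -n "$w" ] || continue
    total=$((total + 1))
    # Remove exactly one trailing "s" to get the candidate stem.
    stem=${w%s}
    # -x: whole-line match, -F: fixed string, -q: no output.
    if grep -qxF -- "$stem" "$words"; then
        derived=$((derived + 1))
    fi
done <<EOF
$(grep 'sss$' "$words")
EOF

echo "$derived of $total 'sss' entries have their 'ss' stem in the list"
rm -f "$words"
```

If nearly every 'sss$' entry has its stem in the list, some kind of
suffixing looks plausible; if few do, they came from somewhere else.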
But there is more! Did you think that /usr/share/dict/words had one
word per line? Then you were wrong! At least by the Unix definition
of a word! This time I will show the results.
$ wc -- /usr/share/dict/words
1671704 1671731 16861960 /usr/share/dict/words
Yes, /usr/share/dict/words has *more* words than lines! How can this
be? Maybe there is a character that we think is a letter but that
wc does not? Let's take a wild guess and grep for "'". Seriously,
try it!
Have you finished laughing yet?
The really funny thing is that it misspells "Muad'Dib"[2].
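
For the words-versus-lines mismatch, one way to test a guess like the
one above is to ask directly which lines hold more than one word. wc
counts words as runs of characters separated by whitespace, so awk's
default field splitting is a reasonable stand-in. This is a sketch on
a made-up sample file, not the real dictionary, and the entries are
hypothetical.

```shell
# Hypothetical sample standing in for /usr/share/dict/words.
words=$(mktemp)
printf '%s\n' "apple" "ad hoc" "Muad'Dib" "vice versa" > "$words"

lines=$(wc -l < "$words")
tokens=$(wc -w < "$words")
# Arithmetic expansion tolerates the padding some wc versions print.
surplus=$((tokens - lines))
echo "surplus words: $surplus"

# awk splits each line on blanks by default; NF > 1 keeps only
# the lines that contain more than one field.
multi=$(awk 'NF > 1' "$words")
printf '%s\n' "$multi"

# Count the lines containing an apostrophe, for comparison.
apostrophes=$(grep -c "'" "$words")
echo "lines with an apostrophe: $apostrophes"

rm -f "$words"
```

In this sample the apostrophe entry counts as a single word, and the
surplus comes entirely from the two entries containing a space.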
Comments?
Regards,
Jeremy Henty
[1]: https://en.wikipedia.org/wiki/Bog_snorkelling
[2]: https://en.wikipedia.org/wiki/Muad'Dib