Beyond that: Are weak keys even detectable using a ciphertext-only attack (beyond simply trying them - but that can be done with *any* small set of keys)?

Yes, generally, that's the definition of a weak key.

But that's an odd attack to defend against - why not just try all the weak keys (or, again, any small subset of keys) and see if they work?

Because that's the definition of brute-forcing, and in any [symmetric] system worth a second glance the key distribution is generally close to uniform, so the chance that the key in use falls inside any particular small subset is negligible.

Do "continuous online testing": compute the entropy of the generated ciphertext and its correlation with the plaintext, and sound an alarm if what you're getting looks "wrong".
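A minimal sketch of the check proposed above, in Python. It assumes byte strings and uses an order-0 (memoryless) byte-frequency entropy estimate. As a stand-in for "correlation" it uses the fraction of aligned bits that agree between plaintext and ciphertext, which should sit near 0.5 when the two are uncorrelated. The function names and the thresholds in `looks_wrong` are hypothetical and untuned. The entropy estimate is biased low on short inputs (it can never exceed log2 of the input length), so the entropy threshold only makes sense on inputs of a few thousand bytes or more.

```python
import math
from collections import Counter


def entropy_bits_per_byte(data: bytes) -> float:
    """Order-0 (memoryless) Shannon entropy estimate, in bits per byte.

    A good cipher's output should come out close to 8.0 on long inputs;
    on short inputs the estimate is biased low.
    """
    n = len(data)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def bit_agreement(plaintext: bytes, ciphertext: bytes) -> float:
    """Fraction of aligned bits that are equal in plaintext and ciphertext.

    About 0.5 if the ciphertext is uncorrelated with the plaintext;
    1.0 means the "ciphertext" is just the plaintext.
    """
    n = min(len(plaintext), len(ciphertext))
    if n == 0:
        return 0.5
    differing = sum(bin(p ^ c).count("1") for p, c in zip(plaintext, ciphertext))
    return 1.0 - differing / (8 * n)


def looks_wrong(plaintext: bytes, ciphertext: bytes,
                min_entropy: float = 7.5, max_bias: float = 0.05) -> bool:
    """Hypothetical alarm rule; the thresholds are illustrative, not tuned."""
    return (entropy_bits_per_byte(ciphertext) < min_entropy
            or abs(bit_agreement(plaintext, ciphertext) - 0.5) > max_bias)
```

For example, with `msg = b"attack at dawn " * 1000`, `looks_wrong(msg, msg)` returns `True` for a "cipher" that passes the plaintext straight through: the ciphertext entropy is well under 7.5 bits per byte and the bit agreement is 1.0.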

Continuous online testing is a decent idea. Of course, there are scads of problems that a simple memoryless Markov model cannot detect, but it would be a reasonable sanity check on all but the smallest plaintexts.

I would also want continuous monitoring of my HWRNG output. Perhaps not a simple entropy check, which a properly functioning HWRNG will fail at a rate predicted by chance, but a graphical display of recent values. I'm not a visual thinker, but I don't think any amount of statistics is going to be as useful for detecting deviations from uniformity as a plot and a human brain.
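One way to see the false-alarm problem with a threshold test on HWRNG output, sketched in Python under assumptions of my own. It assumes a byte-oriented source and swaps in a Pearson chi-square test of byte frequencies per window instead of an entropy estimate. The critical value comes from the Wilson-Hilferty approximation, and a seeded `random.Random` stands in for a healthy generator (`randbytes` needs Python 3.9+). At significance level `alpha`, a correct source should still trip the alarm on roughly that fraction of windows. `text_histogram` is a crude text stand-in for the kind of plot suggested above. All names and parameters here are illustrative.

```python
import random
from collections import Counter
from statistics import NormalDist


def chi_square_uniform(window: bytes) -> float:
    """Pearson chi-square statistic of byte counts against a uniform distribution."""
    if not window:
        raise ValueError("empty window")
    expected = len(window) / 256
    counts = Counter(window)  # missing byte values count as 0
    return sum((counts[v] - expected) ** 2 / expected for v in range(256))


def critical_value(alpha: float, df: int = 255) -> float:
    """Approximate upper-alpha chi-square quantile (Wilson-Hilferty approximation)."""
    z = NormalDist().inv_cdf(1 - alpha)
    h = 2 / (9 * df)
    return df * (1 - h + z * h ** 0.5) ** 3


def false_alarm_rate(windows: int = 1000, size: int = 2560,
                     alpha: float = 0.01, seed: int = 1) -> float:
    """Fraction of windows from a simulated *healthy* source that trip the alarm."""
    rng = random.Random(seed)  # stand-in for a working HWRNG
    limit = critical_value(alpha)
    alarms = sum(chi_square_uniform(rng.randbytes(size)) > limit
                 for _ in range(windows))
    return alarms / windows


def text_histogram(window: bytes, buckets: int = 16, width: int = 40) -> str:
    """Crude text 'plot' of byte values grouped into equal-width buckets."""
    counts = [0] * buckets
    for b in window:
        counts[b * buckets // 256] += 1
    peak = max(counts) or 1
    return "\n".join(
        f"{i * 256 // buckets:3d}-{(i + 1) * 256 // buckets - 1:3d} "
        + "#" * round(width * c / peak)
        for i, c in enumerate(counts)
    )
```

With `alpha=0.01`, `false_alarm_rate()` should come out near 0.01: on the order of ten alarms per thousand windows from a source with nothing wrong with it. A stuck-at-zero source, by contrast, shows up in `text_histogram` as a single full-width bar, and its chi-square statistic lands far beyond the critical value.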