In case others find it useful, I've shared the beasting script I use. It's especially handy when you're trying to reproduce, or harden a test against, a SolrCloud timing issue. Nothing flushes out the weeds better than running the same test N times against itself. If you're having trouble reproducing a failure you see on Jenkins, in my experience this will usually do it.
It writes an easy-to-read results file that lists the failed runs and the subdirectories to look in for their logs.
https://gist.github.com/markrmiller/dbdb792216dc98b018ad
- Mark
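For anyone who can't open the gist, here is a minimal Python sketch of the same idea. It is not the script itself. It runs a test command N times, a few at a time, keeps each run's output in its own subdirectory, and writes a results.txt that summarizes the failures. The function names, the run-N directory layout, the results.txt wording, and the BEAST_RUN_ID environment variable are all hypothetical. The sketch also assumes the command can safely run concurrently with itself.

```python
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_once(cmd, run_dir, run_id):
    """Run cmd once, saving its combined stdout/stderr to run_dir/output.log."""
    run_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical env var so a run can tell which iteration it is.
    env = dict(os.environ, BEAST_RUN_ID=str(run_id))
    with open(run_dir / "output.log", "w") as log:
        proc = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT, env=env)
    return run_id, proc.returncode


def beast(cmd, iterations, outdir, parallel=4):
    """Run cmd `iterations` times, `parallel` at a time; return failed run ids.

    Assumes concurrent invocations of cmd don't clobber each other.
    """
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        futures = [pool.submit(run_once, cmd, outdir / f"run-{i}", i)
                   for i in range(1, iterations + 1)]
        results = [f.result() for f in futures]  # in submission order
    failed = [run_id for run_id, code in results if code != 0]
    # Human-readable summary pointing at the log of each failed run.
    with open(outdir / "results.txt", "w") as out:
        out.write(f"{len(failed)} of {iterations} runs failed\n")
        for run_id in failed:
            log_path = outdir / f"run-{run_id}" / "output.log"
            out.write(f"FAILED run {run_id}: see {log_path}\n")
    return failed
```

Usage would look like `beast(["ant", "test", "-Dtestcase=SomeCloudTest"], 50, "beast-out", parallel=8)`, where the test class name is a placeholder. Whether several ant runs can share one checkout at the same time depends on your build, so verify that before relying on the parallelism.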