When we looked into the issue (I'm not looking at the code right now), it didn't look like a CPU shortage. It looked like, by default, you would get one thread, presumably to match what you previously got by default. Except that you'd now go through Lucene's parallel segment search code path with that one thread instead of the standard code path, and that was slow. To get the old behavior, you'd have to set it to -1, as you suggest.
It does seem like a bug to me, and I also think the default should have been -1 (or any fix that still gets you the standard Lucene code path when you haven't turned on parallel segment search).
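A minimal sketch of the default I'm describing, assuming a hypothetical thread-count setting where -1 means "parallel segment search not enabled". The class and method names here are made up for illustration. The idea is that you only hand Lucene an executor when the feature is actually on. Lucene's `IndexSearcher(IndexReader, Executor)` constructor treats a null executor as "no concurrency", which is the standard code path.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SegmentSearchDefaults {
    // Hypothetical sentinel: -1 means parallel segment search is not enabled.
    static final int DISABLED = -1;

    /**
     * Returns null when parallel segment search is off, so the caller can build
     * a searcher without an executor (the standard, sequential code path).
     * Only a positive thread count opts into the parallel code path.
     */
    static ExecutorService executorFor(int threads) {
        if (threads <= 0) {
            return null; // standard code path, same as before the feature existed
        }
        // Note: even threads == 1 lands here, i.e. the parallel path with one thread.
        return Executors.newFixedThreadPool(threads);
    }

    public static void main(String[] args) {
        ExecutorService off = executorFor(DISABLED);
        System.out.println(off == null ? "standard path" : "parallel path");

        ExecutorService one = executorFor(1);
        System.out.println(one == null ? "standard path" : "parallel path");
        if (one != null) {
            one.shutdown(); // don't leave pool threads running
        }
    }
}
```

With the default at -1, nobody who hasn't opted in ever sees the one-thread parallel path.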