The IBM 360/91, and perhaps the CDC 6600, featured prominently in books on pipelined architectures for many years.
The goal of the 360/91 was one instruction per clock cycle, so it was not superscalar, but that was still a tough goal with slow core memory (16-way interleaved, but with a memory cycle more than 10 times the processor cycle time). A pipelined execution unit is not necessary for superscalar performance, as a machine could simultaneously execute any number of one-cycle instructions. The means for fetching and decoding those instructions, and for dispatching them to execution units, may or may not need pipelining. (Following RISC philosophy, one could leave many of the important decisions up to the compiler, and use memory wide enough to fetch enough instructions in one cycle.) -- glen
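
As a rough illustration of the interleaving arithmetic mentioned above, here is a minimal Python sketch. It is a toy model, not a description of the actual 360/91 storage control: the function name `issue_cycles` is illustrative, the 13-cycle bank busy time is an assumption chosen only to satisfy "over 10 times the cycle time", and the model assumes at most one request is issued per processor cycle, with a request stalling only while its bank is still busy.

```python
def issue_cycles(num_banks, bank_busy_cycles, addresses):
    """Count processor cycles needed to issue every request in `addresses`.

    Toy model (all assumptions): low-order interleaving, so address a lives
    in bank a % num_banks; one request may be issued per cycle; a bank is
    unavailable for bank_busy_cycles cycles after it accepts a request.
    """
    bank_free_at = [0] * num_banks      # first cycle at which each bank is idle
    cycle = 0                           # next cycle at which a request may issue
    for addr in addresses:
        bank = addr % num_banks
        start = max(cycle, bank_free_at[bank])   # stall if the bank is busy
        bank_free_at[bank] = start + bank_busy_cycles
        cycle = start + 1
    return cycle


BANKS = 16          # from the comment above: 16-way interleaved
BUSY = 13           # assumed: "over 10 times the cycle time"

# Sequential words spread across all banks, so no request ever stalls.
sequential = issue_cycles(BANKS, BUSY, range(64))

# A stride of 16 words hits the same bank every time, so each request waits.
same_bank = issue_cycles(BANKS, BUSY, range(0, 64 * BANKS, BANKS))

print(sequential, same_bank)
```

Under these assumptions, 64 sequential words issue in 64 cycles, one per cycle, because 16 banks exceed the 13-cycle busy time. The same 64 requests aimed at a single bank take 820 cycles. That gap is why interleaving alone only helps when the access pattern cooperates.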