According to the 21066 HRM, the processor loads its initial I-stream from the SROM. All Icache bits are loaded from the SROM, including the cache block metadata. The blocks are loaded in sequential order, starting with block 0 and ending with block 255. For the 21066, the Icache is loaded LSB first, filling from left to right (i.e., bit 0 of LW0 will be the first bit loaded). This is the resulting order of each cache block:
BHT LW7 LW5 LW3 LW1 V ASM ASN TAG LW6 LW4 LW2 LW0

I thought I had some code to demultiplex each bit stream from an SROM image and then reconstruct the resulting memory image, but I can't find it, or maybe I only thought about writing it.
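
For what it's worth, here is a minimal Python sketch of how one block's serial bits could be split back into fields. It is not the lost code, and it rests on assumptions the text above does not confirm: the field widths (BHT 8, V 1, ASM 1, ASN 6, TAG 21, each LW 32) are from memory of the 21064 family and need checking against the HRM, and I am reading "bit 0 of LW0 is loaded first" to mean the fields arrive in the reverse of the listed order, each LSB first. The names `FIELD_ORDER`, `ASSUMED_WIDTHS`, `decode_block` and `block_words` are made up for this example.

```python
# Field order exactly as listed in the post, left to right.
FIELD_ORDER = ["BHT", "LW7", "LW5", "LW3", "LW1", "V", "ASM", "ASN", "TAG",
               "LW6", "LW4", "LW2", "LW0"]

# ASSUMPTION: these widths are not stated in the post; verify against the HRM.
ASSUMED_WIDTHS = {"BHT": 8, "V": 1, "ASM": 1, "ASN": 6, "TAG": 21}
ASSUMED_WIDTHS.update({f"LW{i}": 32 for i in range(8)})


def decode_block(bits, widths=ASSUMED_WIDTHS):
    """Split the serial bits of one Icache block into named fields.

    ASSUMPTION: the first bit in `bits` is bit 0 of LW0, so fields arrive
    in the reverse of FIELD_ORDER, and each field arrives LSB first.
    """
    bits = list(bits)
    expected = sum(widths[name] for name in FIELD_ORDER)
    if len(bits) != expected:
        raise ValueError(f"expected {expected} bits, got {len(bits)}")

    fields = {}
    pos = 0
    for name in reversed(FIELD_ORDER):
        value = 0
        for i in range(widths[name]):
            # Bit i of the field is the i-th bit received for that field.
            value |= (bits[pos + i] & 1) << i
        fields[name] = value
        pos += widths[name]
    return fields


def block_words(fields):
    """Return the eight instruction longwords in memory order, LW0..LW7."""
    return [fields[f"LW{i}"] for i in range(8)]
```

Running `decode_block` over each of the 256 blocks in turn and concatenating the `block_words` results would give the 8 KB instruction image, provided the assumptions hold. Pulling a single bit stream out of the raw SROM image is a separate step that this sketch does not cover.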