The first thing I'd check is the versions of U-Boot, the kernel and romfs on
the boards.
I have also seen similar sensor misbehaviour on one of our boards where the I2C
bus was broken. Can you plug a USB cable into the back and watch the U-Boot
boot sequence?
You should see a message similar to this:
CPU: AMCC PowerPC 440EPx Rev. A at 533.333 MHz (PLB=133 OPB=66 EBC=66)
No Security/Kasumi support
Bootstrap Option C - Boot ROM Location EBC (16 bits)
32 kB I-Cache 32 kB D-Cache
Board: ROACH2
I2C: ready
DRAM: 512 MiB
Flash: 128 MiB
In: serial
Out: serial
Err: serial
CPLD: 2.1
USB: Host(int phy)
SN: ROACH2.2 batch=D#9#98 software fixups match
MAC: 02:44:01:02:09:62
DTT: 1 is 21 C
DTT: 2 is 21 C
Net: ppc_4xx_eth0
...
Make sure the "DTT" lines have sensible values, and that there are no errors
about the IIC bus scrolling past.
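
If you capture the console output to a file, the same check can be scripted so
that all eight boards are compared the same way. The following is only a rough
sketch: the function name, the 0..85 C "sensible" range and the keywords used to
spot bus errors are my assumptions, not something U-Boot defines, so adjust them
to whatever your boards actually print.

```python
import re

# Matches lines such as "DTT: 1 is 21 C" from the boot banner shown above.
DTT_RE = re.compile(r"^DTT:\s*(\d+)\s+is\s+(-?\d+)\s*C\s*$")
# Assumed keywords: the exact wording of U-Boot's I2C/IIC error messages is a guess.
BUS_ERR_RE = re.compile(r"\b(i2c|iic)\b.*\b(error|fail|timeout)", re.IGNORECASE)


def check_boot_log(text, min_c=0, max_c=85):
    """Return (readings, problems) for a captured boot log.

    readings maps the DTT sensor index to its temperature in degrees C.
    problems is a list of human-readable strings; empty means nothing suspicious.
    The min_c/max_c limits are illustrative defaults, not hardware limits.
    """
    readings = {}
    problems = []
    for raw in text.splitlines():
        line = raw.strip()
        match = DTT_RE.match(line)
        if match:
            index, temp = int(match.group(1)), int(match.group(2))
            readings[index] = temp
            if not (min_c <= temp <= max_c):
                problems.append(
                    f"DTT {index} reads {temp} C, outside {min_c}..{max_c} C"
                )
        elif BUS_ERR_RE.search(line):
            problems.append(f"possible I2C error: {line}")
    if not readings:
        problems.append("no DTT lines found")
    return readings, problems
```

Call it as check_boot_log(open("boot.log").read()), where boot.log is a
hypothetical capture file, and look at the returned problems list. An empty
list only means the banner looks normal; it does not prove the bus is healthy.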
Jason
On 29 Oct 2015, at 6:06, Gary, Dale E. <[email protected]> wrote:
> Hi All,
>
> We have 8 identical (I hope) ROACH-2 boards. We have been using 4 of them
> for a long time, and I just brought two more online, but one of them is
> behaving differently than the others. One problem is that the sensor list is
> different. If I run the katcp routine to get the sensor list
>
> reply, sensors = ro[5].fpga.blocking_request(Message.request('sensor-list'))
>
> the message returns quickly but has a short list:
>
> sensors:
> [<Message inform sensor-list (mode, current_mode, none, discrete, raw)>,
> <Message inform sensor-list (raw.temp.ambient, Ambient_board_temperature,
> millidegrees, integer, -2147483648, 2147483647, -2147483648, 2147483647)>,
> <Message inform sensor-list (raw.temp.ppc, PowerPC_temperature,
> millidegrees, integer, -2147483648, 2147483647, -2147483648, 2147483647)>,
> <Message inform sensor-list (raw.temp.fpga, FPGA_temperature, millidegrees,
> integer, -2147483648, 2147483647, -2147483648, 2147483647)>,
> <Message inform sensor-list (raw.temp.inlet, Inlet_ambient_temperature,
> millidegrees, integer, -2147483648, 2147483647, -2147483648, 2147483647)>,
> <Message inform sensor-list (raw.temp.outlet, Outlet_ambient_temperature,
> millidegrees, integer, -2147483648, 2147483647, -2147483648, 2147483647)>]
>
>
> If I run the katcp routine to get the sensor values, the command takes ~ 8 s
> and returns bad values
>
> reply, vals =
> ro[5].fpga.blocking_request(Message.request('sensor-value'),timeout=10)
>
> vals:
> [<Message inform sensor-value (1446089462529, 1, mode, unknown, raw)>,
> <Message inform sensor-value (1446089463542, 1, raw.temp.ambient, nominal,
> -1)>,
> <Message inform sensor-value (1446089464554, 1, raw.temp.ppc, nominal, -1)>,
> <Message inform sensor-value (1446089465566, 1, raw.temp.fpga, nominal, -1)>,
> <Message inform sensor-value (1446089468602, 1, raw.temp.inlet, nominal, 0)>,
> <Message inform sensor-value (1446089471638, 1, raw.temp.outlet, nominal,
> 0)>]
>
> Doing this on a good board returns immediately and gives the much longer
> sensor list:
>
> vals:
> [<Message inform sensor-value (1446089513976, 1, mode, unknown, raw)>,
> <Message inform sensor-value (1446089513984, 1, raw.temp.ambient, nominal,
> 34000)>,
> <Message inform sensor-value (1446089513984, 1, raw.temp.ppc, nominal,
> 49000)>,
> <Message inform sensor-value (1446089513984, 1, raw.temp.fpga, nominal,
> 59000)>,
> <Message inform sensor-value (1446089513987, 1, raw.fan.chs1, nominal,
> 7650)>,
> <Message inform sensor-value (1446089513990, 1, raw.fan.chs2, nominal,
> 7650)>,
> <Message inform sensor-value (1446089513993, 1, raw.fan.fpga, nominal,
> 5730)>,
> <Message inform sensor-value (1446089513996, 1, raw.fan.chs0, nominal,
> 7650)>,
> <Message inform sensor-value (1446089513998, 1, raw.temp.inlet, nominal,
> 34000)>,
> <Message inform sensor-value (1446089514000, 1, raw.temp.outlet, nominal,
> 32750)>,
> <Message inform sensor-value (1446089514006, 1, raw.voltage.1v, nominal,
> 1004)>,
> <Message inform sensor-value (1446089514007, 1, raw.voltage.1v5, nominal,
> 1498)>,
> <Message inform sensor-value (1446089514007, 1, raw.voltage.1v8, nominal,
> 1808)>,
> <Message inform sensor-value (1446089514007, 1, raw.voltage.2v5, nominal,
> 2497)>,
> <Message inform sensor-value (1446089514008, 1, raw.voltage.3v3, nominal,
> 3360)>,
> <Message inform sensor-value (1446089514008, 1, raw.voltage.5v, nominal,
> 5098)>,
> <Message inform sensor-value (1446089514008, 1, raw.voltage.12v, nominal,
> 3936)>,
> <Message inform sensor-value (1446089514009, 1, raw.voltage.3v3aux, nominal,
> 3388)>,
> <Message inform sensor-value (1446089514015, 1, raw.voltage.5vaux, nominal,
> 5055)>,
> <Message inform sensor-value (1446089514015, 1, raw.current.3v3, nominal,
> 420)>,
> <Message inform sensor-value (1446089514015, 1, raw.current.2v5, nominal,
> 1009)>,
> <Message inform sensor-value (1446089514016, 1, raw.current.1v8, nominal,
> 500)>,
> <Message inform sensor-value (1446089514016, 1, raw.current.1v5, error,
> 7050)>,
> <Message inform sensor-value (1446089514016, 1, raw.current.1v, error,
> 31240)>,
> <Message inform sensor-value (1446089514017, 1, raw.current.5v, nominal,
> 7777)>,
> <Message inform sensor-value (1446089514017, 1, raw.current.12v, nominal,
> 16127)>]
>
> I also tried telnetting into the bad ROACH, and got the same short list, long
> sensor value request time, and bad sensor values.
>
> Another symptom is that setting the ADC registers on the KATADC board does
> not seem to work, or at least one of the ADCs is misbehaving in the same way
> they did when we were not setting the registers correctly.
>
> Has anyone seen this before? Is it hardware, firmware, software?
>
> Thanks,
> Dale