The first thing I'd check is the versions of uboot, the kernel and romfs on 
the boards.

I have also seen similar sensor misbehaviour on one of our boards where the I2C 
bus was broken. Can you plug a USB cable into the back and watch the uboot 
bootup sequence? 

You should see a message similar to this:

CPU:   AMCC PowerPC 440EPx Rev. A at 533.333 MHz (PLB=133 OPB=66 EBC=66)
      No Security/Kasumi support
      Bootstrap Option C - Boot ROM Location EBC (16 bits)
      32 kB I-Cache 32 kB D-Cache
Board: ROACH2
I2C:   ready
DRAM:  512 MiB
Flash: 128 MiB
In:    serial
Out:   serial
Err:   serial
CPLD:  2.1
USB:   Host(int phy)
SN:    ROACH2.2 batch=D#9#98 software fixups match
MAC:   02:44:01:02:09:62
DTT:   1 is 21 C
DTT:   2 is 21 C
Net:   ppc_4xx_eth0
...

Make sure the "DTT" lines have sensible values, and that there are no errors 
about the IIC bus scrolling past.
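
If you capture the boot output to a file, a few lines of Python can do the 
same check. This is only a rough sketch: the function name, the temperature 
limits, and the guess that error lines mention "I2C" or "IIC" are my own 
assumptions, not anything uboot guarantees.

```python
import re

# Matches lines such as "DTT:   1 is 21 C" from the uboot banner.
DTT_RE = re.compile(r"^DTT:\s+(\d+) is (-?\d+) C")
# Assumption: bus errors mention I2C or IIC somewhere in the line.
BUS_RE = re.compile(r"\b(I2C|IIC)\b", re.IGNORECASE)


def check_uboot_log(text, min_c=0, max_c=85):
    """Return a list of problems found in captured uboot output.

    min_c and max_c are illustrative limits, not board specifications.
    """
    problems = []
    dtt_seen = 0
    for raw in text.splitlines():
        line = raw.strip()
        m = DTT_RE.match(line)
        if m:
            dtt_seen += 1
            sensor, temp = int(m.group(1)), int(m.group(2))
            if not min_c <= temp <= max_c:
                problems.append(
                    f"DTT {sensor} reads {temp} C, outside {min_c}..{max_c} C"
                )
        elif BUS_RE.search(line) and "ready" not in line.lower():
            # "I2C:   ready" is the healthy case; anything else is suspect.
            problems.append(f"possible I2C/IIC error: {line}")
    if dtt_seen == 0:
        problems.append("no DTT lines found")
    return problems
```

Calling check_uboot_log() on the saved text returns an empty list when 
nothing looks wrong, and one entry per suspicious line otherwise.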

Jason 


On 29 Oct 2015, at 6:06, Gary, Dale E. <[email protected]> wrote:

> Hi All,
> 
> We have 8 identical (I hope) ROACH-2 boards.  We have been using 4 of them 
> for a long time, and I just brought two more online, but one of them is 
> behaving differently than the others.  One problem is that the sensor list is 
> different.  If I run the katcp routine to get the sensor list
> 
> reply, sensors = ro[5].fpga.blocking_request(Message.request('sensor-list'))
> 
> the message returns quickly but has a short list:
> 
> sensors:
> [<Message inform sensor-list (mode, current_mode, none, discrete, raw)>,
>  <Message inform sensor-list (raw.temp.ambient, Ambient_board_temperature, 
> millidegrees, integer, -2147483648, 2147483647, -2147483648, 2147483647)>,
>  <Message inform sensor-list (raw.temp.ppc, PowerPC_temperature, 
> millidegrees, integer, -2147483648, 2147483647, -2147483648, 2147483647)>,
>  <Message inform sensor-list (raw.temp.fpga, FPGA_temperature, millidegrees, 
> integer, -2147483648, 2147483647, -2147483648, 2147483647)>,
>  <Message inform sensor-list (raw.temp.inlet, Inlet_ambient_temperature, 
> millidegrees, integer, -2147483648, 2147483647, -2147483648, 2147483647)>,
>  <Message inform sensor-list (raw.temp.outlet, Outlet_ambient_temperature, 
> millidegrees, integer, -2147483648, 2147483647, -2147483648, 2147483647)>]
> 
> 
> If I run the katcp routine to get the sensor values, the command takes ~ 8 s 
> and returns bad values
> 
> reply, vals = ro[5].fpga.blocking_request(Message.request('sensor-value'), timeout=10)
> 
> vals:
> [<Message inform sensor-value (1446089462529, 1, mode, unknown, raw)>,
>  <Message inform sensor-value (1446089463542, 1, raw.temp.ambient, nominal, 
> -1)>,
>  <Message inform sensor-value (1446089464554, 1, raw.temp.ppc, nominal, -1)>,
>  <Message inform sensor-value (1446089465566, 1, raw.temp.fpga, nominal, -1)>,
>  <Message inform sensor-value (1446089468602, 1, raw.temp.inlet, nominal, 0)>,
>  <Message inform sensor-value (1446089471638, 1, raw.temp.outlet, nominal, 
> 0)>]
> 
> Doing this on a good board returns immediately and gives the much longer 
> sensor list:
> 
> vals:
> [<Message inform sensor-value (1446089513976, 1, mode, unknown, raw)>,
>  <Message inform sensor-value (1446089513984, 1, raw.temp.ambient, nominal, 
> 34000)>,
>  <Message inform sensor-value (1446089513984, 1, raw.temp.ppc, nominal, 
> 49000)>,
>  <Message inform sensor-value (1446089513984, 1, raw.temp.fpga, nominal, 
> 59000)>,
>  <Message inform sensor-value (1446089513987, 1, raw.fan.chs1, nominal, 
> 7650)>,
>  <Message inform sensor-value (1446089513990, 1, raw.fan.chs2, nominal, 
> 7650)>,
>  <Message inform sensor-value (1446089513993, 1, raw.fan.fpga, nominal, 
> 5730)>,
>  <Message inform sensor-value (1446089513996, 1, raw.fan.chs0, nominal, 
> 7650)>,
>  <Message inform sensor-value (1446089513998, 1, raw.temp.inlet, nominal, 
> 34000)>,
>  <Message inform sensor-value (1446089514000, 1, raw.temp.outlet, nominal, 
> 32750)>,
>  <Message inform sensor-value (1446089514006, 1, raw.voltage.1v, nominal, 
> 1004)>,
>  <Message inform sensor-value (1446089514007, 1, raw.voltage.1v5, nominal, 
> 1498)>,
>  <Message inform sensor-value (1446089514007, 1, raw.voltage.1v8, nominal, 
> 1808)>,
>  <Message inform sensor-value (1446089514007, 1, raw.voltage.2v5, nominal, 
> 2497)>,
>  <Message inform sensor-value (1446089514008, 1, raw.voltage.3v3, nominal, 
> 3360)>,
>  <Message inform sensor-value (1446089514008, 1, raw.voltage.5v, nominal, 
> 5098)>,
>  <Message inform sensor-value (1446089514008, 1, raw.voltage.12v, nominal, 
> 3936)>,
>  <Message inform sensor-value (1446089514009, 1, raw.voltage.3v3aux, nominal, 
> 3388)>,
>  <Message inform sensor-value (1446089514015, 1, raw.voltage.5vaux, nominal, 
> 5055)>,
>  <Message inform sensor-value (1446089514015, 1, raw.current.3v3, nominal, 
> 420)>,
>  <Message inform sensor-value (1446089514015, 1, raw.current.2v5, nominal, 
> 1009)>,
>  <Message inform sensor-value (1446089514016, 1, raw.current.1v8, nominal, 
> 500)>,
>  <Message inform sensor-value (1446089514016, 1, raw.current.1v5, error, 
> 7050)>,
>  <Message inform sensor-value (1446089514016, 1, raw.current.1v, error, 
> 31240)>,
>  <Message inform sensor-value (1446089514017, 1, raw.current.5v, nominal, 
> 7777)>,
>  <Message inform sensor-value (1446089514017, 1, raw.current.12v, nominal, 
> 16127)>]
> 
> I also tried telnetting into the bad ROACH, and got the same short list, 
> long sensor value request time, and bad sensor values.
> 
> Another symptom is that setting the ADC registers on the KATADC board does 
> not seem to work, or at least one of the ADCs is misbehaving in the same way 
> they did when we were not setting the registers correctly.
> 
> Has anyone seen this before?  Is it hardware, firmware, software?
> 
> Thanks,
> Dale

