Jason,

Both parts looked fine; they just didn't turn on. There was no indication of heat damage on the part, board, or traces.

The first roach we found this on is SN 020145 and the second one is SN 030191.

Thanks,
Jason


At 01:16 PM 6/19/2012, Jason Manley wrote:
Thanks for the feedback!

I'd also like to understand this problem a little better.

Q13 sits on the 5V rail, and the P-channel MOSFET is rated at -11A with a 13 mohm on-resistance. That's good for over 50W. Was there any indication of heat damage on the failed parts (due to overloading, or maybe bad heat sinking)?
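
As a rough sanity check on those figures, here is a minimal sketch in Python. It assumes that "good for over 50W" refers to the power switched through the part (rail voltage times rated current) rather than the power dissipated in it. The function names are made up for illustration, and the conduction-loss estimate ignores the rise in on-resistance with temperature.

```python
# Back-of-the-envelope check of the Q13 figures quoted in this thread.
RAIL_VOLTAGE_V = 5.0   # Q13 sits on the 5V rail
MAX_CURRENT_A = 11.0   # magnitude of the -11A drain current rating
R_DS_ON_OHM = 0.013    # 13 mohm on-resistance


def switched_power_w(voltage_v, current_a):
    """Power delivered through the switch to the load: P = V * I."""
    return voltage_v * current_a


def conduction_loss_w(current_a, r_on_ohm):
    """Power dissipated in the MOSFET itself while on: P = I^2 * R."""
    return current_a ** 2 * r_on_ohm


if __name__ == "__main__":
    load_w = switched_power_w(RAIL_VOLTAGE_V, MAX_CURRENT_A)
    loss_w = conduction_loss_w(MAX_CURRENT_A, R_DS_ON_OHM)
    print(f"Switched power at rated current: {load_w:.1f} W")
    print(f"Conduction loss at rated current: {loss_w:.2f} W")
```

Under these assumptions the switch passes 55 W at rated current while dissipating only about 1.6 W itself, which is why heat damage would point to overloading or poor heat sinking.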

Or did the part look fine, it just didn't turn on anymore? There is a resistor between the gate and source, so the gate doesn't float even when not in use. It really shouldn't have broken due to ESD after it was installed.

We have had batches of boards with bad components before... once even due to passives (termination resistors that didn't all have the right resistance) which is the last thing we expect to fail. These are usually all caught in the factory during the standard off-the-line tests.

Can you supply the serial numbers of these boards so we can add this issue to our database? We'll then monitor to see if it re-occurs on any other boards.

Jason

On 19 Jun 2012, at 19:05, John Ford wrote:

>> Jason R. and John,
>> Was the roach running a particularly intensive design at the time
>> around the failure? Just wondering why this part would be failing. Is
>> the current limit somehow being exceeded?
>
> We don't know about the first one, because it came to us from Socorro, but
> the second roach was being used to test the tutorials, so I don't think it
> was particularly heavily loaded.
>
> I had a thought that we should check the serial numbers and see if they
> are from the same batch.  Maybe some bad parts or ESD damage?
>
> John
>
>> Thanks,
>> Glenn
>>
>> On Tue, Jun 19, 2012 at 9:52 AM, Jason Ray <j...@nrao.edu> wrote:
>>> The first time I was troubleshooting this problem, I did see a fault on
>>> the 1V supply with roach_monitor.py.  I didn't check roach_monitor.py on
>>> the second roach because the problem was so fresh in our minds that we
>>> just jumped to the finish line and checked the MOSFET with a meter, then
>>> replaced it.
>>>
>>> For reference, the part in question is Q13 (FD6675BZ).
>>>
>>> Thanks,
>>> Jason
>>>
>>>
>>>
>>> At 09:33 AM 6/19/2012, Jason Manley wrote:
>>>>
>>>> Good sleuthing!
>>>>
>>>> FWIW, roach_monitor.py is supposed to be able to pull the log out of
>>>> the Actel Fusion, which should have logged a fault on the 1V rail
>>>> before shutting down the board. This should work independent of PPC or
>>>> dmesg states. I'm afraid I have little faith in the Fusion/Xport combo
>>>> to reliably catch these issues, but it has helped me a few times.
>>>>
>>>> If it works, it only retrieves the reason for the last shutdown, so
>>>> you'll have to plug a laptop into the Xport to query it directly after
>>>> the board shuts itself down.
>>>>
>>>> Jason
>>>>
>>>> On 19 Jun 2012, at 15:23, John Ford wrote:
>>>>
>>>>> Hi all.  We've had a couple of ROACH failures with identical causes.
>>>>> Maybe some of you have seen this, but it's worth keeping in mind in
>>>>> case you have a problem.
>>>>>
>>>>> The symptom is that the ROACH would sort of power on, but then turn
>>>>> off spontaneously.  On one, as soon as the bof was loaded the roach
>>>>> would turn off.  The other one would come on for a few seconds and
>>>>> then turn off, or it would cycle on and off.  The monitor readout in
>>>>> dmesg gave nonsense readings.
>>>>>
>>>>> In any event, the cause was traced to the +1 volt supply MOSFET
>>>>> switch.  Replacing that MOSFET fixed both roaches.  Kudos to Jason Ray
>>>>> for finding the problem originally.
>>>>>
>>>>> John
>>>>>

