Hi Jason:

I guess the latest CPLD is quite important for us.

But for U-Boot, I am not sure. Will all PPC registers be re-initialized by the
Linux kernel, or does the kernel use the values initialized by U-Boot? Does
U-Boot load the OS into DRAM before setting the program counter to the OS start
address, or can the OS load itself into DRAM?

Thanks

Wan

-----Original Message-----
From: Jason Manley [mailto:jasonman...@gmail.com]
Sent: Wednesday, 4 November 2009 7:40 PM
To: Wormnes, Kjetil (ATNF, Marsfield)
Cc: Marc Welz; David George; Cheng, Wan (ATNF, Marsfield); 
casper@lists.berkeley.edu
Subject: Re: [casper] Fwd: Re: SPDO ROACH spectrometer

Also, make sure you're running newer versions of U-Boot and the CPLD
image. Bus settings changed some months back and improved stability
significantly.

U-Boot reports the versions at startup, and I recommend:

U-Boot 2008.10-svn2226 (Aug  7 2009 - 16:06:44)
...
Monitor Revision: 8.3.1698
CPLD Revision:    8.1.0

At the very least, you should have CPLD Revision 8.0.1588.
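To check a board against these minimums, a minimal sketch like the following could parse the U-Boot startup banner (for example, as captured from the serial console). The banner layout is assumed from the excerpt above, and the helper names `parse_revision` and `cpld_ok` are hypothetical, not part of any CASPER tool.

```python
import re

# Minimum CPLD revision recommended above ("at the very least 8.0.1588").
MIN_CPLD = (8, 0, 1588)


def parse_revision(banner: str, label: str) -> tuple:
    """Return the dotted revision following 'label:' as a tuple of ints.

    Assumes a banner line shaped like 'CPLD Revision:    8.1.0', as in the
    excerpt above. Raises ValueError if the label is not present.
    """
    pattern = rf"^{re.escape(label)}:\s*(\d+(?:\.\d+)*)"
    match = re.search(pattern, banner, re.MULTILINE)
    if match is None:
        raise ValueError(f"{label!r} not found in banner")
    return tuple(int(part) for part in match.group(1).split("."))


def cpld_ok(banner: str) -> bool:
    """True if the reported CPLD revision meets the recommended minimum."""
    # Tuples compare element by element, so (8, 1, 0) > (8, 0, 1588).
    return parse_revision(banner, "CPLD Revision") >= MIN_CPLD
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as "8.0.1588" sorting above "8.1.0" lexically.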

The only outstanding bug that regularly affects me is that u-boot
sometimes doesn't detect the PPC's SDRAM on startup. The system then
hangs. Replacing the DIMM with registered memory (same as FPGA DIMM)
apparently fixes this.

Jason

On 04 Nov 2009, at 07:56, Kjetil Wormnes wrote:

> Hi all,
>
> For reference I've attached a summary of our problems below, and a
> few things I have attempted to do to isolate it. The short of it is
> that we are unable to transfer large amounts of data across the
> Ethernet reliably, regardless of:
> --kernel version
> --whether the root filesystem is mounted over USB or NFS
> --network protocol used for transfer
>
> The way the crash happens varies, and is not repeatable. Sometimes it
> seems to be a userspace crash, sometimes it is a kernel panic. I
> have been unable to see any real pattern in the crash reports. This
> to me seems to indicate that the root cause of the problem may be
> common, and either an obscure kernel problem or possibly something
> in the interface between the kernel and the hardware or in the
> hardware itself.
>
> It wouldn't be a big effort to reimplement our software to run on a
> remote machine and talk to the ROACH over KATCP, rather than run
> locally on the PPC. But since it would still require a complete
> rewrite of the software, we haven't tested this yet. Perhaps it is
> worth trying.
>
> The catch is that I am still really unsure whether we are dealing
> with many symptoms of the same problem or with many different problems.
>
> Anyway, I would like to thank you for all your input, and will let
> you know if and how we find a satisfactory solution.
>
> cheers
>
> Kjetil
>
>
>
> Here is the summary:
>
>
>
> *The problem*
> The system crashes when downloading large files. There appear to be
> varying causes for this crash that may or may not have a common
> underlying reason.
>
> I have attempted to isolate the problem by:
> * Downloading using different protocols and software: SSH and two
> different FTP servers.
> * Mounting the filesystem over NFS as opposed to USB.
> * Installing well-known, widely used kernels and comparing them to
> custom kernels.
> *SSH*
> SSH always crashes with "Corrupted MAC on input" or related error
> messages. This appears to be a problem with SSH itself.
>
> *FTP*
> System instabilities were observed using two different ftp servers;
> proftpd and pure-ftpd.
>
> In the best case, pure-ftpd was able to serve 2-3 files of about
> 2 GB each before the system crashed. The call stack seemed to
> indicate that the crash happened in EMAC interface functions
> (i.e. Ethernet).
>
> However, we have no way of knowing whether these crashes are in fact
> side effects of the USB subsystem misbehaving. Jason from the
> Casper mailing list has once again confirmed that USB on PowerPCs
> is "notoriously unreliable".
>
> *DIFFERENT KERNELS - DIFFERENT PROBLEMS*
> With some kernels (the latest), the link was unable to come up at
> all, while with both a custom-compiled older kernel (from a couple
> of months ago) and a downloaded image, "uImage-20091006-mmcfix", the
> link came up, but with all the crashes described above.
>
> *ELIMINATING USB AS A CAUSE*
> To eliminate the effects of USB, I mounted the root filesystem
> remotely using NFS. I made a few observations:
>
> *SSH*
> Still dies from time to time with the "Corrupted MAC" error message.
> This was expected, as we have already pretty much determined that
> this error is ssh-specific and not related to our other worries.
>
> *ETHERNET*
> Comes up nicely. System mounts remotely and file access has not
> caused any obvious problems. In fact I have not really had any
> problems that I can trace directly back to the Ethernet.
>
> That being said, the system seems to crash after a little while
> with this setup too. The error messages have varied. Only
> once has it been a kernel crash, and then, looking at the call stack,
> it no longer appears to crash inside EMAC access functions.
>
> The download speeds seem quite variable, but this is probably due
> to the network (since the operating system is running over NFS)
> rather than to the ROACH board itself.
>
>
>
>
>
> Jason Manley wrote:
>> Marc Welz or David George built that kernel. They are the best
>> people to ask about this. I've cc'd them, though I'm not sure
>> either would have the config file from that release. It might be
>> easiest to check out an older svn version.
>>
>> Might I suggest that instead of recording data to a USB HDD, you
>> record it across the network to another computer? If you don't
>> want to use KATCP for dumping the data directly from your FPGA, you
>> can always mount an NFS network share on your ROACH and record the
>> data there. USB on the PPC platforms is notoriously unreliable.
>>
>> Jason
>>
>> On 03 Nov 2009, at 03:05, Kjetil Wormnes wrote:
>>
>>
>>> Hi Jason,
>>>
>>> Thank you again for your reply. I can use FTP or even write my
>>> own little raw socket transfer routine, and it seems to work: I
>>> can transfer a few gigabyte-size files.
>>>
>>> However, at the end of this, the other problem kicks in, causing
>>> a system crash. I believe this is a kernel problem, as it
>>> exhibits itself differently with the different kernels I have tried.
>>>
>>> So, putting the ssh problem aside as something that we can work
>>> around and returning to the other request I made;
>>>
>>> I am compiling my own kernel because I seem to need to in order
>>> to get EHCI and EXT3 to work properly.
>>>
>>> However, when I do, EMAC can't autonegotiate a link, and even
>>> forcing a fixed link setting doesn't work. The link comes up, then
>>> drops out again... repeatedly.
>>>
>>> The interesting thing is that this problem *does not* occur when I
>>> compile my kernel from an svn checkout from a couple of months
>>> ago, even with the exact same .config file.
>>>
>>> At least this is the case as far as I can tell.
>>>
>>> Now, in order to be 100% sure that it is in fact a difference in
>>> the source that is causing this problem, rather than just the
>>> .config, I would love it if you could send me the .config file
>>> used to compile the uImage-20091006-mmcfix kernel.
>>>
>>> The Ethernet interface does appear to be more stable with that
>>> kernel, but unfortunately I can't use it as it doesn't support USB
>>> 2.0 speeds, so the .config file would be very useful.
>>>
>>> Thanks again for all your help
>>>
>>>
>>> Kjetil
>>>
>>>
>>> Jason Manley wrote:
>>>
>>>> There appears to be some issue with ssh on ROACH with large
>>>> transfers. It is definitely not a hardware problem, as other
>>>> network transfers work fine. Both Andrew Martens and I
>>>> regularly transfer large amounts of data (>1GB) using KATCP.
>>>> This ssh bug has become a low priority for us as we concentrate
>>>> on other things. If you do not want to try to debug it yourself,
>>>> I recommend you try an FTP server.
>>>>
>>>> Kjetil, you are correct; at present, KATCP does not support
>>>> transfer of arbitrary files from the filesystem.
>>>>
>>>> Jason
>>>>
>>>> On 02 Nov 2009, at 00:51, Kjetil Wormnes wrote:
>>>>
>>>>
>>>>
>>>>> Hi Jason,
>>>>>
>>>>> thank you for your reply. The SUN link was very descriptive.
>>>>>
>>>>> Firstly, it appears the problem is still there with the kernel
>>>>> build you suggested. After a few megabytes, the connection
>>>>> closes, telling me: "Corrupted MAC on input".
>>>>>
>>>>> But interestingly, it seems to have solved another problem that
>>>>> I was having with one of our ROACH boards. It would be great
>>>>> if you could send me the .config file for that build so I can
>>>>> compare it with mine. I have a custom kernel as I like ext3
>>>>> support and a few other bits and pieces, but have been having
>>>>> some issues getting the network to establish a stable link.
>>>>>
>>>>> Now, back to the problem: we have a locally attached hard drive
>>>>> that we are writing our data to over USB. Occasionally we want
>>>>> to connect and download these files. That's why I am using ssh. I
>>>>> can't really use KATCP for this, can I?
>>>>>
>>>>> Thanks again,
>>>>>
>>>>> Kjetil
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Jason Manley wrote:
>>>>>
>>>>>
>>>>>> Um, no, this is probably a different problem. You are getting
>>>>>> these errors while using SSH/SCP, right? The hardware problem
>>>>>> with the faulty PHY manifests as one or more of the PHY LEDs
>>>>>> flashing on/off (there are three red ones next to the PHY
>>>>>> chip). If your link is stable, then I believe the hardware
>>>>>> is fine.
>>>>>>
>>>>>> The "MAC" problem appears to be software related, and comes
>>>>>> and goes depending on the kernel build. It does not refer to
>>>>>> the MAC address, but rather to ssh's Message Authentication
>>>>>> Code. Check out
>>>>>> http://blogs.sun.com/janp/entry/ssh_messages_code_bad_packet
>>>>>> for some info.
>>>>>>
>>>>>> Dave's made various changes to try to fix it, and increasing
>>>>>> a software buffer has solved it for me. I no longer see
>>>>>> this problem, but it's probably been masked rather than
>>>>>> solved. Also, you never see it using KATCP, which is one
>>>>>> more reason to use that method for larger transfers.
>>>>>>
>>>>>> WRT large (>1GB) transfers, remember that it will take a long
>>>>>> time to pull that much data off the FPGA. It does so in
>>>>>> pages of ~4000 bytes at a time. Also make sure you're using
>>>>>> the latest kernel. We discovered a bug in this paging system
>>>>>> during the workshop.
>>>>>> http://casper.berkeley.edu/svn/trunk/roach/sw/binaries/linux/uImage-20091006-mmcfix
>>>>>> should be good. I have never tried pulling such volumes
>>>>>> over the SSH shell, but it works fine with KATCP.
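As a rough illustration of why a 1 GB pull takes a while, the number of page reads can be estimated. This sketch assumes exactly 4000 bytes per page (the message only says "~4000") and takes 1 GB as 2**30 bytes; the names are illustrative only.

```python
PAGE_BYTES = 4000   # assumed page size; the message says "~4000 bytes"
ONE_GB = 1 << 30    # 1 GB taken as 2**30 bytes


def pages_needed(total_bytes: int, page_bytes: int = PAGE_BYTES) -> int:
    """Number of page-sized reads needed to pull total_bytes (rounded up)."""
    # Integer ceiling division, avoiding floating-point rounding.
    return (total_bytes + page_bytes - 1) // page_bytes


print(pages_needed(ONE_GB))  # 268436
```

So on the order of a quarter of a million separate page reads, each with its own overhead, are needed for a single gigabyte.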
>>>>>>
>>>>>> I will ask him to comment further.
>>>>>>
>>>>>> Jason
>>>>>>
>>>>>>
>>>>>> On 30 Oct 2009, at 01:25, John Ford wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>>> casper collaborators,
>>>>>>>>
>>>>>>>> appended below is further info on roach ethernet problems
>>>>>>>> seen at CSIRO:
>>>>>>>> any ideas?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>> If I recall correctly, Alan mentioned this problem at the
>>>>>>> workshop, and the problem was that some of the PHY chips were
>>>>>>> faulty at one point. This may be what's going on. Hopefully
>>>>>>> someone knows for sure!
>>>>>>>
>>>>>>> John
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> thanks,
>>>>>>>>
>>>>>>>> dan
>>>>>>>>
>>>>>>>> -------- Original Message --------
>>>>>>>> Subject: Re: SPDO ROACH spectrometer
>>>>>>>> Date: Fri, 30 Oct 2009 09:19:01 +1100
>>>>>>>> From: Kjetil Wormnes <kjetil.worm...@csiro.au>
>>>>>>>> To: Dan Werthimer <d...@ssl.berkeley.edu>
>>>>>>>>
>>>>>>>> Hi Dan and Wan
>>>>>>>>
>>>>>>>> I can confirm that we are seeing at least some of the
>>>>>>>> problems with another ROACH board as well. This time it is
>>>>>>>> connected directly to a computer with a short Cat5 cable.
>>>>>>>>
>>>>>>>> So maybe this indicates that it is less likely to be a
>>>>>>>> hardware problem? Incidentally, the error message that appears
>>>>>>>> when attempting to download a large file over sftp is
>>>>>>>> "Corrupted MAC on input".
>>>>>>>>
>>>>>>>> cheers
>>>>>>>>
>>>>>>>> Kjetil
>>>>>>>>
>>>>>>>> Dan Werthimer wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> hi wan,
>>>>>>>>>
>>>>>>>>> i don't know of anyone who has roach ethernet
>>>>>>>>> problems at 100 Mbit/sec.
>>>>>>>>>
>>>>>>>>> i'm cc'ing casper community to see if anyone has any ideas.
>>>>>>>>> in general, it's good to post questions to cas...@lists,
>>>>>>>>> so that everyone can help answer, and everyone can see the
>>>>>>>>> answers,
>>>>>>>>> and the info will be captured in the wiki/email archive.
>>>>>>>>>
>>>>>>>>> if you want you can buy or ask digicom if they can send you
>>>>>>>>> another national PHY chip and see if this helps.
>>>>>>>>>
>>>>>>>>> also you might want to try using short cable, and/or a cat6
>>>>>>>>> cable.
>>>>>>>>> is your roach connected directly to a computer, or going
>>>>>>>>> through a switch?  might be interesting to try a different NIC
>>>>>>>>> or different switch or different computer.
>>>>>>>>>
>>>>>>>>> best,
>>>>>>>>>
>>>>>>>>> dan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 10/29/2009 02:47 PM, wan.ch...@csiro.au wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Hi Dan:
>>>>>>>>>>
>>>>>>>>>> I believe you have done a very nice job.
>>>>>>>>>>
>>>>>>>>>> My problem is that the Ethernet port is not very reliable.
>>>>>>>>>> Even running at 100 Mbit/s, the Ethernet port sometimes
>>>>>>>>>> disconnects. Normally, it recovers after rebooting the whole
>>>>>>>>>> system.
>>>>>>>>>>
>>>>>>>>>> I also cannot transfer big files over Ethernet. Small files
>>>>>>>>>> of a few MB are all right, but I could not download a 1 GB
>>>>>>>>>> file from the ROACH at all.
>>>>>>>>>>
>>>>>>>>>> So Dan, could this problem be solved by replacing the onboard
>>>>>>>>>> PHY?
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>>
>>>>>>>>>> Wan
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>
>>>>
>>
>>
>

