On Mon, Apr 13, 2009 at 11:23 AM, Sarah Jelinek <Sarah.Jelinek at sun.com> 
wrote:
> Hi Mike,
>
> Thank you for taking the time to use AI and to provide such detailed
> feedback. We appreciate it! My comments/questions inline...

I'm happy to do so.

>> Wanboot is slow.  Really slow.  I'm not sure of the exact time, but
>> downloading the 167 MB boot_archive took ~40 minutes.  In tests that I
>> did last week, I was able to push over 900 Mbit/sec between the same
>> two boxes.  If wanboot cannot be improved due to problems with
>> openboot or similar, the boot_archive needs to be stripped down to the
>> point that it knows about network drivers and whatever is needed to
>> load the image into a ramdisk.
>>
>
> Really? I have been testing sparc the last week and it only took on the
> order of 5 minutes or so to download the boot archive with wanboot. When you
> were able to push the 900Mbit/sec speed in testing, what were the
> differences, other than wanboot delivering the data? Are others attached and
> using the same network?
>

This problem is not unique to AI.  I've recently switched over to
wanboot for S9+ installations and have seen the same thing across many
platforms (e.g. V240, 480R, 25K, T2000, T5220, and others).  Where
available, the firmware that addresses the "wanboot reflects packets"
bug was applied.

Here's an example of the wanboot log transferring a file between a
zone in the primary LDom and a guest LDom in the same T5220.  (I
trimmed some stuff out of the middle of each line to prevent line wraps.)

Apr 13 18:04:27 soltrain21 wanbootfs: Read 72 of 368 kB (19%)
Apr 13 18:04:27 soltrain21 wanbootfs: Read 152 of 368 kB (41%)
Apr 13 18:04:28 soltrain21 wanbootfs: Read 232 of 368 kB (63%)
Apr 13 18:04:28 soltrain21 wanbootfs: Read 312 of 368 kB (84%)
Apr 13 18:04:28 soltrain21 wanbootfs: Read 368 of 368 kB (100%)
Apr 13 18:04:28 soltrain21 wanbootfs: Download complete
Apr 13 18:04:28 soltrain21 miniroot: Read 3672 of 183600 kB (2%)
Apr 13 18:05:30 soltrain21 miniroot: Read 7352 of 183600 kB (4%)
Apr 13 18:05:54 soltrain21 miniroot: Read 11032 of 183600 kB (6%)
Apr 13 18:06:15 soltrain21 miniroot: Read 14712 of 183600 kB (8%)
Apr 13 18:06:42 soltrain21 miniroot: Read 18392 of 183600 kB (10%)
Apr 13 18:07:19 soltrain21 miniroot: Read 22072 of 183600 kB (12%)
Apr 13 18:07:45 soltrain21 miniroot: Read 25752 of 183600 kB (14%)
Apr 13 18:07:52 soltrain21 miniroot: Read 29432 of 183600 kB (16%)
Apr 13 18:08:16 soltrain21 miniroot: Read 33112 of 183600 kB (18%)
Apr 13 18:08:40 soltrain21 miniroot: Read 36792 of 183600 kB (20%)
Apr 13 18:08:49 soltrain21 miniroot: Read 40472 of 183600 kB (22%)
Apr 13 18:09:30 soltrain21 miniroot: Read 44152 of 183600 kB (24%)
Apr 13 18:10:24 soltrain21 miniroot: Read 47832 of 183600 kB (26%)
Apr 13 18:10:31 soltrain21 miniroot: Read 51512 of 183600 kB (28%)
Apr 13 18:11:06 soltrain21 miniroot: Read 55192 of 183600 kB (30%)
Apr 13 18:11:13 soltrain21 miniroot: Read 58872 of 183600 kB (32%)
Apr 13 18:11:59 soltrain21 miniroot: Read 62552 of 183600 kB (34%)
...
Apr 13 18:26:30  soltrain21 Download complete

That is, in the 451 seconds between 18:04:28 and 18:11:59 it
transferred 58880 KB, for a rate of about 130 KB/s or roughly 1
Mb/sec.  Once the wanboot image is downloaded, a snippet of code that
I insert into system.conf uses profetch to download the proper
configuration of JASS.  Here's the apache access_log snippet from
that:

[13/Apr/2009:18:27:27 +0000] "GET /cgi-bin/getjass?release=2008Q4
HTTP/1.1" 200 37753986

Take a look at the attached graph.  Throughput while the wanboot
program is downloading is typically 5 MB/s or less.  When profetch
downloads JASS, throughput jumps to over 40 MB/sec.  Note that getjass
is assembling a tar.gz file during the transfer so it is unlikely that
the network is the bottleneck.
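
For anyone who wants to recheck the rate quoted from the wanboot log,
here is a rough sketch that averages the transfer between the first
and last "miniroot: Read" lines.  The function name wanboot_rate is
made up for illustration, and it assumes the trimmed line format shown
in the excerpt and timestamps that all fall on the same day.

```shell
#!/bin/sh
# Sketch: average rate from trimmed wanboot "miniroot: Read N of M kB" lines.
# Assumes fields are: month day HH:MM:SS host miniroot: Read N of M kB (P%)
wanboot_rate() {
    awk '
    / miniroot: Read / {
        split($3, t, ":")                      # $3 is HH:MM:SS
        secs = t[1] * 3600 + t[2] * 60 + t[3]
        kb = $7                                # $7 is kB read so far
        if (!seen) { first_s = secs; first_kb = kb; seen = 1 }
        last_s = secs; last_kb = kb
    }
    END {
        elapsed = last_s - first_s
        if (elapsed <= 0) { print "not enough samples"; exit 1 }
        printf "%d kB in %d s = %.1f kB/s\n", last_kb - first_kb, elapsed,
            (last_kb - first_kb) / elapsed
    }' "$@"
}
```

Run it as "wanboot_rate logfile", or pipe the log lines into it.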

> I would like a bit more information on the network during the wanboot
> process. Normal network trouble shooting data, like bandwidth, dropped
> packets, retries... As I said, I did not see the times you are describing.

I captured kstats during the test with...

while true ; do
    now=`date +%T`
    kstat -p vsw:{0,1} e1000g:{0,1} | sed "s/^/$now /"
    sleep 10
done | tee /var/tmp/wanboot-kstat

After the test I look for badness with:

nawk '$0 ~ /(err|no|flo)/ && $NF != 0' /var/tmp/wanboot-kstat

It does show an increasing number of "unknowns" for e1000g0, but
otherwise shows no output.  I can provide detailed output and
configuration info off-list if needed.
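
Since a non-zero counter may predate the test, it can also help to
list only the counters that moved during the capture.  The following
is a minimal sketch, not something I ran as part of this test.  The
function name kstat_changed is hypothetical, and it assumes each line
of the capture is "HH:MM:SS module:instance:name:statistic value", as
produced by the loop above.

```shell
#!/bin/sh
# Sketch: show matching counters whose value differs between the first
# and last sample in the capture file.
kstat_changed() {
    awk '
    # Keep only numeric samples whose name matches the same filter as above.
    NF == 3 && $3 ~ /^[0-9]+$/ && $2 ~ /(err|no|flo)/ {
        if (!($2 in first)) first[$2] = $3     # remember first sample
        last[$2] = $3                          # keep overwriting with latest
    }
    END {
        for (k in last)
            if (last[k] != first[k])
                printf "%s %s -> %s\n", k, first[k], last[k]
    }' "$@" | sort
}
```

For example: kstat_changed /var/tmp/wanboot-kstat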

>> I started out with an LDom with 700 MB of memory.  That failed with an
>> error message that made a lot of sense to me, but my guess is that the
>> typical person may be confused.  I lost the exact message.
>>
>> After the first failure, I added 1 GB of RAM.  This time it errored
>> out when pkg couldn't find pkg.opensolaris.org.  Because vi is not in
>> the miniroot (sigh) and my svccfg-foo is lacking, I worked around this
>> with something like:
>>
>
> well.. file enhancement requests for inclusion of these if you want.
> However, in trying to keep the microroot small choices have to be made about
> what we must include.

Understood.  I was more commenting about error messages in the 700 MB
case.  I don't really expect to be able to do anything useful with any
OS that has less than 1 GB of memory.

>>
>> # cd /lib/svc/method
>> # mv auto-installer auto-installer.orig
>> # cat > auto-installer
>> #! /bin/sh
>>
>> http_proxy=...
>> export http_proxy
>> /lib/svc/method/auto-installer.orig "$@"
>> ^D
>> # svcadm clear auto-install
>>
>> This got the installation going.  Then I ran into bug 6804.
>>
>> http://defect.opensolaris.org/bz/show_bug.cgi?id=6804
>>
>> I added 300 MB of memory (now at 2024 MB) and tried again.  With the
>> aforementioned http_proxy workaround applied again, the installation
>> completed without problems.  Hooray!
>>
>>
>> Along the way I also had troubles due to...
>>
>> - When the install server is rebooted, it doesn't start serving
>>   whatever it was serving before.  Each time I needed to run
>>   "installadm start sparc-preview"
>>
>
> I believe this will be fixed with the putback for 7218 which was integrated
> on 4/6. Not sure if the bits you installed had this changeset.

I downloaded the iso a day or two after it was announced.  I probably
didn't use fresh enough AI bits.

>>
>> - installadm doesn't cause an apache instance to start to serve
>>   wanboot.cgi.  As such, after rebooting the zone I needed to run
>>
>>         /usr/apache2/2.2/bin/httpd \
>>                 -f /var/installadm/ai-webserver/ai-httpd.conf
>>
>>
>>
>
> Hmm.. this should work I believe as a result of the putback for 4488. Let me
> take a look at the changes that went in for this bug and get back to you.

Again, could be due to somewhat old AI bits.  It could also be that I
realized that apache wasn't running (and started it) before I realized
that installadm wasn't serving (7218).

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: wanboot-download.png
Type: image/png
Size: 13414 bytes
Desc: not available
URL: 
<http://mail.opensolaris.org/pipermail/caiman-discuss/attachments/20090413/2feb36ef/attachment.png>
