Hi Laszlo,

Thanks for looping me in.

On Thu, Aug 08, 2019 at 03:08:22PM +0200, Laszlo Ersek wrote:
> (+ Andrew, Leif, Mike; Liming)
> 
> On 08/07/19 06:25, Bob Feng wrote:
> (3) In my normal edk2 clone, I cleaned the tree, applied your patches
> (again on top of commit 96603b4f02b9), and started a build:
> 
> $ . edksetup.sh
> $ nice make -C "$EDK_TOOLS_PATH" -j $(getconf _NPROCESSORS_ONLN)
> $ nice -n 19 build \
>     -a IA32 \
>     -p OvmfPkg/OvmfPkgIa32.dsc \
>     -t GCC48 \
>     -b NOOPT \
>     -n 4 \
>     -D SMM_REQUIRE \
>     -D SECURE_BOOT_ENABLE \
>     -D NETWORK_TLS_ENABLE \
>     -D NETWORK_IP6_ENABLE \
>     -D NETWORK_HTTP_BOOT_ENABLE \
>     --report-file=.../build.ovmf.32.report \
>     --log=.../build.ovmf.32.log \
>     --cmd-len=65536 \
>     --hash \
>     --genfds-multi-thread
> 
> This command located Python3:
> 
> > WORKSPACE        = .../edk2
> > EDK_TOOLS_PATH   = .../edk2/BaseTools
> > CONF_PATH        = .../edk2/Conf
> > PYTHON_COMMAND   = /usr/bin/python3.6
> >
> >
> > Processing meta-data .
> > Architecture(s)  = IA32
> > Build target     = NOOPT
> > Toolchain        = GCC48
> 
> The build launched fine.
> 
> After 10-20 seconds into the build, I interrupted it with Ctrl-C:
> 
> > build.py...
> >  : error 7000: Failed to execute command
> >         make tbuild 
> > [.../edk2/Build/OvmfIa32/NOOPT_GCC48/IA32/ShellPkg/Library/UefiShellDebug1CommandsLib/UefiShellDebug1CommandsLib]
> >
> >
> > build.py...
> >  : error 7000: Failed to execute command
> >         make tbuild 
> > [.../edk2/Build/OvmfIa32/NOOPT_GCC48/IA32/ShellPkg/Library/UefiShellDriver1CommandsLib/UefiShellDriver1CommandsLib]
> >
> >
> > build.py...
> >  : error 7000: Failed to execute command
> >         make tbuild 
> > [.../edk2/Build/OvmfIa32/NOOPT_GCC48/IA32/CryptoPkg/Library/OpensslLib/OpensslLib]
> >
> >
> > build.py...
> >  : error 7000: Failed to execute command
> >         make tbuild 
> > [.../edk2/Build/OvmfIa32/NOOPT_GCC48/IA32/MdePkg/Library/BaseLib/BaseLib]
> >
> > - Aborted -
> > Build end time: 14:05:56, Aug.08 2019
> > Build total time: 00:00:15
> 
> As next step, I repeated the same "build" command as above, in order to
> continue the interrupted build. Unfortunately, this failed:
> 
> > WORKSPACE        = .../edk2
> > EDK_TOOLS_PATH   = .../edk2/BaseTools
> > CONF_PATH        = .../edk2/Conf
> > PYTHON_COMMAND   = /usr/bin/python3.6
> >
> >
> > Processing meta-data
> > .Architecture(s)  = IA32
> > Build target     = NOOPT
> > Toolchain        = GCC48
> >
> > Active Platform          = .../edk2/OvmfPkg/OvmfPkgIa32.dsc
> > ..... done!
> >
> > Fd File Name:OVMF (.../edk2/Build/OvmfIa32/NOOPT_GCC48/FV/OVMF.fd)
> >
> > Generate Region at Offset 0x0
> >    Region Size = 0x40000
> >    Region Name = DATA
> >
> > Generate Region at Offset 0x40000
> >    Region Size = 0x1000
> >    Region Name = None
> >
> > Generate Region at Offset 0x41000
> >    Region Size = 0x1000
> >    Region Name = DATA
> >
> > Generate Region at Offset 0x42000
> >    Region Size = 0x42000
> >    Region Name = None
> >
> > Generate Region at Offset 0x84000
> >    Region Size = 0x348000
> >    Region Name = FV
> >
> > Generating FVMAIN_COMPACT FV
> >
> > Generating PEIFV FV
> > ###### ['GenFv', '-a', 
> > '.../edk2/Build/OvmfIa32/NOOPT_GCC48/FV/Ffs/PEIFV.inf', '-o', 
> > '.../edk2/Build/OvmfIa32/NOOPT_GCC48/FV/PEIFV.Fv', '-i', 
> > '.../edk2/Build/OvmfIa32/NOOPT_GCC48/FV/PEIFV.inf']
> > Return Value = 2
> > GenFv: ERROR 0001: Error opening file
> >   
> > .../edk2/Build/OvmfIa32/NOOPT_GCC48/FV/Ffs/52C05B14-0B98-496c-BC3B-04B50211D680PeiCore/52C05B14-0B98-496c-BC3B-04B50211D680.ffs
> >
> >
> >
> >
> > build.py...
> >  : error 7000: Failed to generate FV
> >
> >
> >
> > build.py...
> >  : error 7000: Failed to execute command
> >
> >
> > - Failed -
> > Build end time: 14:06:25, Aug.08 2019
> > Build total time: 00:00:06
> 
> To be honest, I'm not sure what to ask for, at this point.
> 
> - On one hand, this is certainly not ideal. Continuing a manually
> interrupted build should preferably work -- that's a form of incremental
> build. And, it did work in my v3 testing; see bullet (5) in:
> 
>   http://mid.mail-archive.com/4ea3d3fa-2210-3642-2337-db525312d312@redhat.com
>   https://edk2.groups.io/g/devel/message/44246
> 
> (Is this perhaps a regression from the V6 update, which was related to
> incremental builds?)
> 
> - On the other hand, this is not necessarily a show-stopper, and I'm quite
> out of capacity for testing further versions of this full patch set.
> Perhaps you can work on this issue incrementally -- bugfixes can be
> accepted during the freeze periods.

I think there are two (independent) circumstances where I would be
happy for the support to be included even given this bug:
1) The parallel autogen is only invoked (at this point in time) when
   requested by an explicit command line parameter.
or
2) The failure is detected and its cause clearly printed for the user.

From my reading of the above, neither is true.

At which point, I think we would need to either make one of those
true, or root-cause and fix the actual error, before we can accept
this into the tree, regardless of which side of the stable tag it
lands on.

I *really* don't want for us to knowingly end up with a build system
that "sometimes breaks sporadically and you need to git clean the
repository and try again".

> I don't feel comfortable giving Tested-by or Regression-tested-by in
> this state, but I also won't block the patch set from being merged.
> 
> Note that this problem appears repeatable, and it reproduces using
> Python2 as well. It should be possible for you to reproduce and to
> debug.

That it also reproduces with Python 2 is actually really positive,
since it suggests that Python 3 async I/O is not involved.

> (4) In this test, I repeated (3), but instead of interrupting the build
> with Ctrl-C, I introduced a syntax error to one of the C source files
> under OvmfPkg (I simply appended the constant "1" to the end of the
> file).
> 
> As expected, the build failed (and correctly stopped, too):
> 
> > .../edk2/OvmfPkg/VirtioNetDxe/SnpReceive.c:186:1: error: expected 
> > identifier or '(' before numeric constant
> >  1
> >  ^
> > make: *** 
> > [.../edk2/Build/OvmfIa32/NOOPT_GCC48/IA32/OvmfPkg/VirtioNetDxe/VirtioNet/OUTPUT/SnpReceive.obj]
> >  Error 1
> >
> >
> > build.py...
> >  : error 7000: Failed to execute command
> >         make tbuild 
> > [.../edk2/Build/OvmfIa32/NOOPT_GCC48/IA32/OvmfPkg/VirtioNetDxe/VirtioNet]
> >
> >
> > build.py...
> >  : error F002: Failed to build module
> >         .../edk2/OvmfPkg/VirtioNetDxe/VirtioNet.inf [IA32, GCC48, NOOPT]
> >
> > - Failed -
> > Build end time: 14:29:18, Aug.08 2019
> > Build total time: 00:00:38
> 
> I undid the syntax error, and repeated the "build" command.
> 
> The build resumed fine, and produced a functional OVMF binary. Good.

Not unexpected, but good to have verified.

> (5) I also verified that changes to C files, made after the build
> completed successfully for the first time, would cause those files to be
> re-built, if the "build" command was repeated. So that's OK too.
> 
> ... All in all, I think the series is mature enough to merge, in order
> to expose it to wider testing by the community, with the soft feature
> freeze just around the corner. The main functionality seems to work,
> there don't seem to be show-stoppers. IMO a BaseTools series doesn't
> have to be *perfect* -- as long as it doesn't get in the way of people
> doing their work, it should be possible to improve upon, incrementally.
> Therefore, from my side, I'm willing to give you a (somewhat reserved)
> 
> Acked-by: Laszlo Ersek <ler...@redhat.com>
> 
> for the series.
> 
> I suggest seeking feedback from the other stewards as well.
> 
> To reiterate, the only issue I have found is that the build could not be
> resumed after I interrupted it with Ctrl-C, in section (3). If there is
> consensus to push the v8 series with that, I would suggest filing a
> TianoCore BZ about issue (3) first, and referencing the BZ as a "known
> issue" in the commit message of patch#4 or patch#5.

I will throw in a transitional
Nacked-by: Leif Lindholm <leif.lindh...@linaro.org>
for now.

If it can happen from a Ctrl-C, it can happen from an OOM event, a
lost network connection, and a bunch of other things. We could live
with a corrupted state causing breakage on the next build attempt,
but not with an opaque breakage. At a minimum, it needs to be clear
what has caused the breakage.
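As a rough illustration of that minimum (option 2 above), here is a
sketch in Python, the language of build.py. It is hypothetical and not
existing BaseTools code: the marker file name and both functions are
made up for this example, and it does not fix the GenFv failure itself.
It only makes an interrupted previous build visible to the user. The
marker is created before the build starts and removed only when the
build step returns, so a Ctrl-C (KeyboardInterrupt) or a SIGKILL from
the OOM killer leaves it behind for the next run to report.

```python
import os
import sys
from typing import Callable, Optional

# Hypothetical marker file name; BaseTools does not create such a file today.
MARKER_NAME = ".build_in_progress"


def interrupted_build_warning(output_dir: str) -> Optional[str]:
    """Return a warning if a previous build in output_dir never finished."""
    marker = os.path.join(output_dir, MARKER_NAME)
    if not os.path.exists(marker):
        return None
    return ("WARNING: the previous build in %s did not finish (interrupted, "
            "killed, or out of memory); intermediate files may be incomplete. "
            "If this build fails, remove %s and rebuild."
            % (output_dir, output_dir))


def guarded_build(output_dir: str, build_step: Callable[[], bool]) -> bool:
    """Run build_step, leaving a marker behind if it does not return."""
    warning = interrupted_build_warning(output_dir)
    if warning:
        print(warning, file=sys.stderr)
    os.makedirs(output_dir, exist_ok=True)
    marker = os.path.join(output_dir, MARKER_NAME)
    with open(marker, "w") as f:
        f.write("%d\n" % os.getpid())
    # If build_step raises (e.g. KeyboardInterrupt on Ctrl-C) or the process
    # is killed, the lines below never run and the marker stays in place.
    ok = build_step()
    # build_step returned normally: the build succeeded or failed in an
    # orderly way (e.g. a compiler error), so the tree is assumed consistent.
    os.remove(marker)
    return ok
```

Note that an ordinary compile failure, as in Laszlo's test (4), still
clears the marker, so resuming after fixing the source would not warn.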

Best Regards,

Leif

-=-=-=-=-=-=-=-=-=-=-=-

View/Reply Online (#45186): https://edk2.groups.io/g/devel/message/45186
-=-=-=-=-=-=-=-=-=-=-=-
