I still haven't cracked this riddle. Here's some additional debugging information:
distccd[22108] (dcc_scan_args) found input file "/proj/Platform/SGF/Source/Common/BankBill.cpp"
distccd[22108] (dcc_scan_args) found object/output file "objs/Release/BankBill.o"
distccd[22108] compile from BankBill.cpp to BankBill.o
distccd[22108] (dcc_run_job) temp input file (null)
distccd[22108] (dcc_run_job) original input file /proj/Platform/SGF/Source/Common/BankBill.cpp
distccd[22108] (dcc_input_tmpnam) input file /proj/Platform/SGF/Source/Common/BankBill.cpp
distccd[22108] (dcc_run_job) temp input file /tmp/distccd_42d13927.ii
distccd[22108] (dcc_r_token_int) got DOTI000c8925
distccd[22108] (dcc_r_file) received 821541 bytes to file /tmp/distccd_42d13927.ii
distccd[22108] (dcc_r_file_timed) 821541 bytes received in 0.002498s, rate 321171kB/s
distccd[22108] (dcc_set_input) changed input from "/proj/Platform/SGF/Source/Common/BankBill.cpp" to "/tmp/distccd_42d13927.ii"
distccd[22108] (dcc_set_input) command after: cc -fexceptions -frtti -fPIC -fno-defer-pop -fno-strict-aliasing -Wall -Wno-unknown-pragmas -Winvalid-pch -Werror -O3 -g -fvisibility=hidden -c /tmp/distccd_42d13927.ii -o objs/Release/BankBill.o
distccd[22108] (dcc_set_output) changed output from "objs/Release/BankBill.o" to "/tmp/distccd_4dc23927.o"
distccd[22108] (dcc_set_output) command after: cc -fexceptions -frtti -fPIC -fno-defer-pop -fno-strict-aliasing -Wall -Wno-unknown-pragmas -Winvalid-pch -Werror -O3 -g -fvisibility=hidden -c /tmp/distccd_42d13927.ii -o /tmp/distccd_4dc23927.o
distccd[22108] (dcc_run_job) 2. temp input file /tmp/distccd_42d13927.ii
distccd[22108] (dcc_check_compiler_masq) /usr/bin/cc is not a symlink
distccd[22108] (dcc_spawn_child) forking to execute: cc -fexceptions -frtti -fPIC -fno-defer-pop -fno-strict-aliasing -Wall -Wno-unknown-pragmas -Winvalid-pch -Werror -O3 -g -fvisibility=hidden -c /tmp/distccd_42d13927.ii -o /tmp/distccd_4dc23927.o

It seems that if I change temp_o or temp_i, then it can't find the files.
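
The dcc_set_input and dcc_set_output lines in the log are where the randomized /tmp/distccd_XXXXXXXX names enter the compiler command line. As a rough way to check whether such a name ended up inside an object file, here is a minimal C sketch that scans a byte buffer for "distccd_" followed by eight hex digits. The function name find_distccd_name is hypothetical and not part of distcc, and the eight-hex-digit shape is only an assumption based on the names seen in this thread.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of distcc.
 * Scans buf[0..len) for the literal "distccd_" followed by eight
 * hexadecimal digits (the shape of the temp names seen in this thread,
 * e.g. distccd_42d13927). Returns the offset of the first match,
 * or -1 if there is none. */
long find_distccd_name(const unsigned char *buf, size_t len)
{
    static const char prefix[] = "distccd_";
    const size_t plen = sizeof(prefix) - 1;   /* 8, without the NUL */
    const size_t hexlen = 8;                  /* assumption, see above */

    assert(buf != NULL || len == 0);
    if (len < plen + hexlen)
        return -1;

    for (size_t i = 0; i + plen + hexlen <= len; i++) {
        if (memcmp(buf + i, prefix, plen) != 0)
            continue;
        /* Count how many of the next bytes are hex digits. */
        size_t j = 0;
        while (j < hexlen && isxdigit(buf[i + plen + j]))
            j++;
        if (j == hexlen)
            return (long)i;
    }
    return -1;
}
```

To use it on a real object file, read the file into memory (for example with fread) and pass the buffer and its length. A non-negative result would suggest that a server-side temp name was recorded in the file.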
In my case, I don't need to protect against having unique files. Is it possible to keep the full paths to the files the same, just on the other server? I would imagine I would have to set TMPDIR to /, and just update all the temp_o and temp_i references and keep them set to orig_output / orig_input. Would there be any more to this, or am I completely missing something?

-Jeff

On Tue, Jun 29, 2010 at 12:54 PM, Jeff Kilpatrick <[email protected]> wrote:

> You are correct; I did miss some spots that I've copied into this email.
> I've done a few more changes locally, and am continuing my testing.
>
> The code paste wasn't intended to be a patch. I'm a non-programmer
> (integration engineering only), so I wouldn't think any of my changes would
> be up to par for an official submit.
>
> -Jeff
>
>
> On Tue, Jun 29, 2010 at 12:50 PM, Fergus Henderson <[email protected]>wrote:
>
>>
>> On Tue, Jun 29, 2010 at 11:15 AM, Jeff Kilpatrick <
>> [email protected]> wrote:
>>
>>> Hey Fergus.
>>>
>>> You are correct about the "another problem which may happen". I applied
>>> the fix you suggested, and set the temp_o and temp_i back to orig_output and
>>> orig_input through the dcc_set_output() calls, and I am now getting
>>> consistent checksums. I will be doing builds all through the afternoon to
>>> confirm checksums match every single time.
>>>
>>> Thank you all so very much. You have literally saved us thousands of
>>> hours in compile time, per week.
>>>
>>> -Jeff
>>>
>>> My changes:
>>>
>>> serve.c:
>>>
>>>     if (cpp_where == DCC_CPP_ON_SERVER) {
>>>         if (dcc_r_many_files(in_fd, temp_dir, compr)
>>> //          || dcc_set_output(argv, temp_o)
>>>             || dcc_set_output(argv, orig_output)
>>>             || tweak_arguments_for_server(argv, temp_dir, deps_fname,
>>>                                           &dotd_target, &tweaked_argv))
>>>             goto out_cleanup;
>>>
>>>     if ((ret = dcc_r_token_file(in_fd, "DOTI", temp_i, compr))
>>>         || (ret = dcc_set_input(argv, orig_input))
>>>         || (ret = dcc_set_output(argv, orig_output)))
>>>
>>> //      || (ret = dcc_set_input(argv, temp_i))
>>> //      || (ret = dcc_set_output(argv, temp_o)))
>>>         goto out_cleanup;
>>
>>
>> When posting patches to the mailing list, please use "svn diff" or "diff -u".
>> If that's all you've changed, I don't think your patch is correct.
>> You'd need to also update the code which sends the object file back to the
>> client:
>>
>>     if ((ret = dcc_x_file(out_fd, temp_o, "DOTO", compr, NULL)))
>>         goto out_cleanup;
>>
>> Also, I think your change may cause problems in non-pump mode if two
>> different clients attempt to compile the same object file at the same time.
>>
>> Cheers,
>> Fergus.
>>
>>
>>>
>>> On Tue, Jun 29, 2010 at 11:59 AM, Fergus Henderson <[email protected]>wrote:
>>>
>>>> On Tue, Jun 29, 2010 at 9:52 AM, Jeff Kilpatrick <
>>>> [email protected]> wrote:
>>>>
>>>>> Yes, I have tried both pump and regular mode, and both behave the same
>>>>> way.
>>>>>
>>>>
>>>> Well, I don't think it is exactly the same way. In the non-pump case,
>>>> distcc does the preprocessing locally, sends the ".ii" file to the server,
>>>> and the server then invokes gcc with the name of the ".ii" file, e.g.
>>>> /tmp/distccd_ac31c96a.ii... that is what gcc ends up embedding in the object
>>>> file.
>>>> In the pump case, the source file names used on the server are the same
>>>> as the source file names used on the client, so the problem in your original
>>>> email won't happen in that case.
>>>>
>>>> But there is another problem which may happen in both cases:
>>>> distcc changes the command line on the server to use a different object
>>>> file name, e.g. "-o ./tmp/distccd_ac31c96a.o",
>>>> and gcc may embed the name of the object file in the object file.
>>>> In the non-pump case, this changing of the object file name is needed to
>>>> ensure that two different distcc invocations on the same server don't try to
>>>> write to the same file.
>>>> But in the pump case, where the compilation is being invoked in a
>>>> temporary directory, I don't think it is actually necessary to change the
>>>> object file name...
>>>> I think the code to do that has just been inherited for historical
>>>> reasons from the non-pump case.
>>>> So it may be possible to modify distcc to avoid doing that in the pump
>>>> case.
>>>> The code which changes the object file name is in the dcc_run_job()
>>>> function in src/serve.c (look in particular for the calls to
>>>> dcc_set_output(), but other parts of the function would need modification
>>>> too).
>>>> But I guess if you're not going to be using pump mode, that wouldn't
>>>> help you.
>>>>
>>>> You may find that the object files are more deterministic if you don't
>>>> pass the "-g" flag to the compiler.
>>>>
>>>> Cheers,
>>>> Fergus.
>>>>
>>>>
>>>>> A lot of the projects that I will be compiling include boost, and I
>>>>> believe that the pump fails on those, and falls back to regular mode.
>>>>>
>>>>> -Jeff
>>>>>
>>>>>
>>>>> On Tue, Jun 29, 2010 at 10:48 AM, Fergus Henderson
>>>>> <[email protected]>wrote:
>>>>>
>>>>>> Did you try using pump mode?
>>>>>> That should give you a better build speed-up and may also avoid this
>>>>>> issue.
>>>>>>
>>>>>> On Jun 29, 2010 6:32 AM, "Jeff Kilpatrick" <[email protected]>
>>>>>> wrote:
>>>>>> > Oops, my original response went directly to Ihar, rather than to the list.
>>>>>> >
>>>>>> > ----
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > Thank you for your response.
>>>>>> >
>>>>>> > We do have a tool internally that could 'scrub' the object file of its
>>>>>> > dynamic symbols, and could be adapted for this purpose. However, I'm
>>>>>> > hesitant to modify anything with the .o and .so with an external tool, as in
>>>>>> > some cases, it may be hiding a legitimate issue. Once an exception makes it
>>>>>> > into the code, it's tempting to continue adding exceptions to fix issues.
>>>>>> > Before you know it, you have 600 branches with unique 'fixes' to them :)
>>>>>> >
>>>>>> > Once we get a consistent checksum on the .o and .so files, they'll be
>>>>>> > packaged into a .iso, which will also need to be repeatable. This can be
>>>>>> > challenging as well, since attributes on the files can affect the final
>>>>>> > checksum.
>>>>>> >
>>>>>> > -Jeff
>>>>>> >
>>>>>> >
>>>>>> > On Tue, Jun 29, 2010 at 6:58 AM, Ihar `Philips` Filipau <
>>>>>> > [email protected]> wrote:
>>>>>> >
>>>>>> >> Hi Jeff!
>>>>>> >>
>>>>>> >> You can try to collect the check-sum only for the ELF segments which are
>>>>>> >> actually derived from the source code, omitting the segments with the
>>>>>> >> extra compiler's info. I do not know any ready tool for the purpose, but
>>>>>> >> coding something like this - print on stdout all segments except the
>>>>>> >> black-listed - shouldn't be too complicated.
>>>>>> >>
>>>>>> >>
>>>>>> >> On Tue, Jun 29, 2010 at 11:41 AM, Jeff Kilpatrick <
>>>>>> >> [email protected]> wrote:
>>>>>> >>
>>>>>> >>> Thank you for your response.
>>>>>> >>>
>>>>>> >>> Yes, this is the only difference in the object file. We've taken great
>>>>>> >>> pains over the last few years, removing anything that would cause checksums
>>>>>> >>> to mismatch.
>>>>>> >>>
>>>>>> >>> I will do some research myself, and talk to a few developers to see if
>>>>>> >>> they can help me.
>>>>>> >>>
>>>>>> >>> Thanks
>>>>>> >>> -Jeff
>>>>>> >>>
>>>>>> >>>
>>>>>> >>> On Tue, Jun 29, 2010 at 1:32 AM, Martin Pool <[email protected]> wrote:
>>>>>> >>>
>>>>>> >>>> On 29 June 2010 13:02, Jeff Kilpatrick <[email protected]>
>>>>>> >>>> wrote:
>>>>>> >>>> > Hello,
>>>>>> >>>> >
>>>>>> >>>> > At my work, we've just begun to investigate how much of an impact that
>>>>>> >>>> > distcc will have on our builds.
>>>>>> >>>> >
>>>>>> >>>> > We typically perform 200 builds a week, ranging from a thousand lines of
>>>>>> >>>> > code, up to 600,000 lines of code each. Our back end build scripts are based
>>>>>> >>>> > on python, and use Linux make to build. We are running VMWare images on a
>>>>>> >>>> > blade cluster, and each of our three new build servers have 20Ghz processing
>>>>>> >>>> > power, with 4G of RAM. Our primary build environments are loop back ISOs,
>>>>>> >>>> > from a central CIFS server, and are unioned together with unionfs. Our
>>>>>> >>>> > source code is then copied into this environment, and we proceed with our
>>>>>> >>>> > build, using chroot to enter our build environment. Our 'distcc' machines
>>>>>> >>>> > use the same loop back system, with only our OS and distcc being accessible.
>>>>>> >>>>
>>>>>> >>>> That's pretty cool.
>>>>>> >>>>
>>>>>> >>>> > One of the most important things for our builds, due to the market that we
>>>>>> >>>> > are in, is that our builds must be reproducible, with repeatable md5sums on
>>>>>> >>>> > our shared objects, based on the same label and same dependencies. In our
>>>>>> >>>> > recent tests, we were able to take a particular build from 24 minutes to 14
>>>>>> >>>> > minutes, then finally 5 minutes, using distcc and adjusting our VMs.
>>>>>> >>>> > However, when performing an md5sum on our final shared objects / object
>>>>>> >>>> > files, the checksums change every build. We dropped down to just using g++
>>>>>> >>>> > to perform our linking, all locally, but our object files are still
>>>>>> >>>> > mismatching.
>>>>>> >>>> >
>>>>>> >>>> > In the object files' `objdump -s` output, it appears that an entry is being
>>>>>> >>>> > made into all our object files with the following syntax "distccd_XXXXX",
>>>>>> >>>> > with XXXXX being a seemingly random combination of characters.
>>>>>> >>>>
>>>>>> >>>> Hi Jeff,
>>>>>> >>>>
>>>>>> >>>> I think this is coming from gcc recording the input file name in the
>>>>>> >>>> object file. distccd_xxxx.ii is the temporary file name used on the
>>>>>> >>>> server.
>>>>>> >>>>
>>>>>> >>>> > In the same object file, compiled locally without distcc, we get a rather
>>>>>> >>>> > generic <built-in> placeholder.
>>>>>> >>>>
>>>>>> >>>> I think this means it's coming from the builtin preprocessor.
>>>>>> >>>>
>>>>>> >>>> I probably won't have time to work on this myself but if you have a
>>>>>> >>>> programmer interested in it there are two possible avenues:
>>>>>> >>>>
>>>>>> >>>> - make gcc read from a file called <built-in> in a temporary subdirectory
>>>>>> >>>>
>>>>>> >>>> - find some way to stop it recording the compiler input file name
>>>>>> >>>>
>>>>>> >>>> Is that the only difference in the object files? It's pretty common
>>>>>> >>>> for compilers to also record something about the time the compilation
>>>>>> >>>> was run or for source files to build this in, which would mean they
>>>>>> >>>> change every time.
>>>>>> >>>>
>>>>>> >>>> >
>>>>>> >>>> > I've reviewed the source code for distcc, and seen a few references to this
>>>>>> >>>> > distccd_xxxxx.
>>>>>> >>>> > Unfortunately, I'm not a programmer, and thus am at a loss on
>>>>>> >>>> > how to further troubleshoot this, or even if it's possible to get consistent
>>>>>> >>>> > checksums with distcc.
>>>>>> >>>> >
>>>>>> >>>> >
>>>>>> >>>> > Versions
>>>>>> >>>> > =======
>>>>>> >>>> > g++ (Gentoo 4.3.2-r4 p1.8, pie-10.1.5) 4.3.2
>>>>>> >>>> >
>>>>>> >>>> > distcc 3.1 i686-pc-linux-gnu
>>>>>> >>>> > (protocols 1, 2 and 3) (default port 3632)
>>>>>> >>>> > built Mar 29 2010 10:55:35
>>>>>> >>>> >
>>>>>> >>>> > Kernel: 2.6.9-89.ELsmp
>>>>>> >>>> >
>>>>>> >>>> > Command being issued:
>>>>>> >>>> > DISTCC_VERBOSE=1 make -j24 CXX="distcc"
>>>>>> >>>> >
>>>>>> >>>> > Here's the partial output of objdump -s:
>>>>>> >>>> > 04f0 00030000 5f6d6f76 655f636f 6e737472 ...._move_constr
>>>>>> >>>> > 0500 7563745f 66776b2e 68000300 00474454 uct_fwk.h....GDT
>>>>>> >>>> > 0510 79706573 2e68000a 00007365 72646566 ypes.h....serdef
>>>>>> >>>> > 0520 732e6800 01000073 75666669 782e6870 s.h....suffix.hp
>>>>>> >>>> > 0530 70000b00 00646973 74636364 5f616333 p....distccd_ac3
>>>>>> >>>> > 0540 31633936 612e6969 000c0000 61646c5f 1c96a.ii....adl_
>>>>>> >>>> > 0550 62617272 6965722e 68707000 0d000062 barrier.hpp....b
>>>>>> >>>> > 0560 6f6f6c5f 6677642e 68707000 0e000069 ool_fwd.hpp....i
>>>>>> >>>> > 0570 6e746567 72616c5f 635f7461 672e6870 ntegral_c_tag.hp
>>>>>> >>>> > 0580 70000e00 00766f69 645f6677 642e6870 p....void_fwd.hp
>>>>>> >>>> >
>>>>>> >>>> > Thank you for reviewing my issue.
>>>>>> >>>> >
>>>>>> >>>> > -Jeff
>>>>>> >>>> >
>>>>>> >>>> > __
>>>>>> >>>> > distcc mailing list http://distcc.samba.org/
>>>>>> >>>> > To unsubscribe or change options:
>>>>>> >>>> > https://lists.samba.org/mailman/listinfo/distcc
>>>>>> >>>> >
>>>>>> >>>>
>>>>>> >>>>
>>>>>> >>>>
>>>>>> >>>> --
>>>>>> >>>> Martin
>>>>>> >>>>
>>>>>> >>>
>>>>>> >>>
>>>>>> >>> __
>>>>>> >>> distcc mailing list http://distcc.samba.org/
>>>>>> >>> To unsubscribe or change options:
>>>>>> >>> https://lists.samba.org/mailman/listinfo/distcc
>>>>>> >>>
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >> --
>>>>>> >> Don't walk behind me, I may not lead.
>>>>>> >> Don't walk in front of me, I may not follow.
>>>>>> >> Just walk beside me and be my friend.
>>>>>> >> -- Albert Camus (attributed to)
>>>>>> >>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Fergus Henderson <[email protected]>
>>>>
>>>
>>>
>>
>>
>> --
>> Fergus Henderson <[email protected]>
>>
>
>
