I noticed this thread. The example given uses a rather old version, but the same thing is also happening on all my AQUA machines running v6.12.12. Error information:
<core_client_version>6.12.12</core_client_version>
<![CDATA[
<stderr_txt>
ERROR! Cannot open input file: instance.txt_21jan11_hm_16_038_200_00__1_238. Exiting
23:34:45 (572): called boinc_finish
</stderr_txt>
<message>
<file_xfer_error>
  <file_name>21jan11_hm_16_038_200_000_1_238_1_0</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
</message>
]]>

Regards/Ed

On Tue, Feb 1, 2011 at 5:03 PM, Kamran Karimi <[email protected]> wrote:
> I don't think we are using the latest version, but I can do an upgrade.
> The problem seems to occur in rapid succession, and then disappears for
> a while. Here is a computer with a few such cases. You can see how the
> WU name differs from the input file name:
> http://aqua.dwavesys.com/results.php?hostid=60
>
> -Kamran
>
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of David Anderson
> Sent: Tuesday, February 01, 2011 2:30 PM
> To: [email protected]
> Subject: Re: [boinc_dev] Error when submitting work to the DB (corrupted
> command line arguments)
>
> Hmmm. I can't reproduce this, using the same WU template file.
> Are you using the newest server software (i.e., trunk)?
>
> -- David
>
> On 01-Feb-2011 2:06 PM, Kamran Karimi wrote:
> > Hi all,
> >
> > We have been encountering strange errors recently: a rather large
> > number of our tasks fail because the command line arguments are
> > incorrect. We traced one specific failed task and were led to the
> > bin/create_work program.
> > Here is the trace:
> >
> > 1) These commands are used to submit work to BOINC's DB:
> >
> > ln -s `pwd`/ibadfp_p1/instance.txt `bin/dir_hier_path instance.txt_31jan11_am_16_030_200_000_1_235`
> > ln -s `pwd`/ibadfp_p1/ipfile.bin `bin/dir_hier_path instance.txt_31jan11_am_16_030_200_000_1_235.ip`
> > ln -s `pwd`/ibadfp_p1/qubit.param `bin/dir_hier_path instance.txt_31jan11_am_16_030_200_000_1_235.param`
> > bin/create_work -appname fokker_planck -wu_name 31jan11_am_16_030_200_000_1_235 \
> >   -wu_template ibadfp_p1/31jan11_am_16_030_200_000_1_235_wu \
> >   -result_template ibadfp_p1/31jan11_am_16_030_200_000_1_235_result instance.txt_31jan11_am_16_030_200_000_1_235 instance.txt_31jan11_am_16_030_200_000_1_235.ip instance.txt_31jan11_am_16_030_200_000_1_235.param
> >
> >
> > 2) These are the contents of the file
> > ibadfp_p1/31jan11_am_16_030_200_000_1_235_wu:
> >
> > <file_info>
> >     <number>0</number>
> > </file_info>
> > <file_info>
> >     <number>1</number>
> > </file_info>
> > <file_info>
> >     <number>2</number>
> > </file_info>
> > <workunit>
> >     <file_ref>
> >         <file_number>0</file_number>
> >         <open_name>instance.txt_31jan11_am_16_030_200_000_1_235</open_name>
> >         <copy_file/>
> >     </file_ref>
> >     <file_ref>
> >         <file_number>1</file_number>
> >         <file_name>instance.txt_31jan11_am_16_030_200_000_1_235.ip</file_name>
> >         <open_name>ipfile.bin</open_name>
> >         <copy_file/>
> >     </file_ref>
> >     <file_ref>
> >         <file_number>2</file_number>
> >         <file_name>instance.txt_31jan11_am_16_030_200_000_1_235.param</file_name>
> >         <open_name>qubit.param</open_name>
> >         <copy_file/>
> >     </file_ref>
> >     <command_line> --T 0.03 --t_f 0.0002 --n_particles_per 2 --h_ramp_mid 0.000 --h_ramp_width 0.0025 --time_step_factor 48.0 --gamma_frac 1.0 --t_fraci 0.38 --t_fracf 0.88 --input_file instance.txt_31jan11_am_16_030_200_000_1_235</command_line>
> >     <target_nresults>1</target_nresults>
> >     <max_success_results>1</max_success_results>
> >     <min_quorum>1</min_quorum>
> >     <rsc_fpops_est>5e13</rsc_fpops_est>
> >     <rsc_memory_bound>5e7</rsc_memory_bound>
> >     <rsc_fpops_bound>1e20</rsc_fpops_bound>
> >     <rsc_disk_bound>1e8</rsc_disk_bound>
> >     <delay_bound>1728000</delay_bound>
> > </workunit>
> >
> >
> > Please note the command line argument "--input_file"
> >
> >
> > 3) This is what we see in the database:
> >
> > <workunit>
> >     <file_ref>
> >         <file_name>instance.txt_31jan11_am_16_030_200_000_1_235</file_name>
> >         <open_name>instance.txt_31jan11_am_16_030_200_000_1_235</open_name>
> >         <copy_file/>
> >     </file_ref>
> >     <file_ref>
> >         <file_name>instance.txt_31jan11_am_16_030_200_000_1_235.ip</file_name>
> >         <file_name>instance.txt_31jan11_am_16_030_200_000_1_235.ip</file_name>
> >         <open_name>ipfile.bin</open_name>
> >         <copy_file/>
> >     </file_ref>
> >     <file_ref>
> >         <file_name>instance.txt_31jan11_am_16_030_200_000_1_235.param</file_name>
> >         <file_name>instance.txt_31jan11_am_16_030_200_000_1_235.param</file_name>
> >         <open_name>qubit.param</open_name>
> >         <copy_file/>
> >     </file_ref>
> >     <command_line>
> >     --T 0.03 --t_f 0.0002 --n_particles_per 2 --h_ramp_mid 0.000 --h_ramp_width 0.0025 --time_step_factor 48.0 --gamma_frac 1.0 --t_fraci 0.38 --t_fracf 0.88 --input_file instance.txt_31jan11_am_16_030_200_00_11_235
> >     </command_line>
> > </workunit>
> >
> >
> > Please note how the last few characters of the "--input_file"
> > parameter have changed from "200_000_1_235" to "200_00_11_235".
> >
> > The result is a failure on a volunteer computer, with the app
> > complaining that it can't open the input file.
> >
> > How could this happen? Any help in resolving this issue is
> > appreciated.
> >
> > -Kamran
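
For anyone who wants to catch these before a volunteer machine does, a check along the following lines could be run against the <workunit> XML seen in the database. It is only a sketch under a few assumptions: the function name input_file_mismatch and the shortened sample XML are made up for illustration and are not part of BOINC, the <workunit> element is passed in on its own, and the value after --input_file is expected to equal one of the <open_name> entries, as it does in the template quoted above.

```python
import shlex
import xml.etree.ElementTree as ET


def input_file_mismatch(workunit_xml, flag="--input_file"):
    """Return the flag's value if no <open_name> matches it, else None.

    Hypothetical helper, not a BOINC API: it only compares strings
    found inside a single <workunit> element.
    """
    root = ET.fromstring(workunit_xml)
    open_names = {e.text.strip() for e in root.iter("open_name") if e.text}
    # shlex.split treats spaces and newlines alike, so the multi-line
    # <command_line> stored in the database parses the same way.
    args = shlex.split(root.findtext("command_line") or "")
    if flag not in args:
        return None
    idx = args.index(flag)
    if idx + 1 >= len(args):
        return None  # flag present but no value follows it
    value = args[idx + 1]
    return None if value in open_names else value


# Shortened sample based on the names quoted in this thread.
GOOD = """<workunit>
  <file_ref>
    <file_name>instance.txt_31jan11_am_16_030_200_000_1_235</file_name>
    <open_name>instance.txt_31jan11_am_16_030_200_000_1_235</open_name>
    <copy_file/>
  </file_ref>
  <command_line>--T 0.03 --input_file instance.txt_31jan11_am_16_030_200_000_1_235</command_line>
</workunit>"""

# Same workunit with the corrupted argument reported above.
BAD = GOOD.replace("200_000_1_235</command_line>",
                   "200_00_11_235</command_line>")

if __name__ == "__main__":
    print(input_file_mismatch(GOOD))
    print(input_file_mismatch(BAD))
```

This does not explain where the corruption comes from; it only flags a workunit whose command line no longer refers to any of its input files.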
