Hello:

 

                The S@H Beta site just released a new executable for plan
class cuda50.  For some reason the beta server does not recognize plan class
cudaNN (NN = 50, 42, 32, 22, etc.); it will only deliver WUs to a computer
whose app_info file contains plan class cuda_fermi, opencl_nvidia_100, etc.
So just as I had done when cuda42 came out, I changed all the references in
these plan classes from cuda42 to cuda50.  I copied the Boinc data directory
to a different hard disk, shut off the network, and restarted Boinc.  This
was at 20:55:31.  Everything appeared to be working: Boinc deleted no WUs,
was using the new cuda50 executable to process the WUs it had, and
communication with the server proceeded normally, at least 10 times.  Then
at 22:34:13, almost 2 hours after the change, Boinc, the server, whatever,
decided to delete about 5,000 WUs, saying "Result
01mr13ab.6367.7433.11.16.27_2 is no longer usable," etc.  There is no error
message, absolutely no hint as to what is wrong.  Now a computer that
processed 790 WUs yesterday has only 501 WUs total, some of which are
Astropulse, so my statistics program says I have about 0.91 days' supply.
The server says I have exceeded my quota of 35 WUs/day (35???) and won't
give me any more, and sometime in the middle of the night it decided "Your
app_info.xml file doesn't have a usable version of SETI@home v7."  It is
Tuesday, the server will be down all day, and it will be sometime late in
the evening before I can download any more WUs.

 

                Several weeks ago a similar incident happened.  I was
carrying something heavy and bumped into the computer that was processing
S@H WUs.  Boinc must have been writing the client_state file at the time
because it was clobbered and Boinc would not run.  For some reason
client_state_prev was unacceptable also.  I make a copy of the client_state
file every time a WU finishes.  I do that because my statistics program was
having trouble finding out which GPU the WU was processed on, and I needed
to see what client_state file it was looking at.  Thus I have a month's
supply of client_state files for every computer.  So to fix the clobbered
client_state file, I took the one from the previously finished WU - it was
literally seconds old - and used it to replace the clobbered client_state
file.  Boinc objected and flushed all the WUs.
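
For anyone who wants to keep the same kind of history, here is a minimal
sketch of the copying step only, not the script I actually run.  The function
name, the keep limit, and the backup directory layout are all hypothetical;
it assumes the file is named client_state.xml in the Boinc data directory.
As described above, putting a snapshot back did not stop Boinc from flushing
the WUs, so treat this as a record for inspection, not a recovery method.

```python
import shutil
import time
from pathlib import Path


def snapshot_client_state(data_dir, backup_dir, keep=1000, stamp=None):
    """Copy client_state.xml into backup_dir under a timestamped name.

    All names here are illustrative; only the source file name
    client_state.xml is taken from the Boinc data directory.
    """
    src = Path(data_dir) / "client_state.xml"
    dest_dir = Path(backup_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    # A sortable timestamp keeps snapshots in chronological order by name.
    if stamp is None:
        stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = dest_dir / f"client_state-{stamp}.xml"
    shutil.copy2(src, dest)  # copy2 also preserves the modification time

    # Prune the oldest snapshots so that at most `keep` remain.
    snapshots = sorted(dest_dir.glob("client_state-*.xml"))
    excess = len(snapshots) - keep
    for old in snapshots[:max(excess, 0)]:
        old.unlink()
    return dest
```

One would call snapshot_client_state(data_dir, backup_dir) from whatever
notices that a WU has finished; how to detect that is left out here.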

 

                There are three facts of which you are obviously unaware:

 

1.       It is intensely humiliating to experience the loss of thousands of
workunits, especially after one has spent hours trying to avoid that exact
situation, and after taking exactly the same actions that had avoided that
loss previously.

2.       You could not realize how much we users have invested in S@H.
First, it costs about $70 a month to process S@H WUs on a computer with a
modern CPU and two medium-sized GPUs.  Second, it takes about half an hour
to an hour a day to check each computer to see that it is operating OK.  Third,
there is a huge emotional investment in competing for credits that have no
extrinsic value, which is compounded by the fact that, for whatever reason,
S@H is not analyzing the data we return to it; there have been no results
published since about 2007, that I know of.  So yeah, we fight tooth and
nail for workunits and credits because there is no other measure of progress
and success.  S@H is not a no-holds-barred, all's fair war between users and
the server; it is supposed to be a cooperative venture to find evidence of
extraterrestrial life.

3.       Up until about the 1920s the United States was a Christian country.
While it may seem unfair, until the 1920s if a person did not understand the
Christian paradigm, he or she simply could not compete.  That changed in the
1930s, in part, I believe, because as a highly industrialized country and a
world leader we had to change the societal emphasis from religion and
ethical behavior to technological competence.  Nevertheless, under the
Christian paradigm when a person makes a mistake God no longer sends a
plague of locusts, huge floods, or asks for a human sacrifice; under the new
deal, He sends a message telling the person what is wrong and how to fix it.
If you want to see this exact same point in secular language, then read Dale
Carnegie's How to Win Friends and Influence People. In any case, theology is
the study of God at work in the world.  Successful people no longer destroy,
physically or emotionally, those who err, at least the first few times;
instead they show them how to fix the situation, perhaps change their
behavior, and move on.

 

Flushing all or part of a user's workunits without any clue of the source of
the problem is not evidence of cooperation to find signs of extraterrestrial
life; it is wanton cruelty equivalent to a plague of locusts or a flood.  It
does not solve the problem. 

 

When Boinc perceives the necessity to flush thousands of workunits, why
can't it be made bright enough to understand that this is not a result that
benefits anyone, not S@H, not Berkeley, and certainly not the user?
Instead, why can't Boinc output a (long) message saying what its problem is,
post the detested OK-Cancel dialog box, and just wait for input?  I would
much rather come down in the morning to a computer that has done nothing
useful all night than to a computer with all but about 10% of its workunits
wasted (for all to see).  Yes, it is true that the novice user will not
understand either a (long) message or the need to intervene.  But it is also
true that a novice user is unlikely to edit his or her app_info file.  In
any case, it is kinder to give even a novice user the opportunity to fix the
problem than it is to punish them for an action that was almost certainly
well intended.
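
To make the request concrete, here is a rough sketch of the behavior I am
asking for.  It is not Boinc code and does not use any Boinc API; the
function name, its parameters, and the message wording are all made up for
illustration.  The only point is the order of events: say what the problem
is, then wait for an answer, and discard nothing unless the user says OK.

```python
def confirm_discard(reason, task_names, ask=input, show=print):
    """Explain why tasks would be discarded, then block until answered.

    Hypothetical helper: returns True only if the user explicitly
    types 'ok'; 'cancel' returns False and nothing should be discarded.
    """
    preview = ", ".join(task_names[:3])
    if len(task_names) > 3:
        preview += f", ... ({len(task_names) - 3} more)"

    show(f"About to discard {len(task_names)} workunit(s).")
    show(f"Reason: {reason}")
    show(f"Affected: {preview}")

    # Keep asking until the answer is unambiguous; never assume consent.
    while True:
        answer = ask("Discard these workunits? [ok/cancel] ").strip().lower()
        if answer == "ok":
            return True
        if answer == "cancel":
            return False
        show("Please type 'ok' or 'cancel'.")
```

A caller would discard the workunits only when confirm_discard(...) returns
True; on False it would leave everything in place and stop fetching work.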

 

 

Charles Elliott

 
