SETI Beta is for testing new versions of the S@H applications. Things *will* go wrong. If you don't like testing software, then this is not the project for you.
Work units are removed from the active DB after some time. Any credit earned stays. This saves DB space. The daily quota is reduced on error, and increased (to some maximum) on success.

-----Original Message-----
From: boinc_dev [mailto:[email protected]] On Behalf Of Charles Elliott
Sent: Tuesday, May 07, 2013 10:08 AM
To: [email protected]
Subject: [boinc_dev] Workunits no longer usable

Hello:

The S@H Beta site just released a new executable for plan class cuda50. For some reason the beta server does not recognize plan class cudaNN (NN = 50, 42, 32, 22, etc.); it will only deliver WUs to a computer whose app_info file contains plan class cuda_fermi, opencl_nvidia_100, etc. So just as I had done when cuda42 came out, I changed all the references in these plan classes from cuda42 to cuda50. I copied the Boinc data directory to a different hard disk, shut off the network, and restarted Boinc. This was at 20:55:31. Everything appeared to be working: Boinc deleted no WUs, was using the new cuda50 executable to process the WUs it had, and communication with the server proceeded normally, at least 10 times.

Then at 22:34:13, almost 2 hours after the change, Boinc, the server, whatever, decided to delete about 5,000 WUs, saying "Result 01mr13ab.6367.7433.11.16.27_2 is no longer usable," etc. There is no error message, absolutely no hint as to what is wrong. Now a computer that processed 790 WUs yesterday has only 501 WUs total, some of which are Astropulse, so my statistics program says I have about 0.91 days' supply. The server says I have exceeded my quota of 35 WUs/day (35???) and won't give me any more, and sometime in the middle of the night it decided "Your app_info.xml file doesn't have a usable version of SETI@home v7." It is Tuesday, the server will be down all day, and it will be sometime late in the evening before I can download any more WUs.

Several weeks ago a similar incident happened.
I was carrying something heavy and bumped into the computer that was processing S@H WUs. Boinc must have been writing the client_state file at the time because it was clobbered and Boinc would not run. For some reason client_state_prev was unacceptable also. I make a copy of the client_state file every time a WU finishes. I do that because my statistics program was having trouble finding out which GPU the WU was processed on, and I needed to see what client_state file it was looking at. Thus I have a month's supply of client_state files for every computer. So to fix the clobbered client_state file, I took the one from the previously finished WU - it was literally seconds old - and used it to replace the clobbered client_state file. Boinc objected and flushed all the WUs.

There are three facts of which you are obviously unaware:

1. It is intensely humiliating to experience the loss of thousands of workunits, especially after one has spent hours trying to avoid that exact situation, and after taking exactly the same actions that had avoided that loss previously.

2. You could not realize how much we users have invested in S@H. First, it costs about $70 a month to process S@H WUs on a computer with a modern CPU and two medium-sized GPUs. Second, it takes about one-half to an hour a day to check each computer to see that it is operating OK. Third, there is a huge emotional investment in competing for credits that have no extrinsic value, which is compounded by the fact that, for whatever reason, S@H is not analyzing the data we return to it; there have been no results published since about 2007, that I know of. So yeah, we fight tooth and nail for workunits and credits because there is no other measure of progress and success. S@H is not a no-holds-barred, all's fair war between users and the server; it is supposed to be a cooperative venture to find evidence of extraterrestrial life.

3. Up until about the 1920s the United States was a Christian country.
While it may seem unfair, until the 20's if a person did not understand the Christian paradigm, he or she simply could not compete. That changed in the 30's, in part, I believe, because as a highly industrialized country and a world leader we had to change the societal emphasis from religion and ethical behavior to technological competence. Nevertheless, under the Christian paradigm when a person makes a mistake God no longer sends a plague of locusts, huge floods, or asks for a human sacrifice; under the new deal, He sends a message telling the person what is wrong and how to fix it. If you want to see this exact same point in secular language, then read Dale Carnegie's How to Win Friends and Influence People. In any case, theology is the study of God at work in the world.

Successful people no longer destroy, physically or emotionally, those who err, at least the first few times; instead they show them how to fix the situation, perhaps change their behavior, and move on. Flushing all or part of a user's workunits without any clue of the source of the problem is not evidence of cooperation to find signs of extraterrestrial life; it is wanton cruelty equivalent to a plague of locusts or a flood. It does not solve the problem.

When Boinc perceives the necessity to flush thousands of workunits, why can't it be made bright enough to understand that this is not a result that benefits anyone, not S@H, not Berkeley, and certainly not the user? Instead, why can't Boinc output a (long) message saying what its problem is, post the detested OK-Cancel dialog box, and just wait for input? I would much rather come down in the morning to a computer that has done nothing useful all night than to a computer with all but about 10% of its workunits wasted (for all to see).

Yes, it is true that the novice user will not understand either a (long) message or the need to intervene. But it is also true that a novice user is unlikely to edit his or her app_info file.
In any case, it is kinder to give even a novice user the opportunity to fix the problem than it is to punish them for an action that was almost certainly well intended.

Charles Elliott

_______________________________________________
boinc_dev mailing list
[email protected]
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
To unsubscribe, visit the above URL and (near bottom of page) enter your email address.
