-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

John Hardin wrote on 1.8.2016 0:00:
> Folks:
>
> It looks like we didn't get another successful weekly masscheck again,
> even though if you check the counts today they are above the
> thresholds.
>
> I suspect this is happening due to some results being submitted "late".
>
> I think we might want to look into making a change to the masscheck
> timing rules, specifically: the cutoff for having enough corpora to
> run the scoring and produce a rules update is not a specific time, but
> is instead related to the following masscheck run.
>
> In other words:
>
> There is still a cutoff time for the masscheck run, but it only means
> "the scoring won't start prior to this time."
>
> If the corpora are above the thresholds when this time is reached, the
> scoring and update process commences immediately.
>
> If not, that doesn't mean we've missed an update, at least not yet.
>
> If another result set comes in for that pass, and that result set
> pushes it over the thresholds, then we can start the scoring and rule
> generation process.
>
> The actual hard cutoff for pass X would be sometime after pass X+1
> starts. Perhaps if the cutoff time for pass X+1 is reached and pass X
> is still waiting, then we give up on pass X.
>
> This way, a late result set that satisfies the thresholds will just
> delay the rule generation, not prevent it.
>
> This can use some refinement:
>
> If we've started scoring and another result set for that pass comes
> in, do we incorporate that into the score generation? We probably
> should; the decision could be based on when the delayed results come
> in (we don't want to keep resetting the scoring process and collide
> with the following pass) and how large the new results are (we might
> want to ignore a late small result set, but incorporate a late large
> result set).
>
> If we do that, does the scoring process need to restart from the
> beginning? Or can we just do something like add N more passes onto the
> genetic scorer?
>
> If we're still running a score generation for pass X and pass X+1 has
> reached its cutoff and has enough corpora to satisfy the thresholds
> and immediately start the scoring process, do we give up on processing
> pass X? I would think yes.
>
> If we're still scoring pass X and pass X+1 was delayed but now has
> enough corpora and wants to start its scoring pass, do we give up on
> processing pass X? Probably yes, but this might still result in a long
> series of missed rules if the timing is just wrong.
>
> Granted this does introduce some inter-pass coordination that's not
> currently there - pass X will need to know whether pass X+1 has
> started processing, or pass X+1 will need to have a way to tell pass X
> to stop processing because it wants to start.
>
>
> Comments solicited.

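The timing rule proposed in the quoted message can be sketched roughly as
follows. This is only an illustration under assumptions, not how the
masscheck scripts work today: the names Action, PassState and decide are
hypothetical, the thresholds are passed in as plain ham/spam counts, and
the dates and numbers in the usage lines are made up. It covers only a
pass that has not started scoring yet; the open questions about late
results arriving during scoring are not modelled.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class Action(Enum):
    WAIT = "wait"                    # keep waiting for more result sets
    START_SCORING = "start_scoring"  # run scoring and the rules update
    GIVE_UP = "give_up"              # abandon this pass


@dataclass
class PassState:
    """Hypothetical record of one weekly masscheck pass."""
    cutoff: datetime  # scoring never starts before this time
    ham: int = 0      # ham messages submitted so far
    spam: int = 0     # spam messages submitted so far


def decide(now: datetime, current: PassState,
           following: Optional[PassState],
           min_ham: int, min_spam: int) -> Action:
    """Decide what a not-yet-scored pass X should do at time `now`."""
    # The cutoff only means "scoring won't start prior to this time".
    if now < current.cutoff:
        return Action.WAIT
    # Hard cutoff: pass X+1 reached its own cutoff while X still waits.
    if following is not None and now >= following.cutoff:
        return Action.GIVE_UP
    # A late result set that pushes X over the thresholds only delays it.
    if current.ham >= min_ham and current.spam >= min_spam:
        return Action.START_SCORING
    return Action.WAIT


# Usage with made-up dates and thresholds.
cutoff_x = datetime(2016, 8, 6, 9, 0)
pass_x = PassState(cutoff=cutoff_x, ham=100, spam=100)
pass_next = PassState(cutoff=cutoff_x + timedelta(days=7))

# At the cutoff the corpora are still too small, so the pass waits.
print(decide(cutoff_x, pass_x, pass_next, 150, 150))

# A late result set arrives six hours later and satisfies the thresholds.
pass_x.ham, pass_x.spam = 200, 200
print(decide(cutoff_x + timedelta(hours=6), pass_x, pass_next, 150, 150))
```

Checking the hard cutoff before the thresholds means a pass that only
becomes large enough after pass X+1 has reached its cutoff is dropped
rather than started late, which matches the "give up on pass X" reading
of the proposal.
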
One nice thing to have would be a definition of the time window for the
posted logs. I want to run the check during the cheapest possible hours
for a heavy computation, but that does not mean I cannot honor such a
definition if one exists.

I have adjusted my schedules and they should work. But if for some
reason I fail, I need to know.

This is a hobby project for me, and I take it seriously. I even run it
in a cloud VM with 32 cores if my local PC is not available, so I do
pay for it. I do not want that to be in vain.

- -- 
Jari Fredriksson
Bitwell Oy
+358 400 779 440
[email protected]
https://www.bitwell.biz - cost effective hosting and security for ecommerce
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iEYEARECAAYFAlefVMAACgkQKL4IzOyjSrac8ACfU3f/w5alrAVD0i9Zm1RN8nyT
+zQAn28jxkdXPWe1EDwtnoJJ4IF7um8R
=GMKU
-----END PGP SIGNATURE-----
