-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

John Hardin wrote on 1.8.2016 0:00:
> Folks:
> 
> It looks like we didn't get another successful weekly masscheck again,
> even though if you check the counts today they are above the
> thresholds.
> 
> I suspect this is happening due to some results being submitted "late".
> 
> I think we might want to look into making a change to the masscheck
> timing rules, specifically: the cutoff for having enough corpora to
> run the scoring and produce a rules update is not a specific time, but
> is instead related to the following masscheck run.
> 
> In other words:
> 
> There is still a cutoff time for the masscheck run, but it only means
> "the scoring won't start prior to this time."
> 
> If the corpora are above the thresholds when this time is reached, the
> scoring and update process commences immediately.
> 
> If not, that doesn't mean we've missed an update, at least not yet.
> 
> If another result set comes in for that pass, and that result set
> pushes it over the thresholds, then we can start the scoring and rule
> generation process.
> 
> The actual hard cutoff for pass X would be sometime after pass X+1
> starts. Perhaps if the cutoff time for pass X+1 is reached and pass X
> is still waiting, then we give up on pass X.
> 
> This way, a late result set that satisfies the thresholds will just
> delay the rule generation, not prevent it.
> 
> This can use some refinement:
> 
> If we've started scoring and another result set for that pass comes
> in, do we incorporate that into the score generation? We probably
> should; the decision could be based on when the delayed results come
> in (we don't want to keep resetting the scoring process and collide
> with the following pass) and how large the new results are (we might
> want to ignore a late small result set, but incorporate a late large
> result set).
> 
> If we do that, does the scoring process need to restart from the
> beginning? Or can we just do something like add N more passes onto the
> genetic scorer?
> 
> If we're still running a score generation for pass X and pass X+1 has
> reached its cutoff and has enough corpora to satisfy the thresholds
> and immediately start the scoring process, do we give up on processing
> pass X? I would think yes.
> 
> If we're still scoring pass X and pass X+1 was delayed but now has
> enough corpora and wants to start its scoring pass, do we give up on
> processing pass X? Probably yes, but this might still result in a long
> series of missed rules if the timing is just wrong.
> 
> Granted this does introduce some inter-pass coordination that's not
> currently there - pass X will need to know whether pass X+1 has
> started processing, or pass X+1 will need to have a way to tell pass X
> to stop processing because it wants to start.
> 
> 
> Comments solicited.
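
A minimal Python sketch of the proposed timing rule, as I read it. It assumes each pass tracks ham and spam counts against two thresholds (`min_ham`, `min_spam`). `MasscheckPass`, `add_results`, `step` and `preempt` are hypothetical names for illustration, not part of the actual masscheck tooling, and the threshold numbers and dates are arbitrary.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class MasscheckPass:
    """One weekly masscheck pass (hypothetical model, not the real tooling)."""
    cutoff: datetime        # scoring never starts before this time
    next_cutoff: datetime   # cutoff of pass X+1: the hard deadline for this pass
    ham: int = 0            # ham messages reported so far (hypothetical counter)
    spam: int = 0           # spam messages reported so far (hypothetical counter)
    state: str = "waiting"  # "waiting", then "scoring" or "abandoned"

    def add_results(self, ham: int, spam: int) -> None:
        """Record a result set; late submissions are still counted."""
        self.ham += ham
        self.spam += spam

    def step(self, now: datetime, min_ham: int, min_spam: int) -> str:
        """Re-evaluate this pass at time `now` and return its state."""
        if self.state != "waiting":
            return self.state           # decision already made
        if now < self.cutoff:
            return self.state           # too early to start scoring
        if now >= self.next_cutoff:
            self.state = "abandoned"    # pass X+1 reached its cutoff first
        elif self.ham >= min_ham and self.spam >= min_spam:
            self.state = "scoring"      # thresholds met: start immediately
        return self.state

    def preempt(self) -> None:
        """Pass X+1 is ready to score, so give up on this pass."""
        if self.state in ("waiting", "scoring"):
            self.state = "abandoned"


if __name__ == "__main__":
    start = datetime(2016, 7, 30, 8, 0)  # arbitrary example cutoff
    p = MasscheckPass(cutoff=start, next_cutoff=start + timedelta(days=7))
    p.add_results(ham=100, spam=80)
    print(p.step(start, min_ham=150, min_spam=150))  # waiting: below thresholds
    p.add_results(ham=60, spam=90)                   # a late result set arrives
    print(p.step(start + timedelta(hours=5), min_ham=150, min_spam=150))  # scoring
```

Under this reading a late result set only delays scoring; the pass is lost only once the following pass reaches its own cutoff, or (tentatively, per "I would think yes") when pass X+1 preempts it. The sketch leaves out the open refinement questions, such as folding late results into a scoring run already underway or adding more passes to the genetic scorer.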

One nice thing to have would be a definition of the time window for the
posted logs.

I want to run the check during the cheapest possible hours for a heavy
computation, but that does not mean I cannot honor such a definition if
there is one.

I have adjusted my schedules and they should work. But if for some
reason I fail, I need to know. This is a hobby project for me, but I
take it that seriously: if my local PC is not available, I even run the
check in a cloud VM with 32 cores, so I do pay for it. I do not want
that to be in vain.

- -- 
Jari Fredriksson
Bitwell Oy
+358 400 779 440
[email protected]
https://www.bitwell.biz - cost effective hosting and security for
ecommerce
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iEYEARECAAYFAlefVMAACgkQKL4IzOyjSrac8ACfU3f/w5alrAVD0i9Zm1RN8nyT
+zQAn28jxkdXPWe1EDwtnoJJ4IF7um8R
=GMKU
-----END PGP SIGNATURE-----
