On 7/31/2016 5:00 PM, John Hardin wrote:
Folks:
It looks like we failed to get a successful weekly masscheck again,
even though the counts, if you check them today, are above the thresholds.
I suspect this is happening due to some results being submitted "late".
I think we might want to look into making a change to the masscheck
timing rules, specifically: the cutoff for having enough corpora to
run the scoring and produce a rules update would no longer be a specific
time, but would instead be tied to the following masscheck run.
In other words:
There is still a cutoff time for the masscheck run, but it only means
"the scoring won't start prior to this time."
If the corpora are above the thresholds when this time is reached, the
scoring and update process commences immediately.
If not, that doesn't mean we've missed an update, at least not yet.
If another result set comes in for that pass, and that result set
pushes it over the thresholds, then we can start the scoring and rule
generation process.
The actual hard cutoff for pass X would be sometime after pass X+1
starts. Perhaps if the cutoff time for pass X+1 is reached and pass X
is still waiting, then we give up on pass X.
This way, a late result set that satisfies the thresholds will just
delay the rule generation, not prevent it.
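The rule described above could be sketched roughly as follows. This is only an illustration in Python, not the actual masscheck code: the names PassState, Action and decide, the split into ham/spam counts, and the threshold parameters are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class Action(Enum):
    WAIT = "wait"                    # keep waiting for more results
    START_SCORING = "start_scoring"  # thresholds met, begin scoring
    GIVE_UP = "give_up"              # pass X is abandoned


@dataclass
class PassState:
    # All fields are hypothetical; the real thresholds may be defined differently.
    cutoff: datetime       # scoring for this pass won't start prior to this time
    next_cutoff: datetime  # cutoff of pass X+1, used as the hard deadline for pass X
    ham_count: int
    spam_count: int


def decide(state: PassState, now: datetime, min_ham: int, min_spam: int) -> Action:
    """Decide what a pass that has not started scoring yet should do right now."""
    if now < state.cutoff:
        return Action.WAIT
    if now >= state.next_cutoff:
        # Pass X+1 reached its own cutoff while pass X was still waiting.
        return Action.GIVE_UP
    if state.ham_count >= min_ham and state.spam_count >= min_spam:
        return Action.START_SCORING
    # Below thresholds, but a late result set may still push us over.
    return Action.WAIT


# Example with arbitrary dates and counts: a pass that is short on ham
# keeps waiting after its cutoff instead of being dropped immediately.
cutoff = datetime(2016, 7, 30, 9, 0)
short_on_ham = PassState(cutoff, cutoff + timedelta(days=7), ham_count=10, spam_count=100)
print(decide(short_on_ham, cutoff + timedelta(days=1), min_ham=50, min_spam=50))
```

The function would be re-evaluated each time a new result set arrives or a timer fires, so a late submission simply flips the answer from WAIT to START_SCORING.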
This can use some refinement:
If we've started scoring and another result set for that pass comes
in, do we incorporate that into the score generation? We probably
should; the decision could be based on when the delayed results come
in (we don't want to keep resetting the scoring process and collide
with the following pass) and how large the new results are (we might
want to ignore a late small result set, but incorporate a late large
result set).
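One way to picture the "when" and "how large" tests is the sketch below. It is an assumption-laden illustration, not a proposal for specific values: should_incorporate, min_fraction and min_lead are hypothetical names, and the 10% and 12-hour defaults are placeholders only.

```python
from datetime import datetime, timedelta


def should_incorporate(late_size: int, current_size: int, now: datetime,
                       next_pass_cutoff: datetime,
                       min_fraction: float = 0.10,
                       min_lead: timedelta = timedelta(hours=12)) -> bool:
    """Return True if a late result set is worth feeding into the scoring.

    late_size / current_size are message counts; the defaults are placeholders.
    """
    # Too close to the following pass: resetting now risks a collision.
    if next_pass_cutoff - now < min_lead:
        return False
    # Nothing collected yet, so any non-empty late set matters.
    if current_size <= 0:
        return late_size > 0
    # Ignore a late small result set, incorporate a late large one.
    return late_size / current_size >= min_fraction
```

Keeping both tests in one place makes it easy to tune them separately once we know how late and how large the typical straggler submissions are.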
If we do that, does the scoring process need to restart from the
beginning? Or can we just do something like add N more passes onto the
genetic scorer?
If we're still running a score generation for pass X and pass X+1 has
reached its cutoff and has enough corpora to satisfy the thresholds
and immediately start the scoring process, do we give up on processing
pass X? I would think yes.
If we're still scoring pass X and pass X+1 was delayed but now has
enough corpora and wants to start its scoring pass, do we give up on
processing pass X? Probably yes, but this might still result in a long
series of missed rules if the timing is just wrong.
Granted, this does introduce some inter-pass coordination that's not
currently there - pass X will need to know whether pass X+1 has
started processing, or pass X+1 will need to have a way to tell pass X
to stop processing because it wants to start.
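As a minimal sketch of the second option (pass X+1 telling pass X to stop), assuming both passes run inside one Python process: the scoring loop checks a shared stop flag between units of work. run_scoring and the flag name are hypothetical, and if the passes are really separate jobs, a lock file or similar marker would have to play the role of the threading.Event used here.

```python
import threading


def run_scoring(iterations: int, stop_requested: threading.Event) -> int:
    """Stand-in for the scoring loop; returns how many iterations completed."""
    done = 0
    for _ in range(iterations):
        # Pass X checks between iterations whether pass X+1 wants to start.
        if stop_requested.is_set():
            break
        done += 1  # placeholder for one unit of real scoring work
    return done


# Pass X+1 owns the flag for pass X and sets it when it is ready to start.
stop_pass_x = threading.Event()
stop_pass_x.set()
completed = run_scoring(1000, stop_pass_x)  # stops before doing any work
```

Checking the flag only between iterations keeps the coordination cheap, at the cost of pass X finishing its current unit of work before it yields.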
Comments solicited.
I think looking at why we don't do dailies after a weekly fails is a
good start. It might be an optimization of some sort or just to deal
with people who only submit once a week, but having one day that screws
up everything is clearly causing problems, regardless of what's breaking
that one day.