On Mon, 1 Aug 2016, Kevin Golding wrote:

On Sun, 31 Jul 2016 22:00:11 +0100, John Hardin <[email protected]> wrote:

This can use some refinement:

Some good thoughts, but ones that I fear may prove an obstacle to getting a change in place. Perhaps things for a wishlist instead?

Maybe.

If we've started scoring and another result set for that pass comes in, do we incorporate it into the score generation? We probably should; the decision could be based on when the delayed results come in (we don't want to keep resetting the scoring process and collide with the following pass) and on how large the new results are (we might want to ignore a late small result set, but incorporate a late large one).

As it stands I'm inclined to take the route that anything submitted after the run has started gets lost. This is no different from the current situation (as I understand it, anyway), so it's not penalising anyone, but it also doesn't grant further concessions. Adding in new results just seems a way to further delay an already-delayed process.

I'm hoping to balance delay and quality of results.

Much as the additional data is beneficial, it seems like added complexity for no gain. Given how tight the ham threshold is most days (there are a lot of days in the 140k-150k region), a large result set is unlikely to arrive after the threshold has been met anyway; it's far more likely to be the trigger. If we start dividing large from small we need to pick a point and draw a line, and potentially discourage submissions from people who feel they aren't important enough.

I'd also note that when you look at the uploads you have people like axb who submit multiple times in small groups - that option is always open to anyone who feels something is important enough to beat the threshold.

My fear is that we start the scoring when we receive a small corpus (20k ham) that just barely meets the threshold, and then ignore a large corpus (100k ham) that is received shortly thereafter and would greatly improve the results.

Perhaps: if we receive a delayed corpus that crosses the threshold, we don't *immediately* start scoring; instead we start in half an hour, which gives another corpus a chance to come in. Each late arrival could extend the delay, up to some maximum (1h?).

Or perhaps I'm overthinking it. :)
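To make the idea concrete, here's a minimal sketch of that grace-period trigger. Everything here is illustrative: the class name, the 150k ham threshold, and the 30-minute/1-hour values are placeholders, not anything in the actual masscheck tooling.

```python
from datetime import datetime, timedelta

GRACE = timedelta(minutes=30)   # extension granted when a corpus arrives
MAX_DELAY = timedelta(hours=1)  # never wait longer than this past the threshold

class ScoringTrigger:
    """Track corpus submissions and decide when a scoring run may start."""

    def __init__(self, ham_threshold):
        self.ham_threshold = ham_threshold
        self.ham_total = 0
        self.threshold_met_at = None  # when the threshold was first crossed
        self.start_at = None          # earliest permitted start time

    def submit(self, ham_count, now):
        self.ham_total += ham_count
        if self.ham_total >= self.ham_threshold:
            if self.threshold_met_at is None:
                self.threshold_met_at = now
            # each submission pushes the start back by GRACE,
            # capped at MAX_DELAY past the threshold crossing
            self.start_at = min(now + GRACE,
                                self.threshold_met_at + MAX_DELAY)

    def should_start(self, now):
        return self.start_at is not None and now >= self.start_at
```

So a corpus that barely crosses the threshold buys a 30-minute window for a bigger one to land, and a steady trickle of late corpora can't stall the run past the one-hour cap.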

If we're still running a score generation for pass X when pass X+1 has reached its cutoff, has enough corpora to satisfy the thresholds, and could immediately start its own scoring process, do we give up on processing pass X? I would think yes.

I don't know how long the process takes, but if we never start a pass after the next day's start point arrives, I would assume the runs would never overlap.

I dislike the idea of trying to calculate a hard start cutoff based on how long the scoring run takes. Do we really want to maintain statistics on that?

I could be wrong, but it seems likely that a hard cutoff that avoids overlapping the next day's start would be simpler. At some point we need to give up hope on a day's results anyway, so that may be the guideline for when that time is.

OK, so the hard starting cutoff could be the time the following pass does its SVN get. If the scoring is underway at that point, we let it run to completion? I am making an assumption here: that the time the scoring and rule generation take is less than the get -> minimum scoring start delay, so that the scoring+rulegen passes won't overlap.
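Putting the give-up rule together, the decision for pass X might look like the sketch below. This is purely illustrative; the function name, state flags, and the idea of passing the next pass's SVN-get time in as the cutoff are my assumptions, not the actual infrastructure.

```python
from datetime import datetime

def next_action(now, next_svn_get, thresholds_met, scoring_underway):
    """Decide what pass X should do, using the following pass's SVN-get
    time as a hard starting cutoff.

    - A run already in flight is allowed to finish (this assumes the run
      is shorter than the get -> minimum-start delay, so it can't
      overlap the next pass's scoring).
    - A run that hasn't started by the cutoff is abandoned.
    """
    if scoring_underway:
        return "run to completion"
    if now >= next_svn_get:
        return "abandon pass"      # the following pass takes over
    if thresholds_met:
        return "start scoring"
    return "wait for corpora"
```

The appeal of this shape is that nothing needs to track how long scoring historically takes: the only clock that matters is the next pass's checkout time.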


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 [email protected]    FALaholic #11174     pgpk -a [email protected]
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  It is not the place of government to make right every tragedy and
  woe that befalls every resident of the nation.
-----------------------------------------------------------------------
 4 days until the 281st anniversary of John Peter Zenger's acquittal
