Re: [Assp-user] Bayesian file name/number logic

Micheal Espinola Jr Tue, 13 Nov 2007 09:42:07 -0800

Fritz Borgstedt wrote:
> Files Distribution
>
> This defines how file names are chosen in each collection. If set to
> 1, names are uniformly distributed. If set between 0.01 and 0.99, the
> name distribution is exponential -- files get lower numbers more
> frequently. This prevents the corpus from being refreshed too quickly,
> especially when MaxFiles is set to a low value (e.g. 3000).

What does this mean exactly?  I don't recall seeing or hearing anything
previously to suggest that "lower numbers" are handled any differently
than "higher numbers" in the corpus.  This also goes against anything
previously documented (albeit all by John Hanna).

So - file naming by number is no longer a random naming process - and
there is preferential treatment between lower and higher numbered
files?  What is the delimiting factor that differentiates low from
high?  I thought that the Files Distribution feature (as previously
discussed) was supposed to control how "randomly" files were saved -
not named.
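
For illustration, here is one way the quoted description could be read.
This is only a sketch under stated assumptions, not ASSP's actual code:
the function name pick_file_number and the truncated-exponential mapping
are hypothetical, and only MaxFiles and the Files Distribution setting
come from the quoted text.

```python
import math
import random


def pick_file_number(max_files, distribution, rng=random):
    """Pick a file number in [0, max_files).

    Hypothetical reading of the "Files Distribution" setting:
    - distribution >= 1: every number is equally likely.
    - 0 < distribution < 1: lower numbers are drawn more often,
      following a truncated exponential curve.
    """
    u = rng.random()  # uniform in [0, 1)
    if distribution >= 1:
        x = u
    else:
        # Maps u in [0, 1) to x in [0, 1) with density proportional
        # to distribution**x, which decreases as x grows.
        x = math.log(1 - u * (1 - distribution)) / math.log(distribution)
    # Guard against floating-point rounding pushing x up to 1.0.
    return min(int(max_files * x), max_files - 1)


if __name__ == "__main__":
    rng = random.Random(0)
    picks = [pick_file_number(3000, 0.2, rng) for _ in range(10000)]
    low_half = sum(1 for n in picks if n < 1500)
    print("share of picks in the lower half:", low_half / len(picks))
```

Under this reading there is no cutoff separating "low" from "high"
numbers. The chance of a number being picked falls off smoothly, and
with a setting of d the lowest number is roughly 1/d times as likely to
be chosen as the highest one.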


