On Tue, Jun 22, 2010 at 9:25 AM, Ted Dunning wrote:

>
>
> On Mon, Jun 21, 2010 at 8:35 PM, Robin Anil wrote:
>
>> A Classifier Training Job will take a Trainer and a Vector location and
>> produce a Model.
>>
>
> No.  Well, not exclusively, anyway.  We can't be limited to reading vectors
> due to the fairly substantial (3x) performance hit that would entail.
>


Ahhh... a last-minute thought here.

The output here also needs to include the vectorizer state.  Many vectorizers
need to retain information in order to be repeatable.  For instance, a
dictionary-based vectorizer might build its dictionary as it sees terms during
training.  Another example is AdaptiveWordValueEncoder, which doesn't use a
dictionary but does keep counts to help with weighting.  And finally, all of
the hashed representations should produce some kind of trace history so that
they can be reverse engineered.  Again, I would recommend a blob as the
on-disk format.
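To make the dictionary case concrete, here is a minimal Java sketch. It assumes a hypothetical `DictionaryVectorizer` class (not part of Mahout) that models a sparse vector as a `TreeMap` from index to weight. The dictionary it grows during training is exactly the state that has to travel with the model. The equally hypothetical `toBlob()` and `fromBlob()` methods write that state as an opaque byte blob, so a classifier can later rebuild an identical vectorizer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical dictionary-based vectorizer; NOT an actual Mahout class.
// A sparse vector is modeled as a TreeMap<index, weight> for brevity.
public class DictionaryVectorizer {
  // term -> index; this is the state that must be saved alongside the model
  private final Map<String, Integer> dictionary = new LinkedHashMap<>();

  // Training: unseen terms are assigned the next free index.
  public Map<Integer, Double> vectorizeAndLearn(List<String> terms) {
    Map<Integer, Double> v = new TreeMap<>();
    for (String t : terms) {
      Integer idx = dictionary.get(t);
      if (idx == null) {
        idx = dictionary.size();
        dictionary.put(t, idx);
      }
      v.merge(idx, 1.0, Double::sum);
    }
    return v;
  }

  // Classification: the dictionary is frozen and unseen terms are dropped.
  public Map<Integer, Double> vectorize(List<String> terms) {
    Map<Integer, Double> v = new TreeMap<>();
    for (String t : terms) {
      Integer idx = dictionary.get(t);
      if (idx != null) {
        v.merge(idx, 1.0, Double::sum);
      }
    }
    return v;
  }

  // Serialize the dictionary as an opaque blob: entry count, then (term, index) pairs.
  public byte[] toBlob() {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeInt(dictionary.size());
      for (Map.Entry<String, Integer> entry : dictionary.entrySet()) {
        out.writeUTF(entry.getKey());
        out.writeInt(entry.getValue());
      }
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
    return bytes.toByteArray();
  }

  // Rebuild a vectorizer with exactly the same term -> index mapping.
  public static DictionaryVectorizer fromBlob(byte[] blob) {
    DictionaryVectorizer v = new DictionaryVectorizer();
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(blob))) {
      int n = in.readInt();
      for (int i = 0; i < n; i++) {
        String term = in.readUTF();
        v.dictionary.put(term, in.readInt());
      }
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
    return v;
  }

  public static void main(String[] args) {
    DictionaryVectorizer trained = new DictionaryVectorizer();
    trained.vectorizeAndLearn(Arrays.asList("spam", "eggs", "spam"));
    DictionaryVectorizer restored = fromBlob(trained.toBlob());
    // "ham" was never seen during training, so it has no index and is dropped
    System.out.println(restored.vectorize(Arrays.asList("eggs", "ham", "spam"))); // prints {0=1.0, 1=1.0}
  }
}
```

Without the saved dictionary, a fresh vectorizer at classification time would assign indices in whatever order it happened to encounter terms, so the same document could map to a different vector than it did during training.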
