On Tue, Jun 22, 2010 at 9:25 AM, Ted Dunning <[email protected]> wrote:
>
> On Mon, Jun 21, 2010 at 8:35 PM, Robin Anil <[email protected]> wrote:
>
>> A Classifier Training Job will take a Trainer, and a Vector location and
>> produce a Model
>>
>
> No.  Well, not exclusively, anyway.  We can't be limited to reading vectors
> due to the fairly substantial (3x) performance hit that will entail.
>

Ahhh... last minute thought here.

The output here also needs to include vectorizer state.  Many vectorizers need saved state to be repeatable.  For instance, a dictionary-based vectorizer might build up a dictionary as it sees terms during training.  Another example is AdaptiveWordValueEncoder, which doesn't use a dictionary but does keep counts to help with weighting.  And finally, all of the hashed representations should produce some kind of trace history so that they can be reverse engineered.

Again, I would recommend a blob as the on-disk format.
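To make "vectorizer state" concrete, here is a minimal Java sketch under stated assumptions: a count-based dictionary vectorizer whose term-to-index map is saved as an opaque byte blob next to the model, and a hashed encoder that records which terms landed in which column. The class and method names (`DictionaryVectorizer`, `toBlob`, `fromBlob`, `TracingHashedEncoder`, `termsFor`) and the blob layout are hypothetical illustrations, not existing Mahout APIs.

```java
import java.io.*;
import java.util.*;

// Hypothetical sketch, not a Mahout API: a dictionary vectorizer whose state
// (term -> column index) can be saved as a blob and restored for classification.
final class DictionaryVectorizer {
  // Grows as new terms are seen during training; insertion order is kept.
  private final Map<String, Integer> dictionary = new LinkedHashMap<>();
  private boolean frozen = false;

  // Counts terms into a dense vector; unseen terms extend the dictionary unless frozen.
  double[] vectorize(List<String> terms, int cardinality) {
    double[] v = new double[cardinality];
    for (String term : terms) {
      Integer index = dictionary.get(term);
      if (index == null) {
        if (frozen || dictionary.size() >= cardinality) {
          continue; // unknown term after training (or no room left): drop it
        }
        index = dictionary.size();
        dictionary.put(term, index);
      }
      v[index] += 1.0;
    }
    return v;
  }

  // Blob layout (arbitrary choice): entry count, then (term, index) pairs.
  byte[] toBlob() {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeInt(dictionary.size());
      for (Map.Entry<String, Integer> e : dictionary.entrySet()) {
        out.writeUTF(e.getKey());
        out.writeInt(e.getValue());
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e); // not expected for in-memory streams
    }
    return bytes.toByteArray();
  }

  // Rebuilds a frozen vectorizer from a blob written by toBlob().
  static DictionaryVectorizer fromBlob(byte[] blob) {
    DictionaryVectorizer restored = new DictionaryVectorizer();
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(blob))) {
      int n = in.readInt();
      for (int i = 0; i < n; i++) {
        String term = in.readUTF();
        restored.dictionary.put(term, in.readInt());
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    restored.frozen = true;
    return restored;
  }
}

// Hypothetical sketch: a hashed encoder that keeps a trace of column -> terms,
// so a model weight on a hashed column can be traced back to the terms behind it.
final class TracingHashedEncoder {
  private final int cardinality;
  private final Map<Integer, Set<String>> trace = new TreeMap<>();

  TracingHashedEncoder(int cardinality) {
    this.cardinality = cardinality;
  }

  void addToVector(String term, double[] v) {
    int index = Math.floorMod(term.hashCode(), cardinality);
    v[index] += 1.0;
    trace.computeIfAbsent(index, k -> new TreeSet<>()).add(term);
  }

  Set<String> termsFor(int index) {
    return trace.getOrDefault(index, Collections.emptySet());
  }
}
```

Freezing the restored dictionary is one possible design choice: at classification time unseen terms are dropped instead of being given new indices the trained model knows nothing about. The trace map is likewise just one simple way to keep enough history to reverse engineer hashed features.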
