2008/12/11 Mary Holstege <[email protected]>:
> On Thu, 11 Dec 2008 09:13:43 -0800, Alf Eaton <[email protected]> wrote:
>
>> I've been trying to use the SVM classifier (MarkLogic 4.0-1) to
>> classify a set of documents, but ran into a problem when trying to
>> save a trained 'supports' classifier between runs. The problem seems
>> to be that the saved classifier identifies documents in the training
>> set using a temporary ID, which is no longer valid when the
>> classification of the test set is performed. With the 'weights'
>> classifier it works fine.
>>
>> Here's the error message:
>> ===
>> Invalid classifier specification element:
>> XDMP-BADDOCID:
>> doc("saved/classifier")/cts:classifier/cts:supports/cts:class[1]/cts:doc[1]
>> -- Invalid classifier specification element: document id
>> 6196220549445471859 not found
>> ===
>>
>> I've attached a PHP script that contains the actual XQuery queries
>> used, in case that's helpful.
>>
>> alf
>
> This is expected behaviour.
> The documentation (for cts:train) says this, although it doesn't
> perhaps stress the implications:
> "The support vector representation of the classifier includes a supports
> node that has <class/> children for each class. Here the class elements
> contain a list of doc elements which identify the specific training nodes
> using an internal key. This internal key is valid across queries only for
> nodes in the database."
>
> What this means is that if your training and classification are happening in
> different queries (which is generally the case, although it need not be),
> then you have to put the training set in the database if you are using the
> "supports" form of the classifier. If you are using the "weights" form of
> the classifier you won't have this issue. And if you perform the training
> and the classification in the same query, you also won't have a problem.

Thanks Mary, I hadn't noticed that in the documentation.

When you say "you have to put the training set in the database", what
does that involve, specifically? I was storing the training set in the
same way as the classifier (saving the set of documents at a specific
URI), but maybe it needs to be stored differently.
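
For reference, here is a rough sketch of one way the two-query workflow might look with the "supports" form. It assumes each training document is stored as its own database document and is read back with xdmp:directory, so that the nodes passed to cts:train are database nodes rather than in-memory ones. The "/training/" and "/test/" directories and the @class attribute used to build the labels are hypothetical; only the "saved/classifier" URI comes from the error message above. It has not been run against 4.0-1.

```xquery
xquery version "1.0-ml";
(: Query 1: train on nodes fetched from the database.
   "/training/" and the @class attribute are hypothetical names. :)
let $training := xdmp:directory("/training/", "1")
let $labels :=
  for $doc in $training
  return <cts:label><cts:class name="{ $doc/*/@class }"/></cts:label>
let $classifier :=
  cts:train($training, $labels,
    <options xmlns="cts:train">
      <classifier-type>supports</classifier-type>
    </options>)
return xdmp:document-insert("saved/classifier", $classifier)
;
xquery version "1.0-ml";
(: Query 2, run later: reload the saved classifier and fetch the same
   training nodes from the database, since the "supports" form needs them
   at classification time. "/test/" is a hypothetical directory. :)
let $training := xdmp:directory("/training/", "1")
let $classifier := doc("saved/classifier")/cts:classifier
return
  cts:classify(xdmp:directory("/test/", "1"), $classifier,
    <options xmlns="cts:classify"/>, $training)
```

If the training set had instead been wrapped into a single saved document and its children passed to cts:train after being copied or constructed in memory, the internal keys would not survive to the second query, which would be consistent with the XDMP-BADDOCID error.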
