On Thu, 11 Dec 2008 09:13:43 -0800, Alf Eaton <[email protected]> wrote:
I've been trying to use the SVM classifier (MarkLogic 4.0-1) to
classify a set of documents, but ran into a problem when trying to
save a trained 'supports' classifier between runs. The problem seems
to be that the saved classifier identifies documents in the training
set using a temporary ID, which is no longer valid when the
classification of the test set is performed. With the 'weights'
classifier it works fine.
Here's the error message:
===
Invalid classifier specification element:
XDMP-BADDOCID:
doc("saved/classifier")/cts:classifier/cts:supports/cts:class[1]/cts:doc[1]
-- Invalid classifier specification element: document id
6196220549445471859 not found
===
I've attached a PHP script that contains the actual XQuery queries
used, in case that's helpful.
alf
This is expected behaviour.
The documentation (for cts:train) says this, although perhaps it doesn't
stress the implications:
"The support vector representation of the classifier includes a supports
node that has <class/> children for each class. Here the class elements
contain a list of doc elements which identify the specific training nodes
using an internal key. This internal key is valid across queries only for
nodes in the database."
What this means is that if your training and classification are happening
in different queries (which is generally the case, although it need not be),
then you have to put the training set in the database if you are using the
"supports" form of the classifier. If you are using the "weights" form of
the classifier you won't have this issue. And if you perform the training
and the classification in the same query, you also won't have a problem.
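A minimal, untested sketch of the two-query pattern described above. It assumes the training documents are already stored in the database under a hypothetical /training/ directory, that the documents to classify are under a hypothetical /test/ directory, and that each training document's root element has a hypothetical "category" attribute naming its class. Only cts:train, cts:classify, xdmp:directory, xdmp:document-insert and doc() are real functions; the URIs and the attribute are made up for illustration. The two statements are separated by ";" so MarkLogic runs them as separate statements, but they can just as well be sent as two separate requests.

```xquery
xquery version "1.0-ml";
(: Query 1: train a "supports" classifier on documents that are already
   in the database, then save it. /training/ and @category are
   hypothetical placeholders. :)
let $training := xdmp:directory("/training/", "1")
let $labels :=
  for $doc in $training
  return
    <cts:label>
      <cts:class name="{string($doc/*/@category)}"/>
    </cts:label>
let $classifier :=
  cts:train($training, $labels,
    <options xmlns="cts:train">
      <classifier-type>supports</classifier-type>
    </options>)
return xdmp:document-insert("saved/classifier", $classifier)
;
xquery version "1.0-ml";
(: Query 2: reload the saved classifier and classify new documents.
   A supports classifier refers to its training nodes by internal key,
   so the same stored training documents are passed again as the fourth
   argument. Because they live in the database, the keys still resolve.
   /test/ is a hypothetical placeholder. :)
let $classifier := doc("saved/classifier")/cts:classifier
let $training := xdmp:directory("/training/", "1")
let $test := xdmp:directory("/test/", "1")
return cts:classify($test, $classifier, (), $training)
```

If Query 1 asks for <classifier-type>weights</classifier-type> instead, the saved classifier carries its own weights rather than references to training nodes, so (as Alf found) it can be reused from a later query without the training set; Query 2 would then reduce to cts:classify($test, $classifier).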
Cheers
//Mary
_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general