2008/12/11 Mary Holstege <[email protected]>:
> On Thu, 11 Dec 2008 09:13:43 -0800, Alf Eaton <[email protected]> wrote:
>
>> I've been trying to use the SVM classifier (MarkLogic 4.0-1) to
>> classify a set of documents, but ran into a problem when trying to
>> save a trained 'supports' classifier between runs. The problem seems
>> to be that the saved classifier identifies documents in the training
>> set using a temporary ID, which is no longer valid when the
>> classification of the test set is performed. With the 'weights'
>> classifier it works fine.
>>
>> Here's the error message:
>> ===
>> Invalid classifier specification element:
>> XDMP-BADDOCID:
>> doc("saved/classifier")/cts:classifier/cts:supports/cts:class[1]/cts:doc[1]
>> -- Invalid classifier specification element: document id
>> 6196220549445471859 not found
>> ===
>>
>> I've attached a PHP script that contains the actual XQuery queries
>> used, in case that's helpful.
>>
>> alf
>
> This is expected behaviour.
> The documentation (for cts:train) says this, although it doesn't
> perhaps stress the implications:
>  "The support vector representation of the classifier includes a supports
>  node that has <class/> children for each class. Here the class elements
>  contain a list of doc elements which identify the specific training nodes
>  using an internal key. This internal key is valid across queries only for
>  nodes in the database."
>
> What this means is that if your training and classification are happening in
> different queries (which is generally the case, although it need not be),
> then you have to put the training set in the database if you are using the
> "supports" form of the classifier.  If you are using the "weights" form of
> the classifier you won't have this issue.  And if you perform the training
> and the classification in the same query, you also won't have a problem.


Thanks Mary, I hadn't noticed that in the documentation.

When you say "you have to put the training set in the database", what
does that involve, specifically? I was storing the training set in the
same way as the classifier (saving the set of documents at a specific
URI), but maybe it needs to be stored differently.
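
For reference, here is a minimal sketch of the setup under discussion, as
three separate statements: store the training set, train and save a
'supports' classifier, then classify in a later query. It is not a confirmed
fix. The /training/ and /test/ URIs, the sample document contents and the
class names are all hypothetical, and a real training set would need far more
than two documents. The point it illustrates is that the nodes passed to
cts:train are read back from the database with fn:doc rather than constructed
in memory, and that the same stored nodes are passed again to cts:classify.

```xquery
xquery version "1.0-ml";
(: Statement 1: store the training documents in the database.
   URIs and contents are hypothetical placeholders. :)
xdmp:document-insert("/training/spam-1.xml", <doc>cheap pills cheap offers</doc>),
xdmp:document-insert("/training/ham-1.xml", <doc>meeting agenda for monday</doc>)
;

xquery version "1.0-ml";
(: Statement 2: train on nodes fetched from the database, so that the
   internal keys recorded in cts:supports refer to stored nodes. :)
let $training := (
  fn:doc("/training/spam-1.xml"),
  fn:doc("/training/ham-1.xml")
)
(: One label per training node, in the same order. :)
let $labels := (
  <cts:label><cts:class name="spam"/></cts:label>,
  <cts:label><cts:class name="ham"/></cts:label>
)
let $classifier :=
  cts:train($training, $labels,
    <options xmlns="cts:train">
      <classifier-type>supports</classifier-type>
    </options>)
return xdmp:document-insert("saved/classifier", $classifier)
;

xquery version "1.0-ml";
(: Statement 3: a later query. The 'supports' form needs the training
   nodes again, so fetch the same stored documents and pass them as the
   fourth argument. "/test/unknown-1.xml" is assumed to exist already. :)
let $training := (
  fn:doc("/training/spam-1.xml"),
  fn:doc("/training/ham-1.xml")
)
let $classifier := fn:doc("saved/classifier")/cts:classifier
return
  cts:classify(fn:doc("/test/unknown-1.xml"), $classifier,
    <options xmlns="cts:classify"/>, $training)
```

If the training nodes were built in memory in the training query (for example
as element constructors) and only copies were saved afterwards, the keys in
the saved classifier would presumably point at the temporary nodes, which
would match the XDMP-BADDOCID error above.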
_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general
