I've been trying to use the SVM classifier (MarkLogic 4.0-1) to
classify a set of documents, but I ran into a problem when saving a
trained 'supports' classifier between runs. The saved classifier seems
to identify the documents in the training set by temporary IDs, which
are no longer valid by the time the test set is classified. With the
'weights' classifier, the same approach works fine.
Here's the error message:
===
Invalid classifier specification element:
XDMP-BADDOCID:
doc("saved/classifier")/cts:classifier/cts:supports/cts:class[1]/cts:doc[1]
-- Invalid classifier specification element: document id
6196220549445471859 not found
===
I've attached a PHP script that contains the actual XQuery queries
used, in case that's helpful.
alf
<?php
global $marklogic;

// Classifier type under test: 'supports' fails when reloaded, 'weights' works.
$type = 'supports';
//$type = 'weights';

// Query 1: train on three documents, then save the labels, the training
// documents and the trained classifier.
$train = sprintf('
let $train := doc()[/article/front/article-meta/kwd-group/kwd][1 to 3]
let $labels :=
  for $x in $train
  let $classes :=
    for $category in $x/article/front/article-meta/kwd-group/kwd
    return <class name="{$category}"/>
  return <cts:label>{$classes}</cts:label>
let $options := <options xmlns="cts:train"><classifier-type>%s</classifier-type></options>
let $classifier := cts:train($train, $labels, $options)
return
  (
    xdmp:document-insert("saved/training-labels", <root>{$labels}</root>),
    xdmp:document-insert("saved/training-documents", <root>{$train}</root>),
    xdmp:document-insert("saved/classifier", $classifier),
    <trained>{count($train)}</trained>
  )
', $type);

// Query 2: reload the saved classifier and classify three test documents.
// Retraining inside this query (the commented-out lines) is the alternative
// to loading the saved classifier.
$test = sprintf('
let $labels := doc("saved/training-labels")/root/cts:label
let $train := doc("saved/training-documents")/root/article
let $classifier := doc("saved/classifier")/cts:classifier
(:
let $options := <options xmlns="cts:train"><classifier-type>%s</classifier-type></options>
let $classifier := cts:train($train, $labels, $options)
:)
let $test := collection("/articles")[1 to 3]
return
  <classification>
    {cts:classify($test, $classifier, <options xmlns="cts:classify"/>, $train)}
  </classification>
', $type);

// Run the training query, then the classification query.
$result = $marklogic->query($train);
print_r($result);

$result = $marklogic->query($test);
print_r($result);