I've been trying to use the SVM classifier (MarkLogic 4.0-1) to
classify a set of documents, but I ran into a problem when saving a
trained 'supports' classifier between runs. The saved classifier seems
to identify the documents in the training set by temporary IDs, which
are no longer valid when the test set is classified in a later query.
With the 'weights' classifier it works fine.

Here's the error message:
===
Invalid classifier specification element:
XDMP-BADDOCID: doc("saved/classifier")/cts:classifier/cts:supports/cts:class[1]/cts:doc[1]
-- Invalid classifier specification element: document id 6196220549445471859 not found
===

I've attached a PHP script that contains the actual XQuery queries
used, in case that's helpful.

alf
<?php

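// $marklogic is the client object used to send XQuery to the server;
// it is created elsewhere and is not part of this script.
global $marklogic;

// Classifier type to train: 'supports' fails when reloaded, 'weights' works.
$type = 'supports';
//$type = 'weights';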

$train = sprintf('
  let $train := doc()[/article/front/article-meta/kwd-group/kwd][1 to 3]

  let $labels := for $x in $train
      let $classes := for $category in $x/article/front/article-meta/kwd-group/kwd
        return <class name="{$category}"/>
      return <cts:label>{$classes}</cts:label>
  
  let $options := <options xmlns="cts:train"><classifier-type>%s</classifier-type></options>
  let $classifier := cts:train($train, $labels, $options)
  
  return 
    (
      xdmp:document-insert("saved/training-labels", <root>{$labels}</root>),
      xdmp:document-insert("saved/training-documents", <root>{$train}</root>),
      xdmp:document-insert("saved/classifier", $classifier),
      <trained>{count($train)}</trained>
    )
  ', $type);

$test = sprintf('
  let $labels := doc("saved/training-labels")/root/cts:label 
  let $train := doc("saved/training-documents")/root/article
  let $classifier := doc("saved/classifier")/cts:classifier

  (:
  let $options := <options xmlns="cts:train"><classifier-type>%s</classifier-type></options>
  let $classifier := cts:train($train, $labels, $options)
  :)
  
  let $test := collection("/articles")[1 to 3]  
  
  return
    <classification>
      {cts:classify($test, $classifier, <options xmlns="cts:classify"/>, $train)}
    </classification>
  ', $type);
  
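// Run the training query first (it saves the labels, training documents,
// and classifier), then run the classification query as a separate request.
$result = $marklogic->query($train);
print_r($result);
$result = $marklogic->query($test);
print_r($result);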