I’m working on a project that currently depends on MarkLogic’s reverse query 
mechanism. This is a feature whereby you store documents that contain 
MarkLogic-specific queries (queries in their cts namespace). These get indexed 
in some way (I have no idea how the index works). You can then search for 
these stored queries in the context of specific nodes, and ML will return the 
query documents whose queries would match those nodes. The driving use case is 
alerting-type applications: when new documents are added, you find which 
stored queries they match and then use those queries to do something.
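
To make that concrete, the pattern is roughly as follows (this is from memory, 
so treat the names and details as approximate, and the rule content is just an 
invented example): you insert a document whose body contains a serialized cts 
query, and later ask which of the stored query documents would match a given 
node:

  (: store a rule whose body is a serialized cts:word-query :)
  xdmp:document-insert("/queries/rule-001.xml",
    <rule>
      <cts:word-query xmlns:cts="http://marklogic.com/cts">
        <cts:text xml:lang="en">myocardial infarction</cts:text>
      </cts:word-query>
    </rule>);

  (: later: which stored rules match this new document? :)
  cts:search(fn:collection(), cts:reverse-query($new-doc))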

My use case is classification: given a node to be classified, find all queries 
that match it, and from each matching query get the classification details 
(preferred term, variant forms, associated taxonomy, etc.). This process 
definitely depends on MarkLogic’s full-text search features, for example so 
that any surface form of a non-preferred term is matched by the full-text 
query for that term.
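
To give a flavour of what a single classification rule boils down to in XQuery 
full-text terms, here is a trivial sketch (the terms, database name, and 
element names are invented for illustration):

  (: true if the node mentions any variant form of the concept :)
  declare function local:matches($node as node()) as xs:boolean {
    $node contains text { "myocardial infarction", "heart attack", "MI" }
      any using stemming using case insensitive
  };

  for $doc in collection("corpus")
  where local:matches($doc)
  return
    <hit uri="{base-uri($doc)}">
      <preferred-term>myocardial infarction</preferred-term>
    </hit>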

I have a large corpus to classify (approximately 45 million objects at current 
count). The processing is inherently parallelizable so I’m looking at non-ML 
options that would allow us to scale to the limits of our hardware budget. Even 
if each node were less efficient than ML, we would be able to achieve massive 
throughput for this classification operation. In theory we could scale to one 
processor per object if budget were unlimited (it is not unlimited), so even a 
brute-force solution would perform well at larger scales.

So I guess I have two questions really:

1. Can anyone share ways they use BaseX for doing classification in general? 
I’ve so far been focused on analyzing the current system to find performance 
bottlenecks, so I haven’t yet had a chance to think through the classification 
process itself, but there must be well-understood strategies. I suspect that 
one could build the equivalent of ML’s reverse query index in BaseX.
2. Is there a direct equivalent to ML’s reverse query facility in BaseX, or an 
obvious route to building one? (A rough brute-force sketch of the sort of 
thing I have in mind follows.)
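
For what it’s worth, the brute-force version I can imagine in BaseX (no clever 
index, just evaluate every stored query against the node to be classified) 
would look something like this; the database name, rule format, and file name 
are invented:

  (: each stored rule carries its match test as literal XQuery text
     plus the classification payload, e.g.
       <rule>
         <test>. contains text { "heart attack", "MI" } any</test>
         <preferred-term>myocardial infarction</preferred-term>
       </rule>
  :)
  declare function local:classify($node as node()) as element(rule)* {
    for $rule in collection("rules")/rule
    (: bind $node as the context item of the stored test expression :)
    where xquery:eval($rule/test/string(), map { '': $node })
    return $rule
  };

  local:classify(doc("new-object.xml")/*)

That obviously re-parses and re-evaluates every rule for every object, which 
is why I’m asking whether there is something index-backed instead.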

Thanks,

Eliot
--
Eliot Kimber
http://contrext.com