I’m working on a project that currently depends on MarkLogic’s reverse query mechanism. With this feature you store documents that contain MarkLogic-specific queries (queries in their cts namespace). These get indexed in some way (I have no idea how the index works). You can then search for these stored queries in the context of specific nodes, and ML returns the query documents that would match the specified nodes. Their driving use case is alerting-type applications: when new docs get added, you see which stored queries match them and then use those queries to do something.
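To make the idea concrete, here is a minimal brute-force sketch of the reverse query pattern in Python. It is not MarkLogic’s API and says nothing about how their index works: the `StoredQuery` class, the `reverse_query` function, and the sample terms are all made up for illustration, and the regex match is only a crude stand-in for real full-text matching.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredQuery:
    """Hypothetical stored query carrying its classification details."""
    preferred_term: str
    taxonomy: str
    variants: tuple  # variant forms that should all map to preferred_term

    def matches(self, text: str) -> bool:
        # Crude stand-in for full-text search: case-insensitive
        # whole-word (or whole-phrase) match on any variant form.
        return any(
            re.search(r"\b" + re.escape(v) + r"\b", text, re.IGNORECASE)
            for v in self.variants
        )


def reverse_query(text, queries):
    """Return every stored query that would match the given node text."""
    # Brute force: test each stored query in turn, with no index.
    return [q for q in queries if q.matches(text)]


# Made-up sample data, for illustration only.
queries = [
    StoredQuery("myocardial infarction", "medical",
                ("myocardial infarction", "heart attack")),
    StoredQuery("XQuery", "technology", ("xquery", "xml query")),
]

hits = reverse_query("Patient was admitted after a heart attack.", queries)
print([q.preferred_term for q in hits])
```

Each call is independent of every other, which is why the work can be spread across as many processors as the budget allows.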
My use case is classification: given a node to be classified, find all queries that match it and, from each query, get the classification details (preferred term, variant forms, associated taxonomy, etc.). This process definitely depends on MarkLogic’s full-text search features, for example so that any form of a non-preferred term in the content is matched by the corresponding full-text query.

I have a large corpus to classify (approximately 45 million objects at current count). The processing is inherently parallelizable, so I’m looking at non-ML options that would allow us to scale to the limits of our hardware budget. Even if each processing node were less efficient than ML, we would be able to achieve massive throughput for this classification operation. In theory we could scale to one processor per object if the budget were unlimited (it is not), so even a brute-force solution would perform well at larger scales.

So I guess I have two questions really:

1. Can anyone share ways they use BaseX for doing classification in general? So far I’ve been focused on analyzing the current system to find performance bottlenecks, so I haven’t yet had a chance to think through the classification process in general, but there must be well-understood strategies. I suspect that one could build the equivalent of ML’s reverse query index in BaseX.

2. Is there a direct equivalent to ML’s reverse query facility in BaseX, or an obvious route to building one?

Thanks,

Eliot
--
Eliot Kimber
http://contrext.com