Hi, Javier:

If it's a smallish set of documents, you can write a loop that reads each document and applies the regex to all of the text in the document. If it's a substantial corpus, though, you should look at enriching the documents to support searching for VINs.
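For the small-corpus case, the loop might look something like the sketch below. It makes a few assumptions: the database holds only XML, JSON, or text documents; walking every document with fn:doc() is acceptable (it is a full scan, with no index involved); and the pattern is a tidied version of the one in your message, with the commas dropped from the character classes because a comma inside [...] matches a literal comma. The variable name is only for illustration.

```xquery
xquery version "1.0-ml";

(: Assumption: same character ranges as the pattern in the question,
   minus the commas, which would otherwise match literal commas. :)
declare variable $VIN-PATTERN as xs:string :=
  "[a-hj-np-zA-HJ-NP-Z0-9]{9}[a-hj-npr-tv-zA-HJ-NPR-TV-Z0-9][a-hj-np-zA-HJ-NP-Z0-9]\d{6}";

(: Brute force: read every document in the database and test its
   full text against the pattern. Reasonable only for small sets. :)
for $doc in fn:doc()
where fn:matches(fn:string($doc), $VIN-PATTERN)
return xdmp:node-uri($doc)
```

This returns the URIs of the matching documents. Because nothing here is resolved from an index, the cost grows with the total amount of text in the database.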
Searching over a set of values with good performance at scale requires an index over the values. To recognize the values within JSON or XML documents, the indexer looks for a specified JSON property or XML element or attribute. That requires modifying the documents on or after ingestion to identify the VINs. (It's easiest if you can specify a unique JSON property or XML element or attribute, but if that's not possible, fields can support unions and path range indexes can support containment.)

Several natural language processors try to solve this kind of enrichment problem. Maybe someone on the list can recommend specific NLP tools for VIN recognition based on their experience.

Hoping that helps,

Erik Hennum

________________________________
From: [email protected] [[email protected]] on behalf of Javier Lizarraga [[email protected]]
Sent: Tuesday, July 14, 2015 5:21 PM
To: [email protected]
Subject: [MarkLogic Dev General] search MarkLogic Database using Regular Expressions

Is there a way to issue a search using a regular expression in MarkLogic?

For example, the following regular expression identifies a VIN:

    (([a-h,A-H,j-n,J-N,p-z,P-Z,0-9]{9})([a-h,A-H,j-n,J-N,p,P,r-t,R-T,v-z,V-Z,0-9])([a-h,A-H,j-n,J-N,p-z,P-Z,0-9])(\d{6}))

I would like to issue a query that searches the entire database and returns the documents that contain valid VINs, similar to fn:matches, which takes a string and a pattern and returns a Boolean value:

    fn:matches("this is my string 2T3JK4DV1AW023473", "(([a-h,A-H,j-n,J-N,p-z,P-Z,0-9]{9})([a-h,A-H,j-n,J-N,p,P,r-t,R-T,v-z,V-Z,0-9])([a-h,A-H,j-n,J-N,p-z,P-Z,0-9])(\d{6}))")

I'd like to do something like this:

    cts:search("(([a-h,A-H,j-n,J-N,p-z,P-Z,0-9]{9})([a-h,A-H,j-n,J-N,p,P,r-t,R-T,v-z,V-Z,0-9])([a-h,A-H,j-n,J-N,p-z,P-Z,0-9])(\d{6}))")

Any help would be greatly appreciated!!

Javier
