When you insert the files, tell MarkLogic they are binary; binary documents aren't indexed.
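For readers on later MarkLogic releases, the same idea can be expressed through the REST Client API, which did not exist in 2010. This is a minimal sketch that assumes a REST API instance at a hypothetical `localhost:8011` and uses the `format=binary` parameter of `PUT /v1/documents` to override whatever format the URI extension would imply. It only builds the request. Sending it requires a running server and credentials, which are not shown here. The helper name `build_binary_insert` is hypothetical.

```python
import urllib.parse
import urllib.request

# Hypothetical REST API instance; adjust host and port for your own setup.
BASE_URL = "http://localhost:8011/v1/documents"


def build_binary_insert(uri: str, content: bytes,
                        base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a PUT request that stores `content` at `uri`
    as a binary document, so MarkLogic will not parse or index it as XML."""
    # format=binary tells the server to treat the body as binary,
    # regardless of the extension in the document URI.
    query = urllib.parse.urlencode({"uri": uri, "format": "binary"})
    return urllib.request.Request(
        url=f"{base_url}?{query}",
        data=content,
        method="PUT",
        headers={"Content-Type": "application/octet-stream"},
    )


if __name__ == "__main__":
    html = b"<html><body><p>Not for search</p></body></html>"
    req = build_binary_insert("/docs/page.html", html)
    print(req.get_method(), req.full_url)
    # Actually sending it needs a running server plus an auth handler
    # configured with your credentials, e.g. via urllib.request.build_opener.
```

Building the request separately from sending it keeps the "store as binary" decision in one place, regardless of which HTTP client or authentication scheme you end up using.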
Many people have run into this when they insert docs with no file extension. When no file extension is present, MarkLogic assumes the files are binary. You can then do a doc() of one of those files and it really looks like XML in your browser, but as soon as you try some XPath, nothing works. This is the one time the source of the problem isn't namespaces. :)

Oh, and when you do this, the insert rate can be a *lot* faster (at least potentially; you could always have a bottleneck somewhere else).

Kelly

Date: Fri, 5 Mar 2010 15:09:48 -0800
From: "Lee, David" <[email protected]>
Subject: [MarkLogic Dev General] How to keep ML from indexing non-XML files
To: "General Mark Logic Developer Discussion" <[email protected]>

I know ML doesn't index "binary" files. But how can I keep ML from indexing things that 'might look like' XML files? Say, XHTML or HTML files...

I need to store HTML files along with XML files, and I don't want ML to index the HTML files or return them in searches.

What's the trick to tell ML to ignore file contents for purposes of indexing and search?

Thanks for any suggestions!

----------------------------------------
David A. Lee
Senior Principal Software Engineer
Epocrates, Inc.
[email protected]
812-482-5224
