When you insert the files, tell MarkLogic that they are binary so that they 
are not indexed.
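
As a rough sketch of what that can look like in XQuery, assuming the files are
loaded from the server's filesystem with xdmp:document-load (the path and the
URI below are made up for illustration), the format option asks for the
document to be stored as binary no matter what its contents look like:

```xquery
xquery version "1.0-ml";

(: Hypothetical filesystem path and document URI, for illustration only. :)
(: format=binary stores the file as a binary node, so its markup is not :)
(: parsed as XML. :)
xdmp:document-load(
  "/space/content/page.html",
  <options xmlns="xdmp:document-load">
    <uri>/content/page.html</uri>
    <format>binary</format>
  </options>)
```

Other loading tools have their own way of setting the document format, so
treat this as one option rather than the only way to do it.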

Many people have hit this behavior by accident when inserting docs with no 
file extension. When no file extension is present, MarkLogic assumes the files 
are binary. In that case you can call doc() on a file and it really looks like 
XML in your browser, but as soon as you try some XPath, nothing works. This is 
the one time the source of the problem isn't namespaces. :)
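
If you want to check how a document actually got stored, something along these
lines should tell you. It is only a sketch, and the URI is a made-up example:

```xquery
xquery version "1.0-ml";

(: Hypothetical URI, for illustration only. :)
let $uri := "/content/page"
(: Report the kind of each node directly under the document node. :)
for $n in fn:doc($uri)/node()
return xdmp:node-kind($n)
```

A document stored as binary should report "binary" here, while one that was
parsed as XML should report "element" for its root element.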

Oh, and when you do this, the insert rate will be a *lot* faster (at least 
potentially; you could always have a bottleneck somewhere else).

Kelly

Message: 5
Date: Fri, 5 Mar 2010 15:09:48 -0800
From: "Lee, David" <[email protected]>
Subject: [MarkLogic Dev General] How to keep ML from indexing non-XML
        files
To: "General Mark Logic Developer Discussion"
        <[email protected]>
Message-ID: <dd37f70d78609d4e9587d473fc61e0a716d93...@postoffice>
Content-Type: text/plain; charset="us-ascii"

I know ML doesn't index "binary" files. But how can I keep ML from
indexing things that 'might look like' XML files?

Say xhtml or html files ...

I need to store HTML files along with XML files, and I don't want ML to index 
the HTML files or return them in searches.

What's the trick to tell ML to ignore file contents for purposes of indexing 
and search?

 

Thanks for any suggestions!

----------------------------------------

David A. Lee

Senior Principal Software Engineer

Epocrates, Inc.

[email protected] <mailto:[email protected]> 

812-482-5224
_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general
