Re: [MarkLogic Dev General] can a milestone tag be a fragment root?

Michael Blakeley Sun, 18 Jan 2015 19:15:06 -0800

No, not in any useful way. But don't give up hope.

Technically you could configure a fragment root on an empty element, but that 
would only hurt performance. Every empty element would carry subfragment 
overhead: a document with N page breaks would get N extra, empty child 
fragments. And there wouldn't be any benefit, because all the useful content 
would still be in the parent document. A fragment root becomes a fragment 
together with the subtree it encloses; it does not act as a delimiter that 
starts a new fragment. Subfragments represent hierarchical structure, not size.
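To make that concrete, here is a toy Python model of the semantics described 
above. It is an illustration only, not MarkLogic's fragmenter: it just measures 
how much text would sit in each fragment if every element with a given name 
became a fragment holding its own subtree. The sample markup is made up in the 
style of your <pb facs="14"/> milestones.

```python
import xml.etree.ElementTree as ET


def fragment_text_sizes(xml_text, root_name):
    """Toy model, NOT MarkLogic's fragmenter.

    Each element named root_name is treated as a fragment holding exactly its
    own subtree (its tail text stays with the parent). Returns a tuple of
    (characters of text left in the parent, [characters per fragment]).
    Assumes fragment roots are not nested, so nothing is counted twice.
    """
    doc = ET.fromstring(xml_text)
    total = len("".join(doc.itertext()))
    sizes = [len("".join(e.itertext())) for e in doc.iter(root_name)]
    return total - sum(sizes), sizes


sample = (
    "<text><div>"
    "<pb facs='1'/><p>page one text</p>"
    "<pb facs='2'/><p>page two text</p>"
    "</div></text>"
)
print(fragment_text_sizes(sample, "pb"))   # (26, [0, 0]): pages stay in the parent
print(fragment_text_sizes(sample, "div"))  # (0, [26]): the div carries its content
```

With pb as the root, every fragment is empty and all the text stays in the 
parent; with div as the root, the content moves with the enclosing element.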

Adding fragment rules makes sense if and only if you have large documents with 
a number of elements that form conceptually equivalent sub-documents. This 
works when the document acts something like a table, and for whatever reason 
you don't want to split it on ingestion. So you create virtual sub-documents: 
not as good as true documents, but good enough — and ideal for certain 
situations. From what I understand you aren't in any of those situations. Each 
of your documents is large, but there's no conceptually useful sub-document 
structure. 
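For contrast, "splitting on ingestion" a table-like document just means turning 
each row-like element into its own standalone document. A minimal Python sketch, 
assuming a flat <table><row>...</row></table> shape; the record name, the URI 
prefix and the numbering scheme are all hypothetical.

```python
import xml.etree.ElementTree as ET


def split_on_ingest(xml_text, record_name, uri_prefix):
    """Return {uri: serialized record} with one entry per record_name element.

    Hypothetical URI scheme: uri_prefix + running number + ".xml".
    Assumes records are not nested inside each other.
    """
    doc = ET.fromstring(xml_text)
    out = {}
    for i, rec in enumerate(doc.iter(record_name), start=1):
        # tostring() would include the text that follows the element (its
        # tail); drop it, since the source tree is discarded afterwards.
        rec.tail = None
        out[f"{uri_prefix}{i}.xml"] = ET.tostring(rec, encoding="unicode")
    return out


print(split_on_ingest("<table><row><a>1</a></row>\n<row><a>2</a></row></table>",
                      "row", "/rows/"))
```

This only works because every row is a self-contained, equivalent unit; a book 
cut at page breaks has no such units.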

All is not lost: MarkLogic should still be able to do the job. I've worked with 
a database over 7 TB in size with a significant number of large documents, some 
well above 50 MB.

In a situation like that you have to be careful with your queries. Unfiltered 
search and lexicon accessors don't much care how large your documents are: use 
them wherever possible. Avoid returning large result sets: if that means you 
have to cap the page size for search results, do it. You might be able to 
arrange things so that you can display search results and other query reports 
entirely from some mix of range indexes and properties, without touching the 
documents themselves.
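If your application goes through the REST API, capping the page size can be as 
simple as clamping pageLength before you build the request. A minimal Python 
sketch, assuming the GET /v1/search endpoint and its q, start, pageLength and 
format parameters; the host/port and the MAX_PAGE_LENGTH value are hypothetical. 
Whether the search runs filtered or unfiltered is configured separately (for 
example in your query options) and is not shown here.

```python
from urllib.parse import urlencode

# Hypothetical cap: choose a value that keeps each response small.
MAX_PAGE_LENGTH = 50


def search_url(host, query, page, requested_length=10):
    """Build a GET /v1/search URL for a 1-based page number, never asking
    for more than MAX_PAGE_LENGTH results at once."""
    length = max(1, min(requested_length, MAX_PAGE_LENGTH))
    start = (page - 1) * length + 1  # the start parameter is 1-based
    params = {"q": query, "start": start, "pageLength": length, "format": "json"}
    return f"{host}/v1/search?{urlencode(params)}"


print(search_url("http://localhost:8000", "sonnet", page=3, requested_length=500))
```

A caller asking for 500 results per page silently gets 50, which keeps each 
round trip bounded no matter how large the matching documents are.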

Maybe you could write up one of your "can't really do anything" use cases and 
ask us how to solve it? You might get some useful ideas, and you could repeat 
that with other use cases until you feel comfortable with the techniques.

-- Mike

> On 18 Jan 2015, at 18:38 , Craig A. Berry <[email protected]> wrote:
> 
> I have about 500 XML documents averaging about 3MB or 4MB in size, though 
> some are as small as 100K and one is as large as 14MB.[1]  I can't really do 
> anything at all with them in MarkLogic without specifying fragments, so the 
> question is what to use as fragment root or fragment parent.  I have multiple 
> div elements in every document and can at least successfully load the 
> documents and do some simple queries when I specify div as fragment root, but 
> the div elements vary widely in size and are probably not the best choice for 
> consistent fragment size.
> 
> There are page break tags at regular intervals, and a quick check shows that 
> a page has about 40K of data on it, so that sounds promising as a basis for 
> fragments given the recommendation in the documentation that fragments should 
> be between 10K and 100K in size.[2]  However, the page break is a milestone 
> tag with no content (something like <pb facs="14"/>).  If I specify such an 
> element as the fragment root, will it work as the fragment delimiter?  In 
> other words, does the fragmenter create a fragment for everything enclosed by 
> the fragment root, or does it simply start a new fragment whenever it 
> encounters the element specified as the fragment root?
> 
> [1] And this is a small pilot investigation for a much larger collection of 
> similar documents.
> [2] https://docs.marklogic.com/guide/admin/fragments#id_46736
> 
> ________________________________________
> Craig A. Berry
> mailto:[email protected]
> 
> "... getting out of a sonnet is much more
> difficult than getting in."
>                 Brad Leithauser
> 
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general
> 
