I have about 500 XML documents, most of them 3MB to 4MB in size, though some 
are as small as 100KB and one is as large as 14MB.[1]  I can't do anything 
useful with them in MarkLogic without specifying fragments, so the question is 
what to use as fragment root or fragment parent.  Every document contains 
multiple div elements, and when I specify div as the fragment root I can at 
least load the documents and run some simple queries.  But the div elements 
vary widely in size, so they are probably not the best choice for consistent 
fragment size.

There are page break tags at regular intervals, and a quick check shows that a 
page holds about 40KB of data, which sounds promising as a basis for fragments 
given the documentation's recommendation that fragments be between 10KB and 
100KB in size.[2]  However, the page break is an empty milestone tag 
(something like <pb facs="14"/>).  If I specify such an element 
as the fragment root, will it work as the fragment delimiter?  In other words, 
does the fragmenter create a fragment for everything enclosed by the fragment 
root, or does it simply start a new fragment whenever it encounters the element 
specified as the fragment root?
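
For reference, the quick check of page size mentioned above can be 
approximated with a short script along these lines.  This is only a sketch: 
it assumes the page breaks are always serialized as self-closing <pb .../> 
tags, it counts raw bytes of markup between consecutive tags rather than 
parsing the XML, and the function names page_sizes and summarize are made up 
for illustration.

```python
import re
import statistics

# Matches an empty milestone tag such as <pb facs="14"/>. The element name
# "pb" comes from the example above; the self-closing form is an assumption.
PB_TAG = re.compile(rb"<pb\b[^>]*/>")


def page_sizes(xml_bytes):
    """Return the byte length of each run of markup that follows a <pb/> tag."""
    chunks = PB_TAG.split(xml_bytes)
    # chunks[0] is whatever precedes the first page break (header, front
    # matter), so it is not counted as a page.
    return [len(chunk) for chunk in chunks[1:]]


def summarize(sizes):
    """Return (count, min, median, max) for a list of sizes, or None if empty."""
    if not sizes:
        return None
    return len(sizes), min(sizes), statistics.median(sizes), max(sizes)
```

Reading each file in binary mode and passing its contents to page_sizes gives 
a per-document list that summarize can reduce to a count, minimum, median, 
and maximum.  The last "page" also includes any closing tags that follow it, 
so its size is slightly overstated.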

[1] And this is a small pilot investigation for a much larger collection of 
similar documents.
[2] https://docs.marklogic.com/guide/admin/fragments#id_46736

________________________________________
Craig A. Berry
mailto:[email protected]

"... getting out of a sonnet is much more
 difficult than getting in."
                 Brad Leithauser
