I have about 500 XML documents averaging about 3MB or 4MB in size, though some are as small as 100K and one is as large as 14MB.[1] I can't really do anything at all with them in MarkLogic without specifying fragments, so the question is what to use as fragment root or fragment parent. I have multiple div elements in every document and can at least successfully load the documents and do some simple queries when I specify div as fragment root, but the div elements vary widely in size and are probably not the best choice for consistent fragment size.
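One rough way to put numbers on how widely the div elements vary is to measure the serialized size of each one. The following is only a minimal sketch under stated assumptions: the documents are well-formed XML that fits in memory, elements are matched by local name so any namespace is ignored, and the 10,000 and 100,000 byte bounds are stand-ins for "10K" and "100K". The function names `element_sizes` and `summarize` are made up for this illustration and have nothing to do with MarkLogic itself. The sizes are approximate, since re-serializing can add namespace declarations, and nested div elements are counted both on their own and inside their parent.

```python
import copy
import statistics
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1] if isinstance(tag, str) else ""


def element_sizes(xml_text, name="div"):
    """Return the approximate serialized size in bytes of each matching element.

    Matching is by local name only (an assumption made for this sketch).
    """
    root = ET.fromstring(xml_text)
    sizes = []
    for elem in root.iter():
        if local_name(elem.tag) != name:
            continue
        # Work on a shallow copy so that the text following the element
        # (its "tail") is not counted as part of the element.
        clone = copy.copy(elem)
        clone.tail = None
        serialized = ET.tostring(clone, encoding="unicode")
        sizes.append(len(serialized.encode("utf-8")))
    return sizes


def summarize(sizes, low=10_000, high=100_000):
    """Summarize candidate fragment sizes against an assumed target range."""
    if not sizes:
        return {"count": 0, "min": 0, "median": 0, "max": 0, "in_range": 0}
    return {
        "count": len(sizes),
        "min": min(sizes),
        "median": statistics.median(sizes),
        "max": max(sizes),
        "in_range": sum(1 for s in sizes if low <= s <= high),
    }
```

Calling `summarize(element_sizes(text))` on the contents of each file would show how many div elements fall inside the assumed range and how far the extremes stray from it.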
There are page break tags at regular intervals, and a quick check shows that a page has about 40K of data on it, so that sounds promising as a basis for fragments given the recommendation in the documentation that fragments should be between 10K and 100K in size.[2] However, the page break is a milestone tag with no content (something like <pb facs="14"/>). If I specify such an element as the fragment root, will it work as the fragment delimiter? In other words, does the fragmenter create a fragment for everything enclosed by the fragment root, or does it simply start a new fragment whenever it encounters the element specified as the fragment root?

[1] And this is a small pilot investigation for a much larger collection of similar documents.
[2] https://docs.marklogic.com/guide/admin/fragments#id_46736

________________________________________
Craig A. Berry

"... getting out of a sonnet is much more difficult than getting in."
Brad Leithauser

_______________________________________________
General mailing list
http://developer.marklogic.com/mailman/listinfo/general
