Hello,
I have only one document, and it is currently 433KB. That is a minimum, since
the document will only grow in the future, possibly to 100MB or larger. The
fragment rooted at the <text> level of my document is currently about 120KB,
which is also a minimum, as this fragment may grow to several MB. Currently
there is only one <text> node, but more of comparable size could be added in
the future. For the nested fragments rooted at the <div> level of my document,
the minimum is 1.52KB, the maximum is 108KB, and the average size is 16.56KB.
As for the size distribution of the <div> nodes, I currently have eight <div>
nodes on which fragments are rooted, all nested inside the one <text> node.
Their sizes are {3.45, 1.52, 7.62, 1.96, 108, 3.87, 2.02, 4.01}, all in KB. In
the future there could be many hundreds of <div> nodes nested in any given
<text> node. Most of these would likely be in the 2KB to 5KB range, with the
odd outlier being considerably larger.
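For anyone who wants to check the figures, here is a small Python sketch that
recomputes the summary from the eight sizes listed above. The variable names
are only illustrative; the numbers are the ones quoted in this message.

```python
from statistics import mean, median

# <div> fragment sizes in KB, copied from the list in this message.
div_sizes_kb = [3.45, 1.52, 7.62, 1.96, 108, 3.87, 2.02, 4.01]

smallest = min(div_sizes_kb)
largest = max(div_sizes_kb)
average = round(mean(div_sizes_kb), 2)
# The median is not quoted in the message; it is added here only to show
# how much the single 108KB outlier pulls the average upward.
typical = round(median(div_sizes_kb), 2)

print(f"min={smallest}KB max={largest}KB avg={average}KB median={typical}KB")
```

The average comes out at 16.56KB, matching the figure above, while the median
is about 3.66KB, which is closer to the 2KB to 5KB range I expect for most
<div> nodes.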
As to why I think I need fragmentation, I got the idea from advice given by
people on this list. I had a problem where what I would consider a hit was at
the level of the <div> nodes in my example. With no fragmentation I could get
many hits, but the estimate of the number of hits (which is based on
fragments) would always be 1, because my one document was one fragment. So I
would always get odd results, such as a search returning “1 to 8 hits of a
total of 1 hit”...which of course makes no sense and would confuse users. The
suggested solution was to root fragments at the level at which I was defining
a hit. This works perfectly, except that, as I outline below, I have two
levels, one nested inside the other, at which my search defines a hit. I
consider the <text> node level to be a hit in certain situations and the <div>
node level to be a hit in others. I should probably mention that the two
situations are disjoint: there is no overlap between the two kinds of search.
I hope that helped,
Adam
From: [email protected]
[mailto:[email protected]] On Behalf Of Nuno Job
Sent: April 8, 2010 7:16 PM
To: General Mark Logic Developer Discussion
Subject: Re: [MarkLogic Dev General] Search Question re:Fragments
Hi Adam,
Can you please complement that information by saying how big those documents
are? (max, min, avg)
Also, what's the distribution of the sizes for the elements you displayed here?
Finally, why do you think you need fragmentation?
That will help me (and others) give you a decent enough answer, even though
many other things might need to be taken into consideration.
Nuno
On Apr 8, 2010 4:32 PM, "Adam Patterson"
<[email protected]<mailto:[email protected]>> wrote:
Hi,
I have a document which looks something like this (oversimplified for demo
purposes):
<teiCorpus>
  <teiHeader>
    ...
  </teiHeader>
  <TEI>
    <teiHeader>
      ...
    </teiHeader>
    <text>
      <body>
        <div/>
        <div/>
        ...
      </body>
    </text>
  </TEI>
  <TEI>
    <teiHeader>
      ...
    </teiHeader>
    <text>
      <body>
        <div/>
        <div/>
        ...
      </body>
    </text>
  </TEI>
  ...
</teiCorpus>
I have rooted fragments at the <text> level, and I have rooted fragments at the
<div> level (actually I made the <body> node a fragment parent, but I think it
amounts to the same thing). So the fragments rooted at the <div> level are
nested inside the fragment rooted at the <text> level.
Now, I am trying to build a search which has two scenarios: (1) it searches at
the <div> level and considers a fragment rooted at a <div> to be a hit if at
least one match occurs within the <div> node or one of its descendants; (2) it
searches at the <text> level and considers a fragment rooted at a <text> node
to be a hit if at least one match occurs within the <text> node or one of its
descendants. Scenario (1) is working well, but in scenario (2) my search is
still treating fragments rooted at the <div> level as hits. Is there any way to
tell the search which level of fragment to use for evaluation?
In scenario (2) I don’t want the <div> level fragments to be considered hits. I
want the higher-level fragment, the one rooted at the <text> level, to be the
hit.
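To make the two hit definitions concrete, here is a rough Python sketch of the
behaviour I am after. It uses only the standard library, not MarkLogic, and it
does not show how to select a fragment level in the server. The toy document,
the search term and the helper names (contains, hits) are invented for the
illustration.

```python
import xml.etree.ElementTree as ET

# Toy corpus with the same shape as the sample above; contents are invented.
DOC = """
<teiCorpus>
  <TEI><text><body>
    <div>whale ship</div>
    <div>harbour</div>
    <div>whale oil</div>
  </body></text></TEI>
  <TEI><text><body>
    <div>garden</div>
    <div>whale song</div>
  </body></text></TEI>
</teiCorpus>
""".strip()


def contains(node, term):
    """True if the term occurs in the node's text or any descendant's text."""
    return term in "".join(node.itertext())


def hits(root, level, term):
    """Return the elements named `level` that contain at least one match."""
    return [node for node in root.iter(level) if contains(node, term)]


root = ET.fromstring(DOC)
div_hits = hits(root, "div", "whale")    # scenario (1): a hit is a <div>
text_hits = hits(root, "text", "whale")  # scenario (2): a hit is a <text>
print(len(div_hits), len(text_hits))
```

With this toy data, scenario (1) reports three hits (one per matching <div>)
and scenario (2) reports two (one per <text> that contains any match). In
scenario (2) the matching <div> nodes should not be counted separately.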
Feedback is appreciated, and thanks,
Adam Patterson
_______________________________________________
General mailing list
[email protected]<mailto:[email protected]>
http://xqzone.com/mailman/listinfo/general