Thank you for the suggestions!
We have a meta and a body element in all documents. The new field must
index /doc/body[@identifier_id="some_id"], so the element is found from
all documents, but the include in the field limits with the attribute
value. It seems that it reindexes all documents in this case, even if
only a fraction of documents have "some_id" (or none: if I add the index
beforehand, it still reindexes all content, even though not a single doc
has @identifier_id="non-existent-id").
I will fiddle with the path configurations in our dev environment, and
see if using that helps.
Ville
------ Original Message ------
From: "Danny Sokolsky" <[email protected]>
To: "MarkLogic Developer Discussion" <[email protected]>
Sent: 14.10.2014 21:48:03
Subject: Re: [MarkLogic Dev General] Adding new fields
One other thing here: if you have the reindexer off, you can ask the
system how many fragments need reindexing using the “preview-reindexer”
option in xdmp:forest-counts. For example:
xdmp:forest-counts(xdmp:database-forests(xdmp:database("Documents")),
(), "preview-reindexer")
This will query the database and calculate the number of fragments
needing reindexing, returning a report.
-Danny
From: Danny Sokolsky
Sent: Tuesday, October 14, 2014 11:14 AM
To: '[email protected]'; 'MarkLogic Developer Discussion'
Subject: RE: [MarkLogic Dev General] Adding new fields
You can turn reindexing off during peak times to minimize the impact.
It will pick up where you left off when you turn it back on. Another
thing you can do is to leave reindexing off, but just rewrite the
documents that you want to (for example, do a document-insert of a
document with its previous content as what to insert)—that will have
the effect of reindexing just those documents.
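
A minimal sketch of the rewrite-in-place idea described above, not a
tested procedure. The URIs are hypothetical placeholders; the existing
permissions, collections, and quality are passed back in on the
assumption that they should be preserved rather than reset to defaults.

```xquery
xquery version "1.0-ml";

(: Hypothetical URIs: replace with the documents that should pick up
   the new field. Keep each batch small, since the whole statement
   runs as a single update transaction. :)
let $uris := ("/example/doc-1.xml", "/example/doc-2.xml")
for $uri in $uris
let $doc := fn:doc($uri)
where fn:exists($doc)
return
  (: Re-insert the document over itself with its current content, so
     it is indexed again under the current index settings. :)
  xdmp:document-insert(
    $uri,
    $doc,
    xdmp:document-get-permissions($uri),
    xdmp:document-get-collections($uri),
    xdmp:document-get-quality($uri))
```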
If you use the path to specify your field, you can use any path that
returns true from cts:valid-index-path:
http://docs.marklogic.com/cts:valid-index-path
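
For instance, a candidate path can be checked before it is used in a
field definition. This is only a sketch: the path is the one mentioned
in this thread, and passing true for the second argument assumes the
path uses no namespace prefixes that need to be resolved.

```xquery
xquery version "1.0-ml";

(: Returns a boolean saying whether the path is usable as an index
   path. The second argument tells the function to ignore namespace
   prefix binding errors. :)
cts:valid-index-path('/doc/body[@identifier_id = "some_id"]', fn:true())
```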
I am not really understanding how your field can affect every document
but you only want it in some of the documents. Maybe the field is not
selective enough (the path field might help there)?
Also, there are several bug fixes in 7.0-4 wrt fields, so planning and
testing an upgrade might be a good idea.
-Danny
From: [email protected]
[mailto:[email protected]] On Behalf Of
[email protected]
Sent: Monday, October 13, 2014 11:43 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Adding new fields
Hi,
The indexes do not include the root element. Unfortunately we need to
index an element that exists in all documents - the only thing that
differs is the attribute value. (The field index settings are tweaked
for a specific purpose in each case, because different document sets
have different full-text search requirements.)
As this is built on top of another product, we need to have the element
named like it is in there, and the element is found in all documents.
When I look at the database status right after adding one such field, I
can see that all the forests are reindexing, with millions of docs in
total still to go. With the new tiered hardware this completes on the
order of hours, though it sometimes takes over a day, and with the old
hardware it took on the order of weeks to add one field. Our monitoring
also shows that it really spikes the disk bandwidth usage, so it is
definitely doing a lot of work.
(My guess is that it selects all the docs with the element, but is not
intelligent enough to limit using the attribute value too.)
Indexes that include only elements found in a fraction of the
documents are not a problem. Is there some indexing option that I can
turn on so that ML can index only the docs that have a specific
attribute value in the given element? Now it seems only capable of
querying the docs that have the element.
This may also be a design issue, but unfortunately I'm unable to do any
big changes to the way we do things in the codebase I've inherited.
We're running 7.0-2.3 btw, if that matters.
Ville
------ Original Message ------
From: "Danny Sokolsky" <[email protected]>
To: "MarkLogic Developer Discussion" <[email protected]>
Sent: 14.10.2014 0:41:51
Subject: Re: [MarkLogic Dev General] Adding new fields
Hi Ville,
I don’t know of a way to tell MarkLogic to trust you in this case, and
you should not need it to. If you do not have any content to reindex,
and if reindexing is enabled, it should not rewrite all of the
content. It will query all of the content to see if it needs
reindexing, which will not be free but should not be too expensive,
but I would not expect a full reindex to happen. In that case you
should see some messages in the log about reindexing that database and
a little later another message saying it reindexed 0 fragments (in
fact, you will see these messages each time the config files change).
You mention your fields are doing includes. I would recommend using
paths for your fields instead. Also, make sure your fields are not
including the root, as that is almost never the correct way to do it.
Are you using 7.0-4 for this? If not, try upgrading.
-Danny
From: [email protected]
[mailto:[email protected]] On Behalf Of
[email protected]
Sent: Monday, October 13, 2014 12:58 AM
To: MarkLogic Developer Discussion
Subject: [MarkLogic Dev General] Adding new fields
Hi,
when developing applications with ML as the database, we need to add
new indexes regularly to deliver new features. Most of the time
(probably 95%) we add new indexes that will not hit any content
currently in the database, but that we know eventually will when new
content is added.
As we have terabytes / millions of docs of content, these reindex
operations can be costly and take considerable time to run.
So finally to the question: given that we're adding a new field that
has one include, it seems that ML goes through all documents in the
database (the include limits by element and attribute value) - is there
a way to tell ML that hey, we know, and we take the responsibility, that
the database currently does not have any content that needs to be
reindexed, so even though the database-wide "reindexer enable" is on,
please do not do any reindexing for this field?
Would it work to toggle reindexer enable off while adding the fields,
and then toggle it back on? What about new documents added while the
reindexer is off? (We don't have the luxury to stop writes at any
given time.)
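
For reference, the toggle asked about here could be scripted with the
Admin API roughly as follows. This is a sketch only: the database name
"Documents" is an assumption, and the script needs to run as a user
with admin privileges. Passing fn:true() instead turns the reindexer
back on.

```xquery
xquery version "1.0-ml";

import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: "Documents" is an assumed database name; substitute your own. :)
let $config := admin:get-configuration()
let $db := xdmp:database("Documents")
(: fn:false() disables the reindexer for this database. :)
let $config := admin:database-set-reindexer-enable($config, $db, fn:false())
return admin:save-configuration($config)
```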
Ville
_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general