Re: [MarkLogic Dev General] Adding new fields

Danny Sokolsky Tue, 14 Oct 2014 11:48:55 -0700

One other thing here:  if you have the reindexer off, you can ask the system 
how many fragments need reindexing using the “preview-reindexer” option in 
xdmp:forest-counts.  For example:


xdmp:forest-counts(xdmp:database-forests(xdmp:database("Documents")), (), 
"preview-reindexer")

This will query the database and calculate the number of fragments needing 
reindexing, returning a report.

-Danny

From: Danny Sokolsky
Sent: Tuesday, October 14, 2014 11:14 AM
To: '[email protected]'; 'MarkLogic Developer Discussion'
Subject: RE: [MarkLogic Dev General] Adding new fields

You can turn reindexing off during peak times to minimize the impact.  It will 
pick up where you left off when you turn it back on.  Another thing you can do 
is to leave reindexing off, but just rewrite the documents that you want to 
reindex (for example, do a document-insert of each document with its existing 
content); that will have the effect of reindexing just those documents.
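
A minimal sketch of that rewrite-in-place approach, assuming the reindexer is 
off and you already know which URIs you want reindexed. The URIs below are 
hypothetical placeholders. The existing permissions, collections, and quality 
are passed back explicitly, on the assumption that you do not want them 
replaced by the current user's defaults:

```xquery
xquery version "1.0-ml";

(: Hypothetical URIs; replace with the documents you want reindexed. :)
let $uris := ("/example/doc-1.xml", "/example/doc-2.xml")
for $uri in $uris
let $doc := fn:doc($uri)
(: Skip URIs that do not exist so nothing new is created by accident. :)
where fn:exists($doc)
return
  (: Re-insert the document with its current content and metadata. :)
  xdmp:document-insert(
    $uri,
    $doc,
    xdmp:document-get-permissions($uri),
    xdmp:document-get-collections($uri),
    xdmp:document-get-quality($uri))
```

For a large set of documents you would probably want to split the work into 
batches rather than run it as one transaction.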

If you use the path to specify your field, you can use any path that returns 
true from cts:valid-index-path:

http://docs.marklogic.com/cts:valid-index-path
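
As a rough illustration, you can check candidate paths before configuring the 
field. The paths below are made-up examples, not taken from your schema, and 
whether a given predicate form is accepted may depend on your MarkLogic 
version, so treat the result of the check as the authority:

```xquery
xquery version "1.0-ml";

(: Hypothetical candidate paths; substitute your own element and attribute names. :)
for $path in ("/doc/meta", "/doc/meta[@type = 'news']", "//meta/title")
(: The second argument tells the check to ignore namespace prefix binding errors. :)
return fn:concat($path, " => ", cts:valid-index-path($path, fn:true()))
```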

I am not really understanding how your field can affect every document but you 
only want it in some of the documents.  Maybe the field is not selective enough 
(the path field might help there)?

Also, there are several bug fixes in 7.0-4 wrt fields, so planning and testing 
an upgrade might be a good idea.

-Danny


From: [email protected] 
[mailto:[email protected]] On Behalf Of [email protected]
Sent: Monday, October 13, 2014 11:43 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Adding new fields

Hi,

The indexes do not include the root element. Unfortunately we need to index an 
element that exists in all documents - the only thing that differs is the 
attribute value. (The field index settings are tweaked for specific purposes in 
each case, as a result of having different full text search requirements for 
specific document sets.)

As this is built on top of another product, we need to have the element named 
like it is in there, and the element is found in all documents. When I look at 
the database status right after adding one such field, I can see that the 
forests are all reindexing, totalling millions of docs to go. With new tiered 
hardware this completes on the order of hours, though it sometimes takes over 
a day, and with old hardware it took on the order of weeks to add one. Our 
monitoring also reveals that it really spikes the usable disk bandwidth, so it 
is definitely working a lot. (My guess is that it selects all the docs with the 
element, but is not intelligent enough to limit using the attribute value too.)

Indexes that include only elements found in a fraction of the documents are 
not a problem. Is there some indexing option that I can turn on so that ML 
indexes only the docs that have a specific attribute value in the given 
element? Right now it seems capable only of querying the docs that have the 
element.

This may also be a design issue, but unfortunately I'm unable to do any big 
changes to the way we do things in the codebase I've inherited.

We're running 7.0-2.3 btw, if that matters.

Ville

------ Original Message ------
From: "Danny Sokolsky" 
<[email protected]<mailto:[email protected]>>
To: "MarkLogic Developer Discussion" 
<[email protected]<mailto:[email protected]>>
Sent: 14.10.2014 0:41:51
Subject: Re: [MarkLogic Dev General] Adding new fields

Hi Ville,

I don’t know of a way to tell MarkLogic to trust you in this case, and you 
should not need it to.  If you do not have any content to reindex, and if 
reindexing is enabled, it should not rewrite all of the content.  It will query 
all of the content to see if it needs reindexing, which will not be free but 
should not be too expensive, but I would not expect a full reindex to happen.  
In that case you should see some messages in the log about reindexing that 
database and a little later another message saying it reindexed 0 fragments 
(in fact, you will see these messages each time the config files change).

You mention your fields are doing includes.  I would recommend using paths for 
your fields instead.  Also, make sure your fields are not including the root, 
as that is almost never the correct way to do it.  Are you using 7.0-4 for 
this?  If not, try upgrading.

-Danny

From: [email protected] 
[mailto:[email protected]] On Behalf Of [email protected]
Sent: Monday, October 13, 2014 12:58 AM
To: MarkLogic Developer Discussion
Subject: [MarkLogic Dev General] Adding new fields

Hi,

when developing applications with ML as the database, we need to add new 
indexes regularly to deliver new features. Most of the time (probably 95%) we 
add new indexes that will not hit any content currently in the database, but 
that we know eventually will when new content is added.

As we have terabytes / millions of docs of content, these reindex operations 
can be costly and take considerable time to run.

So finally to the question: given that we're adding a new field that has one 
include, it seems that ML goes through all documents in the database (the 
include limits by element and attribute value). Is there a way to tell ML that 
hey, we know, and we take the responsibility, that the database currently does 
not have any content that needs to be reindexed, so even though the 
database-wide "reindexer enable" is on, please do not do any reindexing for 
this field?

Would it work to toggle reindexer enable off while adding the fields, and then 
toggle it back on? What about new documents added while the reindexer is off? 
(We don't have the luxury to stop writes at any given time.)

Ville

_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
