Gary,

Keep in mind that the termlist message you are seeing in the logs is a warning, not an error. It effectively says that for the terms that hit this limit, the server will assume they exist in every document in the database and stop tracking position data for them. If a termlist grows beyond 256 MB (the default positions list max size, I think), that is probably not a bad tradeoff to make. Everything will still work, just slightly differently. It is up to you to decide whether those differences will have a material effect on your system.
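The tradeoff described above can be illustrated with a toy Python model of a single termlist. This is a conceptual sketch, not MarkLogic's actual index format. The per-entry byte costs are made-up assumptions, `ToyTermlist` is a hypothetical class, and a tiny `max_position_bytes` cap stands in for the 256 MB limit so the behavior is visible. Once the estimated size crosses the cap, every stored position is dropped and only the document IDs remain. The list can still say which documents contain the term, but no longer where in each document it occurs.

```python
from dataclasses import dataclass, field

# Made-up per-entry costs in bytes; a real index is compressed and laid out differently.
BYTES_PER_DOC_ID = 8
BYTES_PER_POSITION = 4


@dataclass
class ToyTermlist:
    """Hypothetical termlist for one term: doc ID -> word positions in that doc."""
    max_position_bytes: int  # stand-in for the 256 MB limit in the warning
    postings: dict = field(default_factory=dict)
    positions_discarded: bool = False

    def size_bytes(self) -> int:
        size = len(self.postings) * BYTES_PER_DOC_ID
        if not self.positions_discarded:
            size += sum(len(p) for p in self.postings.values()) * BYTES_PER_POSITION
        return size

    def add(self, doc_id: int, positions: list) -> None:
        self.postings.setdefault(doc_id, [])
        if self.positions_discarded:
            return  # only membership is recorded from now on
        self.postings[doc_id].extend(positions)
        if self.size_bytes() > self.max_position_bytes:
            # Crossing the cap: drop every stored position but keep the doc IDs.
            self.postings = {d: [] for d in self.postings}
            self.positions_discarded = True

    def documents(self) -> set:
        return set(self.postings)

    def positions(self, doc_id: int):
        # None means "unknown": position data is no longer tracked for this term.
        return None if self.positions_discarded else self.postings.get(doc_id, [])


if __name__ == "__main__":
    tl = ToyTermlist(max_position_bytes=100)  # tiny hypothetical cap
    for doc_id in range(6):
        tl.add(doc_id, [1, 7, 19])
    print(tl.positions_discarded, sorted(tl.documents()), tl.positions(0))
    # True [0, 1, 2, 3, 4, 5] None
```

In this model the document IDs survive while positions are discarded. Membership lookups are unaffected, and only position-dependent questions lose their answer.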
Adding new forests can help eliminate this warning, but first make sure it actually matters for your use case. I think the Java error you got is probably unrelated to the termlist warning and more likely due to something else, such as connectivity issues between your client and server.

-Danny

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Michael Blakeley
Sent: Tuesday, May 28, 2013 12:28 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Termlist database error

https://github.com/mblakele/task-rebalancer is pretty robust now.

-- Mike

On 28 May 2013, at 12:17, Damon Feldman <[email protected]> wrote:

> Gary,
>
> Adding a forest will provide extra space but won't offload content from the
> existing forest(s) in MarkLogic version 6 or below. You'll need to run CoRB
> or scheduled tasks to re-ingest data or (better) move data from one forest to
> another by specifying the forest-id in xdmp:document-insert() and
> re-inserting the documents.
>
> I'm not sure how to trace the long ID number to a term description, but
> someone else may know.
>
> The rebalancing code will be something like this:
>
> (: Hypothetical forest names; cts:uris() requires the URI lexicon. :)
> let $old-forest-ids := xdmp:forest("old-forest")
> let $new-forest-ids := (xdmp:forest("new-forest-1"), xdmp:forest("new-forest-2"))
> for $u in cts:uris("", (), (), 0, $old-forest-ids)[1 to 100]
> let $p := xdmp:document-get-permissions($u)
> let $c := xdmp:document-get-collections($u)
> let $q := xdmp:document-get-quality($u)
> return xdmp:document-insert($u, fn:doc($u), $p, $c, $q, $new-forest-ids)
>
> and you just run it over and over until about 1/Nth of the content is in each
> forest.
>
> Someone may have a real script for this that could be posted to this list for
> posterity.
>
> Yours,
> Damon
>
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Gary Larsen
> Sent: Tuesday, May 28, 2013 2:53 PM
> To: 'MarkLogic Developer Discussion'
> Subject: Re: [MarkLogic Dev General] Termlist database error
>
> Damon,
>
> Thanks for your response. I will add another forest to see if that
> helps. About 5 minutes before that error a Java process got
> terminated.
> I'm guessing it's related (stack trace below).
>
> Is there an easy way to determine the offending range index or field?
>
> Caused by: java.io.IOException: An established connection was aborted by the software in your host machine
>     at sun.nio.ch.SocketDispatcher.write0(Native Method)
>     at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:33)
>     at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:69)
>     at sun.nio.ch.IOUtil.write(IOUtil.java:26)
>     at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:336)
>     at com.marklogic.http.HttpChannel.writeBuffer(HttpChannel.java:371)
>     at com.marklogic.http.HttpChannel.writeBody(HttpChannel.java:353)
>     at com.marklogic.http.HttpChannel.flushRequest(HttpChannel.java:347)
>     at com.marklogic.http.HttpChannel.write(HttpChannel.java:136)
>     at com.marklogic.xcc.impl.handlers.ContentInsertController.issueRequest(ContentInsertController.java:242)
>     at com.marklogic.xcc.impl.handlers.ContentInsertController.serverDialog(ContentInsertController.java:116)
>     at com.marklogic.xcc.impl.handlers.AbstractRequestController.runRequest(AbstractRequestController.java:84)
>
> Gary
>
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Damon Feldman
> Sent: Tuesday, May 28, 2013 2:34 PM
> To: MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] Termlist database error
>
> Gary,
>
> I believe you have a very large forest with many entries for a common word,
> element, or similar. Breaking it up into more forests should fix the problem
> because each forest will have smaller termlists.
>
> Once the termlist data is discarded, I think you'll have to rewrite a lot of
> data to get the index rebuilt with the positions added back, so I suggest
> holding off on ingest or other updates until you address this.
>
> For background, every element, word, word stem, etc. is a "term", and
> termlists are lists of the documents that hold them.
>
> You have a very long list, which suggests you are operating outside the
> ideal parameters of the system. If you post the forest sizes we can confirm
> that.
>
> Yours,
> Damon
>
> --
> Damon Feldman
> Sr. Principal Consultant, MarkLogic
>
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Gary Larsen
> Sent: Tuesday, May 28, 2013 2:32 PM
> To: General MarkLogic Developer Discussion
> Subject: [MarkLogic Dev General] Termlist database error
>
> Hi,
>
> Can someone help me understand what this error means? Is it serious, or
> something I can fix with a configuration change?
>
> 2013-05-26 14:14:46.884 Warning: Termlist for 4697283252598410410 in
> C:\Program Files\MarkLogic\Data\Forests\NetVisn_SB\000003d3 is 248 MB;
> will discard positions at 256 MB
>
> Thanks,
> Gary
>
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general
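Damon's two suggestions can be sketched with a toy Python model. The first is spreading content across more forests so that each forest keeps a shorter termlist. The second is re-inserting documents in batches until about 1/Nth of the content sits in each forest. The assumptions are deliberate simplifications. Each forest is a plain dict of URI to set of terms, `termlist_lengths` and `rebalance_batch` are hypothetical helpers rather than MarkLogic APIs, and moving a dict entry stands in for re-inserting a document with explicit forest IDs. For a real database, the task-rebalancer project Mike linked is the more relevant starting point.

```python
def termlist_lengths(forests: dict, term: str) -> dict:
    """Per-forest termlist length for `term` (number of that forest's docs containing it).

    `forests` maps a forest name to {uri: set_of_terms}. Each forest indexes only
    its own documents, so each one keeps its own, shorter termlist.
    """
    return {name: sum(term in terms for terms in docs.values())
            for name, docs in forests.items()}


def rebalance_batch(forests: dict, source: str, batch_size: int = 100) -> int:
    """Move up to `batch_size` docs from `source` to the currently emptiest other forest.

    Toy stand-in for re-inserting documents with explicit forest IDs. Returns how
    many documents moved; 0 means `source` already holds no more than its fair share.
    """
    total = sum(len(docs) for docs in forests.values())
    fair_share = -(-total // len(forests))  # ceiling of total / N
    moved = 0
    for uri in list(forests[source])[:batch_size]:
        if len(forests[source]) <= fair_share:
            break
        # Pick the other forest with the fewest documents (first one wins ties).
        target = min((name for name in forests if name != source),
                     key=lambda name: len(forests[name]))
        forests[target][uri] = forests[source].pop(uri)
        moved += 1
    return moved


if __name__ == "__main__":
    # 1,000 hypothetical documents that all contain the common word "the".
    forests = {
        "old": {f"/doc/{i}.xml": {"the", f"w{i}"} for i in range(1000)},
        "new1": {}, "new2": {}, "new3": {},
    }
    while rebalance_batch(forests, "old"):
        pass  # keep running batches until nothing moves, as in Damon's loop
    print(termlist_lengths(forests, "the"))
    # {'old': 250, 'new1': 250, 'new2': 250, 'new3': 250}
```

Because every toy document contains "the", that term's termlist drops from 1,000 entries in one forest to 250 in each of four. Calling `rebalance_batch` again after the source forest reaches its fair share moves nothing, so, like Damon's loop, it can simply be re-run until it reports zero.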
