Thanks Mike.  I understand other people (and other configurations) are having 
success with large directories.  I'm simply reporting that *my system* is not 
successful.
I did reset the memory parameters, and it doesn't help much.
I suspect that your statement is the main one: a 64-bit machine and OS are 
needed to accommodate this type of usage.

-----Original Message-----
From: Michael Blakeley [mailto:[email protected]] 
Sent: Wednesday, December 09, 2009 1:42 PM
To: Lee, David
Cc: General Mark Logic Developer Discussion
Subject: Re: [MarkLogic Dev General] Cannot delete directory with 1mil docs - 
XDMP-MEMORY

David,

Directories with millions of documents aren't necessarily a problem: I 
create them frequently. Last week I built a 20M-document database, and 
the largest directory contained 9.2M documents.

I see the 32-bit kernel as more of a problem. A 32-bit kernel is limited 
to a 32-bit address space, and the server process only gets 3 GB of that 
address space, no matter how much RAM or swap you have. So why not 
install a 64-bit Linux? Your CPU is probably 64-bit capable, unless it 
pre-dates AMD Opteron or Intel's EM64T technology.

Also, Jason reminded me that you've done some past tuning of your 
database in-memory limits, to accommodate those giant fragmented 
documents. Now that you're loading smaller documents, you should reset 
those to the default values. There's a button for this, toward the 
bottom of the database config screen: it's labeled "get default values". 
Returning to the default values might help you avoid the XDMP-MEMORY error.
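
If you'd rather script the reset than click through the admin UI, a sketch 
along these lines via the Admin API should work. The database name and the 
sizes below are placeholders, not recommendations: use whatever defaults the 
"get default values" button shows for your version.

xquery version "1.0-ml";
(: hypothetical sketch: reset in-memory sizes via the Admin API;
   'Documents' and the numbers are placeholders :)
import module namespace admin = "http://marklogic.com/xdmp/admin"
    at "/MarkLogic/admin.xqy";

let $config := admin:get-configuration()
let $db := xdmp:database('Documents')
let $config := admin:database-set-in-memory-list-size($config, $db, 128)
let $config := admin:database-set-in-memory-tree-size($config, $db, 32)
return admin:save-configuration($config)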

Getting back to the query in my last message, it is probably slow 
because it has to read-lock all the documents in the directory, even 
when the query is only deleting 1000 of them. You can get around this 
with some xdmp:eval() trickery (caution - sharp tools!). This version 
uses an outer read-only query to gather the uris, and an inner update to 
delete them. So instead of needing millions of read locks and 1000 write 
locks, it only needs 1000 read locks and 1000 write locks.

This is essentially a way to relax the query's ACID guarantees. Normally 
we guarantee that the documents that are present at the start of a 
transaction, and aren't affected by the transaction, will still be 
available at the end of the transaction. Hence the need to read-lock all 
of them. But by telling the update to run in a different-transaction, we 
can relax this requirement and allow the xdmp:directory() portion to run 
in lockless (timestamped) mode. The $assert check at the top of the query 
ensures that the xdmp:directory() part really does run in timestamped mode.

let $assert :=
   (: fail fast unless this outer query runs lock-free, at a timestamp :)
   if (xdmp:request-timestamp()) then ()
   else error((), 'NOTIMESTAMP', text { 'outer query is not read-only' })
let $path := '/'
let $map := map:map()
let $list-uris :=
   (: gather up to 1000 uris from the directory, without read locks :)
   for $i in xdmp:directory($path, 'infinity')[1 to 1000]
   return map:put($map, xdmp:node-uri($i), true())
let $do := xdmp:eval('
   declare variable $URIS as map:map external;
   xdmp:document-delete(map:keys($URIS))
',
   (xs:QName('URIS'), $map),
   (: different-transaction isolation keeps this outer query read-only :)
   <options xmlns="xdmp:eval">
     <isolation>different-transaction</isolation>
     <prevent-deadlocks>true</prevent-deadlocks>
   </options>
)
return count(map:keys($map))
, xdmp:elapsed-time()

You could keep running that until it returns 0, and you could tinker 
with the '1 to 1000' range if you like.
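
To automate that, one hypothetical approach (not from this thread) is to save 
the query above as a main module, say /delete-batch.xqy under your app 
server's root, and have it respawn itself on the task server until the 
directory is empty:

xquery version "1.0-ml";
(: /delete-batch.xqy - hypothetical driver: delete one batch as above,
   then queue the next batch until nothing is left :)
let $path := '/RxNorm/rxnsat/'
let $map := map:map()
let $gather :=
   for $i in xdmp:directory($path, 'infinity')[1 to 1000]
   return map:put($map, xdmp:node-uri($i), true())
let $do := xdmp:eval('
   declare variable $URIS as map:map external;
   xdmp:document-delete(map:keys($URIS))
',
   (xs:QName('URIS'), $map),
   <options xmlns="xdmp:eval">
     <isolation>different-transaction</isolation>
     <prevent-deadlocks>true</prevent-deadlocks>
   </options>
)
return
   if (count(map:keys($map)) eq 0) then ()
   else xdmp:spawn('/delete-batch.xqy')  (: queue the next batch :)

Kick it off once with xdmp:spawn('/delete-batch.xqy'), or run it directly.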

-- Mike

On 2009-12-09 09:46, Lee, David wrote:
> Thanks for the suggestion
> I am running 4.1-3, and I have plenty of swap space.
>
> I tried the bulk deletes but they were taking about 1 minute per 1000 
> documents to delete ...
> I gave up after a few hours.
>
> I've created a new DB and am starting the process of reloading now; I'm about 
> 2/3 of the way through, and then I'll delete the old forest.
>
> I've come to the conclusion that, at least on my system, which is admittedly 
> not that powerful (32-bit Linux, 4 GB RAM, 2.8 GHz), ML doesn't handle 
> directories with > 1mil entries very well.
> I try to add more than that and run into all sorts of memory problems.
> I try to *delete* that directory and can't.
>
> It also doesn't handle individual files with > 1mil fragments that well, but 
> at least it handles them.
> For my experimental case, I'm now trying a hybrid approach: bulk up 1000 
> "rows" per file and keep the number of files per directory in the 
> thousands, not millions ...
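>
> For instance, a hypothetical sketch of that bulking idea (with $rows standing
> in for the parsed input rows):
>
>     let $size := 1000
>     for $i in 1 to xs:integer(fn:ceiling(fn:count($rows) div $size))
>     let $batch := fn:subsequence($rows, ($i - 1) * $size + 1, $size)
>     return xdmp:document-insert(
>         fn:concat('/RxNorm/rxnsat/batch-', $i, '.xml'),
>         <batch>{ $batch }</batch>)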
>
> -----Original Message-----
> From: Michael Blakeley [mailto:[email protected]]
> Sent: Wednesday, December 09, 2009 12:33 PM
> To: General Mark Logic Developer Discussion
> Cc: Lee, David
> Subject: Re: [MarkLogic Dev General] Cannot delete directory with 1mil docs - 
> XDMP-MEMORY
>
> The XDMP-MEMORY message does mean that the host couldn't allocate the
> needed memory. In this case that was probably because the transaction
> was too large to fit in memory. If you aren't already using 4.1-3, I'd
> upgrade - just in case this is a known problem that has already been fixed.
>
> If 4.1-3 doesn't help, then I suppose you could increase the swap
> space... but I don't think you'd like the performance. You might be able
> to reduce the sizes of the group-level caches, but that might lead to
> *CACHEFULL errors.
>
> So as Geert suggested, clearing the forest is probably the fastest
> solution. Or if you don't mind spending more time on it, you could
> delete in blocks of 1000 documents.
>
>     for $i in xdmp:directory($path, 'infinity')[1 to 1000]
>     return xdmp:document-delete(xdmp:node-uri($i))
>
> You could automate this using xdmp:spawn(). You could also use
> cts:uris() with a cts:directory-query(), if you have the uri lexicon
> available.
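>
> For example, a hypothetical sketch of the cts:uris() variant (requires the
> uri lexicon; $path as above):
>
>     for $uri in cts:uris((), (),
>         cts:directory-query($path, 'infinity'))[1 to 1000]
>     return xdmp:document-delete($uri)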
>
> -- Mike
>
> On 2009-12-09 05:59, Lee, David wrote:
>> My joys of success were premature.
>> I ran into memory problems trying to load the full set of documents; it died 
>> after about 1mil.
>> So I tried to delete the directory and now I’m getting
>>
>> Exception running: :query
>> com.marklogic.xcc.exceptions.XQueryException: XDMP-MEMORY: 
>> xdmp:directory-delete
>> ("/RxNorm/rxnsat/") -- Memory exhausted
>> in /eval, on line 1
>>
>> Arg !!!!
>>
>> I’ve tried to change various memory settings, to no avail.  Any clue how to 
>> delete this directory?
>> Or should I start to delete the files piecemeal?
>>
>> Suggestions welcome.
>>
>> -David
>>
>>
>> ----------------------------------------
>> David A. Lee
>> Senior Principal Software Engineer
>> Epocrates, Inc.
>> [email protected]
>> 812-482-5224
>>
>>
>>
>
>


_______________________________________________
General mailing list
[email protected]
http://xqzone.com/mailman/listinfo/general
