To elaborate on this a bit more: I like the fact that the collector works with a transaction-size, and dislike the fact that CPF/transforms don't. I need the speed. But if the collector is effectively running in a single thread, that doesn't sound as efficient as it could be.
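A minimal sketch of the trade-off I mean, in Python rather than XQuery and not using any MarkLogic API. The names (`batches`, `load_batch`, `load_all`) and the numbers are hypothetical; it only illustrates that grouping work by a transaction-size keeps the number of queued tasks small, while a bounded pool still lets several batches run at once.

```python
from concurrent.futures import ThreadPoolExecutor


def batches(items, transaction_size):
    """Yield consecutive slices of at most transaction_size items."""
    for start in range(0, len(items), transaction_size):
        yield items[start:start + transaction_size]


def load_batch(batch):
    """Stand-in for one load transaction; returns how many items it handled."""
    return len(batch)


def load_all(items, transaction_size, threads):
    """Queue one task per batch on a bounded pool.

    Returns (number_of_tasks_queued, number_of_items_loaded).
    """
    work = list(batches(items, transaction_size))
    with ThreadPoolExecutor(max_workers=threads) as pool:
        loaded = sum(pool.map(load_batch, work))
    return len(work), loaded


# Hypothetical workload: 1000 input files.
files = [f"file-{i}.csv" for i in range(1000)]

# One task per file (how the transforms behave, as far as I know).
per_file_tasks = len(files)

# One task per batch of 100 files, spread over 4 worker threads.
batched_tasks, loaded = load_all(files, transaction_size=100, threads=4)

print(per_file_tasks, batched_tasks, loaded)  # prints: 1000 10 1000
```

The number of worker threads is the knob that would decide how hard the loading step pushes the machine, independently of the batch size.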
Kind regards,
Geert

> -----Original message-----
> From: Geert Josten [mailto:[email protected]]
> Sent: Wednesday, 24 October 2012 16:05
> To: MarkLogic Developer Discussion
> Subject: RE: [MarkLogic Dev General] info studio using CPU
>
> Hi Colleen,
>
> The transforms cause many more tasks on the queue than the collector does,
> since the collector uses the transaction-size, and the transforms work on
> files individually, as far as I know. I would actually like the collector
> to use parallel processes if possible, but just changing the invokes to
> spawns didn't work well. ;-P
>
> Kind regards,
> Geert
>
> > -----Original message-----
> > From: [email protected] [mailto:general-[email protected]]
> > On behalf of Colleen Whitney
> > Sent: Wednesday, 24 October 2012 15:21
> > To: MarkLogic Developer Discussion
> > Subject: Re: [MarkLogic Dev General] info studio using CPU
> >
> > To avoid flooding the task queue, particularly in cases where transforms
> > will also be happening. (It does use more than one thread except on tiny
> > batches, but not many, and I've often wondered if the number of loading
> > threads should be made controllable.) Ticket updates don't contend for a
> > single document, but rather write small documents; ticket status is done
> > by query across them.
> >
> > Sent from my iPhone
> >
> > On Oct 23, 2012, at 12:35 PM, "Geert Josten" <[email protected]> wrote:
> >
> > > Hi Colleen,
> > >
> > > Interesting. Why doesn't the collector transaction-manager spawn the
> > > transactions, instead of invoking them? Afraid that updating the ticket
> > > along the way from multiple threads will interfere with each other?
> > >
> > > Kind regards,
> > > Geert
> > >
> > >> -----Original message-----
> > >> From: [email protected] [mailto:general-[email protected]]
> > >> On behalf of Colleen Whitney
> > >> Sent: Tuesday, 23 October 2012 17:40
> > >> To: MarkLogic Developer Discussion
> > >> Subject: Re: [MarkLogic Dev General] info studio using CPU
> > >>
> > >> Yes, that's right. Transformations take advantage of the task server.
> > >> The collector does not.
> > >>
> > >> ________________________________________
> > >> From: [email protected] [general-[email protected]]
> > >> On Behalf Of Steiner, David J. (LNG-DAY) [[email protected]]
> > >> Sent: Tuesday, October 23, 2012 8:29 AM
> > >> To: MarkLogic Developer Discussion
> > >> Subject: Re: [MarkLogic Dev General] info studio using CPU
> > >>
> > >> Doesn't appear that the OS is swapping.
> > >>
> > >> It appears that there are 16 task server threads.
> > >>
> > >> Upon further "watching", it appears that just the collector may not
> > >> utilize threads? It appears that once the transforming starts, all CPUs
> > >> become engaged.
> > >>
> > >> David
> > >>
> > >> -----Original Message-----
> > >> From: [email protected] [mailto:general-[email protected]]
> > >> On Behalf Of Michael Blakeley
> > >> Sent: Tuesday, October 23, 2012 11:23 AM
> > >> To: MarkLogic Developer Discussion
> > >> Cc: MarkLogic Developer Discussion
> > >> Subject: Re: [MarkLogic Dev General] info studio using CPU
> > >>
> > >> Check the OS metrics. If RAM is maxed out, does that mean the OS is
> > >> swapping? If so, it's the swap disk that is the bottleneck.
> > >>
> > >> If you can't find an OS bottleneck... How many task server threads are
> > >> configured? I think the default is 4. Adding more threads won't help if
> > >> the system is swapping or otherwise at its limits though.
> > >>
> > >> -- Mike
> > >>
> > >> On Oct 23, 2012, at 7:55, "Steiner, David J. (LNG-DAY)"
> > >> <[email protected]> wrote:
> > >>
> > >>> Using ML 6.0-1.1.
> > >>>
> > >>> In Information Studio, I'm using a CSV collector to process hundreds
> > >>> of CSV files. I'm also doing a transform to pull each row out of the
> > >>> CSV and write it as an individual document into another DB (actually,
> > >>> a naked property, but I don't think that matters).
> > >>>
> > >>> The files are all under 50MB (wasn't sure if that 64MB limit still
> > >>> existed).
> > >>>
> > >>> It seems like only one CPU is being used and we have 8 available. RAM
> > >>> (24GB) is maxed out. It took 72 minutes to process 20 files.
> > >>>
> > >>> Is Info Studio specifically not utilizing more CPU because all of the
> > >>> RAM is already being used?
> > >>>
> > >>> Ideally, I guess, I'd like for Info Studio to be able to take
> > >>> advantage of all CPUs while ingesting. I'm thinking the ingestion
> > >>> where CSV is being translated to XML is the intense part. The
> > >>> "splitting" out and "document" (property) insert shouldn't be as
> > >>> intense?
> > >>>
> > >>> Thanks,
> > >>> David

_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
