Hi Colleen,

The transforms put many more tasks on the queue than the collector does,
since, as far as I know, the collector batches files by transaction size
while the transforms work on each file individually. I would actually
like the collector to use parallel processes if possible, but just
changing the invokes to spawns didn't work well. ;-P
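To put rough numbers on that difference, here is a small Python sketch (not Info Studio code). It assumes the collector groups files into batches of a hypothetical transaction_size, giving one unit of work per batch, while the transforms queue one task per file. Both function names are made up for illustration.

```python
import math


def collector_units(num_files: int, transaction_size: int) -> int:
    """Units of work if files are grouped into transactions of a fixed size."""
    return math.ceil(num_files / transaction_size)


def transform_tasks(num_files: int) -> int:
    """Tasks queued if each file is transformed as its own task."""
    return num_files


if __name__ == "__main__":
    files = 500             # hypothetical: "hundreds of CSV files"
    transaction_size = 50   # hypothetical batch size
    print(collector_units(files, transaction_size))  # 10
    print(transform_tasks(files))                    # 500
```

With 500 files and a batch size of 50, the collector side yields 10 units of work while the transform side queues 500 tasks.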

Kind regards,
Geert

> -----Original Message-----
> From: [email protected] [mailto:general-[email protected]] On Behalf Of Colleen Whitney
> Sent: Wednesday, October 24, 2012 3:21 PM
> To: MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] info studio using CPU
>
> It's to avoid flooding the task queue, particularly in cases where
> transforms will also be happening. (It does use more than one thread
> except on tiny batches, but not many, and I've often wondered whether
> the number of loading threads should be made controllable.) Ticket
> updates don't contend for a single document; instead, each one writes
> a small document, and ticket status is determined by querying across them.
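A Python analogy of the design described above (not Info Studio's actual code, which is XQuery). It assumes a small fixed thread pool stands in for the few loading threads and a dict stands in for the database. The names load_batch, ticket_status, run_collector and the /tickets/... URI scheme are hypothetical. Each batch writes its own small status record, so no two threads update the same document, and progress is computed by aggregating across those records.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the database: one small status record per batch,
# keyed by URI, rather than one shared ticket document that every thread updates.
status_docs: dict[str, dict] = {}


def load_batch(ticket_id: str, batch_no: int, files: list[str]) -> None:
    """Load one batch (elided here), then write that batch's own status record."""
    uri = f"/tickets/{ticket_id}/batch-{batch_no}.xml"  # hypothetical URI scheme
    status_docs[uri] = {"ticket": ticket_id, "loaded": len(files)}


def ticket_status(ticket_id: str) -> int:
    """Compute progress by querying across all batch records for this ticket."""
    return sum(d["loaded"] for d in status_docs.values() if d["ticket"] == ticket_id)


def run_collector(ticket_id: str, files: list[str],
                  transaction_size: int = 10, max_threads: int = 2) -> int:
    batches = [files[i:i + transaction_size]
               for i in range(0, len(files), transaction_size)]
    # At most max_threads batches run at once; the rest wait in the
    # executor's internal queue.
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        futures = [pool.submit(load_batch, ticket_id, n, batch)
                   for n, batch in enumerate(batches)]
        for future in futures:
            future.result()  # re-raise any error from a worker
    return ticket_status(ticket_id)


if __name__ == "__main__":
    files = [f"file-{i}.csv" for i in range(25)]
    print(run_collector("t1", files))  # 25
```

With 25 files and a transaction size of 10, this runs 3 batches on at most 2 threads and reports 25 files loaded. Making max_threads a parameter is one way to make the number of loading threads controllable.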
>
>
> Sent from my iPhone
>
> On Oct 23, 2012, at 12:35 PM, "Geert Josten" <[email protected]> wrote:
>
> > Hi Colleen,
> >
> > Interesting. Why doesn't the collector transaction-manager spawn the
> > transactions instead of invoking them? Are you afraid that ticket
> > updates made along the way from multiple threads would interfere with
> > each other?
> >
> > Kind regards,
> > Geert
> >
> >> -----Original Message-----
> >> From: [email protected] [mailto:general-[email protected]] On Behalf Of Colleen Whitney
> >> Sent: Tuesday, October 23, 2012 5:40 PM
> >> To: MarkLogic Developer Discussion
> >> Subject: Re: [MarkLogic Dev General] info studio using CPU
> >>
> >> Yes, that's right. Transformations take advantage of the task server.
> >> The collector does not.
> >>
> >> ________________________________________
> >> From: [email protected] [general-[email protected]] On Behalf Of Steiner, David J. (LNG-DAY) [[email protected]]
> >> Sent: Tuesday, October 23, 2012 8:29 AM
> >> To: MarkLogic Developer Discussion
> >> Subject: Re: [MarkLogic Dev General] info studio using CPU
> >>
> >> It doesn't appear that the OS is swapping.
> >>
> >> It appears that there are 16 task server threads.
> >>
> >> Upon further "watching", it appears that it may be just the collector
> >> that doesn't use multiple threads? Once the transforming starts, all
> >> CPUs become engaged.
> >>
> >> David
> >>
> >> -----Original Message-----
> >> From: [email protected] [mailto:general-[email protected]] On Behalf Of Michael Blakeley
> >> Sent: Tuesday, October 23, 2012 11:23 AM
> >> To: MarkLogic Developer Discussion
> >> Cc: MarkLogic Developer Discussion
> >> Subject: Re: [MarkLogic Dev General] info studio using CPU
> >>
> >> Check the OS metrics. If RAM is maxed out, does that mean the OS is
> >> swapping? If so, the swap disk is the bottleneck.
> >>
> >> If you can't find an OS bottleneck: how many task server threads are
> >> configured? I think the default is 4. Adding more threads won't help
> >> if the system is swapping or otherwise at its limits, though.
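One way to check whether the OS is actively swapping, as a Linux-only Python sketch: compare the pswpin/pswpout counters (pages swapped in and out since boot) in two /proc/vmstat snapshots taken a few seconds apart. Nonzero deltas mean paging happened during the interval; swap space that is merely in use does not by itself show that. The 5-second interval and the function names are arbitrary choices for illustration.

```python
import os
import time


def parse_vmstat(text: str) -> dict[str, int]:
    """Parse /proc/vmstat-style 'name value' lines into a dict of counters."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            counters[parts[0]] = int(parts[1])
    return counters


def swap_pages(before: str, after: str) -> tuple[int, int]:
    """Pages swapped in and out between two /proc/vmstat snapshots."""
    b, a = parse_vmstat(before), parse_vmstat(after)
    return (a.get("pswpin", 0) - b.get("pswpin", 0),
            a.get("pswpout", 0) - b.get("pswpout", 0))


def read_vmstat(path: str = "/proc/vmstat") -> str:
    with open(path) as f:
        return f.read()


if __name__ == "__main__" and os.path.exists("/proc/vmstat"):
    interval = 5  # seconds; arbitrary sampling window
    first = read_vmstat()
    time.sleep(interval)
    pages_in, pages_out = swap_pages(first, read_vmstat())
    print(f"{interval}s: {pages_in} pages swapped in, {pages_out} swapped out")
```

The si/so columns of the standard vmstat tool report the same activity as a rate.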
> >>
> >> -- Mike
> >>
> >> On Oct 23, 2012, at 7:55, "Steiner, David J. (LNG-DAY)" <[email protected]> wrote:
> >>
> >>> Using ML 6.0-1.1.
> >>>
> >>> In Information Studio, I'm using a CSV collector to process hundreds
> >>> of CSV files. I'm also doing a transform to pull each row out of the
> >>> CSV and write it as an individual document into another DB (actually,
> >>> a naked property, but I don't think that matters).
> >>>
> >>> The files are all under 50MB (I wasn't sure whether that 64MB limit
> >>> still existed).
> >>>
> >>> It seems like only one CPU is being used, and we have 8 available.
> >>> RAM (24GB) is maxed out. It took 72 minutes to process 20 files.
> >>>
> >>> Is Info Studio specifically not using more CPU because all of the RAM
> >>> is already being used?
> >>>
> >>> Ideally, I guess, I'd like Info Studio to be able to take advantage
> >>> of all CPUs while ingesting. I'm thinking the ingestion step, where
> >>> CSV is translated to XML, is the intense part. The "splitting" out
> >>> and the "document" (property) insert shouldn't be as intense?
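For illustration only, a Python sketch of the per-row split described above: each CSV data row becomes its own small XML document. This is not the Info Studio transform (which would be XQuery). The <row> root element, the function name, and the use of column headers as element names are assumptions, and the headers must be valid XML names.

```python
import csv
import io
import xml.etree.ElementTree as ET


def rows_to_xml(csv_text: str, root_name: str = "row") -> list[str]:
    """Turn each CSV data row into its own small XML document (as a string)."""
    docs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        root = ET.Element(root_name)  # hypothetical root element name
        for column, value in row.items():
            # Assumes a well-formed CSV whose headers are valid XML names.
            ET.SubElement(root, column).text = value
        docs.append(ET.tostring(root, encoding="unicode"))
    return docs


if __name__ == "__main__":
    sample = "id,name\n1,Ann\n2,Bob\n"
    for doc in rows_to_xml(sample):
        print(doc)
```

For the sample input this prints <row><id>1</id><name>Ann</name></row> and <row><id>2</id><name>Bob</name></row>. A real transform would also need to choose a URI for each document and decide where it is inserted.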
> >>>
> >>> Thanks,
> >>> David
> >>> _______________________________________________
> >>> General mailing list
> >>> [email protected]
> >>> http://developer.marklogic.com/mailman/listinfo/general
> >>>