To elaborate on this a bit more: I like the fact that the collector works
with a transaction-size, and dislike the fact that CPF/transforms don't. I need
the speed. But if the collector is effectively running in a single thread,
that doesn't sound as efficient as it could be.
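
A rough illustration of that trade-off, in plain Python rather than
MarkLogic or Information Studio code: keep the transaction-size batching,
but hand each batch to a small, bounded pool of threads, so the queue gets
one task per batch instead of one per file. The names batches,
load_in_parallel, load_batch and threads are made up for this sketch and
do not correspond to anything in the product.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batches(items: Sequence[T], transaction_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most transaction_size items."""
    if transaction_size < 1:
        raise ValueError("transaction_size must be at least 1")
    for start in range(0, len(items), transaction_size):
        yield items[start:start + transaction_size]


def load_in_parallel(
    items: Sequence[T],
    transaction_size: int,
    load_batch: Callable[[Sequence[T]], R],  # hypothetical per-batch loader
    threads: int = 4,                        # hypothetical tuning knob
) -> List[R]:
    """Call load_batch once per batch on a bounded pool of worker threads.

    The number of queued tasks equals the number of batches, not the
    number of items. Results are returned in batch order.
    """
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(load_batch, batches(items, transaction_size)))


if __name__ == "__main__":
    files = [f"file-{i}.csv" for i in range(10)]
    # len stands in for a real loader; it just reports the batch size.
    print(load_in_parallel(files, transaction_size=4, load_batch=len, threads=2))
```

With threads=1 this degenerates to the sequential behaviour, which makes
it easy to compare the two on the same input.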

Kind regards,
Geert

> -----Original Message-----
> From: Geert Josten [mailto:[email protected]]
> Sent: Wednesday, October 24, 2012 16:05
> To: MarkLogic Developer Discussion
> Subject: RE: [MarkLogic Dev General] info studio using CPU
>
> Hi Colleen,
>
> The transforms cause many more tasks on the queue than the collector
> does, since the collector uses the transaction-size, and the transforms
> work on files individually, as far as I know. I'd actually like the
> collector to use parallel processes if possible, but just changing the
> invokes to spawns didn't work well. ;-P
>
> Kind regards,
> Geert
>
> > -----Original Message-----
> > From: [email protected] [mailto:general-
> > [email protected]] On Behalf Of Colleen Whitney
> > Sent: Wednesday, October 24, 2012 15:21
> > To: MarkLogic Developer Discussion
> > Subject: Re: [MarkLogic Dev General] info studio using CPU
> >
> > To avoid flooding the task queue, particularly in cases where transforms
> > will also be happening. (It does use more than one thread except on tiny
> > batches, but not many, and I've often wondered if the number of loading
> > threads should be made controllable.) Ticket updates don't contend for a
> > single document, but rather write small documents; ticket status is done
> > by query across them.
> >
> >
> > Sent from my iPhone
> >
> > On Oct 23, 2012, at 12:35 PM, "Geert Josten" <[email protected]> wrote:
> >
> > > Hi Colleen,
> > >
> > > Interesting. Why doesn't the collector transaction-manager spawn the
> > > transactions, instead of invoking them? Is the concern that ticket
> > > updates made along the way from multiple threads would interfere with
> > > each other?
> > >
> > > Kind regards,
> > > Geert
> > >
> > >> -----Original Message-----
> > >> From: [email protected] [mailto:general-
> > >> [email protected]] On Behalf Of Colleen Whitney
> > >> Sent: Tuesday, October 23, 2012 17:40
> > >> To: MarkLogic Developer Discussion
> > >> Subject: Re: [MarkLogic Dev General] info studio using CPU
> > >>
> > >> Yes, that's right.  Transformations take advantage of the task server.
> > >> The collector does not.
> > >>
> > >> ________________________________________
> > >> From: [email protected] [general-
> > >> [email protected]] On Behalf Of Steiner, David J. (LNG-DAY)
> > >> [[email protected]]
> > >> Sent: Tuesday, October 23, 2012 8:29 AM
> > >> To: MarkLogic Developer Discussion
> > >> Subject: Re: [MarkLogic Dev General] info studio using CPU
> > >>
> > >> Doesn't appear that the OS is swapping.
> > >>
> > >> It appears that there are 16 task server threads.
> > >>
> > >> Upon further "watching", it appears that just the collector may not
> > >> utilize threads?  It appears that once the transforming starts, all
> > >> CPUs become engaged.
> > >>
> > >> David
> > >>
> > >> -----Original Message-----
> > >> From: [email protected] [mailto:general-
> > >> [email protected]] On Behalf Of Michael Blakeley
> > >> Sent: Tuesday, October 23, 2012 11:23 AM
> > >> To: MarkLogic Developer Discussion
> > >> Cc: MarkLogic Developer Discussion
> > >> Subject: Re: [MarkLogic Dev General] info studio using CPU
> > >>
> > >> Check the OS metrics. If RAM is maxed out, does that mean the OS is
> > >> swapping? If so, it's the swap disk that is the bottleneck.
> > >>
> > >> If you can't find an OS bottleneck... How many task server threads are
> > >> configured? I think the default is 4. Adding more threads won't help if
> > >> the system is swapping or otherwise at its limits though.
> > >>
> > >> -- Mike
> > >>
> > >> On Oct 23, 2012, at 7:55, "Steiner, David J. (LNG-DAY)"
> > >> <[email protected]> wrote:
> > >>
> > >>> Using ML 6.0-1.1.
> > >>>
> > >>> In Information Studio, I'm using a CSV collector, to process hundreds
> > >>> of CSV files.  I'm also doing a transform to pull each row out of the
> > >>> CSV and write it as an individual document into another DB (actually,
> > >>> a naked property, but I don't think that matters).
> > >>>
> > >>> The files are all under 50MB (wasn't sure if that 64MB limit still
> > >>> existed).
> > >>>
> > >>> It seems like only one CPU is being used and we have 8 available.  RAM
> > >>> (24GB) is maxed out.  It took 72 minutes to process 20 files.
> > >>>
> > >>> Is Info Studio specifically not utilizing more CPU because all of the
> > >>> RAM is already being used?
> > >>>
> > >>> Ideally, I guess, I'd like for Info Studio to be able to take advantage
> > >>> of all CPUs while ingesting.  I'm thinking the ingestion where CSV is
> > >>> being translated to XML is the intense part.  The "splitting" out and
> > >>> "document" (property) insert shouldn't be as intense?
> > >>>
> > >>> Thanks,
> > >>> David
> > >>> _______________________________________________
> > >>> General mailing list
> > >>> [email protected]
> > >>> http://developer.marklogic.com/mailman/listinfo/general
> > >>>
