Hi Geert,

Thanks, but this really doesn't help with my current problem, which is
that I have 814 CSV files of roughly 50MB each to process.  At the moment, if
I put too many files in the load directory, I get 'expanded tree cache' errors
and the Info Studio process seems to be left in a state where it cannot
complete; hitting the stop button does nothing.  Apparently, with 20 files, I
don't get the error, while with 25 I do.  (Incidentally, I actually have to
clean up all of the "ticket" stuff from the App Services DB just to get the
Flow to be usable again.)

Processing 20 files at a time is a little less than optimal, since I'd like to 
just point at the directory with 814 files and let it go until it is done.
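
For now, the 20-at-a-time workaround could be scripted outside of Info Studio.
The following is only a rough sketch in Python, not anything Info Studio
provides: the helper names (batches, stage_batch) and the idea of moving files
from a holding directory into the load directory are my own assumptions, and
the flow would still have to be started and finished between batches.

```python
import shutil
from pathlib import Path
from typing import Iterator, List


def batches(items: List[Path], size: int) -> Iterator[List[Path]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def stage_batch(batch: List[Path], load_dir: Path) -> List[Path]:
    """Move one batch of CSV files into the load directory.

    `load_dir` is a hypothetical path; it stands in for whatever directory
    the collector is configured to read from.
    """
    load_dir.mkdir(parents=True, exist_ok=True)
    return [Path(shutil.move(str(src), str(load_dir / src.name))) for src in batch]
```

With 814 files and a batch size of 20, batches() yields 41 batches, the last
one holding 14 files.  At the 72 minutes per 20 files I saw earlier, that would
be somewhere around 49 hours in total, assuming the rate holds.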

The collector and transformer are doing what I want: the collector transforms
CSV to XML, and the transform reads the CSV-XML and sticks a naked property
into an appropriate DB for every row in the CSV-XML; then, at the end, the
CSV-XML document is written into the DB specified in Info Studio.  I don't
particularly think it will go faster if I write out my naked properties to the
Fab DB and let Info Studio move them to the DB specified in the Info Studio
settings (and actually, even if Info Studio would do that, I'd then have to
write the CSV-XML documents to some other DB, because their structure is
different from that of the naked properties DB).

Thanks,
David



-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of Geert Josten
Sent: Wednesday, October 24, 2012 3:48 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] info studio using CPU

Hi David,

Thought you might be interested in this blog item (and its comments)...

http://blog.davidcassel.net/2011/06/splitting-data-with-info-studio/

Kind regards,
Geert

> -----Original Message-----
> From: [email protected] [mailto:general- 
> [email protected]] On Behalf Of Steiner, David J. (LNG-DAY)
> Sent: Tuesday, October 23, 2012 17:30
> To: MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] info studio using CPU
>
> Doesn't appear that the OS is swapping.
>
> It appears that there are 16 task server threads.
>
> Upon further "watching", it appears that just the collector may not utilize
> threads?  It appears that once the transforming starts, all CPUs become
> engaged.
>
> David
>
> -----Original Message-----
> From: [email protected] [mailto:general- 
> [email protected]] On Behalf Of Michael Blakeley
> Sent: Tuesday, October 23, 2012 11:23 AM
> To: MarkLogic Developer Discussion
> Cc: MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] info studio using CPU
>
> Check the OS metrics. If RAM is maxed out, does that mean the OS is swapping?
> If so, it's the swap disk that is the bottleneck.
>
> If you can't find an OS bottleneck... How many task server threads are
> configured? I think the default is 4. Adding more threads won't help if the
> system is swapping or otherwise at its limits though.
>
> -- Mike
>
> On Oct 23, 2012, at 7:55, "Steiner, David J. (LNG-DAY)"
> <[email protected]> wrote:
>
> > Using ML 6.0-1.1.
> >
> > In Information Studio, I'm using a CSV collector to process hundreds of CSV
> > files.  I'm also doing a transform to pull each row out of the CSV and write
> > it as an individual document into another DB (actually, a naked property,
> > but I don't think that matters).
> >
> > The files are all under 50MB (wasn't sure if that 64MB limit still existed).
> >
> > It seems like only one CPU is being used and we have 8 available.  RAM
> > (24GB) is maxed out.  It took 72 minutes to process 20 files.
> >
> > Is Info Studio specifically not utilizing more CPU because all of the RAM
> > is already being used?
> >
> > Ideally, I guess, I'd like for Info Studio to be able to take advantage of
> > all CPUs while ingesting.  I'm thinking the ingestion where CSV is being
> > translated to XML is the intense part.  The "splitting" out and "document"
> > (property) insert shouldn't be as intense?
> >
> > Thanks,
> > David
> > _______________________________________________
> > General mailing list
> > [email protected]
> > http://developer.marklogic.com/mailman/listinfo/general
> >
