Hi Geert,

Thanks, but this really doesn't do anything for my current problem, which is that I have 814 50MB files of CSV data to process. At the moment, if I put too many files in the load directory, I get 'expanded tree cache' errors and the Info Studio process seems to be left in a state where it is unable to complete; hitting the stop button does nothing. Apparently, with 20 files I don't get the error, while with 25 I do. (Incidentally, I actually have to clean up all of the "ticket" stuff from the App Services DB just to get the Flow to be usable again.)

Processing 20 files at a time is a little less than optimal, since I'd like to just point at the directory with 814 files and let it go until it is done. The collector and transformer are doing what I want: the collector transforms CSV to XML, and the transform reads the CSV-XML and sticks a naked property into an appropriate DB for every row in the CSV-XML; then, at the end, the CSV-XML document is written into the DB specified in Info Studio. I don't particularly think it will go faster if I write out my naked properties to the Fab DB and let Info Studio move them to the DB specified in the Info Studio setting (and actually, even if Info Studio would do that, I'd then have to write the CSV-XML documents to some other DB, because their structure is different from that of the naked properties DB).

Thanks,
David

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Geert Josten
Sent: Wednesday, October 24, 2012 3:48 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] info studio using CPU

Hi David,

Thought you might be interested in this blog item (and its comments)...

http://blog.davidcassel.net/2011/06/splitting-data-with-info-studio/

Kind regards,
Geert

> -----Original Message-----
> From: [email protected] [mailto:general-[email protected]] On Behalf Of Steiner, David J. (LNG-DAY)
> Sent: Tuesday, October 23, 2012 17:30
> To: MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] info studio using CPU
>
> Doesn't appear that the OS is swapping.
>
> It appears that there are 16 task server threads.
>
> Upon further "watching", it appears that just the collector may not utilize threads? It appears that once the transforming starts, all CPUs become engaged.
>
> David
>
> -----Original Message-----
> From: [email protected] [mailto:general-[email protected]] On Behalf Of Michael Blakeley
> Sent: Tuesday, October 23, 2012 11:23 AM
> To: MarkLogic Developer Discussion
> Cc: MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] info studio using CPU
>
> Check the OS metrics. If RAM is maxed out, does that mean the OS is swapping? If so, it's the swap disk that is the bottleneck.
>
> If you can't find an OS bottleneck... How many task server threads are configured? I think the default is 4. Adding more threads won't help if the system is swapping or otherwise at its limits though.
>
> -- Mike
>
> On Oct 23, 2012, at 7:55, "Steiner, David J. (LNG-DAY)" <[email protected]> wrote:
>
> > Using ML 6.0-1.1.
> >
> > In Information Studio, I'm using a CSV collector, to process hundreds of CSV files. I'm also doing a transform to pull each row out of the CSV and write it as an individual document into another DB (actually, a naked property, but I don't think that matters).
> >
> > The files are all under 50MB (wasn't sure if that 64MB limit still existed).
> >
> > It seems like only one CPU is being used and we have 8 available. RAM (24GB) is maxed out. It took 72 minutes to process 20 files.
> >
> > Is Info Studio specifically not utilizing more CPU because all of the RAM is already being used?
> >
> > Ideally, I guess, I'd like for Info Studio to be able to take advantage of all CPUs while ingesting. I'm thinking the ingestion where CSV is being translated to XML is the intense part. The "splitting" out and "document" (property) insert shouldn't be as intense?
> >
> > Thanks,
> > David
> > _______________________________________________
> > General mailing list
> > [email protected]
> > http://developer.marklogic.com/mailman/listinfo/general
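
A minimal sketch of the batching workaround described in the most recent message of this thread, where a load of 20 files completed and a load of 25 triggered the 'expanded tree cache' error. It only stages files in groups of 20 and does not talk to MarkLogic or Information Studio at all. The directory names `csv_source` and `csv_load`, the helper names, and the manual "press Enter" pause are assumptions made for illustration, since the thread does not say how to detect that a flow has finished.

```python
import math
import shutil
from pathlib import Path
from typing import Iterator, List, Sequence

# From the thread: 20 files loaded cleanly, 25 did not.
BATCH_SIZE = 20


def chunked(items: Sequence[Path], size: int) -> Iterator[List[Path]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def batch_count(total_files: int, size: int = BATCH_SIZE) -> int:
    """Number of batches needed to cover `total_files`."""
    return math.ceil(total_files / size)


def stage_batch(batch: Sequence[Path], load_dir: Path) -> None:
    """Move one batch of files into the load directory."""
    load_dir.mkdir(parents=True, exist_ok=True)
    for csv_file in batch:
        shutil.move(str(csv_file), str(load_dir / csv_file.name))


if __name__ == "__main__":
    source_dir = Path("csv_source")  # hypothetical: where the CSV files wait
    load_dir = Path("csv_load")      # hypothetical: directory the collector reads
    files = sorted(source_dir.glob("*.csv"))
    print(f"{len(files)} files -> {batch_count(len(files))} batches")
    for number, batch in enumerate(chunked(files, BATCH_SIZE), start=1):
        stage_batch(batch, load_dir)
        # Run the flow and let it finish before staging the next batch.
        # Detecting completion automatically is not covered in the thread,
        # so this sketch simply waits for the operator.
        input(f"Batch {number} staged; press Enter when the flow is done: ")
```

With 814 files and a batch size of 20, this gives 41 batches: 40 full batches and a final one of 14 files.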
