Hi David,

Have you tried lowering the transaction-size to 20? It is part of the collector policy and defaults to 100. It controls how many files are loaded within one collector transaction. That way your 814 files are loaded in batches of 20 instead of 100. With the default, you are essentially loading 100 CSV files of 50 MB each into memory in one transaction, which probably explains why you are seeing such high memory consumption.
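To put rough numbers on that, here is a minimal back-of-the-envelope sketch in Python. It is not MarkLogic code: the helper name `transaction_load` is made up for illustration, and it assumes every file is a full 50 MB and that all files in a transaction are held in memory at once. It only counts raw CSV bytes, so the real footprint after conversion to XML is probably higher.

```python
import math


def transaction_load(total_files, file_size_mb, transaction_size):
    """Estimate the raw CSV volume per collector transaction and the
    number of transactions needed (hypothetical helper, not a MarkLogic API)."""
    # A transaction never holds more files than exist in total.
    files_per_transaction = min(total_files, transaction_size)
    per_transaction_mb = files_per_transaction * file_size_mb
    # The last transaction may be only partially filled, hence the ceiling.
    transactions = math.ceil(total_files / transaction_size)
    return per_transaction_mb, transactions


if __name__ == "__main__":
    for size in (100, 20):
        mb, count = transaction_load(814, 50, size)
        print(f"transaction-size={size}: ~{mb} MB raw CSV per transaction, "
              f"{count} transactions")
```

Under these assumptions, a transaction-size of 100 means roughly 5000 MB of raw CSV per transaction across 9 transactions, while 20 means roughly 1000 MB per transaction across 41 transactions.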
I wasn't saying you should write naked properties to the Fab database. Presuming you meant writing them directly from the collector, that wouldn't be necessary either. You could ingest them into the target database immediately. Add a collection with the ticket id, and unload might be able to clean them up for you as well, though I am a bit uncertain because you are talking about 'naked' properties.

Does that make sense?

Kind regards,
Geert

> -----Original message-----
> From: [email protected] [mailto:general-[email protected]] On behalf of Steiner, David J. (LNG-DAY)
> Sent: Wednesday, October 24, 2012 15:24
> To: MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] info studio using CPU
>
> Hi Geert,
>
> Thanks, but this really doesn't do anything for my current problem, which is that I have 814 50MB files of CSV data to process. At the moment, if I put too many files in the load directory, I get 'expanded tree cache' errors and the info studio process seems to be left in a state where it is unable to complete; hitting the stop button does nothing. Apparently, with 20 files, I don't get the error, while with 25 I do. (Incidentally, I actually have to clean up all of the "ticket" stuff from the App Services DB just to get the Flow to be usable again.)
>
> Processing 20 files at a time is a little less than optimal, since I'd like to just point at the directory with 814 files and let it go until it is done.
>
> The collector and transformer are doing what I want: the collector transforms CSV to XML, and the transform reads the CSV-XML and sticks a naked property into an appropriate DB for every row in the CSV-XML; then at the end, the CSV-XML document is written into the DB specified in info studio.
> I don't particularly think it will go faster if I write out my naked properties to the Fab DB and let info studio move them to the DB specified in the info studio setting (and actually, even if info studio would do that, I'd have to instead write the XML CSV documents to some other DB because their structure is different from the naked properties DB).
>
> Thanks,
> David
>
> -----Original Message-----
> From: [email protected] [mailto:general-[email protected]] On Behalf Of Geert Josten
> Sent: Wednesday, October 24, 2012 3:48 AM
> To: MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] info studio using CPU
>
> Hi David,
>
> Thought you might be interested in this blog item (and its comments)...
>
> http://blog.davidcassel.net/2011/06/splitting-data-with-info-studio/
>
> Kind regards,
> Geert
>
> > -----Original message-----
> > From: [email protected] [mailto:general-[email protected]] On behalf of Steiner, David J. (LNG-DAY)
> > Sent: Tuesday, October 23, 2012 17:30
> > To: MarkLogic Developer Discussion
> > Subject: Re: [MarkLogic Dev General] info studio using CPU
> >
> > Doesn't appear that the OS is swapping.
> >
> > It appears that there are 16 task server threads.
> >
> > Upon further "watching", it appears that just the collector may not utilize threads? It appears that once the transforming starts, all CPUs become engaged.
> >
> > David
> >
> > -----Original Message-----
> > From: [email protected] [mailto:general-[email protected]] On Behalf Of Michael Blakeley
> > Sent: Tuesday, October 23, 2012 11:23 AM
> > To: MarkLogic Developer Discussion
> > Cc: MarkLogic Developer Discussion
> > Subject: Re: [MarkLogic Dev General] info studio using CPU
> >
> > Check the OS metrics. If RAM is maxed out, does that mean the OS is swapping? If so, it's the swap disk that is the bottleneck.
> >
> > If you can't find an OS bottleneck... How many task server threads are configured?
> > I think the default is 4. Adding more threads won't help if the system is swapping or otherwise at its limits though.
> >
> > -- Mike
> >
> > On Oct 23, 2012, at 7:55, "Steiner, David J. (LNG-DAY)" <[email protected]> wrote:
> >
> > > Using ML 6.0-1.1.
> > >
> > > In Information Studio, I'm using a CSV collector to process hundreds of CSV files. I'm also doing a transform to pull each row out of the CSV and write it as an individual document into another DB (actually, a naked property, but I don't think that matters).
> > >
> > > The files are all under 50MB (wasn't sure if that 64MB limit still existed).
> > >
> > > It seems like only one CPU is being used and we have 8 available. RAM (24GB) is maxed out. It took 72 minutes to process 20 files.
> > >
> > > Is Info Studio specifically not utilizing more CPU because all of the RAM is already being used?
> > >
> > > Ideally, I guess, I'd like for Info Studio to be able to take advantage of all CPUs while ingesting. I'm thinking the ingestion where CSV is being translated to XML is the intense part. The "splitting" out and "document" (property) insert shouldn't be as intense?
> > >
> > > Thanks,
> > > David
> > > _______________________________________________
> > > General mailing list
> > > [email protected]
> > > http://developer.marklogic.com/mailman/listinfo/general
