Got it, thanks.  

I know you worked your way around it by reducing the size of the files.   
Another approach that might work is to load the CSV files as-is and do the 
splitting as a first step in the processing pipeline.  There's a little 
housekeeping you'd need to do on the resulting documents to make sure they get 
propagated into the target database (see 
http://blog.davidcassel.net/2011/06/splitting-data-with-info-studio/).  
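
To make the "split as a first step" idea concrete, here is a minimal Python sketch of the row-splitting itself, outside of MarkLogic. It is not the Info Studio API and not the code from the blog post. The function name split_csv_rows, the URI pattern, and the <row>/<field> element names are all hypothetical. Rows are read lazily, so only one row is held in memory at a time, and each row becomes its own small XML document.

```python
import csv
import io
import xml.etree.ElementTree as ET
from typing import Iterator, TextIO, Tuple


def split_csv_rows(source: TextIO, base_uri: str) -> Iterator[Tuple[str, str]]:
    """Yield one (uri, xml) pair per CSV data row.

    The URI scheme and the <row>/<field> element names are hypothetical,
    chosen only to illustrate one-document-per-row splitting.
    """
    reader = csv.reader(source)
    header = next(reader)  # first line holds the column names
    for i, values in enumerate(reader, start=1):
        row = ET.Element("row")
        for name, value in zip(header, values):
            # keep the column name as an attribute so any header text is valid
            field = ET.SubElement(row, "field", name=name)
            field.text = value
        yield f"{base_uri}/row-{i}.xml", ET.tostring(row, encoding="unicode")


if __name__ == "__main__":
    sample = io.StringIO("id,title\n1,Alpha\n2,Beta\n")
    for uri, xml in split_csv_rows(sample, "/csv/sample"):
        print(uri, xml)
```

Because the generator never builds one big document for the whole file, peak memory tracks the size of a single row rather than the size of the CSV file.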

--Colleen

________________________________________
From: [email protected] 
[[email protected]] On Behalf Of Steiner, David J. 
(LNG-DAY) [[email protected]]
Sent: Thursday, October 25, 2012 5:27 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] info studio using CPU

Colleen, it didn't. :-(

I'll send you a separate note...

David

-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of Colleen Whitney
Sent: Wednesday, October 24, 2012 12:22 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] info studio using CPU

David,

Thanks, it would be good to know if reducing the number of docs per transaction 
solves the problem.  If not, I can file a bug on your behalf if you like, but I 
think it might make more sense for you to open a support ticket so that 
engineering staff can reproduce and address the problem systematically.

--Colleen

________________________________________
From: [email protected] 
[[email protected]] On Behalf Of Steiner, David J. 
(LNG-DAY) [[email protected]]
Sent: Wednesday, October 24, 2012 9:24 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] info studio using CPU

Colleen,

OK.  I will try changing the number-of-documents setting once I clean up the 
latest errors.

As a note, I think there is an issue when using a collector, such as the CSV 
collector, that produces a bigger document from the input, perhaps bigger than 
64MB.  I'm getting these messages in ErrorLog.txt, but no error appears in 
Info Studio; the process there just keeps running.  I have to remove the 
"ticket" docs in App Services just to get control of the Flow back.
2012-10-24 09:29:49.285 Notice: TaskServer: XDMP-EXPNTREECACHEFULL: fn:doc("/14974109146499330104/13438170693114125278//csv/filename_84.x...") -- Expanded tree cache full on host ilabsmltest.legal.regn.net
2012-10-24 09:29:49.285 Notice: TaskServer:   $e = <error:error xsi:schemaLocation="http://marklogic.com/xdmp/error error.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:error="http://marklogic.com/xdmp/error"><error:code>XDMP-EXPNTREECACHEFULL</error:code><error:name/><err...</error:error>
2012-10-24 09:29:50.019 Notice: TaskServer: XDMP-EXPNTREECACHEFULL: fn:doc("/14974109146499330104/13438170693114125278//csv/filename_85.x...") -- Expanded tree cache full on host ilabsmltest.legal.regn.net
2012-10-24 09:29:50.019 Notice: TaskServer:   $e = <error:error xsi:schemaLocation="http://marklogic.com/xdmp/error error.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:error="http://marklogic.com/xdmp/error"><error:code>XDMP-EXPNTREECACHEFULL</error:code><error:name/><err...</error:error>
2012-10-24 10:08:18.411 Notice: TaskServer: XDMP-EXPNTREECACHEFULL: fn:doc("/14974109146499330104/13438170693114125278//csv/filename_91.x...") -- Expanded tree cache full on host ilabsmltest.legal.regn.net
2012-10-24 10:08:18.411 Notice: TaskServer:   $e = <error:error xsi:schemaLocation="http://marklogic.com/xdmp/error error.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:error="http://marklogic.com/xdmp/error"><error:code>XDMP-EXPNTREECACHEFULL</error:code><error:name/><err...</error:error>
2012-10-24 10:08:20.820 Notice: TaskServer: XDMP-EXPNTREECACHEFULL: fn:doc("/14974109146499330104/13438170693114125278//csv/filename_79.x...") -- Expanded tree cache full on host ilabsmltest.legal.regn.net
2012-10-24 10:08:20.820 Notice: TaskServer:   $e = <error:error xsi:schemaLocation="http://marklogic.com/xdmp/error error.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:error="http://marklogic.com/xdmp/error"><error:code>XDMP-EXPNTREECACHEFULL</error:code><error:name/><err...</error:error>
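
A rough way to check the "bigger doc" theory is to measure how much a CSV sample grows when every value is wrapped in markup. The Python sketch below is only an estimate under stated assumptions: the <csv>/<row>/<field name="..."> layout is a hypothetical stand-in for whatever the CSV collector actually emits, and whether a 64MB per-document limit still applies in 6.0 is not confirmed anywhere in this thread.

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_expansion_ratio(csv_text: str) -> float:
    """Return (size of XML form) / (size of CSV form), both in UTF-8 bytes.

    The <csv>/<row>/<field name="..."> layout is a hypothetical stand-in for
    the collector's real output; the per-field tags are what add the bytes.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # column names from the first line
    root = ET.Element("csv")
    for values in reader:
        row = ET.SubElement(root, "row")
        for name, value in zip(header, values):
            ET.SubElement(row, "field", name=name).text = value
    xml_text = ET.tostring(root, encoding="unicode")
    return len(xml_text.encode("utf-8")) / len(csv_text.encode("utf-8"))


if __name__ == "__main__":
    sample = "id,title\n1,Alpha\n2,Beta\n"
    print(f"XML is {xml_expansion_ratio(sample):.1f}x the CSV size")
```

If the ratio measured on a real sample file comes out above 1.28 (64 / 50), a 50MB CSV file would already be larger than 64MB once converted to XML.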

David

-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of Colleen Whitney
Sent: Wednesday, October 24, 2012 9:51 AM
To: MarkLogic Developer Discussion
Cc: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] info studio using CPU

David, this might sound counter-intuitive, but if you set the number of 
documents per transaction to something small (5, or even 1), you should be able 
to avoid the expanded tree cache full error and just point it at the directory 
and press Start, instead of fiddling with directories. I think it's trying to 
hold too many large documents in memory at once.
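
The reasoning behind the documents-per-transaction suggestion can be sketched in a few lines of Python. This is a simplified model, not MarkLogic's real memory accounting: it assumes every document in a transaction is expanded in memory at the same time, and the helper names batched and peak_batch_mb are hypothetical. It uses the 814 files of roughly 50MB mentioned in this thread.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Group items into lists of at most `size`, one list per transaction."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def peak_batch_mb(doc_sizes_mb: List[float], docs_per_txn: int) -> float:
    """Largest combined size (MB) of the documents in any one batch.

    Simplifying assumption (not MarkLogic's real memory accounting): every
    document in a transaction is expanded in memory at the same time.
    """
    return max(sum(batch) for batch in batched(doc_sizes_mb, docs_per_txn))


if __name__ == "__main__":
    sizes = [50.0] * 814  # 814 files of roughly 50MB, as in this thread
    for n in (25, 20, 5, 1):
        print(n, "docs/txn ->", peak_batch_mb(sizes, n), "MB peak,",
              len(list(batched(sizes, n))), "transactions")
```

Under this model, 25 documents per transaction means up to 1250MB handled together, versus 50MB at 1 document per transaction; the trade-off is more, smaller transactions (814 instead of 33). The actual cache limit depends on server configuration.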

Sent from my iPhone

On Oct 24, 2012, at 6:20 AM, "Steiner, David J. (LNG-DAY)" 
<[email protected]> wrote:

> Hi Geert,
>
> Thanks, but this really doesn't help with my current problem, which is that 
> I have 814 files of CSV data, about 50MB each, to process.  At the moment, if 
> I put too many files in the load directory, I get 'expanded tree cache' errors 
> and the Info Studio process seems to be left unable to complete; hitting the 
> Stop button does nothing.  Apparently, with 20 files I don't get the error, 
> while with 25 I do.  (Incidentally, I actually have to clean up all of the 
> "ticket" stuff from the App Services DB just to get the Flow to be usable 
> again.)
>
> Processing 20 files at a time is a little less than optimal, since I'd like 
> to just point at the directory with 814 files and let it go until it is done.
>
> The collector and transformer are doing what I want: the collector converts 
> CSV to XML, and the transform reads the CSV-XML and inserts a naked property 
> into an appropriate DB for every row in the CSV-XML; at the end, the CSV-XML 
> document is written into the DB specified in Info Studio.  I don't think it 
> would go faster if I wrote my naked properties to the Fab DB and let Info 
> Studio move them to the DB specified in its settings.  (Even if Info Studio 
> did that, I would instead have to write the CSV-XML documents to some other 
> DB, because their structure is different from the naked-properties DB.)
>
> Thanks,
> David
>
>
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Geert Josten
> Sent: Wednesday, October 24, 2012 3:48 AM
> To: MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] info studio using CPU
>
> Hi David,
>
> Thought you might be interested in this blog item (and its comments)...
>
> http://blog.davidcassel.net/2011/06/splitting-data-with-info-studio/
>
> Kind regards,
> Geert
>
>> -----Original Message-----
>> From: [email protected] [mailto:general-
>> [email protected]] On Behalf Of Steiner, David J. (LNG-DAY)
>> Sent: Tuesday, October 23, 2012 17:30
>> To: MarkLogic Developer Discussion
>> Subject: Re: [MarkLogic Dev General] info studio using CPU
>>
>> It doesn't appear that the OS is swapping.
>>
>> It appears that there are 16 task server threads.
>>
>> Upon further "watching", it appears that the collector step alone may
>> not be multithreaded; once the transforming starts, all CPUs become
>> engaged.
>>
>> David
>>
>> -----Original Message-----
>> From: [email protected] [mailto:general-
>> [email protected]] On Behalf Of Michael Blakeley
>> Sent: Tuesday, October 23, 2012 11:23 AM
>> To: MarkLogic Developer Discussion
>> Cc: MarkLogic Developer Discussion
>> Subject: Re: [MarkLogic Dev General] info studio using CPU
>>
>> Check the OS metrics. If RAM is maxed out, does that mean the OS is
>> swapping? If so, it's the swap disk that is the bottleneck.
>>
>> If you can't find an OS bottleneck... How many task server threads
>> are configured? I think the default is 4. Adding more threads won't
>> help if the system is swapping or otherwise at its limits, though.
>>
>> -- Mike
>>
>> On Oct 23, 2012, at 7:55, "Steiner, David J. (LNG-DAY)"
>> <[email protected]> wrote:
>>
>>> Using ML 6.0-1.1.
>>>
>>> In Information Studio, I'm using a CSV collector to process hundreds of
>>> CSV files.  I'm also doing a transform to pull each row out of the CSV and
>>> write it as an individual document into another DB (actually, a naked
>>> property, but I don't think that matters).
>>>
>>> The files are all under 50MB (I wasn't sure whether that 64MB limit still
>>> existed).
>>>
>>> It seems like only one CPU is being used, and we have 8 available.  RAM
>>> (24GB) is maxed out.  It took 72 minutes to process 20 files.
>>>
>>> Is Info Studio specifically not using more CPU because all of the RAM is
>>> already being used?
>>>
>>> Ideally, I'd like Info Studio to be able to take advantage of all CPUs
>>> while ingesting.  I'm thinking the ingestion, where CSV is translated to
>>> XML, is the intensive part.  Shouldn't the "splitting" out and the
>>> "document" (property) insert be less intensive?
>>>
>>> Thanks,
>>> David
>>> _______________________________________________
>>> General mailing list
>>> [email protected]
>>> http://developer.marklogic.com/mailman/listinfo/general
>>>