For clarity, I should add that I was ingesting fewer than 1 million docs.
At some point merges will kick in, which they likely will with 1 million
docs, and that slows down ingestion. I only mentioned these numbers to show
how fast ingest can go under optimal conditions.

Cheers,
Geert

-----Original Message-----
From: Geert Josten [mailto:[email protected]] 
Sent: Tuesday 1 July 2014 10:13
To: 'MarkLogic Developer Discussion'
Subject: RE: [MarkLogic Dev General] MLCP import speed improvements

For comparison,

I managed to reach up to 3000 inserts per sec on a single host with a single
forest, but I am running on SSD, and I wasn't using MLCP transforms or
anything else that could touch the content between reading it from disk and
inserting it into the database. I likely didn't have any range indexes
enabled at that point either.

100/sec isn't bad for a single host with an ordinary disk. Extra forests
could help.

Cheers,
Geert

-----Original Message-----
From: [email protected]
[mailto:[email protected]] On behalf of Michael Blakeley
Sent: Monday 30 June 2014 18:15
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] MLCP import speed improvements

It sounds like you're seeing about 100 docs/sec?
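
As a rough check of that figure, here is a minimal sketch of the arithmetic,
using only numbers quoted in this thread. It assumes "about 1 million" means
exactly 1,000,000 documents and that the rate is steady over the whole load,
which will not hold once merges kick in. The function names are made up for
illustration.

```python
# Back-of-the-envelope ingest rates, using only figures quoted in this thread.
# Assumption: "about 1 million" is treated as exactly 1,000,000 documents,
# and the ingest rate is treated as constant for the whole load.

def docs_per_second(total_docs: int, hours: float) -> float:
    """Average ingest rate for a load that took `hours` hours."""
    return total_docs / (hours * 3600)


def minutes_to_load(total_docs: int, rate_per_second: float) -> float:
    """Time in minutes to load `total_docs` at a steady rate."""
    return total_docs / rate_per_second / 60


TOTAL_DOCS = 1_000_000

# The original post reports 3-4 hours for the import.
for hours in (3, 4):
    print(f"{hours} h -> {docs_per_second(TOTAL_DOCS, hours):.0f} docs/sec")

# Compare the observed ~100 docs/sec with the 3000 inserts/sec
# reported elsewhere in the thread for an SSD-backed host.
for rate in (100, 3000):
    print(f"{rate} docs/sec -> {minutes_to_load(TOTAL_DOCS, rate):.1f} min")
```

A 3-4 hour load of 1 million documents works out to somewhat under 100
docs/sec, so "about 100" is a fair round number for the reported import.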

I would start by gathering more information. What version of MarkLogic is
this? What OS is it running on? What's the CPU? RAM? What's the storage
subsystem? How many forests are attached to the database?

Then gather OS-level metrics, to see if one of the subsystems is an obvious
bottleneck. For now I won't speculate beyond that.

-- Mike

On 30 Jun 2014, at 02:18, Eugen Tautu <[email protected]> wrote:

> Hello,
> 
> I've been trying to use the MLCP tool to import about 1 million XML
> documents at a time from the file system into a MarkLogic server that has
> the XDBC app server installed.
> The issue I'm facing is that, while the import does work, it takes a
> really long time (about 3-4 hours, even more), so my question is if there's
> any way to improve the speed?
> The XML documents that I'm trying to load are small, at about 100 bytes
> each, the largest one reaching about 1-2 KB.
> I've tried tinkering with the batch_size, transaction_size and
> thread_count parameters, but it doesn't seem to help too much.
> 
> Thanks,
> Eugen Tautu
> 
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general
