Re: [MarkLogic Dev General] How to optimize the REST API Bulk Ingestion Performance?

Justin Makeig Tue, 14 Oct 2014 14:00:59 -0700

The bulk API does not spawn tasks in the Server. It allows you to send groups 
of documents together in a single request. (The implementation is actually HTTP 
multipart.) An E-node can, of course, handle multiple simultaneous requests, 
bulk or not.
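
To make "groups of documents together in a single request" concrete, here is a minimal sketch of packing a batch into one multipart/mixed body. It is written in Python only for brevity, and the same layout applies from C#. The helper name build_bulk_body, the sample URIs, and the per-part header layout (a Content-Disposition of "attachment" carrying the document URI as the filename) are assumptions based on my reading of the REST guide, so check them against the docs before relying on them. Nothing is sent over the wire here.

```python
import json
import uuid


def build_bulk_body(docs, boundary=None):
    """Pack several JSON documents into one multipart/mixed request body.

    docs: mapping of document URI -> Python object to serialize as JSON.
    Returns (content_type_header_value, body_bytes).
    """
    # A random boundary is fine as long as it never appears in the payload.
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for uri, content in docs.items():
        # One part per document: headers, blank line, then the JSON text.
        parts.append(
            f"--{boundary}\r\n"
            "Content-Type: application/json\r\n"
            f'Content-Disposition: attachment; filename="{uri}"\r\n'
            "\r\n"
            f"{json.dumps(content)}\r\n"
        )
    # The closing delimiter ends the multipart body.
    parts.append(f"--{boundary}--\r\n")
    body = "".join(parts).encode("utf-8")
    return f"multipart/mixed; boundary={boundary}", body
```

The returned header value goes in the request's Content-Type, and the bytes are the request body. Each call produces one request, so the server still sees a single unit of work per batch.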


Why spawn in the Server? Without knowing much more about what the bottlenecks 
actually are, I'd start by spawning threads in C#, similar to the way that 
something like mlcp or Corb does. It sounds like you have some headroom in 
MarkLogic, so you should be able to throw more work at it from the client. 
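
A rough sketch of that client-side pattern, again in Python for brevity (in C# the equivalent would be a pool of tasks or threads). The names chunked and load_in_parallel are hypothetical, send_batch stands in for whatever function issues one bulk request and returns the number of documents it wrote, and the batch_size and workers defaults are placeholders to tune, not recommendations.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def load_in_parallel(docs, send_batch, batch_size=100, workers=8):
    """Send batches concurrently and return the total documents written.

    send_batch is supplied by the caller (hypothetical here): it performs
    one bulk request for a list of documents and returns how many it wrote.
    """
    batches = list(chunked(docs, batch_size))
    # Each worker thread handles one batch at a time, so up to `workers`
    # requests are in flight against the E-node simultaneously.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(send_batch, batches))
```

Start with a handful of workers and raise the count while watching CPU on the MarkLogic host. When throughput stops improving, the bottleneck has moved somewhere else.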

Justin


On Oct 14, 2014, at 12:16 PM, Gary Russo <[email protected]> wrote:

> Hello Danny,
>  
> Yes, I’m using 7.0-4.
>  
> >> What are you comparing it to on the Oracle side?
> >> In MarkLogic, the content will be all indexed and searchable.  Is that 
> >> true on the orcl side too?
>  
> The Oracle side is doing a basic CLOB insert with no indexing.
>  
> The Oracle server being compared to is a higher capacity system so we 
> expected to see a faster ingestion.
>  
> I didn’t expect the MarkLogic side to be 4 times slower.
>  
> Yes, we tried tweaking the batch size. The 500 batch size had the fastest 
> load times.
>  
> I will investigate further but I believe the bottleneck is on the MarkLogic 
> side.
>  
> I believe the MarkLogic CPU has some room for parallelizing.
>  
> I’ll create a custom REST Extension that will spawn multiple threads for the 
> doc-inserts.
>  
> I assume the REST API bulk ingestion already does this but I can’t say for 
> sure.
>  
> I’ll keep you posted.
>  
> Thanks Danny
>  
> -          Gary R
>  
>  
>  
> From: [email protected] 
> [mailto:[email protected]] On Behalf Of Danny Sokolsky
> Sent: Tuesday, October 14, 2014 2:00 PM
> To: MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] How to optimize the REST API Bulk 
> Ingestion Performance?
>  
> Hi Gary,
>  
> A few thoughts here.  You are using 7.0-4 on this? 
>  
> What are you comparing it to on the Oracle side?  In MarkLogic, the content 
> will be all indexed and searchable.  Is that true on the orcl side too?
>  
> What indexes do you have enabled?  Maybe you do not need them all (or maybe 
> you should put the equivalent indexing on the orcl side)?
>  
> Have you tried tweaking the batch size?  I would try a smaller number, say 50 
> or 100.
>  
> Have you analyzed where you are spending the time?  In the c# code?  In the 
> code loading the doc on MarkLogic?
>  
> Do you have multiple threads loading from your .net program?  If you are not 
> maxing out your cpu on the MarkLogic side, you probably have room for more 
> parallelization.
>  
> -Danny
>  
> From: [email protected] 
> [mailto:[email protected]] On Behalf Of Gary Russo
> Sent: Tuesday, October 14, 2014 9:21 AM
> To: [email protected]
> Subject: [MarkLogic Dev General] How to optimize the REST API Bulk Ingestion 
> Performance?
>  
> MarkLogic Bulk ingestion processing is slower than an equivalent Oracle 
> ingestion process.
>  
> The MarkLogic ingestion takes 30 minutes. An Oracle equivalent only takes 7 
> minutes.
>  
> I’m using the REST API to bulk ingest multiple documents as described here. 
> => http://docs.marklogic.com/guide/rest-dev/bulk#id_54649
>  
> Notes:
> * C# code is used to call the MarkLogic Bulk Ingest REST API.
> * Document batch size used is 500.
> * Average doc size is 1 KB.
> * JSON Conversion and Validation logic occurs in the C# code.
>  
>  
> Any thoughts on how to optimize the MarkLogic bulk ingest to make it as fast 
> as Oracle’s 7 minute load time?
>  
>  
> Thanks,
> Gary R
>  
>  
> Gary Russo
> Enterprise NoSQL Developer
> http://garyrusso.wordpress.com
>  
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general
