Where is your bottleneck? You're going to need to provide more information 
about the specific problem you're seeing and what your infrastructure looks 
like. How are the 40M files packaged? HDFS generally does better with large 
aggregate files (like tables), while MarkLogic does better with small, 
individual records (like rows). I can't tell what you're doing or where it's 
going wrong from the sparse detail you've provided below.

> please can some one let me know how do I use the direct access?

Direct access <http://docs.marklogic.com/guide/ingestion/content-pump#id_66213> 
is primarily for reading data out of offline MarkLogic archives. Direct access 
is unrelated to mlcp's distributed mode. The latter allows you to parallelize 
ingestion over hosts, not just threads. On the other hand, using Direct Access, 
mlcp can read forests off of disk without having to go through a running 
MarkLogic instance. Direct Access does nothing for loading data today. The only 
way to create a MarkLogic forest is through a MarkLogic instance.
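To make the distinction concrete, here is a rough sketch of both invocations. The host name, credentials, and paths below are placeholders, not values from your environment, and the exact option set you need will depend on your data and cluster:

```shell
# Hypothetical example -- host, port, credentials, and paths are placeholders.

# Distributed-mode IMPORT: parallelizes ingestion across a Hadoop cluster
# (requires a Hadoop installation; point -hadoop_conf_dir at its conf dir).
mlcp.sh import -mode distributed \
    -hadoop_conf_dir /etc/hadoop/conf \
    -host mlhost.example.com -port 8000 \
    -username admin -password password \
    -input_file_path hdfs://namenode/data/input

# Direct Access EXTRACT: reads documents straight out of an offline forest
# on disk, with no running MarkLogic instance involved. Read-only.
mlcp.sh extract \
    -input_file_path /var/opt/MarkLogic/Forests/example-forest \
    -output_file_path /space/extracted
```

Note that `extract` (the Direct Access path) only gets data *out* of a forest, and the forest must be offline; it is not an ingestion tool.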

Justin


> On Sep 20, 2015, at 7:52 PM, manju rajendran <[email protected]> wrote:
> 
> Hi
> 
> I have a load task of 40 million files from HDFS (Hadoop) to the ML DB
> 
> 
> Architecturally the ML server is configured as a file I/O server, MLCP is 
> loading the data in Distributed mode, and the threads used in the MLCP 
> (shell scripts) did **not** improve performance.
> 
> please can some one let me know how do I use the direct access?
> 
> Thanks
> Manju Rajendran
> 
> _______________________________________________
> General mailing list
> [email protected]
> Manage your subscription at: 
> http://developer.marklogic.com/mailman/listinfo/general

