Where is your bottleneck? You're going to need to provide more information about the specific problem you're seeing and what your infrastructure looks like. How are the 40M files packaged? HDFS generally does better with large aggregate files (like tables), while MarkLogic does better with small, individual records (like rows). I can't tell what you're doing or where it's going wrong from the sparse detail you've provided below.
> please can some one let me know how do I use the direct access?

Direct Access <http://docs.marklogic.com/guide/ingestion/content-pump#id_66213> is primarily for reading data out of offline MarkLogic archives. It is unrelated to mlcp's distributed mode: the latter lets you parallelize ingestion across hosts, not just threads, while Direct Access lets mlcp read forests straight off disk without going through a running MarkLogic instance. Direct Access does nothing for loading data today; the only way to create a MarkLogic forest is through a MarkLogic instance.

Justin

> On Sep 20, 2015, at 7:52 PM, manju rajendran <[email protected]> wrote:
>
> Hi
>
> I have a load task of 40 million files from HDFS (Hadoop) to the ML DB.
>
> Architecturally the ML server is configured as a file I/O server. MLCP
> is loading the data in distributed mode; the threads used in the MLCP
> (shell scripts) did **not** improve performance.
>
> please can some one let me know how do I use the direct access?
>
> Thanks
> Manju Rajendran
>
> _______________________________________________
> General mailing list
> [email protected]
> Manage your subscription at:
> http://developer.marklogic.com/mailman/listinfo/general
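To make the distinction above concrete, here is a rough command-line sketch of the two modes. This is not a recipe for your setup: every host name, port, path, and credential below is a placeholder, and distributed mode requires mlcp to be pointed at a working Hadoop configuration.

```shell
# Loading: "mlcp import" in distributed mode submits a Hadoop job, so the
# work is spread across the cluster's hosts rather than just local threads.
# (Placeholder hosts/paths/credentials throughout.)
mlcp.sh import -mode distributed \
    -hadoop_conf_dir /etc/hadoop/conf \
    -host mlhost.example.com -port 8000 \
    -username admin -password secret \
    -input_file_path hdfs://namenode/data/files \
    -output_uri_prefix /loaded/

# Reading: "mlcp extract" uses Direct Access to pull documents out of an
# offline forest's on-disk data -- no running MarkLogic instance involved.
mlcp.sh extract -mode local \
    -input_file_path /var/opt/MarkLogic/Forests/my-forest \
    -output_file_path /tmp/extracted
```

Note the asymmetry: import always talks to a live MarkLogic host (that is the only way a forest gets created), whereas extract can work from the forest files alone.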
