xmlsh can do this easily with fixed2xml + xsplit + the MarkLogic module (or you can feed the output of any stage to your favorite ML loading tool, such as RecordLoader or mlcp). If run entirely within xmlsh, it all runs in the same JVM as one process, and hence is generally faster.
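
To make the idea concrete, here is a rough Python sketch of what a "fixed-width to XML, then one document per record" pipeline amounts to. It is not xmlsh and not how fixed2xml or xsplit are implemented; the field names, column offsets, and the /records/N.xml URIs are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical record layout: (field name, start column, width).
# These names and widths are assumptions, not taken from the thread.
FIELDS = [("id", 0, 4), ("name", 4, 10), ("amount", 14, 6)]


def record_to_xml(line, fields=FIELDS):
    """Slice one fixed-width line into a <record> element, one child per field."""
    record = ET.Element("record")
    for name, start, width in fields:
        child = ET.SubElement(record, name)
        # Fixed-width fields are usually space padded, so trim the padding.
        child.text = line[start:start + width].strip()
    return record


def split_records(text, fields=FIELDS):
    """Yield one serialized XML document per non-empty input line."""
    for line in text.splitlines():
        if line.strip():
            yield ET.tostring(record_to_xml(line, fields), encoding="unicode")


if __name__ == "__main__":
    sample = (
        "0001" + "Alice".ljust(10) + "001250\n"
        + "0002" + "Bob".ljust(10) + "000075\n"
    )
    for n, doc in enumerate(split_records(sample), start=1):
        # The URI scheme here is purely illustrative.
        print(f"/records/{n}.xml", doc)
```

Each string it yields could be written to a temp file and handed to whichever loader you prefer.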
fixed2xml - parses fixed-width field files to XML
http://www.xmlsh.org/CommandFixed2xml

xsplit - splits a large XML file into individual files
http://www.xmlsh.org/CommandXsplit

MarkLogic module put command
http://www.xmlsh.org/MarkLogicPut

At the bottom of the put command page is an example of how to stream without making temp files ... but that's an advanced topic. It's a tad easier to make the temp files, then do a bulk put and clean them up afterwards.

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Michael Blakeley
Sent: Tuesday, June 17, 2014 11:45 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Loading files with fixed length fields

I'm not sure what's going on with InfoStudio, but you ought to be able to do that with http://marklogic.github.io/recordloader/ and a content module. The documentation for CONTENT_FACTORY_CLASSNAME explains how that interface works, and provides sample code that you could modify.

Or you could write a simple HTTP endpoint and POST your file to it. That could be a REST extension, or not.

-- Mike

On 17 Jun 2014, at 04:59 , Ed Outhwaite <[email protected]> wrote:

> Hi,
>
> I'm loading some text files that have fixed length fields via Information
> Studio in MarkLogic 7.
>
> It uses an XQuery transform that should split the records via calls to
> fn:substring and generate a document for each row via xdmp:document-insert,
> however the only documents that are being inserted are the original text
> files.
>
> Running a slightly different version of the XQuery script - replacing:
>
> let $doc := fn:doc( $cpf:document-uri )
> for $inline in fn:tokenize( $doc , "\n")
>
> with:
>
> for $inline in fn:tokenize( doc("/BS1.txt") , "\n")
>
> splits the loaded file and inserts the rows as documents correctly.
>
> Am I somehow falling over the "action modules should only modify the document
> being processed" guideline ?
>
> ...and is there a better way to handle this type of data source ?
>
> Thanks,
> Ed
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general
