Hmm... On second thought, that approach won't work, because the input documents are binary and the content module interface only handles XML. I believe this is the first time anyone has asked for binary support.
You could try patching recordloader/xcc/XccModuleContent.java to support this. I don't think XCC can handle setting binary nodes as external variables for a request object. But what should work is to convert the binary node into a Base64-encoded string. Then the module would have to convert it back to binary, of course.

It might be a good idea to extend the content module interface with a new variable too, so that the module knows what the input document-type is. Strictly speaking that isn't necessary because the document-type is fixed for a single invocation of RecordLoader, but it still seems like the right thing to do.

-- Mike

On 19 Apr 2013, at 13:46, Michael Blakeley <[email protected]> wrote:

> It sounds like you are looking for this:
>
> CONTENT_FACTORY_CLASSNAME=com.marklogic.recordloader.xcc.XccModuleContentFactory
> CONTENT_MODULE_URI=my-code-module.xqy
>
> Note that the content module has to conform to a strict interface. See
> http://marklogic.github.io/recordloader/ for details and sample code.
>
> -- Mike
>
> On 19 Apr 2013, at 12:59, Mohanraj Chozhan <[email protected]>
> wrote:
>
>> Thank you very much, it worked brilliantly.
>>
>> I also need to know how to inject the xqy files, so that I can write my
>> business logic to move the documents into an ML directory from RecordLoader.
>>
>>
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of Michael
>> Blakeley
>> Sent: Saturday, April 20, 2013 1:00 AM
>> To: MarkLogic Developer Discussion
>> Subject: Re: [MarkLogic Dev General] Record Loader - not able to load binary
>> files (pdf, images)
>>
>> I asked for *full* logs: you've left out much of the interesting setup and
>> configuration.
>>
>> However there is just enough here to see the problem: INPUT_PATTERN isn't
>> set, so it's using the default value, which only matches *.xml filenames.
>>
>> From http://marklogic.github.io/recordloader/
>>
>> Property        default value          notes
>> INPUT_PATTERN   ^.+\\.[Xx][Mm][Ll]$    Matching pattern (regex) for files
>>                                        found in INPUT_PATH. The default value
>>                                        matches all filenames ending with .xml
>>
>> RecordLoader isn't using your files because it's only looking for filenames
>> that match the regex '^.+\\.[Xx][Mm][Ll]$'. Try something like
>> INPUT_PATTERN=.+\\.(PDF|JPG|pdf|jpg)$ instead.
>>
>> -- Mike
>>
>> On 19 Apr 2013, at 12:24, Mohanraj Chozhan <[email protected]>
>> wrote:
>>
>>> Recordloader logs
>>>
>>> Apr 20, 2013 12:51:06 AM com.marklogic.recordloader.DefaultInputHandler configureInputs
>>> INFO: adding D:/test/
>>> Apr 20, 2013 12:51:06 AM com.marklogic.recordloader.LoaderFactory <init>
>>> INFO: Loader is com.marklogic.recordloader.FileLoader
>>> Apr 20, 2013 12:51:06 AM com.marklogic.recordloader.DefaultInputHandler run
>>> INFO: populating queue
>>> Apr 20, 2013 12:51:06 AM com.marklogic.recordloader.DefaultInputHandler run
>>> INFO: queued 0 loader(s)
>>> Apr 20, 2013 12:51:06 AM com.marklogic.recordloader.Monitor halt
>>> INFO: halting
>>> Apr 20, 2013 12:51:06 AM com.marklogic.recordloader.Monitor run
>>> INFO: loaded 0 records ok (0 B in 0.5075154 s, 0 tps, 0 kB/s), with 0 error(s)
>>>
>>>
>>> In the D:/test directory I have pdf and jpg files, but it shows 0 files
>>> only.
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: [email protected]
>>> [mailto:[email protected]] On Behalf Of Michael
>>> Blakeley
>>> Sent: Saturday, April 20, 2013 12:36 AM
>>> To: MarkLogic Developer Discussion
>>> Subject: Re: [MarkLogic Dev General] Record Loader - not able to load
>>> binary files (pdf, images)
>>>
>>> If you can provide the full RecordLoader logs we may be able to diagnose
>>> the problem.
>>>
>>> One potential problem I see is that your CONNECTION_STRING doesn't seem to
>>> have a username or password.
>>>
>>> -- Mike
>>>
>>> On 19 Apr 2013, at 11:36, Mohanraj Chozhan <[email protected]>
>>> wrote:
>>>
>>>> Which version of MarkLogic are you using?
>>>> We are using ML6.
>>>>
>>>> Which version of RecordLoader?
>>>>
>>>> recordloader.jar - We can't find the version xpp3-1.1.3_8.jar
>>>>
>>>> What's the exact command you're issuing to invoke the JAR?
>>>>
>>>> We are using the code below to execute RecordLoader:
>>>>
>>>> String[] args = { "resources/ recordloader.properties " };
>>>> try {
>>>>     RecordLoader.main(args);
>>>> } catch (Exception e) {
>>>>     throw new MarocMLDBException("Uploading bulk files unseccessful", e);
>>>> }
>>>>
>>>> recordloader.properties
>>>>
>>>> CONNECTION_STRING=xcc://localhost:8100/test
>>>> INPUT_PATH=D:/test
>>>> OUTPUT_COLLECTIONS=wikipedia
>>>> DOCMENT_TYPE=binary
>>>> URI_PREFIX=/FR/
>>>>
>>>> What's the error message you're getting?
>>>> XML files are getting loaded into the ML repository.
>>>>
>>>> Can you provide a working MLCP sample? We tried it and were unable to use it.
>>>>
>>>> That's the reason we have chosen RecordLoader.
>>>>
>>>> It would be more helpful if we could get an MLCP sample to use.
>>>>
>>>> Thanks in advance
>>>>
>>>> Mohanraj
>>>>
>>>> From: [email protected]
>>>> [mailto:[email protected]] On Behalf Of Justin
>>>> Makeig
>>>> Sent: Friday, April 19, 2013 11:49 PM
>>>> To: MarkLogic Developer Discussion
>>>> Subject: Re: [MarkLogic Dev General] Record Loader - not able to
>>>> load binary files (pdf, images)
>>>>
>>>> Which version of MarkLogic are you using? Which version of RecordLoader?
>>>> What's the exact command you're issuing to invoke the JAR? What's the
>>>> error message you're getting? If you're using MarkLogic 6, you might also
>>>> take a look at mlcp
>>>> <http://docs.marklogic.com/guide/ingestion/content-pump>.
>>>>
>>>> Justin
>>>>
>>>> Justin Makeig
>>>> Director, Product Management
>>>> MarkLogic Corporation
>>>> [email protected]
>>>> www.marklogic.com
>>>>
>>>>
>>>>
>>>>
>>>> On Apr 19, 2013, at 11:11 AM, Mohanraj Chozhan
>>>> <[email protected]>
>>>> wrote:
>>>>
>>>>
>>>> Hi,
>>>>
>>>> I am using the ML RecordLoader, but I am unable to load binary files
>>>> (pdf, images) into the ML repository.
>>>>
>>>> In the RecordLoader properties file we added
>>>>
>>>> DOCUMENT_TYPE=binary
>>>>
>>>> But we are still facing the issue when loading them.
>>>>
>>>> Can someone help me out with this?
>>>>
>>>> Regards,
>>>> Mohanraj
>>>> **************** CAUTION - Disclaimer ***************** This e-mail
>>>> contains PRIVILEGED AND CONFIDENTIAL INFORMATION intended solely for
>>>> the use of the addressee(s). If you are not the intended recipient,
>>>> please notify the sender by e-mail and delete the original message.
>>>> Further, you are not to copy, disclose, or distribute this e-mail or
>>>> its contents to any other person and any such actions are unlawful.
>>>> This e-mail may contain viruses. Infosys has taken every reasonable
>>>> precaution to minimize this risk, but is not liable for any damage
>>>> you may sustain as a result of any virus in this e-mail. You should
>>>> carry out your own virus checks before opening the e-mail or attachment.
>>>> Infosys reserves the right to monitor and review the content of all
>>>> messages sent to or from this e-mail address. Messages sent to or from
>>>> this e-mail address may be stored on the Infosys e-mail system.
>>>> ***INFOSYS******** End of Disclaimer ********INFOSYS***
>>>> _______________________________________________
>>>> General mailing list
>>>> [email protected]
>>>> http://developer.marklogic.com/mailman/listinfo/general
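
The Base64 workaround suggested at the top of this thread could look roughly like the sketch below on the Java side. It only shows the encoding step plus a round-trip check. The class name Base64RoundTrip and its methods are hypothetical, it assumes Java 8 or later for java.util.Base64, and it is not taken from XccModuleContent.java. Passing the string to the module through XCC and turning it back into a binary node in XQuery are not shown.

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper, not part of RecordLoader: illustrates the
// "binary -> Base64 string -> binary" round trip discussed in the thread.
final class Base64RoundTrip {

    // Encode raw document bytes as a Base64 string, which could then be
    // handed to the content module as an ordinary string value.
    static String encode(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    // Reverse step, written in Java only to verify the round trip.
    // In the proposal above, the XQuery module would do this conversion.
    static byte[] decode(String encoded) {
        return Base64.getDecoder().decode(encoded);
    }

    public static void main(String[] args) {
        // A few bytes that are not valid text, standing in for a PDF or JPG.
        byte[] raw = {0x25, 0x50, 0x44, 0x46, (byte) 0xFF, 0x00, (byte) 0x80};

        String encoded = encode(raw);
        byte[] restored = decode(encoded);

        System.out.println("encoded: " + encoded);
        System.out.println("round trip ok: " + Arrays.equals(raw, restored));
    }
}
```

Base64 output is about one third larger than the input, so for large binaries the memory cost of holding the whole encoded string is worth keeping in mind.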

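
To see why the default INPUT_PATTERN skipped the pdf and jpg files, the two regular expressions quoted in this thread can be tried against sample filenames. This is a minimal sketch: the class InputPatternCheck and the filenames are made up for illustration, and it assumes the whole filename is matched with Matcher.matches(), which may differ from what RecordLoader does internally. The doubled backslash in the properties file becomes a single backslash once the file is loaded, which is also what the Java string literal "\\." produces.

```java
import java.util.regex.Pattern;

// Hypothetical check, not part of RecordLoader: compares the default
// INPUT_PATTERN with the replacement suggested in the thread.
final class InputPatternCheck {

    // Default value from the RecordLoader documentation quoted above.
    static final Pattern DEFAULT_PATTERN =
            Pattern.compile("^.+\\.[Xx][Mm][Ll]$");

    // Replacement suggested in the thread for PDF and JPG input.
    static final Pattern BINARY_PATTERN =
            Pattern.compile(".+\\.(PDF|JPG|pdf|jpg)$");

    static boolean matchesDefault(String filename) {
        return DEFAULT_PATTERN.matcher(filename).matches();
    }

    static boolean matchesBinary(String filename) {
        return BINARY_PATTERN.matcher(filename).matches();
    }

    public static void main(String[] args) {
        // Sample filenames are assumptions, chosen to cover each case.
        String[] names = {"data.XML", "report.pdf", "photo.JPG", "photo.Jpg"};
        for (String name : names) {
            System.out.println(name
                    + " default=" + matchesDefault(name)
                    + " binary=" + matchesBinary(name));
        }
    }
}
```

The suggested pattern lists only all-uppercase and all-lowercase extensions, so a mixed-case name such as photo.Jpg is not picked up. If that matters, a case-insensitive form such as (?i).+\\.(pdf|jpg)$ is one option.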