If I use Content Pump, will I be able to avoid this approach?

I ask because, per our requirement, we don't know in advance which file types 
are present in the zip files.

I understand the RecordLoader parameters approach, with which we can work 
around this and load all XML, PDF and image files.



-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of Michael Blakeley
Sent: Thursday, April 25, 2013 8:58 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Record Loader - not able to load binary 
files (pdf, images)

When you set DOCUMENT_FORMAT=binary, you are telling RecordLoader to load all 
the documents as BLOBs. Those are binary nodes, with no structure for XPath to 
work on. See http://developer.marklogic.com/blog/document-formats-part1 for 
more about MarkLogic's ability to store and work with documents in XML, binary, 
and text formats.

RecordLoader does not support multiple document types in the same operation. So 
run it multiple times. Each run can set different INPUT_PATTERN, 
DOCUMENT_FORMAT, and any other parameters.

-- Mike
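
As a rough illustration of the multi-run approach described above, the following Java sketch sorts file names into one group per run, pairing each DOCUMENT_FORMAT with its own INPUT_PATTERN. The class name RunPlanner, the helper formatFor, and the sample file names are hypothetical; only the property names and the two patterns come from this thread. The sketch does not invoke RecordLoader, and it assumes the pattern has to match the whole file name.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class RunPlanner {
    // One entry per planned run: DOCUMENT_FORMAT -> INPUT_PATTERN.
    // Both patterns are the ones quoted in this thread.
    static final Map<String, Pattern> RUNS = new LinkedHashMap<>();
    static {
        RUNS.put("xml", Pattern.compile("^.+\\.[Xx][Mm][Ll]$"));
        RUNS.put("binary", Pattern.compile(".+\\.(PDF|JPG|pdf|jpg)$"));
    }

    // Returns the DOCUMENT_FORMAT of the first run whose pattern matches the
    // whole file name, or null if no planned run would pick the file up.
    static String formatFor(String fileName) {
        for (Map.Entry<String, Pattern> run : RUNS.entrySet()) {
            if (run.getValue().matcher(fileName).matches()) {
                return run.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String[] samples = { "article.xml", "scan.PDF", "photo.jpg", "notes.txt" };
        for (String name : samples) {
            System.out.println(name + " -> " + formatFor(name));
        }
    }
}
```

A file that no run claims (here notes.txt, or a mixed-case extension such as .Jpg) comes back as null, which is a quick way to spot file types in an unpacked zip that still need a pattern of their own.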

On 25 Apr 2013, at 06:36 , Mohanraj Chozhan <[email protected]> 
wrote:

> Hi,
> 
> I have some XML, JPEG and PDF files to load into a MarkLogic database. Also, 
> once the XML documents are loaded into the database I need to 
> perform some XPath operations. I have set DOCUMENT_TYPE=binary and 
> INPUT_PATTERN=.+\\.(PDF|JPG|pdf|jpg|[Xx][Mm][Ll])$. The problem of 
> loading into the database is solved, but I am unable to perform XPath 
> operations. Your mail below confirms that. If I use Content Pump, will I be 
> able to overcome this problem? Or is the only way to achieve this to use the 
> approach you have mentioned below?
> 
> Regards,
> Mohanraj
> 
> 
> -----Original Message-----
> From: [email protected] 
> [mailto:[email protected]] On Behalf Of Michael 
> Blakeley
> Sent: Saturday, April 20, 2013 2:51 AM
> To: MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] Record Loader - not able to load 
> binary files (pdf, images)
> 
> Hmm.... On second thought that approach won't work, because the input 
> documents are binary. The content module interface only handles XML. I 
> believe this is the first time anyone has asked for binary support.
> 
> You could try patching recordloader/xcc/XccModuleContent.java to support 
> this. I don't think XCC can handle setting binary nodes as external variables 
> for a request object. But what should work is to convert the binary node into 
> a Base64-encoded string. Then the module would have to convert it back to 
> binary, of course.
> 
> It might be a good idea to extend the content module interface with a new 
> variable too, so that the module knows what the input document-type is. 
> Strictly speaking that isn't necessary because the document-type is fixed for 
> a single invocation of RecordLoader, but it still seems like the right thing 
> to do.
> 
> -- Mike
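
The Base64 idea above can be sketched on the Java side as a plain round trip. This is only an illustration, assuming Java 8 or later for java.util.Base64; the class and method names are hypothetical, it does not touch XccModuleContent.java or XCC, and the XQuery half of the conversion is not shown.

```java
import java.util.Arrays;
import java.util.Base64;

public class Base64RoundTrip {
    // Encode raw bytes as a Base64 string, so that binary content can travel
    // as an ordinary string value.
    static String encode(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    // Decode a Base64 string back into the original bytes.
    static byte[] decode(String encoded) {
        return Base64.getDecoder().decode(encoded);
    }

    public static void main(String[] args) {
        // "%PDF" followed by two bytes that are not printable text.
        byte[] original = { 0x25, 0x50, 0x44, 0x46, (byte) 0xFF, 0x00 };
        String encoded = encode(original);
        byte[] restored = decode(encoded);
        System.out.println(encoded);
        System.out.println(Arrays.equals(original, restored));
    }
}
```

Base64 grows the payload by roughly a third, which may matter for large PDFs or images.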
> 
> On 19 Apr 2013, at 13:46 , Michael Blakeley <[email protected]> wrote:
> 
>> It sounds like you are looking for this:
>> 
>> CONTENT_FACTORY_CLASSNAME=com.marklogic.recordloader.xcc.XccModuleContentFactory
>> CONTENT_MODULE_URI=my-code-module.xqy
>> 
>> Note that the content module has to conform to a strict interface. See 
>> http://marklogic.github.io/recordloader/ for details and sample code.
>> 
>> -- Mike
>> 
>> On 19 Apr 2013, at 12:59 , Mohanraj Chozhan <[email protected]> 
>> wrote:
>> 
>>> Thank you very much, it worked brilliantly.
>>> 
>>> I also need to know how to plug in .xqy files, so that I can write my 
>>> business logic to move the documents into an ML directory from RecordLoader.
>>> 
>>> 
>>> -----Original Message-----
>>> From: [email protected]
>>> [mailto:[email protected]] On Behalf Of 
>>> Michael Blakeley
>>> Sent: Saturday, April 20, 2013 1:00 AM
>>> To: MarkLogic Developer Discussion
>>> Subject: Re: [MarkLogic Dev General] Record Loader - not able to 
>>> load binary files (pdf, images)
>>> 
>>> I asked for *full* logs: you've left out much of the interesting setup and 
>>> configuration.
>>> 
>>> However, there is just enough here to see the problem: INPUT_PATTERN 
>>> isn't set, so it's using the default value, which only matches 
>>> *.xml filenames. From http://marklogic.github.io/recordloader/
>>> 
>>> Property: INPUT_PATTERN
>>> Default value: ^.+\\.[Xx][Mm][Ll]$
>>> Notes: Matching pattern (regex) for files found in INPUT_PATH. The default 
>>> value matches all filenames ending with .xml
>>> 
>>> RecordLoader isn't using your files because it's only looking for filenames 
>>> that match the regex '^.+\\.[Xx][Mm][Ll]$'. Try something like 
>>> INPUT_PATTERN=.+\\.(PDF|JPG|pdf|jpg)$ instead.
>>> 
>>> -- Mike
>>> 
>>> On 19 Apr 2013, at 12:24 , Mohanraj Chozhan <[email protected]> 
>>> wrote:
>>> 
>>>> Recordloader logs
>>>> 
>>>> Apr 20, 2013 12:51:06 AM com.marklogic.recordloader.DefaultInputHandler configureInputs
>>>> INFO: adding D:/test/
>>>> Apr 20, 2013 12:51:06 AM com.marklogic.recordloader.LoaderFactory <init>
>>>> INFO: Loader is com.marklogic.recordloader.FileLoader
>>>> Apr 20, 2013 12:51:06 AM com.marklogic.recordloader.DefaultInputHandler run
>>>> INFO: populating queue
>>>> Apr 20, 2013 12:51:06 AM com.marklogic.recordloader.DefaultInputHandler run
>>>> INFO: queued 0 loader(s)
>>>> Apr 20, 2013 12:51:06 AM com.marklogic.recordloader.Monitor halt
>>>> INFO: halting
>>>> Apr 20, 2013 12:51:06 AM com.marklogic.recordloader.Monitor run
>>>> INFO: loaded 0 records ok (0 B in 0.5075154 s, 0 tps, 0 kB/s), with 0 error(s)
>>>> 
>>>> 
>>>> In the D:/test directory I have PDF and JPG files, but the log shows 
>>>> 0 records loaded.
>>>> 
>>>> 
>>>> 
>>>> -----Original Message-----
>>>> From: [email protected]
>>>> [mailto:[email protected]] On Behalf Of 
>>>> Michael Blakeley
>>>> Sent: Saturday, April 20, 2013 12:36 AM
>>>> To: MarkLogic Developer Discussion
>>>> Subject: Re: [MarkLogic Dev General] Record Loader - not able to 
>>>> load binary files (pdf, images)
>>>> 
>>>> If you can provide the full RecordLoader logs we may be able to diagnose 
>>>> the problem.
>>>> 
>>>> One potential problem I see is that your CONNECTION_STRING doesn't seem to 
>>>> have a username or password.
>>>> 
>>>> -- Mike
>>>> 
>>>> On 19 Apr 2013, at 11:36 , Mohanraj Chozhan <[email protected]> 
>>>> wrote:
>>>> 
>>>>> Which version of MarkLogic are you using?
>>>>> We are using ML6.
>>>>> 
>>>>> Which version of RecordLoader?
>>>>> 
>>>>> recordloader.jar - We can't find the version xpp3-1.1.3_8.jar
>>>>> 
>>>>> What's the exact command you're issuing to invoke the JAR?
>>>>> 
>>>>> We are using the below code to execute the recordloader
>>>>> 
>>>>> String[] args = { "resources/recordloader.properties" };
>>>>> try {
>>>>>     RecordLoader.main(args);
>>>>> } catch (Exception e) {
>>>>>     throw new MarocMLDBException("Uploading bulk files unsuccessful", e);
>>>>> }
>>>>> recordloader.properties
>>>>> 
>>>>> CONNECTION_STRING=xcc://localhost:8100/test
>>>>> INPUT_PATH=D:/test
>>>>> OUTPUT_COLLECTIONS=wikipedia
>>>>> DOCMENT_TYPE=binary
>>>>> URI_PREFIX=/FR/
>>>>> 
>>>>> What's the error message you're getting?
>>>>> XML files are getting loaded into the ML repository.
>>>>> 
>>>>> Can you provide a working MLCP sample? We tried it and were unable to use it.
>>>>> 
>>>>> That's the reason we chose RecordLoader.
>>>>> 
>>>>> It would be more helpful if we could get an MLCP sample to use.
>>>>> 
>>>>> Thanks in advance
>>>>> 
>>>>> Mohanraj
>>>>> 
>>>>> From: [email protected]
>>>>> [mailto:[email protected]] On Behalf Of 
>>>>> Justin Makeig
>>>>> Sent: Friday, April 19, 2013 11:49 PM
>>>>> To: MarkLogic Developer Discussion
>>>>> Subject: Re: [MarkLogic Dev General] Record Loader - not able to 
>>>>> load binary files (pdf, images)
>>>>> 
>>>>> Which version of MarkLogic are you using? Which version of RecordLoader? 
>>>>> What's the exact command you're issuing to invoke the JAR? What's the 
>>>>> error message you're getting? If you're using MarkLogic 6, you might also 
>>>>> take a look at mlcp 
>>>>> <http://docs.marklogic.com/guide/ingestion/content-pump>.
>>>>> 
>>>>> Justin
>>>>> 
>>>>> Justin Makeig
>>>>> Director, Product Management
>>>>> MarkLogic Corporation
>>>>> [email protected]
>>>>> www.marklogic.com
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> On Apr 19, 2013, at 11:11 AM, Mohanraj Chozhan 
>>>>> <[email protected]>
>>>>> wrote:
>>>>> 
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> I am using the ML RecordLoader, but I am unable to load binary files 
>>>>> (PDF, images) into the ML repository.
>>>>> 
>>>>> In the RecordLoader properties file we added
>>>>> 
>>>>> DOCUMENT_TYPE=binary
>>>>> 
>>>>> but we still face the issue when loading them.
>>>>> 
>>>>> Can someone help me out with this?
>>>>> 
>>>>> Regards,
>>>>> Mohanraj
>>>>> _______________________________________________
>>>>> General mailing list
>>>>> [email protected]
>>>>> http://developer.marklogic.com/mailman/listinfo/general