Hi,
Uploading PDFs and images with RecordLoader works perfectly in ML5.
When I try the same upload into ML6, it fails with the error "document is
not UTF-8 encoded".
RecordLoader properties file:
CONNECTION_STRING=xcc://test:test@localhost:8010/test
INPUT_STRIP_PREFIX=^[A-Z]:
INPUT_NORMALIZE_PATHS=true
INPUT_PATH=D:/test/
OUTPUT_COLLECTIONS=wikipedia
URI_PREFIX=/
INPUT_PATTERN=.+\\.(PDF|JPG|pdf|jpg|JPEG|jpeg)$
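As a sanity check on the path settings above, here is a small sketch of what the INPUT_STRIP_PREFIX regex matches on its own. It uses plain java.util.regex and does not call RecordLoader; how and when RecordLoader applies the property is not shown here. The class name StripPrefixCheck and the sample paths are made up for illustration.

```java
import java.util.regex.Pattern;

public class StripPrefixCheck {
    // Same regex as INPUT_STRIP_PREFIX in the properties file above.
    static final Pattern STRIP_PREFIX = Pattern.compile("^[A-Z]:");

    // Removes the first match of the prefix regex, if there is one.
    public static String strip(String path) {
        return STRIP_PREFIX.matcher(path).replaceFirst("");
    }

    public static void main(String[] args) {
        // Hypothetical sample paths, not taken from a real run.
        System.out.println(strip("D:/test/extractedFiles/test.jpg"));
        System.out.println(strip("/test/D:\\test\\extractedFiles\\test.jpg"));
    }
}
```

Because the regex is anchored with ^, it only removes a drive letter at the very start of the string. The first path loses its "D:", while the second path, which already has something in front of the drive letter, is printed unchanged.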
Now I am getting this error:
Apr 23, 2013 7:08:50 PM com.marklogic.ps.SimpleLogger logException
SEVERE: com.marklogic.recordloader.LoaderException: /test/D:\test\extractedFiles\test.jpg
com.marklogic.xcc.exceptions.XQueryException: XDMP-DOCUTF8SEQ: Invalid UTF-8 escape sequence at / /test/D:\test\extractedFiles\APP_APP1102.jpg line 1 -- document is not UTF-8 encoded
[Client: XCC/6.0-2, Server: XDBC/6.0-2.2]
    at com.marklogic.xcc.impl.handlers.ServerExceptionHandler.handleResponse(ServerExceptionHandler.java:34)
    at com.marklogic.xcc.impl.handlers.ContentInsertController.serverDialog(ContentInsertController.java:139)
    at com.marklogic.xcc.impl.handlers.AbstractRequestController.runRequest(AbstractRequestController.java:84)
    at com.marklogic.xcc.impl.SessionImpl.insertContent(SessionImpl.java:309)
    at com.marklogic.xcc.impl.SessionImpl.insertContent(SessionImpl.java:274)
    at com.marklogic.xcc.impl.SessionImpl.insertContent(SessionImpl.java:338)
    at com.marklogic.recordloader.xcc.XccContent.insert(XccContent.java:74)
    at com.marklogic.recordloader.AbstractLoader.insert(AbstractLoader.java:326)
    at com.marklogic.recordloader.FileLoader.process(FileLoader.java:60)
    at com.marklogic.recordloader.AbstractLoader.call(AbstractLoader.java:96)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Apr 23, 2013 7:08:50 PM com.marklogic.recordloader.Monitor halt
INFO: halting
Apr 23, 2013 7:08:50 PM com.marklogic.recordloader.Monitor run
INFO: loaded 1 records ok (15831 B in 1.868895839 s, 1 tps, 8 kB/s), with 0 error(s)
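The "document is not UTF-8 encoded" message would be consistent with the JPEG being read as text rather than as a binary document, although the log alone does not prove that. The sketch below only illustrates why such a read must fail, using the JDK's strict UTF-8 decoder: JPEG files start with the bytes FF D8 FF, and the byte 0xFF can never appear in valid UTF-8. It does not use XCC or RecordLoader; the class name Utf8Check and the sample byte arrays are assumptions for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

public class Utf8Check {
    private static final Charset UTF8 = Charset.forName("UTF-8");

    // Returns true only if the bytes decode as strict UTF-8.
    public static boolean isValidUtf8(byte[] data) {
        CharsetDecoder decoder = UTF8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // First four bytes of a typical JPEG (JFIF) file: FF D8 FF E0.
        byte[] jpegHeader = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0};
        byte[] xmlText = "<doc>ok</doc>".getBytes(UTF8);

        System.out.println("JPEG header valid UTF-8? " + isValidUtf8(jpegHeader));
        System.out.println("XML text valid UTF-8? " + isValidUtf8(xmlText));
    }
}
```

The JPEG header is reported as invalid UTF-8 and the XML text as valid, which matches the symptom that XML files load while images do not.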
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Michael Blakeley
Sent: Saturday, April 20, 2013 1:00 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Record Loader - not able to load binary
files (pdf, images)
I asked for *full* logs: you've left out much of the interesting setup and
configuration.
However, there is just enough here to see the problem: INPUT_PATTERN isn't set,
so it's using the default value, which only matches *.xml filenames. From
http://marklogic.github.io/recordloader/:
Property:       INPUT_PATTERN
Default value:  ^.+\\.[Xx][Mm][Ll]$
Notes:          Matching pattern (regex) for files found in INPUT_PATH.
                The default value matches all filenames ending with .xml
RecordLoader isn't using your files because it's only looking for filenames
that match the regex '^.+\\.[Xx][Mm][Ll]$'. Try something like
INPUT_PATTERN=.+\\.(PDF|JPG|pdf|jpg)$ instead.
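To make the difference concrete, here is a minimal sketch that checks a few file names against the default pattern and the suggested one. It assumes the properties file is read with java.util.Properties, so the doubled backslash in `\\.` reaches the regex engine as `\.`. The class name InputPatternCheck and the sample file names are made up, and this is not RecordLoader's own matching code.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;
import java.util.regex.Pattern;

public class InputPatternCheck {
    // Reads INPUT_PATTERN from properties-file text and compiles it as a regex.
    public static Pattern loadPattern(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            // Not expected when reading from an in-memory string.
            throw new IllegalStateException(e);
        }
        return Pattern.compile(props.getProperty("INPUT_PATTERN"));
    }

    public static void main(String[] args) {
        // Java string literals double the backslashes once more; the text the
        // loader sees is exactly what would be typed in the properties file.
        Pattern defaults = loadPattern("INPUT_PATTERN=^.+\\\\.[Xx][Mm][Ll]$");
        Pattern suggested = loadPattern("INPUT_PATTERN=.+\\\\.(PDF|JPG|pdf|jpg)$");

        // Hypothetical file names for illustration.
        String[] names = {"a.xml", "report.pdf", "photo.JPG", "notes.txt"};
        for (String name : names) {
            System.out.println(name
                    + " default=" + defaults.matcher(name).matches()
                    + " suggested=" + suggested.matcher(name).matches());
        }
    }
}
```

With these sample names, only a.xml matches the default pattern, while report.pdf and photo.JPG match the suggested one and notes.txt matches neither.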
-- Mike
On 19 Apr 2013, at 12:24 , Mohanraj Chozhan
<[email protected]<mailto:[email protected]>> wrote:
> Recordloader logs
>
> Apr 20, 2013 12:51:06 AM com.marklogic.recordloader.DefaultInputHandler configureInputs
> INFO: adding D:/test/
> Apr 20, 2013 12:51:06 AM com.marklogic.recordloader.LoaderFactory <init>
> INFO: Loader is com.marklogic.recordloader.FileLoader
> Apr 20, 2013 12:51:06 AM com.marklogic.recordloader.DefaultInputHandler run
> INFO: populating queue
> Apr 20, 2013 12:51:06 AM com.marklogic.recordloader.DefaultInputHandler run
> INFO: queued 0 loader(s)
> Apr 20, 2013 12:51:06 AM com.marklogic.recordloader.Monitor halt
> INFO: halting
> Apr 20, 2013 12:51:06 AM com.marklogic.recordloader.Monitor run
> INFO: loaded 0 records ok (0 B in 0.5075154 s, 0 tps, 0 kB/s), with 0 error(s)
>
>
> The D:/test directory contains PDF and JPG files, but the log shows 0
> files loaded.
>
>
>
> -----Original Message-----
> From:
> [email protected]<mailto:[email protected]>
> [mailto:[email protected]] On Behalf Of Michael
> Blakeley
> Sent: Saturday, April 20, 2013 12:36 AM
> To: MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] Record Loader - not able to load
> binary files (pdf, images)
>
> If you can provide the full RecordLoader logs we may be able to diagnose the
> problem.
>
> One potential problem I see is that your CONNECTION_STRING doesn't seem to
> have a username or password.
>
> -- Mike
>
> On 19 Apr 2013, at 11:36 , Mohanraj Chozhan
> <[email protected]<mailto:[email protected]>> wrote:
>
> > Which version of MarkLogic are you using?
> > We are using ML6.
> >
> > Which version of RecordLoader?
> >
> > recordloader.jar (we can't find its version), xpp3-1.1.3_8.jar
> >
> > What's the exact command you're issuing to invoke the JAR?
> >
> > We are using the code below to run RecordLoader:
> >
> > // Path to the properties file, passed as the only argument.
> > String[] args = { "resources/recordloader.properties" };
> > try {
> >     RecordLoader.main(args);
> > } catch (Exception e) {
> >     throw new MarocMLDBException("Uploading bulk files unsuccessful", e);
> > }
> >
> > recordloader.properties
> >
> > CONNECTION_STRING=xcc://localhost:8100/test
> > INPUT_PATH=D:/test
> > OUTPUT_COLLECTIONS=wikipedia
> > DOCMENT_TYPE=binary
> > URI_PREFIX=/FR/
> >
> > What's the error message you're getting?
> > XML files are getting loaded into the ML repository.
> >
> > Can you provide a working MLCP sample? We tried MLCP and were unable to use it.
> >
> > That is the reason we chose RecordLoader.
> >
> > It would be very helpful to get an MLCP sample to use.
> >
> > Thanks in advance
> >
> > Mohanraj
> >
> > From:
> > [email protected]<mailto:[email protected]>
> > [mailto:[email protected]] On Behalf Of Justin
> > Makeig
> > Sent: Friday, April 19, 2013 11:49 PM
> > To: MarkLogic Developer Discussion
> > Subject: Re: [MarkLogic Dev General] Record Loader - not able to
> > load binary files (pdf, images)
> >
> > Which version of MarkLogic are you using? Which version of RecordLoader?
> > What's the exact command you're issuing to invoke the JAR? What's the error
> > message you're getting? If you're using MarkLogic 6, you might also take a
> > look at mlcp <http://docs.marklogic.com/guide/ingestion/content-pump>.
> >
> > Justin
> >
> > Justin Makeig
> > Director, Product Management
> > MarkLogic Corporation
> > [email protected]<mailto:[email protected]>
> > www.marklogic.com<http://www.marklogic.com>
> >
> >
> >
> >
> > On Apr 19, 2013, at 11:11 AM, Mohanraj Chozhan
> > <[email protected]<mailto:[email protected]>>
> > wrote:
> >
> >
> > Hi,
> >
> > I am using the MarkLogic RecordLoader, but I am unable to load binary files
> > (PDFs, images) into the ML repository.
> >
> > In the RecordLoader properties file we added
> >
> > DOCUMENT_TYPE=binary
> >
> > but we still cannot load them.
> >
> > Can someone help me out with this?
> >
> > Regards,
> > Mohanraj
> > _______________________________________________
> > General mailing list
> > [email protected]<mailto:[email protected]>
> > http://developer.marklogic.com/mailman/listinfo/general
> >