I'm curious: from a user's perspective, what additional information would you
like to see in the MarkLogic documentation?
From a developer's perspective (buried in the trees, so I don't see the forest)
it seems obvious to me ... but clearly not to everyone.
In the docs:
xdmp:filesystem-file clearly states that it only works on text files in UTF-8
encoding.
xdmp:external-binary clearly states that it is for binary files.

PDF is a binary file format. In fact, unless you *know for sure* that your file
is UTF-8 text, it should be treated as binary.
Without listing every possible file type in existence, or digressing into the
book it would take to explain the differences between file types, encodings,
text versus binary, Unicode and the UTF representations, etc., both now and in
any imaginable future ... what more would you like to see in the documentation?
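As a rough illustration of that rule of thumb, here is a minimal Python sketch. It is not MarkLogic code, and `is_valid_utf8` and `choose_read_mode` are hypothetical names. It reads the raw bytes and picks the text path only when they decode cleanly as UTF-8. Passing the check only proves that the bytes are well-formed UTF-8. A binary file that happens to be pure ASCII would also pass, so treat this as a safety net, not a substitute for knowing your data.

```python
from pathlib import Path


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")  # strict decoding is the default
    except UnicodeDecodeError:
        return False
    return True


def choose_read_mode(path: Path) -> str:
    """Return "text" only when the file is well-formed UTF-8, else "binary".

    Hypothetical helper: anything that fails the check (PDF, images,
    text in other encodings, ...) falls back to "binary".
    """
    data = path.read_bytes()  # read raw bytes; never decode up front
    return "text" if is_valid_utf8(data) else "binary"
```

A typical PDF comes back as "binary", which in MarkLogic terms points to xdmp:external-binary rather than xdmp:filesystem-file.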



-----------------------------------------------------------------------------
David Lee
Lead Engineer
MarkLogic Corporation
[email protected]
Phone: +1 812-482-5224
Cell:  +1 812-630-7622
www.marklogic.com<http://www.marklogic.com/>


From: [email protected] [mailto:[email protected]] On Behalf Of Jakob Fix
Sent: Wednesday, June 12, 2013 6:46 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] problem with xdmp:filesystem-file on Windows

Thanks David, I didn't see the comment at the bottom of the blog post I was
referencing (my bad). We can confirm that this works as expected.
I have added a Disqus comment to the ML6 documentation for xdmp:filesystem-file
to that effect. It would be great if this information could find its way into
the official documentation.

cheers,
Jakob.

On Wed, Jun 12, 2013 at 6:19 PM, David Lee <[email protected]> wrote:
You need to read the file as binary, not text. The link is correct.
Use xdmp:external-binary to read binary files:

https://docs.marklogic.com/xdmp:external-binary


From: [email protected] [mailto:[email protected]] On Behalf Of Jakob Fix
Sent: Wednesday, June 12, 2013 12:17 PM
To: General Mark Logic Developer Discussion
Subject: [MarkLogic Dev General] problem with xdmp:filesystem-file on Windows

Hi,

we're encountering a problem reading a PDF file via xdmp:filesystem-file on a 
Windows 2008 server (EA2).

The problem also exists in ML 6 and is explained here:

http://learnxquery.blogspot.fr/2012/08/not-possible-to-read-pdf-from-file.html

We can read other files, like HTML or Excel, but not PDFs (we tried several
sizes and PDF versions). The file seems to be truncated (we receive only about
2 KB, or even nothing).

Stack trace in qconsole:

[1.0-ml] XDMP-READFILE: for $r in $results -- ReadFile File is not in UTF-8: C:/Applications/kappav3/backend/kv3-jfix-contents-library/f-78/51758045038371473.pdf
Stack Trace
In /qconsole/endpoints/evaler.xqy on line 276
In local:format-eval-result(xdmp:filesystem-file("C:/Applications/kappav3/backend/kv3-jfix-contents-library/f-78/51758045038371473.pdf"))
$results := xdmp:filesystem-file("C:/Applications/kappav3/backend/kv3-jfix-contents-library/f-78/51758045038371473.pdf")

Thanks,
Jakob.
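The XDMP-READFILE "File is not in UTF-8" error above is the text reader rejecting bytes that are not well-formed UTF-8. Outside MarkLogic, a quick way to see where a file stops being valid UTF-8 is to decode it strictly and report the offending offset. The Python sketch below is only a diagnostic aid, and `first_invalid_utf8_offset` and `describe` are hypothetical names. The sample bytes in the note that follows are an assumed PDF header. PDF files that contain binary data conventionally place a comment line of high-bit bytes right after `%PDF-1.x`, so that tools treat the file as binary.

```python
from pathlib import Path
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the index of the first byte that could not be decoded.
        return err.start
    return None


def describe(path: Path) -> str:
    """Hypothetical helper: one-line verdict on whether a file is UTF-8 text."""
    offset = first_invalid_utf8_offset(path.read_bytes())
    if offset is None:
        return f"{path}: valid UTF-8, could be read as text"
    return f"{path}: not UTF-8 (first bad byte at offset {offset}); read it as binary"
```

For the sample header `b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"`, the reported offset is 10. That is the first high-bit byte after the header line, so a strict UTF-8 reader fails almost immediately on such a file.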

_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general