That's very good input, thank you!
-----------------------------------------------------------------------------
David Lee
Lead Engineer
MarkLogic Corporation
[email protected]
Phone: +1 812-482-5224
Cell: +1 812-630-7622
www.marklogic.com

From: [email protected] On Behalf Of Jakob Fix
Sent: Wednesday, June 12, 2013 7:06 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] problem with xdmp:filesystem-file on Windows

David,

I think it would be sufficient to say that filesystem-file is for text files only. The remark on encoding could be a hint, but it's just that, a hint.

Today, there is one line in the summary that mentions, very generically, "reads a file":

    Reads a file from the filesystem. The file at the specified path must be UTF-8 encoded.

Specify here that it's about text files. Add a mention of external-binary to use for reading binary files. Given the very different names (certainly justified historically), it's not easy to make that connection.

Oh, and while we're on documentation, why not document filesystem-directory-create?

cheers,
Jakob.

On Thu, Jun 13, 2013 at 12:54 AM, David Lee <[email protected]> wrote:

Curious, from a user perspective, what additional information would you like from the MarkLogic documentation?

From a developer perspective (buried in the trees, so I don't see the forest) it seems obvious to me ... but clearly not to everyone.

In the docs:
xdmp:filesystem-file clearly states it only works on text files in UTF-8 encoding.
xdmp:external-binary clearly states it's for binary files.

PDF is a binary file type. In fact, one should know that unless you *know for sure* that your file is UTF-8 text, it should be treated as binary.

Without listing every possible file type in existence, or diverting to the book it might take to explain the differences among file types, encodings, text and binary, Unicode and UTF representations, etc., both now and in the imaginable future, ...
what more would you like to see in the documentation?

From: [email protected] On Behalf Of Jakob Fix
Sent: Wednesday, June 12, 2013 6:46 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] problem with xdmp:filesystem-file on Windows

Thanks David,

I didn't see the comment at the bottom of the blog post I was referencing (my bad). We can confirm that this works as expected.

I have added a Disqus comment to the ML6 documentation for xdmp:filesystem-file to that effect. It would be great if this information could find its way into the official documentation.

cheers,
Jakob.

On Wed, Jun 12, 2013 at 6:19 PM, David Lee <[email protected]> wrote:

You need to read the file as binary, not text. The link is correct. Use xdmp:external-binary to read binary files:
https://docs.marklogic.com/xdmp:external-binary

From: [email protected] On Behalf Of Jakob Fix
Sent: Wednesday, June 12, 2013 12:17 PM
To: General Mark Logic Developer Discussion
Subject: [MarkLogic Dev General] problem with xdmp:filesystem-file on Windows

Hi,

we're encountering a problem reading a PDF file via xdmp:filesystem-file on a Windows 2008 server (EA2).
The problem also exists for ML 6 and is explained here:
http://learnxquery.blogspot.fr/2012/08/not-possible-to-read-pdf-from-file.html

We can read other files, like HTML or Excel, but not PDFs (we tried several sizes and PDF versions). The file seems to be truncated (we receive only about 2 KB, or even nothing).

Stack trace in qconsole:

[1.0-ml] XDMP-READFILE: for $r in $results -- ReadFile File is not in UTF-8: C:/Applications/kappav3/backend/kv3-jfix-contents-library/f-78/51758045038371473.pdf
Stack Trace
In /qconsole/endpoints/evaler.xqy on line 276
In local:format-eval-result(xdmp:filesystem-file("C:/Applications/kappav3/backend/kv3-jfix-contents-library/f-78/51758045038371473.pdf"))
$results := xdmp:filesystem-file("C:/Applications/kappav3/backend/kv3-jfix-contents-library/f-78/51758045038371473.pdf")

Thanks,
Jakob.

_______________________________________________
General mailing list
[email protected]
http://developer.marklogic.com/mailman/listinfo/general
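The rule of thumb from this thread (treat a file as binary unless you know it is UTF-8 text) can be illustrated outside MarkLogic. The Python snippet below is only a sketch and not MarkLogic code: the function names `is_utf8_text` and `read_text_or_binary` are made up for this example, and the sample bytes assume a typical PDF header followed by a comment line of high-bit bytes. Inside MarkLogic the advice given above stands: use xdmp:external-binary for files such as PDFs.

```python
from pathlib import Path
from typing import Union


def is_utf8_text(data: bytes) -> bool:
    """Return True only if the bytes decode cleanly as strict UTF-8."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def read_text_or_binary(path: Union[str, Path]) -> Union[str, bytes]:
    """Return a str for valid UTF-8 content, the raw bytes otherwise."""
    data = Path(path).read_bytes()
    return data.decode("utf-8") if is_utf8_text(data) else data


if __name__ == "__main__":
    # Assumed sample: a PDF-style header plus a comment line of bytes >= 0x80.
    pdf_like = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"
    print(is_utf8_text("plain text with an accent: \u00e9".encode("utf-8")))
    print(is_utf8_text(pdf_like))
```

A clean decode does not prove that a file is text, since a binary file that happens to contain only valid UTF-8 byte sequences would also pass. The check can only rule files out, which is why binary is the safer default when the type is unknown.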
