Hi all,

Christian: I completely agree, CSV is a nightmare. One way to reduce the headaches (in, say, developing an EXPath CSV library) might be to require that CSV pass validation by a tool such as http://digital-preservation.github.io/csv-validator/. Adam Retter presented his work on CSV Schema and CSV Validator at http://slides.com/adamretter/csv-validation. This might require the user to fix issues in the CSV first, but would reduce the scope of variation considerably. I notice that the Jackson CSV parser leverages the notion of a schema in its imports: https://github.com/FasterXML/jackson-dataformat-csv.

Hans-Jürgen: Thanks for the pointer to your library - it looks fantastic. I look forward to trying it out.

Liam: Thanks for the info about XQuery's additional regex handling beyond XSD.

And, lastly, to keep this post BaseX-related...

Christian: I tried removing the quote escaping but still get an error. Here's a small test to reproduce:

xquery version "3.1";

let $row := '"Larry Bossidy, Ram Charan, Charles Burck",Execution,9780609610572,Hardcover,2002'
return
  fn:analyze-string($row, '(?:\s*(?:"([^"]*)"|([^,]+))\s*,?|(?<=,)(),?)+?')

Joe

On Mon, Sep 12, 2016 at 7:29 AM, Christian Grün <christian.gr...@gmail.com> wrote:
> I didn’t check the regex in general, but one reason I think why it
> fails is the escaped quote. For example, the following query is
> illegal in XQuery 3.1…
>
>   matches('a"b', 'a\"b')
>
> …whereas the following one is ok:
>
>   matches('a"b', 'a"b')
>
>
> On Mon, Sep 12, 2016 at 1:15 PM, Hans-Juergen Rennau <hren...@yahoo.de> wrote:
>> Cordial thanks, Liam - I was not aware of that!
>>
>> @Joe: Rule of life: when one is especially sure to be right, one is surely
>> wrong, and so was I, and right were you(r first two characters).
>>
>>
>> Liam R. E. Quin <l...@w3.org> wrote at 5:54 on Monday, 12 September 2016:
>>
>> Hans-Jürgen wrote:
>>
>>> Already the first two characters (? render the expression invalid:
>>> (1) An unescaped ? is an occurrence indicator, making the preceding
>>>     entity optional
>>> (2) An unescaped ( is used for grouping, it does not represent anything
>>> => there is no entity preceding the ? which the ? could make optional
>>> => error
>>
>> Actually (?: .... ) is a non-capturing group, defined in XPath 3.0 and
>> XQuery 3.0, based on the same syntax in other languages.
>>
>> This extension, like a number of others, is useful because the
>> expression syntax defined by XSD doesn't make use of capturing groups
>> (there's no \1 or $1 or whatever), and so it doesn't need non-capturing
>> groups, but in XPath and XQuery they are used.
>>
>> See e.g. https://www.w3.org/TR/xpath-functions-30/#regex-syntax
>>
>> Liam
>>
>> --
>> Liam R. E. Quin <l...@w3.org>
>> The World Wide Web Consortium (W3C)