Hi all,

Christian: I completely agree, CSV is a nightmare.  One way to reduce
the headaches (in, say, developing an EXPath CSV library) might be to
require that CSV pass validation by a tool such as
http://digital-preservation.github.io/csv-validator/.  Adam Retter
presented his work on CSV Schema and CSV Validator at
http://slides.com/adamretter/csv-validation.  This might require the
user to fix issues in the CSV first, but would reduce the scope of
variation considerably.  I notice that the Jackson CSV parser
leverages the notion of a schema in its imports:
https://github.com/FasterXML/jackson-dataformat-csv.

Hans-Jürgen: Thanks for the pointer to your library - it looks
fantastic.  I look forward to trying it out.

Liam: Thanks for the info about XQuery's additional regex handling beyond XSD.

And lastly, to keep this post BaseX-related:

Christian: I tried removing the quote escaping but still get an error.
Here's a small test to reproduce:

xquery version "3.1";

let $row := '"Larry Bossidy, Ram Charan, Charles Burck",Execution,9780609610572,Hardcover,2002'
return
    fn:analyze-string($row, '(?:\s*(?:"([^"]*)"|([^,]+))\s*,?|(?<=,)(),?)+?')

Joe

On Mon, Sep 12, 2016 at 7:29 AM, Christian Grün
<christian.gr...@gmail.com> wrote:
> I didn’t check the regex in general, but I think one reason it
> fails is the escaped quote. For example, the following query is
> illegal in XQuery 3.1…
>
>   matches('a"b', 'a\"b')
>
> …whereas the following one is ok:
>
>   matches('a"b', 'a"b')
>
>
>
> On Mon, Sep 12, 2016 at 1:15 PM, Hans-Juergen Rennau <hren...@yahoo.de> wrote:
>> Cordial thanks, Liam - I was not aware of that!
>>
>> @Joe: Rule of life: when one is especially sure to be right, one is surely
>> wrong, and so was I, and right were you(r first two characters).
>>
>>
>> Liam R. E. Quin <l...@w3.org> wrote at 5:54 on Monday, 12 September 2016:
>>
>>
>> Hans-Jürgen wrote:
>>
>>> Already the first two characters
>>>     (?
>>> render the expression invalid:
>>> (1) An unescaped ? is an occurrence indicator, making the preceding
>>> entity optional.
>>> (2) An unescaped ( is used for grouping; it does not represent anything.
>>> => There is no entity preceding the ? which the ? could make optional
>>> => error
>>
>>
>> Actually (?: .... ) is a non-capturing group, defined in XPath 3.0 and
>> XQuery 3.0, based on the same syntax in other languages.
>>
>> This extension, like a number of others, is useful because the
>> expression syntax defined by XSD doesn't make use of capturing groups
>> (there's no \1 or $1 or whatever), and so it doesn't need non-capturing
>> groups, but in XPath and XQuery they are used.
>>
>> See e.g. https://www.w3.org/TR/xpath-functions-30/#regex-syntax
>>
>> Liam
>>
>>
>> --
>> Liam R. E. Quin <l...@w3.org>
>> The World Wide Web Consortium (W3C)
>>
>>
>>
