Hi Christian,

Yes, that sounds like the culprit. Searching back through my files, I
found that Adam Retter responded on exist-open (at
http://markmail.org/message/3bxz55du3hl6arpr) to a call for help with
the lack of lookahead support in XPath by pointing to an XSLT
stylesheet he adapted for CSV parsing:
https://github.com/digital-preservation/csv-tools/blob/master/csv-to-xml_v3.xsl.
I adapted this technique to XQuery, and it works on the sample case in
my earlier email.

Joe

```xquery
xquery version "3.1";

declare function local:get-cells($row as xs:string) as xs:string* {
    (: workaround lack of lookahead support in XPath: end row with comma :)
    let $string-to-analyze := $row || ","
    let $analyze := fn:analyze-string($string-to-analyze, '(("[^"]*")+|[^,]*),')
    (: group 1 of each match is one cell, with any surrounding quotes still attached :)
    for $group in $analyze//fn:group[@nr="1"]
    return
        if (matches($group, '^".+"$')) then
            replace($group, '^"([^"]+)"$', '$1')
        else
            $group/string()
};

let $csv := 'Author,Title,ISBN,Binding,Year Published
Jeannette Walls,The Glass Castle,074324754X,Paperback,2006
James Surowiecki,The Wisdom of Crowds,9780385503860,Paperback,2005
Lawrence Lessig,The Future of Ideas,9780375505782,Paperback,2002
"Larry Bossidy, Ram Charan, Charles
Burck",Execution,9780609610572,Hardcover,2002
Kurt Vonnegut,Slaughterhouse-Five,9780791059258,Paperback,1999'
let $lines := tokenize($csv, '\n')
let $header-row := fn:head($lines)
let $body-rows := fn:tail($lines)
let $headers := local:get-cells($header-row)
for $row in $body-rows
let $cells := local:get-cells($row)
return
    element row {
      for $cell at $count in $cells
      (: element names cannot contain spaces (e.g. "Year Published"),
         so swap spaces for hyphens when naming the cell element :)
      return element {translate($headers[$count], ' ', '-')} {$cell}
    }
```
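
In case it helps to see the trick in isolation, here is a minimal
sketch (separate from the query above) that runs the same regex over a
single row; I've flattened the multi-line author cell from the sample
onto one line purely for the illustration.

```xquery
xquery version "3.1";

(: minimal illustration of the trailing-comma workaround on one row:
   group 1 of each regex match is a single cell, and the quoted cell
   containing commas survives as one value :)
let $row := '"Larry Bossidy, Ram Charan, Charles Burck",Execution,9780609610572,Hardcover,2002'
let $analysis := fn:analyze-string($row || ",", '(("[^"]*")+|[^,]*),')
return $analysis//fn:group[@nr = "1"]/string()
```

That should return five strings, the first still carrying its
surrounding quotes, which local:get-cells then strips.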

On Mon, Sep 12, 2016 at 10:11 AM, Christian Grün
<christian.gr...@gmail.com> wrote:
>> Christian: I tried removing the quote escaping but still get an error.
>> Here's a small test to reproduce:
>>
>>     fn:analyze-string($row, '(?:\s*(?:"([^"]*)"|([^,]+))\s*,?|(?<=,)(),?)+?')
>
> I assume it’s the lookbehind assertion that is not allowed in XQuery
> (but I should definitely spend more time on it to give you a better
> answer..).
