I'll try to get around to comparing against the DataFrames version and 
profiling this week. I got stuck trying to figure out the action semantics.

On Tuesday, May 27, 2014 6:58:42 PM UTC-4, John Myles White wrote:
>
> I'd be really interested to see how this parser compares with DataFrames. 
> There's a bunch of test files in the DataFrames.jl/test directory.
>
>  -- John
>
> On May 27, 2014, at 3:49 PM, Abe Schneider <abe.sc...@gmail.com> wrote:
>
> I don't know how the parser's speed will compare to DataFrames -- 
> I've done no profiling of the code to date, but I thought 
> writing a CSV parser was a good way to test the code (and it helped find a 
> bunch of bugs).
>
> I've also committed (under examples/) the CSV parser. The grammar (from 
> the RFC) is:
>
> @grammar csv begin
>   start = data
>   data = record + *(crlf + record)
>   record = field + *(comma + field)
>   field = escaped_field | unescaped_field
>   escaped_field = dquote + *(textdata | comma | cr | lf | dqoute2) + dquote
>   unescaped_field = textdata
>   textdata = r"[ !#$%&'()*+\-./0-~]+"
>   cr = '\r'
>   lf = '\n'
>   crlf = cr + lf
>   dquote = '"'
>   dqoute2 = "\"\""
>   comma = ','
> end
>
> and the actions are:
>
> tr["crlf"] = (node, children) -> nothing
> tr["comma"] = (node, children) -> nothing
>
> tr["escaped_field"] = (node, children) -> node.children[2].value
> tr["unescaped_field"] = (node, children) -> node.children[1].value
> tr["field"] = (node, children) -> children
> tr["record"] = (node, children) -> unroll(children)
> tr["data"] = (node, children) -> unroll(children)
> tr["textdata"] = (node, children) -> node.value
>
>
> Given the data:
>
> parse_data = """1,2,3\r\nthis is,a test,of csv\r\n"these","are","quotes ("")""""
>
> and running the parser:
>
> (node, pos, error) = parse(csv, parse_data)
> result = transform(tr, node)
>
> I get:
>
> {{"1","2","3"},{"this is","a test","of csv"},{"these","are","quotes (\"\")"}}
>
> On Monday, May 26, 2014 3:41:26 AM UTC-4, harven wrote:
>>
>> Nice!
>>
>> If you are interested in testing your library on a concrete problem, you 
>> may want to parse comma-separated value (CSV) files. The BNF is in the 
>> specification, RFC 4180: http://tools.ietf.org/html/rfc4180
>>
>> AFAIK, the readcsv function provided in Base does not handle quotations 
>> well, whereas the CSV parser in DataFrames is slow, so Julia does not 
>> yet have an efficient native way to parse CSV files.
>>
>
>
