Hi!
Why are you using
for line in eachline(f) l = readline(f)
instead of
for l in eachline(f)
?
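Here is a minimal sketch of the difference, assuming Julia 1.x, where both eachline and readline strip the trailing newline by default. The four-line string is a hypothetical stand-in for your file, and the names skipped and all_lines are only for illustration. Since eachline already consumes a line on every iteration, the extra readline(f) consumes the following one, so l only ever sees every other line:

```julia
# Hypothetical four-line stand-in for the XML file.
text = "<doc>\n<field1>a</field1>\n<field2>b</field2>\n</doc>\n"

# Pattern from the original loop: eachline reads one line per iteration,
# then readline reads the next one, so `l` gets only every second line.
io = IOBuffer(text)
skipped = String[]
for line in eachline(io)
    l = readline(io)
    push!(skipped, l)
end

# Suggested pattern: use the loop variable directly, so no line is lost.
io = IOBuffer(text)
all_lines = String[]
for l in eachline(io)
    push!(all_lines, l)
end

println(skipped)    # the lines seen by the first pattern
println(all_lines)  # the lines seen by the second pattern
```

With this input the first loop never sees the opening <doc> line, while the second sees all four lines.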
Best
On Thursday, January 28, 2016, 12:42:35 (UTC-3), Brandon Booth wrote:
>
> I'm parsing an XML file that's about 30 GB and wrote the loop below to
> parse it line by line. My code cycles through each line and builds a 1x200
> dataframe that is appended to a larger dataframe. When the larger dataframe
> gets to 1000 rows I stream it to an SQLite table. The code works for the
> first 25 million or so lines (which equates to 125,000 or so records in the
> SQLite table) and then freezes. I've tried it without the larger dataframe
> but that didn't help.
>
> Any suggestions to avoid crashing?
>
> Thanks.
>
> Brandon
>
>
>
> The XML structure:
> <doc>
> <field1>value</field1>
> <field2>value</field2>
> ...
> </doc>
> <doc>
> <field1>value</field1>
> <field2>value</field2>
> ...
> </doc>
>
>
> My loop:
>
> f = open("contracts.xml","r")
> readline(f)
> n = countlines(f)
> tic()
> for line in eachline(f)
>     l = readline(f)
>     if startswith(l,"<doc")
>         df = DataFrame(df_types,df_names, 1)
>     elseif startswith(l,"</doc")
>         append!(df1,df)
>         if size(df1,1) == 1000
>             source = convertdf(df1)
>             Data.stream!(source,sink)
>             deleterows!(df1,1:1000)
>         end
>     else
>         str = parse_string(l)
>         r = root(str)
>         df[symbol(name(r))] = string(content(r))
>     end
> end
>
> close(f)
>
>