No real reason. I was going back and forth between eachline(f) and for i =
1:n to see if it worked for 1000 rows, then 10,000 rows, etc. I ended up
with a hybrid of the two. Will that matter much?
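
For reference, here is a minimal sketch of what the hybrid does, assuming a recent Julia (1.x). An in-memory IOBuffer with four made-up lines stands in for contracts.xml, and `seen` is just an illustrative name. Each pass through the loop reads two lines: eachline takes one and readline takes the next, so only every second line ever reaches `l`.

```julia
# Hypothetical stand-in for the real file: four short lines in memory.
io = IOBuffer("line 1\nline 2\nline 3\nline 4\n")

seen = String[]
for line in eachline(io)   # eachline consumes one line per iteration...
    l = readline(io)       # ...and readline then consumes the following one
    push!(seen, l)
end

# Only the lines fetched by readline were processed; the ones bound to
# `line` were read from the stream and then ignored.
println(seen)
```

With plain `for l in eachline(io)` every line would be visited exactly once.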
On Thursday, January 28, 2016 at 1:32:09 PM UTC-5, Diego Javier Zea wrote:
>
> Hi!
>
> Why are you using
>
> for line in eachline(f)
>     l = readline(f)
>
>
> instead of
>
> for l in eachline(f)
>
>
> ?
>
> Best
>
> On Thursday, January 28, 2016 at 12:42:35 (UTC-3), Brandon Booth wrote:
>>
>> I'm parsing an XML file that's about 30 GB and wrote the loop below to
>> parse it line by line. My code cycles through each line and builds a 1x200
>> dataframe that is appended to a larger dataframe. When the larger dataframe
>> gets to 1000 rows I stream it to an SQLite table. The code works for the
>> first 25 million or so lines (which equates to 125,000 or so records in the
>> SQLite table) and then freezes. I've tried it without the larger dataframe
>> but that didn't help.
>>
>> Any suggestions to avoid crashing?
>>
>> Thanks.
>>
>> Brandon
>>
>>
>>
>> The XML structure:
>> <doc>
>> <field1>value</field1>
>> <field2>value</field2>
>> ...
>> </doc>
>> <doc>
>> <field1>value</field1>
>> <field2>value</field2>
>> ...
>> </doc>
>>
>>
>> My loop:
>>
>> f = open("contracts.xml","r")
>> readline(f)
>> n = countlines(f)
>> tic()
>> for line in eachline(f)
>>     l = readline(f)
>>     if startswith(l,"<doc")
>>         df = DataFrame(df_types,df_names, 1)
>>     elseif startswith(l,"</doc")
>>         append!(df1,df)
>>         if size(df1,1) == 1000
>>             source = convertdf(df1)
>>             Data.stream!(source,sink)
>>             deleterows!(df1,1:1000)
>>         end
>>     else
>>         str = parse_string(l)
>>         r = root(str)
>>         df[symbol(name(r))] = string(content(r))
>>     end
>> end
>>
>> close(f)
>>
>>