[julia-users] Re: Crashing while parsing large XML file

Brandon Booth Thu, 28 Jan 2016 12:36:06 -0800

No real reason. I was going back and forth between eachline(f) and for i = 
1:n to see if it worked for 1000 rows, then 10,000 rows, etc. I ended up 
with a hybrid of the two. Will that matter much?
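
For reference, a minimal sketch of what the hybrid loop does, assuming a recent Julia (1.x) and a small in-memory buffer standing in for the real file. The sample lines and the names `io`, `io2`, `seen`, and `all_lines` are made up for illustration. Because `eachline` and `readline` both advance the same stream, the hybrid form consumes two lines per iteration and only ever looks at every second one:

```julia
# Hypothetical sample data: four numbered lines in an in-memory buffer.
io = IOBuffer("line 1\nline 2\nline 3\nline 4\n")

seen = String[]
for line in eachline(io)   # eachline reads one line into `line` ...
    l = readline(io)       # ... and readline then reads the following line
    push!(seen, l)
end
# `seen` is ["line 2", "line 4"]: lines 1 and 3 were read but never processed.

# Iterating with eachline alone visits every line exactly once.
io2 = IOBuffer("line 1\nline 2\nline 3\nline 4\n")
all_lines = String[]
for l in eachline(io2)
    push!(all_lines, l)
end
# `all_lines` is ["line 1", "line 2", "line 3", "line 4"]
```

So it probably does matter: any `<doc>`, `</doc>`, or field line that lands in `line` rather than `l` is silently skipped.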



On Thursday, January 28, 2016 at 1:32:09 PM UTC-5, Diego Javier Zea wrote:
>
> Hi! 
>
> Why are you using 
>
> for line in eachline(f)
>   l = readline(f)
>
>
> instead of
>
> for l in eachline(f)
>
>
> ?
>
> Best
>
> On Thursday, January 28, 2016 at 12:42:35 (UTC-3), Brandon Booth wrote:
>>
>> I'm parsing an XML file that's about 30 GB and wrote the loop below to 
>> parse it line by line. My code cycles through each line and builds a 1x200 
>> dataframe that is appended to a larger dataframe. When the larger dataframe 
>> gets to 1000 rows I stream it to an SQLite table. The code works for the 
>> first 25 million or so lines (which equates to 125,000 or so records in the 
>> SQLite table) and then freezes. I've tried it without the larger dataframe 
>> but that didn't help.
>>
>> Any suggestions to avoid crashing?
>>
>> Thanks.
>>
>> Brandon
>>
>>
>>
>> The XML structure:
>> <doc>
>> <field1>value</field1>
>> <field2>value</field2>
>> ...
>> </doc>
>> <doc>
>> <field1>value</field1>
>> <field2>value</field2>
>> ...
>> </doc>
>>
>>
>> My loop:
>>
>> f = open("contracts.xml","r")
>> readline(f)
>> n = countlines(f)
>> tic()
>> for line in eachline(f)
>>   l = readline(f)
>>   if startswith(l,"<doc")
>>     df = DataFrame(df_types,df_names, 1)
>>   elseif startswith(l,"</doc")
>>     append!(df1,df)
>>     if size(df1,1) == 1000
>>       source = convertdf(df1)
>>       Data.stream!(source,sink)
>>       deleterows!(df1,1:1000)
>>     end
>>   else
>>     str = parse_string(l)
>>     r = root(str)
>>     df[symbol(name(r))] = string(content(r))
>>   end
>> end
>>
>> close(f)
>>
>>
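
The batching scheme described in the original message (build one record per `<doc>`, collect records, flush every 1000) can be sketched with Base Julia alone. This is only an illustration under stated assumptions, not the poster's code: `parse_field` and `stream_records` are hypothetical names, a regular expression stands in for the XML parser call, plain `Dict`s stand in for the dataframes, and the `flush_batch` callback stands in for the SQLite sink. It assumes a recent Julia (1.x) and one field per line, as in the structure shown above.

```julia
# Hypothetical helper: turn "<name>value</name>" into (:name, "value"),
# or return nothing for lines that do not have that shape.
function parse_field(l::AbstractString)
    m = match(r"^<([^/>\s]+)>(.*)</\1>\s*$", l)
    m === nothing ? nothing : (Symbol(m.captures[1]), String(m.captures[2]))
end

# Hypothetical driver: read `io` line by line, collect one Dict per <doc>,
# and hand every `batch_size` records to `flush_batch` (a stand-in for the sink).
function stream_records(io::IO, flush_batch; batch_size::Int = 1000)
    batch = Vector{Dict{Symbol,String}}()
    record = Dict{Symbol,String}()
    for l in eachline(io)                  # the only call that advances the stream
        if startswith(l, "<doc")
            record = Dict{Symbol,String}() # start a fresh record
        elseif startswith(l, "</doc")
            push!(batch, record)
            if length(batch) == batch_size
                flush_batch(batch)
                empty!(batch)              # reuse the buffer for the next batch
            end
        else
            field = parse_field(l)
            field === nothing || (record[field[1]] = field[2])
        end
    end
    isempty(batch) || flush_batch(batch)   # flush the final partial batch
    return nothing
end
```

A call would look like `open(io -> stream_records(io, my_sink), "contracts.xml")`, where `my_sink` is whatever function writes a batch to the database. Emptying the batch after each flush keeps memory use bounded by `batch_size` records, whatever the file size.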
