At best, you'll only see every other line, right? At worst, eachline may do
some IO lookahead (i.e. read one line ahead) and this will do something
even more confusing.
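
For what it's worth, here is a minimal sketch of the skipping, assuming Julia 1.x (where eachline and readline strip the trailing newline) and a made-up four-line IOBuffer standing in for the real file. It does not show the lookahead case, only the plain "every other line" one:

```julia
# Hypothetical four-line input standing in for contracts.xml.
io = IOBuffer("line 1\nline 2\nline 3\nline 4\n")

seen = String[]
for line in eachline(io)   # the iterator itself reads "line 1", then "line 3"
    l = readline(io)       # the extra read consumes "line 2", then "line 4"
    push!(seen, l)
end
# seen now holds only "line 2" and "line 4"

# Reading only through the iterator visits every line exactly once.
io2 = IOBuffer("line 1\nline 2\nline 3\nline 4\n")
all_lines = collect(eachline(io2))
```

With the hybrid loop, any record whose <doc> or </doc> tag lands on a line read by the iterator rather than by readline is never seen by the if/elseif checks.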

On Thu, Jan 28, 2016 at 3:35 PM, Brandon Booth <etu...@gmail.com> wrote:

> No real reason. I was going back and forth between eachline(f) and for i =
> 1:n to see if it worked for 1000 rows, then 10,000 rows, etc. I ended up
> with a hybrid of the two. Will that matter much?
>
>
> On Thursday, January 28, 2016 at 1:32:09 PM UTC-5, Diego Javier Zea wrote:
>>
>> Hi!
>>
>> Why are you using
>>
>> for line in eachline(f)
>>   l = readline(f)
>>
>>
>> instead of
>>
>> for l in eachline(f)
>>
>>
>> ?
>>
>> Best
>>
>> On Thursday, January 28, 2016 at 12:42:35 (UTC-3), Brandon Booth wrote:
>>>
>>> I'm parsing an XML file that's about 30 GB and wrote the loop below to
>>> parse it line by line. My code cycles through each line and builds a 1x200
>>> dataframe that is appended to a larger dataframe. When the larger dataframe
>>> gets to 1000 rows I stream it to an SQLite table. The code works for the
>>> first 25 million or so lines (which equates to 125,000 or so records in the
>>> SQLite table) and then freezes. I've tried it without the larger dataframe
>>> but that didn't help.
>>>
>>> Any suggestions to avoid crashing?
>>>
>>> Thanks.
>>>
>>> Brandon
>>>
>>>
>>>
>>> The XML structure:
>>> <doc>
>>> <field1>value</field1>
>>> <field2>value</field2>
>>> ...
>>> </doc>
>>> <doc>
>>> <field1>value</field1>
>>> <field2>value</field2>
>>> ...
>>> </doc>
>>>
>>>
>>> My loop:
>>>
>>> f = open("contracts.xml","r")
>>> readline(f)
>>> n = countlines(f)
>>> tic()
>>> for line in eachline(f)
>>>   l = readline(f)
>>>   if startswith(l,"<doc")
>>>     df = DataFrame(df_types,df_names, 1)
>>>   elseif startswith(l,"</doc")
>>>     append!(df1,df)
>>>     if size(df1,1) == 1000
>>>       source = convertdf(df1)
>>>       Data.stream!(source,sink)
>>>       deleterows!(df1,1:1000)
>>>     end
>>>   else
>>>     str = parse_string(l)
>>>     r = root(str)
>>>     df[symbol(name(r))] = string(content(r))
>>>   end
>>> end
>>>
>>> close(f)
>>>
>>>
