[julia-users] Re: [large file] eachline memory consumption

Keith Campbell Wed, 02 Apr 2014 11:32:28 -0700

Hi Krishna,
I've run into similar problems, and had some luck with mmap plus
pre-allocated storage.
You can see an example at:
http://nbviewer.ipython.org/github/catawbasam/catawbasam_sandbox/blob/master/lineread_mmap_short.ipynb

That example reads in a list of 100 million floats from a 1.8 GB text file.
Using eachline()+float(), Julia allocates 12.8 GB and takes 45 seconds.

With mmap and pre-allocated storage, the memory allocation drops to 10 KB
and the processing time to 23 seconds (see the mmap version near the bottom
of the notebook).
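
For anyone who doesn't want to open the notebook, here is a rough sketch of
the idea. It is not the notebook's code. It assumes a recent Julia 1.x with
the Mmap standard library and a file with one plain decimal number per line.
The names read_floats_mmap and parse_decimal are made up for illustration,
and the parser is deliberately simplified (no exponents, no Inf/NaN, not
correctly rounded for long inputs).

```julia
using Mmap

# Simplified stand-in parser for plain decimals such as "-12.5".
# It reads raw bytes buf[lo:hi], so no String is created per line.
function parse_decimal(buf::AbstractVector{UInt8}, lo::Int, hi::Int)
    i = lo
    neg = false
    if i <= hi && (buf[i] == UInt8('-') || buf[i] == UInt8('+'))
        neg = buf[i] == UInt8('-')
        i += 1
    end
    val = 0.0
    scale = 1.0
    seen_point = false
    while i <= hi
        b = buf[i]
        if b == UInt8('.') && !seen_point
            seen_point = true
        elseif UInt8('0') <= b <= UInt8('9')
            val = 10.0 * val + (b - UInt8('0'))
            seen_point && (scale *= 10.0)
        else
            error("unexpected byte at offset $i")
        end
        i += 1
    end
    val /= scale
    return neg ? -val : val
end

# Read at most n numbers from path into storage that is allocated once.
function read_floats_mmap(path::AbstractString, n::Integer)
    buf = Mmap.mmap(path)               # Vector{UInt8} backed by the file
    out = Vector{Float64}(undef, n)     # pre-allocated result storage
    count = 0
    start = 1
    len = length(buf)
    while start <= len && count < n
        stop = start
        while stop <= len && buf[stop] != UInt8('\n')
            stop += 1                   # scan to the end of the current line
        end
        hi = stop - 1
        if hi >= start && buf[hi] == UInt8('\r')
            hi -= 1                     # tolerate Windows line endings
        end
        if hi >= start                  # skip empty lines
            count += 1
            out[count] = parse_decimal(buf, start, hi)
        end
        start = stop + 1
    end
    return resize!(out, count)          # trim if the file had fewer than n values
end
```

You would call it as read_floats_mmap("floats.txt", 100_000_000), where
floats.txt is a placeholder name, and wrap the call in @time to see the
allocation figure on your own machine.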

I've played with an iterator version, but haven't succeeded in creating one 
that avoids memory allocation.  If someone else has got such a thing, I'd 
love to hear about it.
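
For reference, one shape such an iterator might take is sketched below. It
yields the byte range of each line in a buffer instead of a String. I have
not measured it, so whether it really avoids allocation is an assumption,
and it assumes a recent Julia 1.x. LineRanges is a made-up name, not
something from Base.

```julia
# Iterates over a byte buffer and yields the index range of each line,
# excluding the '\n' terminator. No String is built for any line.
struct LineRanges{V<:AbstractVector{UInt8}}
    buf::V
end

Base.IteratorSize(::Type{<:LineRanges}) = Base.SizeUnknown()
Base.eltype(::Type{<:LineRanges}) = UnitRange{Int}

function Base.iterate(it::LineRanges, start::Int = 1)
    len = length(it.buf)
    start > len && return nothing       # past the last byte: done
    stop = start
    while stop <= len && it.buf[stop] != UInt8('\n')
        stop += 1
    end
    # the state for the next call is the index just after the newline
    return (start:(stop - 1), stop + 1)
end
```

Wrapping an mmapped file, as in LineRanges(Mmap.mmap("large_file.txt")) after
using Mmap, and printing length(r) for each range r gives the line lengths
in bytes (not characters), which is close to what Krishna's loop prints.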
cheers,
Keith

On Sunday, March 30, 2014 2:51:19 PM UTC-4, krishna mohan wrote:
>
> Dear All,
>
> I am reading a large file (10 GB) as follows:
> open("large_file.txt") do fh
>     for line in eachline(fh)
>         println(length(line))
>     end
> end
>
> Strangely, the memory consumption goes up linearly with time. I would
> expect it to be negligible and constant, because we are reading only one
> line at a time.
>
> Please let me know.
>
> Am I missing something?
>
> Regards,
> Krishna 
>
>
>
