RE: mmap confusion in lucene

Uwe Schindler Mon, 14 Jul 2014 03:14:26 -0700

This is very easy to explain:

In the first part you copy the whole memory-mapped region into an on-heap byte 
array. You allocate this byte array in one piece and then copy the whole file 
into it (this is actually a standard libc copy call). To do this copy, the 
underlying OS needs to page in the whole file, because it "sees" that you want 
to read the whole file anyway (because of the size of the copy operation).
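
A minimal sketch of the bulk copy described above, for illustration only. The 
class and method names (BulkCopyFromMmap, copyToHeap, writeTemp) are 
hypothetical and not part of Lucene. It assumes the file fits into a single 
byte[]; how much of the file the OS actually pages in is OS specific.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class BulkCopyFromMmap {

    /** Hypothetical helper: writes data to a temp file that is removed on JVM exit. */
    static Path writeTemp(byte[] data) {
        try {
            Path tmp = Files.createTempFile("mmap-bulk", ".bin");
            tmp.toFile().deleteOnExit();
            Files.write(tmp, data);
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Maps the file read-only and copies all of it to the heap with one bulk get(). */
    static byte[] copyToHeap(Path file) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            if (size > Integer.MAX_VALUE - 8) {
                // Assumption of this sketch: the file must fit into one byte[].
                throw new IllegalArgumentException("file too large for a single byte[]");
            }
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, size);
            byte[] onHeap = new byte[(int) size];
            // One bulk transfer of the whole mapped region into the heap array.
            buf.get(onHeap);
            return onHeap;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path tmp = writeTemp(new byte[] {1, 2, 3, 4, 5});
        System.out.println("copied bytes: " + copyToHeap(tmp).length);
    }
}
```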


The other example reads the data byte by byte in a Java for-loop. The 
operating system has no idea how to optimize that, so whenever you cross a 
page boundary it pages in another buffer. Because of the kernel's internal 
page garbage collection, the pages brought in this way are freed much faster. 
This is OS specific.
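
To make the page-boundary point concrete, here is a rough sketch of a 
byte-by-byte scan that counts how many pages the loop walks into. The names 
(ByteByByteScan, scan, writeTemp) are hypothetical, and the 4096-byte page 
size is an assumption; the real page size, read-ahead and reclaim behaviour 
depend on the OS and cannot be observed from Java this way.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class ByteByByteScan {

    /** Assumed page size; many Linux systems use 4096, but this is not guaranteed. */
    static final int ASSUMED_PAGE_SIZE = 4096;

    /** Hypothetical helper: writes data to a temp file that is removed on JVM exit. */
    static Path writeTemp(byte[] data) {
        try {
            Path tmp = Files.createTempFile("mmap-scan", ".bin");
            tmp.toFile().deleteOnExit();
            Files.write(tmp, data);
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Reads the mapped file one byte at a time without storing the bytes.
     * Returns {sum of unsigned byte values, number of assumed pages entered}.
     * FileChannel.map rejects regions larger than Integer.MAX_VALUE bytes.
     */
    static long[] scan(Path file, int pageSize) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            long sum = 0;
            long pagesEntered = 0;
            int len = buf.limit();
            for (int i = 0; i < len; i++) {
                if (i % pageSize == 0) {
                    pagesEntered++; // first byte of a new (assumed) page
                }
                sum += buf.get() & 0xFF; // relative get: one byte per call
            }
            return new long[] {sum, pagesEntered};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[10_000];
        java.util.Arrays.fill(data, (byte) 1);
        long[] result = scan(writeTemp(data), ASSUMED_PAGE_SIZE);
        System.out.println("sum=" + result[0] + " pagesEntered=" + result[1]);
    }
}
```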

In general, copying a random-access file to the Java heap with mmap is just 
the wrong use case. Lucene never does this! The idea behind mmap is to *not 
copy* the data and to work on the mmapped region directly (using random 
access). The OS cache logic will then use statistics about which pages were 
actually used and keep them in the FS cache longer than those that were used 
once and then not touched again for a very long time.
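
As a small illustration of working on the mapped region directly, the sketch 
below reads individual bytes at arbitrary offsets with the absolute 
ByteBuffer.get(int) and never copies the file into a byte[]. This is not 
Lucene's actual code; RandomAccessOnMmap, sumBytesAt and writeTemp are 
hypothetical names used only for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class RandomAccessOnMmap {

    /** Hypothetical helper: writes data to a temp file that is removed on JVM exit. */
    static Path writeTemp(byte[] data) {
        try {
            Path tmp = Files.createTempFile("mmap-random", ".bin");
            tmp.toFile().deleteOnExit();
            Files.write(tmp, data);
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Sums the unsigned byte values found at the given offsets.
     * Only the pages containing those offsets need to be touched;
     * no on-heap copy of the file is made.
     */
    static int sumBytesAt(Path file, int... offsets) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            int sum = 0;
            for (int offset : offsets) {
                // Absolute get: reads at the offset without moving the buffer position.
                sum += buf.get(offset) & 0xFF;
            }
            return sum;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[256];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i; // byte at offset i has unsigned value i
        }
        System.out.println(sumBytesAt(writeTemp(data), 0, 10, 255));
    }
}
```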

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -----Original Message-----
> From: wangzhijiang999 [mailto:wangzhijiang...@aliyun.com]
> Sent: Monday, July 14, 2014 11:58 AM
> To: java-user
> Subject: mmap confusion in lucene
> 
> Hi everybody,
> 
> I ran into something that confuses me while testing the mmap feature in
> Lucene. I read an 800 MB file through mmap like this:
> 
> RandomAccessFile raf = new RandomAccessFile(new File(path), "r");
> FileChannel rafc = raf.getChannel();
> ByteBuffer buff = rafc.map(FileChannel.MapMode.READ_ONLY, 0, rafc.size());
> int len = buff.limit();
> byte[] b = new byte[len];
> for (int i = 0; i < len; i++) {
>     b[i] = buff.get();
> }
> After the program finishes, about 800 MB of the Linux cache is consumed.
> 
> 
> RandomAccessFile raf = new RandomAccessFile(new File(path), "r");
> FileChannel rafc = raf.getChannel();
> ByteBuffer buff = rafc.map(FileChannel.MapMode.READ_ONLY, 0, rafc.size());
> int len = buff.limit();
> for (int i = 0; i < len; i++) {
>     Byte b = buff.get();
> }
> But this way, only about 4 MB of the Linux cache is consumed.
> 
> 
> RandomAccessFile raf = new RandomAccessFile(new File(path), "r");
> FileChannel rafc = raf.getChannel();
> ByteBuffer buff = rafc.map(FileChannel.MapMode.READ_ONLY, 0, rafc.size());
> int len = buff.limit();
> byte[] b = new byte[len];
> for (int i = 0; i < len; i++) {
>     b[i] = buff.get();
>     b[i] = 0;
> }
> This way, too, only about 4 MB of the Linux cache is consumed.
> 
> All three tests should read the whole content of the file, but in the last
> two the Linux system cached only 4 MB.
> Could somebody explain this to me? Thanks in advance.
> 
> Zhijiang Wang
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
