Hi 308181687,
I also tested it this way. If I print every byte, the OS cache ends up consuming about the size of the file, roughly 800M:

    for (int j = 0; j < len; ++j) { System.out.println(buff.get()); }

If I just call buff.get() in the loop without using the value, the OS cache ends up consuming only about 8M:

    for (int j = 0; j < len; ++j) { byte b = buff.get(); }
buff.get() is supposed to read the byte at the buffer's current position and then increment the position. But it appears that if you never use the value returned by buff.get(), the file system does not actually read from disk. I monitored disk reads and cache usage with dstat -md and confirmed that disk reads do not increase in the second test.

As you said, the JVM is smart enough that if you do not use the data, it will not read it from disk. My previous understanding was that as long as you call the get method, the data should be read from disk whether or not you actually use it. I will keep digging to find the real reason.
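As a sketch of one way to rule this out (the class name below is mine; /root/xx.txt is just the test file from your earlier mail): use every byte and print the result, so the JIT cannot discard the reads.

    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    public class MmapReadCheck {
        public static void main(String[] args) throws Exception {
            try (RandomAccessFile raf = new RandomAccessFile("/root/xx.txt", "r");
                 FileChannel rafc = raf.getChannel()) {
                MappedByteBuffer buff = rafc.map(FileChannel.MapMode.READ_ONLY, 0, rafc.size());
                int len = buff.limit();
                long sum = 0;
                for (int i = 0; i < len; i++) {
                    sum += buff.get();  // the value is used, so the read cannot be optimized away
                }
                // Printing the checksum keeps the whole loop observable.
                System.out.println("checksum = " + sum);
            }
        }
    }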
------------------------------------------------------------------
From: 308181687 <[email protected]>
Sent: Tuesday, July 15, 2014, 13:04
To: java-user <[email protected]>
Subject: Re: RE: mmap confusion in lucene

Hi, Zhijiang

It seems that the JVM is smart enough to ignore unused code. Try the following code:

    RandomAccessFile raf = new RandomAccessFile(new File("/root/xx.txt"), "r");
    FileChannel rafc = raf.getChannel();
    ByteBuffer buff = rafc.map(FileChannel.MapMode.READ_ONLY, 0, rafc.size());
    int len = buff.limit();
    byte b = 0;
    for (int i = 0; i < len; i++) { b += buff.get(); }

The java process will consume the expected 800M of shared memory. But if you change the line "b += buff.get()" to "b = buff.get()", the java process will not consume that much shared memory. I guess the JVM is smart enough to skip directly to the last position of the ByteBuffer.

Thanks & Best Regards!

------------------ Original ------------------
From: "[email protected] wan" <[email protected]>
Date: Tue, Jul 15, 2014 10:44 AM
To: "java-user" <[email protected]>
Subject: RE: mmap confusion in lucene
Hi Uwe,

Thank you for always helping. My first test is now clear to me: the OS caches the whole file because the data is copied into the Java heap, and it does not free those pages, so I see 800M used by the cache at the end. But in my last two tests the OS has freed all the previously cached pages, so I see only about 4M of cache used at the end.

Maybe I am not very clear about the internal kernel mechanism. As I understand it, the kernel swaps out a page when memory is scarce or when the cached page has not been used for a long time. The first condition is not met in my tests, because the OS still has 30G of memory available. As for the second condition, although the bytes are copied to the Java heap in the first test, the OS still keeps the cache after the program exits, whereas in the last test the OS released the pages even while the program was still running. Could you give me some further explanation of this? It would be much appreciated.

Zhijiang Wang

------------------------------------------------------------------
From: Uwe Schindler <[email protected]>
Sent: Monday, July 14, 2014, 18:13
To: java-user <[email protected]>; wangzhijiang999 <[email protected]>
Subject: RE: mmap confusion in lucene
This is very easy to explain:

In the first part you copy the whole memory-mapped data into an on-heap byte array. You allocate the byte array in full and then do a copy (actually a standard libc copy call) of the whole file. To do this copy, the underlying OS needs to swap in the whole file, because it "sees" that you want to read the whole file anyway (because of the size of the copy operation).

The other example reads the data byte by byte in a Java for-loop. The operating system has no idea how to optimize that, so whenever you cross a page boundary it swaps in another buffer. Because of internal kernel page garbage collection, the pages swapped in are freed much faster. This is OS specific.

In general, copying a random-access file to the Java heap with mmap is simply the wrong use case. Lucene never does this! The idea behind mmap is to *not copy* the data and to work on the mmapped region directly (using random access). The OS cache logic will then use statistics about which pages were actually used and keep them longer in the FS cache than pages that were used once and then not touched again for a long time.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [email protected]
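For illustration, here is a minimal sketch of that "no copy, random access" pattern (not Lucene's actual code; the class name and the offsets read below are made up). It touches only a few positions of the mapped region by absolute get, so only the pages containing those offsets are faulted in and nothing is copied to the Java heap in bulk.

    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    public class MmapRandomAccess {
        public static void main(String[] args) throws Exception {
            try (RandomAccessFile raf = new RandomAccessFile("/root/xx.txt", "r");
                 FileChannel ch = raf.getChannel()) {
                MappedByteBuffer buff = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
                long size = ch.size();
                // Made-up access pattern: the OS pages in only what is touched
                // and keeps statistics on which pages are hot.
                long[] offsets = { 0, size / 2, size - 1 };
                for (long off : offsets) {
                    byte value = buff.get((int) off);  // absolute get, buffer position unchanged
                    System.out.println("byte at " + off + " = " + value);
                }
            }
        }
    }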
-----Original Message-----
From: wangzhijiang999 [mailto:[email protected]]
Sent: Monday, July 14, 2014 11:58 AM
To: java-user
Subject: mmap confusion in lucene

Hi everybody,

I found a problem that confused me when I tested the mmap feature in lucene. I tested reading an 800M file via mmap as below:

    RandomAccessFile raf = new RandomAccessFile(new File(path), "r");
    FileChannel rafc = raf.getChannel();
    ByteBuffer buff = rafc.map(FileChannel.MapMode.READ_ONLY, 0, rafc.size());
    int len = buff.limit();
    byte[] b = new byte[len];
    for (int i = 0; i < len; i++) { b[i] = buff.get(); }

After the program finished, about 800M of the linux cache was consumed.

    RandomAccessFile raf = new RandomAccessFile(new File(path), "r");
    FileChannel rafc = raf.getChannel();
    ByteBuffer buff = rafc.map(FileChannel.MapMode.READ_ONLY, 0, rafc.size());
    int len = buff.limit();
    for (int i = 0; i < len; i++) { byte b = buff.get(); }

But in this way, only about 4M of the linux cache was consumed.

    RandomAccessFile raf = new RandomAccessFile(new File(path), "r");
    FileChannel rafc = raf.getChannel();
    ByteBuffer buff = rafc.map(FileChannel.MapMode.READ_ONLY, 0, rafc.size());
    int len = buff.limit();
    byte[] b = new byte[len];
    for (int i = 0; i < len; i++) { b[i] = buff.get(); b[i] = 0; }

In this way, the linux cache was also consumed only about 4M.

The whole content of the file should be read in all three tests, but for the last two tests the linux system only cached 4M. Would somebody give me an explanation of this? Thanks in advance.

Zhijiang Wang