Re: Re: Re: mmap confusion in lucene

wangzhijiang999 Wed, 16 Jul 2014 02:39:31 -0700

Hi Uwe,

Where can I find a detailed description of how mmap works in Java and in
the OS? I did not find anything useful in the JDK source code.

For example: byte b = curBuf.get(); System.out.println(b);

When the get() method runs, the JVM does not ask the file system to read the
file from disk. Only when println() runs, which means the data is actually
used, does the JVM really ask the file system to read the data. Is my
understanding right?
Thank you!
 
Zhijiang Wang
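
For reference, the get()-then-use pattern from the question can be written as a small self-contained program. This is only a sketch under stated assumptions: the class name MappedGetDemo, the helper firstByte and the temporary file are hypothetical and do not come from the thread. It shows the API calls involved, not when the OS actually loads a page; as the quoted thread below discusses, the JVM can only drop a read whose result is never used.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedGetDemo {

    // Hypothetical helper: maps the file read-only and returns its first byte.
    public static byte firstByte(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // map() creates the mapping; the buffer stays valid after the channel closes.
            MappedByteBuffer curBuf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            // Relative get(): reads the byte at the current position, then advances it.
            return curBuf.get();
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical test file with known content.
        Path tmp = Files.createTempFile("mmap-demo", ".bin");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, new byte[] {42, 7, 9});

        byte b = firstByte(tmp);
        // The value is used here, so the read above cannot be treated as dead code.
        System.out.println(b); // prints 42
    }
}
```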



------------------------------------------------------------------
From: Uwe Schindler <[email protected]>
Sent: Tuesday, July 15, 2014 17:29
To: java-user <[email protected]>; wangzhijiang999 <[email protected]>
Subject: RE: Re: Re: mmap confusion in lucene

Yes, the JVM is removing the get() call, because it knows that it has no
side-effect: the position() pointer is not used afterwards and the result of
the get() call is also not used. It is partly mapped because the optimization
only starts to kick in after 10,000 method calls (the default threshold in
the JVM).

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [email protected]

> -----Original Message-----
> From: wangzhijiang999 [mailto:[email protected]]
> Sent: Tuesday, July 15, 2014 11:10 AM
> To: java-user
> Subject: Re: Re: mmap confusion in lucene
>
> Hi 308181687,
>
> I also tested it this way. If I print every byte, the OS cache ends up
> holding the whole file, about 800M.
>
>     for (int j = 0; j < len; ++j) { System.out.println(buff.get()); }
>
> If I just call buff.get() in the loop, the OS cache holds only 8M at the
> end.
>
>     for (int j = 0; j < len; ++j) { byte b = buff.get(); }
>
> buff.get() reads the byte at the buffer's current position and then
> increments the position. But if the value returned by buff.get() is not
> used, the file system does not read the disk. I monitored disk reads and
> the cache with "dstat -md" and confirmed that disk reads do not increase
> in the second test.
>
> As you said, the JVM is smart enough not to read from disk if the data is
> not used. My previous understanding was that calling get() to fetch data
> always reads from disk, whether or not the data is actually used. I will
> keep researching to find the real reason.
>
> ------------------------------------------------------------------
> From: 308181687 <[email protected]>
> Sent: Tuesday, July 15, 2014 13:04
> To: java-user <[email protected]>
> Subject: Re: Re: mmap confusion in lucene
>
> Hi Zhijiang,
>
> It seems that the JVM is smart enough to ignore the unused code. Try the
> following code:
>
>     RandomAccessFile raf = new RandomAccessFile(new File("/root/xx.txt"), "r");
>     FileChannel rafc = raf.getChannel();
>     ByteBuffer buff = rafc.map(FileChannel.MapMode.READ_ONLY, 0, rafc.size());
>     int len = buff.limit();
>     byte b = 0;
>     for (int i = 0; i < len; i++) {
>         b += buff.get();
>     }
>
> The Java process will consume the expected 800M of shared memory. But if
> the line "b += buff.get()" is changed to "b = buff.get()", the Java process
> does not consume that much shared memory. I guess the JVM is smart enough
> to skip directly to the last position of the ByteBuffer.
>
> Thanks & Best Regards!
>
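
The "b += buff.get()" variant above can be sketched as a complete program in which every byte read feeds a returned checksum, so no read result goes unused. The class name MappedChecksum, the method sumBytes and the small temporary file are assumptions made for illustration; the thread itself used an 800M file at /root/xx.txt and watched the OS cache, which this sketch does not measure.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedChecksum {

    // Hypothetical helper: sums all (signed) bytes of the mapped file.
    // Returning the sum makes the result of every get() call live.
    public static long sumBytes(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buff = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            long sum = 0;
            while (buff.hasRemaining()) {
                sum += buff.get(); // accumulate instead of overwriting a local
            }
            return sum;
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical small test file standing in for the large file.
        Path tmp = Files.createTempFile("mmap-sum", ".bin");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, new byte[] {1, 2, 3});

        System.out.println(sumBytes(tmp)); // prints 6
    }
}
```

Printing or returning the checksum plays the same role as the println() call in the earlier test: it gives the loop an observable result.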
> ------------------ Original ------------------
> From: "[email protected] wan" <[email protected]>
> Date: Tue, Jul 15, 2014 10:44 AM
> To: "java-user" <[email protected]>
> Subject: Re: mmap confusion in lucene
>
> Hi Uwe,
>
> Thank you for always helping. My first test is clear to me now: the OS
> caches the whole file because the data is copied to the Java heap, and it
> does not free the pages, so I see 800M used by the cache at the end. But in
> my last two tests the OS freed all the previously cached pages, so I see
> only 4M of cache in use at the end.
>
> Maybe I do not fully understand the internal kernel mechanism. As I
> understand it, the kernel swaps out a page when memory is scarce or when
> the cached page has not been used for a long time. The first condition does
> not hold in my test, because the OS still has 30G of memory available. As
> for the second condition, in the first test the bytes are copied to the
> Java heap, yet the OS still keeps the cache after the program exits. In the
> last test, the OS released the pages even while the program was still
> running.
>
> Could you give me some further explanation? I would appreciate it very
> much.
>
> Zhijiang Wang
>
> ------------------------------------------------------------------
> From: Uwe Schindler <[email protected]>
> Sent: Monday, July 14, 2014 18:13
> To: java-user <[email protected]>; wangzhijiang999 <[email protected]>
> Subject: RE: mmap confusion in lucene
>
> This is very easy to explain:
>
> In the first part you copy the whole memory mapped stuff into an on-heap
> byte array. You allocate this byte array in total and you then do a copy
> (actually this is a standard libc copy call) of the whole file. To do this
> copy, the underlying OS will need to swap in the whole file, because it
> "sees" that you want to read the whole file anyway (because of the size of
> the copy operation).
>
> The other example reads the stuff byte by byte in a Java for-loop. The
> operating system has no idea how to optimize that, so whenever you cross
> page boundaries it will swap in another buffer. Because of internal
> kernel-page-garbage collection, the pages swapped in are freed much faster.
> This is OS specific.
>
> In general copying a random access file to java heap with mmap is just the
> wrong use case. Lucene never does this! The idea behind mmap is to *not
> copy* the data and work on the mmapped region directly (using random
> access). The OS cache logic will then use statistics about which pages were
> actually used and keep them longer in FS cache than those used one time and
> then no longer used for very long time.
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: [email protected]
>
> > -----Original Message-----
> > From: wangzhijiang999 [mailto:[email protected]]
> > Sent: Monday, July 14, 2014 11:58 AM
> > To: java-user
> > Subject: mmap confusion in lucene
> >
> > Hi everybody,
> >
> > I ran into something that confused me when I tested the mmap feature in
> > lucene. I read an 800M file via mmap as shown below:
> >
> >     RandomAccessFile raf = new RandomAccessFile(new File(path), "r");
> >     FileChannel rafc = raf.getChannel();
> >     ByteBuffer buff = rafc.map(FileChannel.MapMode.READ_ONLY, 0, rafc.size());
> >     int len = buff.limit();
> >     byte[] b = new byte[len];
> >     for (int i = 0; i < len; i++) { b[i] = buff.get(); }
> >
> > After the program finished, the Linux cache had grown by about 800M.
> >
> >     RandomAccessFile raf = new RandomAccessFile(new File(path), "r");
> >     FileChannel rafc = raf.getChannel();
> >     ByteBuffer buff = rafc.map(FileChannel.MapMode.READ_ONLY, 0, rafc.size());
> >     int len = buff.limit();
> >     for (int i = 0; i < len; i++) { Byte b = buff.get(); }
> >
> > But this way, the Linux cache grew by just 4M.
> >
> >     RandomAccessFile raf = new RandomAccessFile(new File(path), "r");
> >     FileChannel rafc = raf.getChannel();
> >     ByteBuffer buff = rafc.map(FileChannel.MapMode.READ_ONLY, 0, rafc.size());
> >     int len = buff.limit();
> >     byte[] b = new byte[len];
> >     for (int i = 0; i < len; i++) { b[i] = buff.get(); b[i] = 0; }
> >
> > This way, the Linux cache also grew by only 4M.
> >
> > The whole content of the file should be read in all three tests, but in
> > the last two tests Linux cached only 4M. Could somebody explain this?
> > Thanks in advance.
> >
> > Zhijiang Wang
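
Uwe's contrast in the quoted thread between copying the mapped file to the heap and working on the mapped region directly can be sketched as two helpers. The class name MappedRandomAccess, the methods copyToHeap and readAt, and the tiny temporary file are hypothetical names introduced for illustration only. This is not how Lucene implements its mmap directory; it only contrasts the two access patterns with standard java.nio calls.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

public class MappedRandomAccess {

    // Copies the whole mapped file into an on-heap array
    // (the pattern described in the thread as the wrong use case for mmap).
    public static byte[] copyToHeap(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buff = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            byte[] onHeap = new byte[buff.remaining()];
            buff.get(onHeap); // bulk relative get: one copy of the entire region
            return onHeap;
        }
    }

    // Reads single bytes at the given offsets straight from the mapped region,
    // without copying the rest of the file.
    public static byte[] readAt(Path file, int... offsets) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buff = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            byte[] out = new byte[offsets.length];
            for (int i = 0; i < offsets.length; i++) {
                out[i] = buff.get(offsets[i]); // absolute get: position is unchanged
            }
            return out;
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical test file with known content.
        Path tmp = Files.createTempFile("mmap-random", ".bin");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, new byte[] {10, 20, 30, 40});

        System.out.println(Arrays.toString(readAt(tmp, 3, 0))); // prints [40, 10]
        System.out.println(copyToHeap(tmp).length);             // prints 4
    }
}
```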
