RE: Re: Re: Re: mmap confusion in lucene

Uwe Schindler Wed, 16 Jul 2014 02:45:08 -0700

This has nothing to do with mmap.
This is an optimization done by the JIT compiler. If you don't use the return
value of a method and the method itself has no observable side effects, the
call can be removed from the compiled code, so the mapped memory is never read.
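
To make that concrete, here is a minimal sketch (not code from this thread; the class name MmapReadDemo and its two helper methods are made up for illustration). It folds every byte returned by get() into a sum and returns that sum, so the result of each call is used and the reads cannot be treated as dead code. I am assuming a file smaller than 2 GB, since a single map() call cannot cover more than Integer.MAX_VALUE bytes.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class, not part of Lucene or the JDK.
final class MmapReadDemo {

    /** Adds up all remaining bytes (as unsigned values), so every get() result is used. */
    static long sumBytes(ByteBuffer buf) {
        long sum = 0;
        while (buf.hasRemaining()) {
            sum += buf.get() & 0xFF;
        }
        return sum;
    }

    /** Maps the whole file read-only and returns the sum of its bytes. */
    static long sumFile(Path path) throws IOException {
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            // Assumption: the file fits in one mapping (size <= Integer.MAX_VALUE).
            ByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            return sumBytes(buf);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java MmapReadDemo <file>");
            return;
        }
        // Printing the sum makes the reads observable outside the loop.
        System.out.println("byte sum = " + sumFile(Paths.get(args[0])));
    }
}
```

If you run this against the 800M test file and watch dstat -md, I would expect the cache to grow to roughly the file size, as in the println test quoted below, but the exact numbers are OS specific.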


Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -----Original Message-----
> From: wangzhijiang999 [mailto:wangzhijiang...@aliyun.com]
> Sent: Wednesday, July 16, 2014 11:39 AM
> To: java-user; Uwe Schindler
> Subject: Re: Re: Re: mmap confusion in lucene
> 
> Hi Uwe,
>         Where can I find a detailed description of how mmap works in
> Java and in the OS? I did not find anything useful in the JDK source code.
> 
> For example: byte b = curBuf.get(); System.out.println(b);
> 
> When the get method runs, the JVM does not ask the FS to read the file from
> disk. When the println method runs, the data is actually used, so only then
> does the JVM ask the FS to read the data. Is my understanding right?
> Thank you!
> 
> Zhijiang Wang
> 
> 
> 
> 
> ------------------------------------------------------------------
> From: Uwe Schindler <u...@thetaphi.de>
> Sent: Tuesday, July 15, 2014 17:29
> To: java-user <java-user@lucene.apache.org>; wangzhijiang999
> <wangzhijiang...@aliyun.com>
> Subject: RE: Re: Re: mmap confusion in lucene
> 
> Yes, the JVM is removing the get() call, because it knows that it has no
> side effect: the position() pointer is not used afterwards and the result
> of the get() call is also not used. The file is partly mapped because the
> optimization only starts to kick in after 10,000 method calls (the default
> threshold in the JVM).
> 
> Uwe
> 
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
> 
> > -----Original Message-----
> > From: wangzhijiang999 [mailto:wangzhijiang...@aliyun.com]
> > Sent: Tuesday, July 15, 2014 11:10 AM
> > To: java-user
> > Subject: Re: Re: mmap confusion in lucene
> > 
> > Hi 308181687,
> > 
> > I also tested it this way. If I print every byte, the OS cache ends up
> > consuming the size of the file, about 800M.
> > 
> > for (int j = 0; j < len; ++j) { System.out.println(buff.get()); }
> > 
> > If I just call buff.get() in the loop, the OS cache ends up consuming
> > only 8M.
> > 
> > for (int j = 0; j < len; ++j) { byte b = buff.get(); }
> > 
> > buff.get() reads the byte at the buffer's current position and then
> > increments the position. But if you do not use the value returned by
> > buff.get(), the FS does not actually read the disk. I monitored disk
> > reads and cache usage with the dstat -md command and confirmed that disk
> > reads do not increase in the second test.
> > As you said, the JVM is smart enough that if you do not use the data, it
> > does not read it from disk. My previous understanding was that as long as
> > you call the get method to fetch data, it reads from disk whether or not
> > you actually use the data. I will continue researching this to find the
> > real reason.
> > 
> > ------------------------------------------------------------------
> > From: 308181687 <308181...@qq.com>
> > Sent: Tuesday, July 15, 2014 13:04
> > To: java-user <java-user@lucene.apache.org>
> > Subject: Re: Re: mmap confusion in lucene
> > 
> > Hi Zhijiang,
> > 
> > It seems that the JVM is smart enough to ignore the unused code. Try the
> > following code:
> > 
> > RandomAccessFile raf = new RandomAccessFile(new File("/root/xx.txt"), "r");
> > FileChannel rafc = raf.getChannel();
> > ByteBuffer buff = rafc.map(FileChannel.MapMode.READ_ONLY, 0, rafc.size());
> > int len = buff.limit();
> > byte b = 0;
> > for (int i = 0; i < len; i++) {
> >     b += buff.get();
> > }
> > 
> > The java process will consume the expected 800M of shared memory. But if
> > you change the line "b += buff.get()" to "b = buff.get()", the java
> > process will not consume that much shared memory. I guess the JVM is
> > smart enough to skip directly to the last position of the ByteBuffer.
> > 
> > Thanks & Best Regards!
> > 
> > ------------------ Original ------------------
> > From: "java-user@lucene.apache.org wan" <wangzhijiang...@aliyun.com>
> > Date: Tue, Jul 15, 2014 10:44 AM
> > To: "java-user" <java-user@lucene.apache.org>
> > Subject: Re: mmap confusion in lucene
> > 
> > Hi Uwe,
> > 
> > Thank you for your continued help. My first test is clear to me now: the
> > OS caches the whole file because the data is copied to the Java heap and
> > the pages are not freed, so I see 800M used by cache at the end. But in
> > my last two tests the OS has freed all the previously cached pages, so I
> > see only 4M of cache used at the end.
> > 
> > Maybe I do not fully understand the internal kernel mechanism. As I
> > understand it, the kernel swaps out a page when memory is limited or when
> > the cached page has not been used for a long time. The first condition is
> > not met in my test, because the OS still has 30G of memory available. As
> > for the second condition, although the bytes are copied to the Java heap
> > in the first test, the OS still keeps the cache after the program exits.
> > In the last test, the OS released the pages even while the program was
> > still running.
> > 
> > Could you give me some further explanation of this? I would appreciate it
> > very much.
> > 
> > Zhijiang Wang
> > 
> > ------------------------------------------------------------------
> > From: Uwe Schindler <u...@thetaphi.de>
> > Sent: Monday, July 14, 2014 18:13
> > To: java-user <java-user@lucene.apache.org>; wangzhijiang999
> > <wangzhijiang...@aliyun.com>
> > Subject: RE: mmap confusion in lucene
> > 
> > This is very easy to explain:
> > 
> > In the first part you copy the whole memory-mapped stuff into an on-heap
> > byte array. You allocate this byte array in total and you then do a copy
> > (actually this is a standard libc copy call) of the whole file. To do
> > this copy, the underlying OS will need to swap in the whole file, because
> > it "sees" that you want to read the whole file anyway (because of the
> > size of the copy operation).
> > 
> > The other example reads the stuff byte by byte in a Java for-loop. The
> > operating system has no idea how to optimize that, so whenever you cross
> > a page boundary it will swap in another buffer. Because of internal
> > kernel-page garbage collection, the pages swapped in are freed much
> > faster. This is OS specific.
> > 
> > In general, copying a random access file to the Java heap with mmap is
> > just the wrong use case. Lucene never does this! The idea behind mmap is
> > to *not copy* the data and work on the mmapped region directly (using
> > random access). The OS cache logic will then use statistics about which
> > pages were actually used and keep them longer in FS cache than those used
> > one time and then no longer used for a very long time.
> > 
> > Uwe
> > 
> > -----
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: u...@thetaphi.de
> > 
> > > -----Original Message-----
> > > From: wangzhijiang999 [mailto:wangzhijiang...@aliyun.com]
> > > Sent: Monday, July 14, 2014 11:58 AM
> > > To: java-user
> > > Subject: mmap confusion in lucene
> > > 
> > > Hi everybody,
> > > 
> > > I found a problem that confused me when I tested the mmap feature in
> > > lucene. I tested reading an 800M file with mmap like below:
> > > 
> > > RandomAccessFile raf = new RandomAccessFile(new File(path), "r");
> > > FileChannel rafc = raf.getChannel();
> > > ByteBuffer buff = rafc.map(FileChannel.MapMode.READ_ONLY, 0, rafc.size());
> > > int len = buff.limit();
> > > byte[] b = new byte[len];
> > > for (int i = 0; i < len; i++) { b[i] = buff.get(); }
> > > 
> > > After the program finished, about 800M of the linux cache was consumed.
> > > 
> > > RandomAccessFile raf = new RandomAccessFile(new File(path), "r");
> > > FileChannel rafc = raf.getChannel();
> > > ByteBuffer buff = rafc.map(FileChannel.MapMode.READ_ONLY, 0, rafc.size());
> > > int len = buff.limit();
> > > for (int i = 0; i < len; i++) { Byte b = buff.get(); }
> > > 
> > > But this way, only 4M of the linux cache was consumed.
> > > 
> > > RandomAccessFile raf = new RandomAccessFile(new File(path), "r");
> > > FileChannel rafc = raf.getChannel();
> > > ByteBuffer buff = rafc.map(FileChannel.MapMode.READ_ONLY, 0, rafc.size());
> > > int len = buff.limit();
> > > byte[] b = new byte[len];
> > > for (int i = 0; i < len; i++) { b[i] = buff.get(); b[i] = 0; }
> > > 
> > > This way, only 4M of the linux cache was consumed as well.
> > > 
> > > The whole content of the file should be read in all three tests above,
> > > but in the last two tests the linux system cached only 4M. Could
> > > somebody explain this to me? Thanks in advance.
> > > 
> > > Zhijiang Wang
> > 
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org
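
As a footnote to the quoted point that the idea behind mmap is to work on the mapped region directly instead of copying it to the heap, here is a minimal sketch of such random access. This is not Lucene's code; the class MappedRandomAccess and the method readIntAt are hypothetical names, and I assume the file is smaller than 2 GB and that offset + 4 does not exceed its length. It uses the absolute getInt(int) of ByteBuffer, which reads at the given offset without touching position(), so the OS only has to bring in the pages that are actually touched (plus whatever read-ahead it decides to do).

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration class, not part of Lucene or the JDK.
final class MappedRandomAccess {

    /**
     * Reads one int at an absolute byte offset of the mapped file.
     * Nothing is copied into an on-heap byte[]; the buffer's default
     * byte order (big-endian) is used.
     */
    static int readIntAt(Path path, int offset) throws IOException {
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            // Assumption: the file fits in one mapping (size <= Integer.MAX_VALUE).
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            // Absolute read: does not change buf.position().
            return buf.getInt(offset);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java MappedRandomAccess <file> <offset>");
            return;
        }
        int value = readIntAt(Paths.get(args[0]), Integer.parseInt(args[1]));
        System.out.println("int at offset " + args[1] + " = " + value);
    }
}
```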


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
