Re: Tip (for future consideration) on Channel + Buffer use

Dieter Stüken Mon, 05 Jan 2015 09:44:37 -0800

Hello Martin & Marc,

Since you mention MappedByteBuffer, here are some notes on my experiences with 
memory-mapped I/O over the last decade:


I have used mmap() heavily since 2001 to process GeoTIFF images and Shapefiles 
in C++.
Later on I switched to Java using NIO with MappedByteBuffer, which gave me 
crazy fast processing tools.

Unfortunately I also encountered some problems:

1) I got unexpected OutOfMemoryErrors, and it took me a long time to 
understand the source of this problem.

It was not caused by a lack of Java heap space (-Xmx...). Instead, the system 
was unable to allocate additional virtual address space beyond the heap space 
Java itself had already reserved. This occurred especially on 32-bit systems. 
While Linux may assign up to 3 GB of virtual memory to a user process, stupid 
XP gets exhausted below 1 GB (and you have to subtract the Java heap space 
already allocated).

Today we mainly use 64-bit systems, but I still observed sporadic OOM errors 
even on 64-bit systems (this was around 2009, so maybe it has gone away with 
Java 7/8 in the meantime).

2) In contrast to C (munmap()), there is no way to explicitly unmap a 
MappedByteBuffer in Java.

Even worse, the associated file is kept open, which is a minor problem on Unix 
but raises major problems on Windows due to its stupid mandatory locking (see 
http://bugs.java.com/bugdatabase/view_bug.do?bug_id=4724038). The problem is 
that the mapping and the file channel are not released until the garbage 
collector finally reclaims the buffer. In addition, you may run into a "too 
many open files" problem if you process many files using MMIO (see 
http://stackoverflow.com/questions/13204656/too-many-open-file-error-java-io-filenotfoundexception).
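
To illustrate point 2, here is a minimal Java sketch (the class and method 
names are hypothetical, not taken from any project discussed here). It maps a 
file read-only and closes the channel right away. According to the 
FileChannel.map documentation, closing the channel has no effect on the 
validity of the mapping, which stays in place until the buffer itself is 
garbage-collected; the standard API offers no call to release it earlier.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, for illustration only.
final class MappedReadSketch {
    /**
     * Maps the whole file read-only. The channel is closed before returning,
     * but the returned buffer (and the OS-level mapping behind it) remains
     * valid until the buffer is garbage-collected.
     */
    static MappedByteBuffer mapReadOnly(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }
    }
}
```

Since the caller cannot release the mapping, on Windows an attempt to delete 
or replace the file while the buffer is still reachable is likely to fail.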


My conclusion was to give up using MappedByteBuffer to speed up I/O. (I still 
use it occasionally, e.g. for modifying the colormap of a GeoTIFF image on the 
fly...) Instead I switched back to plain ByteBuffers, as you mentioned. But it 
may still be useful to use direct ByteBuffers. Those are allocated outside the 
Java heap space, just like a MappedByteBuffer, but without locking any 
external file resource. This may still be problematic on 32-bit systems, but I 
think running big data applications on 32-bit is a bad idea anyway (and still 
using XP in particular!)
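
As a rough sketch of the direct-buffer alternative (class and method names 
are made up for this example), the following reads a file sequentially 
through one reusable direct ByteBuffer. The channel is closed by the 
try-with-resources block, so no file handle outlives the call.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, for illustration only.
final class DirectBufferSketch {
    /** Sums all bytes of a file (as unsigned values), reading through a direct buffer. */
    static long sumBytes(Path file, int bufferSize) throws IOException {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        // Allocated outside the Java heap, but not tied to any file.
        ByteBuffer buffer = ByteBuffer.allocateDirect(bufferSize);
        long sum = 0;
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            while (channel.read(buffer) != -1) {   // -1 signals end of file
                buffer.flip();                     // switch from filling to draining
                while (buffer.hasRemaining()) {
                    sum += buffer.get() & 0xFF;
                }
                buffer.clear();                    // ready to be filled again
            }
        }
        return sum;
    }
}
```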

Dieter.

-----Original Message-----
From: Martin Desruisseaux [mailto:[email protected]] 
Sent: Monday, 5 January 2015 16:38
To: [email protected]
Subject: Tip (for future consideration) on Channel + Buffer use

Hello Marc

Just a tip for later (at your choice): since our reading of Shapefile data is 
(for now) essentially sequential, it would be nice to use a plain ByteBuffer 
instead of a MappedByteBuffer, in order to use fewer OS resources and to avoid 
being restricted to File inputs (other inputs could be a URL or entries in a 
ZIP file; the latter is especially useful for implementing Web Services that 
can return only a single file).

Using a plain java.nio.ByteBuffer is a little bit more difficult because we 
have to fill the buffer ourselves from the 
java.nio.channels.ReadableByteChannel. To make this task easier, we have this 
internal class:

storage/sis-storage/src/main/java/org/apache/sis/internal/storage/ChannelDataInput.java

This class takes the supplied ByteBuffer and ReadableByteChannel, and provides 
convenience methods like readByte(), readDouble(), etc. which automatically 
handle the task of transferring data from the channel to the buffer when 
needed.
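
The general idea can be sketched as follows. This is not the actual 
ChannelDataInput code: the class name ChannelReaderSketch and the helper 
ensureAvailable are hypothetical, and the sketch assumes a blocking channel 
and uses the buffer's current byte order.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

// Hypothetical illustration, not the SIS implementation.
final class ChannelReaderSketch {
    final ReadableByteChannel channel;
    final ByteBuffer buffer;

    ChannelReaderSketch(ReadableByteChannel channel, ByteBuffer buffer) {
        this.channel = channel;
        this.buffer = buffer;
        buffer.clear();
        buffer.limit(0);   // start empty, in "read" mode
    }

    /** Refills the buffer from the channel until at least n bytes can be read. */
    private void ensureAvailable(int n) throws IOException {
        if (n > buffer.capacity()) {
            throw new IllegalArgumentException("buffer too small for " + n + " bytes");
        }
        while (buffer.remaining() < n) {
            buffer.compact();                  // keep unread bytes, switch to "write" mode
            int count = channel.read(buffer);  // assumes a blocking channel
            buffer.flip();                     // back to "read" mode
            if (count < 0) {
                throw new EOFException();
            }
        }
    }

    byte readByte() throws IOException {
        ensureAvailable(Byte.BYTES);
        return buffer.get();
    }

    int readInt() throws IOException {
        ensureAvailable(Integer.BYTES);
        return buffer.getInt();
    }

    double readDouble() throws IOException {
        ensureAvailable(Double.BYTES);
        return buffer.getDouble();
    }
}
```

Because the channel can be any ReadableByteChannel, the same reader works for 
a file, a URL stream or a ZIP entry wrapped with Channels.newChannel(...).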

For the record, the reason why this class is not public is that it breaks 
encapsulation: the ByteBuffer and the ReadableByteChannel are exposed publicly. 
This is intentional, since this class is designed only as a convenience for SIS 
implementations, which may want to switch between the convenience methods for 
some tasks and direct usage of the channel and buffer for other tasks.

    Martin
