Re: [gdal-dev] Call for discussion on RFC 45: GDAL datasets and raster bands as virtual memory mappings

2013-12-19 Thread Even Rouault
Le jeudi 19 décembre 2013 04:41:18, Trent Piepho a écrit :
 Do you see page file activity?  If you look at /proc/pid/smaps, you
 should be able to see the actual status of the mapping of your data
 file.  Probably it is consuming a large number of pages of RAM, but
 also there should be zero pages written to swap.  All clean private or
 clean shared, zero anonymous and zero swap.

7f8e6f75d000-7f9334df1000 r--p  08:06 11645756   
/home/even/Téléchargements/eudem_dem_4258_europe.tif
Size:   20011600 kB
Rss: 2943372 kB
Pss: 2943372 kB
Shared_Clean:  0 kB
Shared_Dirty:  0 kB
Private_Clean:   2943372 kB
Private_Dirty: 0 kB
Referenced:   950596 kB
Swap:  0 kB
KernelPageSize:4 kB
MMUPageSize:   4 kB

So yes, you are right. I've noticed that the 'Referenced' value tends to
fluctuate a lot during execution, while the other values remain stable after
some point.

 
 I think the system unresponsiveness is probably due to I/O scheduling.
 Your process has queued a lot of I/O reads and everything else has
 to wait in the queue.  So all other I/O sees huge latencies.

Yes, that's a likely cause. I've just tried doing a
madvise(..,..,MADV_RANDOM) on the whole mapping just after mmap(), and it
seems to increase the system responsiveness noticeably (I can write this
email while the program is running!). Of course, the throughput of the test
program has dropped significantly, which is logical since MADV_RANDOM
disables aggressive I/O read-ahead.
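
For reference, the MADV_RANDOM experiment described above could look roughly like the sketch below. It is only an illustration: the helper name map_readonly_random() is hypothetical and not part of the test program or of GDAL, and it assumes a Linux/glibc system.

```c
#define _DEFAULT_SOURCE
#define _FILE_OFFSET_BITS 64
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map a whole file read-only and hint the kernel that
 * accesses will be random, which disables aggressive read-ahead.
 * Returns NULL on error (including an empty file). */
static const unsigned char *map_readonly_random(const char *path,
                                                size_t *out_len)
{
    struct stat st;
    void *ptr;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }
    ptr = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (ptr == MAP_FAILED)
        return NULL;

    /* The hint is advisory only, so a failure here is not treated as fatal. */
    (void)madvise(ptr, (size_t)st.st_size, MADV_RANDOM);

    *out_len = (size_t)st.st_size;
    return ptr;
}
```

The caller is expected to munmap() the returned pointer with the reported length once done.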

 
 And too, a 20 GB mapping is probably thrashing the TLB.  Do huge pages
 actually get used?

No, they don't. My understanding, based on previous attempts, is that mmap()
needs an explicit flag for that, plus other tuning at the OS level.

 On the embedded systems I'm more intimately
 familiar with, only normal 4k pages are used by user processes.  Huge
 TLBs are more of a special case that can be used by the kernel for
 things like frame buffer mappings and SoC register windows.
 


-- 
Geospatial professional services
http://even.rouault.free.fr/services.html
___
gdal-dev mailing list
gdal-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/gdal-dev


Re: [gdal-dev] Call for discussion on RFC 45: GDAL datasets and raster bands as virtual memory mappings

2013-12-18 Thread Even Rouault
Le mercredi 18 décembre 2013 06:55:50, Frank Warmerdam a écrit :
 Even,
 
 Very impressive work, I am supportive.
 
 IMHO it would be wonderful if there was also an mmap() based mechanism
 where you could ask for the virtual memory chunk and you get it back (if it
 works) along with stride values to access in it.  This could likely be made
 to work for most raw based formats and a few others too.  It might also
 allow non-mmap() based files to return an organization based more on their
 actual organization for efficiency.

Hi Frank,

I'm not completely sure I have understood your idea. Would it be something
like:

CPLVirtualMem CPL_DLL* GDALDatasetGetVirtualMemAuto( GDALDatasetH hDS,
 GDALRWFlag eRWFlag,
 int nXOff, int nYOff,
 int nXSize, int nYSize,
 int nBufXSize, int nBufYSize,
 GDALDataType eBufType,
 int nBandCount, int* panBandMap,
 int *pnPixelSpace,
 GIntBig *pnLineSpace,
 GIntBig *pnBandSpace,
 size_t nCacheSize,
 int bSingleThreadUsage,
 char **papszOptions );

The differences with GDALDatasetGetVirtualMem(): the stride values are now
output values, and there is no longer an nPageSizeHint parameter.
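
To make the role of the returned strides concrete, here is a small sketch of how a caller might address a sample once it has the base pointer and the three spacings. The Layout struct and sample_addr() are hypothetical names used only for illustration, and int64_t stands in for GIntBig to keep the snippet self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the spacings returned through
 * pnPixelSpace / pnLineSpace / pnBandSpace (all in bytes). */
typedef struct {
    int     pixel_space;
    int64_t line_space;
    int64_t band_space;
} Layout;

/* Address of the sample at (band, y, x), all zero-based, inside the
 * mapping that starts at base. */
static const unsigned char *sample_addr(const unsigned char *base,
                                        const Layout *l,
                                        int band, int y, int x)
{
    return base + (int64_t)band * l->band_space
                + (int64_t)y * l->line_space
                + (int64_t)x * l->pixel_space;
}
```

With a 3 x 2 Byte raster with 2 bands, a band-sequential layout would be {1, 3, 6} and a pixel-interleaved one {2, 6, 1}; the same accessor works for both, which is the point of returning the strides rather than imposing them.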

In your mind, would the spacings be determined in a generic way from the
dataset properties (block size and INTERLEAVED=PIXEL/BAND metadata item), or
would that require some direct cooperation from the driver?

Since you mention raw formats, perhaps you are thinking of a file-based
mmap() rather than an anonymous mmap() combined with RasterIO(), as currently
proposed? This is something I've mentioned in the "Related thoughts"
paragraph, but there are practical annoyances with how Linux manages memory
with file-based mmap(). I'd be happy to hear from anyone who has had
successful experience with that, by the way (in a way that doesn't require an
explicit madvise() each time you're done with a range of memory).

---

Reading your words again, I'm now wondering whether you are thinking of a
Dataset / RasterBand virtual method that could be implemented by drivers?

virtual CPLVirtualMem* GetVirtualMem(...)

They would directly use the low-level CPLVirtualMem to create the mapping and
provide their own callback to fill pages when a page fault occurs. So they
could potentially avoid using the block cache layer and do direct file I/O?

Looking at RawRasterBand::IRasterIO(), I can see that it can use direct file
I/O without going through the block cache (under some circumstances, based on
a non-obvious heuristic). So the currently proposed implementation could
potentially already benefit from that. Perhaps we would need a flag to
RasterIO() to ask it to avoid the block cache when possible. Or just call
CPLSetThreadLocalConfigOption("GDAL_ONE_BIG_READ", "YES") in
GDALVirtualMem::DoIOBandSequential() / DoIOPixelInterleaved()

Even

-- 
Geospatial professional services
http://even.rouault.free.fr/services.html

Re: [gdal-dev] Call for discussion on RFC 45: GDAL datasets and raster bands as virtual memory mappings

2013-12-18 Thread Frank Warmerdam
Even,

Sorry, I was thinking of mmap() directly to the file, and having something
like:

CPLVirtualMem CPL_DLL* GDALBandGetVirtualMemAuto( GDALRasterBandH hBand,
 int *pnPixelSpace,
 GIntBig *pnLineSpace,
 char **papszOptions );

I imagined an available virtual method on the band which could be
implemented - primarily by the RawBand class to try and mmap() the data and
return the layout.  But when that fails, or is unavailable it could use
your existing methodology with a layout that seems well tuned to the
underlying data organization.

Certainly there is no need to hold things up for this.  What you are
proposing is already wonderfully useful.  I'm wondering if there would be
ways of making what you propose work with Python Numpy in such a way that a
numpy array could be requested which is of this virtual memory.  That would
also be a nice extension.

Best regards,
Frank






-- 
---+--
I set the clouds in motion - turn up   | Frank Warmerdam,
warmer...@pobox.com
light and sound - activate the windows | http://pobox.com/~warmerdam
and watch the world go round - Rush| Geospatial Software Developer

Re: [gdal-dev] Call for discussion on RFC 45: GDAL datasets and raster bands as virtual memory mappings

2013-12-18 Thread Even Rouault
Le mercredi 18 décembre 2013 19:53:37, Frank Warmerdam a écrit :
 Even,
 
 Sorry, I was thinking of mmap() directly to the file, and having something
 like:
 
 CPLVirtualMem CPL_DLL* GDALBandGetVirtualMemAuto( GDALRasterBandH hBand,
  int *pnPixelSpace,
  GIntBig *pnLineSpace,
  char **papszOptions );
 
 I imagined an available virtual method on the band which could be
 implemented - primarily by the RawBand class to try and mmap() the data and
 return the layout.  But when that fails, or is unavailable it could use
 your existing methodology with a layout that seems well tuned to the
 underlying data organization.

Yes, that should be doable, but with the limitation I raised about the memory
management of file-based mmap(): if you mmap() a file larger than RAM and
read it entirely, without an explicit madvise() to discard regions that are
no longer needed, it will fill RAM and cause disk swapping. I should retest
to confirm. Perhaps there is some OS-level tuning to avoid that?

 
 Certainly there is no need to hold things up for this.  What you are
 proposing is already wonderfully useful. 

I've no particular timetable for this. This started as an experiment. So I'm 
happy to explore complementary ideas.

 I'm wondering if there would be
 ways of making what you propose work with Python Numpy in such a way that a
 numpy array could be requested which is of this virtual memory.  That would
 also be a nice extension.

Hmm, how would that be different from what is proposed in the "SWIG bindings"
section of the RFC?

 
 Best regards,
 Frank
 
 
 

Re: [gdal-dev] Call for discussion on RFC 45: GDAL datasets and raster bands as virtual memory mappings

2013-12-18 Thread Frank Warmerdam
On Wed, Dec 18, 2013 at 11:46 AM, Even Rouault even.roua...@mines-paris.org
 wrote:

 Le mercredi 18 décembre 2013 19:53:37, Frank Warmerdam a écrit :
  Even,
 
  Sorry, I was thinking of mmap() directly to the file, and having
 something
  like:
 
  CPLVirtualMem CPL_DLL* GDALBandGetVirtualMemAuto( GDALRasterBandH hBand,
   int *pnPixelSpace,
   GIntBig *pnLineSpace,
   char **papszOptions );
 
  I imagined an available virtual method on the band which could be
  implemented - primarily by the RawBand class to try and mmap() the data
 and
  return the layout.  But when that fails, or is unavailable it could use
  your existing methodology with a layout that seems well tuned to the
  underlying data organization.

 Yes, that should be doable, but with the limitation I raised about the
 memory
 management of file-based mmap() : if you mmap() a file larger than RAM,
 and read
 it entirely, without explicit madvise() to discard regions no longer
 needed,
 it will fill RAM and cause disk swapping. I should retest to confirm.
 Perhaps there are some OS level tuning to avoid that ?


Even,

That was not my experience with read-only mmap() of actual files on disk
back in the day.

In any event, I'd suggest sticking with what you have, and if I'm keen
perhaps one day I'll try to implement mmap() support.  If I do, I feel
like it needs to go down through the VSI*L system, and once a file is
mmap()ed the VSI*L I/O should also use the mapped image.  Once upon a
time this had performance benefits. I'm not sure whether that is still the
case.

Best regards,
Frank

-- 
---+--
I set the clouds in motion - turn up   | Frank Warmerdam,
warmer...@pobox.com
light and sound - activate the windows | http://pobox.com/~warmerdam
and watch the world go round - Rush| Geospatial Software Developer

Re: [gdal-dev] Call for discussion on RFC 45: GDAL datasets and raster bands as virtual memory mappings

2013-12-18 Thread Frank Warmerdam
On Wed, Dec 18, 2013 at 11:46 AM, Even Rouault even.roua...@mines-paris.org
 wrote:

  I'm wondering if there would be
  ways of making what you propose work with Python Numpy in such a way
 that a
  numpy array could be requested which is of this virtual memory.  That
 would
  also be a nice extension.

 Hum, how would that be different from what is proposed in the SWIG bindings
 section of the RFC ?


Even,

Ahem - I apparently did not read the RFC closely enough.  You are well
ahead of me on this idea.

Best regards,

-- 
---+--
I set the clouds in motion - turn up   | Frank Warmerdam,
warmer...@pobox.com
light and sound - activate the windows | http://pobox.com/~warmerdam
and watch the world go round - Rush| Geospatial Software Developer

Re: [gdal-dev] Call for discussion on RFC 45: GDAL datasets and raster bands as virtual memory mappings

2013-12-18 Thread Trent Piepho
On Wed, Dec 18, 2013 at 11:46 AM, Even Rouault
even.roua...@mines-paris.org wrote:
 Le mercredi 18 décembre 2013 19:53:37, Frank Warmerdam a écrit :

 I imagined an available virtual method on the band which could be
 implemented - primarily by the RawBand class to try and mmap() the data and
 return the layout.  But when that fails, or is unavailable it could use
 your existing methodology with a layout that seems well tuned to the
 underlying data organization.

 Yes, that should be doable, but with the limitation I raised about the memory
 management of file-based mmap() : if you mmap() a file larger than RAM, and 
 read
 it entirely, without explicit madvise() to discard regions no longer needed,
 it will fill RAM and cause disk swapping. I should retest to confirm. Perhaps
 there are some OS level tuning to avoid that ?

For Linux, if you mmap a file and do not write to it, the pages will
be clean.  This means that under memory pressure those pages can be
dropped without paging out to swap.  They are already backed on disk
in the mmaped file.  Only dirty anonymous mapped pages (anon mmap,
malloc() memory from mmap() or brk(), stack, etc.) would need to be
written to swap.

Of course, if you touch a large amount of memory and know you'll never
use it again, you can help the OS decide which pages to free by using
madvise().

One thing to consider is that a 32-bit OS can only memory map about
2-3 GB at once, even though there is no trouble using files much
larger than this size.  If you want to access a large file with
mmap(), you might need to use some kind of sliding window.
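
A sliding window of the kind mentioned above could be sketched as follows. This is only an illustration under stated assumptions: the function name sum_pages_windowed() is hypothetical, the work done per window (summing one byte per page) is a placeholder, and _FILE_OFFSET_BITS=64 is assumed so that file offsets stay 64-bit on a 32-bit system.

```c
#define _DEFAULT_SOURCE
#define _FILE_OFFSET_BITS 64
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical example: sum one byte per page of a file while never mapping
 * more than window_bytes at a time. Returns 0 on success, -1 on error. */
static int sum_pages_windowed(const char *path, size_t window_bytes,
                              long long *out_sum)
{
    long page = sysconf(_SC_PAGESIZE);
    struct stat st;
    long long sum = 0;
    off_t off;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (page <= 0 || fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    /* mmap() offsets must be page-aligned, so round the window down to a
     * multiple of the page size (and use at least one page). */
    window_bytes -= window_bytes % (size_t)page;
    if (window_bytes == 0)
        window_bytes = (size_t)page;

    for (off = 0; off < st.st_size; off += (off_t)window_bytes) {
        off_t left = st.st_size - off;
        size_t len = left < (off_t)window_bytes ? (size_t)left : window_bytes;
        size_t i;
        unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, off);

        if (p == MAP_FAILED) {
            close(fd);
            return -1;
        }
        for (i = 0; i < len; i += (size_t)page)
            sum += p[i];
        munmap(p, len); /* release this window before mapping the next one */
    }
    close(fd);
    *out_sum = sum;
    return 0;
}
```

Only one window is mapped at any time, so the address space used stays bounded regardless of the file size.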

I also think that mmap()ing many gigabytes has a certain cost in setting up
the page tables for the mapping that's not insignificant.  Even on a
64-bit OS, mmap()ing a 20 GB file just to access some small portion of
it could be inefficient.


Re: [gdal-dev] Call for discussion on RFC 45: GDAL datasets and raster bands as virtual memory mappings

2013-12-18 Thread Even Rouault
Le mercredi 18 décembre 2013 21:09:48, Trent Piepho a écrit :
 On Wed, Dec 18, 2013 at 11:46 AM, Even Rouault
 
 even.roua...@mines-paris.org wrote:
  Le mercredi 18 décembre 2013 19:53:37, Frank Warmerdam a écrit :
  I imagined an available virtual method on the band which could be
  implemented - primarily by the RawBand class to try and mmap() the data
  and return the layout.  But when that fails, or is unavailable it could
  use your existing methodology with a layout that seems well tuned to
  the underlying data organization.
  
  Yes, that should be doable, but with the limitation I raised about the
  memory management of file-based mmap() : if you mmap() a file larger
  than RAM, and read it entirely, without explicit madvise() to discard
  regions no longer needed, it will fill RAM and cause disk swapping. I
  should retest to confirm. Perhaps there are some OS level tuning to
  avoid that ?
 
 For Linux, if you mmap a file and do not write to it, the pages will
 be clean.  This means that under memory pressure those pages can be
 dropped without paging out to swap.  They are already backed on disk
 in the mmaped file.  Only dirty anonymous mapped pages (anon mmap,
 malloc() memory from mmap() or brk(), stack, etc.) would need to be
 written to swap.

Yes, that's the theory. But in practice, on my system (kernel
2.6.32-46-generic 64 bit - Ubuntu 10.04 - 4 GB RAM), the system becomes
rather unresponsive as soon as the process has read a part of the file that
is equivalent to the initial remaining free RAM. The 'top' utility shows it
to consume ~ 2.7 GB, which must be the free RAM.

Here's the test program I've used :

test_mmap.c :

#define _LARGEFILE64_SOURCE 1
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char* argv[])
{
    int fd;
    struct stat64 buf;
    char* ptr;
    long long i;
    int res = 0;
    int bDontNeed = 0;

    assert( argc == 2 || argc == 3 );
    if( argc == 3 && strcmp(argv[2], "-dontneed") == 0 )
        bDontNeed = 1;
    fd = open(argv[1], O_RDONLY);
    assert(fd >= 0);
    assert(stat64(argv[1], &buf) == 0);
    ptr = (char*) mmap(NULL, buf.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    /* mmap() reports failure with MAP_FAILED, not NULL */
    assert(ptr != MAP_FAILED);
    /* Touch one byte per 4 kB page so that every page gets faulted in */
    for(i = 0; i < buf.st_size; i += 4096)
    {
        /* Discard the pages every 500 MB read */
        if( bDontNeed && ((i % (1024 * 1024 * 500)) == 0) )
            madvise(ptr, buf.st_size, MADV_DONTNEED);

        res += ptr[i];
    }
    close(fd);
    return res;
}

$ gcc -Wall -g test_mmap.c -o test_mmap

$ ./test_mmap eudem_dem_4258_europe.tif
(the file is 20 GB large)

--> system becomes unresponsive

$ ./test_mmap eudem_dem_4258_europe.tif -dontneed

--> system remains usable. Every 500 MB read, a madvise() call explicitly
discards all pages. That's just for testing. It couldn't be used in
practice.
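
A somewhat more realistic variant would discard only the range that has just been consumed, rather than the whole mapping. The sketch below illustrates that idea; scan_and_release() is a hypothetical name, the per-page sum is a placeholder workload, and a 64-bit system is assumed so that the whole file fits in the address space.

```c
#define _DEFAULT_SOURCE
#define _FILE_OFFSET_BITS 64
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical example: map a whole file read-only, read one byte per page,
 * and after each chunk tell the kernel that the chunk is no longer needed.
 * Returns the sum of the bytes read, or -1 on error. */
static long long scan_and_release(const char *path, size_t chunk_bytes)
{
    long page = sysconf(_SC_PAGESIZE);
    struct stat st;
    unsigned char *ptr;
    long long sum = 0;
    size_t size, start, i;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (page <= 0 || fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return -1;
    }
    size = (size_t)st.st_size;
    ptr = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (ptr == MAP_FAILED)
        return -1;

    /* madvise() needs a page-aligned start address, so keep chunks aligned. */
    chunk_bytes -= chunk_bytes % (size_t)page;
    if (chunk_bytes == 0)
        chunk_bytes = (size_t)page;

    for (start = 0; start < size; start += chunk_bytes) {
        size_t len = size - start < chunk_bytes ? size - start : chunk_bytes;

        for (i = 0; i < len; i += (size_t)page)
            sum += ptr[start + i];
        /* Clean file-backed pages are simply dropped; they would be read
         * again from the file if touched later. */
        (void)madvise(ptr + start, len, MADV_DONTNEED);
    }
    munmap(ptr, size);
    return sum;
}
```

This still requires the caller to know when it is done with a range, which is exactly the inconvenience discussed in this thread.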

==> Can anyone reproduce similar behaviour?

 
 Of course, if you touch a large amount of memory and know you'll never
 use it again, you can help the OS decide which pages to free by using
 madvise().
 
 One thing to consider is that a 32-bit OS can only memory map about
 2-3 GB at once, even though there is no trouble using files much
 larger than this size.  If you want to access a large file with
 mmap(), you might need to use some kind of sliding window.

Yes, I'm well aware of that. But 32-bit systems are increasingly becoming
legacy, so we shouldn't worry too much about them.

 
 I think also, mmaping many gigabytes has a certain cost in setting up
 the page tables for the mapping that's not insignificant.  Even on a
 64-bit os, mmaping a 20 GB file just to access some small portion of
 it could be inefficient.

Yes, I agree there are hidden costs in the memory management layers of the OS.
Huge TLB pages (2 MB) on AMD64 systems can potentially be a solution to
decrease that cost. I had started to experiment a bit with that, but either
my kernel was not recent enough to benefit from all the functionality, or it
didn't seem really practical to use.

-- 
Geospatial professional services
http://even.rouault.free.fr/services.html


Re: [gdal-dev] Call for discussion on RFC 45: GDAL datasets and raster bands as virtual memory mappings

2013-12-18 Thread Trent Piepho
Do you see page file activity?  If you look at /proc/pid/smaps, you
should be able to see the actual status of the mapping of your data
file.  Probably it is consuming a large number of pages of RAM, but
also there should be zero pages written to swap.  All clean private or
clean shared, zero anonymous and zero swap.

I think the system unresponsiveness is probably due to I/O scheduling.
Your process has queued a lot of I/O reads and everything else has
to wait in the queue.  So all other I/O sees huge latencies.

Also, a 20 GB mapping is probably thrashing the TLB.  Do huge pages
actually get used?  On the embedded systems I'm more intimately
familiar with, only normal 4k pages are used by user processes.  Huge
TLBs are more of a special case that can be used by the kernel for
things like frame buffer mappings and SoC register windows.



[gdal-dev] Call for discussion on RFC 45: GDAL datasets and raster bands as virtual memory mappings

2013-12-17 Thread Even Rouault
Hi,

This is a call for discussion for RFC 45: GDAL datasets and raster bands as
virtual memory mappings

Beginning of the RFC inline (the full RFC includes a few colourful diagrams!):



== Summary ==

This document proposes additions to GDAL so that image data of GDAL datasets
and raster bands can be seen as virtual memory mappings, for hopefully simpler
usage.

== Rationale ==

When one wants to read or write image data from/into a GDAL dataset or raster
band, one must use the RasterIO() interface for the regions of interest that
are read or written. For small images, the most convenient solution is usually
to read/write the whole image in a single request where the region of interest
is the full raster extent. For larger images, particularly when they do not
fit entirely in RAM, this is not possible, and if one wants to operate on the
whole image, one must use a windowing strategy to avoid memory issues:
typically by proceeding scanline by scanline (or by groups of scanlines), or
by blocks for tiled images. This can make the writing of algorithms more
complicated when they need to access a neighbourhood of pixels around each
pixel of interest, since the size of this extra window must be taken into
account, leading to overlapping regions of interest. Nothing that cannot be
solved, but it requires some additional thinking that distracts from the main
purpose.
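
As an illustration of the bookkeeping described above, the following sketch computes which rows must be read for one strip when the algorithm also needs a halo of neighbouring rows. The Rows struct and strip_with_halo() are hypothetical names used only for this example; they are not GDAL API.

```c
#include <assert.h>

/* Hypothetical description of the rows to request from RasterIO(). */
typedef struct {
    int y_off;  /* first row to read */
    int y_size; /* number of rows to read */
} Rows;

/* Rows needed to process strip_rows rows starting at strip_start when each
 * pixel needs halo rows above and below, clamped to the raster extent. */
static Rows strip_with_halo(int strip_start, int strip_rows, int halo,
                            int raster_rows)
{
    Rows r;
    int first = strip_start - halo;
    int last = strip_start + strip_rows + halo; /* exclusive */

    if (first < 0)
        first = 0;
    if (last > raster_rows)
        last = raster_rows;
    r.y_off = first;
    r.y_size = last - first;
    return r;
}
```

Consecutive strips then overlap by 2 * halo rows, and the first and last strips are shorter, which is the kind of detail that a single virtual-memory view of the image would make unnecessary.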

The addition proposed in this RFC is to make the image data appear as a single
array accessed with a pointer, without being limited by the size of RAM with
respect to the size of the dataset (except for limitations imposed by the CPU
architecture and the operating system).



Best regards,

Even

-- 
Geospatial professional services
http://even.rouault.free.fr/services.html


Re: [gdal-dev] Call for discussion on RFC 45: GDAL datasets and raster bands as virtual memory mappings

2013-12-17 Thread Even Rouault
Le mardi 17 décembre 2013 21:54:31, Even Rouault a écrit :
 Hi,
 
 This is a call for discussion for RFC 45: GDAL datasets and raster bands
 as virtual memory mappings

Here's the link to the RFC :

http://trac.osgeo.org/gdal/wiki/rfc45_virtualmem

Even

-- 
Geospatial professional services
http://even.rouault.free.fr/services.html


Re: [gdal-dev] Call for discussion on RFC 45: GDAL datasets and raster bands as virtual memory mappings

2013-12-17 Thread Frank Warmerdam
Even,

Very impressive work, I am supportive.

IMHO it would be wonderful if there was also an mmap() based mechanism
where you could ask for the virtual memory chunk and you get it back (if it
works) along with stride values to access in it.  This could likely be made
to work for most raw based formats and a few others too.  It might also
allow non-mmap() based files to return an organization based more on their
actual organization for efficiency.

Best regards,
Frank



On Tue, Dec 17, 2013 at 1:01 PM, Even Rouault
even.roua...@mines-paris.org wrote:

 Le mardi 17 décembre 2013 21:54:31, Even Rouault a écrit :
  Hi,
 
  This is a call for discussion for RFC 45: GDAL datasets and raster bands
  as virtual memory mappings

 Here's the link to the RFC :

 http://trac.osgeo.org/gdal/wiki/rfc45_virtualmem

 Even

 --
 Geospatial professional services
 http://even.rouault.free.fr/services.html
 ___
 gdal-dev mailing list
 gdal-dev@lists.osgeo.org
 http://lists.osgeo.org/mailman/listinfo/gdal-dev




-- 
---+--
I set the clouds in motion - turn up   | Frank Warmerdam,
warmer...@pobox.com
light and sound - activate the windows | http://pobox.com/~warmerdam
and watch the world go round - Rush| Geospatial Software Developer