A colleague who tried to convert a 1.4 GB TIF file using gdal_polygonize.py on a g2.2xlarge Amazon instance (8 vCPUs, 15 GB RAM) has told me that the processing took over two weeks of constant running. I have also been told that the same conversion using commercial tooling completed in a few minutes.
As a result, I am investigating whether the performance of the gdal_polygonize.py TIF to JSON conversion can be improved. I ran strace while attempting the same conversion, but stopped it after a few hours (the gdal_polygonize.py status indicator was showing between 5% and 7.5% complete). The strace results are:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.40    2.348443           9    252474           read
  0.18    0.004139           3      1554           lseek
  0.12    0.002878           7       439       269 open
  0.10    0.002447          20       123        87 stat
  0.10    0.002429           5       459           mmap
  0.02    0.000561           3       208           munmap
  0.02    0.000529           2       216           fstat
  0.02    0.000504           3       188           mprotect
  0.01    0.000314           2       188           brk
  0.01    0.000173          29         6         5 unlink
  0.00    0.000109          11        10           getdents
  0.00    0.000098           1        67           rt_sigaction
  0.00    0.000000           0         4           write
  0.00    0.000000           0       173           close
  0.00    0.000000           0        12           lstat
  0.00    0.000000           0         1           rt_sigprocmask
  0.00    0.000000           0         2           rt_sigreturn
  0.00    0.000000           0         5         1 ioctl
  0.00    0.000000           0        91        91 access
  0.00    0.000000           0        20           mremap
  0.00    0.000000           0         5         3 execve
  0.00    0.000000           0         1           getcwd
  0.00    0.000000           0         4         2 readlink
  0.00    0.000000           0         1           getrlimit
  0.00    0.000000           0         1           getuid
  0.00    0.000000           0         1           getgid
  0.00    0.000000           0         1           geteuid
  0.00    0.000000           0         1           getegid
  0.00    0.000000           0         2           arch_prctl
  0.00    0.000000           0         4         1 futex
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         5           openat
  0.00    0.000000           0         1           set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00    2.362624                256268       459 total

FYI - I performed my test inside a Vagrant VirtualBox guest with 30 GB of memory and 8 CPUs assigned to the guest.

It appears that the input TIF file is read in small pieces at a time. I have shared the results here in case anyone else is looking at optimising the performance of the conversion or already has ideas about where the code can be optimised.

Best regards,
Chris
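P.S. For anyone who wants to check the arithmetic behind the strace summary above, here is a minimal Python sketch. It assumes nothing beyond the figures in the table, which are hard-coded by hand; the constant and function names are my own and purely illustrative. It only restates what the summary already says (how much `read` dominates the traced calls) and is not a new measurement.

```python
# Figures copied by hand from the strace summary above; nothing is re-measured.
READ_SECONDS = 2.348443    # "seconds" column for read
READ_CALLS = 252474        # "calls" column for read
LSEEK_CALLS = 1554         # "calls" column for lseek
TOTAL_SECONDS = 2.362624   # "seconds" column of the total row
TOTAL_CALLS = 256268       # "calls" column of the total row


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Share of the traced syscall time and of the call count taken by read().
read_time_share = percent(READ_SECONDS, TOTAL_SECONDS)
read_call_share = percent(READ_CALLS, TOTAL_CALLS)

# Average time per read() in microseconds, and read() calls per lseek().
usecs_per_read = READ_SECONDS / READ_CALLS * 1e6
reads_per_lseek = READ_CALLS / LSEEK_CALLS

print(f"read share of traced syscall time: {read_time_share:.2f}%")
print(f"read share of traced syscalls:     {read_call_share:.2f}%")
print(f"average time per read:             {usecs_per_read:.1f} usecs")
print(f"read calls per lseek call:         {reads_per_lseek:.1f}")
```

Two caveats. The summary does not record how many bytes each read returned, so the "small pieces" observation remains an inference from the call count; a full trace restricted to reads (for example `strace -e trace=read`) would show the sizes. Also, as far as I understand strace's summary mode, the seconds column covers only time attributed to the system calls themselves, so the 2.36 s total says little about where the rest of the multi-hour run time went.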
_______________________________________________ gdal-dev mailing list [email protected] http://lists.osgeo.org/mailman/listinfo/gdal-dev
