That is great, Josh! Is the goal to contribute this to imageio-ext? If you can cite your sources in the header, it will help.
On Fri, Aug 2, 2019 at 1:03 PM Josh Fix via GeoTools-Devel <geotools-devel@lists.sourceforge.net> wrote:

> So here is my first stab at the reader that supports VFS paths:
>
> https://github.com/joshfix/gdal-vfs-reader
>
> It is very much a prototype/WIP. I would guess that 95% of the code was simply copied and pasted from various sources in imageio-ext and geotools and reassembled into these new classes. I'm sure there are things I've overlooked, but the basic functionality seems to work. The input stream implementation is a bit precarious and I'm not entirely certain how to handle it, but the image mosaic configuration requires an implementation. Also, there is a lot of work yet to be done on how to handle parsing various URLs to proper virtual paths.
>
> I would love to get this more "production-ready" and would be very happy if anybody would be able to take a look at the project and provide feedback.
>
> On Fri, Jun 28, 2019 at 10:07 AM Josh Fix <j...@federal.planet.com> wrote:
>
>> Thanks for the insight, Andrea. I find myself getting mixed up quite a bit between the imageio-ext classes, the geotools classes, the image readers and the coverage readers. I suppose my ultimate goal would be to support a GridCoverage2DReader implementation that accepts vsicurl Strings (or likely some sort of wrapper class that does regex validation as you mentioned). I am definitely on the same page and understand that we do not want to be reading streams ourselves, and want gdal to handle all of the magic. It was interesting to see that even when I pass some version of a GDAL image reader SPI to the GridCoverage2DReader, it would use the image reader to call gdal.Open to read the image metadata, but the actual read() method to read the image data itself was still using a file stream via the RasterLayerRequest/RasterLayerResponse objects. So I don't fully understand where the actual starting point for implementing this would be. It seems like that read method should be deferring to the image reader passed in?
>>
>> Understood re: the s3-geotiff plugin not being a reader itself and just extending GeoTiffReader. Having the /vsi capability seems like it would replace the need for any cloud-specific input stream implementation and GeoTiffReader (or other format) extension (e.g. the s3-geotiff plugin or the azure-geotiff plugin), but it also sounds like a great idea to update the imageio-ext tiff reader.
>>
>> Once everything settles down and everybody has more time to think about this and come up with a plan, I am definitely more than happy to help with the implementation.
>>
>> On Fri, Jun 28, 2019 at 1:50 AM Andrea Aime <andrea.a...@geo-solutions.it> wrote:
>>
>>> Hi Josh,
>>> I am under the impression that things are getting mixed up, but have little time to write this mail and none to check what I'm writing, so please take it with a pinch of salt.
>>>
>>> If you want to support vsicurl directly from imageio-ext gdal based readers, IMHO all the discussion about image streams and the like is useless, because what we do in the end is to give GDAL a source and directions of what we want it to read, and get back on the other end a result. Everything related to efficient COG support and eventual memory caching happens inside GDAL itself. What would be needed there is to allow a "/vsi..." source to be passed in as a plain string, without trying to validate it as a file. Maybe not just any string: a light regex based validation could be applied in case GDAL does not give suitable error messages.
>>>
>>> ----------------
>>> The above makes sense if you want to use GDAL readers directly. If instead you want to use the pure java TIFF reader, the part below applies. They should not be mixed.
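As a rough illustration of the "light regex based validation" mentioned above, here is a minimal Java sketch. The class name VsiPaths, the method name, and the pattern itself are assumptions made for this example; none of them come from imageio-ext or GeoTools. The pattern only checks the general "/vsi<handler>/<rest>" shape and makes no attempt to cover every path form GDAL accepts.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of imageio-ext or GeoTools.
final class VsiPaths {

    // Assumed shape: "/vsi" + lowercase handler name + "/" + a non-empty remainder,
    // e.g. "/vsicurl/https://host/file.tif" or "/vsis3/bucket/key.tif".
    private static final Pattern VSI = Pattern.compile("^/vsi[a-z0-9_]+/.+");

    // Returns true when the source looks like a GDAL virtual file system path.
    static boolean looksLikeVsiPath(String source) {
        return source != null && VSI.matcher(source).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeVsiPath("/vsicurl/https://example.com/a.tif")); // true
        System.out.println(looksLikeVsiPath("/tmp/a.tif"));                         // false
    }
}
```

A check like this would only decide whether to skip the "is it a file?" validation; GDAL remains the authority on whether the path can actually be opened.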
>>> ----------------
>>>
>>> When it comes to "The current s3-geotiff reader reads byte chunks from s3 and does not support COG": well, the s3-geotiff is not a geotiff reader at all, it's just an ImageInputStream allowing the pure java imageio-ext tiff reader to work.
>>> There are two issues there:
>>>
>>> - The imageio-ext tiff reader has not been redesigned to take advantage of the COG structure; it does a number of small reads and jumps around. It should be modified to take advantage of the COG structure instead.
>>> - The S3 based image input stream is, at least in its default configuration, quite bad in terms of IO. The way I see it, it works fine only when the relevant portion of the file is already in the local memory or disk cache.
>>>
>>> To have efficient COG support, the reader should be modified first, using a simple, non-caching, block-oriented HTTPImageInputStream (to be written!), and then we can see where and how caching helps (afaik GDAL does not do much of it and it's still working fine).
>>>
>>> Cheers
>>> Andrea
>>>
>>> On Wed, Jun 26, 2019 at 7:46 PM Josh Fix via GeoTools-Devel <geotools-devel@lists.sourceforge.net> wrote:
>>>
>>>> Thanks so much for the response and I'm excited to hear that there is interest. Does any of the existing GDAL code specifically support COGs? My imagery skills aren't very strong, but I would hope that we would be able to use the request geometry to only request the specific range of bytes required. I'm assuming this is the part that would be handled by gdal itself, however I don't know what is necessary to make the request. Looking at these sample requests: https://trac.osgeo.org/gdal/wiki/CloudOptimizedGeoTIFF I would assume we'd have to convert the geo coords into the pixel space of the image, then pass those coords in the request (to translate? warp?) and gdal would know which ranges to request.
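For the "convert the geo coords into the pixel space" step, a minimal sketch in plain Java follows. It assumes the six-element affine geotransform convention GDAL uses (x = gt[0] + col*gt[1] + row*gt[2], y = gt[3] + col*gt[4] + row*gt[5]) and simply inverts it. The class and method names are hypothetical and the sample numbers are made up for illustration; which byte ranges get fetched for the resulting pixel window is left entirely to GDAL.

```java
// Hypothetical helper: inverts a GDAL-style affine geotransform.
final class GeoToPixel {

    // Returns {col, row} (fractional pixel coordinates) for a georeferenced point.
    static double[] toPixel(double[] gt, double x, double y) {
        double det = gt[1] * gt[5] - gt[2] * gt[4];
        if (det == 0.0) {
            throw new IllegalArgumentException("geotransform is not invertible");
        }
        double dx = x - gt[0];
        double dy = y - gt[3];
        // Inverse of the 2x2 matrix [[gt1, gt2], [gt4, gt5]] applied to (dx, dy).
        double col = (gt[5] * dx - gt[2] * dy) / det;
        double row = (-gt[4] * dx + gt[1] * dy) / det;
        return new double[] {col, row};
    }

    public static void main(String[] args) {
        // Illustrative north-up raster: origin (500000, 4000000), 30 m pixels.
        double[] gt = {500000, 30, 0, 4000000, 0, -30};
        double[] px = toPixel(gt, 500300, 3999700);
        System.out.println("col=" + px[0] + ", row=" + px[1]); // col=10.0, row=10.0
    }
}
```

Applying this to the corners of the request envelope gives the pixel window to ask for; rounding outward to whole pixels is a choice the caller would still need to make.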
>>>>
>>>> Not that this is any sort of revelation, but building the right Format implementation will also be important. I've found that I can pass an http url as a String directly to GeoTiffReader and it will read it fine, however when used as part of an ImageMosaic, it gets converted to a URL and the GeoTiffFormat class will throw an exception if the URL is anything other than a URL to a file. In this instance, I've had to provide a custom GeoTiffFormat implementation to not error out if the URL doesn't point to a file, and continue to build the GeoTiffReader and pass along the string value of the URL.
>>>>
>>>> The current s3-geotiff reader reads byte chunks from s3 and does not support COG. It also uses a non-standard format where the protocol is s3, and it strips out the last 2 parts of the path to determine the bucket and key. Then it relies on configuration to determine the region, etc. The Azure GeoTIFF reader I built follows the exact same code structure but implements the Azure API, and allows azure-specific "wasb://" and "wasbs://" protocols in the URLs. I was well on my way to creating a custom "HttpGeoTiffReader" when I started encountering the issues that started this thread. All that is to say, none of that seems optimal and building support for VFS would mean all of these can be replaced, as it supports all major cloud providers and would have baked-in COG support. Pretty exciting :)
>>>>
>>>> Additionally, it would be interesting to support some lightweight in-memory caching (or potentially external caching?) in the same way the s3-geotiff reader uses ehcache. And finally, support for async requests to assist when there may be multiple concurrent images/granules being requested.
>>>>
>>>> Thanks!
>>>>
>>>> Josh
>>>>
>>>> On Wed, Jun 26, 2019 at 10:13 AM Daniele Romagnoli <daniele.romagn...@geo-solutions.it> wrote:
>>>>
>>>>> Hi Josh,
>>>>> I might be back with some more helpful feedback in the next few days, but I wanted to provide at least my reply right now, since I started the ImageIO-EXT/GeoTools/GeoServer GDAL support/integration many years ago.
>>>>> I think that supporting VFS would be a great and interesting contribution to the project.
>>>>> I need to check the code in more detail since I haven't touched it in a couple of years, so I don't have precise feedback right now :)
>>>>> Anyway, I think that a couple of key points are:
>>>>> - updating the ImageIO-EXT low level reader machinery to handle that new type of input. ImageReaders have a "setInput" method accepting an object. The ImageReaderSPI will tell which types of input objects are supported. SPIs also have a canDecodeInput method which tells if the provided input can be decoded. I think that the currently supported inputs are Files, URLs, Strings (representing URLs or Files), and FileImageInputStream*. Note that, in the past, we also had to support an ECWP protocol as a format requirement, so we had to set up an ECWP input stream. However, I think that it has never been used within GeoTools/GeoServer in practice.
>>>>> - updating the GeoTools GDAL base classes. As you said, the RasterLayerResponse is trying to set up a FileImageInputStreamExt because 99% of the GDAL ImageIO-Ext readers use that. However, if my memory serves me right, the base GridCoverage2DReader abstract classes accept a generic Object input (as well as the ImageReader, as said before) so we may "relax" the implementation to also support the VFS.
>>>>>
>>>>> As I said, sorry for this "generic" feedback. Hope it can be used as a starting point for further investigations.
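To make the first key point a little more concrete, here is a minimal sketch of an SPI that advertises String as an input type and accepts "/vsi" strings in canDecodeInput. It uses only the standard javax.imageio SPI classes; the class name VsiAwareReaderSpi and the simple prefix test are assumptions for illustration and are not how the imageio-ext GDAL SPIs are written. Reader creation is deliberately left unimplemented.

```java
import java.io.IOException;
import java.util.Locale;
import javax.imageio.ImageReader;
import javax.imageio.spi.ImageReaderSpi;

// Hypothetical SPI sketch; not the imageio-ext implementation.
final class VsiAwareReaderSpi extends ImageReaderSpi {

    public VsiAwareReaderSpi() {
        // inputTypes is the protected field reported by getInputTypes().
        inputTypes = new Class<?>[] {String.class};
    }

    @Override
    public boolean canDecodeInput(Object source) {
        // Assumption: any String starting with "/vsi" is handed to GDAL as-is.
        return source instanceof String && ((String) source).startsWith("/vsi");
    }

    @Override
    public ImageReader createReaderInstance(Object extension) throws IOException {
        // A real SPI would return a GDAL-backed reader here.
        throw new UnsupportedOperationException("sketch only");
    }

    @Override
    public String getDescription(Locale locale) {
        return "Sketch of an SPI accepting GDAL /vsi paths";
    }
}
```

A real implementation would probably also ask GDAL whether it can open the path before claiming it, rather than relying on the prefix alone.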
>>>>>
>>>>> Best Regards,
>>>>> Daniele
>>>>>
>>>>> On Tue, Jun 25, 2019 at 5:36 PM Josh Fix via GeoTools-Devel <geotools-devel@lists.sourceforge.net> wrote:
>>>>>
>>>>>> Hi all. I have code that builds an ImageMosaicReader that utilizes a custom format and input stream SPI. I define the classes I want to use in the Properties class used by the GranuleCatalog and in the CatalogConfigurationBean used by the ImageMosaicDescriptor/ImageMosaicReader. Generally, it looks something like this:
>>>>>>
>>>>>> props.put(Utils.Prop.SUGGESTED_IS_SPI, MyImageInputStreamSpi.class.getCanonicalName());
>>>>>> props.put(Utils.Prop.SUGGESTED_FORMAT, MyGeoTiffFormat.class.getCanonicalName());
>>>>>> props.put(Utils.Prop.SUGGESTED_SPI, "it.geosolutions.imageioimpl.plugins.tiff.TIFFImageReaderSpi");
>>>>>>
>>>>>> This has worked great in the past using custom input stream classes to read from S3 and Azure. My current goal is to be able to create mosaics from any given http endpoint, but specifically focus on cloud optimized geotiffs (https://trac.osgeo.org/gdal/wiki/CloudOptimizedGeoTIFF#HowtoreaditwithGDAL) using the "vsicurl" VFS. I've written GDAL code in the past to open geotiffs on S3 using the /vsis3 VFS and it was incredibly efficient, so I set out to do the same in GeoTools for my mosaic.
>>>>>>
>>>>>> I was delighted to discover many modules supporting GDAL readers. I started to experiment with https://github.com/geosolutions-it/imageio-ext/blob/master/plugin/gdal/gdalgeotiff/src/main/java/it/geosolutions/imageio/plugins/geotiff/GeoTiffImageReader.java and created my own Format implementation using the packages in https://github.com/geotools/geotools/tree/master/modules/plugin/imageio-ext-gdal/src/main/java/org/geotools/coverageio/gdal as templates. I created a simple GeoTiffReader class that extends from BaseGdalReader, which is responsible for creating the GeoTiffImageReaderSpi and GeoTiffFormat instances.
>>>>>>
>>>>>> A VFS URL takes the form: "/vsicurl/https://s3-us-west-2.amazonaws.com/landsat-pds/c1/L8/153/075/LC08_L1TP_153075_20190515_20190515_01_RT/LC08_L1TP_153075_20190515_20190515_01_RT_B3.TIF"... simply taking the http/https url and prefixing it with "/vsicurl/". This string can be passed directly to `gdal.Open()` and it will return a Dataset.
>>>>>>
>>>>>> The issue is that the input object that ends up being passed to BaseGridCoverage2DReader is expected to be a file, either in the form of a URL, FileImageInputStreamExt, or String (see BaseGridCoverage2DReader.checkSource). I thought I might be able to trick it by passing in the vsicurl path prefixed with the file protocol (file:///vsicurl/https://s3-us-west...). The issue is that when Java returns the file path, it removes the second forward slash from https://, leaving you with the path "/vsicurl/https:/s3-us-west...", which gdal does not like.
>>>>>>
>>>>>> I duplicated a sufficient number of GeoTools classes to be able to add in simple string replacements for "https:/" -> "https://" where needed. This allowed me to create the reader and successfully create the metadata object (in GDALImageReader.setInput). The BaseGridCoverage2DReader constructor completes fully and all of the coverage properties, layout, resolution info, etc. are built. At this point I was very hopeful that the only change necessary was modifying these classes to not only support files, but also strings that begin with "/vsi".
>>>>>>
>>>>>> Unfortunately this idea fell apart when I called "reader.read()". Long story short, the RasterLayerResponse object tries to create a FileImageInputStreamExtImpl, so it's not just reading the dataset using `gdal.Open`. At this point I'm not 100% certain what the best path forward would be. I know conversations in the past have centered around adding and supporting custom protocols to java.net.URL, but this is unique in that these "paths" are not valid URLs or file paths due to the double forward slash in the middle of the path.
>>>>>>
>>>>>> I would like to know if supporting VFS is something the community is interested in. I intend to continue to work on a solution for myself, but would love to work with you all and contribute back if there is interest.
>>>>>>
>>>>>> I apologize for the long read. I hope everything makes sense and please let me know if there are any questions.
>>>>>>
>>>>>> Josh
>>>>>> _______________________________________________
>>>>>> GeoTools-Devel mailing list
>>>>>> GeoTools-Devel@lists.sourceforge.net
>>>>>> https://lists.sourceforge.net/lists/listinfo/geotools-devel
>>>>>>
>>>>>
>>>>> --
>>>>> Regards,
>>>>> Daniele Romagnoli
>>>>> ==
>>>>> GeoServer Professional Services from the experts! Visit http://goo.gl/it488V for more information.
>>>>> ==
>>>>>
>>>>> Ing. Daniele Romagnoli
>>>>> Senior Software Engineer
>>>>>
>>>>> GeoSolutions S.A.S.
>>>>> Via di Montramito 3/A
>>>>> 55054 Massarosa (LU)
>>>>> Italy
>>>>> phone: +39 0584 962313
>>>>> fax: +39 0584 1660272
>>>>>
>>>>> http://www.geo-solutions.it
>>>>> http://twitter.com/geosolutions_it
>>>>>
>>>>> -------------------------------------------------------
>>>>>
>>>>> With reference to the legislation on the processing of personal data (EU Reg. 2016/679, General Data Protection Regulation, “GDPR”), please note that every circumstance relating to this email (its content, any attachments, etc.) is data whose knowledge is reserved to the recipient(s) indicated by the sender. If you received this message by mistake, you are required to delete it; any other operation is unlawful. I would nevertheless be grateful if you could let me know.
>>>>>
>>>>> This email is intended only for the person or entity to which it is addressed and may contain information that is privileged, confidential or otherwise protected from disclosure. We remind that - as provided by European Regulation 2016/679 “GDPR” - copying, dissemination or use of this e-mail or the information herein by anyone other than the intended recipient is prohibited. If you have received this email by mistake, please notify us immediately by telephone or e-mail.
>>>>>
>>>
>>> --
>>>
>>> Regards,
>>> Andrea Aime
>>> ==
>>> GeoServer Professional Services from the experts! Visit http://goo.gl/it488V for more information.
>>> ==
>>> Ing. Andrea Aime
>>> @geowolf
>>> Technical Lead
>>> GeoSolutions S.A.S.
>>> Via di Montramito 3/A
>>> 55054 Massarosa (LU)
>>> phone: +39 0584 962313
>>> fax: +39 0584 1660272
>>> mob: +39 339 8844549
>>> http://www.geo-solutions.it
>>> http://twitter.com/geosolutions_it
>>>
>
> --
> Josh Fix
> Shuttle Commander
> Planet Federal
> +1 321.444.0412
> j...@federal.planet.com
> https://federal.planet.com

--
Jody Garnett
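A note on the "https:/" problem described in the original message: one place where this collapsing can be observed is java.io.File, which normalizes repeated separators in a path. The sketch below shows the effect and a string repair along the lines of the replacement Josh describes. The class name, method names, and the regex are assumptions for this example, and the normalized output shown in the comments applies to Unix-like systems only.

```java
import java.io.File;

// Hypothetical helper illustrating the double-slash loss and one possible repair.
final class VsiPathRepair {

    // What a File round trip returns for a /vsi path (separators get normalized).
    static String throughFile(String vsiPath) {
        return new File(vsiPath).getPath();
    }

    // Restores "://" after the http/https scheme when only ":/" is left.
    static String repair(String path) {
        return path.replaceFirst("^(/vsi[a-z0-9_]+/https?):/(?!/)", "$1://");
    }

    public static void main(String[] args) {
        String original = "/vsicurl/https://example.com/a.tif";
        String mangled = throughFile(original);
        System.out.println(mangled);         // on Unix-like systems: /vsicurl/https:/example.com/a.tif
        System.out.println(repair(mangled)); // on Unix-like systems: /vsicurl/https://example.com/a.tif
    }
}
```

This kind of repair is fragile, which is one more argument for letting "/vsi" strings pass through untouched instead of routing them through File or URL.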