So here is my first stab at the reader that supports VFS paths: https://github.com/joshfix/gdal-vfs-reader
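For anyone skimming the thread: by "VFS path" I mean the GDAL virtual file system form of a URL, e.g. an http/https URL prefixed with "/vsicurl/". Below is a minimal sketch of that mapping. The class and method names (VsiPaths, isVsiPath, toVsiPath) are made up for this mail and are not necessarily what is in the repository, and the "s3://bucket/key" handling is only an assumption about how such URLs might be laid out.

```java
import java.net.URI;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class VsiPaths {

    // Light regex check, along the lines Andrea suggested: "/vsi<handler>/<something>".
    private static final Pattern VSI_PATH = Pattern.compile("/vsi[a-z0-9_]+/.+");

    private VsiPaths() {}

    public static boolean isVsiPath(String source) {
        return source != null && VSI_PATH.matcher(source).matches();
    }

    // Maps a URL string to a GDAL virtual path; strings that are already /vsi paths pass through.
    public static String toVsiPath(String source) {
        if (isVsiPath(source)) {
            return source;
        }
        URI uri = URI.create(source);
        String scheme = uri.getScheme();
        if ("http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme)) {
            // /vsicurl/ takes the full URL, scheme included.
            return "/vsicurl/" + source;
        }
        if ("s3".equalsIgnoreCase(scheme)) {
            // Assumption: the URL has the form s3://bucket/key.
            return "/vsis3/" + uri.getAuthority() + uri.getPath();
        }
        throw new IllegalArgumentException("Cannot map to a /vsi path: " + source);
    }
}
```

The result stays a plain String all the way down to gdal.Open(); as described further down the thread, round-tripping it through a file path is what collapses the "//" after "https:".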
The reader is very much a prototype/WIP. I would guess that 95% of the code was simply copied and pasted from various sources in imageio-ext and geotools and reassembled into these new classes. I'm sure there are things I've overlooked, but the basic functionality seems to work. The input stream implementation is a bit precarious and I'm not entirely certain how to handle it, but the image mosaic configuration requires an implementation. Also, there is a lot of work yet to be done on how to handle parsing various URLs to proper virtual paths.

I would love to get this more "production-ready" and would be very happy if anybody would be able to take a look at the project and provide feedback.

On Fri, Jun 28, 2019 at 10:07 AM Josh Fix <j...@federal.planet.com> wrote:

> Thanks for the insight, Andrea. I find myself getting mixed up quite a bit between the imageio-ext classes, the geotools classes, the image readers and the coverage readers. I suppose my ultimate goal would be to support a GridCoverage2DReader implementation that accepts vsicurl Strings (or likely some sort of wrapper class that does regex validation as you mentioned). I am definitely on the same page and understand that we do not want to be reading streams ourselves, and want gdal to handle all of the magic. It was interesting to see that even when I pass some version of a GDAL image reader SPI to the GridCoverage2DReader, it would use the image reader to call gdal.Open to read the image metadata, but the actual read() method to read the image data itself was still using a file stream via the RasterLayerRequest/RasterLayerResponse objects. So I don't fully understand where the actual starting point for implementing this would be. It seems like that read method should be deferring to the image reader passed in?
>
> Understood re: the s3-geotiff plugin not being a reader itself and just extending GeoTiffReader.
> Having the /vsi capability seems like it would replace the need for any cloud specific inputstream implementation and GeoTiffReader (or other format) extension (eg the s3-geotiff plugin or the azure-geotiff plugin), but it also sounds like a great idea to update the imageio-ext tiff reader.
>
> Once everything settles down and everybody has more time to think about this and come up with a plan, I am definitely more than happy to help with the implementation.
>
> On Fri, Jun 28, 2019 at 1:50 AM Andrea Aime <andrea.a...@geo-solutions.it> wrote:
>
>> Hi Josh,
>> I am under the impression that things are getting mixed up, but have little time to write this mail and none to check what I'm writing, so please take it with a pinch of salt.
>>
>> If you want to support vsicurl directly from imageio-ext gdal based readers, IMHO all the discussion about image streams and the like is useless, because what we do in the end is to give GDAL a source and directions of what we want it to read, and get back on the other end a result; everything related to efficient COG support and eventual memory caching happens inside GDAL itself. What would be needed there is to allow a "/vsi..." source to be passed in as a plain string, without trying to validate it as a file. Maybe not just any string: a light regex based validation could be applied in case GDAL does not give suitable error messages.
>>
>> ----------------
>> The above makes sense if you want to use GDAL readers directly. If instead you want to use the pure java TIFF reader, the part below applies. They should not be mixed.
>> ----------------
>>
>> When it comes to "The current s3-geotiff reader reads byte chunks from s3 and does not support COG", well, the s3-geotiff is not a geotiff reader at all, it's just an ImageInputStream allowing the pure java imageio-ext tiff reader to work.
>> There are two issues there:
>>
>> - The imageio-ext tiff reader has not been redesigned to take advantage of the COG structure and does a number of small reads, and jumps around. It should be modified to take advantage of COG structure instead.
>> - The S3 based image input stream is, at least in its default configuration, quite bad in terms of IO; the way I see it, it works fine only when the relevant portion of the file is already in the local memory or disk cache
>>
>> To have efficient COG support the reader should be modified first, using a simple, non caching, block oriented HTTPImageInputStream (to be written!), and then we can see where and how caching helps (afaik GDAL does not do much of it and it's still working fine)
>>
>> Cheers
>> Andrea
>>
>> On Wed, Jun 26, 2019 at 7:46 PM Josh Fix via GeoTools-Devel <geotools-devel@lists.sourceforge.net> wrote:
>>
>>> Thanks so much for the response and I'm excited to hear that there is interest. Does any of the existing GDAL code specifically support COGs? My imagery skills aren't very strong, but I would hope that we would be able to use the request geometry to only request the specific range of bytes required. I'm assuming this is the part that would be handled by gdal itself, however I don't know what is necessary to make the request. Looking at these sample requests: https://trac.osgeo.org/gdal/wiki/CloudOptimizedGeoTIFF I would assume we'd have to convert the geo coords into the pixel space of the image, then pass those coords in the request (to translate? warp?) and gdal would know which ranges to request.
>>>
>>> Not that this is any sort of revelation, but building the right Format implementation will also be important.
>>> I've found that I can pass an http url as a String directly to GeoTiffReader and it will read it fine, however when used as part of an ImageMosaic, it gets converted to a URL and the GeoTiffFormat class will throw an exception if the URL is anything other than a URL to a file. In this instance, I've had to provide a custom GeoTiffFormat implementation to not error out if the URL doesn't point to a file, and continue to build the GeoTiffReader and pass along the string value of the URL.
>>>
>>> The current s3-geotiff reader reads byte chunks from s3 and does not support COG. It also uses a non-standard format where the protocol is s3, and it strips out the last 2 parts of the path to determine the bucket and key. Then it relies on configuration to determine the region, etc. The Azure GeoTIFF reader I built follows the exact same code structure but implements the Azure API and allows azure-specific "wasb://" and "wasbs://" protocols in the URLs. I was well on my way to creating a custom "HttpGeoTiffReader" when I started encountering the issues that started this thread. All that is to say, none of that seems optimal and building support for VFS would mean all of these can be replaced, as it supports all major cloud providers and would have baked-in COG support. Pretty exciting :)
>>>
>>> Additionally, it would be interesting to support some lightweight in-memory caching (or potentially external caching?) in the same way the s3-geotiff reader uses ehcache. And finally, support for async requests to assist when there may be multiple concurrent images/granules being requested.
>>>
>>> Thanks!
>>>
>>> Josh
>>>
>>> On Wed, Jun 26, 2019 at 10:13 AM Daniele Romagnoli <daniele.romagn...@geo-solutions.it> wrote:
>>>
>>>> Hi Josh,
>>>> I might be back with some more helpful feedback in the next few days but I wanted to provide at least my reply right now, since I started ImageIO-EXT/GeoTools/GeoServer - GDAL support/integration many years ago.
>>>> I think that supporting VFS would be a great and interesting contribution to the project.
>>>> I need to check the code in more detail since I haven't touched it in a couple of years, so I don't have precise feedback right now :)
>>>> Anyway, I think that a couple of key points are:
>>>> - updating the ImageIO-EXT low level reader machinery to handle that new type of input. ImageReaders have a "setInput" method accepting an object. The ImageReaderSPI will tell which types of input objects are supported. SPIs also have a canDecodeInput method which tells if the provided input can be decoded. I think that the currently supported inputs are Files, URLs, Strings (representing URLs or Files), and FileImageInputStream*. Note that, in the past, we also had to support an ECWP protocol as a format requirement, so we had to set up an ECWP input stream. However, I think that it has never been used within GeoTools/GeoServer in practice.
>>>> - updating the GeoTools GDAL base classes. As you said, the RasterLayerResponse is trying to set up a FileImageInputStreamExt because 99% of the GDAL ImageIO-Ext readers use that. However, if my memory serves me right, the base GridCoverage2DReader abstract classes accept a generic Object input (as well as the ImageReader as said before) so we may "relax" the implementation to also support the VFS.
>>>>
>>>> As I said, sorry for this "generic" feedback. Hope it can be used as a starting point for further investigations.
>>>>
>>>> Best Regards,
>>>> Daniele
>>>>
>>>> On Tue, Jun 25, 2019 at 5:36 PM Josh Fix via GeoTools-Devel <geotools-devel@lists.sourceforge.net> wrote:
>>>>
>>>>> Hi all. I have code that builds an ImageMosaicReader that utilizes a custom format and input stream SPI. I define the classes I want to use in the Properties class used by the GranuleCatalog and in the CatalogConfigurationBean used by the ImageMosaicDescriptor/ImageMosaicReader. Generally, it looks something like this:
>>>>>
>>>>> props.put(Utils.Prop.SUGGESTED_IS_SPI, MyImageInputStreamSpi.class.getCanonicalName());
>>>>> props.put(Utils.Prop.SUGGESTED_FORMAT, MyGeoTiffFormat.class.getCanonicalName());
>>>>> props.put(Utils.Prop.SUGGESTED_SPI, "it.geosolutions.imageioimpl.plugins.tiff.TIFFImageReaderSpi");
>>>>>
>>>>> This has worked great in the past using custom input stream classes to read from S3 and Azure. My current goal is to be able to create mosaics from any given http endpoint, but specifically focus on cloud optimized geotiffs (https://trac.osgeo.org/gdal/wiki/CloudOptimizedGeoTIFF#HowtoreaditwithGDAL) using the "vsicurl" VFS. I've written GDAL code in the past to open geotiffs on S3 using the /vsis3 VFS and it was incredibly efficient, so I set out to do the same in GeoTools for my mosaic.
>>>>>
>>>>> I was delighted to discover many modules supporting GDAL readers. I started to experiment with https://github.com/geosolutions-it/imageio-ext/blob/master/plugin/gdal/gdalgeotiff/src/main/java/it/geosolutions/imageio/plugins/geotiff/GeoTiffImageReader.java and created my own Format implementation using the packages in https://github.com/geotools/geotools/tree/master/modules/plugin/imageio-ext-gdal/src/main/java/org/geotools/coverageio/gdal as templates.
>>>>> I created a simple GeoTiffReader class that extends from BaseGdalReader, which is responsible for creating the GeoTiffImageReaderSpi and GeoTiffFormat instances.
>>>>>
>>>>> A VFS URL takes the form: "/vsicurl/https://s3-us-west-2.amazonaws.com/landsat-pds/c1/L8/153/075/LC08_L1TP_153075_20190515_20190515_01_RT/LC08_L1TP_153075_20190515_20190515_01_RT_B3.TIF"... simply taking the http/https url and prefixing it with "/vsicurl/". This string can be passed directly to `gdal.Open()` and it will return a Dataset.
>>>>>
>>>>> The issue is that the input object that ends up being passed to BaseGridCoverage2DReader is expected to be a file, either in the form of a URL, FileImageInputStreamExt, or String (see BaseGridCoverage2DReader.checkSource). I thought I might be able to trick it by passing in the vsicurl path prefixed with the file protocol (file:///vsicurl/https://s3-us-west...). The issue is that when Java returns the file path, it removes the second forward slash from https://, leaving you with the path "/vsicurl/https:/s3-us-west...", which gdal does not like.
>>>>>
>>>>> I duplicated a sufficient number of GeoTools classes to be able to add in simple string replacements for "https:/" -> "https://" where needed. This allowed me to create the reader and successfully create the metadata object (in GDALImageReader.setInput). The BaseGridCoverage2DReader constructor completes fully and all of the coverage properties, layout, resolution info, etc. are built. At this point I was very hopeful that the only change necessary was modifying these classes to not only support files, but also strings that begin with "/vsi".
>>>>>
>>>>> Unfortunately this idea fell apart when I called "reader.read()".
>>>>> Long story short, the RasterLayerResponse object tries to create a FileImageInputStreamExtImpl, so it's not just reading the dataset using `gdal.Open`. At this point I'm not 100% certain what the best path forward would be. I know conversations in the past have centered around adding and supporting custom protocols to java.net.URL, but this is unique in that these "paths" are not valid URLs or file paths due to the double forward slash in the middle of the path.
>>>>>
>>>>> I would like to know if supporting VFS is something the community is interested in. I intend to continue to work on a solution for myself, but would love to work with you all and contribute back if there is interest.
>>>>>
>>>>> I apologize for the long read. I hope everything makes sense and please let me know if there are any questions.
>>>>>
>>>>> Josh
>>>>> _______________________________________________
>>>>> GeoTools-Devel mailing list
>>>>> GeoTools-Devel@lists.sourceforge.net
>>>>> https://lists.sourceforge.net/lists/listinfo/geotools-devel
>>>>
>>>> --
>>>> Regards,
>>>> Daniele Romagnoli
>>>> ==
>>>> GeoServer Professional Services from the experts! Visit http://goo.gl/it488V for more information.
>>>> ==
>>>>
>>>> Ing. Daniele Romagnoli
>>>> Senior Software Engineer
>>>>
>>>> GeoSolutions S.A.S.
>>>> Via di Montramito 3/A
>>>> 55054 Massarosa (LU)
>>>> Italy
>>>> phone: +39 0584 962313
>>>> fax: +39 0584 1660272
>>>>
>>>> http://www.geo-solutions.it
>>>> http://twitter.com/geosolutions_it
>>>>
>>>> -------------------------------------------------------
>>>>
>>>> This email is intended only for the person or entity to which it is addressed and may contain information that is privileged, confidential or otherwise protected from disclosure. We remind that - as provided by European Regulation 2016/679 “GDPR” - copying, dissemination or use of this e-mail or the information herein by anyone other than the intended recipient is prohibited. If you have received this email by mistake, please notify us immediately by telephone or e-mail.
>>
>> --
>>
>> Regards,
>> Andrea Aime
>> ==
>> GeoServer Professional Services from the experts! Visit http://goo.gl/it488V for more information.
>> ==
>> Ing. Andrea Aime
>> @geowolf
>> Technical Lead
>> GeoSolutions S.A.S.
>> Via di Montramito 3/A
>> 55054 Massarosa (LU)
>> phone: +39 0584 962313
>> fax: +39 0584 1660272
>> mob: +39 339 8844549
>> http://www.geo-solutions.it
>> http://twitter.com/geosolutions_it
>

--
Josh Fix
Shuttle Commander
Planet Federal
+1 321.444.0412
j...@federal.planet.com
https://federal.planet.com