Hey thanks Jody! Ya I would love to contribute it. I just figured it would be better to let it incubate in my repo until it's received enough criticism. I will definitely need my hand held a little as I've never contributed significant amounts of code to another project before. There are also some things that we might want to change in the existing projects also; for example, GdalVfsImageReader extends from GDALImageReader. GDALImageReader uses a lot of private variables that I have to re-declare in GdalVfsImageReader, as I don't have access to them. This also means that there are several methods I had to 1:1 copy into my class so that they had access to my variables and weren't running in the super class where the variables weren't set. Simply making some of those protected could reduce a lot of code duplication.
I also had to make my coverage reader extend AbstractGridCoverage2DReader instead of BaseGDALGridCoverage2DReader due to some of the checks in the constructor of BaseGridCoverage2DReader. This again resulted in me copying a lot of the methods 1:1 out of the GDAL coverage classes and into mine. I'm not sure if there is a good way to work around that. Is there a particular header format I should follow or any examples I could look at? I may have to cite several sources for each class. Thanks! On Fri, Aug 2, 2019 at 7:26 PM Jody Garnett <jody.garn...@gmail.com> wrote: > That is great josh! Is the goal to contribute this to imageio-ext? If you > can try and cite your sources in the header it will help .. > > On Fri, Aug 2, 2019 at 1:03 PM Josh Fix via GeoTools-Devel < > geotools-devel@lists.sourceforge.net> wrote: > >> So here is my first stab at the reader that supports VFS paths: >> >> https://github.com/joshfix/gdal-vfs-reader >> >> It is very much a prototype/WIP. I would guess that 95% of the code was >> simply copied and pasted from various sources in imageio-ext and geotools >> and reassembled into these new classes. I'm sure there are things I've >> overlooked, but the basic functionality seems to work. The input stream >> implementation is a bit precarious and I'm not entirely certain how to >> handle it, but the image mosaic configuration requires an implementation. >> Also, there is a lot of work yet to be done on how to handle parsing >> various URLs to proper virtual paths. >> >> I would love to get this more "production-ready" and would be very happy >> if anybody would be able to take a look at the project and provide feedback. >> >> On Fri, Jun 28, 2019 at 10:07 AM Josh Fix <j...@federal.planet.com> >> wrote: >> >>> Thanks for the insite Andrea. I find myself getting mixed up quite a >>> bit between the imageio-ext classes, the geotools class, the image readers >>> and the coverage readers. I suppose my ultimate goal would be to support a >>> GridCoverage2DReader implementation that accepts vsicurl Strings (or likely >>> some sort of wrapper class that does regex validation as you mentioned). I >>> am definitely on the same page and understanding that we do not want to be >>> reading streams ourselves, and want gdal to handle all of the magic. It >>> was interesting to see that even when I pass some version of a GDAL image >>> reader SPI to the GridCoverage2DReader, it would use the image reader to >>> call gdal.Open to read the image metadata, but the actual read() method to >>> read the image data itself was still using a file stream via the >>> RasterLayerRequest/RasterLayerResponse objects. So I don't fully >>> understand where the actual starting point for implementing this would be. >>> It seems like that read method should be deferring to the image reader >>> passed in? >>> >>> Understood re: the s3-geotiff plugin not being a reader itself and just >>> extending GeoTiffReader. Having the /vsi capability seems like it would >>> replace the need for any cloud specific inputstream implementation and >>> GeoTiffReader (or other format) extension (eg the s3-geotiff plugin or the >>> azure-geotiff plugin), but it also sounds like a great idea to update the >>> imageio-ext >>> tiff reader. >>> >>> Once everything settles down and everybody has more time to think about >>> this and come up with a plan, I am definitely more than happy to help with >>> the implementation. >>> >>> On Fri, Jun 28, 2019 at 1:50 AM Andrea Aime < >>> andrea.a...@geo-solutions.it> wrote: >>> >>>> Hi Josh, >>>> I am under the impression that things are getting mixed up, but have >>>> little time to write this mail and none to check >>>> what I'm writing, so please take it with a pinch of salt. >>>> >>>> If you want to support visicurl directly from imageio-ext gdal based >>>> readers, IMHO all the discussion about image >>>> streams and the like is useless, because what we do in the end is to >>>> give GDAL a source and directions of >>>> what we want it to read, and get back on the other end a result, >>>> everything related to efficient COG support >>>> and eventual memory caching happens inside GDAL itself. What would be >>>> needed there, is to allow a "/vsi..." >>>> source to be passed in as a plain string, without trying to validate it >>>> as a file. Maybe not just any string, >>>> a light regex based validation could be applied in case GDAL does not >>>> give suitable error messages. >>>> >>>> ---------------- >>>> The above makes sense if you want to use GDAL readers directly. If >>>> instead you want to use the pure java TIFF >>>> reader, the part below applies. They should not be mixed. >>>> ---------------- >>>> >>>> When it comes to "The current s3-geotiff reader reads byte chunks from >>>> s3 and does not support COG" >>>> well, the s3-geotiff is not a geotiff reader at all, it's just a >>>> ImageInputStream allowing the pure java >>>> imageio-ext tiff reader to work. >>>> There are two issues there: >>>> >>>> - The imageio-ext tiff reader has not been redesigned to take >>>> adavantage of the COG structure and does a number of small reads, and >>>> jumps >>>> around. It should be modified to take advantage of COG structure >>>> instead. >>>> - The S3 based image input stream is, at least in its default >>>> configuration, quite bad in terms of IO, the way I see it, it works fine >>>> only when the relevant portion of the file is already in the local >>>> memory >>>> or disk cache >>>> >>>> To have efficient COG support the reader should be modified first, and >>>> using a simple, non caching, block oriented HTTPImageInputStream (to be >>>> written!) >>>> and then we can see where and how caching helps (afaik GDAL does not do >>>> much of it and it's still working fine) >>>> >>>> Cheers >>>> Andrea >>>> >>>> >>>> >>>> On Wed, Jun 26, 2019 at 7:46 PM Josh Fix via GeoTools-Devel < >>>> geotools-devel@lists.sourceforge.net> wrote: >>>> >>>>> Thanks so much for the response and I'm excited to hear that there is >>>>> interest. Does any of the existing GDAL code specifically support COGs? >>>>> My >>>>> imagery skills aren't very strong, but I would hope that we would be able >>>>> to use the request geometry to only request the specific range of bytes >>>>> required. I'm assuming this is the part that would be handled by gdal >>>>> itself, however I don't know what is necessary to make the request. >>>>> Looking at these sample requests: >>>>> https://trac.osgeo.org/gdal/wiki/CloudOptimizedGeoTIFF I would >>>>> assume we'd have to convert the geo coords into the pixel space of the >>>>> image, then pass those coords in the request (to translate? warp?) and >>>>> gdal >>>>> would know which ranges to request. >>>>> >>>>> Not that this is any sort of revelation, but building the right Format >>>>> implementation will also be important. I've found that I can pass an http >>>>> url as a String directly to GeoTiffReader and it will read it fine, >>>>> however >>>>> when used as part of an ImageMosaic, it gets converted to a URL and the >>>>> GeoTiffFormat class will throw an exception if the URL is anything other >>>>> than a URL to a file. In this instance, I've had to provide a custom >>>>> GeoTiffFormat implementation to not error out if the URL doesn't point to >>>>> a >>>>> file, and continue to build the GeoTiffReader and pass along the string >>>>> value of the URL. >>>>> >>>>> The current s3-geotiff reader reads byte chunks from s3 and does not >>>>> support COG. It also uses a non-standard format where the protocol is s3, >>>>> and it strips out the last 2 parts of the path to determine the bucket and >>>>> key. Then it relies on configuration to determine the region, etc. The >>>>> Azure GeoTIFF reader I built follows the exact same code structure but >>>>> implements the Azure API, but allows azure-specific "wasb://" and >>>>> "wasbs://" protocols in the URLs. I was well on my way to creating a >>>>> custom "HttpGeoTiffReader" when I started encountering the issues that >>>>> started this thread. All that is to say, none of that seems optimal and >>>>> building support for VFS would mean all of these can be replaced, as it >>>>> supports all major cloud providers and would have baked-in COG support. >>>>> Pretty exciting :) >>>>> >>>>> Additionally, it would be interesting to support some lightweight >>>>> in-memory caching (or potentially external caching?) in the same way the >>>>> s3-geotiff reader uses ehcache. And finally, support for async requests >>>>> to >>>>> assist when there may be multiple concurrent images/granules being >>>>> requested. >>>>> >>>>> Thanks! >>>>> >>>>> Josh >>>>> >>>>> On Wed, Jun 26, 2019 at 10:13 AM Daniele Romagnoli < >>>>> daniele.romagn...@geo-solutions.it> wrote: >>>>> >>>>>> Hi Josh, >>>>>> I might be back with some more helpful feedbacks in the next few days >>>>>> but I wanted to provide at least my reply right now, since I started >>>>>> ImageIO-EXT/GeoTools/GeoServer - GDAL support/integration many years ago. >>>>>> I think that supporting VFS would be a great and interesting >>>>>> contribution to the project. >>>>>> I need to check back the code more in detail since it's a couple of >>>>>> years I didn't touch it, so I don't have precise feedbacks right now :) >>>>>> Anyway, I think that a couple of key points are: >>>>>> - updating the ImageIO-EXT low level reader machinery to handle that >>>>>> new type of input. ImageReaders have a "setInput" method accepting an >>>>>> object. The ImageReaderSPI will tell which types of input objects are >>>>>> supported. SPI also have a canDecodeInput method which tell if the >>>>>> provided >>>>>> input can be decoded. I think that the currently supported inputs are >>>>>> Files, URLs, Strings (representing URL or Files), and >>>>>> FileImageInputStream*. Note that, in the past, we also had to support an >>>>>> ECWP protocol as format requirement, so we had to setup an ECWP input >>>>>> stream. However, I think that it has never being used within >>>>>> GeoTools/GeoServer in practice. >>>>>> - updating the GeoTools GDAL base classes. As you said, the >>>>>> RasterLayerResponse is trying to setup a FileImageInputStreamExt because >>>>>> the 99% of the GDAL ImageIO-Ext readers use that. However, if my memory >>>>>> serves me right, the base GridCoverage2DReader abstract classes accept a >>>>>> generic Object input (as well as the ImageReader as said before) so we >>>>>> may >>>>>> "relax" the implementation to also support the VFS. >>>>>> >>>>>> As I said, sorry for these "generic" feedbacks. Hope they can be used >>>>>> as a starting point for further investigations. >>>>>> >>>>>> Best Regards, >>>>>> Daniele >>>>>> >>>>>> >>>>>> >>>>>> On Tue, Jun 25, 2019 at 5:36 PM Josh Fix via GeoTools-Devel < >>>>>> geotools-devel@lists.sourceforge.net> wrote: >>>>>> >>>>>>> Hi all. I have code that builds an ImageMosaicReader that utilizes >>>>>>> a custom format and input stream SPI. I define the classes I want to >>>>>>> use >>>>>>> in the Properties class used by the GranuleCatalog and in the >>>>>>> CatalogConfigurationBean used by the >>>>>>> ImageMosaicDescriptor/ImageMosaicReader. Generally, it looks something >>>>>>> like this: >>>>>>> >>>>>>> props.put(Utils.Prop.SUGGESTED_IS_SPI, >>>>>>> MyImageInputStreamSpi.class.getCanonicalName()); >>>>>>> props.put(Utils.Prop.SUGGESTED_FORMAT, >>>>>>> MyGeoTiffFormat.class.getCanonicalName()); >>>>>>> props.put(Utils.Prop.SUGGESTED_SPI, >>>>>>> "it.geosolutions.imageioimpl.plugins.tiff.TIFFImageReaderSpi"); >>>>>>> >>>>>>> This has worked great in the past using custom input stream classes >>>>>>> to read from S3 and Azure. My current goal is to be able to create >>>>>>> mosaics >>>>>>> from any given http endpoint, but specifically focus on cloud optimized >>>>>>> geotiffs ( >>>>>>> https://trac.osgeo.org/gdal/wiki/CloudOptimizedGeoTIFF#HowtoreaditwithGDAL) >>>>>>> using the "vsicurl" VFS. I've written GDAL code in the past to open >>>>>>> geotiffs on S3 using the /vsis3 VFS and it was incredibly efficient, so >>>>>>> I >>>>>>> sought out to do the same in GeoTools for my mosaic. >>>>>>> >>>>>>> I was delighted to discover many modules supporting GDAL readers. I >>>>>>> started to experiment with >>>>>>> https://github.com/geosolutions-it/imageio-ext/blob/master/plugin/gdal/gdalgeotiff/src/main/java/it/geosolutions/imageio/plugins/geotiff/GeoTiffImageReader.java >>>>>>> and >>>>>>> created my own Format implementation using the packages in >>>>>>> https://github.com/geotools/geotools/tree/master/modules/plugin/imageio-ext-gdal/src/main/java/org/geotools/coverageio/gdal >>>>>>> as >>>>>>> templates. I created a simple GeoTiffReader class that extends from >>>>>>> BaseGdalReader, which is responsible for creating the >>>>>>> GeoTiffImageReaderSpi >>>>>>> and GeoTiffFormat instances. >>>>>>> >>>>>>> A VFS URL takes the form: "/vsicurl/ >>>>>>> https://s3-us-west-2.amazonaws.com/landsat-pds/c1/L8/153/075/LC08_L1TP_153075_20190515_20190515_01_RT/LC08_L1TP_153075_20190515_20190515_01_RT_B3.TIF"... >>>>>>> simply taking the http/https url an prefixing it with "/vsicurl/". This >>>>>>> string can be passed directly to `gdal.Open()` and it will return a >>>>>>> Dataset. >>>>>>> >>>>>>> The issue is that the input object that ends up being passed to >>>>>>> BaseGridCoverage2DReader ends up expecting a file, either in the form >>>>>>> of a >>>>>>> URL, FileInputStreamExt, or String (see >>>>>>> BaseGridCoverage2DReader.checkSource). I thought I might be able to >>>>>>> trick >>>>>>> it by passing in the vsicurl path prefixed with the file protocol >>>>>>> (file:///vsicurl/https://s3-us-west...). The issue is that when >>>>>>> Java returns the file path, it removes the second forward slash from >>>>>>> https://, leaving you with the path >>>>>>> "/vsicurl/https:/s3-us-west...", which gdal does not like. >>>>>>> >>>>>>> I duplicated a sufficient number of GeoTools classes to be able to >>>>>>> add in simple string replacements for "https:/" -> "https://" where >>>>>>> needed. This allowed me to create the reader and successfully create >>>>>>> the >>>>>>> metadata object (in GDALImageReader.setInput). The >>>>>>> BseGridCoverage2DReader >>>>>>> constructor completes fully and all of the coverage properties, layout, >>>>>>> resolution info, etc is built. At this point I was very hopeful that >>>>>>> the >>>>>>> only change necessary was modifying these classes to not only support >>>>>>> files, but also strings that begin with "/vsi". >>>>>>> >>>>>>> Unfortunately this idea fell apart when I called "reader.read()". >>>>>>> Long story short, the RasterLayerResponse object tries to create >>>>>>> a FileImageInputStreamExtImpl, so it's not just reading the dataset >>>>>>> using >>>>>>> `gdal.Open`. At this point I'm not 100% certain what the best path >>>>>>> forward >>>>>>> would be. I know conversations in the past have centered around adding >>>>>>> and >>>>>>> supporting custom protocols to java.net.URL, but this is unique in that >>>>>>> these "paths" are not valid URLs or file paths due to the double forward >>>>>>> slash in the middle of the path. >>>>>>> >>>>>>> I would like to know if supporting VFS is something the community is >>>>>>> interested in. I intend to continue to work on a solution for myself, >>>>>>> but >>>>>>> would love to work with you all and contribute back if there is >>>>>>> interest. >>>>>>> >>>>>>> I apologize for the long read. I hope everything makes sense and >>>>>>> please let me know if there are any questions. >>>>>>> >>>>>>> Josh >>>>>>> _______________________________________________ >>>>>>> GeoTools-Devel mailing list >>>>>>> GeoTools-Devel@lists.sourceforge.net >>>>>>> https://lists.sourceforge.net/lists/listinfo/geotools-devel >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Regards, >>>>>> Daniele Romagnoli >>>>>> == >>>>>> GeoServer Professional Services from the experts! Visit >>>>>> http://goo.gl/it488V for more information. >>>>>> == >>>>>> >>>>>> Ing. Daniele Romagnoli >>>>>> Senior Software Engineer >>>>>> >>>>>> GeoSolutions S.A.S. >>>>>> Via di Montramito 3/A >>>>>> <https://www.google.com/maps/search/Via+di+Montramito+3%2FA+55054+%C2%A0Massarosa?entry=gmail&source=g> >>>>>> 55054 Massarosa >>>>>> <https://www.google.com/maps/search/Via+di+Montramito+3%2FA+55054+%C2%A0Massarosa?entry=gmail&source=g> >>>>>> (LU) >>>>>> Italy >>>>>> phone: +39 0584 962313 >>>>>> fax: +39 0584 1660272 >>>>>> >>>>>> http://www.geo-solutions.it >>>>>> http://twitter.com/geosolutions_it >>>>>> >>>>>> ------------------------------------------------------- >>>>>> >>>>>> Con riferimento alla normativa sul trattamento dei dati personali >>>>>> (Reg. UE 2016/679 - Regolamento generale sulla protezione dei dati >>>>>> “GDPR”), >>>>>> si precisa che ogni circostanza inerente alla presente email (il suo >>>>>> contenuto, gli eventuali allegati, etc.) è un dato la cui conoscenza è >>>>>> riservata al/i solo/i destinatario/i indicati dallo scrivente. Se il >>>>>> messaggio Le è giunto per errore, è tenuta/o a cancellarlo, ogni altra >>>>>> operazione è illecita. Le sarei comunque grato se potesse darmene >>>>>> notizia. >>>>>> >>>>>> This email is intended only for the person or entity to which it is >>>>>> addressed and may contain information that is privileged, confidential or >>>>>> otherwise protected from disclosure. We remind that - as provided by >>>>>> European Regulation 2016/679 “GDPR” - copying, dissemination or use of >>>>>> this >>>>>> e-mail or the information herein by anyone other than the intended >>>>>> recipient is prohibited. If you have received this email by mistake, >>>>>> please >>>>>> notify us immediately by telephone or e-mail. >>>>>> >>>>> _______________________________________________ >>>>> GeoTools-Devel mailing list >>>>> GeoTools-Devel@lists.sourceforge.net >>>>> https://lists.sourceforge.net/lists/listinfo/geotools-devel >>>>> >>>> >>>> >>>> -- >>>> >>>> Regards, Andrea Aime == GeoServer Professional Services from the >>>> experts! Visit http://goo.gl/it488V for more information. == Ing. >>>> Andrea Aime @geowolf Technical Lead GeoSolutions S.A.S. Via di >>>> Montramito 3/A 55054 Massarosa >>>> <https://www.google.com/maps/search/Via+di+Montramito+3%2FA%0D%0A55054++Massarosa?entry=gmail&source=g> >>>> (LU) phone: +39 0584 962313 fax: +39 0584 1660272 mob: +39 339 8844549 >>>> http://www.geo-solutions.it http://twitter.com/geosolutions_it >>>> ------------------------------------------------------- *Con >>>> riferimento alla normativa sul trattamento dei dati personali (Reg. UE >>>> 2016/679 - Regolamento generale sulla protezione dei dati “GDPR”), si >>>> precisa che ogni circostanza inerente alla presente email (il suo >>>> contenuto, gli eventuali allegati, etc.) è un dato la cui conoscenza è >>>> riservata al/i solo/i destinatario/i indicati dallo scrivente. Se il >>>> messaggio Le è giunto per errore, è tenuta/o a cancellarlo, ogni altra >>>> operazione è illecita. Le sarei comunque grato se potesse darmene notizia. >>>> This email is intended only for the person or entity to which it is >>>> addressed and may contain information that is privileged, confidential or >>>> otherwise protected from disclosure. We remind that - as provided by >>>> European Regulation 2016/679 “GDPR” - copying, dissemination or use of this >>>> e-mail or the information herein by anyone other than the intended >>>> recipient is prohibited. If you have received this email by mistake, please >>>> notify us immediately by telephone or e-mail.* >>>> >>> >> >> -- >> Josh Fix >> Shuttle Commander >> Planet Federal >> +1 321.444.0412 >> j...@federal.planet.com >> https://federal.planet.com >> _______________________________________________ >> GeoTools-Devel mailing list >> GeoTools-Devel@lists.sourceforge.net >> https://lists.sourceforge.net/lists/listinfo/geotools-devel >> > -- > -- > Jody Garnett > -- Josh Fix Shuttle Commander Planet Federal +1 321.444.0412 j...@federal.planet.com https://federal.planet.com
_______________________________________________ GeoTools-Devel mailing list GeoTools-Devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/geotools-devel