So here is my first stab at the reader that supports VFS paths:

https://github.com/joshfix/gdal-vfs-reader

It is very much a prototype/WIP.  I would guess that 95% of the code was
simply copied and pasted from various sources in imageio-ext and geotools
and reassembled into these new classes.  I'm sure there are things I've
overlooked, but the basic functionality seems to work.  The input stream
implementation is a bit precarious and I'm not entirely certain how to
handle it, but the image mosaic configuration requires one.
Also, there is a lot of work yet to be done on parsing various URLs into
proper virtual paths.
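
To make the URL-to-virtual-path idea concrete, here is a minimal sketch
of what such a mapping could look like.  The class and method names
(VsiPaths, toVsiPath, looksLikeVsiPath) are hypothetical and are not
taken from the repository above; the "s3://bucket/key" input form is
also an assumption.  It only covers http/https (mapped to /vsicurl/) and
s3 (mapped to /vsis3/), plus a light regex check of the kind suggested
earlier in the thread:

```java
import java.util.regex.Pattern;

// Hypothetical helper: names and supported schemes are illustrative assumptions.
public final class VsiPaths {

    // Light sanity check only: "/vsi", a handler name, "/", then a non-empty remainder.
    private static final Pattern VSI_PATTERN = Pattern.compile("^/vsi[a-z0-9_]+/.+$");

    private VsiPaths() {}

    /** Returns true if the string has the general shape of a GDAL virtual path. */
    public static boolean looksLikeVsiPath(String s) {
        return s != null && VSI_PATTERN.matcher(s).matches();
    }

    /** Maps a location string to a GDAL virtual path, or passes an existing one through. */
    public static String toVsiPath(String location) {
        if (location == null) {
            throw new IllegalArgumentException("location must not be null");
        }
        if (looksLikeVsiPath(location)) {
            return location; // already a virtual path
        }
        if (location.startsWith("http://") || location.startsWith("https://")) {
            return "/vsicurl/" + location; // keep the full URL, including "//"
        }
        if (location.startsWith("s3://")) {
            // assumes the plain s3://bucket/key form
            return "/vsis3/" + location.substring("s3://".length());
        }
        throw new IllegalArgumentException("Unsupported location: " + location);
    }
}
```

The result is kept as a plain String all the way through, never as a
java.net.URL or java.io.File, so the "https://" inside the path is not
altered before it reaches GDAL.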

I would love to get this more "production-ready" and would be very happy if
anybody would be able to take a look at the project and provide feedback.

On Fri, Jun 28, 2019 at 10:07 AM Josh Fix <j...@federal.planet.com> wrote:

> Thanks for the insight Andrea.  I find myself getting mixed up quite a bit
> between the imageio-ext classes, the geotools class, the image readers and
> the coverage readers.  I suppose my ultimate goal would be to support a
> GridCoverage2DReader implementation that accepts vsicurl Strings (or likely
> some sort of wrapper class that does regex validation as you mentioned).  I
> am definitely on the same page and understanding that we do not want to be
> reading streams ourselves, and want gdal to handle all of the magic.  It
> was interesting to see that even when I pass some version of a GDAL image
> reader SPI to the GridCoverage2DReader, it would use the image reader to
> call gdal.Open to read the image metadata, but the actual read() method to
> read the image data itself was still using a file stream via the
> RasterLayerRequest/RasterLayerResponse objects.  So I don't fully
> understand where the actual starting point for implementing this would be.
> It seems like that read method should be deferring to the image reader
> passed in?
>
> Understood re: the s3-geotiff plugin not being a reader itself and just
> extending GeoTiffReader.  Having the /vsi capability seems like it would
> replace the need for any cloud specific inputstream implementation and
> GeoTiffReader (or other format) extension (eg the s3-geotiff plugin or the
> azure-geotiff plugin), but it also sounds like a great idea to update the
> imageio-ext tiff reader.
>
> Once everything settles down and everybody has more time to think about
> this and come up with a plan, I am definitely more than happy to help with
> the implementation.
>
> On Fri, Jun 28, 2019 at 1:50 AM Andrea Aime <andrea.a...@geo-solutions.it>
> wrote:
>
>> Hi Josh,
>> I am under the impression that things are getting mixed up, but have
>> little time to write this mail and none to check
>> what I'm writing, so please take it with a pinch of salt.
>>
>> If you want to support vsicurl directly from imageio-ext gdal based
>> readers, IMHO all the discussion about image
>> streams and the like is useless, because what we do in the end is to give
>> GDAL a source and directions of
>> what we want it to read, and get back on the other end a result,
>> everything related to efficient COG support
>> and eventual memory caching happens inside GDAL itself. What would be
>> needed there, is to allow a "/vsi..."
>> source to be passed in as a plain string, without trying to validate it
>> as a file. Maybe not just any string,
>> a light regex based validation could be applied in case GDAL does not
>> give suitable error messages.
>>
>> ----------------
>> The above makes sense if you want to use GDAL readers directly. If
>> instead you want to use the pure java TIFF
>> reader, the part below applies. They should not be mixed.
>> ----------------
>>
>> When it comes to "The current s3-geotiff reader reads byte chunks from s3
>> and does not support COG"
>> well, the s3-geotiff is not a geotiff reader at all, it's just an
>> ImageInputStream allowing the pure java
>> imageio-ext tiff reader to work.
>> There are two issues there:
>>
>>    - The imageio-ext tiff reader has not been redesigned to take
>>    advantage of the COG structure and does a number of small reads, and
>>    jumps around. It should be modified to take advantage of COG structure
>>    instead.
>>    - The S3 based image input stream is, at least in its default
>>    configuration, quite bad in terms of IO, the way I see it, it works fine
>>    only when the relevant portion of the file is already in the local memory
>>    or disk cache
>>
>> To have efficient COG support the reader should be modified first,
>> using a simple, non caching, block oriented HTTPImageInputStream (to be
>> written!),
>> and then we can see where and how caching helps (afaik GDAL does not do
>> much of it and it's still working fine)
>>
>> Cheers
>> Andrea
>>
>>
>>
>> On Wed, Jun 26, 2019 at 7:46 PM Josh Fix via GeoTools-Devel <
>> geotools-devel@lists.sourceforge.net> wrote:
>>
>>> Thanks so much for the response and I'm excited to hear that there is
>>> interest.  Does any of the existing GDAL code specifically support COGs? My
>>> imagery skills aren't very strong, but I would hope that we would be able
>>> to use the request geometry to only request the specific range of bytes
>>> required.  I'm assuming this is the part that would be handled by gdal
>>> itself, however I don't know what is necessary to make the request.
>>> Looking at these sample requests:
>>> https://trac.osgeo.org/gdal/wiki/CloudOptimizedGeoTIFF  I would assume
>>> we'd have to convert the geo coords into the pixel space of the image, then
>>> pass those coords in the request (to translate? warp?) and gdal would know
>>> which ranges to request.
>>>
>>> Not that this is any sort of revelation, but building the right Format
>>> implementation will also be important.  I've found that I can pass an http
>>> url as a String directly to GeoTiffReader and it will read it fine, however
>>> when used as part of an ImageMosaic, it gets converted to a URL and the
>>> GeoTiffFormat class will throw an exception if the URL is anything other
>>> than a URL to a file.  In this instance, I've had to provide a custom
>>> GeoTiffFormat implementation to not error out if the URL doesn't point to a
>>> file, and continue to build the GeoTiffReader and pass along the string
>>> value of the URL.
>>>
>>> The current s3-geotiff reader reads byte chunks from s3 and does not
>>> support COG.  It also uses a non-standard format where the protocol is s3,
>>> and it strips out the last 2 parts of the path to determine the bucket and
>>> key.  Then it relies on configuration to determine the region, etc.   The
>>> Azure GeoTIFF reader I built follows the exact same code structure but
>>> implements the Azure API and allows Azure-specific "wasb://" and
>>> "wasbs://" protocols in the URLs.  I was well on my way to creating a
>>> custom "HttpGeoTiffReader" when I started encountering the issues that
>>> started this thread.  All that is to say, none of that seems optimal and
>>> building support for VFS would mean all of these can be replaced, as it
>>> supports all major cloud providers and would have baked-in COG support.
>>> Pretty exciting :)
>>>
>>> Additionally, it would be interesting to support some lightweight
>>> in-memory caching (or potentially external caching?) in the same way the
>>> s3-geotiff reader uses ehcache.  And finally, support for async requests to
>>> assist when there may be multiple concurrent images/granules being
>>> requested.
>>>
>>> Thanks!
>>>
>>> Josh
>>>
>>> On Wed, Jun 26, 2019 at 10:13 AM Daniele Romagnoli <
>>> daniele.romagn...@geo-solutions.it> wrote:
>>>
>>>> Hi Josh,
>>>> I might be back with some more helpful feedback in the next few days
>>>> but I wanted to provide at least my reply right now, since I started
>>>> ImageIO-EXT/GeoTools/GeoServer - GDAL support/integration many years ago.
>>>> I think that supporting VFS would be a great and interesting
>>>> contribution to the project.
>>>> I need to check the code in more detail since I haven't touched it in a
>>>> couple of years, so I don't have precise feedback right now :)
>>>> Anyway, I think that a couple of key points are:
>>>> - updating the ImageIO-EXT low level reader machinery to handle that
>>>> new type of input. ImageReaders have a "setInput" method accepting an
>>>> object. The ImageReaderSPI will tell which types of input objects are
>>>> supported. The SPI also has a canDecodeInput method which tells whether
>>>> the provided input can be decoded. I think that the currently supported
>>>> inputs are Files, URLs, Strings (representing URLs or Files), and
>>>> FileImageInputStream*. Note that, in the past, we also had to support an
>>>> ECWP protocol as a format requirement, so we had to set up an ECWP input
>>>> stream. However, I think that it has never been used within
>>>> GeoTools/GeoServer in practice.
>>>> - updating the GeoTools GDAL base classes. As you said, the
>>>> RasterLayerResponse is trying to set up a FileImageInputStreamExt because
>>>> 99% of the GDAL ImageIO-Ext readers use that. However, if my memory
>>>> serves me right, the base GridCoverage2DReader abstract classes accept a
>>>> generic Object input (as well as the ImageReader as said before) so we may
>>>> "relax" the implementation to also support the VFS.
>>>>
>>>> As I said, sorry for this "generic" feedback. Hope it can be used
>>>> as a starting point for further investigations.
>>>>
>>>> Best Regards,
>>>> Daniele
>>>>
>>>>
>>>>
>>>> On Tue, Jun 25, 2019 at 5:36 PM Josh Fix via GeoTools-Devel <
>>>> geotools-devel@lists.sourceforge.net> wrote:
>>>>
>>>>> Hi all.  I have code that builds an ImageMosaicReader that utilizes a
>>>>> custom format and input stream SPI.  I define the classes I want to use in
>>>>> the Properties class used by the GranuleCatalog and in the
>>>>> CatalogConfigurationBean used by the
>>>>> ImageMosaicDescriptor/ImageMosaicReader.  Generally, it looks something
>>>>> like this:
>>>>>
>>>>> props.put(Utils.Prop.SUGGESTED_IS_SPI,
>>>>>     MyImageInputStreamSpi.class.getCanonicalName());
>>>>> props.put(Utils.Prop.SUGGESTED_FORMAT,
>>>>>     MyGeoTiffFormat.class.getCanonicalName());
>>>>> props.put(Utils.Prop.SUGGESTED_SPI,
>>>>>     "it.geosolutions.imageioimpl.plugins.tiff.TIFFImageReaderSpi");
>>>>>
>>>>> This has worked great in the past using custom input stream classes to
>>>>> read from S3 and Azure.  My current goal is to be able to create mosaics
>>>>> from any given http endpoint, but specifically focus on cloud optimized
>>>>> geotiffs (
>>>>> https://trac.osgeo.org/gdal/wiki/CloudOptimizedGeoTIFF#HowtoreaditwithGDAL)
>>>>> using the "vsicurl" VFS.  I've written GDAL code in the past to open
>>>>> geotiffs on S3 using the /vsis3 VFS and it was incredibly efficient, so I
>>>>> set out to do the same in GeoTools for my mosaic.
>>>>>
>>>>> I was delighted to discover many modules supporting GDAL readers.  I
>>>>> started to experiment with
>>>>> https://github.com/geosolutions-it/imageio-ext/blob/master/plugin/gdal/gdalgeotiff/src/main/java/it/geosolutions/imageio/plugins/geotiff/GeoTiffImageReader.java
>>>>>  and
>>>>> created my own Format implementation using the packages in
>>>>> https://github.com/geotools/geotools/tree/master/modules/plugin/imageio-ext-gdal/src/main/java/org/geotools/coverageio/gdal
>>>>>  as
>>>>> templates.  I created a simple GeoTiffReader class that extends from
>>>>> BaseGdalReader, which is responsible for creating the
>>>>> GeoTiffImageReaderSpi and GeoTiffFormat instances.
>>>>>
>>>>> A VFS URL takes the form:
>>>>> "/vsicurl/https://s3-us-west-2.amazonaws.com/landsat-pds/c1/L8/153/075/LC08_L1TP_153075_20190515_20190515_01_RT/LC08_L1TP_153075_20190515_20190515_01_RT_B3.TIF"
>>>>> ... simply taking the http/https URL and prefixing it with "/vsicurl/".
>>>>> This string can be passed directly to `gdal.Open()` and it will return a
>>>>> Dataset.
>>>>>
>>>>> The issue is that the input object that ends up being passed to
>>>>> BaseGridCoverage2DReader ends up expecting a file, either in the form of a
>>>>> URL, FileInputStreamExt, or String (see
>>>>> BaseGridCoverage2DReader.checkSource).  I thought I might be able to trick
>>>>> it by passing in the vsicurl path prefixed with the file protocol
>>>>> (file:///vsicurl/https://s3-us-west...).  The issue is that when Java
>>>>> returns the file path, it removes the second forward slash from https://,
>>>>> leaving you with the path "/vsicurl/https:/s3-us-west...", which gdal does
>>>>> not like.
>>>>>
>>>>> I duplicated a sufficient number of GeoTools classes to be able to add
>>>>> in simple string replacements for "https:/" -> "https://" where
>>>>> needed.  This allowed me to create the reader and successfully create the
>>>>> metadata object (in GDALImageReader.setInput).  The
>>>>> BaseGridCoverage2DReader constructor completes fully and all of the
>>>>> coverage properties, layout, resolution info, etc. are built.  At this
>>>>> point I was very hopeful that the only change necessary was modifying
>>>>> these classes to not only support files, but also strings that begin
>>>>> with "/vsi".
>>>>>
>>>>> Unfortunately this idea fell apart when I called "reader.read()".
>>>>> Long story short, the RasterLayerResponse object tries to create
>>>>> a FileImageInputStreamExtImpl, so it's not just reading the dataset using
>>>>> `gdal.Open`.  At this point I'm not 100% certain what the best path
>>>>> forward would be.  I know conversations in the past have centered around
>>>>> adding and supporting custom protocols to java.net.URL, but this is
>>>>> unique in that these "paths" are not valid URLs or file paths due to the
>>>>> double forward slash in the middle of the path.
>>>>>
>>>>> I would like to know if supporting VFS is something the community is
>>>>> interested in.  I intend to continue to work on a solution for myself, but
>>>>> would love to work with you all and contribute back if there is interest.
>>>>>
>>>>> I apologize for the long read.  I hope everything makes sense and
>>>>> please let me know if there are any questions.
>>>>>
>>>>> Josh
>>>>> _______________________________________________
>>>>> GeoTools-Devel mailing list
>>>>> GeoTools-Devel@lists.sourceforge.net
>>>>> https://lists.sourceforge.net/lists/listinfo/geotools-devel
>>>>>
>>>>
>>>>
>>>> --
>>>> Regards,
>>>> Daniele Romagnoli
>>>> ==
>>>> GeoServer Professional Services from the experts! Visit
>>>> http://goo.gl/it488V for more information.
>>>> ==
>>>>
>>>> Ing. Daniele Romagnoli
>>>> Senior Software Engineer
>>>>
>>>> GeoSolutions S.A.S.
>>>> Via di Montramito 3/A
>>>> 55054  Massarosa (LU)
>>>> Italy
>>>> phone: +39 0584 962313
>>>> fax:      +39 0584 1660272
>>>>
>>>> http://www.geo-solutions.it
>>>> http://twitter.com/geosolutions_it
>>>>
>>>>
>>>
>>
>

-- 
Josh Fix
Shuttle Commander
Planet Federal
+1 321.444.0412
j...@federal.planet.com
https://federal.planet.com