That is great josh! Is the goal to contribute this to imageio-ext?  If you
can try and cite your sources in the header it will help ..

On Fri, Aug 2, 2019 at 1:03 PM Josh Fix via GeoTools-Devel <
geotools-devel@lists.sourceforge.net> wrote:

> So here is my first stab at the reader that supports VFS paths:
>
> https://github.com/joshfix/gdal-vfs-reader
>
> It is very much a prototype/WIP.  I would guess that 95% of the code was
> simply copied and pasted from various sources in imageio-ext and geotools
> and reassembled into these new classes.  I'm sure there are things I've
> overlooked, but the basic functionality seems to work.  The input stream
> implementation is a bit precarious and I'm not entirely certain how to
> handle it, but the image mosaic configuration requires an implementation.
> Also, there is a lot of work yet to be done on how to handle parsing
> various URLs to proper virtual paths.
>
> I would love to get this more "production-ready" and would be very happy
> if anybody would be able to take a look at the project and provide feedback.
>
> On Fri, Jun 28, 2019 at 10:07 AM Josh Fix <j...@federal.planet.com> wrote:
>
>> Thanks for the insite Andrea.  I find myself getting mixed up quite a bit
>> between the imageio-ext classes, the geotools class, the image readers and
>> the coverage readers.  I suppose my ultimate goal would be to support a
>> GridCoverage2DReader implementation that accepts vsicurl Strings (or likely
>> some sort of wrapper class that does regex validation as you mentioned).  I
>> am definitely on the same page and understanding that we do not want to be
>> reading streams ourselves, and want gdal to handle all of the magic.  It
>> was interesting to see that even when I pass some version of a GDAL image
>> reader SPI to the GridCoverage2DReader, it would use the image reader to
>> call gdal.Open to read the image metadata, but the actual read() method to
>> read the image data itself was still using a file stream via the
>> RasterLayerRequest/RasterLayerResponse objects.  So I don't fully
>> understand where the actual starting point for implementing this would be.
>> It seems like that read method should be deferring to the image reader
>> passed in?
>>
>> Understood re: the s3-geotiff plugin not being a reader itself and just
>> extending GeoTiffReader.  Having the /vsi capability seems like it would
>> replace the need for any cloud specific inputstream implementation and
>> GeoTiffReader (or other format) extension (eg the s3-geotiff plugin or the
>> azure-geotiff plugin), but it also sounds like a great idea to update the 
>> imageio-ext
>> tiff reader.
>>
>> Once everything settles down and everybody has more time to think about
>> this and come up with a plan, I am definitely more than happy to help with
>> the implementation.
>>
>> On Fri, Jun 28, 2019 at 1:50 AM Andrea Aime <andrea.a...@geo-solutions.it>
>> wrote:
>>
>>> Hi Josh,
>>> I am under the impression that things are getting mixed up, but have
>>> little time to write this mail and none to check
>>> what I'm writing, so please take it with a pinch of salt.
>>>
>>> If you want to support visicurl directly from imageio-ext gdal based
>>> readers, IMHO all the discussion about image
>>> streams and the like is useless, because what we do in the end is to
>>> give GDAL a source and directions of
>>> what we want it to read, and get back on the other end a result,
>>> everything related to efficient COG support
>>> and eventual memory caching happens inside GDAL itself. What would be
>>> needed there, is to allow a "/vsi..."
>>> source to be passed in as a plain string, without trying to validate it
>>> as a file. Maybe not just any string,
>>> a light regex based validation could be applied in case GDAL does not
>>> give suitable error messages.
>>>
>>> ----------------
>>> The above makes sense if you want to use GDAL readers directly. If
>>> instead you want to use the pure java TIFF
>>> reader, the part below applies. They should not be mixed.
>>> ----------------
>>>
>>> When it comes to "The current s3-geotiff reader reads byte chunks from
>>> s3 and does not support COG"
>>> well, the s3-geotiff is not a geotiff reader at all, it's just a
>>> ImageInputStream allowing the pure java
>>> imageio-ext tiff reader to work.
>>> There are two issues there:
>>>
>>>    - The imageio-ext tiff reader has not been redesigned to take
>>>    adavantage of the COG structure and does a number of small reads, and 
>>> jumps
>>>    around. It should be modified to take advantage of COG structure instead.
>>>    - The S3 based image input stream is, at least in its default
>>>    configuration, quite bad in terms of IO, the way I see it, it works fine
>>>    only when the relevant portion of the file is already in the local memory
>>>    or disk cache
>>>
>>> To have efficient COG support the reader should be modified first, and
>>> using a simple, non caching, block oriented HTTPImageInputStream (to be
>>> written!)
>>> and then we can see where and how caching helps (afaik GDAL does not do
>>> much of it and it's still working fine)
>>>
>>> Cheers
>>> Andrea
>>>
>>>
>>>
>>> On Wed, Jun 26, 2019 at 7:46 PM Josh Fix via GeoTools-Devel <
>>> geotools-devel@lists.sourceforge.net> wrote:
>>>
>>>> Thanks so much for the response and I'm excited to hear that there is
>>>> interest.  Does any of the existing GDAL code specifically support COGs? My
>>>> imagery skills aren't very strong, but I would hope that we would be able
>>>> to use the request geometry to only request the specific range of bytes
>>>> required.  I'm assuming this is the part that would be handled by gdal
>>>> itself, however I don't know what is necessary to make the request.
>>>> Looking at these sample requests:
>>>> https://trac.osgeo.org/gdal/wiki/CloudOptimizedGeoTIFF  I would assume
>>>> we'd have to convert the geo coords into the pixel space of the image, then
>>>> pass those coords in the request (to translate? warp?) and gdal would know
>>>> which ranges to request.
>>>>
>>>> Not that this is any sort of revelation, but building the right Format
>>>> implementation will also be important.  I've found that I can pass an http
>>>> url as a String directly to GeoTiffReader and it will read it fine, however
>>>> when used as part of an ImageMosaic, it gets converted to a URL and the
>>>> GeoTiffFormat class will throw an exception if the URL is anything other
>>>> than a URL to a file.  In this instance, I've had to provide a custom
>>>> GeoTiffFormat implementation to not error out if the URL doesn't point to a
>>>> file, and continue to build the GeoTiffReader and pass along the string
>>>> value of the URL.
>>>>
>>>> The current s3-geotiff reader reads byte chunks from s3 and does not
>>>> support COG.  It also uses a non-standard format where the protocol is s3,
>>>> and it strips out the last 2 parts of the path to determine the bucket and
>>>> key.  Then it relies on configuration to determine the region, etc.   The
>>>> Azure GeoTIFF reader I built follows the exact same code structure but
>>>> implements the Azure API, but allows azure-specific "wasb://" and
>>>> "wasbs://" protocols in the URLs.  I was well on my way to creating a
>>>> custom "HttpGeoTiffReader" when I started encountering the issues that
>>>> started this thread.  All that is to say, none of that seems optimal and
>>>> building support for VFS would mean all of these can be replaced, as it
>>>> supports all major cloud providers and would have baked-in COG support.
>>>> Pretty exciting :)
>>>>
>>>> Additionally, it would be interesting to support some lightweight
>>>> in-memory caching (or potentially external caching?) in the same way the
>>>> s3-geotiff reader uses ehcache.  And finally, support for async requests to
>>>> assist when there may be multiple concurrent images/granules being
>>>> requested.
>>>>
>>>> Thanks!
>>>>
>>>> Josh
>>>>
>>>> On Wed, Jun 26, 2019 at 10:13 AM Daniele Romagnoli <
>>>> daniele.romagn...@geo-solutions.it> wrote:
>>>>
>>>>> Hi Josh,
>>>>> I might be back with some more helpful feedbacks in the next few days
>>>>> but I wanted to provide at least my reply right now, since I started
>>>>> ImageIO-EXT/GeoTools/GeoServer - GDAL support/integration many years ago.
>>>>> I think that supporting VFS would be a great and interesting
>>>>> contribution to the project.
>>>>> I need to check back the code more in detail since it's a couple of
>>>>> years I didn't touch it, so I don't have precise feedbacks right now :)
>>>>> Anyway, I think that a couple of key points are:
>>>>> - updating the ImageIO-EXT low level reader machinery to handle that
>>>>> new type of input. ImageReaders have a "setInput" method accepting an
>>>>> object. The ImageReaderSPI will tell which types of input objects are
>>>>> supported. SPI also have a canDecodeInput method which tell if the 
>>>>> provided
>>>>> input can be decoded. I think that the currently supported inputs are
>>>>> Files, URLs, Strings (representing URL or Files), and
>>>>> FileImageInputStream*. Note that, in the past, we also had to support an
>>>>> ECWP protocol as format requirement, so we had to setup an ECWP input
>>>>> stream. However, I think that it has never being used within
>>>>> GeoTools/GeoServer in practice.
>>>>> - updating the GeoTools GDAL base classes. As you said, the
>>>>> RasterLayerResponse is trying to setup a FileImageInputStreamExt because
>>>>> the 99% of the GDAL ImageIO-Ext readers use that. However, if my memory
>>>>> serves me right, the base GridCoverage2DReader abstract classes accept a
>>>>> generic Object input (as well as the ImageReader as said before) so we may
>>>>> "relax" the implementation to also support the VFS.
>>>>>
>>>>> As I said, sorry for these "generic" feedbacks. Hope they can be used
>>>>> as a starting point for further investigations.
>>>>>
>>>>> Best Regards,
>>>>> Daniele
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Jun 25, 2019 at 5:36 PM Josh Fix via GeoTools-Devel <
>>>>> geotools-devel@lists.sourceforge.net> wrote:
>>>>>
>>>>>> Hi all.  I have code that builds an ImageMosaicReader that utilizes a
>>>>>> custom format and input stream SPI.  I define the classes I want to use 
>>>>>> in
>>>>>> the Properties class used by the GranuleCatalog and in the
>>>>>> CatalogConfigurationBean used by the
>>>>>> ImageMosaicDescriptor/ImageMosaicReader.  Generally, it looks something
>>>>>> like this:
>>>>>>
>>>>>> props.put(Utils.Prop.SUGGESTED_IS_SPI,
>>>>>> MyImageInputStreamSpi.class.getCanonicalName());
>>>>>> props.put(Utils.Prop.SUGGESTED_FORMAT,
>>>>>> MyGeoTiffFormat.class.getCanonicalName());
>>>>>> props.put(Utils.Prop.SUGGESTED_SPI,
>>>>>> "it.geosolutions.imageioimpl.plugins.tiff.TIFFImageReaderSpi");
>>>>>>
>>>>>> This has worked great in the past using custom input stream classes
>>>>>> to read from S3 and Azure.  My current goal is to be able to create 
>>>>>> mosaics
>>>>>> from any given http endpoint, but specifically focus on cloud optimized
>>>>>> geotiffs (
>>>>>> https://trac.osgeo.org/gdal/wiki/CloudOptimizedGeoTIFF#HowtoreaditwithGDAL)
>>>>>> using the "vsicurl" VFS.  I've written GDAL code in the past to open
>>>>>> geotiffs on S3 using the /vsis3 VFS and it was incredibly efficient, so I
>>>>>> sought out to do the same in GeoTools for my mosaic.
>>>>>>
>>>>>> I was delighted to discover many modules supporting GDAL readers.  I
>>>>>> started to experiment with
>>>>>> https://github.com/geosolutions-it/imageio-ext/blob/master/plugin/gdal/gdalgeotiff/src/main/java/it/geosolutions/imageio/plugins/geotiff/GeoTiffImageReader.java
>>>>>>  and
>>>>>> created my own Format implementation using the packages in
>>>>>> https://github.com/geotools/geotools/tree/master/modules/plugin/imageio-ext-gdal/src/main/java/org/geotools/coverageio/gdal
>>>>>>  as
>>>>>> templates.  I created a simple GeoTiffReader class that extends from
>>>>>> BaseGdalReader, which is responsible for creating the 
>>>>>> GeoTiffImageReaderSpi
>>>>>> and GeoTiffFormat instances.
>>>>>>
>>>>>> A VFS URL takes the form: "/vsicurl/
>>>>>> https://s3-us-west-2.amazonaws.com/landsat-pds/c1/L8/153/075/LC08_L1TP_153075_20190515_20190515_01_RT/LC08_L1TP_153075_20190515_20190515_01_RT_B3.TIF";...
>>>>>> simply taking the http/https url an prefixing it with "/vsicurl/".  This
>>>>>> string can be passed directly to `gdal.Open()` and it will return a 
>>>>>> Dataset.
>>>>>>
>>>>>> The issue is that the input object that ends up being passed to
>>>>>> BaseGridCoverage2DReader ends up expecting a file, either in the form of 
>>>>>> a
>>>>>> URL, FileInputStreamExt, or String (see
>>>>>> BaseGridCoverage2DReader.checkSource).  I thought I might be able to 
>>>>>> trick
>>>>>> it by passing in the vsicurl path prefixed with the file protocol
>>>>>> (file:///vsicurl/https://s3-us-west...).  The issue is that when
>>>>>> Java returns the file path, it removes the second forward slash from
>>>>>> https://, leaving you with the path "/vsicurl/https:/s3-us-west...",
>>>>>> which gdal does not like.
>>>>>>
>>>>>> I duplicated a sufficient number of GeoTools classes to be able to
>>>>>> add in simple string replacements for "https:/" -> "https://"; where
>>>>>> needed.  This allowed me to create the reader and successfully create the
>>>>>> metadata object (in GDALImageReader.setInput).  The 
>>>>>> BseGridCoverage2DReader
>>>>>> constructor completes fully and all of the coverage properties, layout,
>>>>>> resolution info, etc is built.  At this point I was very hopeful that the
>>>>>> only change necessary was modifying these classes to not only support
>>>>>> files, but also strings that begin with "/vsi".
>>>>>>
>>>>>> Unfortunately this idea fell apart when I called "reader.read()".
>>>>>> Long story short, the RasterLayerResponse object tries to create
>>>>>> a FileImageInputStreamExtImpl, so it's not just reading the dataset using
>>>>>> `gdal.Open`.  At this point I'm not 100% certain what the best path 
>>>>>> forward
>>>>>> would be.  I know conversations in the past have centered around adding 
>>>>>> and
>>>>>> supporting custom protocols to java.net.URL, but this is unique in that
>>>>>> these "paths" are not valid URLs or file paths due to the double forward
>>>>>> slash in the middle of the path.
>>>>>>
>>>>>> I would like to know if supporting VFS is something the community is
>>>>>> interested in.  I intend to continue to work on a solution for myself, 
>>>>>> but
>>>>>> would love to work with you all and contribute back if there is interest.
>>>>>>
>>>>>> I apologize for the long read.  I hope everything makes sense and
>>>>>> please let me know if there are any questions.
>>>>>>
>>>>>> Josh
>>>>>> _______________________________________________
>>>>>> GeoTools-Devel mailing list
>>>>>> GeoTools-Devel@lists.sourceforge.net
>>>>>> https://lists.sourceforge.net/lists/listinfo/geotools-devel
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Regards,
>>>>> Daniele Romagnoli
>>>>> ==
>>>>> GeoServer Professional Services from the experts! Visit
>>>>> http://goo.gl/it488V for more information.
>>>>> ==
>>>>>
>>>>> Ing. Daniele Romagnoli
>>>>> Senior Software Engineer
>>>>>
>>>>> GeoSolutions S.A.S.
>>>>> Via di Montramito 3/A
>>>>> <https://www.google.com/maps/search/Via+di+Montramito+3%2FA+55054+%C2%A0Massarosa?entry=gmail&source=g>
>>>>> 55054  Massarosa
>>>>> <https://www.google.com/maps/search/Via+di+Montramito+3%2FA+55054+%C2%A0Massarosa?entry=gmail&source=g>
>>>>> (LU)
>>>>> Italy
>>>>> phone: +39 0584 962313
>>>>> fax:      +39 0584 1660272
>>>>>
>>>>> http://www.geo-solutions.it
>>>>> http://twitter.com/geosolutions_it
>>>>>
>>>>> -------------------------------------------------------
>>>>>
>>>>> Con riferimento alla normativa sul trattamento dei dati personali
>>>>> (Reg. UE 2016/679 - Regolamento generale sulla protezione dei dati 
>>>>> “GDPR”),
>>>>> si precisa che ogni circostanza inerente alla presente email (il suo
>>>>> contenuto, gli eventuali allegati, etc.) è un dato la cui conoscenza è
>>>>> riservata al/i solo/i destinatario/i indicati dallo scrivente. Se il
>>>>> messaggio Le è giunto per errore, è tenuta/o a cancellarlo, ogni altra
>>>>> operazione è illecita. Le sarei comunque grato se potesse darmene notizia.
>>>>>
>>>>> This email is intended only for the person or entity to which it is
>>>>> addressed and may contain information that is privileged, confidential or
>>>>> otherwise protected from disclosure. We remind that - as provided by
>>>>> European Regulation 2016/679 “GDPR” - copying, dissemination or use of 
>>>>> this
>>>>> e-mail or the information herein by anyone other than the intended
>>>>> recipient is prohibited. If you have received this email by mistake, 
>>>>> please
>>>>> notify us immediately by telephone or e-mail.
>>>>>
>>>> _______________________________________________
>>>> GeoTools-Devel mailing list
>>>> GeoTools-Devel@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/geotools-devel
>>>>
>>>
>>>
>>> --
>>>
>>> Regards, Andrea Aime == GeoServer Professional Services from the
>>> experts! Visit http://goo.gl/it488V for more information. == Ing.
>>> Andrea Aime @geowolf Technical Lead GeoSolutions S.A.S. Via di
>>> Montramito 3/A 55054 Massarosa
>>> <https://www.google.com/maps/search/Via+di+Montramito+3%2FA%0D%0A55054++Massarosa?entry=gmail&source=g>
>>> (LU) phone: +39 0584 962313 fax: +39 0584 1660272 mob: +39 339 8844549
>>> http://www.geo-solutions.it http://twitter.com/geosolutions_it
>>> ------------------------------------------------------- *Con
>>> riferimento alla normativa sul trattamento dei dati personali (Reg. UE
>>> 2016/679 - Regolamento generale sulla protezione dei dati “GDPR”), si
>>> precisa che ogni circostanza inerente alla presente email (il suo
>>> contenuto, gli eventuali allegati, etc.) è un dato la cui conoscenza è
>>> riservata al/i solo/i destinatario/i indicati dallo scrivente. Se il
>>> messaggio Le è giunto per errore, è tenuta/o a cancellarlo, ogni altra
>>> operazione è illecita. Le sarei comunque grato se potesse darmene notizia.
>>> This email is intended only for the person or entity to which it is
>>> addressed and may contain information that is privileged, confidential or
>>> otherwise protected from disclosure. We remind that - as provided by
>>> European Regulation 2016/679 “GDPR” - copying, dissemination or use of this
>>> e-mail or the information herein by anyone other than the intended
>>> recipient is prohibited. If you have received this email by mistake, please
>>> notify us immediately by telephone or e-mail.*
>>>
>>
>
> --
> Josh Fix
> Shuttle Commander
> Planet Federal
> +1 321.444.0412
> j...@federal.planet.com
> https://federal.planet.com
> _______________________________________________
> GeoTools-Devel mailing list
> GeoTools-Devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/geotools-devel
>
-- 
--
Jody Garnett
_______________________________________________
GeoTools-Devel mailing list
GeoTools-Devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geotools-devel

Reply via email to