Michael,
This error comes from libarrow-cpp.
Several potential causes:
- one of the parquet files is corrupted
- one of the parquet files is valid but uses "something" that
libarrow-cpp can't understand. I mention this because we have seen
interoperability issues between files generated by the Go implementation
of parquet and libarrow-cpp. It is not clear which end is the culprit, and
the error message was different from yours
- there's an I/O error when getting data from S3
- some other bug in libarrow-cpp...
Perhaps run with --debug on to see in the traces which parquet file
causes the issue? (This assumes that the error can also be reproduced
when reading one of the component parquet files, and not just when
reading the whole dataset...)
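If the traces are not conclusive, reading each component file on its own is another way to narrow it down. Below is a minimal sketch, assuming the GDAL Python bindings (osgeo) are available with Parquet support. find_failing_files and read_all_with_gdal are hypothetical helper names, not part of GDAL, and I have not checked that the read error surfaces as a Python exception in every GDAL version.

```python
from typing import Callable, Iterable, List, Tuple


def find_failing_files(
    paths: Iterable[str], read_all: Callable[[str], None]
) -> List[Tuple[str, str]]:
    """Try to fully read each file; return (path, error message) for failures."""
    failures = []
    for path in paths:
        try:
            read_all(path)
        except Exception as exc:  # broad on purpose: we only want to locate the file
            failures.append((path, str(exc)))
    return failures


def read_all_with_gdal(path: str) -> None:
    """Read every feature of the first layer of `path`.

    Assumption: with exceptions enabled, a failed read of a Parquet page
    is reported as a Python exception.
    """
    from osgeo import gdal, ogr  # imported lazily so the helper above stays generic

    gdal.UseExceptions()
    ogr.UseExceptions()
    ds = gdal.OpenEx(path, gdal.OF_VECTOR)
    layer = ds.GetLayer(0)
    for _ in layer:  # iterate over all features to force every page to be decoded
        pass
```

The list of paths could come from gdal.ReadDir() on the /vsis3/ directory, keeping only the entries that are parquet files, and then be passed as find_failing_files(paths, read_all_with_gdal).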
https://github.com/search?q=repo%3Aapache%2Farrow+TProtocolException%3A+Exceeded+size+limit&type=issues
shows a number of issues where this error message pops up.
Even
On 16/11/2023 at 14:37, Smith, Michael ERDC-RDE-CRREL-NH CIV via gdal-dev
wrote:
Using GDAL 3.8 (ghcr.io/osgeo/gdal:ubuntu-full-3.8.0), I got an error I
haven’t seen before:
ReadNext() failed: Couldn't deserialize thrift: TProtocolException:
Exceeded size limit
Deserializing page header failed.
This happened at 92%
Command: ogr2ogr -f gpkg /data/overturemaps_2023_11_14.gpkg
/vsis3/overturemaps-us-west-2/release/2023-11-14-alpha.0/theme=buildings/
theme=buildings -progress -NLT MULTIPOLYGON
Does anyone know what this means, what caused it, and whether there are any workarounds?
Mike
--
Michael Smith
US Army Corps of Engineers
Remote Sensing/GIS Center
_______________________________________________
gdal-dev mailing list
gdal-dev@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/gdal-dev
--
http://www.spatialys.com
My software is free, but my time generally not.