Michael,

this error comes from libarrow-cpp.

Several potential causes:

- one of the parquet files is corrupted

- one of the parquet files is valid but uses "something" that libarrow-cpp can't understand. I mention this because we have seen interoperability issues between files generated by the Go implementation of Parquet and libarrow-cpp. It is not clear which end is the culprit, but the error message was different from yours

- there's an I/O error when getting data from S3

- some other bug in libarrow-cpp...

Perhaps run with --debug on to see in the traces which parquet file causes the issue? (This assumes that the error can also be reproduced when reading one of the component parquet files, and not just when reading the whole dataset...)
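If the traces are not conclusive, the per-file check could be sketched roughly as below. This is only an illustration under assumptions: it supposes pyarrow (which wraps the Arrow C++ library) is installed and that the component files have been copied locally. The names find_failing_files and read_with_pyarrow are hypothetical helpers, not part of GDAL or Arrow.

```python
from typing import Callable, Iterable, List, Tuple


def read_with_pyarrow(path: str) -> None:
    """Read every row group of one Parquet file; raises if decoding fails."""
    # Imported lazily so the helper below can be used with another reader.
    # Assumption: pyarrow is installed and `path` is a local file.
    import pyarrow.parquet as pq

    parquet_file = pq.ParquetFile(path)
    for index in range(parquet_file.num_row_groups):
        parquet_file.read_row_group(index)


def find_failing_files(
    paths: Iterable[str],
    reader: Callable[[str], None] = read_with_pyarrow,
) -> List[Tuple[str, str]]:
    """Try each file on its own and collect (path, error message) pairs."""
    failures = []
    for path in paths:
        try:
            reader(path)
        except Exception as exc:  # keep going so every bad file is reported
            failures.append((path, f"{type(exc).__name__}: {exc}"))
    return failures
```

Calling find_failing_files() on the list of component files returns only those that raise, together with the error text, so a file that fails on its own can be told apart from a failure that only shows up when reading the whole dataset.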

https://github.com/search?q=repo%3Aapache%2Farrow+TProtocolException%3A+Exceeded+size+limit&type=issues shows a number of issues where this error message pops up

Even

On 16/11/2023 at 14:37, Smith, Michael ERDC-RDE-CRREL-NH CIV via gdal-dev wrote:

Using GDAL 3.8 (ghcr.io/osgeo/gdal:ubuntu-full-3.8.0), I got an error I haven’t seen before:

ReadNext() failed: Couldn't deserialize thrift: TProtocolException: Exceeded size limit

Deserializing page header failed.

This happened at 92%

Command: ogr2ogr -f gpkg /data/overturemaps_2023_11_14.gpkg /vsis3/overturemaps-us-west-2/release/2023-11-14-alpha.0/theme=buildings/ theme=buildings -progress -NLT MULTIPOLYGON

Does anyone know what this means, what caused it, and whether there are any workarounds?

Mike

--

Michael Smith

US Army Corps of Engineers

Remote Sensing/GIS Center


_______________________________________________
gdal-dev mailing list
gdal-dev@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/gdal-dev

--
http://www.spatialys.com
My software is free, but my time generally not.