The files are produced by Parquet C++ through pyarrow. Turbodbc cannot write Parquet itself; it only talks ODBC to a database and then returns Arrow tables or pandas DataFrames. The Arrow -> Parquet conversion is done in pyarrow.
Additionally, I plan to add zstandard +... (codecs that were recently added to the Parquet standard) to parquet-cpp quite soon. This is nice for users whose tools are all on the newest version of Parquet. With older tools, we will probably see the above error more often, as people will use the new codecs despite warnings in the documentation.

Uwe

(Note that besides being involved in Arrow and Parquet, I'm one of the two turbodbc developers.)

> On 20.11.2017 at 17:44, Jacques Nadeau <[email protected]> wrote:
>
> Got it, nice catch. Thanks for the help!
>
> On Mon, Nov 20, 2017 at 8:42 AM, Ryan Blue <[email protected]>
> wrote:
>
>> The file that the user posted is stored with Brotli compression. You should
>> be able to read it with the latest Parquet master. I can cat the contents
>> with our tools that use brotli.
>>
>> I'm surprised to see files like this already. We added the new compression
>> codecs just recently. Also, whatever wrote this file should not default to
>> brotli and should warn users that using brotli compression breaks forward
>> compatibility: older readers can't read the files or metadata because of
>> how Thrift handles enums.
>>
>> rb
>>
>> On Mon, Nov 20, 2017 at 8:34 AM, Jacques Nadeau <[email protected]>
>> wrote:
>>
>>> One of our community members hit an issue where we couldn't parse a Parquet
>>> footer. It looks like the file is missing the Codec field for a column but
>>> the Parquet Thrift spec expects one.
>>>
>>> https://community.dremio.com/t/unable-to-read-parquet-footer-with-file-generated-with-turbodbc/474/9
>>>
>>> Was there a recent change in format? Any thoughts would be appreciated.
>>>
>>> thanks,
>>> Jacques
>>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>>
