Re: Codec value missing from Turbodbc files? Format issue?

Uwe L. Korn Mon, 20 Nov 2017 09:26:42 -0800

The files are produced by Parquet C++ through pyarrow. Turbodbc cannot write
Parquet itself; it only talks ODBC to a database and then returns Arrow
tables or pandas DataFrames. The conversion Arrow -> Parquet is done in pyarrow.


Additionally, I would add zstandard +..., which were recently added to the
Parquet standard, to parquet-cpp quite soon. This is nice for users whose tools
are all on the newest version of Parquet; with older tools we will probably see
the above error more often, as people will use the new codecs despite warnings
in the documentation.

Uwe

(note that besides being involved in Arrow and Parquet, I'm one of the two 
turbodbc developers)

> Am 20.11.2017 um 17:44 schrieb Jacques Nadeau <[email protected]>:
> 
> Got it, nice catch. Thanks for the help!
> 
> On Mon, Nov 20, 2017 at 8:42 AM, Ryan Blue <[email protected]>
> wrote:
> 
>> The file that the user posted is stored with Brotli compression. You should
>> be able to read it with the latest Parquet master. I can cat the contents
>> with our tools that use brotli.
>> 
>> I'm surprised to see files like this already. We added the new compression
>> codecs just recently. Also, whatever wrote this file should not default to
>> brotli and should warn users that using brotli compression breaks forward
>> compatibility: older readers can't read the files or metadata because of
>> how Thrift handles enums.
>> 
>> rb
>> 
>> On Mon, Nov 20, 2017 at 8:34 AM, Jacques Nadeau <[email protected]>
>> wrote:
>> 
>>> One of our community members hit an issue where we couldn't parse a
>>> Parquet footer. It looks like the file is missing the Codec field for a
>>> column but the Parquet Thrift spec expects one.
>>> 
>>> https://community.dremio.com/t/unable-to-read-parquet-footer-with-file-generated-with-turbodbc/474/9
>>> 
>>> Was there a recent change in format? Any thoughts would be appreciated.
>>> 
>>> thanks,
>>> Jacques
>>> 
>> 
>> 
>> 
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>>
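
Ryan's remark about Thrift enums explains why an older reader reports a missing Codec field rather than an unknown codec. The Python sketch below models that failure mode; it does not use Thrift or any Parquet library. The codec numbers follow the `CompressionCodec` enum in `parquet.thrift`, and `codec` is a required field of `ColumnMetaData`. The "old reader" table and the helpers `old_reader_lookup`, `old_reader_codec` and `new_reader_codec` are hypothetical stand-ins for generated Thrift code, assuming (as with Thrift-generated Java enums) that an unknown enum id decodes to null instead of raising.

```python
from enum import IntEnum
from typing import Optional


class CompressionCodec(IntEnum):
    """Codec ids as numbered in the CompressionCodec enum of parquet.thrift."""
    UNCOMPRESSED = 0
    SNAPPY = 1
    GZIP = 2
    LZO = 3
    BROTLI = 4  # one of the codecs recently added to the spec, per this thread
    LZ4 = 5
    ZSTD = 6


# Hypothetical "older reader": its enum was generated before BROTLI, LZ4 and
# ZSTD existed, so it only knows ids 0-3.
OLD_READER_CODECS = {0: "UNCOMPRESSED", 1: "SNAPPY", 2: "GZIP", 3: "LZO"}


def old_reader_lookup(raw_id: int) -> Optional[str]:
    # Assumption: like a generated enum lookup, unknown ids yield None
    # instead of raising an "unknown codec" error.
    return OLD_READER_CODECS.get(raw_id)


def old_reader_codec(raw_id: int) -> str:
    codec = old_reader_lookup(raw_id)
    # 'codec' is required, so a None value fails validation and surfaces
    # as a missing field rather than as an unsupported codec.
    if codec is None:
        raise ValueError("required field 'codec' is missing")
    return codec


def new_reader_codec(raw_id: int) -> str:
    # A reader generated from the current spec knows every id.
    return CompressionCodec(raw_id).name


if __name__ == "__main__":
    raw = int(CompressionCodec.BROTLI)  # the id a Brotli-writing producer stores
    print(new_reader_codec(raw))
    try:
        old_reader_codec(raw)
    except ValueError as exc:
        print(f"old reader failed: {exc}")
```

Under this model, the same footer bytes decode cleanly for an up-to-date reader and fail for an older one with a "missing field" style error, which is why Ryan argues that writers should not default to brotli and should warn users about it.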
