Hi Tim,

Thank you, this is really helpful.

I did some testing on Impala 3.2 and Hive 2.0, and both worked fine. Our 
platform team is planning to upgrade Impala and Hive to pick this up. After the 
engine upgrade, we will only need to update the table metadata.

Thanks a lot, and have a nice day.

Best Regards,
Ronnie
________________________________
From: Tim Armstrong <[email protected]>
Sent: Wednesday, July 17, 2019 12:50 PM
To: Parquet Dev
Cc: Ronnie Huang
Subject: Re: [Question] Change Column Type in Parquet File

I think generally the best solution, if it's supported by the tools you're 
using, is to do schema evolution by *not* rewriting the files and just updating 
the metadata, relying on the engine that's querying the table to promote the 
int32 to int64 if the Parquet file has an int32 but the Hive schema has an 
int64.

E.g. the support has been added in Impala and Hive: 
https://issues.apache.org/jira/browse/HIVE-12080, 
https://issues.apache.org/jira/browse/IMPALA-6373. I'm not sure about other 
engines.
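
To make that concrete, here is a minimal pyarrow sketch of that kind of 
promotion; the file name is a placeholder, and it only illustrates the widening 
cast, not what Impala or Hive do internally:

    import pyarrow as pa
    import pyarrow.parquet as pq

    # The data file still stores userid as int32; only the table metadata changes.
    table = pq.read_table("part-00000.parquet")   # placeholder file name

    # Build the schema the metastore would declare after the type change:
    # same fields, but userid widened to int64.
    idx = table.schema.get_field_index("userid")
    target = table.schema.set(idx, pa.field("userid", pa.int64()))

    # Widening int32 -> int64 is a lossless cast; this is roughly what a
    # reading engine does once it supports this kind of schema evolution.
    promoted = table.cast(target)
    print(promoted.schema)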

Generally, Parquet is not designed to support modifying files in place; if you 
want to change a file's schema, you would regenerate the file.
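
If a rewrite is ever unavoidable (say, for an engine without that promotion), 
here is a minimal sketch of regenerating a file with the widened schema, again 
with pyarrow and placeholder paths:

    import pyarrow as pa
    import pyarrow.parquet as pq

    table = pq.read_table("old/userid_int32.parquet")            # placeholder path
    idx = table.schema.get_field_index("userid")
    widened = table.schema.set(idx, pa.field("userid", pa.int64()))

    # Write a brand-new file with the widened column; the original stays untouched.
    pq.write_table(table.cast(widened), "new/userid_int64.parquet")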

On Tue, Jul 16, 2019 at 8:38 PM Ronnie Huang <[email protected]> wrote:
Hi Parquet Devs,

Our team is working on changing userid from int to bigint across our whole 
Hadoop system. It's easy for us to quickly refresh non-partitioned tables; 
however, many partitioned tables have huge partition files. We are trying to 
find a quick solution to change the data type without refreshing partitions one 
by one. That's why I am sending you this email.

I took a look at https://github.com/apache/parquet-format to understand the 
Parquet format, but I am still confused about the metadata, so I have listed 
the following questions:

  1.  If I want to change one column's type, I need to change it in both the 
file metadata and the column (chunk) metadata. Am I right, or am I missing 
anything? (See the sketch after these questions.)
  2.  If I change one column's type from int32 to int64 directly in the file 
metadata and column (chunk) metadata, can the compressed data still be read 
correctly? If not, what's the problem?
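
For reference, a minimal pyarrow sketch of where those two pieces of metadata 
sit in a file; the path and the column index are placeholders:

    import pyarrow.parquet as pq

    pf = pq.ParquetFile("userid_part.parquet")    # placeholder path
    meta = pf.metadata

    # File metadata: the schema elements record each column's declared type.
    print(pf.schema)                              # e.g. shows userid as INT32

    # Column (chunk) metadata: every row group repeats the physical type, along
    # with statistics, encodings and sizes that were computed for that type.
    for rg in range(meta.num_row_groups):
        col = meta.row_group(rg).column(0)        # column index 0 is a placeholder
        print(rg, col.path_in_schema, col.physical_type, col.total_compressed_size)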

Thank you so much for your time; we would appreciate it if you could reply.

Best Regards,
Ronnie

