You're welcome, it's always nice to hear that we're able to help out.

On Thu, Jul 18, 2019 at 8:42 PM Ronnie Huang <[email protected]> wrote:
> Hi Tim,
>
> You are really, really helpful.
>
> I did testing in Impala 3.2 and Hive 2.0, and both worked fine. Our
> platform team is planning to upgrade Impala and Hive to fix this, so we
> only need to update the metadata after the engine upgrade.
>
> Thanks a lot, and have a nice day.
>
> Best Regards,
> Ronnie
> ------------------------------
> *From:* Tim Armstrong <[email protected]>
> *Sent:* Wednesday, July 17, 2019 12:50 PM
> *To:* Parquet Dev
> *Cc:* Ronnie Huang
> *Subject:* Re: [Question] Change Column Type in Parquet File
>
> I think generally the best solution, if it's supported by the tools
> you're using, is to do schema evolution by *not* rewriting the files and
> just updating the metadata, relying on the engine that's querying the
> table to promote the int32 to int64 when the Parquet file has an int32
> but the Hive schema has an int64.
>
> E.g., this support has been added in Impala and Hive:
> https://issues.apache.org/jira/browse/HIVE-12080,
> https://issues.apache.org/jira/browse/IMPALA-6373. I'm not sure about
> other engines.
>
> Generally, Parquet is not designed to support modifying files in place -
> if you want to change a file's schema, you regenerate the file.
>
> On Tue, Jul 16, 2019 at 8:38 PM Ronnie Huang <[email protected]>
> wrote:
>
> Hi Parquet Devs,
>
> Our team is working on changing userid from int to bigint across our
> whole Hadoop system. It's easy for us to quickly refresh non-partitioned
> tables; however, the partitioned tables have huge partition files. We are
> trying to find a quick way to change the data type without refreshing
> partitions one by one. That's why I'm sending you this email.
>
> I took a look at https://github.com/apache/parquet-format to understand
> the Parquet format, but I'm still confused about the metadata, so I have
> the following questions:
>
> 1. If I want to change one column's type, do I need to change it in the
> file metadata and the column (chunk) metadata? Am I right, or am I
> missing anything?
> 2. If I change one column's type from int32 to int64 directly in the file
> metadata and column (chunk) metadata, can the compressed data still be
> read correctly? If not, what's the problem?
>
> Thank you so much for your time; we would appreciate it if you could
> reply.
>
> Best Regards,
> Ronnie
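For reference, the metadata-only change described in the thread is a single DDL statement in each engine. A minimal sketch, using a hypothetical partitioned table named events with a userid column:

    -- Hive: change the column's type in the table metadata only; no Parquet
    -- files are rewritten. CASCADE propagates the new type to the metadata
    -- of all existing partitions (Hive 1.1.0+), so there is no
    -- partition-by-partition refresh.
    ALTER TABLE events CHANGE COLUMN userid userid BIGINT CASCADE;

    -- If Impala also queries this table, tell it to reload the schema
    -- (needed here because the change above was made through Hive):
    INVALIDATE METADATA events;

At query time, an engine with the HIVE-12080 / IMPALA-6373 support reads the existing int32 Parquet columns and promotes the values to int64 to match the table schema, so the old files never need to be touched.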

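If some engine in the stack lacks that promotion support, the fallback Tim mentions is to regenerate the files with the new type rather than modifying them in place. A sketch in HiveQL, again with hypothetical names (events with a userid INT column and a payload STRING column, partitioned by dt):

    -- Create a copy of the table with the widened column type.
    CREATE TABLE events_bigint (userid BIGINT, payload STRING)
    PARTITIONED BY (dt STRING)
    STORED AS PARQUET;

    -- Dynamic partitioning rewrites every partition in one statement,
    -- casting userid to the new type as the data is copied.
    SET hive.exec.dynamic.partition.mode=nonstrict;
    INSERT OVERWRITE TABLE events_bigint PARTITION (dt)
    SELECT CAST(userid AS BIGINT), payload, dt
    FROM events;

This is the expensive path the thread is trying to avoid: every Parquet file gets rewritten, which is exactly why the metadata-only route is preferable for large partitioned tables.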