You're welcome, it's always nice to hear that we're able to help out.

On Thu, Jul 18, 2019 at 8:42 PM Ronnie Huang <[email protected]> wrote:
> Hi Tim,
>
> You are really, really helpful.
>
> I did testing in Impala 3.2 and Hive 2.0, and both worked fine. Our
> platform team is planning to upgrade Impala and Hive to fix this, so we
> only need to update the metadata after the engine upgrade.
>
> Thanks a lot, and have a nice day.
>
> Best Regards,
> Ronnie
> ------------------------------
> *From:* Tim Armstrong <[email protected]>
> *Sent:* Wednesday, July 17, 2019 12:50 PM
> *To:* Parquet Dev
> *Cc:* Ronnie Huang
> *Subject:* Re: [Question] Change Column Type in Parquet File
>
> I think generally the best solution, if it's supported by the tools
> you're using, is to do schema evolution by *not* rewriting the files and
> just updating the metadata, relying on the engine that's querying the
> table to promote the int32 to int64 when the Parquet file has an int32
> but the Hive schema has an int64.
>
> E.g., this support has been added in Impala and Hive:
> https://issues.apache.org/jira/browse/HIVE-12080,
> https://issues.apache.org/jira/browse/IMPALA-6373. I'm not sure about
> other engines.
>
> Generally, Parquet is not designed to support modifying files in place -
> if you want to change a file's schema, you regenerate the file.
>
> On Tue, Jul 16, 2019 at 8:38 PM Ronnie Huang <[email protected]>
> wrote:
>
> Hi Parquet Devs,
>
> Our team is working on changing userid from int to bigint across our
> whole Hadoop system. It's easy for us to quickly refresh non-partitioned
> tables; however, the partitioned tables have huge partition files. We are
> trying to find a quick way to change the data type without refreshing
> partitions one by one. That's why I'm sending you this email.
>
> I took a look at https://github.com/apache/parquet-format to understand
> the Parquet format, but I'm still confused about the metadata, so I have
> the following questions:
>
> 1. If I want to change one column's type, do I need to change it in the
> file metadata and the column (chunk) metadata? Am I right, or am I
> missing anything?
> 2. If I change one column's type from int32 to int64 directly in the file
> metadata and column (chunk) metadata, can the compressed data still be
> read correctly? If not, what's the problem?
>
> Thank you so much for your time; we would appreciate it if you could
> reply.
>
> Best Regards,
> Ronnie
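For reference, the metadata-only change described in the thread is a single DDL statement in each engine. A minimal sketch, using a hypothetical partitioned table named events with a userid column:

    -- Hive: change the column's type in the table metadata only; no Parquet
    -- files are rewritten. CASCADE propagates the new type to the metadata
    -- of all existing partitions (Hive 1.1.0+), so there is no
    -- partition-by-partition refresh.
    ALTER TABLE events CHANGE COLUMN userid userid BIGINT CASCADE;

    -- If Impala also queries this table, tell it to reload the schema
    -- (needed here because the change above was made through Hive):
    INVALIDATE METADATA events;

At query time, an engine with the HIVE-12080 / IMPALA-6373 support reads the existing int32 Parquet columns and promotes the values to int64 to match the table schema, so the old files never need to be touched.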

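If some engine in the stack lacks that promotion support, the fallback Tim mentions is to regenerate the files with the new type rather than modifying them in place. A sketch in HiveQL, again with hypothetical names (events with a userid INT column and a payload STRING column, partitioned by dt):

    -- Create a copy of the table with the widened column type.
    CREATE TABLE events_bigint (userid BIGINT, payload STRING)
    PARTITIONED BY (dt STRING)
    STORED AS PARQUET;

    -- Dynamic partitioning rewrites every partition in one statement,
    -- casting userid to the new type as the data is copied.
    SET hive.exec.dynamic.partition.mode=nonstrict;
    INSERT OVERWRITE TABLE events_bigint PARTITION (dt)
    SELECT CAST(userid AS BIGINT), payload, dt
    FROM events;

This is the expensive path the thread is trying to avoid: every Parquet file gets rewritten, which is exactly why the metadata-only route is preferable for large partitioned tables.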