Re: Adding new columns to existing Hive tables

Prasad Chakka Tue, 09 Mar 2010 13:24:48 -0800

All it says is that when you change the metadata, the underlying data is not 
rewritten or rearranged to fit the new metadata.


E.g., if your data has 3 columns named a, b, c in the metadata and you 
replace this set of names with "new_a, d, new_b, new_c", you shouldn't 
expect columns new_b and new_c to have the same values as the old columns b 
and c.

But in most cases you will be adding a column at the end of the existing list, 
and it will return NULL if such a column doesn't exist in the data.
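
To make the two cases above concrete, here is a rough toy model in Python. It is not Hive's code; it only assumes a delimited text table whose stored fields are matched to the metadata column names by position. The read_row helper and the sample line are made up for illustration.

```python
# Toy model only: this is NOT Hive's implementation. It assumes a delimited
# text table whose fields are matched to column names purely by position.

def read_row(line, columns, delimiter="\t"):
    """Map one stored line onto the current metadata column list, by position."""
    fields = line.split(delimiter)
    # A column with no stored field reads as None (NULL).
    return {name: (fields[i] if i < len(fields) else None)
            for i, name in enumerate(columns)}

# Hypothetical line written when the table had columns a, b, c.
old_line = "1\t2\t3"

# Case 1: add a column at the end. Old values keep their names, d is NULL.
appended = read_row(old_line, ["a", "b", "c", "d"])

# Case 2: replace the list with a name inserted in the middle.
# The stored values stay put, so they end up under different names.
replaced = read_row(old_line, ["new_a", "d", "new_b", "new_c"])

print(appended)
print(replaced)
```

In this model the appended column d reads as None for the old line, while after the replace the old value of b shows up under d, the old value of c under new_b, and new_c is None.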


________________________________
From: Ryan LeCompte <[email protected]>
Reply-To: <[email protected]>
Date: Tue, 9 Mar 2010 12:24:50 -0800
To: <[email protected]>
Subject: Adding new columns to existing Hive tables

It looks like we can add columns to existing tables via:

ALTER TABLE table_name ADD|REPLACE COLUMNS (col_name data_type [COMMENT col_comment], ...)

However, I see the following comment in the Hive docs:


"NOTE: These commands will only modify Hive's metadata, and will NOT
reorganize or reformat existing data. Users should make sure the actual
data layout conforms with the metadata definition."



Question: If we already have a table that has lots of data in it, and I execute 
the above statement to add a column, will I still be able to query existing 
data? Or do I need to re-import somehow all of the data and fill in a value for 
the new column? The idea is to be able to add a new column, and make sure that 
the column value exists for all NEW partitions in the same table. I would hate 
to have to reload all of the old data just to specify a NULL value for the new 
column.


Will this work as expected, or is a data reload necessary every time we add a 
new column in order to still query older data?

Thanks!

Ryan
