Quanlong Huang created IMPALA-12487:
---------------------------------------
Summary: Skip reloading file metadata for ALTER_TABLE events with
trivial changes in StorageDescriptor
Key: IMPALA-12487
URL: https://issues.apache.org/jira/browse/IMPALA-12487
Project: IMPALA
Issue Type: Improvement
Reporter: Quanlong Huang
Attachments: ALTER_TABLE_event_with_SD_changes.png
IMPALA-11534 skips reloading file metadata for some trivial ALTER_TABLE events.
However, it does not handle ALTER_TABLE events that have trivial changes in
StorageDescriptor. Some of them can also skip reloading file metadata. The
thrift definition of StorageDescriptor is shown below (not all of the fields
are related to file metadata):
{code:java}
// this object holds all the information about physical storage of the data belonging to a table
struct StorageDescriptor {
  1: list<FieldSchema> cols, // required (refer to types defined above)
  2: string location, // defaults to <warehouse loc>/<db loc>/tablename
  3: string inputFormat, // SequenceFileInputFormat (binary) or TextInputFormat or custom format
  4: string outputFormat, // SequenceFileOutputFormat (binary) or IgnoreKeyTextOutputFormat or custom format
  5: bool compressed, // compressed or not
  6: i32 numBuckets, // this must be specified if there are any dimension columns
  7: SerDeInfo serdeInfo, // serialization and deserialization information
  8: list<string> bucketCols, // reducer grouping columns and clustering columns and bucketing columns
  9: list<Order> sortCols, // sort order of the data in each bucket
  10: map<string, string> parameters, // any user supplied key value hash
  11: optional SkewedInfo skewedInfo, // skewed information
  12: optional bool storedAsSubDirectories // stored as subdirectories or not
}
{code}
The attached screenshot is an example comparing the Table objects before and
after an ALTER_TABLE event that has trivial changes in StorageDescriptor.
The event just clears the field 'storedAsSubDirectories:false', and that field
defaults to false, so it actually makes no difference in the StorageDescriptor.
I think we can compare the before and after StorageDescriptor and only reload
file metadata if any of these fields changes:
* 'location'
* 'storedAsSubDirectories'
Note that 'storedAsSubDirectories' defaults to false, so removing
'storedAsSubDirectories:false' is considered unchanged.
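A minimal sketch of the comparison proposed above, for illustration only. It
does not use the real thrift-generated StorageDescriptor class: the Sd class,
the SdChangeCheck class and the needsFileMetadataReload() method are
hypothetical names, and a null Boolean is assumed to model an unset optional
field. The locations in main() are made-up examples.

```java
import java.util.Objects;

final class SdChangeCheck {
  // Hypothetical stand-in for the two StorageDescriptor fields of interest.
  static final class Sd {
    final String location;
    final Boolean storedAsSubDirectories; // null models "field not set"

    Sd(String location, Boolean storedAsSubDirectories) {
      this.location = location;
      this.storedAsSubDirectories = storedAsSubDirectories;
    }
  }

  // An unset 'storedAsSubDirectories' is treated as its default, false.
  static boolean effectiveStoredAsSubDirs(Sd sd) {
    return sd.storedAsSubDirectories != null && sd.storedAsSubDirectories;
  }

  // Returns true only if 'location' or the effective value of
  // 'storedAsSubDirectories' differs between the two descriptors.
  static boolean needsFileMetadataReload(Sd before, Sd after) {
    return !Objects.equals(before.location, after.location)
        || effectiveStoredAsSubDirs(before) != effectiveStoredAsSubDirs(after);
  }

  public static void main(String[] args) {
    Sd before = new Sd("hdfs://nn/warehouse/db/tbl", Boolean.FALSE);
    Sd cleared = new Sd("hdfs://nn/warehouse/db/tbl", null);
    Sd moved = new Sd("hdfs://nn/warehouse/db/tbl_new", null);
    System.out.println(needsFileMetadataReload(before, cleared)); // false
    System.out.println(needsFileMetadataReload(before, moved));   // true
  }
}
```

Normalizing the unset value to false before comparing is what makes the event
in the screenshot a no-op, while a real location change still triggers a
reload.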
CC [~hemanth619], [~csringhofer]