[ https://issues.apache.org/jira/browse/IMPALA-12487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17831368#comment-17831368 ]
ASF subversion and git services commented on IMPALA-12487:
----------------------------------------------------------
Commit 52b11ab6aa0dbd2d99e6d583d740858b9f892b5f in impala's branch
refs/heads/master from Sai Hemanth Gantasala
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=52b11ab6a ]
IMPALA-12487: Skip reloading file metadata for ALTER_TABLE events with
trivial changes in StorageDescriptor
IMPALA-11534 skips reloading file metadata for some trivial ALTER_TABLE
events. However, ALTER_TABLE events that have trivial changes in
StorageDescriptor are not handled in IMPALA-11534. The only changes
that require file metadata reload are: location, rowformat, fileformat,
serde, and storedAsSubDirectories. The file metadata reload can be
skipped for all other changes in SD.
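
The rule above can be sketched as a field-by-field comparison of the
before and after StorageDescriptor. This is only an illustration, not the
code from the commit: the Sd class is a hypothetical, trimmed-down stand-in
for the metastore StorageDescriptor, "rowformat" and "serde" are assumed to
be captured by the serde library name and serde parameters, and all method
names are made up for this example. An unset storedAsSubDirectories (null
here) is treated as false, matching the default of the field.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class SdChangeCheck {
  /** Hypothetical stand-in for the metastore StorageDescriptor (subset of fields). */
  public static final class Sd {
    public String location;
    public String inputFormat;
    public String outputFormat;
    public String serdeLib;                                      // stands in for SerDeInfo
    public Map<String, String> serdeParams = new HashMap<>();    // row format settings live here
    public Boolean storedAsSubDirectories;                       // null models an unset optional field
    public Map<String, String> parameters = new HashMap<>();     // user-supplied, assumed trivial
  }

  /** Hypothetical factory that fills in fixed formats and serde for brevity. */
  public static Sd sd(String location, Boolean storedAsSubDirs) {
    Sd s = new Sd();
    s.location = location;
    s.inputFormat = "TextInputFormat";
    s.outputFormat = "IgnoreKeyTextOutputFormat";
    s.serdeLib = "LazySimpleSerDe";
    s.storedAsSubDirectories = storedAsSubDirs;
    return s;
  }

  /** An unset value counts as false, so clearing 'false' is not a change. */
  static boolean isStoredAsSubDirs(Sd s) {
    return Boolean.TRUE.equals(s.storedAsSubDirectories);
  }

  /** Returns true only if a field that affects file metadata differs. */
  public static boolean needsFileMetadataReload(Sd before, Sd after) {
    return !Objects.equals(before.location, after.location)
        || !Objects.equals(before.inputFormat, after.inputFormat)
        || !Objects.equals(before.outputFormat, after.outputFormat)
        || !Objects.equals(before.serdeLib, after.serdeLib)
        || !Objects.equals(before.serdeParams, after.serdeParams)
        || isStoredAsSubDirs(before) != isStoredAsSubDirs(after);
  }

  public static void main(String[] args) {
    Sd before = sd("/warehouse/db/t", Boolean.FALSE);
    Sd after = sd("/warehouse/db/t", null);   // field cleared, still means false
    after.parameters.put("comment", "x");     // trivial change in SD parameters
    System.out.println(needsFileMetadataReload(before, after));  // false

    after.location = "/warehouse/db/t_moved"; // location change needs a reload
    System.out.println(needsFileMetadataReload(before, after));  // true
  }
}
```

Fields such as compressed, numBuckets, bucketCols, sortCols and the SD
parameters are deliberately left out of the comparison, since the message
above states that changes to them do not require a file metadata reload.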
Testing:
1) Manual testing by changing SD parameters in local environment.
2) Added unit tests for the same in MetastoreEventsProcessorTest class.
Change-Id: I6fd9a9504bf93d2529dc7accbf436ad83e51d8ac
Reviewed-on: http://gerrit.cloudera.org:8080/21019
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> Skip reloading file metadata for ALTER_TABLE events with trivial changes in
> StorageDescriptor
> ---------------------------------------------------------------------------------------------
>
> Key: IMPALA-12487
> URL: https://issues.apache.org/jira/browse/IMPALA-12487
> Project: IMPALA
> Issue Type: Improvement
> Reporter: Quanlong Huang
> Assignee: Sai Hemanth Gantasala
> Priority: Critical
> Attachments: ALTER_TABLE_event_with_SD_changes.png
>
>
> IMPALA-11534 skips reloading file metadata for some trivial ALTER_TABLE
> events. However, ALTER_TABLE events that have trivial changes in
> StorageDescriptor are not handled in IMPALA-11534. Some of them can skip
> reloading file metadata. The thrift definition of StorageDescriptor (not all
> of the fields are related to file metadata):
> {code:java}
> // this object holds all the information about physical storage of the data belonging to a table
> struct StorageDescriptor {
>   1: list<FieldSchema> cols, // required (refer to types defined above)
>   2: string location, // defaults to <warehouse loc>/<db loc>/tablename
>   3: string inputFormat, // SequenceFileInputFormat (binary) or TextInputFormat or custom format
>   4: string outputFormat, // SequenceFileOutputFormat (binary) or IgnoreKeyTextOutputFormat or custom format
>   5: bool compressed, // compressed or not
>   6: i32 numBuckets, // this must be specified if there are any dimension columns
>   7: SerDeInfo serdeInfo, // serialization and deserialization information
>   8: list<string> bucketCols, // reducer grouping columns and clustering columns and bucketing columns
>   9: list<Order> sortCols, // sort order of the data in each bucket
>   10: map<string, string> parameters, // any user supplied key value hash
>   11: optional SkewedInfo skewedInfo, // skewed information
>   12: optional bool storedAsSubDirectories // stored as subdirectories or not
> }
> {code}
> The attached screenshot is an example comparing the before and after Table
> objects of an ALTER_TABLE event that has trivial changes in StorageDescriptor.
> The event just clears the field 'storedAsSubDirectories:false', and that field
> defaults to false, so it actually makes no difference in the
> StorageDescriptor.
> I think we can compare the before and after StorageDescriptor and only reload
> file metadata if any of these fields change:
> * 'location'
> * 'storedAsSubDirectories'
> Note that 'storedAsSubDirectories' defaults to false, so removing
> 'storedAsSubDirectories:false' should be treated as no change.
> CC [~hemanth619], [~csringhofer]