Quanlong Huang created IMPALA-12487:
---------------------------------------

             Summary: Skip reloading file metadata for ALTER_TABLE events with 
trivial changes in StorageDescriptor
                 Key: IMPALA-12487
                 URL: https://issues.apache.org/jira/browse/IMPALA-12487
             Project: IMPALA
          Issue Type: Improvement
            Reporter: Quanlong Huang
         Attachments: ALTER_TABLE_event_with_SD_changes.png

IMPALA-11534 skips reloading file metadata for some trivial ALTER_TABLE events.
However, it does not handle ALTER_TABLE events whose only changes are trivial
ones in the StorageDescriptor. Some of these events can also skip reloading file
metadata. Here is the Thrift definition of StorageDescriptor (not all of the
fields are related to file metadata):
{code:java}
// this object holds all the information about physical storage of the data belonging to a table
struct StorageDescriptor {
  1: list<FieldSchema> cols,  // required (refer to types defined above)
  2: string location,         // defaults to <warehouse loc>/<db loc>/tablename
  3: string inputFormat,      // SequenceFileInputFormat (binary) or TextInputFormat or custom format
  4: string outputFormat,     // SequenceFileOutputFormat (binary) or IgnoreKeyTextOutputFormat or custom format
  5: bool   compressed,       // compressed or not
  6: i32    numBuckets,       // this must be specified if there are any dimension columns
  7: SerDeInfo    serdeInfo,  // serialization and deserialization information
  8: list<string> bucketCols, // reducer grouping columns and clustering columns and bucketing columns
  9: list<Order>  sortCols,   // sort order of the data in each bucket
  10: map<string, string> parameters, // any user supplied key value hash
  11: optional SkewedInfo skewedInfo, // skewed information
  12: optional bool   storedAsSubDirectories       // stored as subdirectories or not
}
{code}
The attached screenshot shows an example that compares the before and after
Table objects of an ALTER_TABLE event with a trivial change in the
StorageDescriptor. The event only clears the 'storedAsSubDirectories:false'
field. Since that field defaults to false, the StorageDescriptor is effectively
unchanged.

I think we can compare the before and after StorageDescriptor and reload file
metadata only if either of these fields changes:
 * 'location'
 * 'storedAsSubDirectories'

Note that 'storedAsSubDirectories' defaults to false, so removing
'storedAsSubDirectories:false' should be treated as no change.
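
A minimal sketch of the comparison proposed above. It is not Impala's actual
code: the class SdChangeCheck, the stand-in type Sd, and the method
needsFileMetadataReload are hypothetical names used only for illustration, and
an unset optional 'storedAsSubDirectories' is modeled as null.

```java
import java.util.Objects;

final class SdChangeCheck {
  /** Hypothetical stand-in holding only the two StorageDescriptor fields of interest. */
  static final class Sd {
    final String location;
    final Boolean storedAsSubDirectories; // null models an unset optional field

    Sd(String location, Boolean storedAsSubDirectories) {
      this.location = location;
      this.storedAsSubDirectories = storedAsSubDirectories;
    }
  }

  /** Treats an unset 'storedAsSubDirectories' as its default value, false. */
  static boolean effectiveSubDirs(Sd sd) {
    return Boolean.TRUE.equals(sd.storedAsSubDirectories);
  }

  /** Returns true only if 'location' or the effective 'storedAsSubDirectories' differs. */
  static boolean needsFileMetadataReload(Sd before, Sd after) {
    return !Objects.equals(before.location, after.location)
        || effectiveSubDirs(before) != effectiveSubDirs(after);
  }

  public static void main(String[] args) {
    Sd before = new Sd("hdfs://nn/warehouse/db/tbl", Boolean.FALSE);
    Sd cleared = new Sd("hdfs://nn/warehouse/db/tbl", null);    // field cleared only
    Sd moved = new Sd("hdfs://nn/warehouse/db/tbl_new", null);  // location changed
    System.out.println(needsFileMetadataReload(before, cleared)); // false
    System.out.println(needsFileMetadataReload(before, moved));   // true
  }
}
```

Normalizing the optional field to its default before comparing is what lets the
event in the screenshot skip the reload, while a real change of location still
triggers it.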

CC [~hemanth619], [~csringhofer] 


