[ https://issues.apache.org/jira/browse/IMPALA-580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Armstrong resolved IMPALA-580.
----------------------------------
    Resolution: Cannot Reproduce

> Inconsistent or blank fileFormats values passed to CM
> -----------------------------------------------------
>
>                 Key: IMPALA-580
>                 URL: https://issues.apache.org/jira/browse/IMPALA-580
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 1.1
>         Environment: Impala 1.1.0 and CM 4.6.2.
>            Reporter: John Russell
>            Priority: Minor
>
> In the CM "Query Details" page, one of the fields is "File Formats". If I
> query a table created with STORED AS SEQFILE with the BZip2 compression
> codec, CM shows a line like:
>
> File Formats: SEQUENCE_FILE/BZIP2
>
> That seems intuitive. However, for other combinations of file format and
> compression codec, the "File Formats" value is blank or seems misleading.
>
> select * from seqfile_snappy limit 5 -> file formats in CM is blank
> select * from rcfile_snappy limit 5 -> file formats in CM is blank
> select count(*) from seqfile_deflate -> file formats in CM = SEQUENCE_FILE/DEFAULT
> select count(*) from rcfile_deflate -> file formats in CM = RC_FILE/DEFAULT
> (is DEFAULT a typo for DEFLATE since this happens for both SEQFILE and RCFILE tables?)
> select count(*) from parquet_snappy -> file formats = PARQUET/NONE
>
> I also see PARQUET/NONE for a Parquet table compressed with GZip.
>
> I also see PARQUET/NONE for a Parquet table where the Impala data directory
> contains data files compressed with different codecs. I understand CM could
> in some cases display multiple values in this "File Formats" field, and
> that's what I'd expect to happen in this case. (The same way I'd expect
> multiple "File Formats" values for a join of tables with different file
> formats, or a query against a partitioned table where partitions had
> different file formats.)
>
> I did not have an LZO-compressed text table, so I didn't check if that case
> would produce TEXT/LZO as expected.
> I did not have an Avro table, so I didn't check those combinations.
> I did not check Avro, SEQFILE, or RCFILE with data files from more than one
> compression codec in the same directory.
>
> Other than the above cases, I think I checked every combination of file
> format and codec, and the only issues I saw were those I listed.
>
> impala-shell PROFILE output or CM profile text available if desired.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)