[ https://issues.apache.org/jira/browse/IMPALA-580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Armstrong resolved IMPALA-580.
----------------------------------
Resolution: Cannot Reproduce
> Inconsistent or blank fileFormats values passed to CM
> -----------------------------------------------------
>
> Key: IMPALA-580
> URL: https://issues.apache.org/jira/browse/IMPALA-580
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 1.1
> Environment: Impala 1.1.0 and CM 4.6.2.
> Reporter: John Russell
> Priority: Minor
>
> In the CM "Query Details" page, one of the fields is "File Formats". If I
> query a table created with STORED AS SEQFILE with the BZip2 compression
> codec, CM shows a line like:
> File Formats: SEQUENCE_FILE/BZIP2
> That seems intuitive. However, for other combinations of file format and
> compression codec, the "File Formats" value is blank or seems misleading.
> select * from seqfile_snappy limit 5 -> file formats in CM is blank
> select * from rcfile_snappy limit 5 -> file formats in CM is blank
> select count(*) from seqfile_deflate -> file formats in CM = SEQUENCE_FILE/DEFAULT
> select count(*) from rcfile_deflate -> file formats in CM = RC_FILE/DEFAULT
> (Is DEFAULT a typo for DEFLATE, since this happens for both SEQFILE and
> RCFILE tables?)
> select count(*) from parquet_snappy -> file formats in CM = PARQUET/NONE
> I also see PARQUET/NONE for a Parquet table compressed with GZip.
> I also see PARQUET/NONE for a Parquet table where the Impala data directory
> contains data files compressed with different codecs. I understand CM could
> in some cases display multiple values in this "File Formats" field, and
> that's what I'd expect to happen in this case. (The same way I'd expect
> multiple "File Formats" values for a join of tables with different file
> formats, or a query against a partitioned table where partitions had
> different file formats.)
> I did not have an LZO-compressed text table, so I didn't check if that case
> would produce TEXT/LZO as expected.
> I did not have an Avro table, so I didn't check those combinations.
> I did not check Avro, SEQFILE, or RCFILE with data files from more than one
> compression codec in the same directory.
> Other than the above cases, I think I checked every combination of file
> format and codec, and the only issues I saw were those I listed.
> I can provide impala-shell PROFILE output or CM profile text if desired.