[ 
https://issues.apache.org/jira/browse/IMPALA-580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-580.
----------------------------------
    Resolution: Cannot Reproduce

> Inconsistent or blank fileFormats values passed to CM
> -----------------------------------------------------
>
>                 Key: IMPALA-580
>                 URL: https://issues.apache.org/jira/browse/IMPALA-580
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 1.1
>         Environment: Impala 1.1.0 and CM 4.6.2.
>            Reporter: John Russell
>            Priority: Minor
>
> In the CM "Query Details" page, one of the fields is "File Formats". If I 
> query a table created with STORED AS SEQFILE with the BZip2 compression 
> codec, CM shows a line like:
> File Formats: SEQUENCE_FILE/BZIP2
> That seems intuitive. However, for other combinations of file format and 
> compression codec, the "File Formats" value is blank or seems misleading. 
> select * from seqfile_snappy limit 5 -> file formats in CM is blank
> select * from rcfile_snappy limit 5 -> file formats in CM is blank
> select count(*) from seqfile_deflate -> file formats in CM = SEQUENCE_FILE/DEFAULT
> select count(*) from rcfile_deflate -> file formats in CM = RC_FILE/DEFAULT
> (Is DEFAULT a typo for DEFLATE, since this happens for both SEQFILE and RCFILE 
> tables?)
> select count(*) from parquet_snappy -> file formats = PARQUET/NONE
> I also see PARQUET/NONE for a Parquet table compressed with GZip.
> I also see PARQUET/NONE for a Parquet table where the Impala data directory 
> contains data files compressed with different codecs. I understand CM could 
> in some cases display multiple values in this "File Formats" field, and 
> that's what I'd expect to happen in this case. (The same way I'd expect 
> multiple "File Formats" values for a join of tables with different file 
> formats, or a query against a partitioned table where partitions had 
> different file formats.)
> I did not have an LZO-compressed text table, so I didn't check if that case 
> would produce TEXT/LZO as expected.
> I did not have an Avro table, so I didn't check those combinations.
> I did not check Avro, SEQFILE, or RCFILE with data files from more than one 
> compression codec in the same directory.
> Other than the above cases, I think I checked every combination of file 
> format and codec, and the only issues I saw were those I listed.
> impala-shell PROFILE output or CM profile text available if desired.
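
The behavior the report describes as expected (one FORMAT/CODEC entry per distinct combination scanned by the query) could be sketched roughly as below. This is only an illustration in Python, not Impala's or CM's actual code: the function name summarize_file_formats, the list-of-pairs input, and the single-space separator between entries are all assumptions made for the example.

```python
def summarize_file_formats(scanned_files):
    """Build a 'File Formats'-style string from (file_format, codec) pairs.

    Hypothetical helper for illustration only. Each distinct pair is
    rendered once as FORMAT/CODEC, in first-seen order. Joining the
    entries with a single space is an assumption, not a documented format.
    """
    seen = []
    for file_format, codec in scanned_files:
        entry = f"{file_format}/{codec}"
        if entry not in seen:  # report each combination only once
            seen.append(entry)
    return " ".join(seen)


# A Parquet directory holding data files written with two different codecs.
mixed_parquet = [
    ("PARQUET", "SNAPPY"),
    ("PARQUET", "GZIP"),
    ("PARQUET", "SNAPPY"),
]
print(summarize_file_formats(mixed_parquet))
```

Under these assumptions, a directory with mixed codecs would yield one entry per codec rather than a single PARQUET/NONE, and a query that scanned no files would yield an empty string.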



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
