[ https://issues.apache.org/jira/browse/IMPALA-8549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan updated IMPALA-8549:
--------------------------
    Description: 
Several Hadoop tools (e.g. Hive and MapReduce) support reading and writing 
text files compressed with zlib / deflate, which results in files such as 
{{000000_0.deflate}}. Impala currently does not support reading {{.deflate}} 
text files and returns errors such as: {{ERROR: Scanner plugin 'DEFLATE' is not 
one of the enabled plugins: 'LZO'}}.

Moreover, the default compression codec in Hadoop is zlib / deflate (see 
{{o.a.h.io.compress.DefaultCodec}}). So when writing to a text table in Hive, 
if users set {{hive.exec.compress.output}} to true, then {{.deflate}} files 
will be written by default.
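
For illustration, the following is a minimal sketch (not Impala or Hive code) of 
how Hadoop's {{DefaultCodec}} ends up producing {{.deflate}} output. It only 
assumes the standard Hadoop {{CompressionCodec}} API; the class and file names 
here are made up for the example.

{code:java}
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.io.compress.DefaultCodec;

public class DeflateWriteSketch {
  public static void main(String[] args) throws Exception {
    // DefaultCodec is the zlib/deflate-based codec Hadoop falls back to
    // when no output codec is configured explicitly.
    DefaultCodec codec = new DefaultCodec();
    codec.setConf(new Configuration());

    // getDefaultExtension() returns ".deflate", which is why compressed
    // text output shows up as files like 000000_0.deflate.
    String fileName = "000000_0" + codec.getDefaultExtension();

    try (OutputStream raw = new FileOutputStream(fileName);
         CompressionOutputStream out = codec.createOutputStream(raw)) {
      out.write("1\tsome text row\n".getBytes(StandardCharsets.UTF_8));
    }
  }
}
{code}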

Impala does, however, support zlib / deflate with other file formats: Avro, 
RCFile, and SequenceFile (see 
[https://impala.apache.org/docs/build/html/topics/impala_file_formats.html]).

Currently, the frontend assigns a compression type to a file based on its 
extension. For instance, the functional_text_def database is stored as a file 
with a {{.deflate}} extension and is assigned the compression type DEFLATE. The 
{{HdfsTextScanner}} class receives this value and uses it directly to create a 
decompressor. The functional_{avro,seq,rc} databases are stored as files 
without extensions, so the frontend interprets their compression type as NONE. 
In the backend, however, each of the corresponding scanners implements its own 
logic to read the file header and override the NONE compression type with 
values such as DEFAULT or DEFLATE.
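
For context, here is a hypothetical sketch of the kind of extension-based 
mapping described above. The enum and class names are illustrative only and 
are not Impala's actual frontend types.

{code:java}
import java.util.Locale;

// Illustrative only: Impala's real frontend/backend types are not shown here.
enum SketchCompressionType { NONE, DEFAULT, GZIP, BZIP2, SNAPPY, LZO, DEFLATE }

final class ExtensionBasedMapping {
  // Assign a compression type purely from the file name's extension,
  // mirroring the frontend behavior described above.
  static SketchCompressionType fromFileName(String fileName) {
    String name = fileName.toLowerCase(Locale.ROOT);
    if (name.endsWith(".deflate")) return SketchCompressionType.DEFLATE;
    if (name.endsWith(".gz")) return SketchCompressionType.GZIP;
    if (name.endsWith(".bz2")) return SketchCompressionType.BZIP2;
    if (name.endsWith(".snappy")) return SketchCompressionType.SNAPPY;
    if (name.endsWith(".lzo")) return SketchCompressionType.LZO;
    // Files without a recognized extension (e.g. Avro/RC/Seq data files)
    // fall through to NONE; their scanners later read the file header and
    // override this value.
    return SketchCompressionType.NONE;
  }
}
{code}

With such a mapping, {{000000_0.deflate}} maps to DEFLATE, which is the value 
the text scanner currently rejects with the error quoted above.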

  was:
Several Hadoop tools (e.g. Hive, MapReduce, etc.) support reading and writing 
text files stored using zlib / deflate (results in files such as 
{{000000_0.deflate}}). Impala currently does not support reading {{.deflate}} 
files and returns errors such as: {{ERROR: Scanner plugin 'DEFLATE' is not one 
of the enabled plugins: 'LZO'}}.

Moreover, the default compression codec in Hadoop is zlib / deflate (see 
{{o.a.h.io.compress.DefaultCodec}}). So when writing to a text table in Hive, 
if users set {{hive.exec.compress.output}} to true, then {{.deflate}} files 
will be written by default.

Impala does support zlib / deflate with other file formats though: Avro, 
RCFiles, SequenceFiles (see 
https://impala.apache.org/docs/build/html/topics/impala_file_formats.html).


> Add support for scanning DEFLATE text files
> -------------------------------------------
>
>                 Key: IMPALA-8549
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8549
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>            Reporter: Sahil Takiar
>            Assignee: Ethan
>            Priority: Minor
>              Labels: ramp-up
>
> Several Hadoop tools (e.g. Hive, MapReduce, etc.) support reading and writing 
> text files stored using zlib / deflate (results in files such as 
> {{000000_0.deflate}}). Impala currently does not support reading {{.deflate}} 
> text files and returns errors such as: {{ERROR: Scanner plugin 'DEFLATE' is 
> not one of the enabled plugins: 'LZO'}}.
> Moreover, the default compression codec in Hadoop is zlib / deflate (see 
> {{o.a.h.io.compress.DefaultCodec}}). So when writing to a text table in Hive, 
> if users set {{hive.exec.compress.output}} to true, then {{.deflate}} files 
> will be written by default.
> Impala does support zlib / deflate with other file formats though: Avro, 
> RCFiles, SequenceFiles (see 
> [https://impala.apache.org/docs/build/html/topics/impala_file_formats.html]).
> Currently, the frontend assigns a compression type to a file depending on its 
> extension. For instance, the functional_text_def database is stored as a file 
> with a .deflate extension and is assigned the compression type DEFLATE. The 
> HdfsTextScanner class receives this value and uses it directly to create a 
> decompressor. The functional_\{avro,seq,rc}_databases are stored as files 
> without extensions, so the frontend interprets their compression type as 
> NONE. However, in the backend, each of their corresponding scanners implement 
> custom logic of their own to read file headers and override the existing NONE 
> compression type of files to new values, such as DEFAULT or DEFLATE.


