[ https://issues.apache.org/jira/browse/IMPALA-9004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17172650#comment-17172650 ]

ASF subversion and git services commented on IMPALA-9004:
---------------------------------------------------------

Commit dbbd40308a6d1cef77bfe45e016e775c918e0539 in impala's branch 
refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=dbbd403 ]

IMPALA-10005: Fix Snappy decompression for non-block filesystems

Snappy-compressed text always uses THdfsCompression::SNAPPY_BLOCKED
type compression in the backend. However, for non-block filesystems,
the frontend is incorrectly passing THdfsCompression::SNAPPY instead.
On debug builds, this leads to a DCHECK when trying to read
Snappy-compressed text. On release builds, it fails to decompress
the data.

This fixes the frontend to always pass THdfsCompression::SNAPPY_BLOCKED
for Snappy-compressed text.
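The frontend rule described above can be sketched in Python. The enum is a hypothetical stand-in for a few THdfsCompression values (not the Thrift-generated class), and text_scan_compression is an illustrative name. The point it shows: the filesystem type plays no part in the choice, and Snappy text always maps to the blocked variant.

```python
from enum import Enum


class Compression(Enum):
    # Hypothetical stand-ins for a few THdfsCompression values.
    NONE = "NONE"
    GZIP = "GZIP"
    SNAPPY = "SNAPPY"
    SNAPPY_BLOCKED = "SNAPPY_BLOCKED"


def text_scan_compression(codec: Compression) -> Compression:
    """Return the compression type to hand the backend for a text file.

    Snappy-compressed text is always decoded as SNAPPY_BLOCKED, whether the
    file lives on a block filesystem (HDFS) or a non-block one (e.g. S3).
    """
    if codec is Compression.SNAPPY:
        return Compression.SNAPPY_BLOCKED
    # Other codecs pass through unchanged.
    return codec
```

Passing plain SNAPPY for text is the bug the commit fixes: a debug build hits a DCHECK, and a release build fails to decompress the data.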

This reworks query_test/test_compressed_formats.py to provide better
coverage:
 - Changed the RC and Seq test cases to verify that the file extension
   doesn't matter. Added Avro to this case as well.
 - Fixed the text case to use appropriate extensions (fixing IMPALA-9004)
 - Changed the utility function so it doesn't use Hive. This allows it
   to be enabled on non-HDFS filesystems like S3.
 - Changed the test to use unique_database and allow parallel execution.
 - Changed the test to run in the core job, so it now has coverage on
   the usual S3 test configuration. It is reasonably quick (1-2 minutes)
   and runs in parallel.
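A minimal sketch of the extension rule the reworked test relies on. The suffix table comes from the data-load listing quoted in IMPALA-9004. dest_file_name and ext_for_other_formats are hypothetical names, not the helper in test_compressed_formats.py. The sketch assumes that the codec of a text file is inferred from its suffix, while RC, sequence and Avro files record their codec in the file itself.

```python
# Suffixes that data load uses for compressed text files
# (from the hdfs dfs -ls listing in IMPALA-9004).
TEXT_EXTENSIONS = {
    "bzip": ".bz2",
    "def": ".deflate",
    "gzip": ".gz",
    "lzo": ".lzo",
    "snap": ".snappy",
}


def dest_file_name(base: str, file_format: str, compression: str,
                   ext_for_other_formats: str = ".bin") -> str:
    """Pick a destination file name for a copied test file (hypothetical helper)."""
    if file_format == "text":
        # Assumption: text codecs are detected by suffix, so it must match.
        return base + TEXT_EXTENSIONS[compression]
    # RC, sequence and Avro files carry their codec in the file header or
    # metadata, so a deliberately unrelated extension should still be readable.
    return base + ext_for_other_formats
```

Using a mismatched extension for the self-describing formats is what lets the test check that the extension does not matter for them.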

Testing:
 - Exhaustive job
 - Core s3 job
 - Changed the frontend to force it to use the code for non-block
   filesystems (i.e. the TFileSplitGeneratorSpec code) and
   verified that it is now able to read Snappy-compressed text.

Change-Id: I0879f2fc0bf75bb5c15cecb845ece46a901601ac
Reviewed-on: http://gerrit.cloudera.org:8080/16278
Tested-by: Impala Public Jenkins <[email protected]>
Reviewed-by: Sahil Takiar <[email protected]>


> TestCompressedFormats is broken for text files
> ----------------------------------------------
>
>                 Key: IMPALA-9004
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9004
>             Project: IMPALA
>          Issue Type: Test
>            Reporter: Sahil Takiar
>            Priority: Major
>
> While working on IMPALA-8950, we made a fix to {{TestCompressedFormats}} so 
> that it actually checks the exit status of the {{hdfs dfs -cp}} command. It 
> turns out that this command has been silently failing whenever 
> {{test_compressed_formats}} runs with {{file_format}} = {{text}}.
> For some reason, data load writes compressed text files with their 
> corresponding file compression suffix, but for compressed seq/rc files, it 
> does not:
> {code:java}
> hdfs dfs -ls /test-warehouse/tinytable_seq_*
> Found 1 items
> -rwxr-xr-x   3 systest supergroup        325 2019-08-22 14:32 /test-warehouse/tinytable_seq_bzip/000000_0
> Found 1 items
> -rwxr-xr-x   3 systest supergroup        215 2019-08-22 14:32 /test-warehouse/tinytable_seq_def/000000_0
> Found 1 items
> -rwxr-xr-x   3 systest supergroup        260 2019-08-22 14:32 /test-warehouse/tinytable_seq_gzip/000000_0
> Found 1 items
> -rwxr-xr-x   3 systest supergroup        301 2019-08-22 14:32 /test-warehouse/tinytable_seq_record_bzip/000000_0
> Found 1 items
> -rwxr-xr-x   3 systest supergroup        209 2019-08-22 14:32 /test-warehouse/tinytable_seq_record_def/000000_0
> Found 1 items
> -rwxr-xr-x   3 systest supergroup        242 2019-08-22 14:32 /test-warehouse/tinytable_seq_record_gzip/000000_0
> Found 1 items
> -rwxr-xr-x   3 systest supergroup        233 2019-08-22 14:32 /test-warehouse/tinytable_seq_record_snap/000000_0
> Found 2 items
> -rwxr-xr-x   3 systest supergroup        243 2019-08-22 14:32 /test-warehouse/tinytable_seq_snap/000000_0
> hdfs dfs -ls /test-warehouse/tinytable_text_*
> Found 1 items
> -rwxr-xr-x   3 systest supergroup         59 2019-08-22 14:32 /test-warehouse/tinytable_text_bzip/000000_0.bz2
> Found 1 items
> -rwxr-xr-x   3 systest supergroup         28 2019-08-22 14:32 /test-warehouse/tinytable_text_def/000000_0.deflate
> Found 1 items
> -rwxr-xr-x   3 systest supergroup         40 2019-08-22 14:32 /test-warehouse/tinytable_text_gzip/000000_0.gz
> Found 2 items
> -rwxr-xr-x   3 systest supergroup         87 2019-08-22 14:32 /test-warehouse/tinytable_text_lzo/000000_0.lzo
> -rw-r--r--   3 systest supergroup          8 2019-08-22 14:42 /test-warehouse/tinytable_text_lzo/000000_0.lzo.index
> Found 1 items
> -rwxr-xr-x   3 systest supergroup         41 2019-08-22 14:32 /test-warehouse/tinytable_text_snap/000000_0.snappy
> {code}
> Not sure if that is by design or not, but it is causing the tests to fail for 
> all text files.
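A minimal sketch of the exit-status check the issue describes, assuming the test shells out through Python's subprocess module. run_checked is a hypothetical helper, not the code added in IMPALA-8950. It raises instead of letting a failed copy pass silently.

```python
import subprocess
from typing import Sequence


def run_checked(cmd: Sequence[str]) -> str:
    """Run cmd and return its stdout, raising if it exits with a non-zero status."""
    result = subprocess.run(list(cmd), capture_output=True, text=True)
    if result.returncode != 0:
        # Surface the failure instead of continuing with missing files.
        raise RuntimeError(
            f"{' '.join(cmd)} failed with exit code {result.returncode}: "
            f"{result.stderr.strip()}")
    return result.stdout
```

In the test this would wrap a call such as run_checked(["hdfs", "dfs", "-cp", src, dst]), where src and dst are placeholders for the source and destination paths. A copy whose source path lacks the suffix shown in the listing above would then fail loudly rather than silently.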



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
