[ https://issues.apache.org/jira/browse/IMPALA-9004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17172650#comment-17172650 ]
ASF subversion and git services commented on IMPALA-9004:
---------------------------------------------------------
Commit dbbd40308a6d1cef77bfe45e016e775c918e0539 in impala's branch
refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=dbbd403 ]
IMPALA-10005: Fix Snappy decompression for non-block filesystems
Snappy-compressed text always uses THdfsCompression::SNAPPY_BLOCKED
type compression in the backend. However, for non-block filesystems,
the frontend is incorrectly passing THdfsCompression::SNAPPY instead.
On debug builds, this leads to a DCHECK when trying to read
Snappy-compressed text. On release builds, it fails to decompress
the data.
This fixes the frontend to always pass THdfsCompression::SNAPPY_BLOCKED
for Snappy-compressed text.
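The mismatch described above can be modelled in a few lines of Python. This is only an illustrative sketch, not Impala's frontend code: the enum stands in for the two THdfsCompression values named in the commit message, and the function names and the is_block_filesystem flag are hypothetical.

```python
import os
from enum import Enum


class Compression(Enum):
    """Illustrative stand-in for the THdfsCompression values named above."""
    NONE = "NONE"
    SNAPPY = "SNAPPY"
    SNAPPY_BLOCKED = "SNAPPY_BLOCKED"


def text_compression_before_fix(path, is_block_filesystem):
    """Models the bug: the non-block code path reported plain SNAPPY."""
    suffix = os.path.splitext(path)[1].lower()
    if suffix != ".snappy":
        # Other codecs are out of scope for this sketch.
        return Compression.NONE
    if is_block_filesystem:
        return Compression.SNAPPY_BLOCKED
    return Compression.SNAPPY


def text_compression_after_fix(path, is_block_filesystem):
    """Models the fix: the answer no longer depends on the filesystem kind."""
    suffix = os.path.splitext(path)[1].lower()
    if suffix == ".snappy":
        return Compression.SNAPPY_BLOCKED
    return Compression.NONE


if __name__ == "__main__":
    path = "tinytable_text_snap/000000_0.snappy"
    for is_block in (True, False):
        print(is_block,
              text_compression_before_fix(path, is_block).value,
              text_compression_after_fix(path, is_block).value)
```

In this model the backend's expectation (SNAPPY_BLOCKED for text) is only met on both code paths by the second function.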
This reworks query_test/test_compressed_formats.py to provide better
coverage:
- Changed the RC and Seq test cases to verify that the file extension
doesn't matter. Added Avro to this case as well.
- Fixed the text case to use appropriate extensions (fixing IMPALA-9004)
- Changed the utility function so it doesn't use Hive. This allows it
to be enabled on non-HDFS filesystems like S3.
- Changed the test to use unique_database and allow parallel execution.
- Changed the test to run in the core job, so it now has coverage on
the usual S3 test configuration. It is reasonably quick (1-2 minutes)
and runs in parallel.
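A rough sketch of the naming rule behind the first two bullets, assuming the suffixes shown in the directory listing of the quoted issue (.bz2, .deflate, .gz, .lzo, .snappy). The function name, the codec keys and the arbitrary_suffix parameter are hypothetical and are not taken from test_compressed_formats.py.

```python
# Suffixes as they appear in the /test-warehouse/tinytable_text_* listing.
TEXT_SUFFIX_BY_CODEC = {
    "bzip": ".bz2",
    "def": ".deflate",
    "gzip": ".gz",
    "lzo": ".lzo",
    "snap": ".snappy",
}

# Formats for which the commit says the file extension should not matter.
EXTENSION_AGNOSTIC_FORMATS = {"seq", "rc", "avro"}


def dest_file_name(file_format, codec, base_name="000000_0",
                   arbitrary_suffix=".anything"):
    """Return the file name a test copy would use for this format and codec."""
    if file_format == "text":
        # Text files need the suffix that matches their codec.
        return base_name + TEXT_SUFFIX_BY_CODEC[codec]
    if file_format in EXTENSION_AGNOSTIC_FORMATS:
        # Any suffix is acceptable, so a test can pick a deliberately odd one.
        return base_name + arbitrary_suffix
    raise ValueError("unsupported file format: %s" % file_format)


if __name__ == "__main__":
    print(dest_file_name("text", "snap"))
    print(dest_file_name("seq", "gzip", arbitrary_suffix=".bz2"))
```

Giving a seq, rc or avro file a suffix that belongs to a different codec is one way a test could check that the extension is ignored for those formats.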
Testing:
- Exhaustive job
- Core s3 job
- Changed the frontend to force it to use the code for non-block
filesystems (i.e. the TFileSplitGeneratorSpec code) and
verified that it is now able to read Snappy-compressed text.
Change-Id: I0879f2fc0bf75bb5c15cecb845ece46a901601ac
Reviewed-on: http://gerrit.cloudera.org:8080/16278
Tested-by: Impala Public Jenkins <[email protected]>
Reviewed-by: Sahil Takiar <[email protected]>
> TestCompressedFormats is broken for text files
> ----------------------------------------------
>
> Key: IMPALA-9004
> URL: https://issues.apache.org/jira/browse/IMPALA-9004
> Project: IMPALA
> Issue Type: Test
> Reporter: Sahil Takiar
> Priority: Major
>
> While working on IMPALA-8950, we made a fix to {{TestCompressedFormats}} so
> that it actually checks the exit status of the {{hdfs dfs -cp}} command. It
> turns out that this command has been silently failing whenever
> {{test_compressed_formats}} runs with {{file_format}} = {{text}}.
> For some reason, data load writes compressed text files with their
> corresponding file compression suffix, but for compressed seq/rc files, it
> does not:
> {code:java}
> hdfs dfs -ls /test-warehouse/tinytable_seq_*
> Found 1 items
> -rwxr-xr-x 3 systest supergroup 325 2019-08-22 14:32
> /test-warehouse/tinytable_seq_bzip/000000_0
> Found 1 items
> -rwxr-xr-x 3 systest supergroup 215 2019-08-22 14:32
> /test-warehouse/tinytable_seq_def/000000_0
> Found 1 items
> -rwxr-xr-x 3 systest supergroup 260 2019-08-22 14:32
> /test-warehouse/tinytable_seq_gzip/000000_0
> Found 1 items
> -rwxr-xr-x 3 systest supergroup 301 2019-08-22 14:32
> /test-warehouse/tinytable_seq_record_bzip/000000_0
> Found 1 items
> -rwxr-xr-x 3 systest supergroup 209 2019-08-22 14:32
> /test-warehouse/tinytable_seq_record_def/000000_0
> Found 1 items
> -rwxr-xr-x 3 systest supergroup 242 2019-08-22 14:32
> /test-warehouse/tinytable_seq_record_gzip/000000_0
> Found 1 items
> -rwxr-xr-x 3 systest supergroup 233 2019-08-22 14:32
> /test-warehouse/tinytable_seq_record_snap/000000_0
> Found 2 items
> -rwxr-xr-x 3 systest supergroup 243 2019-08-22 14:32
> /test-warehouse/tinytable_seq_snap/000000_0
> hdfs dfs -ls /test-warehouse/tinytable_text_*
> Found 1 items
> -rwxr-xr-x 3 systest supergroup 59 2019-08-22 14:32
> /test-warehouse/tinytable_text_bzip/000000_0.bz2
> Found 1 items
> -rwxr-xr-x 3 systest supergroup 28 2019-08-22 14:32
> /test-warehouse/tinytable_text_def/000000_0.deflate
> Found 1 items
> -rwxr-xr-x 3 systest supergroup 40 2019-08-22 14:32
> /test-warehouse/tinytable_text_gzip/000000_0.gz
> Found 2 items
> -rwxr-xr-x 3 systest supergroup 87 2019-08-22 14:32
> /test-warehouse/tinytable_text_lzo/000000_0.lzo
> -rw-r--r-- 3 systest supergroup 8 2019-08-22 14:42
> /test-warehouse/tinytable_text_lzo/000000_0.lzo.index
> Found 1 items
> -rwxr-xr-x 3 systest supergroup 41 2019-08-22 14:32
> /test-warehouse/tinytable_text_snap/000000_0.snappy{code}
> Not sure if that is by design or not, but it is causing the tests to fail for
> all text files.
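The silent failure described in the quoted issue is the usual result of running a command without looking at its exit status. Below is a minimal sketch of a checked wrapper, assuming Python 3.7+ and a hypothetical helper name run_checked. It is not the helper used by the Impala test suite.

```python
import subprocess
import sys


def run_checked(argv):
    """Run a command and raise if it exits with a non-zero status."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    if proc.returncode != 0:
        raise RuntimeError(
            "command %r failed with exit status %d: %s"
            % (argv, proc.returncode, proc.stderr.strip()))
    return proc.stdout


if __name__ == "__main__":
    # Stand-in commands so the sketch runs without a Hadoop installation.
    print(run_checked([sys.executable, "-c", "print('copied')"]).strip())
    try:
        run_checked([sys.executable, "-c", "import sys; sys.exit(1)"])
    except RuntimeError as err:
        print("caught:", err)
```

With such a wrapper, a call like run_checked(["hdfs", "dfs", "-cp", src, dst]) would raise as soon as the copy fails, instead of letting the test continue against a missing file.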