[ https://issues.apache.org/jira/browse/IMPALA-10005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17172649#comment-17172649 ]
ASF subversion and git services commented on IMPALA-10005:
----------------------------------------------------------
Commit dbbd40308a6d1cef77bfe45e016e775c918e0539 in impala's branch
refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=dbbd403 ]
IMPALA-10005: Fix Snappy decompression for non-block filesystems

Snappy-compressed text always uses THdfsCompression::SNAPPY_BLOCKED
type compression in the backend. However, for non-block filesystems,
the frontend is incorrectly passing THdfsCompression::SNAPPY instead.
On debug builds, this leads to a DCHECK when trying to read
Snappy-compressed text. On release builds, it fails to decompress
the data.

This fixes the frontend to always pass THdfsCompression::SNAPPY_BLOCKED
for Snappy-compressed text.

This reworks query_test/test_compressed_formats.py to provide better
coverage:
 - Changed the RC and Seq test cases to verify that the file extension
   doesn't matter. Added Avro to this case as well.
 - Fixed the text case to use appropriate extensions (fixing IMPALA-9004)
 - Changed the utility function so it doesn't use Hive. This allows it
   to be enabled on non-HDFS filesystems like S3.
 - Changed the test to use unique_database and allow parallel execution.
 - Changed the test to run in the core job, so it now has coverage on
   the usual S3 test configuration. It is reasonably quick (1-2 minutes)
   and runs in parallel.

Testing:
 - Exhaustive job
 - Core s3 job
 - Changed the frontend to force it to use the code for non-block
   filesystems (i.e. the TFileSplitGeneratorSpec code) and
   verified that it is now able to read Snappy-compressed text.

Change-Id: I0879f2fc0bf75bb5c15cecb845ece46a901601ac
Reviewed-on: http://gerrit.cloudera.org:8080/16278
Tested-by: Impala Public Jenkins <[email protected]>
Reviewed-by: Sahil Takiar <[email protected]>
> Impala can't read Snappy compressed text files on S3 or ABFS
> ------------------------------------------------------------
>
> Key: IMPALA-10005
> URL: https://issues.apache.org/jira/browse/IMPALA-10005
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Affects Versions: Impala 4.0
> Reporter: Joe McDonnell
> Assignee: Joe McDonnell
> Priority: Blocker
>
> When reading snappy compressed text from S3 or ABFS on a release build, it
> fails to decompress:
>
> {noformat}
> I0723 21:19:43.712909 229706 status.cc:128] Snappy: RawUncompress failed
> @ 0xae26c9 impala::Status::Status()
> @ 0x107635b impala::SnappyDecompressor::ProcessBlock()
> @ 0x11b1f2d impala::HdfsTextScanner::FillByteBufferCompressedFile()
> @ 0x11b23ef impala::HdfsTextScanner::FillByteBuffer()
> @ 0x11af96f impala::HdfsTextScanner::FillByteBufferWrapper()
> @ 0x11b096b impala::HdfsTextScanner::ProcessRange()
> @ 0x11b2b31 impala::HdfsTextScanner::GetNextInternal()
> @ 0x118644b impala::HdfsScanner::ProcessSplit()
> @ 0x11774c2 impala::HdfsScanNode::ProcessSplit()
> @ 0x1178805 impala::HdfsScanNode::ScannerThread()
> @ 0x1100f31 impala::Thread::SuperviseThread()
> @ 0x1101a79 boost::detail::thread_data<>::run()
> @ 0x16a3449 thread_proxy
> @ 0x7fc522befe24 start_thread
> @ 0x7fc522919bac __clone{noformat}
> When using a debug build, Impala hits the following DCHECK:
>
>
> {noformat}
> F0723 23:45:12.849973 249653 hdfs-text-scanner.cc:197] Check failed:
> stream_->file_desc()->file_compression != THdfsCompression::SNAPPY FE should
> have generated SNAPPY_BLOCKED instead.{noformat}
> That DCHECK explains why it would fail to decompress. It is using the wrong
> THdfsCompression.
> I reproduced this on master in my dev env by changing
> FileSystemUtil::supportsStorageIds() to always return false. This emulates the
> behavior on object stores like S3 and ABFS.
>
> {noformat}
> /**
>  * Returns true if the filesystem supports storage UUIDs in BlockLocation calls.
>  */
> public static boolean supportsStorageIds(FileSystem fs) {
>   return false;
> }{noformat}
> This is specific to Snappy and does not appear to apply to other compression
> codecs.
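
The rule the fix enforces (Snappy-compressed text is always reported to the backend as SNAPPY_BLOCKED) can be sketched in Java as follows. This is a minimal illustration, not Impala's actual frontend code. The Compression enum is a stand-in for THdfsCompression, and fromSuffix(), forTextFile(), and the ".snappy" and ".gz" suffixes are hypothetical names and assumptions introduced only for this example.

```java
import java.util.Locale;

class SnappyTextCompressionSketch {
  // Hypothetical stand-in for the Thrift enum THdfsCompression.
  // Only the values needed for this illustration are listed.
  enum Compression { NONE, GZIP, SNAPPY, SNAPPY_BLOCKED }

  // Hypothetical helper: infers a codec from the file suffix alone.
  // The suffixes used here are assumptions for the example.
  static Compression fromSuffix(String fileName) {
    String lower = fileName.toLowerCase(Locale.ROOT);
    if (lower.endsWith(".snappy")) return Compression.SNAPPY;
    if (lower.endsWith(".gz")) return Compression.GZIP;
    return Compression.NONE;
  }

  // Hypothetical helper: codec to report to the backend for a text file.
  // Snappy is promoted to SNAPPY_BLOCKED regardless of the filesystem;
  // every other codec is passed through unchanged.
  static Compression forTextFile(String fileName) {
    Compression codec = fromSuffix(fileName);
    return codec == Compression.SNAPPY ? Compression.SNAPPY_BLOCKED : codec;
  }

  public static void main(String[] args) {
    System.out.println(forTextFile("part-00000.snappy"));
    System.out.println(forTextFile("part-00000.gz"));
    System.out.println(forTextFile("part-00000.txt"));
  }
}
```

In this sketch the promotion happens in one place, so a block-based and a non-block code path that both call forTextFile() cannot disagree about the codec. Whether the real fix is structured this way is not stated in the commit message.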