Joe McDonnell created IMPALA-10005:
--------------------------------------
Summary: Impala can't read Snappy compressed text files on S3 or ABFS
Key: IMPALA-10005
URL: https://issues.apache.org/jira/browse/IMPALA-10005
Project: IMPALA
Issue Type: Bug
Components: Frontend
Affects Versions: Impala 4.0
Reporter: Joe McDonnell
When reading Snappy-compressed text files from S3 or ABFS on a release build,
Impala fails to decompress them:
{noformat}
I0723 21:19:43.712909 229706 status.cc:128] Snappy: RawUncompress failed
@ 0xae26c9 impala::Status::Status()
@ 0x107635b impala::SnappyDecompressor::ProcessBlock()
@ 0x11b1f2d impala::HdfsTextScanner::FillByteBufferCompressedFile()
@ 0x11b23ef impala::HdfsTextScanner::FillByteBuffer()
@ 0x11af96f impala::HdfsTextScanner::FillByteBufferWrapper()
@ 0x11b096b impala::HdfsTextScanner::ProcessRange()
@ 0x11b2b31 impala::HdfsTextScanner::GetNextInternal()
@ 0x118644b impala::HdfsScanner::ProcessSplit()
@ 0x11774c2 impala::HdfsScanNode::ProcessSplit()
@ 0x1178805 impala::HdfsScanNode::ScannerThread()
@ 0x1100f31 impala::Thread::SuperviseThread()
@ 0x1101a79 boost::detail::thread_data<>::run()
@ 0x16a3449 thread_proxy
@ 0x7fc522befe24 start_thread
@ 0x7fc522919bac __clone{noformat}
When using a debug build, Impala hits the following DCHECK:
{noformat}
F0723 23:45:12.849973 249653 hdfs-text-scanner.cc:197] Check failed:
stream_->file_desc()->file_compression != THdfsCompression::SNAPPY FE should have
generated SNAPPY_BLOCKED instead.{noformat}
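That DCHECK explains the decompression failure: the file descriptor carries
THdfsCompression::SNAPPY where the text scanner expects SNAPPY_BLOCKED, so the
scanner decodes the file with the wrong THdfsCompression.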
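To make the mismatch concrete, here is a minimal sketch of the invariant the DCHECK enforces. It is not Impala's frontend code: the class CompressionSketch, the enum Compression, and the methods fromSuffix() and forTextFile() are hypothetical names invented for illustration, and inferring the codec from the file suffix is an assumption. The only point taken from the DCHECK message is that a text file should reach the scanner as SNAPPY_BLOCKED, never as SNAPPY.

```java
import java.util.Locale;

/** Hypothetical sketch; these names are illustrative, not Impala's actual frontend API. */
public class CompressionSketch {
    /** Stand-in for the compression values named in the DCHECK message. */
    public enum Compression { NONE, SNAPPY, SNAPPY_BLOCKED }

    /** Infers a codec from the file suffix alone (assumed behavior, for illustration). */
    public static Compression fromSuffix(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        return lower.endsWith(".snappy") ? Compression.SNAPPY : Compression.NONE;
    }

    /** Codec handed to the text scanner: per the DCHECK, never plain SNAPPY. */
    public static Compression forTextFile(String fileName) {
        Compression c = fromSuffix(fileName);
        return c == Compression.SNAPPY ? Compression.SNAPPY_BLOCKED : c;
    }

    public static void main(String[] args) {
        System.out.println(forTextFile("part-00000.snappy"));
        System.out.println(forTextFile("part-00000.txt"));
    }
}
```

In this sketch, a code path that calls fromSuffix() directly and skips forTextFile() would hand SNAPPY to the scanner, which is the kind of mismatch the DCHECK reports. Whether the S3/ABFS code path actually skips such a translation step is a guess, not something this report establishes.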
I reproduced this on master in my dev env by changing
FileSystemUtil::supportsStorageIds() to always return false. This emulates the
behavior on object stores like S3 and ABFS.
{noformat}
/**
 * Returns true if the filesystem supports storage UUIDs in BlockLocation calls.
 */
public static boolean supportsStorageIds(FileSystem fs) {
  return false;
}{noformat}
This is specific to Snappy and does not appear to apply to other compression
codecs.