Joe McDonnell created IMPALA-10005:
--------------------------------------

             Summary: Impala can't read Snappy compressed text files on S3 or ABFS
                 Key: IMPALA-10005
                 URL: https://issues.apache.org/jira/browse/IMPALA-10005
             Project: IMPALA
          Issue Type: Bug
          Components: Frontend
    Affects Versions: Impala 4.0
            Reporter: Joe McDonnell


When reading Snappy-compressed text files from S3 or ABFS on a release build, 
Impala fails to decompress them:

{noformat}
I0723 21:19:43.712909 229706 status.cc:128] Snappy: RawUncompress failed
    @           0xae26c9  impala::Status::Status()
    @          0x107635b  impala::SnappyDecompressor::ProcessBlock()
    @          0x11b1f2d  impala::HdfsTextScanner::FillByteBufferCompressedFile()
    @          0x11b23ef  impala::HdfsTextScanner::FillByteBuffer()
    @          0x11af96f  impala::HdfsTextScanner::FillByteBufferWrapper()
    @          0x11b096b  impala::HdfsTextScanner::ProcessRange()
    @          0x11b2b31  impala::HdfsTextScanner::GetNextInternal()
    @          0x118644b  impala::HdfsScanner::ProcessSplit()
    @          0x11774c2  impala::HdfsScanNode::ProcessSplit()
    @          0x1178805  impala::HdfsScanNode::ScannerThread()
    @          0x1100f31  impala::Thread::SuperviseThread()
    @          0x1101a79  boost::detail::thread_data<>::run()
    @          0x16a3449  thread_proxy
    @     0x7fc522befe24  start_thread
    @     0x7fc522919bac  __clone{noformat}
When using a debug build, Impala hits the following DCHECK:

{noformat}
F0723 23:45:12.849973 249653 hdfs-text-scanner.cc:197] Check failed: stream_->file_desc()->file_compression != THdfsCompression::SNAPPY FE should have generated SNAPPY_BLOCKED instead.{noformat}
That DCHECK explains the decompression failure: the text scanner is handed 
THdfsCompression::SNAPPY, whereas the frontend should have generated 
SNAPPY_BLOCKED.
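
The invariant that the DCHECK enforces can be sketched as follows. This is only an illustration, not the actual frontend code: the class, the enums, and the methods fromFileName() and forScanner() are hypothetical names, and inferring the codec from a ".snappy" file suffix is an assumption. The sketch shows the remapping that the DCHECK message says the FE should perform for text files; the bug suggests that this step is skipped on the code path taken when the filesystem does not support storage IDs.

```java
import java.util.Locale;

public class SnappyTextCompressionSketch {
  // Hypothetical stand-in for the compression values named in the DCHECK.
  enum Compression { NONE, SNAPPY, SNAPPY_BLOCKED }

  // Hypothetical stand-in for the format of the table being scanned.
  enum FileFormat { TEXT, OTHER }

  // Assumption: the codec is inferred from the file name suffix.
  static Compression fromFileName(String fileName) {
    String lower = fileName.toLowerCase(Locale.ROOT);
    return lower.endsWith(".snappy") ? Compression.SNAPPY : Compression.NONE;
  }

  // The remapping the DCHECK expects: a text scanner must never be handed
  // plain SNAPPY, only SNAPPY_BLOCKED. Other combinations pass through.
  static Compression forScanner(FileFormat format, Compression inferred) {
    if (format == FileFormat.TEXT && inferred == Compression.SNAPPY) {
      return Compression.SNAPPY_BLOCKED;
    }
    return inferred;
  }

  public static void main(String[] args) {
    Compression inferred = fromFileName("part-00000.snappy");
    System.out.println("inferred: " + inferred);
    System.out.println("for text scanner: " + forScanner(FileFormat.TEXT, inferred));
  }
}
```

If every code path that builds file descriptors went through a step like forScanner(), the text scanner would see the same compression type on HDFS and on object stores.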

I reproduced this on master in my dev env by changing 
FileSystemUtil::supportsStorageIds() to always return false. This emulates the 
behavior on object stores like S3 and ABFS.

{noformat}
  /**
   * Returns true if the filesystem supports storage UUIDs in BlockLocation calls.
   */
  public static boolean supportsStorageIds(FileSystem fs) {
    return false;
  }{noformat}
This is specific to Snappy and does not appear to apply to other compression 
codecs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
