[
https://issues.apache.org/jira/browse/ORC-590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17018539#comment-17018539
]
Quanlong Huang commented on ORC-590:
------------------------------------
Commit
[bf5b780|https://github.com/apache/orc/commit/bf5b7800930bfa030db83aba925d9d3b75852839]
of ORC-469 unintentionally removed some safety checks in
StringDictionaryColumnReader, which is what causes this crash.
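
For illustration, here is a minimal sketch of the kind of check that could guard the decoder construction. It assumes the crash comes from dereferencing a missing (null) input stream, which the backtrace suggests but does not prove. All names below (InputStream, EmptyStream, ByteDecoder, makeDecoder) are hypothetical stand-ins, not the actual ORC classes or the checks that the commit removed.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for the library's parse error type.
struct ParseError : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Hypothetical stream interface with the same shape as the Next() call
// in the listing: hands out a buffer, or returns false at end of input.
struct InputStream {
  virtual ~InputStream() = default;
  virtual bool Next(const void** data, int* size) = 0;
};

// A stream that is present but has no data left.
struct EmptyStream : InputStream {
  bool Next(const void**, int*) override { return false; }
};

// Simplified decoder that mirrors the readByte() logic from the listing.
class ByteDecoder {
 public:
  explicit ByteDecoder(std::unique_ptr<InputStream> input)
      : inputStream(std::move(input)) {}

  unsigned char readByte() {
    if (bufferStart == bufferEnd) {
      int bufferLength = 0;
      const void* bufferPointer = nullptr;
      // If inputStream were null, this call would be undefined behaviour
      // (typically a segfault) rather than a ParseError.
      if (!inputStream->Next(&bufferPointer, &bufferLength)) {
        throw ParseError("bad read in ByteDecoder::readByte");
      }
      bufferStart = static_cast<const char*>(bufferPointer);
      bufferEnd = bufferStart + bufferLength;
    }
    return static_cast<unsigned char>(*bufferStart++);
  }

 private:
  std::unique_ptr<InputStream> inputStream;
  const char* bufferStart = nullptr;
  const char* bufferEnd = nullptr;
};

// Assumed safety check: reject a missing stream before any decoder can
// dereference it, so a corrupt file yields an exception, not a crash.
std::unique_ptr<ByteDecoder> makeDecoder(std::unique_ptr<InputStream> stream,
                                         const std::string& streamName) {
  if (stream == nullptr) {
    throw ParseError(streamName + " stream not found");
  }
  return std::unique_ptr<ByteDecoder>(new ByteDecoder(std::move(stream)));
}
```

With a guard like this, a missing stream is reported as a ParseError when the reader is built, and a present but exhausted stream still fails through the existing "bad read" path.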
> Crash in orc::RleDecoderV2::readByte
> ------------------------------------
>
> Key: ORC-590
> URL: https://issues.apache.org/jira/browse/ORC-590
> Project: ORC
> Issue Type: Bug
> Components: C++
> Reporter: Quanlong Huang
> Assignee: Zoltán Borók-Nagy
> Priority: Major
> Attachments: RleDecoderV2_next_crash.orc
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Hit a crash when reading a corrupt file.
> {code}
> (gdb) bt
> #0 0x00000000006108ad in orc::RleDecoderV2::readByte (this=0xd5a0d0) at /home/quanlong/workspace/orc/c++/src/RLEv2.hh:167
> #1 orc::RleDecoderV2::next (this=0xd5a0d0, data=0xd5a1d8, numValues=30, notNull=0x0) at /home/quanlong/workspace/orc/c++/src/RleDecoderV2.cc:119
> #2 0x00000000005f6b8c in orc::StringDictionaryColumnReader::StringDictionaryColumnReader (this=this@entry=0xb497a0, type=..., stripe=...) at /home/quanlong/workspace/orc/c++/src/ColumnReader.cc:581
> #3 0x00000000005f70bb in orc::buildReader (type=..., stripe=...) at /home/quanlong/workspace/orc/c++/src/ColumnReader.cc:1756
> #4 0x00000000005f722b in orc::StructColumnReader::StructColumnReader (this=this@entry=0xb07e40, type=..., stripe=...) at /home/quanlong/workspace/orc/c++/src/ColumnReader.cc:876
> #5 0x00000000005f701b in orc::buildReader (type=..., stripe=...) at /home/quanlong/workspace/orc/c++/src/ColumnReader.cc:1787
> #6 0x000000000059fd18 in orc::RowReaderImpl::startNextStripe (this=0xae2750) at /home/quanlong/workspace/orc/c++/src/Reader.cc:917
> #7 0x00000000005a016a in orc::RowReaderImpl::next (this=0xae2750, data=...) at /home/quanlong/workspace/orc/c++/src/Reader.cc:932
> #8 0x0000000000597a78 in scanFile (out=..., filename=<optimized out>, batchSize=batchSize@entry=1024) at /home/quanlong/workspace/orc/tools/src/FileScan.cc:39
> #9 0x00000000005972f8 in main (argc=1, argv=<optimized out>) at /home/quanlong/workspace/orc/tools/src/FileScan.cc:84
> (gdb) l
> 162
> 163   unsigned char readByte() {
> 164     if (bufferStart == bufferEnd) {
> 165       int bufferLength;
> 166       const void* bufferPointer;
> 167       if (!inputStream->Next(&bufferPointer, &bufferLength)) {
> 168         throw ParseError("bad read in RleDecoderV2::readByte");
> 169       }
> 170       bufferStart = static_cast<const char*>(bufferPointer);
> 171       bufferEnd = bufferStart + bufferLength;
> {code}