[
https://issues.apache.org/jira/browse/ORC-414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16955167#comment-16955167
]
Quanlong Huang commented on ORC-414:
------------------------------------
[~xndai], yeah, it's just an out-of-bounds index error. We should not assume that
the Footer has at least one type. For malformed files, there could be zero.
The crash happens inside ReaderImpl::ReaderImpl():
{code:c++}
contents->schema = REDUNDANT_MOVE(convertType(footer->types(0), *footer));
{code}
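A minimal sketch of the kind of guard that would avoid this, checking the type count before indexing. FakeType, FakeFooter and rootTypeOrThrow are hypothetical stand-ins so the snippet compiles on its own; they are not the real orc::proto classes and not necessarily what the PR does:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for orc::proto::Type.
struct FakeType {
  std::string name;
};

// Hypothetical stand-in for orc::proto::Footer.
struct FakeFooter {
  std::vector<FakeType> typeList;

  int types_size() const { return static_cast<int>(typeList.size()); }

  // Mimics an accessor without a bounds check: calling it with an
  // index >= types_size() is undefined behaviour.
  const FakeType& types(int index) const {
    return typeList[static_cast<std::size_t>(index)];
  }
};

// Returns the root type, or throws instead of indexing an empty list.
const FakeType& rootTypeOrThrow(const FakeFooter& footer) {
  if (footer.types_size() == 0) {
    throw std::runtime_error("Footer is corrupt: no types found");
  }
  return footer.types(0);
}
```

With a check like this, an empty type list surfaces as an exception the caller can catch rather than a segmentation fault.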
Here is the stack trace (git hash is 7821893db222b9adf41eb6db25bf14d6095d7503):
{code:c++}
google::protobuf::internal::RepeatedPtrFieldBase::Get<google::protobuf::RepeatedPtrField<orc::proto::Type>::TypeHandler> repeated_field.h:1522
google::protobuf::RepeatedPtrField<orc::proto::Type>::Get repeated_field.h:1989
orc::proto::Footer::types orc_proto.pb.h:8965
orc::ReaderImpl::ReaderImpl Reader.cc:412
orc::createReader Reader.cc:1131
scanFile FileScan.cc:32
main FileScan.cc:84
{code}
Created a simple PR for this: https://github.com/apache/orc/pull/438
> [C++] ORC files with malformed protobuf objects can crash a release build
> -------------------------------------------------------------------------
>
> Key: ORC-414
> URL: https://issues.apache.org/jira/browse/ORC-414
> Project: ORC
> Issue Type: Bug
> Components: C++
> Affects Versions: 1.5.3
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Major
> Attachments: malformed_protobuf.orc
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> ORC files can be produced by any external tool. Some corrupt files may
> contain malformed protobuf objects which can crash the process. The
> attachment is an example.
> In a debug build, protobuf will throw exceptions for this file:
> {code}
> $ build/tools/src/orc-scan maleformed_protobuf.orc
> [libprotobuf FATAL /mnt/volume1/impala-orc/orc/build/c++/libs/thirdparty/protobuf_ep-install/include/google/protobuf/repeated_field.h:1522] CHECK failed: (index) < (current_size_):
> Caught exception in maleformed_protobuf.orc: CHECK failed: (index) < (current_size_):
> {code}
> It hits a DCHECK which is eliminated in a release build.
> {code:c++}
> 1518 template <typename TypeHandler>
> 1519 inline const typename TypeHandler::Type&
> 1520 RepeatedPtrFieldBase::Get(int index) const {
> 1521 GOOGLE_DCHECK_GE(index, 0);
> 1522 GOOGLE_DCHECK_LT(index, current_size_);
> 1523 return *cast<TypeHandler>(rep_->elements[index]);
> 1524 }
> {code}
> In a release build, the process crashes immediately, which means any system
> that integrates the ORC library will crash when processing such files.
> {code}
> $ build/tools/src/orc-scan maleformed_protobuf.orc
> Segmentation fault (core dumped)
> {code}
> The stacktrace for this crash:
> {code}
> #0 0x0000000000588c1e in orc::ReaderImpl::ReaderImpl(std::shared_ptr<orc::FileContents>, orc::ReaderOptions const&, unsigned long, unsigned long) ()
> #1 0x000000000058b1ee in orc::createReader(std::unique_ptr<orc::InputStream, std::default_delete<orc::InputStream> >, orc::ReaderOptions const&) ()
> #2 0x00000000005847c0 in scanFile (out=..., filename=0x7ffcf03a173d "maleformed_protobuf.orc", batchSize=batchSize@entry=1024) at /mnt/volume1/impala-orc/orc/tools/src/FileScan.cc:32
> #3 0x0000000000584150 in main (argc=<optimized out>, argv=<optimized out>) at /mnt/volume1/impala-orc/orc/tools/src/FileScan.cc:84
> {code}
> We may need to introduce checksums to avoid this.