[ https://issues.apache.org/jira/browse/ORC-414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16801944#comment-16801944 ]
Todd Lipcon commented on ORC-414:
---------------------------------

Is the issue here in the protobuf parser, or in our own lack of bounds checks when accessing a parsed protobuf object? From the stack trace it looks like it's probably the latter. As far as I know, protobuf is "safe" -- malformed protobuf serialization doesn't crash a process.

> [C++] ORC files with malformed protobuf objects can crash a release build
> -------------------------------------------------------------------------
>
>                 Key: ORC-414
>                 URL: https://issues.apache.org/jira/browse/ORC-414
>             Project: ORC
>          Issue Type: Bug
>          Components: C++
>    Affects Versions: 1.5.3
>            Reporter: Quanlong Huang
>            Priority: Major
>         Attachments: malformed_protobuf.orc
>
>
> ORC files can be produced by any external tool. Some corrupt files may
> contain malformed protobuf objects which can crash the process. The
> attachment is an example.
> In a debug build, protobuf will throw exceptions for this file:
> {code}
> $ build/tools/src/orc-scan maleformed_protobuf.orc
> [libprotobuf FATAL
> /mnt/volume1/impala-orc/orc/build/c++/libs/thirdparty/protobuf_ep-install/include/google/protobuf/repeated_field.h:1522]
> CHECK failed: (index) < (current_size_):
> Caught exception in maleformed_protobuf.orc: CHECK failed: (index) <
> (current_size_):
> {code}
> It hits a DCHECK which is eliminated in a release build.
> {code:c++}
> 1518 template <typename TypeHandler>
> 1519 inline const typename TypeHandler::Type&
> 1520 RepeatedPtrFieldBase::Get(int index) const {
> 1521   GOOGLE_DCHECK_GE(index, 0);
> 1522   GOOGLE_DCHECK_LT(index, current_size_);
> 1523   return *cast<TypeHandler>(rep_->elements[index]);
> 1524 }
> {code}
> In a release build, the process crashes immediately, which means any system
> integrated with the orc-lib will crash when processing such files.
> {code}
> $ build/tools/src/orc-scan maleformed_protobuf.orc
> Segmentation fault (core dumped)
> {code}
> The stacktrace for this crash:
> {code}
> #0 0x0000000000588c1e in
> orc::ReaderImpl::ReaderImpl(std::shared_ptr<orc::FileContents>,
> orc::ReaderOptions const&, unsigned long, unsigned long) ()
> #1 0x000000000058b1ee in orc::createReader(std::unique_ptr<orc::InputStream,
> std::default_delete<orc::InputStream> >, orc::ReaderOptions const&) ()
> #2 0x00000000005847c0 in scanFile (out=..., filename=0x7ffcf03a173d
> "maleformed_protobuf.orc", batchSize=batchSize@entry=1024) at
> /mnt/volume1/impala-orc/orc/tools/src/FileScan.cc:32
> #3 0x0000000000584150 in main (argc=<optimized out>, argv=<optimized out>)
> at /mnt/volume1/impala-orc/orc/tools/src/FileScan.cc:84
> {code}
> We may need to introduce checksums to avoid this.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
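
For illustration only: a minimal sketch of the kind of caller-side bounds check the comment above asks about. It does not use the real ORC or protobuf types. `FakeType`, `FakeFooter`, `checkedAt`, and `validateFooter` are hypothetical names, and the assumption that the bad index is a subtype id stored in the footer is a guess, since the stack trace only points at the `orc::ReaderImpl` constructor.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins for a parsed footer; NOT the real orc::proto types.
struct FakeType {
  std::vector<int> subtypes;  // untrusted indices into FakeFooter::types
};

struct FakeFooter {
  std::vector<FakeType> types;
};

// Hypothetical helper: validate an untrusted index before dereferencing it,
// so a malformed file yields an exception instead of undefined behaviour.
template <typename Container>
const typename Container::value_type& checkedAt(const Container& items,
                                                long long index,
                                                const std::string& what) {
  if (index < 0 ||
      static_cast<unsigned long long>(index) >= items.size()) {
    throw std::out_of_range(what + " index " + std::to_string(index) +
                            " out of range (size " +
                            std::to_string(items.size()) + ")");
  }
  return items[static_cast<std::size_t>(index)];
}

// Check every subtype id once, up front, before any reader logic trusts it.
void validateFooter(const FakeFooter& footer) {
  for (const FakeType& type : footer.types) {
    for (int child : type.subtypes) {
      checkedAt(footer.types, child, "subtype");
    }
  }
}
```

With a check like this, a footer whose only type lists subtype 7 would throw `std::out_of_range` from `validateFooter` in both debug and release builds, rather than relying on a DCHECK that is compiled out.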