Tim Armstrong has posted comments on this change. Change subject: IMPALA-3441, IMPALA-3659: check for malformed Avro data ......................................................................
Patch Set 12: (5 comments) I'm going to take anoteher pass over the code to see if I can figure where those spurious errors might be coming from. http://gerrit.cloudera.org:8080/#/c/3072/12/be/src/exec/hdfs-avro-scanner.cc File be/src/exec/hdfs-avro-scanner.cc: Line 123: if (num_entries < 1) return Status("File header metadata has no data"); Can we add an error code for this and the other similar check. Something generic like AVRO_INVALID_METADATA is fine. http://gerrit.cloudera.org:8080/#/c/3072/11/be/src/exec/read-write-util.cc File be/src/exec/read-write-util.cc: Line 87: ReadWriteUtil::ZIntResult ReadWriteUtil::ReadZInt(uint8_t** buf, uint8_t* buf_end) { > They have to go here due to the template specialization, see http://stackov I'd expect it should be inlined. If we're concerned we could add an ALWAYS_INLINE to the template declaration/definition. http://gerrit.cloudera.org:8080/#/c/3072/12/be/src/exec/read-write-util.cc File be/src/exec/read-write-util.cc: Line 46: shift += 7; I'm pretty sure this can overflow. E.g. if all bytes before 0x80 are 0x7f. In that case we won't return an error. I think this is ok (not worth the cost of adding the overflow check), but we should document in the header. http://gerrit.cloudera.org:8080/#/c/3072/12/be/src/exec/read-write-util.h File be/src/exec/read-write-util.h: Line 110: /// (i.e. the first invalid byte). Document the behaviour in error cases. If the zig-zag encoded number has too many bytes, return an error. If the zig-zag encoded number has a valid number of bytes, but encodes a value that does not fit in the destination type, the result is unspecified. http://gerrit.cloudera.org:8080/#/c/3072/12/be/src/exec/scanner-context.cc File be/src/exec/scanner-context.cc: Line 128: DCHECK(!completed_io_buffers_.empty()); I'm not sure I understand this DCHECK. It seems like we could maybe hit it if the file was empty? It seems safe (to me) to leave it out, but maybe there's something I'm missing. -- To view, visit http://gerrit.cloudera.org:8080/3072 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I801a11c496a128e02c564c2a9c44baa5a97be132 Gerrit-PatchSet: 12 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Skye Wanderman-Milne <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Skye Wanderman-Milne <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
