Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3441, IMPALA-3659: check for malformed Avro data
......................................................................


Patch Set 12:

(5 comments)

I'm going to take anoteher pass over the code to see if I can figure where 
those spurious errors might be coming from.

http://gerrit.cloudera.org:8080/#/c/3072/12/be/src/exec/hdfs-avro-scanner.cc
File be/src/exec/hdfs-avro-scanner.cc:

Line 123:   if (num_entries < 1) return Status("File header metadata has no 
data");
Can we add an error code for this and the other similar check. Something 
generic like AVRO_INVALID_METADATA is fine.


http://gerrit.cloudera.org:8080/#/c/3072/11/be/src/exec/read-write-util.cc
File be/src/exec/read-write-util.cc:

Line 87: ReadWriteUtil::ZIntResult ReadWriteUtil::ReadZInt(uint8_t** buf, 
uint8_t* buf_end) {
> They have to go here due to the template specialization, see http://stackov
I'd expect it should be inlined. If we're concerned we could add an 
ALWAYS_INLINE to the template declaration/definition.


http://gerrit.cloudera.org:8080/#/c/3072/12/be/src/exec/read-write-util.cc
File be/src/exec/read-write-util.cc:

Line 46:     shift += 7;
I'm pretty sure this can overflow. E.g. if all bytes before 0x80 are 0x7f. In 
that case we won't return an error. I think this is ok (not worth the cost of 
adding the overflow check), but we should document in the header.


http://gerrit.cloudera.org:8080/#/c/3072/12/be/src/exec/read-write-util.h
File be/src/exec/read-write-util.h:

Line 110:   /// (i.e. the first invalid byte).
Document the behaviour in error cases. If the zig-zag encoded number has too 
many bytes, return an error. If the zig-zag encoded number has a valid number 
of bytes, but encodes a value that does not fit in the destination type, the 
result is unspecified.


http://gerrit.cloudera.org:8080/#/c/3072/12/be/src/exec/scanner-context.cc
File be/src/exec/scanner-context.cc:

Line 128:     DCHECK(!completed_io_buffers_.empty());
I'm not sure I understand this DCHECK. It seems like we could maybe hit it if 
the file was empty? It seems safe (to me) to leave it out, but maybe there's 
something I'm missing.


-- 
To view, visit http://gerrit.cloudera.org:8080/3072
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I801a11c496a128e02c564c2a9c44baa5a97be132
Gerrit-PatchSet: 12
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Skye Wanderman-Milne <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Skye Wanderman-Milne <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes

Reply via email to