Skye Wanderman-Milne has posted comments on this change. Change subject: IMPALA-3441: check for malformed Avro data ......................................................................
Patch Set 5: (4 comments) I used perf top with the --perf_map option and ReadZLong() is where most of the time is spent: 18.47% impalad [.] _ZN6impala13ReadWriteUtil9ReadZLongEPPhPlS3_ 9.11% impalad [.] _ZN6snappy13RawUncompressEPNS_6SourceEPc Unfortunately I need to add even more error checking to this function (IMPALA-3659), so maybe we should get this in and work on ReadZLong() perf in that patch. http://gerrit.cloudera.org:8080/#/c/3072/6/be/src/exec/hdfs-avro-scanner-ir.cc File be/src/exec/hdfs-avro-scanner-ir.cc: Line 176: ReportInvalidValue(TErrorCode::AVRO_INVALID_LENGTH, len); > UNLIKELY Done Line 180: ReportCorruptData(TErrorCode::AVRO_TRUNCATED_BLOCK); > UNLIKELY Done http://gerrit.cloudera.org:8080/#/c/3072/5/be/src/exec/hdfs-avro-scanner-test.cc File be/src/exec/hdfs-avro-scanner-test.cc: Line 190: // TODO: we don't handle invalid values (e.g. overflow) > There's an ASSERT_DEBUG_DEATH macro that does some magic to assert that it Dan pointed out that we need to fix this one way or another or we still may crash on corrupt files, so I'm gonna skip this for now since I need to fix it ASAP anyway. I'll add a test case for it when I fix it. http://gerrit.cloudera.org:8080/#/c/3072/6/testdata/workloads/functional-query/queries/DataErrorsTest/avro-errors.test File testdata/workloads/functional-query/queries/DataErrorsTest/avro-errors.test: Line 7 > Are you intending to fix this in this patch? Oops, forgot to address this. I'm having a lot of trouble matching these error messages with the current test verifier logic, so I commented them out for now and meant to file a JIRA to fix this. -- To view, visit http://gerrit.cloudera.org:8080/3072 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I801a11c496a128e02c564c2a9c44baa5a97be132 Gerrit-PatchSet: 5 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Skye Wanderman-Milne <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Skye Wanderman-Milne <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
