Skye Wanderman-Milne has uploaded a new change for review. http://gerrit.cloudera.org:8080/3072
Change subject: PREVIEW IMPALA-3441: check for malformed Avro data (prototype)
......................................................................

PREVIEW IMPALA-3441: check for malformed Avro data (prototype)

This patch adds the plumbing to do more error checking in the Avro
scanner (both the codegen'd and interpreted paths), and does the
out-of-bounds checks for encoded ints and at the beginning of each
tuple.

I ran a local benchmark using the following query:

  set num_scanner_threads=1;
  select max(i) from default.avro_ints_big;

where avro_ints_big is an Avro table with a single int column
containing ~90MM values. With this patch, the total query time goes
from 1.6s to 1.8s (12% increase), with the MaterializeTupleTime going
from 975ms to 1194ms (22% increase).

The one check missing from this prototype that will affect the above
benchmark is checking for a valid union value when determining whether
a value is null. I'm working on adding this, and then the prototype
will fully cover the checks that this benchmark query exercises.

If we're happy with this overall approach, I can add error checking
for the other types as well.

Change-Id: I801a11c496a128e02c564c2a9c44baa5a97be132
---
M be/src/exec/hdfs-avro-scanner-ir.cc
M be/src/exec/hdfs-avro-scanner.cc
M be/src/exec/hdfs-avro-scanner.h
M be/src/exec/read-write-util.cc
M be/src/exec/read-write-util.h
5 files changed, 133 insertions(+), 76 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/72/3072/1

--
To view, visit http://gerrit.cloudera.org:8080/3072
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I801a11c496a128e02c564c2a9c44baa5a97be132
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Skye Wanderman-Milne <[email protected]>
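
Illustrative note on the out-of-bounds check for encoded ints mentioned in the commit message: the sketch below shows one way a bounds-checked decode of an Avro zigzag varint could look. It is not the code in this patch. The names ZIntResult and ReadZIntChecked are hypothetical and do not correspond to the functions in read-write-util.h. It assumes the scanner has a pointer to the current position and a pointer to the end of the data buffer.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical result type for illustration; not Impala's actual API.
struct ZIntResult {
  bool ok;        // false if the encoding is truncated or too long
  int32_t value;  // decoded value, meaningful only when ok is true
};

// Decodes one zigzag-varint-encoded 32-bit int from [*buf, buf_end).
// On success, advances *buf past the consumed bytes. On failure, *buf is
// left unchanged so the caller can report the offending offset.
inline ZIntResult ReadZIntChecked(const uint8_t** buf, const uint8_t* buf_end) {
  constexpr int kMaxBytes = 5;  // a 32-bit varint occupies at most 5 bytes
  const uint8_t* p = *buf;
  uint32_t zigzag = 0;
  for (int i = 0; i < kMaxBytes; ++i) {
    if (p >= buf_end) return {false, 0};  // would read past the buffer
    const uint8_t byte = *p++;
    zigzag |= static_cast<uint32_t>(byte & 0x7F) << (7 * i);
    if ((byte & 0x80) == 0) {
      *buf = p;
      // Undo zigzag encoding: (n >> 1) ^ -(n & 1).
      const int32_t value = static_cast<int32_t>(zigzag >> 1) ^
                            -static_cast<int32_t>(zigzag & 1);
      return {true, value};
    }
  }
  return {false, 0};  // continuation bit still set after 5 bytes
}
```

The extra comparison against buf_end on every byte is the kind of per-value work that would plausibly account for the MaterializeTupleTime increase reported above, though the actual patch may structure the check differently.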
