Henry Robinson has uploaded a new change for review. http://gerrit.cloudera.org:8080/4020
Change subject: IMPALA-(3895,3859): Don't log file data on parse errors
......................................................................

IMPALA-(3895,3859): Don't log file data on parse errors

Logging file or table data is a bad idea, and doing it by default is
particularly bad. This patch changes HdfsScanNode::LogRowParseError() to
log a file and offset only.

Testing: See rewritten tests.

To support testing this change, we also fix IMPALA-3895, by introducing a
canonical string __DFS_FILENAME__ that all DFS filenames in the ERROR
output are replaced with before comparing with the expected results. This
fixes a number of issues with the old way of matching filenames which
purported to be a regex, but really wasn't. In particular, we can now
match the rest of an ERROR line after the filename, which was not
possible before.

In some cases, we don't want to substitute filenames because the ERROR
output is looking for a very specific output. In that case we can write:

  $NAMENODE/<filename>

and this patch will not perform _any_ filename substitutions on ERROR
sections that contain the $NAMENODE string.
Change-Id: I5a604f8784a9ff7b4bf878f82ee7f56697df3272
---
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/hdfs-sequence-scanner.cc
M be/src/exec/hdfs-sequence-scanner.h
M be/src/exec/hdfs-text-scanner.cc
M be/src/exec/hdfs-text-scanner.h
M testdata/workloads/functional-query/queries/DataErrorsTest/avro-errors.test
M testdata/workloads/functional-query/queries/DataErrorsTest/hdfs-scan-node-errors.test
M testdata/workloads/functional-query/queries/DataErrorsTest/hdfs-sequence-scan-errors.test
M testdata/workloads/functional-query/queries/QueryTest/parquet-continue-on-error.test
M testdata/workloads/functional-query/queries/QueryTest/strict-mode.test
M tests/common/impala_test_suite.py
M tests/common/test_result_verifier.py
14 files changed, 87 insertions(+), 153 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/20/4020/1
--
To view, visit http://gerrit.cloudera.org:8080/4020
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I5a604f8784a9ff7b4bf878f82ee7f56697df3272
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Henry Robinson <[email protected]>
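
The filename substitution described in the commit message could look roughly like the sketch below. It is an illustration only, not the code in tests/common/test_result_verifier.py: the function name, the URI pattern, and the sample error line are all assumptions. Only the __DFS_FILENAME__ placeholder and the $NAMENODE opt-out come from the change description.

```python
import re

# Canonical placeholder and opt-out marker, both named in the commit message.
DFS_FILENAME_PLACEHOLDER = "__DFS_FILENAME__"
NAMENODE_MARKER = "$NAMENODE"

# Assumption: a DFS filename is a URI with one of these schemes and runs up to
# the next whitespace, comma, or semicolon. The pattern in the patch may differ.
DFS_URI_RE = re.compile(r"\b(?:hdfs|s3a|s3n)://[^\s,;]+")


def canonicalize_actual_errors(expected_lines, actual_lines):
    """Return the actual ERROR lines, ready to compare with the expected ones.

    Hypothetical helper: if the expected ERROR section mentions $NAMENODE,
    no substitution is performed at all; otherwise every DFS filename in the
    actual output is replaced with the canonical placeholder.
    """
    if any(NAMENODE_MARKER in line for line in expected_lines):
        return list(actual_lines)
    return [DFS_URI_RE.sub(DFS_FILENAME_PLACEHOLDER, line)
            for line in actual_lines]


if __name__ == "__main__":
    # Made-up error line for illustration; the real log format may differ.
    actual = ["Error parsing row: file: "
              "hdfs://localhost:20500/warehouse/t/data.txt, before offset: 460"]
    expected = ["Error parsing row: file: __DFS_FILENAME__, before offset: 460"]
    print(canonicalize_actual_errors(expected, actual) == expected)
```

Because only the filename is replaced, the text after it (here the offset) still has to match, which is the behaviour the commit message says the old pseudo-regex matching could not provide.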
