Skye Wanderman-Milne has posted comments on this change.

Change subject: IMPALA-2853: introduce PARQUET_RESOLVE_BY_NAME query option
......................................................................

Patch Set 3:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/2384/3/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 2025: if (col_type == NULL) DCHECK_EQ(next_idx, 0);
> with the new way the code is structured, this might be more intuitive writt
Done


http://gerrit.cloudera.org:8080/#/c/2384/3/be/src/exec/hdfs-parquet-scanner.h
File be/src/exec/hdfs-parquet-scanner.h:

Line 599: a value >= #
> how about just simplify:
Done


http://gerrit.cloudera.org:8080/#/c/2384/3/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

Line 169: 42: optional bool parquet_resolve_by_name = false
> while i see your point about resolve-by-id needing a fallback, I think this
Given that the only meaningful resolution orderings are:
* id, name
* id, ordinal
* name
* ordinal

And that field IDs don't actually exist yet, I think we should keep this option (or change it to resolve_by_ordinal if that's somehow better), and later add a parquet_resolve_by_field_id option as well. If we get the field ids in by C6, we can rename this option to parquet_resolve_legacy_files_by_name or something. At the very least, even if field IDs aren't implemented by C6, we can still rename this option if we come up with something better.


http://gerrit.cloudera.org:8080/#/c/2384/3/testdata/workloads/functional-query/queries/QueryTest/parquet-resolution-by-name.test
File testdata/workloads/functional-query/queries/QueryTest/parquet-resolution-by-name.test:

Line 55: '/test-warehouse/nested_resolution_by_name_test_parquet'
> needs $FILESYSTEM_PREFIX
Done

Line 170: ====
> any way to test the map key/value logic?
One way would be to generate custom files with switched and renamed fields. Or, with some light refactoring, I think it should be possible to unit test this case (and others).

I think the only non-trivial change would be changing the table descriptor to contain a single root record type that has all the column types as children, instead of special-casing the table-level columns. I'll send an email to the dev list about the column type change, since I think this is a good idea either way.

Let me know what you think about unit testing vs. generating files for end-to-end tests. I can do either, but I think unit testing will be better. If it turns out to be a bigger change than anticipated, I'll just generate the files.


http://gerrit.cloudera.org:8080/#/c/2384/3/tests/common/impala_test_suite.py
File tests/common/impala_test_suite.py:

Line 224: EXECUTE
> maybe call it 'SHELL' since execute has many meanings?
Good idea, done


http://gerrit.cloudera.org:8080/#/c/2384/3/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

Line 240:
> skip if s3 insert
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/2384
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id0c715ea23792b2a6872610839a40532aabbb5a6
Gerrit-PatchSet: 3
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Skye Wanderman-Milne
Gerrit-Reviewer: Dan Hecht
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Silvius Rus
Gerrit-Reviewer: Skye Wanderman-Milne
Gerrit-HasComments: Yes
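
For illustration, here is a minimal C++ sketch of the four resolution orderings listed in the reply on ImpalaInternalService.thrift. It is combined with the suggested refactoring in which the table's columns are children of a single root record, so top-level and nested fields resolve through the same code path. `Node`, `Policy`, `ResolveChild` and `ResolvePath` are hypothetical names, not Impala's SchemaNode/ColumnType API. The rule that the id-based orderings fall back whenever either side lacks field IDs is also an assumption.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified schema node for this sketch only (not Impala's
// SchemaNode or ColumnType). Table columns are children of one root record.
struct Node {
  std::string name;
  std::optional<int> field_id;  // absent in files written without field IDs
  std::vector<Node> children;
};

// The four orderings from the review discussion.
enum class Policy { kIdThenName, kIdThenOrdinal, kName, kOrdinal };

namespace {

int FindByName(const Node& file_parent, const std::string& name) {
  for (std::size_t i = 0; i < file_parent.children.size(); ++i) {
    if (file_parent.children[i].name == name) return static_cast<int>(i);
  }
  return -1;
}

int FindById(const Node& file_parent, int id) {
  for (std::size_t i = 0; i < file_parent.children.size(); ++i) {
    // optional<int> == int is false when the file field has no ID.
    if (file_parent.children[i].field_id == id) return static_cast<int>(i);
  }
  return -1;
}

bool HasFieldIds(const Node& file_parent) {
  for (const Node& child : file_parent.children) {
    if (child.field_id.has_value()) return true;
  }
  return false;
}

}  // namespace

// Maps the table child at table_idx to an index among file_parent's
// children. Returns -1 if the field is missing from the file.
int ResolveChild(const Node& table_parent, std::size_t table_idx,
                 const Node& file_parent, Policy policy) {
  const Node& want = table_parent.children.at(table_idx);
  const int ordinal = table_idx < file_parent.children.size()
                          ? static_cast<int>(table_idx)
                          : -1;
  // Assumption: IDs are used only when both sides carry them; otherwise the
  // id-based policies fall back (e.g. for legacy files).
  const bool use_ids = want.field_id.has_value() && HasFieldIds(file_parent);
  switch (policy) {
    case Policy::kIdThenName:
      return use_ids ? FindById(file_parent, *want.field_id)
                     : FindByName(file_parent, want.name);
    case Policy::kIdThenOrdinal:
      return use_ids ? FindById(file_parent, *want.field_id) : ordinal;
    case Policy::kName:
      return FindByName(file_parent, want.name);
    case Policy::kOrdinal:
      return ordinal;
  }
  return -1;
}

// Walks a path of table child indices from the root record and returns the
// matching file path, or an empty vector if any step is missing.
std::vector<int> ResolvePath(const Node& table_root, const Node& file_root,
                             const std::vector<std::size_t>& table_path,
                             Policy policy) {
  std::vector<int> file_path;
  const Node* table_node = &table_root;
  const Node* file_node = &file_root;
  for (std::size_t table_idx : table_path) {
    int file_idx = ResolveChild(*table_node, table_idx, *file_node, policy);
    if (file_idx < 0) return {};
    file_path.push_back(file_idx);
    table_node = &table_node->children.at(table_idx);
    file_node = &file_node->children.at(static_cast<std::size_t>(file_idx));
  }
  return file_path;
}
```

Against a file whose fields are switched, name resolution follows the names while ordinal resolution keeps the positions. Against a file with a renamed field, name resolution reports the field as missing (-1, or an empty path), while ordinal resolution still maps it. Those are the two cases that the custom test files, or a unit test, would need to cover.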
