Skye Wanderman-Milne has posted comments on this change. Change subject: PREVIEW do parquet schema resolution by name instead of ordinal ......................................................................
Patch Set 1: (8 comments) http://gerrit.cloudera.org:8080/#/c/2384/1/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: Line 193: resolve_by_name_(state->query_options().resolve_parquet_by_name), > let's just use the query option directly rather than having more state to k Done Line 1994: if (resolve_by_name_ && (next_idx == 0 || col_type->type == TYPE_STRUCT)) { > why is it that arrays and maps don't also take this path? a comment explai Added a comment. I'm not sure how to refactor them, I think they're expressing sufficiently different conditions that it's easiest/clearest to just spell them out. Maybe I'm missing something though. One thing that would make this and L1945 slightly simpler is changing the BE to have a single type tree rooted with a TYPE_STRUCT representing the whole table, rather than having to special-case the top-level table columns. I was thinking of proposing that after this patch goes in. Line 2005: scan_node_->hdfs_table()->col_descs()[table_idx].name() : : col_type->field_names[table_idx]; > nit: wrong indentation. Done Line 2010: "File $0 does not contain path '$1' > May be we can factor this string out or if you take the recommendation belo Done Line 2019: if (node->children.size() <= file_idx) { : VLOG_FILE << Substitute( : "File '$0' does not contain path '$1' (resolving by ordinal)", filename(), : PrintPath(path)); : *missing_field = true; : return NULL; : } > It seems that this code block can be factored out to the common path. It ap Good idea, done. http://gerrit.cloudera.org:8080/#/c/2384/1/be/src/exec/hdfs-parquet-scanner.h File be/src/exec/hdfs-parquet-scanner.h: Line 425: resolve_by_name > query option Removed this member Line 598: its > nit: it's Done http://gerrit.cloudera.org:8080/#/c/2384/1/be/src/service/query-options.h File be/src/service/query-options.h: Line 77: QUERY_OPT_FN(resolve_parquet_by_name, RESOLVE_PARQUET_BY_NAME); > how about making this an enum (i.e. position, name, and eventually field-id I think field-id will be a separate option (or unconditional default) on top of this. Once we have field ids, I think we'll want to always use them if they exist in the file + table metadata. However, for legacy files that don't contain ids, you'll still need to resolve by name or ordinal. -- To view, visit http://gerrit.cloudera.org:8080/2384 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id0c715ea23792b2a6872610839a40532aabbb5a6 Gerrit-PatchSet: 1 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Skye Wanderman-Milne <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Reviewer: Silvius Rus <[email protected]> Gerrit-Reviewer: Skye Wanderman-Milne <[email protected]> Gerrit-HasComments: Yes
