Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3845: Split up hdfs-parquet-scanner.cc into more files/components.
......................................................................

Patch Set 2:

(9 comments)

Looks good. I checked the recent parquet patches to make sure that the changes were included in this.

http://gerrit.cloudera.org:8080/#/c/3596/2/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 66: const int64_t HdfsParquetScanner::FOOTER_SIZE = 100 * 1024;
Let's put these in the header so that the constant value is visible at all points of use. You can actually just move the = <value>; into the header declaration, and keep the definitions here without the assignments so that storage is allocated in the data section.

Line 230: ColumnReader* HdfsParquetScanner::CreateReader(
Thought: if we move this function into the parquet-column-reader module, maybe we don't need to have all of the column readers defined in a .h. I think if you did that, parquet-column-reader.h would only need to expose "class ColumnReader" and this CreateReader() function.

Line 372: TParquetFallbackSchemaResolution::NAME;
Why not pass the enum into ParquetSchemaResolver? Seems like it makes the interface clearer compared with a bool.

Line 982: PrintPath(*scan_node_->hdfs_table(), parent_path), filename()));
Not your change but the # of args don't match up.

http://gerrit.cloudera.org:8080/#/c/3596/2/be/src/exec/hdfs-parquet-scanner.h
File be/src/exec/hdfs-parquet-scanner.h:

PS2, Line 45: ScratchTupleBatch
Consider moving this into hdfs-parquet-scanner-internal.h. I think this header is big enough already.

http://gerrit.cloudera.org:8080/#/c/3596/2/be/src/exec/parquet-column-readers.h
File be/src/exec/parquet-column-readers.h:

PS2, Line 73: LevelDecoder
ParquetLevelDecoder?

PS2, Line 277: ColumnReader
I think this is too generic - ParquetColumnReader?

http://gerrit.cloudera.org:8080/#/c/3596/2/be/src/exec/parquet-schema-resolver.cc
File be/src/exec/parquet-schema-resolver.cc:

I didn't look at this file in detail since it looked like you were just moving the functions and updating the class names.

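A minimal sketch of the Line 66 suggestion for hdfs-parquet-scanner.cc above, in case it helps. It assumes a hypothetical class named Scanner standing in for the real scanner class, and a hypothetical helper ReadThroughReference() that exists only to force use of the constant's storage; neither is taken from the patch. The value lives in the in-class declaration, and the out-of-class definition carries no initializer.

```cpp
#include <cstdint>

// Hypothetical stand-in for the scanner class; not the real Impala header.
class Scanner {
 public:
  // The value is part of the declaration, so every file that includes the
  // header can see it (and the compiler can fold it as a constant).
  static const int64_t FOOTER_SIZE = 100 * 1024;
};

// This part would stay in the .cc file: a definition without the assignment.
// It allocates storage so the constant can be odr-used, for example bound
// to a const reference or have its address taken.
const int64_t Scanner::FOOTER_SIZE;

// The value is usable at compile time at any point that sees the header.
static_assert(Scanner::FOOTER_SIZE == 100 * 1024,
              "value is visible from the declaration");

// Hypothetical helper: binding to a const reference needs the definition
// above to exist, otherwise the link can fail.
int64_t ReadThroughReference(const int64_t& value) { return value; }
```

Without the definition in the .cc file, code that only reads the value still compiles, but a call such as ReadThroughReference(Scanner::FOOTER_SIZE) may fail to link, which is why the definitions should stay.
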
http://gerrit.cloudera.org:8080/#/c/3596/2/be/src/util/debug-util.h
File be/src/util/debug-util.h:

Line 75: std::string PrintPath(const TableDescriptor& tbl_desc, const SchemaPath& path,
I think it would be better to avoid the default argument and have two methods: PrintPath() and PrintSubPath(). It looks like this is replacing an old method that wrapped PrintPath(), so this would be more-or-less the old approach.

--
To view, visit http://gerrit.cloudera.org:8080/3596
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4c5fd46f9c1a0ff2a4c30ea5a712fbae17c68f92
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Alex Behm <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes
