Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3845: Split up hdfs-parquet-scanner.cc into more 
files/components.
......................................................................


Patch Set 2:

(9 comments)

Looks good. I checked the recent parquet patches to make sure that the changes 
were included in this.

http://gerrit.cloudera.org:8080/#/c/3596/2/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 66: const int64_t HdfsParquetScanner::FOOTER_SIZE = 100 * 1024;
Let's put these in the header so that the constant value is visible at all 
points of use. You can actually just move the = <value>; into the header 
declaration, and keep the definitions here without the assignments so that 
storage is allocated in the data section.
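
A minimal sketch of that pattern, with the header and .cc parts shown in one 
block. The class body is trimmed to the one constant quoted above, and 
ReadThroughReference() is a hypothetical helper added only to show why the 
out-of-class definition is still needed.

```cpp
#include <cstdint>

// Header side (trimmed sketch): the initializer sits in the class body, so
// every file that includes the header sees the value at compile time.
class HdfsParquetScanner {
 public:
  static const int64_t FOOTER_SIZE = 100 * 1024;
};

// .cc side: definition without an initializer. This still allocates storage,
// which is needed whenever the constant is odr-used (for example, bound to a
// const reference).
const int64_t HdfsParquetScanner::FOOTER_SIZE;

// The value is usable in constant expressions because it is in the header.
static_assert(HdfsParquetScanner::FOOTER_SIZE == 102400,
              "footer size is visible at compile time");

// Hypothetical helper: taking the constant by reference is an odr-use and
// would fail to link without the definition above.
int64_t ReadThroughReference(const int64_t& value) { return value; }
```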


Line 230: ColumnReader* HdfsParquetScanner::CreateReader(
Thought: if we move this function into the parquet-column-reader module, maybe 
we don't need to have all of the column readers defined in a .h. I think if you 
did that, parquet-column-reader.h would only need to expose "class 
ColumnReader" and this CreateReader() function.
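
A rough sketch of that layout, not the actual Impala code: the ColumnKind 
enum, DebugName(), and the two concrete reader classes are hypothetical 
stand-ins, and the real CreateReader() takes different arguments and returns 
a raw ColumnReader*.

```cpp
#include <memory>
#include <string>

// --- What the header would expose: the base class and the factory only. ---
class ColumnReader {
 public:
  virtual ~ColumnReader() {}
  virtual std::string DebugName() const = 0;  // hypothetical method
};

enum class ColumnKind { SCALAR, COLLECTION };  // hypothetical selector

std::unique_ptr<ColumnReader> CreateReader(ColumnKind kind);

// --- What would stay private to the .cc: the concrete readers. ---
namespace {

class ScalarColumnReader : public ColumnReader {
 public:
  std::string DebugName() const override { return "scalar"; }
};

class CollectionColumnReader : public ColumnReader {
 public:
  std::string DebugName() const override { return "collection"; }
};

}  // namespace

// The factory is the only code that names the concrete types.
std::unique_ptr<ColumnReader> CreateReader(ColumnKind kind) {
  switch (kind) {
    case ColumnKind::SCALAR:
      return std::unique_ptr<ColumnReader>(new ScalarColumnReader());
    case ColumnKind::COLLECTION:
      return std::unique_ptr<ColumnReader>(new CollectionColumnReader());
  }
  return nullptr;
}
```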


Line 372:       TParquetFallbackSchemaResolution::NAME;
Why not pass the enum into ParquetSchemaResolver? Seems like it makes the 
interface clearer compared with a bool.
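
As an illustration of the difference at the call site, here is a small 
sketch. FallbackSchemaResolution is a stand-in for the Thrift-generated 
enum (only NAME appears in the line above; POSITION is assumed), and the 
constructor signature and DescribeStrategy() are hypothetical.

```cpp
#include <string>

// Stand-in for the Thrift-generated enum type. POSITION is an assumption.
struct FallbackSchemaResolution {
  enum type { POSITION, NAME };
};

class ParquetSchemaResolver {
 public:
  // Taking the enum keeps the call site self-describing, unlike a bare
  // true/false argument whose meaning has to be looked up.
  explicit ParquetSchemaResolver(FallbackSchemaResolution::type resolution)
      : resolution_(resolution) {}

  // Hypothetical accessor, used here only to show the stored choice.
  std::string DescribeStrategy() const {
    return resolution_ == FallbackSchemaResolution::NAME ? "by name"
                                                         : "by position";
  }

 private:
  FallbackSchemaResolution::type resolution_;
};
```

A call then reads as ParquetSchemaResolver(FallbackSchemaResolution::NAME) 
rather than ParquetSchemaResolver(true).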


Line 982:         PrintPath(*scan_node_->hdfs_table(), parent_path), 
filename()));
Not your change, but the number of args doesn't match up.


http://gerrit.cloudera.org:8080/#/c/3596/2/be/src/exec/hdfs-parquet-scanner.h
File be/src/exec/hdfs-parquet-scanner.h:

PS2, Line 45: ScratchTupleBatch
Consider moving this into hdfs-parquet-scanner-internal.h. I think this header 
is big enough already.


http://gerrit.cloudera.org:8080/#/c/3596/2/be/src/exec/parquet-column-readers.h
File be/src/exec/parquet-column-readers.h:

PS2, Line 73: LevelDecoder
ParquetLevelDecoder?


PS2, Line 277: ColumnReader
I think this is too generic - ParquetColumnReader?


http://gerrit.cloudera.org:8080/#/c/3596/2/be/src/exec/parquet-schema-resolver.cc
File be/src/exec/parquet-schema-resolver.cc:

I didn't look at this file in detail since it looked like you were just moving 
the functions and updating the class names.


http://gerrit.cloudera.org:8080/#/c/3596/2/be/src/util/debug-util.h
File be/src/util/debug-util.h:

Line 75: std::string PrintPath(const TableDescriptor& tbl_desc, const 
SchemaPath& path,
I think it would be better to avoid the default argument and have two methods: 
PrintPath() and PrintSubPath(). It looks like this is replacing an old method 
that wrapped PrintPath(), so this would be more or less the old approach.
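
A simplified sketch of the two-method shape, under stated assumptions: 
SchemaPath is reduced to a vector of ints, the TableDescriptor argument is 
dropped, and the `end` parameter of PrintSubPath() is a guess at what the 
default argument controlled. The real functions resolve column names from the 
table descriptor.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

typedef std::vector<int> SchemaPath;  // simplified stand-in

// Prints the first `end` elements of the path, separated by dots.
// `end` is a hypothetical parameter; values past the path length are clamped.
std::string PrintSubPath(const SchemaPath& path, size_t end) {
  std::ostringstream ss;
  for (size_t i = 0; i < end && i < path.size(); ++i) {
    if (i > 0) ss << ".";
    ss << path[i];
  }
  return ss.str();
}

// Prints the whole path. No default argument: callers pick the function
// that matches their intent.
std::string PrintPath(const SchemaPath& path) {
  return PrintSubPath(path, path.size());
}
```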


-- 
To view, visit http://gerrit.cloudera.org:8080/3596
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4c5fd46f9c1a0ff2a4c30ea5a712fbae17c68f92
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Alex Behm <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes
