Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3227: generate test TPC data sets during data load
......................................................................


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/3761/2//COMMIT_MSG
Commit Message:

PS2, Line 7: IMPALA-3227
> Does this make dataloading faster, more reliable, or more separable from Cl
Yes, those tarballs mentioned below aren't available externally and we had no 
docs on how to recreate them.

I actually tried to create a subtask of IMPALA-3227 for this but JIRA doesn't 
allow subtasks of subtasks.


http://gerrit.cloudera.org:8080/#/c/3761/2/bin/load-data.py
File bin/load-data.py:

PS2, Line 160: if os.path.exists(dataset_preload_script)
> Should there be some sort of check that preload DOES run for tpc*?
I'm not really sure I understand what I can do to ensure that's the case beyond 
testing the change on a machine without the pregenerated tpc tarballs and 
making sure tpc data loaded correctly (which it did).  If it doesn't run the 
data won't be there to load.

I don't think this script should know about the details of how we intend each 
data set to set up.


http://gerrit.cloudera.org:8080/#/c/3761/2/testdata/datasets/tpcds/preload
File testdata/datasets/tpcds/preload:

PS2, Line 32: TPC_DS_DIRNAME
> How do you know this path will be short enough? Could you use /tmp/ (proper
Because it's a relative path from the current directory - see how the path 
TPC_DS_DIRNAME is constructed above.


http://gerrit.cloudera.org:8080/#/c/3761/2/testdata/datasets/tpch/preload
File testdata/datasets/tpch/preload:

> There is enough similarity between this and the tpcds one that it would be 
I made the call that it was easier to follow if it was two single-purpose 
scripts. You can save maybe 10-20 lines of code but it's harder to follow code 
that's trying to abstract around the quirks of two executables at once.

The data generation executables are quirky so it seems easier to deal with 
their quirks separately now and going forward rather than try to share code.


-- 
To view, visit http://gerrit.cloudera.org:8080/3761
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ieccfbd7d8d4a91bffddbe35abb7f5572e71a71cf
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: David Knupp <[email protected]>
Gerrit-Reviewer: Jim Apple <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes

Reply via email to