Jim Apple has posted comments on this change.

Change subject: IMPALA-3227: generate test TPC data sets during data load
......................................................................

Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/3761/2//COMMIT_MSG
Commit Message:

PS2, Line 7: IMPALA-3227
Does this make dataloading faster, more reliable, or more separable from Cloudera infrastructure?


http://gerrit.cloudera.org:8080/#/c/3761/2/bin/load-data.py
File bin/load-data.py:

PS2, Line 160: if os.path.exists(dataset_preload_script)
> I think doing nothing is the intended behaviour if there isn't a preload sc
Should there be some sort of check that preload DOES run for tpc*?


http://gerrit.cloudera.org:8080/#/c/3761/2/testdata/datasets/tpcds/preload
File testdata/datasets/tpcds/preload:

PS2, Line 32: TPC_DS_DIRNAME
How do you know this path will be short enough? Could you use /tmp/ (properly genericized for wherever people keep their temp dir)?


http://gerrit.cloudera.org:8080/#/c/3761/2/testdata/datasets/tpch/preload
File testdata/datasets/tpch/preload:

There is enough similarity between this and the tpcds one that it would be nice if they were one script, properly parameterized.


-- 
To view, visit http://gerrit.cloudera.org:8080/3761
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ieccfbd7d8d4a91bffddbe35abb7f5572e71a71cf
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: David Knupp <[email protected]>
Gerrit-Reviewer: Jim Apple <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes
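
Illustration for the bin/load-data.py and tpcds/preload comments above. This is a minimal sketch, not code from the patch: it shows one way a loader could treat a missing preload script as an error for tpc* datasets while still doing nothing for other datasets, and how a scratch path could be taken from the platform temp directory rather than a hard-coded /tmp/. The names `find_preload_script`, `scratch_dir`, and `REQUIRED_PRELOAD_PREFIXES` are hypothetical, and the `<dataset_dir>/<dataset>/preload` layout is assumed from the file paths quoted in the review.

```python
import os
import tempfile

# Hypothetical: dataset name prefixes for which a preload script is mandatory.
REQUIRED_PRELOAD_PREFIXES = ("tpch", "tpcds")


def find_preload_script(dataset_dir, dataset_name):
    """Return the preload script path, or None if the dataset has none.

    Raises RuntimeError when a dataset that must be generated (tpc*)
    has no preload script, instead of silently skipping it.
    """
    # Assumed layout: <dataset_dir>/<dataset_name>/preload
    script = os.path.join(dataset_dir, dataset_name, "preload")
    if os.path.exists(script):
        return script
    if dataset_name.startswith(REQUIRED_PRELOAD_PREFIXES):
        raise RuntimeError(
            "missing preload script for %s: %s" % (dataset_name, script))
    return None


def scratch_dir(name):
    """Build a scratch path under the platform temp dir.

    tempfile.gettempdir() consults TMPDIR, TEMP and TMP before
    falling back to a platform default such as /tmp.
    """
    return os.path.join(tempfile.gettempdir(), name)
```

With this shape, a call such as `find_preload_script("testdata/datasets", "tpcds")` would fail loudly if the script were deleted or renamed, while a dataset with no generation step would still load without one.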
