Jim Apple has posted comments on this change.

Change subject: IMPALA-3227: generate test TPC data sets during data load
......................................................................


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/3761/2//COMMIT_MSG
Commit Message:

PS2, Line 7: IMPALA-3227
Does this change make data loading faster, more reliable, or more separable from
Cloudera infrastructure? It would help if the commit message said which.


http://gerrit.cloudera.org:8080/#/c/3761/2/bin/load-data.py
File bin/load-data.py:

PS2, Line 160: if os.path.exists(dataset_preload_script)
> I think doing nothing is the intended behaviour if there isn't a preload sc
Should there be some sort of check that preload DOES run for tpc*?
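
One possible shape for such a check, as a rough sketch only. The helper name
find_preload_script and the DATASETS_REQUIRING_PRELOAD tuple are made up for
illustration and are not in the patch. I am also assuming the script lives at
<dataset dir>/<dataset>/preload, as the file paths in this change suggest.

```python
import os

# Hypothetical list of datasets that are expected to generate their own data.
DATASETS_REQUIRING_PRELOAD = ("tpch", "tpcds")


def find_preload_script(dataset_dir, dataset):
    """Return the path of the dataset's preload script, or None if it has none.

    Raises RuntimeError when a dataset that must generate its data (here,
    by assumption, tpch and tpcds) has no preload script on disk.
    """
    script = os.path.join(dataset_dir, dataset, "preload")
    if os.path.exists(script):
        return script
    if dataset in DATASETS_REQUIRING_PRELOAD:
        raise RuntimeError(
            "missing preload script for %s: %s" % (dataset, script))
    # Datasets without a preload step keep the current do-nothing behaviour.
    return None
```

That keeps "do nothing" for datasets that legitimately have no preload step,
while a missing script for tpc* fails loudly instead of silently loading no data.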


http://gerrit.cloudera.org:8080/#/c/3761/2/testdata/datasets/tpcds/preload
File testdata/datasets/tpcds/preload:

PS2, Line 32: TPC_DS_DIRNAME
How do you know this path will be short enough? Could you use /tmp/ instead
(generalized to wherever the user's temp directory actually lives)?
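
To illustrate what I mean, here is a minimal sketch in Python. The function
name make_short_scratch_dir and its max_len parameter are hypothetical, and I
do not know the generator's real path-length limit, so the caller has to supply
it. If preload stays a shell script, mktemp -d would be the rough equivalent.

```python
import os
import tempfile


def make_short_scratch_dir(prefix, max_len):
    """Create a scratch directory under the platform temp dir.

    tempfile.mkdtemp() places the directory under tempfile.gettempdir(),
    which consults TMPDIR, TEMP and TMP before falling back to platform
    defaults such as /tmp. max_len is a caller-supplied limit, not a
    known property of the TPC tools.
    """
    path = tempfile.mkdtemp(prefix=prefix)
    if len(path) > max_len:
        # Clean up the empty directory before reporting the problem.
        os.rmdir(path)
        raise ValueError(
            "scratch path %r is longer than %d characters" % (path, max_len))
    return path
```

Failing early on an over-long path is easier to debug than a generator that
truncates or rejects it partway through the load.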


http://gerrit.cloudera.org:8080/#/c/3761/2/testdata/datasets/tpch/preload
File testdata/datasets/tpch/preload:

This script is similar enough to the tpcds one that the two could be a single,
properly parameterized script.
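
Something along these lines, purely as a sketch. The DATASETS table, the flag
names, and the plan() helper are invented for illustration, and I am assuming
the only real differences between the two scripts are the generator binary
(dbgen versus dsdgen) and the output location.

```python
import argparse
import os

# Assumed per-dataset differences; everything else would be shared code.
DATASETS = {
    "tpch": {"generator": "dbgen"},
    "tpcds": {"generator": "dsdgen"},
}


def parse_args(argv):
    """Parse the dataset name and the options common to both data sets."""
    parser = argparse.ArgumentParser(description="Generate one TPC data set.")
    parser.add_argument("dataset", choices=sorted(DATASETS))
    parser.add_argument("--scale-factor", type=int, default=1)
    parser.add_argument("--out-dir", required=True)
    return parser.parse_args(argv)


def plan(args):
    """Describe what a shared preload would do for the chosen dataset."""
    settings = DATASETS[args.dataset]
    return {
        "generator": settings["generator"],
        "scale_factor": args.scale_factor,
        "out_dir": os.path.join(args.out_dir, args.dataset),
    }
```

Each dataset's preload could then shrink to a one-line call into the shared
script, for example with the arguments tpcds --out-dir <some dir>.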


-- 
To view, visit http://gerrit.cloudera.org:8080/3761
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ieccfbd7d8d4a91bffddbe35abb7f5572e71a71cf
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: David Knupp <[email protected]>
Gerrit-Reviewer: Jim Apple <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes
