Tim Armstrong has posted comments on this change. Change subject: IMPALA-3227: generate test TPC data sets during data load ......................................................................
Patch Set 2: (4 comments) http://gerrit.cloudera.org:8080/#/c/3761/2//COMMIT_MSG Commit Message: PS2, Line 7: IMPALA-3227 > Does this make dataloading faster, more reliable, or more separable from Cl Yes, those tarballs mentioned below aren't available externally and we had no docs on how to recreate them. I actually tried to create a subtask of IMPALA-3227 for this but JIRA doesn't allow subtasks of subtasks. http://gerrit.cloudera.org:8080/#/c/3761/2/bin/load-data.py File bin/load-data.py: PS2, Line 160: if os.path.exists(dataset_preload_script) > Should there be some sort of check that preload DOES run for tpc*? I'm not really sure I understand what I can do to ensure that's the case beyond testing the change on a machine without the pregenerated tpc tarballs and making sure tpc data loaded correctly (which it did). If it doesn't run the data won't be there to load. I don't think this script should know about the details of how we intend each data set to set up. http://gerrit.cloudera.org:8080/#/c/3761/2/testdata/datasets/tpcds/preload File testdata/datasets/tpcds/preload: PS2, Line 32: TPC_DS_DIRNAME > How do you know this path will be short enough? Could you use /tmp/ (proper Because it's a relative path from the current directory - see how the path TPC_DS_DIRNAME is constructed above. http://gerrit.cloudera.org:8080/#/c/3761/2/testdata/datasets/tpch/preload File testdata/datasets/tpch/preload: > There is enough similarity between this and the tpcds one that it would be I made the call that it was easier to follow if it was two single-purpose scripts. You can save maybe 10-20 lines of code but it's harder to follow code that's trying to abstract around the quirks of two executables at once. The data generation executables are quirky so it seems easier to deal with their quirks separately now and going forward rather than try to share code. -- To view, visit http://gerrit.cloudera.org:8080/3761 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ieccfbd7d8d4a91bffddbe35abb7f5572e71a71cf Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong <[email protected]> Gerrit-Reviewer: David Knupp <[email protected]> Gerrit-Reviewer: Jim Apple <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
