Dan Hecht has posted comments on this change.

Change subject: IMPALA-3453: S3: Uneven split sizes are generated for Parquet causing execution skew
......................................................................

Patch Set 2:

> (1 comment)
>
> > (1 comment)
> >
> > Why didn't the multiple row group test find this issue? Is that
> > test being skipped on S3? I think we should figure out a way to
> > enable that test (or a variant) to test this.
> >
> > Did you manually test it, or will Mostafa?
>
> I tested it as mentioned in the JIRA.
>
> The multiple row group test is disabled for S3. The problem with
> setting up a variant of that test for S3 is that we don't know the
> S3A block size in the pytests and so we won't know on what basis to
> check if the test succeeded.
>
> The only way now to find the S3A block size is to read the
> core-site.xml file from python (which might be messy).
>
> Another way is to assume that the block size will not be manually
> altered for the tests and so assume that the block size is 32MB.

Could we read it from the /catalog webpage? If that is too tough, I think just assuming 32MB and putting a comment explaining this should match the default block size would be okay.

--
To view, visit http://gerrit.cloudera.org:8080/2968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib1518ad0c89ef35a3b0567c3902e85a41e34bc3d
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Sailesh Mukil <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Sailesh Mukil <[email protected]>
Gerrit-HasComments: No
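
For reference, a rough sketch of the core-site.xml option mentioned in the thread, combined with the 32MB fallback. This is not code from the patch. It assumes the block size is configured through the Hadoop property `fs.s3a.block.size` and that the file has the usual `<configuration>/<property>` layout. The helper names `parse_size` and `s3a_block_size_from_xml` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption from the thread: the default S3A block size is 32MB and is
# not overridden for the tests. Keep this in sync with the real default.
DEFAULT_S3A_BLOCK_SIZE = 32 * 1024 * 1024

# Optional binary suffixes that a size value may carry (assumed: K, M, G).
_SUFFIXES = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text):
    """Convert a value such as '33554432' or '32M' to a byte count."""
    text = text.strip().upper()
    if text and text[-1] in _SUFFIXES:
        return int(text[:-1]) * _SUFFIXES[text[-1]]
    return int(text)


def s3a_block_size_from_xml(xml_text, default=DEFAULT_S3A_BLOCK_SIZE):
    """Return fs.s3a.block.size from core-site.xml content, else the default."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == "fs.s3a.block.size":
            value = prop.findtext("value")
            if value and value.strip():
                return parse_size(value)
    # Property absent or empty: fall back to the assumed default.
    return default
```

A test would read the contents of core-site.xml from wherever the test environment keeps it (the thread does not say where) and pass the text to `s3a_block_size_from_xml`. Taking the XML as a string keeps the helper independent of any particular file location.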
