Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-3453: S3: Uneven split sizes are generated for Parquet causing execution skew
......................................................................

Patch Set 1:

(1 comment)

> (1 comment)
>
> Why didn't the multiple row group test find this issue? Is that
> test being skipped on S3? I think we should figure out a way to
> enable that test (or a variant) to test this.
>
> Did you manually test it, or will Mostafa?

I tested it as mentioned in the JIRA. The multiple row group test is disabled for S3.

The problem with setting up a variant of that test for S3 is that the pytests don't know the S3A block size, so we have no basis for checking whether the test succeeded. Right now, the only way to find the S3A block size is to read the core-site.xml file from Python (which might be messy). Another way is to assume that the block size will not be manually altered for the tests, and so assume that it is 32MB. A third option is to make the block size configurable through an environment variable. In that case we can read the env var from Python and use that as the basis for the test.

http://gerrit.cloudera.org:8080/#/c/2968/1//COMMIT_MSG
Commit Message:

Line 23: is governed by "fs.s3a.block.size". Its default value is 32MB.
> we should probably explain this in the Impala+S3 documentation. can you add

Done

--
To view, visit http://gerrit.cloudera.org:8080/2968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib1518ad0c89ef35a3b0567c3902e85a41e34bc3d
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Sailesh Mukil <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Sailesh Mukil <[email protected]>
Gerrit-HasComments: Yes
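
For illustration only, the three options mentioned in the comment above for finding the S3A block size from a pytest could be combined roughly as follows. This is a sketch, not code from the patch: the environment variable name `S3A_BLOCK_SIZE` and both helper functions are hypothetical, and it assumes the configured value is a plain byte count.

```python
import os
import xml.etree.ElementTree as ET

# Assumed fallback: the 32MB default stated in the commit message.
DEFAULT_S3A_BLOCK_SIZE = 32 * 1024 * 1024


def block_size_from_core_site(xml_text):
    """Return fs.s3a.block.size from core-site.xml contents, or None if unset.

    Assumes the value is a plain byte count; size suffixes are not handled.
    """
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == "fs.s3a.block.size":
            value = (prop.findtext("value") or "").strip()
            return int(value) if value else None
    return None


def s3a_block_size(core_site_xml=None, env=os.environ):
    """Pick the block size a test should assume (hypothetical helper)."""
    # Option 3: an environment variable override (name is an assumption).
    override = env.get("S3A_BLOCK_SIZE")
    if override:
        return int(override)
    # Option 1: read the value from the core-site.xml contents, if provided.
    if core_site_xml is not None:
        configured = block_size_from_core_site(core_site_xml)
        if configured is not None:
            return configured
    # Option 2: assume the block size was not changed from the default.
    return DEFAULT_S3A_BLOCK_SIZE
```

A test could then call `s3a_block_size()` once and use the result as the basis for checking split sizes, whichever of the three sources supplied the value.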
