Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-3453: S3: Uneven split sizes are generated for Parquet 
causing execution skew
......................................................................


Patch Set 1:

(1 comment)

> (1 comment)
 > 
 > Why didn't the multiple row group test find this issue? Is that
 > test being skipped on S3? I think we should figure out a way to
 > enable that test (or a variant) to test this.
 > 
 > Did you manually test it, or will Mostafa?

I tested it as mentioned in the JIRA.

The multiple row group test is disabled for S3. The problem with setting up a
variant of that test for S3 is that the pytests don't know the S3A block size,
so there is no basis on which to check whether the test succeeded.

Currently, the only way to find the S3A block size is to read the
core-site.xml file from Python (which might be messy).

Another option is to assume that the block size is not manually altered for
the tests and is therefore the default of 32MB.

A third option is to make the block size configurable through an environment
variable. In that case we can read the env var from Python and use it as the
basis for the test.
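
To make the three options concrete, here is a rough sketch of how a pytest
helper could resolve the block size. It is not part of the patch. The env var
name S3A_BLOCK_SIZE_BYTES and the helper names are made up for illustration,
and it assumes the configured value is a plain byte count.

```python
import os
import xml.etree.ElementTree as ET

# Default stated in the commit message for fs.s3a.block.size (32MB).
DEFAULT_S3A_BLOCK_SIZE = 32 * 1024 * 1024

# Hypothetical env var name; nothing in the test infra defines it today.
BLOCK_SIZE_ENV_VAR = "S3A_BLOCK_SIZE_BYTES"


def parse_block_size(core_site_xml):
    """Return fs.s3a.block.size from core-site.xml text, or None if unset.

    Assumes the value is a plain integer number of bytes.
    """
    root = ET.fromstring(core_site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name", "").strip() == "fs.s3a.block.size":
            return int(prop.findtext("value", "").strip())
    return None


def get_s3a_block_size(env=os.environ, core_site_path=None):
    """Resolve the block size: env var, then core-site.xml, then default."""
    # Third option: an environment variable set by the test environment.
    value = env.get(BLOCK_SIZE_ENV_VAR)
    if value:
        return int(value)
    # First option: read core-site.xml, if the caller knows where it is.
    if core_site_path and os.path.exists(core_site_path):
        with open(core_site_path) as f:
            size = parse_block_size(f.read())
        if size is not None:
            return size
    # Second option: assume nobody changed the default.
    return DEFAULT_S3A_BLOCK_SIZE
```

A test could then call get_s3a_block_size() once and derive its expectations
from the result instead of hard-coding 32MB.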

http://gerrit.cloudera.org:8080/#/c/2968/1//COMMIT_MSG
Commit Message:

Line 23: is governed by "fs.s3a.block.size". Its default value is 32MB.
> we should probably explain this in the Impala+S3 documentation. can you add
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/2968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib1518ad0c89ef35a3b0567c3902e85a41e34bc3d
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Sailesh Mukil <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Sailesh Mukil <[email protected]>
Gerrit-HasComments: Yes
