Dan Hecht has posted comments on this change.

Change subject: IMPALA-3453: S3: Uneven split sizes are generated for Parquet 
causing execution skew
......................................................................


Patch Set 2:

> (1 comment)
 > 
 > > (1 comment)
 > >
 > > Why didn't the multiple row group test find this issue? Is that
 > > test being skipped on S3? I think we should figure out a way to
 > > enable that test (or a variant) to test this.
 > >
 > > Did you manually test it, or will Mostafa?
 > 
 > I tested it as mentioned in the JIRA.
 > 
 > The multiple row group test is disabled for S3. The problem with
 > setting up a variant of that test for S3 is that we don't know the
 > S3A block size in the pytests and so we won't know on what basis to
 > check if the test succeeded.
 > 
 > The only way now to find the S3A block size is to read the
 > core-site.xml file from python (which might be messy).
 > 
 > Another way is to assume that the block size will not be manually
 > altered for the tests and so assume that the block size is 32MB.
 > 
 
Could we read it from the /catalog webpage? If that is too hard, I think it
would be okay to just assume 32MB and add a comment explaining that this value
should match the default S3A block size.
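
For reference, here is a rough sketch of the core-site.xml option with the 32MB
fallback. It is an illustration only, not code from the patch: the helper names
are made up, and it assumes the property is called fs.s3a.block.size and that
its value is either a plain byte count or a number with a K/M/G suffix.

```python
import xml.etree.ElementTree as ET

# Assumption: the tests run with the default S3A block size of 32MB.
DEFAULT_S3A_BLOCK_SIZE = 32 * 1024 * 1024

_SUFFIXES = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text):
    """Convert '33554432' or '32M' style values to a byte count (hypothetical helper)."""
    text = text.strip().upper()
    if text and text[-1] in _SUFFIXES:
        return int(text[:-1]) * _SUFFIXES[text[-1]]
    return int(text)


def s3a_block_size_from_xml(xml_text):
    """Return the S3A block size found in core-site.xml contents.

    Falls back to DEFAULT_S3A_BLOCK_SIZE when the property is absent or empty.
    """
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == "fs.s3a.block.size":
            value = prop.findtext("value")
            if value and value.strip():
                return parse_size(value)
    return DEFAULT_S3A_BLOCK_SIZE
```

A test could pass the contents of core-site.xml to s3a_block_size_from_xml()
and derive the expected split count from the result, so the check still holds
if someone overrides the block size.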

-- 
To view, visit http://gerrit.cloudera.org:8080/2968
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib1518ad0c89ef35a3b0567c3902e85a41e34bc3d
Gerrit-PatchSet: 2
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Sailesh Mukil <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Sailesh Mukil <[email protected]>
Gerrit-HasComments: No
