[ https://issues.apache.org/jira/browse/IMPALA-14462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18022922#comment-18022922 ]

ASF subversion and git services commented on IMPALA-14462:
----------------------------------------------------------

Commit 775f73f03ea59401ca2752383182185599b9777d in impala's branch refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=775f73f03 ]

IMPALA-14462: Fix tie-breaking for sorting scan ranges oldest to newest

TestTupleCacheFullCluster.test_scan_range_distributed is flaky on s3
builds. The addition of a single file is changing scheduling significantly
even with scan ranges sorted oldest to newest. This is because modification
times on S3 have a granularity of one second. Multiple files have the
same modification time, and the fix for IMPALA-13548 did not properly
break ties for sorting.

This adds logic to break ties for files with the same modification
time. It compares the path (absolute path or relative path + partition)
as well as the offset within the file. These should be enough to break
all conceivable ties, as it is not possible to have two scan ranges with
the same file at the same offset. In debug builds, this does additional
validation to make sure that when a != b, comp(a, b) != comp(b, a).

The test requires that adding a single file to the table changes exactly
one cache key. If that final file has the same modification time as
an existing file, scheduling may still mix up the files and change more
than one cache key, even with tie-breaking. This adds a sleep just before
generating the final file to guarantee that it gets a newer modification
time.

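The tie-breaking described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the Impala scheduler code, which is C++. The names ScanRange, sort_key, older_first and check_comparator are hypothetical and exist only for this illustration, as do the file paths and timestamps.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class ScanRange:
    # Hypothetical stand-in for a scan range; not Impala's actual type.
    mtime: int   # modification time in whole seconds (S3 granularity)
    path: str    # absolute path, or relative path plus partition
    offset: int  # byte offset within the file


def sort_key(r):
    # Oldest first, then path, then offset to break ties.
    return (r.mtime, r.path, r.offset)


def older_first(a, b):
    # Strict "less than": True if a should be scheduled before b.
    return sort_key(a) < sort_key(b)


def mtime_only(a, b):
    # Comparator without tie-breaking, shown only for contrast.
    return a.mtime < b.mtime


def check_comparator(comp, ranges):
    # Mirrors the debug-build validation described above:
    # when a != b, comp(a, b) must differ from comp(b, a).
    for a, b in permutations(ranges, 2):
        if a != b:
            assert comp(a, b) != comp(b, a), (a, b)


# Three ranges share one modification time, as can happen on S3.
ranges = [
    ScanRange(100, "part=1/b.parq", 0),
    ScanRange(100, "part=1/a.parq", 128),
    ScanRange(100, "part=1/a.parq", 0),
    ScanRange(99, "part=2/z.parq", 0),
]

check_comparator(older_first, ranges)
ordered = sorted(ranges, key=sort_key)
```

With the full key, the order no longer depends on the order in which the ranges arrive. Running check_comparator with mtime_only on the same list raises an AssertionError, because two ranges with equal modification times compare as "not less than" in both directions.
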
Testing:
 - Ran TestTupleCacheFullCluster.test_scan_range_distributed for 15
   iterations on S3

Change-Id: I3f2e40d3f975ee370c659939da0374675a28cd38
Reviewed-on: http://gerrit.cloudera.org:8080/23458
Tested-by: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Reviewed-by: Michael Smith <michael.sm...@cloudera.com>
Reviewed-by: Riza Suminto <riza.sumi...@cloudera.com>


> TestTupleCacheFullCluster.test_scan_range_distributed fails on S3
> -----------------------------------------------------------------
>
>                 Key: IMPALA-14462
>                 URL: https://issues.apache.org/jira/browse/IMPALA-14462
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend, Test
>    Affects Versions: Impala 5.0.0
>            Reporter: Joe McDonnell
>            Assignee: Joe McDonnell
>            Priority: Blocker
>             Fix For: Impala 5.0.0
>
>
> TestTupleCacheFullCluster.test_scan_range_distributed is expecting that
> inserting a single file doesn't change tuple cache's runtime hash for more
> than a single executor. This should be true due to the modification to
> schedule scan ranges oldest to newest. This is failing on S3:
> {noformat}
> custom_cluster/test_tuple_cache.py:905: in test_scan_range_distributed
>     assert len(after_insert_unique_cache_keys - unique_cache_keys) == 1
> E   assert 3 == 1
> E    +  where 3 = len(({'6e2682fb793acd7b689a8d69aab01675_1266802730',
> '6e2682fb793acd7b689a8d69aab01675_2730281323',
> '6e2682fb793acd7b689a8d69aab01675_3027502829'} -
> {'6e2682fb793acd7b689a8d69aab01675_1885657991',
> '6e2682fb793acd7b689a8d69aab01675_2939791479',
> '6e2682fb793acd7b689a8d69aab01675_3685468122'})){noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)