Yi Zhang created TEZ-4577:
-----------------------------

             Summary: SortSpan could be created real small, resulting in eventual job failure
                 Key: TEZ-4577
                 URL: https://issues.apache.org/jira/browse/TEZ-4577
             Project: Apache Tez
          Issue Type: Bug
    Affects Versions: 0.10.4
            Reporter: Yi Zhang

We ran into an overflow issue like the one in TEZ-4542. With the TEZ-4542 fix applied, the job then ran into very small sort spans (one record per span in this case) and eventually failed due to a timeout.

From the sample logs, it looks like once SortSpan(ByteBuffer source, int maxItems, int perItem, RawComparator comparator) gets into a situation where maxItems=1, it persists with maxItems=1 for the spans that follow.

Sample logs:

2024-08-19 19:02:28,157 [INFO] [TezChild] |impl.PipelinedSorter|: scope-302 -> scope-308: Span260.length = 1, perItem = 139
2024-08-19 19:02:28,157 [INFO] [TezChild] |impl.PipelinedSorter|: scope-302 -> scope-308: reserved.remaining()=268396925, reserved.metasize=16
2024-08-19 19:02:28,157 [INFO] [TezChild] |impl.PipelinedSorter|: scope-302 -> scope-308: New Span261.length = 1, perItem = 139, counter:5307003
2024-08-19 19:02:28,157 [INFO] [Sorter {scope_302 -> scope_308} #1] |impl.PipelinedSorter|: scope-302 -> scope-308: done sorting span=260, length=1, time=0
2024-08-19 19:02:28,157 [INFO] [TezChild] |impl.PipelinedSorter|: scope-302 -> scope-308: Span261.length = 1, perItem = 128
2024-08-19 19:02:28,157 [INFO] [TezChild] |impl.PipelinedSorter|: scope-302 -> scope-308: reserved.remaining()=268396781, reserved.metasize=16
2024-08-19 19:02:28,157 [INFO] [TezChild] |impl.PipelinedSorter|: scope-302 -> scope-308: New Span262.length = 1, perItem = 128, counter:5307004
2024-08-19 19:02:28,158 [INFO] [Sorter {scope_302 -> scope_308} #0] |impl.PipelinedSorter|: scope-302 -> scope-308: done sorting span=261, length=1, time=0
2024-08-19 19:02:28,158 [INFO] [TezChild] |impl.PipelinedSorter|: scope-302 -> scope-308: Span262.length = 1, perItem = 145
2024-08-19 19:02:28,158 [INFO] [TezChild] |impl.PipelinedSorter|: scope-302 -> scope-308: reserved.remaining()=268396620, reserved.metasize=16
2024-08-19 19:02:28,158 [INFO] [TezChild] |impl.PipelinedSorter|: scope-302 -> scope-308: New Span263.length = 1, perItem = 145, counter:5307005
2024-08-19 19:02:28,158 [INFO] [Sorter {scope_302 -> scope_308} #1] |impl.PipelinedSorter|: scope-302 -> scope-308: done sorting span=262, length=1, time=0
2024-08-19 19:02:28,158 [INFO] [TezChild] |impl.PipelinedSorter|: scope-302 -> scope-308: Span263.length = 1, perItem = 139
2024-08-19 19:02:28,158 [INFO] [TezChild] |impl.PipelinedSorter|: scope-302 -> scope-308: reserved.remaining()=268396465, reserved.metasize=16
2024-08-19 19:02:28,158 [INFO] [TezChild] |impl.PipelinedSorter|: scope-302 -> scope-308: New Span264.length = 1, perItem = 139, counter:5307006
2024-08-19 19:02:28,158 [INFO] [Sorter {scope_302 -> scope_308} #0] |impl.PipelinedSorter|: scope-302 -> scope-308: done sorting span=263, length=1, time=0
2024-08-19 19:02:28,158 [INFO] [TezChild] |impl.PipelinedSorter|: scope-302 -> scope-308: Span264.length = 1, perItem = 129
2024-08-19 19:02:28,158 [INFO] [TezChild] |impl.PipelinedSorter|: scope-302 -> scope-308: reserved.remaining()=268396320, reserved.metasize=16
2024-08-19 19:02:28,158 [INFO] [TezChild] |impl.PipelinedSorter|: scope-302 -> scope-308: New Span265.length = 1, perItem = 129, counter:5307007
2024-08-19 19:02:28,158 [INFO] [Sorter {scope_302 -> scope_308} #1] |impl.PipelinedSorter|: scope-302 -> scope-308: done sorting span=264, length=1, time=0

--
This message was sent by Atlassian Jira (v8.20.10#820010)
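
To make the suspected feedback loop concrete, here is a minimal sketch in Java. It is not Tez's PipelinedSorter code. The class SpanSizingSketch, the methods nextMaxItems and simulate, and the rule "the next span's maxItems is the previous span's record count" are all assumptions made for illustration, based only on the behavior visible in the logs. The one number taken from the report is the 16-byte metadata size (reserved.metasize=16). Under this model each span holds one record, and the buffer shrinks by perItem + 16 bytes per span, which matches the successive reserved.remaining() values above.

```java
import java.util.ArrayList;
import java.util.List;

public class SpanSizingSketch {
    // Per-record metadata bytes, as reported by reserved.metasize=16 in the logs.
    static final int META_SIZE = 16;

    // Hypothetical sizing rule (an assumption, not confirmed from Tez source):
    // the next span is sized from the number of records the previous span held.
    static int nextMaxItems(int previousLength) {
        return previousLength;
    }

    // Feeds records into spans and returns the bytes remaining after each span closes.
    // Buffer exhaustion is deliberately not modeled; only the sizing loop is.
    static List<Long> simulate(long remaining, int maxItems, int[] recordSizes) {
        List<Long> remainingAfterSpan = new ArrayList<>();
        int i = 0;
        while (i < recordSizes.length) {
            int length = 0;
            // A span accepts at most maxItems records.
            while (length < maxItems && i < recordSizes.length) {
                remaining -= recordSizes[i] + META_SIZE;
                length++;
                i++;
            }
            remainingAfterSpan.add(remaining);
            // Once length is 1, the next span is also capped at 1 record.
            maxItems = nextMaxItems(length);
        }
        return remainingAfterSpan;
    }

    public static void main(String[] args) {
        // Record sizes of Span261..Span264 and the remaining bytes after Span260, from the logs.
        int[] sizes = {128, 145, 139, 129};
        List<Long> r = simulate(268396925L, 1, sizes);
        // prints: 4 spans, remaining=[268396781, 268396620, 268396465, 268396320]
        System.out.println(r.size() + " spans, remaining=" + r);
    }
}
```

In this model four records produce four spans even though roughly 256 MB of buffer is still free, so the span count grows with the record count rather than with buffer usage. That is consistent with the span index and counter advancing in lockstep in the logs.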