Yi Zhang created TEZ-4577:
-----------------------------

             Summary: SortSpan could be created real small, resulting in 
eventual job failure
                 Key: TEZ-4577
                 URL: https://issues.apache.org/jira/browse/TEZ-4577
             Project: Apache Tez
          Issue Type: Bug
    Affects Versions: 0.10.4
            Reporter: Yi Zhang


We ran into an overflow issue like the one in TEZ-4542. With the TEZ-4542 fix 
applied, the job then ran into an issue of very small sort spans (one record 
per span in this case), and eventually failed due to a timeout.

From the sample logs, it looks like once the constructor

SortSpan(ByteBuffer source, int maxItems, int perItem, RawComparator comparator)

gets into a situation where maxItems=1, every following span is also created 
with maxItems=1.
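
To illustrate why such a state would not recover on its own, here is a toy model of the feedback loop. It is not the PipelinedSorter code. It assumes (based only on the logs below) that each new span is sized from the number of records the previous span held, and that a span can hold at most maxItems records. The class and method names are made up for this sketch.

```java
import java.util.Arrays;

class SpanSizingModel {
    /**
     * Simulates successive spans under the assumption that the next span's
     * maxItems is the record count of the span before it.
     * Returns the record count of each simulated span.
     */
    static int[] simulate(int firstMaxItems, int spans) {
        int[] lengths = new int[spans];
        int maxItems = firstMaxItems;
        for (int i = 0; i < spans; i++) {
            // Assumption: input is plentiful, so the span fills up to maxItems.
            lengths[i] = maxItems;
            // Assumption: the next span is sized from this span's length.
            maxItems = lengths[i];
        }
        return lengths;
    }

    public static void main(String[] args) {
        // Once maxItems is 1, nothing in this loop can make it grow again.
        System.out.println(Arrays.toString(simulate(1, 5))); // [1, 1, 1, 1, 1]
    }
}
```

Under these assumptions, maxItems=1 is a fixed point: a one-record span can only ever seed another one-record span.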
 
sample logs:

2024-08-19 19:02:28,157 [INFO] [TezChild] |impl.PipelinedSorter|: scope-302 -> scope-308: Span260.length = 1, perItem = 139
2024-08-19 19:02:28,157 [INFO] [TezChild] |impl.PipelinedSorter|: scope-302 -> scope-308: reserved.remaining()=268396925, reserved.metasize=16
2024-08-19 19:02:28,157 [INFO] [TezChild] |impl.PipelinedSorter|: scope-302 -> scope-308: New Span261.length = 1, perItem = 139, counter:5307003
2024-08-19 19:02:28,157 [INFO] [Sorter {scope_302 -> scope_308} #1] |impl.PipelinedSorter|: scope-302 -> scope-308: done sorting span=260, length=1, time=0
2024-08-19 19:02:28,157 [INFO] [TezChild] |impl.PipelinedSorter|: scope-302 -> scope-308: Span261.length = 1, perItem = 128
2024-08-19 19:02:28,157 [INFO] [TezChild] |impl.PipelinedSorter|: scope-302 -> scope-308: reserved.remaining()=268396781, reserved.metasize=16
2024-08-19 19:02:28,157 [INFO] [TezChild] |impl.PipelinedSorter|: scope-302 -> scope-308: New Span262.length = 1, perItem = 128, counter:5307004
2024-08-19 19:02:28,158 [INFO] [Sorter {scope_302 -> scope_308} #0] |impl.PipelinedSorter|: scope-302 -> scope-308: done sorting span=261, length=1, time=0
2024-08-19 19:02:28,158 [INFO] [TezChild] |impl.PipelinedSorter|: scope-302 -> scope-308: Span262.length = 1, perItem = 145
2024-08-19 19:02:28,158 [INFO] [TezChild] |impl.PipelinedSorter|: scope-302 -> scope-308: reserved.remaining()=268396620, reserved.metasize=16
2024-08-19 19:02:28,158 [INFO] [TezChild] |impl.PipelinedSorter|: scope-302 -> scope-308: New Span263.length = 1, perItem = 145, counter:5307005
2024-08-19 19:02:28,158 [INFO] [Sorter {scope_302 -> scope_308} #1] |impl.PipelinedSorter|: scope-302 -> scope-308: done sorting span=262, length=1, time=0
2024-08-19 19:02:28,158 [INFO] [TezChild] |impl.PipelinedSorter|: scope-302 -> scope-308: Span263.length = 1, perItem = 139
2024-08-19 19:02:28,158 [INFO] [TezChild] |impl.PipelinedSorter|: scope-302 -> scope-308: reserved.remaining()=268396465, reserved.metasize=16
2024-08-19 19:02:28,158 [INFO] [TezChild] |impl.PipelinedSorter|: scope-302 -> scope-308: New Span264.length = 1, perItem = 139, counter:5307006
2024-08-19 19:02:28,158 [INFO] [Sorter {scope_302 -> scope_308} #0] |impl.PipelinedSorter|: scope-302 -> scope-308: done sorting span=263, length=1, time=0
2024-08-19 19:02:28,158 [INFO] [TezChild] |impl.PipelinedSorter|: scope-302 -> scope-308: Span264.length = 1, perItem = 129
2024-08-19 19:02:28,158 [INFO] [TezChild] |impl.PipelinedSorter|: scope-302 -> scope-308: reserved.remaining()=268396320, reserved.metasize=16
2024-08-19 19:02:28,158 [INFO] [TezChild] |impl.PipelinedSorter|: scope-302 -> scope-308: New Span265.length = 1, perItem = 129, counter:5307007
2024-08-19 19:02:28,158 [INFO] [Sorter {scope_302 -> scope_308} #1] |impl.PipelinedSorter|: scope-302 -> scope-308: done sorting span=264, length=1, time=0
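
As a sanity check on the logs above, the drop in reserved.remaining() between consecutive spans can be computed directly. Each drop equals that span's perItem plus the 16 bytes of reserved.metasize (for example 128 + 16 = 144), which is consistent with every span holding exactly one record. The sketch below uses only the five values from the logs; the class name and the "spans left" estimate are illustrative, and the estimate assumes the same pattern simply continues.

```java
import java.util.Arrays;

class ReservedRemainingCheck {
    // reserved.remaining() values copied from the sample logs, in order.
    static final long[] REMAINING = {
        268396925L, 268396781L, 268396620L, 268396465L, 268396320L
    };

    /** Bytes consumed between consecutive reserved.remaining() log lines. */
    static long[] deltas(long[] remaining) {
        long[] d = new long[remaining.length - 1];
        for (int i = 1; i < remaining.length; i++) {
            d[i - 1] = remaining[i - 1] - remaining[i];
        }
        return d;
    }

    /** Rough number of further spans if each keeps consuming the average delta. */
    static long estimateSpansLeft(long[] remaining) {
        long[] d = deltas(remaining);
        long total = 0;
        for (long x : d) {
            total += x;
        }
        long last = remaining[remaining.length - 1];
        return last * d.length / total;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(deltas(REMAINING))); // [144, 161, 155, 145]
        System.out.println(estimateSpansLeft(REMAINING));
    }
}
```

With roughly 268 MB still reserved and only about 150 bytes used per span, this pattern would produce well over a million more single-record spans, each sorted separately, which fits the eventual timeout.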
