[jira] [Commented] (TEZ-3732) Reduce Object size of InputAttemptIdentifier and MapOutput for large jobs

Gopal V (JIRA) Tue, 16 May 2017 23:07:36 -0700

    [ https://issues.apache.org/jira/browse/TEZ-3732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16013579#comment-16013579 ]


Gopal V commented on TEZ-3732:
------------------------------

Before:

{code}
org.apache.tez.runtime.library.common.InputAttemptIdentifier object internals:
 OFFSET  SIZE                                                                    TYPE DESCRIPTION                               VALUE
      0     4                                                                         (object header)                           01 00 00 00 (00000001 00000000 00000000 00000000) (1)
      4     4                                                                         (object header)                           00 00 00 00 (00000000 00000000 00000000 00000000) (0)
      8     4                                                                         (object header)                           18 8e 08 00 (00011000 10001110 00001000 00000000) (560664)
     12     4                                                                     int InputAttemptIdentifier.inputIdentifier    0
     16     4                                                                     int InputAttemptIdentifier.attemptNumber      0
     20     4                                                                     int InputAttemptIdentifier.spillEventId       -1
     24     1                                                                 boolean InputAttemptIdentifier.shared             false
     25     3                                                                         (alignment/padding gap)
     28     4                                                        java.lang.String InputAttemptIdentifier.pathComponent      null
     32     4 org.apache.tez.runtime.library.common.InputAttemptIdentifier.SPILL_INFO InputAttemptIdentifier.fetchTypeInfo      (object)
     36     4                                                                         (loss due to the next object alignment)
Instance size: 40 bytes
Space losses: 3 bytes internal + 4 bytes external = 7 bytes total
{code}

{code}

org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput object internals:
 OFFSET  SIZE                                                                        TYPE DESCRIPTION                               VALUE
      0     4                                                                             (object header)                           01 00 00 00 (00000001 00000000 00000000 00000000) (1)
      4     4                                                                             (object header)                           00 00 00 00 (00000000 00000000 00000000 00000000) (0)
      8     4                                                                             (object header)                           18 8e 08 00 (00011000 10001110 00001000 00000000) (560664)
     12     4                                                                         int MapOutput.id                              1
     16     8                                                                        long MapOutput.size                            0
     24     1                                                                     boolean MapOutput.primaryMapOutput                false
     25     3                                                                             (alignment/padding gap)
     28     4 org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput.Type MapOutput.type                            null
     32     4                org.apache.tez.runtime.library.common.InputAttemptIdentifier MapOutput.attemptIdentifier               null
     36     4   org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager MapOutput.merger                          null
     40     4                                                                      byte[] MapOutput.memory                          null
     44     4                           org.apache.hadoop.io.BoundedByteArrayOutputStream MapOutput.byteStream                      null
     48     4                                             org.apache.hadoop.fs.FileSystem MapOutput.localFS                         null
     52     4                                                   org.apache.hadoop.fs.Path MapOutput.tmpOutputPath                   null
     56     4                                              org.apache.hadoop.io.FileChunk MapOutput.outputPath                      null
     60     4                                                        java.io.OutputStream MapOutput.disk                            null
Instance size: 64 bytes
Space losses: 3 bytes internal + 0 bytes external = 3 bytes total
{code}



> Reduce Object size of InputAttemptIdentifier and MapOutput for large jobs
> -------------------------------------------------------------------------
>
>                 Key: TEZ-3732
>                 URL: https://issues.apache.org/jira/browse/TEZ-3732
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Jonathan Eagles
>            Assignee: Jonathan Eagles
>         Attachments: TEZ-3732.1.patch
>
>
> Objects in 64-bit Java take 12 bytes of header plus the member sizes,
> aligned up to 8 bytes.
> InputAttemptIdentifier -> 33 bytes, which gets aligned up to 40 bytes.
> This class is just one byte over the 32-byte alignment boundary. Reducing
> the object size by one byte can save 8 bytes per object.
> This is ~8 MB savings for 1,000,000 inputs and ~80 MB savings for tasks with
> 10,000,000 inputs to fetch (yes, this is a real job).
> MapOutput -> 45 bytes, which gets aligned up to 48 bytes.
> This class can be sub-classed so that no sub-class pays the object size
> cost of fields needed only by the other sub-classes:
> Wait, InMemory and DiskDirect -> 32 bytes
> Disk -> 40 bytes
> Total savings are harder to account for, but more than in the above case.
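
The alignment arithmetic in the description can be checked with a small sketch. It assumes the 12-byte header, 4-byte references and 8-byte alignment that the layouts above show; the class and method names (ObjectSizeSketch, alignUp, savedBytes) are hypothetical and not part of Tez or the attached patch.

```java
// Hypothetical helper, not part of Tez: reproduces the size arithmetic
// from the issue description under the assumptions stated below.
final class ObjectSizeSketch {
    // Assumption: 12-byte object header and 8-byte object alignment,
    // as reported in the layout dumps in this comment.
    static final long HEADER_BYTES = 12;
    static final long ALIGNMENT = 8;

    // Round size up to the next multiple of alignment.
    public static long alignUp(long size, long alignment) {
        return (size + alignment - 1) / alignment * alignment;
    }

    // Bytes saved across count objects when the unaligned size
    // shrinks from before to after.
    public static long savedBytes(long before, long after, long count) {
        return (alignUp(before, ALIGNMENT) - alignUp(after, ALIGNMENT)) * count;
    }

    public static void main(String[] args) {
        // InputAttemptIdentifier before the change:
        // three ints, one boolean, two 4-byte references.
        long fields = 3 * 4 + 1 + 2 * 4;
        long raw = HEADER_BYTES + fields;
        System.out.println(raw + " -> " + alignUp(raw, ALIGNMENT)); // 33 -> 40

        // Dropping a single byte crosses the 32-byte boundary.
        System.out.println(savedBytes(33, 32, 1_000_000L));  // 8000000
        System.out.println(savedBytes(33, 32, 10_000_000L)); // 80000000
    }
}
```

The same alignUp call gives 48 for the 45-byte MapOutput figure quoted in the description.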



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
