[ https://issues.apache.org/jira/browse/TEZ-3732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16014274#comment-16014274 ]

Jonathan Eagles edited comment on TEZ-3732 at 5/17/17 7:48 PM:
---------------------------------------------------------------

After:
{code}
# Running 64-bit HotSpot VM.
# Using compressed oop with 3-bit shift.
# Using compressed klass with 0x0000000800000000 base address and 0-bit shift.
# Objects are 8 bytes aligned.
# Field sizes by type: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
# Array element sizes: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
{code}

{code}
Instantiated the sample instance via public org.apache.tez.runtime.library.common.InputAttemptIdentifier(int,int)

org.apache.tez.runtime.library.common.InputAttemptIdentifier object internals:
 OFFSET  SIZE               TYPE DESCRIPTION                               VALUE
      0     4                    (object header)                           01 00 00 00 (00000001 00000000 00000000 00000000) (1)
      4     4                    (object header)                           00 00 00 00 (00000000 00000000 00000000 00000000) (0)
      8     4                    (object header)                           28 c0 0c 00 (00101000 11000000 00001100 00000000) (835624)
     12     4                int InputAttemptIdentifier.inputIdentifier    0
     16     4                int InputAttemptIdentifier.attemptNumber      0
     20     4                int InputAttemptIdentifier.spillEventId       -1
     24     1            boolean InputAttemptIdentifier.shared             false
     25     1               byte InputAttemptIdentifier.fetchTypeInfo      0
     26     2                    (alignment/padding gap)
     28     4   java.lang.String InputAttemptIdentifier.pathComponent      null
Instance size: 32 bytes
Space losses: 2 bytes internal + 0 bytes external = 2 bytes total
{code}

{code}
Failed to find matching constructor, falling back to class-only introspection.

org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput object internals:
 OFFSET  SIZE                                                                                               TYPE DESCRIPTION                               VALUE
      0    12                                                                                                   (object header)                           N/A
     12     4                                                                                                int MapOutput.id                              N/A
     16     1                                                                                            boolean MapOutput.primaryMapOutput                N/A
     17     3                                                                                                   (alignment/padding gap)
     20     4                                       org.apache.tez.runtime.library.common.InputAttemptIdentifier MapOutput.attemptIdentifier               N/A
     24     4   org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetchedInputAllocatorOrderedGrouped MapOutput.callback                        N/A
     28     4                                                                                                   (loss due to the next object alignment)
Instance size: 32 bytes
Space losses: 3 bytes internal + 4 bytes external = 7 bytes total
{code}

{code}
Instantiated the sample instance via org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$WaitMapOutput(org.apache.tez.runtime.library.common.InputAttemptIdentifier,org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$1)

org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$WaitMapOutput object internals:
 OFFSET  SIZE                                                                                               TYPE DESCRIPTION                               VALUE
      0     4                                                                                                   (object header)                           01 00 00 00 (00000001 00000000 00000000 00000000) (1)
      4     4                                                                                                   (object header)                           00 00 00 00 (00000000 00000000 00000000 00000000) (0)
      8     4                                                                                                   (object header)                           b8 dc 0c 00 (10111000 11011100 00001100 00000000) (842936)
     12     4                                                                                                int MapOutput.id                              1
     16     1                                                                                            boolean MapOutput.primaryMapOutput                false
     17     3                                                                                                   (alignment/padding gap)
     20     4                                       org.apache.tez.runtime.library.common.InputAttemptIdentifier MapOutput.attemptIdentifier               null
     24     4   org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetchedInputAllocatorOrderedGrouped MapOutput.callback                        null
     28     4                                                                                                   (loss due to the next object alignment)
Instance size: 32 bytes
Space losses: 3 bytes internal + 4 bytes external = 7 bytes total
{code}

{code}
Instantiated the sample instance via org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$InMemoryMapOutput(org.apache.tez.runtime.library.common.InputAttemptIdentifier,org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetchedInputAllocatorOrderedGrouped,long,boolean,org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$1)

org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$InMemoryMapOutput object internals:
 OFFSET  SIZE                                                                                               TYPE DESCRIPTION                               VALUE
      0     4                                                                                                   (object header)                           01 00 00 00 (00000001 00000000 00000000 00000000) (1)
      4     4                                                                                                   (object header)                           00 00 00 00 (00000000 00000000 00000000 00000000) (0)
      8     4                                                                                                   (object header)                           70 da 0c 00 (01110000 11011010 00001100 00000000) (842352)
     12     4                                                                                                int MapOutput.id                              2
     16     1                                                                                            boolean MapOutput.primaryMapOutput                false
     17     3                                                                                                   (alignment/padding gap)
     20     4                                       org.apache.tez.runtime.library.common.InputAttemptIdentifier MapOutput.attemptIdentifier               null
     24     4   org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetchedInputAllocatorOrderedGrouped MapOutput.callback                        null
     28     4                                                  org.apache.hadoop.io.BoundedByteArrayOutputStream InMemoryMapOutput.byteStream              (object)
Instance size: 32 bytes
Space losses: 3 bytes internal + 0 bytes external = 3 bytes total
{code}

{code}
Failed to find matching constructor, falling back to class-only introspection.

org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$DiskDirectMapOutput object internals:
 OFFSET  SIZE                                                                                               TYPE DESCRIPTION                               VALUE
      0    12                                                                                                   (object header)                           N/A
     12     4                                                                                                int MapOutput.id                              N/A
     16     1                                                                                            boolean MapOutput.primaryMapOutput                N/A
     17     3                                                                                                   (alignment/padding gap)
     20     4                                       org.apache.tez.runtime.library.common.InputAttemptIdentifier MapOutput.attemptIdentifier               N/A
     24     4   org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetchedInputAllocatorOrderedGrouped MapOutput.callback                        N/A
     28     4                                                                     org.apache.hadoop.io.FileChunk DiskDirectMapOutput.outputPath            N/A
Instance size: 32 bytes
Space losses: 3 bytes internal + 0 bytes external = 3 bytes total
{code}

{code}
Instantiated the sample instance via private org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$DiskMapOutput(org.apache.tez.runtime.library.common.InputAttemptIdentifier,org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetchedInputAllocatorOrderedGrouped,long,org.apache.hadoop.fs.Path,long,boolean,org.apache.hadoop.fs.FileSystem,org.apache.hadoop.fs.Path)

org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$DiskMapOutput object internals:
 OFFSET  SIZE                                                                                               TYPE DESCRIPTION                               VALUE
      0     4                                                                                                   (object header)                           01 00 00 00 (00000001 00000000 00000000 00000000) (1)
      4     4                                                                                                   (object header)                           00 00 00 00 (00000000 00000000 00000000 00000000) (0)
      8     4                                                                                                   (object header)                           78 d5 0c 00 (01111000 11010101 00001100 00000000) (841080)
     12     4                                                                                                int MapOutput.id                              5
     16     1                                                                                            boolean MapOutput.primaryMapOutput                false
     17     3                                                                                                   (alignment/padding gap)
     20     4                                       org.apache.tez.runtime.library.common.InputAttemptIdentifier MapOutput.attemptIdentifier               null
     24     4   org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetchedInputAllocatorOrderedGrouped MapOutput.callback                        null
     28     4                                                                          org.apache.hadoop.fs.Path DiskMapOutput.tmpOutputPath               null
     32     4                                                                     org.apache.hadoop.io.FileChunk DiskMapOutput.outputPath                  (object)
     36     4                                                                               java.io.OutputStream DiskMapOutput.disk                        null
Instance size: 40 bytes
Space losses: 3 bytes internal + 0 bytes external = 3 bytes total
{code}


> Reduce Object size of InputAttemptIdentifier and MapOutput for large jobs
> -------------------------------------------------------------------------
>
>                 Key: TEZ-3732
>                 URL: https://issues.apache.org/jira/browse/TEZ-3732
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Jonathan Eagles
>            Assignee: Jonathan Eagles
>         Attachments: TEZ-3732.1.patch, TEZ-3732.2.patch
>
>
> Objects in 64-bit Java take a 12-byte header plus the size of their members, 
> rounded up to a multiple of 8 bytes.
> InputAttemptIdentifier -> 33 bytes, which gets aligned up to 40 bytes.
> This class is just one byte over the 32-byte alignment boundary. Reducing the 
> object size by one byte saves 8 bytes per object.
> This is ~8 MB savings for 1,000,000 inputs and ~80 MB savings for tasks with 
> 10,000,000 inputs to fetch (yes, this is a real job).
> MapOutput -> 45 bytes, which gets aligned up to 48 bytes.
> This class can be sub-classed so that each sub-class avoids paying the object 
> size cost of the other sub-classes' fields:
> WaitMapOutput, InMemoryMapOutput and DiskDirectMapOutput -> 32 bytes
> DiskMapOutput -> 40 bytes
> Total savings are harder to account for, but exceed the case above.
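
The alignment arithmetic in the description can be sketched in Java as follows. This is only an illustration under stated assumptions: the names ObjectSizeSketch, alignUp and savedBytes are hypothetical and not part of Tez or JOL, the 8-byte alignment is the one reported by the JOL output in this comment, and the unaligned sizes (33, 32, 45 bytes) are the figures quoted in the description.

```java
public class ObjectSizeSketch {
    // Assumption: 8-byte object alignment, as reported by the JOL output
    // ("Objects are 8 bytes aligned").
    static final long ALIGNMENT_BYTES = 8;

    // Round a raw object size up to the next multiple of the alignment.
    static long alignUp(long sizeBytes, long alignmentBytes) {
        return (sizeBytes + alignmentBytes - 1) / alignmentBytes * alignmentBytes;
    }

    // Heap bytes saved across `count` objects when the unaligned size
    // shrinks from `beforeBytes` to `afterBytes`.
    static long savedBytes(long beforeBytes, long afterBytes, long count) {
        long before = alignUp(beforeBytes, ALIGNMENT_BYTES);
        long after = alignUp(afterBytes, ALIGNMENT_BYTES);
        return (before - after) * count;
    }

    public static void main(String[] args) {
        // InputAttemptIdentifier: 33 bytes unaligned occupies 40 bytes.
        System.out.println(alignUp(33, ALIGNMENT_BYTES));
        // Trimming a single byte brings it down to 32 bytes.
        System.out.println(alignUp(32, ALIGNMENT_BYTES));
        // MapOutput: 45 bytes unaligned occupies 48 bytes.
        System.out.println(alignUp(45, ALIGNMENT_BYTES));
        // Savings for 1,000,000 inputs: 8,000,000 bytes, i.e. ~8 MB.
        System.out.println(savedBytes(33, 32, 1_000_000L));
    }
}
```

Because sizes are rounded up to the alignment, a one-byte reduction only pays off when it crosses an 8-byte boundary, which is the case for InputAttemptIdentifier here.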


