[ https://issues.apache.org/jira/browse/TEZ-3732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16014274#comment-16014274 ]
Jonathan Eagles edited comment on TEZ-3732 at 5/17/17 7:48 PM:
---------------------------------------------------------------
After:
{code}
# Running 64-bit HotSpot VM.
# Using compressed oop with 3-bit shift.
# Using compressed klass with 0x0000000800000000 base address and 0-bit shift.
# Objects are 8 bytes aligned.
# Field sizes by type: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
# Array element sizes: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
{code}
{code}
Instantiated the sample instance via public org.apache.tez.runtime.library.common.InputAttemptIdentifier(int,int)

org.apache.tez.runtime.library.common.InputAttemptIdentifier object internals:
 OFFSET  SIZE  TYPE DESCRIPTION  VALUE
      0     4  (object header)  01 00 00 00 (00000001 00000000 00000000 00000000) (1)
      4     4  (object header)  00 00 00 00 (00000000 00000000 00000000 00000000) (0)
      8     4  (object header)  28 c0 0c 00 (00101000 11000000 00001100 00000000) (835624)
     12     4  int InputAttemptIdentifier.inputIdentifier  0
     16     4  int InputAttemptIdentifier.attemptNumber  0
     20     4  int InputAttemptIdentifier.spillEventId  -1
     24     1  boolean InputAttemptIdentifier.shared  false
     25     1  byte InputAttemptIdentifier.fetchTypeInfo  0
     26     2  (alignment/padding gap)
     28     4  java.lang.String InputAttemptIdentifier.pathComponent  null
Instance size: 32 bytes
Space losses: 2 bytes internal + 0 bytes external = 2 bytes total
{code}
{code}
Failed to find matching constructor, falling back to class-only introspection.

org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput object internals:
 OFFSET  SIZE  TYPE DESCRIPTION  VALUE
      0    12  (object header)  N/A
     12     4  int MapOutput.id  N/A
     16     1  boolean MapOutput.primaryMapOutput  N/A
     17     3  (alignment/padding gap)
     20     4  org.apache.tez.runtime.library.common.InputAttemptIdentifier MapOutput.attemptIdentifier  N/A
     24     4  org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetchedInputAllocatorOrderedGrouped MapOutput.callback  N/A
     28     4  (loss due to the next object alignment)
Instance size: 32 bytes
Space losses: 3 bytes internal + 4 bytes external = 7 bytes total
{code}
{code}
Instantiated the sample instance via org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$WaitMapOutput(org.apache.tez.runtime.library.common.InputAttemptIdentifier,org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$1)

org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$WaitMapOutput object internals:
 OFFSET  SIZE  TYPE DESCRIPTION  VALUE
      0     4  (object header)  01 00 00 00 (00000001 00000000 00000000 00000000) (1)
      4     4  (object header)  00 00 00 00 (00000000 00000000 00000000 00000000) (0)
      8     4  (object header)  b8 dc 0c 00 (10111000 11011100 00001100 00000000) (842936)
     12     4  int MapOutput.id  1
     16     1  boolean MapOutput.primaryMapOutput  false
     17     3  (alignment/padding gap)
     20     4  org.apache.tez.runtime.library.common.InputAttemptIdentifier MapOutput.attemptIdentifier  null
     24     4  org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetchedInputAllocatorOrderedGrouped MapOutput.callback  null
     28     4  (loss due to the next object alignment)
Instance size: 32 bytes
Space losses: 3 bytes internal + 4 bytes external = 7 bytes total
{code}
{code}
Instantiated the sample instance via org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$InMemoryMapOutput(org.apache.tez.runtime.library.common.InputAttemptIdentifier,org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetchedInputAllocatorOrderedGrouped,long,boolean,org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$1)

org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$InMemoryMapOutput object internals:
 OFFSET  SIZE  TYPE DESCRIPTION  VALUE
      0     4  (object header)  01 00 00 00 (00000001 00000000 00000000 00000000) (1)
      4     4  (object header)  00 00 00 00 (00000000 00000000 00000000 00000000) (0)
      8     4  (object header)  70 da 0c 00 (01110000 11011010 00001100 00000000) (842352)
     12     4  int MapOutput.id  2
     16     1  boolean MapOutput.primaryMapOutput  false
     17     3  (alignment/padding gap)
     20     4  org.apache.tez.runtime.library.common.InputAttemptIdentifier MapOutput.attemptIdentifier  null
     24     4  org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetchedInputAllocatorOrderedGrouped MapOutput.callback  null
     28     4  org.apache.hadoop.io.BoundedByteArrayOutputStream InMemoryMapOutput.byteStream  (object)
Instance size: 32 bytes
Space losses: 3 bytes internal + 0 bytes external = 3 bytes total
{code}
{code}
Failed to find matching constructor, falling back to class-only introspection.

org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$DiskDirectMapOutput object internals:
 OFFSET  SIZE  TYPE DESCRIPTION  VALUE
      0    12  (object header)  N/A
     12     4  int MapOutput.id  N/A
     16     1  boolean MapOutput.primaryMapOutput  N/A
     17     3  (alignment/padding gap)
     20     4  org.apache.tez.runtime.library.common.InputAttemptIdentifier MapOutput.attemptIdentifier  N/A
     24     4  org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetchedInputAllocatorOrderedGrouped MapOutput.callback  N/A
     28     4  org.apache.hadoop.io.FileChunk DiskDirectMapOutput.outputPath  N/A
Instance size: 32 bytes
Space losses: 3 bytes internal + 0 bytes external = 3 bytes total
{code}
{code}
Instantiated the sample instance via private org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$DiskMapOutput(org.apache.tez.runtime.library.common.InputAttemptIdentifier,org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetchedInputAllocatorOrderedGrouped,long,org.apache.hadoop.fs.Path,long,boolean,org.apache.hadoop.fs.FileSystem,org.apache.hadoop.fs.Path)

org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$DiskMapOutput object internals:
 OFFSET  SIZE  TYPE DESCRIPTION  VALUE
      0     4  (object header)  01 00 00 00 (00000001 00000000 00000000 00000000) (1)
      4     4  (object header)  00 00 00 00 (00000000 00000000 00000000 00000000) (0)
      8     4  (object header)  78 d5 0c 00 (01111000 11010101 00001100 00000000) (841080)
     12     4  int MapOutput.id  5
     16     1  boolean MapOutput.primaryMapOutput  false
     17     3  (alignment/padding gap)
     20     4  org.apache.tez.runtime.library.common.InputAttemptIdentifier MapOutput.attemptIdentifier  null
     24     4  org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetchedInputAllocatorOrderedGrouped MapOutput.callback  null
     28     4  org.apache.hadoop.fs.Path DiskMapOutput.tmpOutputPath  null
     32     4  org.apache.hadoop.io.FileChunk DiskMapOutput.outputPath  (object)
     36     4  java.io.OutputStream DiskMapOutput.disk  null
Instance size: 40 bytes
Space losses: 3 bytes internal + 0 bytes external = 3 bytes total
{code}
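As a sanity check on the "Instance size" and "Space losses" lines in the dumps, here is a minimal sketch (not part of the patch) that recomputes them from the OFFSET/SIZE rows. The class name `LayoutLosses` and the method `summarize` are made up for illustration; the rows are copied from the MapOutput and DiskMapOutput dumps with the padding rows left out, and the 8-byte alignment comes from the VM details at the top.

```java
// Illustrative sketch only: recomputes JOL-style size and loss figures from {offset, size} rows.
final class LayoutLosses {

    /**
     * Returns {instanceSize, internalLoss, externalLoss} for rows of {offset, size}.
     * Rows must be sorted by offset and must not include padding entries.
     */
    static int[] summarize(int[][] rows, int alignment) {
        int end = 0;       // first byte after the last field seen so far
        int internal = 0;  // bytes skipped between consecutive fields
        for (int[] row : rows) {
            internal += row[0] - end;
            end = row[0] + row[1];
        }
        // Round the end of the last field up to the next multiple of the alignment.
        int instanceSize = (end + alignment - 1) / alignment * alignment;
        return new int[] {instanceSize, internal, instanceSize - end};
    }

    public static void main(String[] args) {
        // Header plus fields of MapOutput, as listed in the dump.
        int[][] mapOutput = {{0, 12}, {12, 4}, {16, 1}, {20, 4}, {24, 4}};
        // Header plus fields of MapOutput$DiskMapOutput, as listed in the dump.
        int[][] diskMapOutput = {{0, 12}, {12, 4}, {16, 1}, {20, 4}, {24, 4}, {28, 4}, {32, 4}, {36, 4}};

        System.out.println(java.util.Arrays.toString(summarize(mapOutput, 8)));     // [32, 3, 4]
        System.out.println(java.util.Arrays.toString(summarize(diskMapOutput, 8))); // [40, 3, 0]
    }
}
```

Both results agree with the dumps: MapOutput loses 3 bytes to the gap after the boolean and 4 bytes to trailing alignment, while DiskMapOutput ends exactly on an 8-byte boundary.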
> Reduce Object size of InputAttemptIdentifier and MapOutput for large jobs
> -------------------------------------------------------------------------
>
> Key: TEZ-3732
> URL: https://issues.apache.org/jira/browse/TEZ-3732
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Jonathan Eagles
> Assignee: Jonathan Eagles
> Attachments: TEZ-3732.1.patch, TEZ-3732.2.patch
>
>
> Objects in 64-bit Java take 12 bytes of header plus the size of their members,
> rounded up to a multiple of 8 bytes.
> InputAttemptIdentifier -> 33 bytes, which gets aligned up to 40 bytes.
> This class is just one byte over the 32-byte alignment boundary. Reducing the
> object size by one byte saves 8 bytes per object.
> This is ~8 MB of savings for 1,000,000 inputs and ~80 MB of savings for tasks with
> 10,000,000 inputs to fetch (yes, this is a real job).
> MapOutput -> 45 bytes, which gets aligned up to 48 bytes.
> This class can be split into sub-classes so that no sub-class pays the object size
> cost of fields used only by the other sub-classes:
> Wait, InMemory, and DiskDirect -> 32 bytes
> Disk -> 40 bytes
> Total savings are harder to account for, but are larger than in the above case.
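
The arithmetic in the description can be sketched as follows. This is a rough model under the stated assumptions (12-byte header, 8-byte alignment, no internal padding gaps), not a replacement for measuring with a layout tool; the class and method names (`ObjectSizeEstimate`, `alignUp`, `instanceSize`, `bytesSaved`) are hypothetical and do not come from Tez.

```java
// Illustrative sketch only: models "12 bytes + member size, aligned to 8 bytes".
final class ObjectSizeEstimate {
    static final long HEADER_BYTES = 12;   // header size assumed in the description
    static final long ALIGNMENT_BYTES = 8; // object alignment assumed in the description

    /** Rounds size up to the next multiple of alignment (alignment must be a power of two). */
    static long alignUp(long size, long alignment) {
        return (size + alignment - 1) & ~(alignment - 1);
    }

    /** Estimated instance size for a given total of member bytes, ignoring internal padding gaps. */
    static long instanceSize(long memberBytes) {
        return alignUp(HEADER_BYTES + memberBytes, ALIGNMENT_BYTES);
    }

    /** Total bytes saved over objectCount instances when the unaligned size shrinks. */
    static long bytesSaved(long unalignedBefore, long unalignedAfter, long objectCount) {
        long perObject = alignUp(unalignedBefore, ALIGNMENT_BYTES) - alignUp(unalignedAfter, ALIGNMENT_BYTES);
        return perObject * objectCount;
    }

    public static void main(String[] args) {
        System.out.println(alignUp(33, ALIGNMENT_BYTES));    // 40: InputAttemptIdentifier at 33 bytes
        System.out.println(alignUp(32, ALIGNMENT_BYTES));    // 32: one byte smaller
        System.out.println(alignUp(45, ALIGNMENT_BYTES));    // 48: MapOutput at 45 bytes
        System.out.println(instanceSize(21));                // 40: 12 + 21 = 33, rounded up
        System.out.println(bytesSaved(33, 32, 1_000_000L));  // 8000000 bytes, roughly 8 MB
        System.out.println(bytesSaved(33, 32, 10_000_000L)); // 80000000 bytes, roughly 80 MB
    }
}
```

Because of the rounding step, a one-byte reduction only pays off when it crosses an 8-byte boundary, which is why going from 33 to 32 bytes saves a full 8 bytes per object.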