shimingfei created SPARK-7389:
---------------------------------

             Summary: Tachyon integration improvement
                 Key: SPARK-7389
                 URL: https://issues.apache.org/jira/browse/SPARK-7389
             Project: Spark
          Issue Type: Improvement
          Components: Block Manager
            Reporter: shimingfei


Two main changes:

1. Add two functions to ExternalBlockManager, putValues and getValues, 
because an implementation may not rely on putBytes and getBytes.

2. Improve Tachyon integration.
Currently, when putting data into Tachyon, Spark first serializes all data in 
one partition into a ByteBuffer and then writes it into Tachyon. This uses a 
lot of memory and increases GC overhead.

When getting data from Tachyon, getValues depends on getBytes, which also reads 
all data into an on-heap byte array and results in high memory usage.
This PR changes the approach of the two functions so that they read and write 
data as a stream to reduce memory usage.
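
The difference between the two approaches can be sketched roughly as follows. 
This is a minimal illustration in plain Java, not the code from the PR: the 
class name, the method signatures, and the use of java.io object streams are 
all assumptions made for the example, and an in-memory stream stands in for a 
Tachyon file stream. putBuffered mirrors the current behaviour (serialize the 
whole partition on the heap, then write), while putValues and getValues handle 
one record at a time.

```java
import java.io.*;
import java.util.*;

// Hypothetical sketch; names and signatures are not Spark's or Tachyon's API.
final class StreamingBlockSketch {

    // Serializes every record to the given sink, one record at a time.
    private static void writeAll(Iterator<? extends Serializable> values,
                                 OutputStream sink) throws IOException {
        ObjectOutputStream out = new ObjectOutputStream(sink);
        while (values.hasNext()) {
            out.writeObject(values.next());
            out.reset(); // drop the stream's references to already-written objects
        }
        out.flush();
    }

    // Buffered approach: the whole partition is held on the heap as bytes
    // before anything reaches the external store.
    static void putBuffered(Iterator<? extends Serializable> values,
                            OutputStream store) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        writeAll(values, buffer);
        store.write(buffer.toByteArray());
        store.flush();
    }

    // Streaming approach: records go to the store's stream through a small
    // fixed-size buffer, so heap usage does not grow with the partition size.
    static void putValues(Iterator<? extends Serializable> values,
                          OutputStream store) throws IOException {
        writeAll(values, new BufferedOutputStream(store));
    }

    // Streaming read: records are deserialized lazily as the iterator advances.
    static Iterator<Object> getValues(InputStream store) throws IOException {
        final ObjectInputStream in =
            new ObjectInputStream(new BufferedInputStream(store));
        return new Iterator<Object>() {
            private Object pending;
            private boolean loaded = false;
            private boolean done = false;

            @Override public boolean hasNext() {
                if (!loaded && !done) {
                    try {
                        pending = in.readObject();
                        loaded = true;
                    } catch (EOFException end) {
                        done = true; // end of stream: no more records
                    } catch (IOException | ClassNotFoundException e) {
                        throw new IllegalStateException(e);
                    }
                }
                return loaded;
            }

            @Override public Object next() {
                if (!hasNext()) throw new NoSuchElementException();
                loaded = false;
                Object value = pending;
                pending = null;
                return value;
            }
        };
    }

    // Writes the values with putValues and reads them back with getValues.
    static List<Object> roundTrip(List<? extends Serializable> values) {
        try {
            // In-memory stand-in for a stream backed by the external store.
            ByteArrayOutputStream store = new ByteArrayOutputStream();
            putValues(values.iterator(), store);
            List<Object> result = new ArrayList<>();
            Iterator<Object> it =
                getValues(new ByteArrayInputStream(store.toByteArray()));
            while (it.hasNext()) result.add(it.next());
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(Arrays.asList("a", "b", "c")));
    }
}
```

In this sketch both write paths produce the same bytes; only the peak heap 
usage differs, because the streaming path never materializes the serialized 
partition as a single array.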

In our testing, when the data size is huge, this patch reduces GC time by 
about 30% and full GC time by about 70%, and total execution time drops by 
about 10%.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
