xuchuanyin created CARBONDATA-2018:
--------------------------------------

             Summary: Optimization in reading/writing for sort temp row during data loading
                 Key: CARBONDATA-2018
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-2018
             Project: CarbonData
          Issue Type: Improvement
          Components: data-load
    Affects Versions: 1.3.0
            Reporter: xuchuanyin
            Assignee: xuchuanyin
             Fix For: 1.3.0


# SCENARIO

Currently, in the sort step of CarbonData data loading, records are partially 
sorted and spilled to disk. CarbonData then reads these records back and 
merge-sorts them.
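
The flow above can be sketched roughly as follows. This is only an 
illustration, not CarbonData's actual code: the class and method names are 
hypothetical, rows are reduced to single int sort keys, and in-memory arrays 
stand in for the files that would be spilled to disk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch of "sort partially, spill, then merge". */
final class SpillMergeSketch {

  /**
   * Cuts the input into chunks of at most chunkSize (must be > 0) and sorts
   * each chunk on its own. Each sorted chunk stands in for one spilled file.
   */
  static List<int[]> sortedRuns(int[] input, int chunkSize) {
    List<int[]> runs = new ArrayList<>();
    for (int from = 0; from < input.length; from += chunkSize) {
      int to = Math.min(from + chunkSize, input.length);
      int[] run = Arrays.copyOfRange(input, from, to);
      Arrays.sort(run);
      runs.add(run);
    }
    return runs;
  }

  /** K-way merge of the sorted runs using a min-heap. */
  static int[] merge(List<int[]> runs) {
    // Heap entry layout: {value, index of the run, offset inside the run}.
    PriorityQueue<int[]> heap =
        new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
    int total = 0;
    for (int r = 0; r < runs.size(); r++) {
      int[] run = runs.get(r);
      total += run.length;
      if (run.length > 0) {
        heap.add(new int[] {run[0], r, 0});
      }
    }
    int[] out = new int[total];
    int i = 0;
    while (!heap.isEmpty()) {
      int[] head = heap.poll();
      out[i++] = head[0];
      // Refill the heap from the run that just supplied the smallest value.
      int[] run = runs.get(head[1]);
      int next = head[2] + 1;
      if (next < run.length) {
        heap.add(new int[] {run[next], head[1], next});
      }
    }
    return out;
  }
}
```

In a real loader every value pulled from a run would be a full row read back 
from disk, which is where the serialization cost discussed below comes in.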

Since the sort step is CPU-intensive, we can optimize the serialization and 
deserialization of these rows when writing and reading them, reducing the CPU 
time spent on parsing the rows.

This should enhance the data loading performance.

# RESOLVE
We can pick out the fields of each row that do not take part in sorting, pack 
them into a single byte array, and skip parsing them.
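
A minimal sketch of this idea, under stated assumptions: the class 
`SortTempRowSketch`, its `TempRow` layout (int sort keys plus one opaque byte 
array) and the example fields (a string and a double) are all hypothetical 
and do not reflect CarbonData's real row format. The point is only that the 
non-sort fields are serialized once and afterwards copied as raw bytes, so 
reading a row back for the merge parses nothing but the sort keys.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Hypothetical sketch of a sort temp row with packed non-sort fields. */
final class SortTempRowSketch {

  /** A row split into parsed sort keys and an opaque blob for the rest. */
  static final class TempRow {
    final int[] sortKeys;       // fields compared during the merge sort
    final byte[] packedNoSort;  // remaining fields, already serialized

    TempRow(int[] sortKeys, byte[] packedNoSort) {
      this.sortKeys = sortKeys;
      this.packedNoSort = packedNoSort;
    }
  }

  /** Serializes the non-sort fields once, before the row is first spilled. */
  static byte[] packNoSortFields(String name, double measure)
      throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bos);
    out.writeUTF(name);
    out.writeDouble(measure);
    out.flush();
    return bos.toByteArray();
  }

  /** Writes the sort keys field by field and the packed part as raw bytes. */
  static void write(TempRow row, DataOutputStream out) throws IOException {
    out.writeInt(row.sortKeys.length);
    for (int key : row.sortKeys) {
      out.writeInt(key);
    }
    out.writeInt(row.packedNoSort.length);
    out.write(row.packedNoSort);
  }

  /** Parses only the sort keys; the packed bytes are copied, not decoded. */
  static TempRow read(DataInputStream in) throws IOException {
    int[] keys = new int[in.readInt()];
    for (int i = 0; i < keys.length; i++) {
      keys[i] = in.readInt();
    }
    byte[] packed = new byte[in.readInt()];
    in.readFully(packed);
    return new TempRow(keys, packed);
  }

  /** Orders rows by their sort keys only; the packed bytes are never read. */
  static int compare(TempRow a, TempRow b) {
    int n = Math.min(a.sortKeys.length, b.sortKeys.length);
    for (int i = 0; i < n; i++) {
      int c = Integer.compare(a.sortKeys[i], b.sortKeys[i]);
      if (c != 0) {
        return c;
      }
    }
    return Integer.compare(a.sortKeys.length, b.sortKeys.length);
  }
}
```

With this layout the packed fields would only need to be decoded once, after 
the final merge, instead of on every read of a spilled row.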

# RESULT

I've tested it on my cluster and observed a performance gain of about 8% 
(74MB/s/Node -> 81MB/s/Node).



