[ 
https://issues.apache.org/jira/browse/CARBONDATA-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacky Li resolved CARBONDATA-2018.
----------------------------------
    Resolution: Fixed

> Optimization in reading/writing for sort temp row during data loading
> ---------------------------------------------------------------------
>
>                 Key: CARBONDATA-2018
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-2018
>             Project: CarbonData
>          Issue Type: Improvement
>          Components: data-load
>    Affects Versions: 1.3.0
>            Reporter: xuchuanyin
>            Assignee: xuchuanyin
>            Priority: Major
>             Fix For: 1.4.0
>
>          Time Spent: 13h 50m
>  Remaining Estimate: 0h
>
> # SCENARIO
> Currently, in the sort step of CarbonData data loading, records are sorted 
> in partial batches and spilled to disk. CarbonData then reads these records 
> back and merge-sorts them.
> Because the sort step is CPU-intensive, we can optimize the 
> serialization/deserialization of these rows when writing and reading them, 
> reducing the CPU spent on parsing rows.
> This should improve data loading performance.
> # RESOLVE
> We can pick out the fields of the row that are not sort columns, pack them 
> into a single byte array, and skip parsing them.
> # RESULT
> I tested it on my cluster and saw a performance gain of about 8% 
> (74MB/s/Node -> 81MB/s/Node).
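
The idea in the RESOLVE section can be illustrated with a minimal sketch. This is not CarbonData's actual code: the row layout (one integer sort key plus a tuple of other fields), the function names, and the use of pickle and in-memory buffers in place of sort temp files are all assumptions made for illustration. It shows the non-sort fields being packed once into a length-prefixed byte blob at spill time, carried through the merge without being decoded, and unpacked only at the end.

```python
import heapq
import io
import pickle
import struct

def write_row(out, sort_key, payload):
    """Write the int sort key, then the payload as a length-prefixed opaque blob."""
    out.write(struct.pack(">iI", sort_key, len(payload)))
    out.write(payload)

def read_rows(buf):
    """Yield (sort_key, payload_bytes); the payload is never decoded here."""
    while True:
        header = buf.read(8)
        if len(header) < 8:
            return
        sort_key, size = struct.unpack(">iI", header)
        yield sort_key, buf.read(size)

def spill(rows):
    """Sort one chunk of (sort_key, other_fields) rows and write it to a buffer."""
    out = io.BytesIO()  # hypothetical stand-in for a sort temp file on disk
    for sort_key, other_fields in sorted(rows, key=lambda r: r[0]):
        # Non-sort fields are packed once, at write time.
        write_row(out, sort_key, pickle.dumps(other_fields))
    out.seek(0)
    return out

def merge_spills(spills):
    """Merge sorted spills comparing only the sort key; unpack payloads last."""
    merged = heapq.merge(*(read_rows(s) for s in spills), key=lambda r: r[0])
    return [(key, pickle.loads(blob)) for key, blob in merged]

chunk_a = [(3, ("c", 3.0)), (1, ("a", 1.0))]
chunk_b = [(2, ("b", 2.0)), (4, ("d", 4.0))]
result = merge_spills([spill(chunk_a), spill(chunk_b)])
```

In this sketch the merge only ever unpacks the fixed 8-byte header of each row, so the per-row cost during the merge sort does not grow with the number of non-sort fields.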


