Hi

Thank you for starting this discussion.
This proposal is about improving data update performance, right?
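For Solution 1 in particular, the two getUpdateDeltaFilesList calls differ only in the file extension, so a single listing pass could serve both. Below is a rough, hypothetical sketch in plain Java, not the actual CarbonData API; the class, method, and the ".deltadata"/".deleteindex" extensions are invented stand-ins for the real constants in CarbonCommonConstants:

```java
import java.util.*;

// Hypothetical sketch: instead of calling getUpdateDeltaFilesList once per
// file extension (each call walking the segment path), list the files once
// and bucket them by extension in a single pass.
public class MergedDeltaFileScan {

    // Groups file names by the extensions we care about, in one traversal.
    // Files whose extension is not in 'wantedExts' are ignored.
    public static Map<String, List<String>> groupByExtension(
            List<String> allSegmentFiles, Set<String> wantedExts) {
        Map<String, List<String>> grouped = new HashMap<>();
        for (String ext : wantedExts) {
            grouped.put(ext, new ArrayList<>());
        }
        for (String file : allSegmentFiles) {          // single pass
            for (String ext : wantedExts) {
                if (file.endsWith(ext)) {
                    grouped.get(ext).add(file);
                    break;
                }
            }
        }
        return grouped;
    }

    public static void main(String[] args) {
        // ".deltadata" / ".deleteindex" stand in for UPDATE_DELTA_FILE_EXT /
        // UPDATE_INDEX_FILE_EXT; the real values live in CarbonCommonConstants.
        List<String> files = Arrays.asList(
                "part-0-1.deltadata", "part-0-1.deleteindex", "part-0-1.carbondata");
        Map<String, List<String>> grouped = groupByExtension(
                files, new HashSet<>(Arrays.asList(".deltadata", ".deleteindex")));
        System.out.println(grouped.get(".deltadata"));   // [part-0-1.deltadata]
        System.out.println(grouped.get(".deleteindex")); // [part-0-1.deleteindex]
    }
}
```

The idea is simply to walk allSegmentFiles once and bucket by extension, rather than walking the segment path once per extension.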

Regards
Liang


Linwood wrote
> *[Background]*
> Update operations clean up delta files before the update itself (see
> cleanUpDeltaFiles(carbonTable, false)), which traverses the metadata path
> and the segment paths many times. When there are too many files, this
> overhead grows and the update takes longer.
> 
> *[Motivation & Goal]*
> During the update process, reduce the repeated traversals, or move
> cleanUpDeltaFiles out into a separate step.
> 
> *[Modification]*
> Here are some possible solutions.
> 
> Solution 1:
> 
> cleanUpDeltaFiles makes several very similar file-listing calls, such as
> updateStatusManager.getUpdateDeltaFilesList(segment, false,
> CarbonCommonConstants.UPDATE_DELTA_FILE_EXT, true, allSegmentFiles, true)
> and updateStatusManager.getUpdateDeltaFilesList(segment, false,
> CarbonCommonConstants.UPDATE_INDEX_FILE_EXT, true, allSegmentFiles, true).
> They differ only in the file extension, yet each traverses the segment
> path, so the path is walked twice. We can merge them into a single pass.
> 
> Solution 2:
> 
> Building on Solution 1, use Spark or MapReduce to distribute the cleanup
> work across other nodes.
> 
> Solution 3:
> 
> Submit cleanUpDeltaFiles as a separate task and run it in the early
> morning or when the cluster is not busy.
> 
> Solution 4:
> 
> Establish a garbage collection bin that exposes interfaces for our
> program to decide when files enter the bin and how they are eventually
> dealt with.
> 
> Please vote on these solutions.
> 
> Best Regards,
> LinWood
> 
> 
> 
> --
> Sent from:
> http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
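Regarding Solution 4 above, one way to picture the "garbage collection bin" is a two-phase delete: files are first moved into a bin and only removed after a retention period, so the update path never scans and deletes them inline. The sketch below is an in-memory illustration only; every name in it is invented and nothing like this exists in CarbonData yet:

```java
import java.util.*;

// Hypothetical sketch of Solution 4's garbage collection bin. Files enter
// the bin with a timestamp; a later purge pass deletes only those that have
// exceeded the retention period. All names here are for illustration.
public class TrashBin {
    // file path -> time it entered the bin (millis)
    private final Map<String, Long> binned = new HashMap<>();

    // The program decides when a file enters the bin.
    public void collect(String filePath, long nowMillis) {
        binned.put(filePath, nowMillis);
    }

    // And decides how binned files are finally dealt with: purge
    // everything that has sat in the bin at least 'retentionMillis'.
    public List<String> purge(long nowMillis, long retentionMillis) {
        List<String> purged = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = binned.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            if (nowMillis - e.getValue() >= retentionMillis) {
                purged.add(e.getKey());
                it.remove();
            }
        }
        return purged;
    }

    public int size() { return binned.size(); }
}
```

In a real system, collect would rename the file under a trash directory and purge would physically delete it; the point is that the expensive deletion no longer blocks the update itself.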




