[
https://issues.apache.org/jira/browse/CARBONDATA-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17221222#comment-17221222
]
Yahui Liu commented on CARBONDATA-4044:
---------------------------------------
I have faced this issue and am trying to solve it. Currently we call
CarbonLoaderUtil.checkAndCreateCarbonDataLocation to check for and create the
Segment_XXX folder (if it does not exist), but we do not check whether stale
data exists in the segment folder when the Segment_XXX folder already exists.
My idea is to always remove the Segment_XXX folder before creating it again,
which ensures there is no stale data. Is this solution valid for all
cases? Please share any ideas. Thanks.
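
To make the proposal concrete, here is a minimal sketch of "remove, then
recreate" on a local filesystem. The class and method names
(SegmentDirSketch, recreateSegmentDir) are hypothetical and not part of
CarbonData; the real change would go through CarbonData's own file
abstraction rather than java.nio.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SegmentDirSketch {
    // Hypothetical helper (not a CarbonData API): deletes an existing
    // segment folder together with any stale files, then creates it empty.
    static Path recreateSegmentDir(Path segmentDir) throws IOException {
        if (Files.exists(segmentDir)) {
            List<Path> paths;
            try (Stream<Path> walk = Files.walk(segmentDir)) {
                // Reverse order puts children before their parent directory.
                paths = walk.sorted(Comparator.reverseOrder())
                            .collect(Collectors.toList());
            }
            for (Path p : paths) {
                Files.delete(p);
            }
        }
        return Files.createDirectories(segmentDir);
    }
}
```

One case worth checking with this approach: if another task can still be
writing into the same Segment_XXX folder, an unconditional delete would also
remove its valid files.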
> Fix dirty data in indexfile while IUD with stale data in segment folder
> -----------------------------------------------------------------------
>
> Key: CARBONDATA-4044
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4044
> Project: CarbonData
> Issue Type: Bug
> Reporter: Xingjun Hao
> Priority: Major
> Time Spent: 20m
> Remaining Estimate: 0h
>
> XX.mergecarbonindex and XX.segment record the list of index files of a segment.
> Currently, we generate xx.mergeindexfile and xx.segment by selecting all
> index files (both carbonindex and mergecarbonindex) in the segment folder,
> which leads to dirty data when there is stale data in the segment folder.
> For example, suppose there is a stale index file in the segment_0 folder,
> "0_1603763776.carbonindex".
> While loading, a new carbonindex file "0_16037752342.carbonindex" is written.
> When merging carbonindex files, we expect to merge only
> "0_16037752342.carbonindex", but if we select all carbonindex files in the
> segment folder, both "0_1603763776.carbonindex" and
> "0_16037752342.carbonindex" will be merged and recorded in the segment file.
>
> The same problem occurs while updating.
>
>
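
The behavior the description expects (merge only the index files written by
the current load) can be sketched as a filter on the file name. This assumes
index file names end in "_<timestamp>.carbonindex", as in the two names
quoted above; the class and method names are hypothetical and not part of
CarbonData.

```java
import java.util.List;
import java.util.stream.Collectors;

public class IndexFileFilterSketch {
    // Hypothetical helper (not a CarbonData API): keeps only the index
    // files whose name carries the timestamp of the current load.
    static List<String> indexFilesForLoad(List<String> fileNames,
                                          String loadTimestamp) {
        String suffix = "_" + loadTimestamp + ".carbonindex";
        return fileNames.stream()
                .filter(name -> name.endsWith(suffix))
                .collect(Collectors.toList());
    }
}
```

With the two file names from the example and the load timestamp
16037752342, only "0_16037752342.carbonindex" is kept, so the stale file is
not merged.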