Re: [Discussion]Do we still need to support carbon.merge.index.in.segment property ?

David CaiQiang Thu, 09 Jul 2020 19:45:32 -0700

Merging the index files should be part of loading. Extracting index merging
into an independent process is not a good idea: it caused query issues (the
system could not find the index files during or after the merge).


In my opinion, the new .carbonindex files written during loading should be
temporary. We should merge them into a .carbonindexmerge file in the segment
before updating the segment status to success in the tablestatus file.
If merging the index fails, the load should fail.
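A minimal sketch of this commit rule, for illustration only. The merge function, the MergeIndexError exception and the status strings are hypothetical and are not CarbonData's actual APIs or tablestatus values:

```python
class MergeIndexError(Exception):
    """Raised by a (hypothetical) merge step when index files cannot be merged."""


def commit_segment(segment_id, temp_index_files, tablestatus, merge):
    """Mark a segment successful only after its temporary index files are merged.

    `merge` is a caller-supplied (hypothetical) function that turns the list of
    temporary .carbonindex files into a .carbonindexmerge file name, or raises
    MergeIndexError. Status strings are illustrative only.
    """
    try:
        merge_file = merge(temp_index_files)
    except MergeIndexError:
        # Merging failed, so the whole load fails and the segment never
        # becomes visible to queries as a successful segment.
        tablestatus[segment_id] = "FAILED"
        raise
    # Only now is the segment recorded as successful.
    tablestatus[segment_id] = "SUCCESS"
    return merge_file
```

Because the success status is written only after the merge returns, a query never sees a "successful" segment whose index files are still being merged.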

For query:
1. Support reading both .carbonindex files and .carbonindexmerge files.
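One way a reader could handle a segment that contains both kinds of index file is sketched below. It assumes, purely for illustration, that the names of the .carbonindex files covered by each .carbonindexmerge file are already known (passed in as a dict); the function name and that input are hypothetical:

```python
def index_files_to_read(segment_files, merge_contents):
    """Pick the index files a query should read from one segment directory.

    segment_files: file names found in the segment directory.
    merge_contents: maps each .carbonindexmerge name to the .carbonindex names
    it contains (an assumed, already-parsed view of the merge file).
    """
    # Every merge file is read.
    merge_files = sorted(f for f in segment_files if f.endswith(".carbonindexmerge"))
    # .carbonindex files already covered by a merge file are skipped.
    covered = {name for m in merge_files for name in merge_contents.get(m, [])}
    loose = sorted(
        f for f in segment_files
        if f.endswith(".carbonindex") and f not in covered
    )
    return merge_files + loose
```

For a segment with 0.carbonindex, 1.carbonindex and 2.carbonindex plus a merge file that covers only the first two, this reads the merge file and 2.carbonindex.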

For loading (this also covers the loading part of compaction, create
index, create mv and merge operations), it would be better to do it like this:
step 1. Update the tablestatus file to add an in-progress segment.
step 2. Generate the carbondata files and temporary .carbonindex files. For a
partitioned table, also generate a temporary segment file for each
related partition.
step 3. Merge the .carbonindex files into a .carbonindexmerge file.
step 4. Write a segment file. For a partitioned table, merge all the temporary
segment files into one segment file.
step 5. Update the tablestatus file with the final status, the segment file
name and some statistics.

So in total, a load will:
 update the tablestatus file twice,
 write the segment file once,
 write the .carbonindexmerge file once,
 write and delete the .carbonindex files once.
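The five load steps and the resulting totals can be sketched as a simulation that records file operations as strings instead of doing real I/O. Everything here is hypothetical: the function name, the file-naming scheme, and the simplification that each task writes exactly one carbondata file and one .carbonindex file:

```python
from collections import Counter


def load_segment(segment_id, num_tasks, partitions=()):
    """Simulate the proposed five-step load (hypothetical sketch).

    Simplification: each task writes one carbondata file and one .carbonindex
    file. Returns (tablestatus, Counter of recorded operations).
    """
    ops = []
    tablestatus = {}

    # Step 1: add an in-progress segment to tablestatus.
    tablestatus[segment_id] = {"status": "IN_PROGRESS"}
    ops.append("update_tablestatus")

    # Step 2: write carbondata files and temporary .carbonindex files;
    # a partitioned table also writes a temporary segment file per partition.
    temp_index_files = [f"{segment_id}_{t}.carbonindex" for t in range(num_tasks)]
    ops += ["write_carbondata"] * num_tasks
    ops += ["write_carbonindex"] * len(temp_index_files)
    ops += ["write_temp_segment_file"] * len(partitions)

    # Step 3: merge the temporary index files into one .carbonindexmerge file,
    # then delete the temporary .carbonindex files.
    merge_file = f"{segment_id}.carbonindexmerge"
    ops.append("write_carbonindexmerge")
    ops += ["delete_carbonindex"] * len(temp_index_files)

    # Step 4: write one segment file (for a partitioned table, this is where
    # the temporary per-partition segment files would be merged).
    segment_file = f"{segment_id}.segment"
    ops.append("write_segment_file")

    # Step 5: record the final status, segment file name and some statistics.
    tablestatus[segment_id] = {
        "status": "SUCCESS",
        "segment_file": segment_file,
        "index_file": merge_file,
        "data_files": num_tasks,
    }
    ops.append("update_tablestatus")
    return tablestatus, Counter(ops)
```

For example, load_segment("2", num_tasks=3, partitions=["p=1", "p=2"]) records two tablestatus updates, one segment file write, one .carbonindexmerge write, and three .carbonindex files each written and deleted once.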

For updating:
1. Currently, only the update operation can keep .carbonindex files.
In the future, maybe we can change the update operation to work the same way
as the merge operation, generating new files into a new segment.



-----
Best Regards
David Cai
--
Sent from: 
http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
