Hi,
1. Correct.
   One carbondata file is the same as one block. One block has many blocklets
as well as one file footer, which holds the metadata (B-tree index) of the
blocklets.
   One load creates one segment, and one segment has many blocks.
2. Carbon sorts the dimension column data within one blocklet, so the
original row order is lost. Carbon therefore stores the dimension column data
together with the row ids.
   When the dimension column data is sorted, the row ids are reordered
correspondingly, so the mapping between each value and its row (like an
array: index => data) is preserved.
   At query time, carbon first finds the matching dimension column values
(based on the filter), then uses that mapping to get the row ids.
   Then, based on the row ids, we can get the measure data.
   That is why this column data is called an inverted index: it maps
data => index, not index => data.
3. Yes.
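To make answers 1 and 3 concrete, here is a minimal Python sketch of the hierarchy: a segment holds blocks, each block is one carbondata file made of blocklets plus one footer, and the blocklets are kept in MDK-key order. All class and method names are hypothetical, the MDK key is modelled as a tuple of ints, and the footer's B-tree index is simplified (an assumption) to each blocklet's start/end key. The real on-disk format is quite different.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical, simplified model for illustration only; these are not
# CarbonData's real classes, and the on-disk layout is very different.
# An MDK (multi-dimensional key) is modelled as a tuple of dimension keys.
MdkKey = Tuple[int, ...]


@dataclass
class Blocklet:
    rows: List[MdkKey]  # assumed already sorted by MDK key

    @property
    def start_key(self) -> MdkKey:
        return self.rows[0]

    @property
    def end_key(self) -> MdkKey:
        return self.rows[-1]


@dataclass
class FileFooter:
    # Assumption: the blocklet index is reduced to each blocklet's
    # (start_key, end_key) pair.
    blocklet_index: List[Tuple[MdkKey, MdkKey]]


@dataclass
class Block:
    """One carbondata file == one block: many blocklets plus one footer."""
    blocklets: List[Blocklet]
    footer: FileFooter = field(init=False)

    def __post_init__(self) -> None:
        self.footer = FileFooter([(b.start_key, b.end_key) for b in self.blocklets])

    def candidate_blocklets(self, key: MdkKey) -> List[int]:
        """Indexes of blocklets whose [start_key, end_key] range contains key."""
        index = self.footer.blocklet_index
        starts = [start for start, _ in index]
        # Blocklets are stored in MDK order, so a binary search finds the
        # last blocklet starting at or before `key`; earlier blocklets can
        # only match when duplicate keys span a blocklet boundary.
        j = bisect_right(starts, key) - 1
        found = []
        while j >= 0 and index[j][1] >= key:
            found.append(j)
            j -= 1
        return found[::-1]


@dataclass
class Segment:
    """One data load produces one segment, which holds many blocks."""
    load_id: int
    blocks: List[Block]


seg = Segment(load_id=0, blocks=[
    Block([Blocklet([(1, 1), (1, 2)]), Blocklet([(2, 1), (3, 5)])]),
])
print(seg.blocks[0].candidate_blocklets((2, 1)))  # [1]
print(seg.blocks[0].candidate_blocklets((1, 5)))  # [] (falls between blocklets)
```

In this sketch, the ordering of blocklets (the "yes" to question 3) is what lets the footer lookup be a binary search rather than a scan over every blocklet.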
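The inverted-index idea from answer 2 can be sketched for a single blocklet as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, plain strings stand in for the dictionary-encoded values carbon actually stores, only an equality filter is shown, and everything lives in in-memory lists rather than the file format. Sorting the (value, row id) pairs together is what keeps the data => index mapping.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple


def build_inverted_index(dim_column: List[str]) -> Tuple[List[str], List[int]]:
    """Sort one blocklet's dimension column, keeping each value's row id."""
    pairs = sorted((value, row_id) for row_id, value in enumerate(dim_column))
    sorted_values = [value for value, _ in pairs]
    row_ids = [row_id for _, row_id in pairs]  # reordered along with the values
    return sorted_values, row_ids


def filter_equals(sorted_values: List[str], row_ids: List[int], target: str) -> List[int]:
    """Binary-search the sorted column, then map the hits back to row ids."""
    lo = bisect_left(sorted_values, target)
    hi = bisect_right(sorted_values, target)
    return sorted(row_ids[lo:hi])


# Hypothetical blocklet: one dimension column and one measure column,
# both in the original (load) row order.
city = ["Shenzhen", "Beijing", "Shenzhen", "Shanghai", "Beijing"]
sales = [10, 20, 30, 40, 50]

values, ids = build_inverted_index(city)
print(values)  # ['Beijing', 'Beijing', 'Shanghai', 'Shenzhen', 'Shenzhen']
print(ids)     # [1, 4, 3, 0, 2]

matching = filter_equals(values, ids, "Shenzhen")
print(matching)                      # [0, 2]
print([sales[r] for r in matching])  # [10, 30]
```

The filter touches only the sorted dimension column; the measure column is read just for the row ids that matched.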




------------------ Original Message ------------------
From: "weijie tong";<[email protected]>;
Sent: Friday, October 21, 2016, 4:01
To: "dev"<[email protected]>;
Subject: questions about carbondata



1. What's the relationship between these terms: carbondata file, block,
blocklet, and carbondata file footer? Once we run a batch job to load data
into a carbondata table, does that mean the table files will be composed of
different blocks, and each block is a carbondata file composed of many
blocklets and one FileFooter, according to the carbondata file format?

2. How is the column data stored as an inverted index? What is the dim column
data inverted to? How does the inverted index affect a query?

3. Are all the blocklets stored in order according to the sorted MDK key?

I hope someone can give a detailed answer.
