Thanks for the reply. On point 3, I would still like to know whether the blocklets of all the blocks are stored in sequence according to the sorted MDK key. If so, the globally ordered MDK key of the carbon table would behave the way an HBase rowkey does. Or is the sequence local to each block, with the block index file managing the block-level index?

On Fri, Oct 21, 2016 at 5:48 PM, 杰 <[email protected]> wrote:

> Hi,
> 1. Correct.
>    One carbon file is the same as one block. One block has many blocklets
>    as well as one file footer, which holds the metadata (btree index) of
>    the blocklets.
>    One load makes one segment, and one segment has many blocks.
> 2. Carbon sorts the dim column data within one blocklet, so the original
>    row sequence is lost. Carbon therefore stores the dim column data
>    together with the row id: the dim column data is sorted and the row id
>    sequence changes correspondingly, so the matchup (like Array: index =>
>    data) is kept.
>    When querying, carbon first gets the expected dim column data (based
>    on the filter), then gets the row ids according to the matchup.
>    Then, based on the row ids, we can get the measure data.
>    So the column data is called an inverted index, which means data =>
>    index, not index => data.
> 3. Yes.
>
> ------------------ Original Message ------------------
> From: "weijie tong";<[email protected]>;
> Sent: Friday, October 21, 2016, 4:01 PM
> To: "dev"<[email protected]>;
> Subject: questions about carbondata
>
> 1. What is the relationship between these terms: carbondata file, block,
> blocklet, carbondata file footer? Once we have a batch job to load data
> into a carbondata table, does that mean the table file will be composed
> of different blocks, with each block being a carbondata file that is
> composed of many blocklets and one FileFooter, according to the
> carbondata file format?
>
> 2. How is the column data stored as an inverted index? The dim column
> data is inverted to what? How does the inverted index affect a query?
>
> 3. Are all the blocklets stored in sequence according to the sorted mdk
> key?
>
> Hope someone can give a detailed answer.
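
To check my understanding of point 2 in the quoted reply, here is a minimal Python sketch of the inverted-index idea as described there. It is only an illustration, not CarbonData's actual code or on-disk layout: the function names, the `city` and `sales` columns, and the use of plain Python lists are all assumptions made for the example.

```python
from bisect import bisect_left, bisect_right


def build_inverted_index(dim_values):
    """Sort one blocklet's dimension column, keeping each value's original row id.

    Hypothetical helper: returns (sorted_values, row_ids), where row_ids[i]
    is the original row position of sorted_values[i].
    """
    pairs = sorted(zip(dim_values, range(len(dim_values))))
    sorted_values = [value for value, _ in pairs]
    row_ids = [row_id for _, row_id in pairs]
    return sorted_values, row_ids


def filter_equals(sorted_values, row_ids, target):
    """Binary-search the sorted column, then map the hits back to row ids."""
    lo = bisect_left(sorted_values, target)
    hi = bisect_right(sorted_values, target)
    return sorted(row_ids[lo:hi])


# One blocklet with a dimension column and a measure column (made-up data).
city = ["paris", "berlin", "paris", "oslo", "berlin"]
sales = [10, 20, 30, 40, 50]

sorted_city, city_row_ids = build_inverted_index(city)

# Filter on the dimension first, then use the row ids to fetch the measure.
matching_rows = filter_equals(sorted_city, city_row_ids, "paris")
paris_sales = [sales[row_id] for row_id in matching_rows]
```

In this sketch the sorted column plus the row id array plays the role of the "data => index" matchup: the filter runs on the sorted values, and the measure column is only touched for the row ids that survive the filter.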
