Hi,

1. Correct. One CarbonData file is the same as one block. One block contains many blocklets plus one file footer, which holds the metadata (a B-tree index) of those blocklets. One load creates one segment, and one segment contains many blocks.

2. Within one blocklet, CarbonData sorts the data of each dimension column, so the original row order is lost. To keep track of it, CarbonData stores the dimension column data together with the row IDs. When the dimension values are sorted, the row IDs are reordered along with them, so the correspondence between each value and its original row (the index => data mapping of an array) is preserved. At query time, CarbonData first finds the matching dimension values (based on the filter), then uses this correspondence to get their row IDs, and finally uses those row IDs to fetch the measure data. That is why this column data is called an inverted index: it maps data => index, not index => data.

3. Yes, the blocklets are stored in order of the sorted MDK key.
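To make point 2 concrete, here is a minimal Python sketch of the idea. It is not CarbonData's code: it assumes one blocklet's dimension column is a plain in-memory list of strings and its measure column a list of ints, ignores the real on-disk format, encoding and compression, and the function names (`build_inverted_index`, `rows_matching`, `query_measure`) are hypothetical. Sorting (value, row ID) pairs keeps the value => row ID mapping, so an equality filter can binary-search the sorted values and then read measures by row ID.

```python
# Hypothetical illustration of an inverted index for one blocklet;
# not CarbonData's API or on-disk format.
from bisect import bisect_left, bisect_right
from typing import List, Tuple


def build_inverted_index(dim_column: List[str]) -> Tuple[List[str], List[int]]:
    """Sort a dimension column and carry each value's original row ID along."""
    # Pair each value with its row ID, then sort by value; equal values keep
    # ascending row-ID order because the row ID is the second sort key.
    pairs = sorted((value, row_id) for row_id, value in enumerate(dim_column))
    sorted_values = [value for value, _ in pairs]
    row_ids = [row_id for _, row_id in pairs]
    return sorted_values, row_ids


def rows_matching(sorted_values: List[str], row_ids: List[int], target: str) -> List[int]:
    """Equality filter: binary-search the sorted values, return original row IDs."""
    lo = bisect_left(sorted_values, target)
    hi = bisect_right(sorted_values, target)
    return sorted(row_ids[lo:hi])


def query_measure(dim_column: List[str], measure_column: List[int], target: str) -> List[int]:
    """Filter on the dimension, then use the row IDs to fetch measure values."""
    sorted_values, row_ids = build_inverted_index(dim_column)
    return [measure_column[r] for r in rows_matching(sorted_values, row_ids, target)]


if __name__ == "__main__":
    city = ["Shenzhen", "Beijing", "Shanghai", "Beijing", "Shenzhen"]  # dimension
    sales = [10, 20, 30, 40, 50]                                      # measure
    print(build_inverted_index(city))
    # (['Beijing', 'Beijing', 'Shanghai', 'Shenzhen', 'Shenzhen'], [1, 3, 2, 0, 4])
    print(query_measure(city, sales, "Beijing"))  # [20, 40]
```

Because the values are sorted, the filter needs only a binary search rather than a scan of the whole column; the cost is storing the extra row-ID array next to the data.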
------------------ Original Message ------------------
From: "weijie tong" <[email protected]>;
Sent: Friday, October 21, 2016, 4:01
To: "dev" <[email protected]>;
Subject: questions about carbondata

1. What's the relationship between these terms: carbondata file, block, blocklet, carbondata file footer? Once we have a batch job to load data into a carbondata table, does that mean the table file will be composed of different blocks, and each block is a carbondata file composed of many blocklets and one FileFooter, according to the carbondata file format?
2. How is the column data stored as an inverted index? What is the dim column data inverted to? How does the inverted index affect a query?
3. Are all the blocklets stored in sequence according to the sorted MDK key?

Hope someone can give a detailed answer.
