+1

Regards,
JB
On Dec 10, 2016, at 09:33, "bill.zhou" <[email protected]> wrote:
>+1, this modification will help in all scenarios.
>
>Kumar Vishal wrote
>> Hello All,
>>
>> Improving Carbon first-time query performance
>>
>> Reason:
>> 1. When the file system cache has been cleared, file reads are slower
>> because the data must be read again and cached.
>> 2. On the first query, Carbon has to read the footer from each data file
>> to build the B-tree.
>> 3. Carbon reads more footer data than it requires (the data chunk).
>> 4. Carbon performs many random seeks because column data (data page,
>> RLE, inverted index) is not stored together.
>>
>> Solution:
>> 1. Improve block loading time by removing the data chunk from
>> blockletInfo and storing only the offset and length of each data chunk.
>> 2. Compress the presence meta bitset stored for null values of measure
>> columns using Snappy.
>> 3. Store the metadata and data of a column together and read them
>> together; this reduces random seeks and improves IO.
>>
>> For this, I am planning to change the CarbonData Thrift format.
>>
>> *Old format*
>>
>> *New format*
>>
>> Please vote and comment on this new format change.
>>
>> -Regards
>> Kumar Vishal
>
>--
>View this message in context:
>http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Discussion-Please-vote-and-comment-for-carbon-data-file-format-change-tp2491p4049.html
>Sent from the Apache CarbonData Mailing List archive mailing list archive at Nabble.com.
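Solution points 1 and 3 in the proposal can be illustrated with a minimal sketch. It assumes a simplified layout that is NOT the actual CarbonData Thrift format: the footer index keeps only an (offset, length) pair per column chunk, and each chunk stores its metadata header immediately followed by its data, so one seek and one read fetch a whole column. The names `ChunkRef`, `write_file`, and `read_column`, and the 4-byte length prefix, are hypothetical and chosen only for illustration. The Snappy compression of the null bitset (point 2) is left out because Snappy is not in the Python standard library.

```python
from __future__ import annotations

import io
import struct
from dataclasses import dataclass


@dataclass
class ChunkRef:
    # Hypothetical footer entry: only where a column chunk starts and how long it is.
    offset: int
    length: int


def write_file(columns: dict) -> tuple:
    """Write each column as [meta_len][meta][data] contiguously.

    Returns the file bytes plus a footer index of offsets and lengths,
    instead of embedding the full chunk metadata in the footer.
    """
    buf = io.BytesIO()
    footer = {}
    for name, (meta, data) in columns.items():
        start = buf.tell()
        buf.write(struct.pack(">I", len(meta)))  # hypothetical 4-byte big-endian meta length
        buf.write(meta)
        buf.write(data)
        footer[name] = ChunkRef(start, buf.tell() - start)
    return buf.getvalue(), footer


def read_column(f, ref: ChunkRef) -> tuple:
    """Fetch a column's metadata and data with a single seek and read."""
    f.seek(ref.offset)
    chunk = f.read(ref.length)
    (meta_len,) = struct.unpack(">I", chunk[:4])
    return chunk[4:4 + meta_len], chunk[4 + meta_len:]


# Usage: two columns; metadata such as RLE or inverted-index info sits next to the data.
content, footer = write_file({
    "city": (b"rle;inverted-index", b"BEIJING|SHANGHAI"),
    "sales": (b"presence-bitset", b"\x00\x01\x02"),
})
meta, data = read_column(io.BytesIO(content), footer["sales"])
```

In this sketch the footer stays small because it holds only integer pairs, which mirrors the motivation in point 3 of the Reason list; the real format change may differ in how offsets, lengths, and chunk headers are encoded.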
