Hi Sujith,

Please see my comments inline.
Best Regards,
Aniket

On Sun, Nov 20, 2016 at 9:11 PM, sujith chacko wrote:

> Hi Aniket,
>
> It's a well-documented design. I just want to know a few points:
>
> a. Format of the RowID and its datatype

AA>> The following format can be used to represent a unique row ID:
[<Segment ID><Block ID><Blocklet ID><Offset in Blocklet>]
A simple way would be to use the String data type and store the row IDs in a
text file. A more efficient option, as a further optimization, would be to use
bitsets/bitmaps. Compressed bitmaps such as Roaring bitmaps can be used for
better performance and more efficient storage.

> b. Impact of this feature on select queries: every time, the query process
> has to exclude each deleted record and include the corresponding updated
> record. Is any optimization being considered to tackle this query
> performance issue, since performance is one of the major highlights of
> Carbon?

AA>> Some of the optimizations would be to cache the deltas to avoid
recurrent I/O, to store sorted row IDs in the delete delta for efficient
lookup, and to perform regular compaction to minimize the impact on select
query performance. Additionally, we may have to explore ways to trigger
compaction automatically, for example, if more than 25% of rows are read
from deltas. Please feel free to share if you have any ideas or suggestions.

> Thanks,
> Sujith
>
> On Nov 20, 2016 9:24 PM, "Aniket Adnaik" wrote:
>
> > Hi All,
> >
> > Please find a design doc for Update/Delete support in CarbonData.
> >
> > https://drive.google.com/file/d/0B71_EuXTdDi8S2dxVjN6Z1RhWlU/view?usp=sharing
> >
> > Best Regards,
> > Aniket
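For illustration only, here is a minimal Python sketch of the row ID format from the answer to (a), together with a per-blocklet delete delta whose offsets are kept sorted so a lookup is a binary search. The "/" delimiter, the integer field types, and the names RowId, BlockletDeleteDelta and load_delete_deltas are hypothetical and not taken from the design doc. The int-based bitset only shows the uncompressed bitmap idea; Roaring bitmaps would need a separate library.

```python
from bisect import bisect_left
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

# Hypothetical delimiter: the thread fixes only the field order, not the encoding.
SEP = "/"


@dataclass(frozen=True, order=True)
class RowId:
    """[<Segment ID><Block ID><Blocklet ID><Offset in Blocklet>]"""
    segment_id: int
    block_id: int
    blocklet_id: int
    offset: int

    def to_text(self) -> str:
        # String form, e.g. one row ID per line in a text delete-delta file.
        return SEP.join(str(v) for v in (self.segment_id, self.block_id,
                                         self.blocklet_id, self.offset))

    @staticmethod
    def from_text(text: str) -> "RowId":
        seg, blk, blt, off = (int(p) for p in text.split(SEP))
        return RowId(seg, blk, blt, off)


class BlockletDeleteDelta:
    """Deleted offsets of one blocklet, kept sorted for O(log n) lookups."""

    def __init__(self, offsets: Iterable[int]):
        self._offsets = sorted(set(offsets))

    def is_deleted(self, offset: int) -> bool:
        # Binary search in the sorted offsets.
        i = bisect_left(self._offsets, offset)
        return i < len(self._offsets) and self._offsets[i] == offset

    def to_bitset(self) -> int:
        # Uncompressed bitset in a Python int: bit k is set iff offset k is deleted.
        bits = 0
        for off in self._offsets:
            bits |= 1 << off
        return bits


def load_delete_deltas(lines: Iterable[str]) -> Dict[Tuple[int, int, int], BlockletDeleteDelta]:
    """Group text row IDs by (segment, block, blocklet) into sorted deltas."""
    grouped = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        rid = RowId.from_text(line)
        grouped[(rid.segment_id, rid.block_id, rid.blocklet_id)].append(rid.offset)
    return {key: BlockletDeleteDelta(offs) for key, offs in grouped.items()}
```

Grouping by blocklet means a scan only has to consult the delta for the blocklet it is reading, and the sorted list (or bitset) keeps each membership test cheap.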

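A rough sketch of the select-path merge discussed under (b): base rows whose offsets are in the delete delta are skipped, new row versions are read from the update delta, and the share of returned rows that came from deltas is compared against the 25% example threshold for automatic compaction. It assumes an update is recorded as a delete of the old offset plus a new row in the update delta, and it measures the ratio over returned rows; both are assumptions, as are the names scan_blocklet, ScanStats and needs_compaction.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Sequence

# Example threshold from the thread; a real trigger would presumably be configurable.
AUTO_COMPACTION_RATIO = 0.25


@dataclass
class ScanStats:
    rows_returned: int = 0
    rows_from_delta: int = 0

    def delta_ratio(self) -> float:
        if self.rows_returned == 0:
            return 0.0
        return self.rows_from_delta / self.rows_returned


def scan_blocklet(base_rows: Sequence[object],
                  is_deleted: Callable[[int], bool],
                  update_delta_rows: Iterable[object],
                  stats: ScanStats) -> Iterator[object]:
    """Yield the visible rows of one blocklet.

    Assumption: an update is stored as a delete of the old offset plus the
    new version of the row in the update delta.
    """
    for offset, row in enumerate(base_rows):
        if is_deleted(offset):
            continue  # deleted, or superseded by an updated version
        stats.rows_returned += 1
        yield row
    for row in update_delta_rows:
        stats.rows_returned += 1
        stats.rows_from_delta += 1
        yield row


def needs_compaction(stats: ScanStats,
                     threshold: float = AUTO_COMPACTION_RATIO) -> bool:
    """True when more than `threshold` of the returned rows came from deltas."""
    return stats.delta_ratio() > threshold
```

For example, scanning a four-row blocklet with offsets 1 and 3 deleted (passing a set's __contains__ as is_deleted) and one updated row returns three rows, one of them from a delta (about 33%), so needs_compaction returns True.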