Hi Manish,

Yes, I agree; we'll have to include the partition ID if we start supporting partitioning in the future. There may be other options as well, such as making the segment ID unique enough to include the partition ID as part of it. On a side note, we may also need a transaction ID if we start supporting transaction semantics in the future.

Best Regards,
Aniket

On Mon, Nov 21, 2016 at 4:00 AM, manish gupta <[email protected]> wrote:

> Hi Aniket,
>
> I think the RowID format should also include the partition ID. Currently
> Carbon does not support partitioning, but when we support it going
> forward, this format would comply with it.
>
> [<Partition ID><Segment ID><Block ID><Blocklet ID><Offset in Blocklet>]
>
> Regards
> Manish Gupta
>
> On Mon, Nov 21, 2016 at 1:07 PM, Aniket Adnaik <[email protected]> wrote:
>
> > Hi Sujith,
> >
> > Please see my comments inline.
> >
> > Best Regards,
> > Aniket
> >
> > On Sun, Nov 20, 2016 at 9:11 PM, sujith chacko <[email protected]> wrote:
> >
> > > Hi Aniket,
> > >
> > > It's a well-documented design; I just want to know a few points, like:
> > >
> > > a. Format of the RowID and its datatype
> >
> > AA>> The following format can be used to represent a unique RowID:
> >
> > [<Segment ID><Block ID><Blocklet ID><Offset in Blocklet>]
> >
> > A simple way would be to use the String data type and store it as a text
> > file. However, a more efficient way could be to use bitsets/bitmaps as a
> > further optimization. Compressed bitmaps such as Roaring bitmaps can be
> > used for better performance and efficient storage.
> >
> > > b. Impact of this feature on select queries: every time, the query
> > > process has to exclude each deleted record and include the corresponding
> > > updated record. Is any optimization being considered to tackle the query
> > > performance issue, since one of the major highlights of Carbon is
> > > performance?
> >
> > AA>> Some of the optimizations would be to cache the deltas to avoid
> > recurrent I/O, to store sorted RowIDs in the delete delta for efficient
> > lookup, and to perform regular compaction to minimize the impact on select
> > query performance. Additionally, we may have to explore ways to perform
> > compaction automatically, for example, if more than 25% of rows are read
> > from deltas. Please feel free to share if you have any ideas or suggestions.
> >
> > > Thanks,
> > > Sujith
> > >
> > > On Nov 20, 2016 9:24 PM, "Aniket Adnaik" <[email protected]> wrote:
> > >
> > > > Hi All,
> > > >
> > > > Please find a design doc for Update/Delete support in CarbonData.
> > > >
> > > > https://drive.google.com/file/d/0B71_EuXTdDi8S2dxVjN6Z1RhWlU/view?usp=sharing
> > > >
> > > > Best Regards,
> > > > Aniket
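
To make the ideas in this thread concrete, here is a rough Python sketch of the RowID layout (with the proposed partition ID in front), a delete delta kept sorted for binary-search lookup, and the "more than 25% of rows read from deltas" compaction trigger. It is only an illustration, not CarbonData code: the names `RowId`, `DeleteDelta`, and `needs_compaction`, the integer ID fields, and the "/" separator in the text form are all assumptions made for this example.

```python
from bisect import bisect_left
from typing import NamedTuple


class RowId(NamedTuple):
    """Hypothetical RowID; field order follows the format proposed in the thread."""
    partition_id: int   # assumption: IDs are plain integers
    segment_id: int
    block_id: int
    blocklet_id: int
    offset: int         # offset of the row within its blocklet

    def to_text(self) -> str:
        # The "simple way": a string that can be stored in a text file.
        # The "/" separator is an assumption, not a CarbonData convention.
        return "/".join(str(part) for part in self)

    @classmethod
    def from_text(cls, text: str) -> "RowId":
        return cls(*(int(part) for part in text.split("/")))


class DeleteDelta:
    """Holds deleted RowIDs in sorted order so a lookup is a binary search."""

    def __init__(self, deleted):
        # Tuples compare field by field, so sorting groups rows by
        # partition, then segment, block, blocklet, and offset.
        self._sorted_ids = sorted(set(deleted))

    def is_deleted(self, row_id: RowId) -> bool:
        i = bisect_left(self._sorted_ids, row_id)
        return i < len(self._sorted_ids) and self._sorted_ids[i] == row_id


def needs_compaction(rows_from_deltas: int, total_rows: int,
                     threshold: float = 0.25) -> bool:
    """True when more than `threshold` of the rows read came from deltas."""
    return total_rows > 0 and rows_from_deltas / total_rows > threshold


if __name__ == "__main__":
    delta = DeleteDelta([RowId(0, 1, 0, 2, 17), RowId(0, 1, 0, 0, 5)])
    print(delta.is_deleted(RowId(0, 1, 0, 0, 5)))
    print(needs_compaction(rows_from_deltas=30, total_rows=100))
```

A sorted list is used here only because it is the simplest structure that gives logarithmic lookup; the bitmap approach mentioned above (for example, Roaring bitmaps keyed by offset within a blocklet) would likely be more compact in practice.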
