Hi Aniket,

I agree with Vimal's opinion, but that use case will be rare.

I have one query about the update and delete feature: when will compaction start after each update or delete operation?

-Regards
Kumar Vishal

On Thu, Nov 24, 2016 at 12:05 AM, Aniket Adnaik <[email protected]> wrote:

> Hi Vimal,
>
> Thanks for your suggestions.
> For the 1st point, I tend to agree with Manish's comments. But it's worth looking into different ways to optimize the performance. I guess query performance may take priority over update performance. Basically, we may need a better compaction approach to merge delta files into regular carbon files to maintain adequate performance.
> For the 2nd point, CarbonData will support updating multiple rows, but not the same row multiple times in a single update operation. The join condition in the sub-select of the original update statement can produce multiple rows from the source table for the same row in the target table. This is an ambiguous condition, and the common ways to resolve it are to error out, to apply the first matching row, or to apply the last matching row. CarbonData will choose to error out and let the user resolve the ambiguity, which is the safer/standard choice.
>
> Best Regards,
> Aniket
>
> On Wed, Nov 23, 2016 at 4:54 AM, manish gupta <[email protected]> wrote:
>
> > Hi Vimal,
> >
> > I have a few queries regarding the 1st suggestion.
> >
> > 1. Dimensions can be either dictionary or no-dictionary columns. If we update the dictionary file, then we will have to maintain 2 flows: one for dictionary columns and one for no-dictionary columns. Will that be OK?
> >
> > 2. We write dictionary files in append mode. Updating a dictionary file would amount to completely rewriting it, which would also modify the dictionary metadata and sort index file. Otherwise, some other approach would need to be followed, such as maintaining an update delta mapping for the dictionary file.
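The error-out semantics Aniket describes for the 2nd point can be illustrated with a minimal Python sketch. It assumes the sub-select join has already produced (target RowID, new column values) pairs. `plan_update`, `AmbiguousUpdateError`, and the "segment/block/blocklet/offset" RowID strings are hypothetical names for illustration, not CarbonData APIs. Several distinct target rows are accepted. A target row matched more than once causes the whole update to be rejected, rather than resolved by taking the first or last match.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical: (target RowID string, new column values) pairs from the join.
Match = Tuple[str, Dict[str, object]]


class AmbiguousUpdateError(Exception):
    """Raised when one target row is matched by more than one source row."""


def plan_update(matches: Iterable[Match]) -> Dict[str, Dict[str, object]]:
    """Collapse join output into exactly one update per target row.

    Many target rows may be updated, but a target row that appears more
    than once is ambiguous, so the whole update is rejected instead of
    silently applying the first or last matching source row.
    """
    pairs: List[Match] = list(matches)
    counts = Counter(rowid for rowid, _ in pairs)
    ambiguous = sorted(rowid for rowid, n in counts.items() if n > 1)
    if ambiguous:
        raise AmbiguousUpdateError(
            f"target rows matched by multiple source rows: {ambiguous}"
        )
    return {rowid: values for rowid, values in pairs}


if __name__ == "__main__":
    print(plan_update([("0/0/0/5", {"dept": "dept2"}), ("0/0/0/7", {"dept": "dept2"})]))
    try:
        plan_update([("0/0/0/5", {"dept": "dept2"}), ("0/0/0/5", {"dept": "dept3"})])
    except AmbiguousUpdateError as err:
        print("rejected:", err)
```

The sketch rejects duplicates even when the matching source rows carry identical values. That is the strictest reading of "let the user resolve the ambiguity"; an implementation could choose to relax it.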
> >
> > Regards
> > Manish Gupta
> >
> > On Wed, Nov 23, 2016 at 10:47 AM, Vimal Das Kammath <[email protected]> wrote:
> >
> > > Hi Aniket,
> > >
> > > The design looks sound and the documentation is great.
> > > I have a few suggestions.
> > >
> > > 1) Measure update vs. dimension update: take a dimension update where, for example, the user wants to change dept1 to dept2 for all users who are under dept1. Can we just update the dictionary for faster performance?
> > > 2) Update semantics (one matching record vs. multiple matching records): I could not understand this section. I wanted to confirm whether we will support one update statement updating multiple rows.
> > >
> > > -Vimal
> > >
> > > On Tue, Nov 22, 2016 at 2:30 PM, Liang Chen <[email protected]> wrote:
> > >
> > > > Hi Aniket,
> > > >
> > > > Thank you for finishing the good design document. A couple of inputs from my side:
> > > >
> > > > 1. Please add the info mentioned below (RowID definition etc.) to the design document as well.
> > > > 2. On page 6: "Schema change operation can run in parallel with Update or Delete operations, but not with another schema change operation". Can you explain this item?
> > > > 3. Please unify the terminology: use "CarbonData" instead of "Carbon", and use a single term for "destination table" and "target table".
> > > > 4. Is the Update operation's delete delta the same as the Delete operation's delete delta?
> > > >
> > > > BTW, it would be much better if you could provide a Google Doc for review next time; it is really difficult to comment on PDF documents :)
> > > >
> > > > Regards
> > > > Liang
> > > >
> > > > Aniket Adnaik wrote
> > > > > Hi Sujith,
> > > > >
> > > > > Please see my comments inline.
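Vimal's idea of a dictionary-only dimension update can be modelled together with the "update delta mapping" alternative Manish mentions. The model is an append-only dictionary plus a remap of surrogate keys. The Python sketch below is hypothetical: `DictionaryWithRemap` is not a CarbonData class. It assumes one global dictionary per column and rows stored as surrogate keys.

This approach only fits updates that change every row holding a value, as in the dept1 to dept2 example. Several cases would still need the row-level delta path:

- an update filtered on other columns;
- a no-dictionary column (Manish's point 1).

A real implementation would also have to keep the dictionary metadata and sort index consistent.

```python
from typing import Dict, List


class DictionaryWithRemap:
    """Hypothetical append-only column dictionary with an update-delta remap.

    Values are only ever appended, so stored surrogate keys never change.
    A dimension update such as dept1 -> dept2 is recorded as a remap of
    surrogate keys, leaving the encoded data untouched.
    """

    def __init__(self) -> None:
        self._values: List[str] = []      # surrogate key k reads _values[k - 1]
        self._keys: Dict[str, int] = {}   # value -> key used for newly loaded rows
        self._remap: Dict[int, int] = {}  # stored key -> key it should read as

    def _resolve(self, key: int) -> int:
        return self._remap.get(key, key)

    def encode(self, value: str) -> int:
        """Return the surrogate key for newly loaded rows holding value."""
        key = self._keys.get(value)
        if key is None or self._resolve(key) != key:
            # New value, or its old key now reads as another value after a
            # remap: append a fresh key so newly loaded rows keep this value.
            self._values.append(value)
            key = len(self._values)  # keys start at 1
            self._keys[value] = key
        return key

    def remap(self, old_value: str, new_value: str) -> None:
        """Make every row that currently reads old_value read new_value."""
        old_key = self._keys.get(old_value)
        if old_key is None:
            return  # value was never loaded, so no row holds it
        new_key = self.encode(new_value)
        # O(cardinality) scan over all keys; fine for a sketch.
        for key in range(1, len(self._values) + 1):
            if self._resolve(key) == old_key:
                self._remap[key] = new_key

    def decode(self, key: int) -> str:
        return self._values[self._resolve(key) - 1]


if __name__ == "__main__":
    d = DictionaryWithRemap()
    rows = [d.encode(v) for v in ["dept1", "dept2", "dept1"]]  # stored data
    d.remap("dept1", "dept2")
    print(rows, [d.decode(k) for k in rows])
```

Rows loaded after a remap get a fresh key for the old value. Without this, new "dept1" rows would silently read as "dept2". This is one concrete reason the dictionary path needs its own flow, as Manish points out.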
> > > > >
> > > > > Best Regards,
> > > > > Aniket
> > > > >
> > > > > On Sun, Nov 20, 2016 at 9:11 PM, sujith chacko <sujithchacko.2010@> wrote:
> > > > >
> > > > > > Hi Aniket,
> > > > > >
> > > > > > It's a well-documented design. I just want to know a few points, like:
> > > > > >
> > > > > > a. Format of the RowID and its data type
> > > > > >
> > > > > AA>> The following format can be used to represent a unique RowID:
> > > > >
> > > > > [
> > > > > <Segment ID>
> > > > > <Block ID>
> > > > > <Blocklet ID>
> > > > > <Offset in Blocklet>
> > > > > ]
> > > > > A simple way would be to use the String data type and store the RowIDs in a text file. However, a more efficient way could be to use Bitsets/Bitmaps as a further optimization. Compressed bitmaps such as Roaring bitmaps can be used for better performance and more efficient storage.
> > > > >
> > > > > > b. Impact of this feature on select queries: every query has to exclude each deleted record and include the corresponding updated record. Is any optimization being considered to tackle this query performance issue, since performance is one of the major highlights of Carbon?
> > > > >
> > > > > AA>> Some of the optimizations would be to cache the deltas to avoid recurrent I/O, to store sorted RowIDs in the delete delta for efficient lookup, and to perform regular compaction to minimize the impact on select query performance.
> > > > > Additionally, we may have to explore ways to perform compaction automatically, for example, if more than 25% of rows are read from deltas.
> > > > > Please feel free to share if you have any ideas or suggestions.
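Aniket's RowID layout and his "sorted RowIDs in the delete delta" optimization can be sketched in Python as follows. The sketch rests on a few assumptions:

- `RowId`, `DeleteDelta`, and the slash-separated text form are hypothetical names.
- A plain sorted list stands in for the compressed (e.g. Roaring) bitmap he mentions as a further optimization.

Deleted offsets are grouped per blocklet, so a scan only consults the entries for the blocklet it is reading. Because both the scan and the stored offsets are in ascending order, filtering a blocklet takes a single merge pass, and point lookups use binary search.

```python
import bisect
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class RowId(NamedTuple):
    """Hypothetical RowID: <Segment ID> <Block ID> <Blocklet ID> <Offset in Blocklet>."""
    segment: int
    block: int
    blocklet: int
    offset: int

    def to_text(self) -> str:
        # Simple String form, e.g. one RowID per line in a text delete-delta file.
        return f"{self.segment}/{self.block}/{self.blocklet}/{self.offset}"

    @classmethod
    def from_text(cls, text: str) -> "RowId":
        return cls(*(int(part) for part in text.split("/")))


BlockletKey = Tuple[int, int, int]  # (segment, block, blocklet)


class DeleteDelta:
    """Deleted offsets grouped per blocklet and kept sorted for fast lookup."""

    def __init__(self, rowids: Iterable[RowId]) -> None:
        grouped: Dict[BlockletKey, List[int]] = defaultdict(list)
        for r in rowids:
            grouped[(r.segment, r.block, r.blocklet)].append(r.offset)
        # Deduplicate and sort once, at load time.
        self._offsets = {k: sorted(set(v)) for k, v in grouped.items()}

    def is_deleted(self, rowid: RowId) -> bool:
        """Binary search within the RowID's own blocklet."""
        offsets = self._offsets.get((rowid.segment, rowid.block, rowid.blocklet), [])
        i = bisect.bisect_left(offsets, rowid.offset)
        return i < len(offsets) and offsets[i] == rowid.offset

    def live_offsets(self, blocklet: BlockletKey, row_count: int) -> List[int]:
        """Offsets a scan of this blocklet should still return, in order."""
        deleted = self._offsets.get(blocklet, [])
        live: List[int] = []
        j = 0
        for offset in range(row_count):
            if j < len(deleted) and deleted[j] == offset:
                j += 1  # deleted row: skip it and advance in the sorted list
            else:
                live.append(offset)
        return live
```

Swapping the sorted list for a compressed bitmap keeps the same interface while shrinking storage for dense deletes.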
> > > > > >
> > > > > > Thanks,
> > > > > > Sujith
> > > > > >
> > > > > > On Nov 20, 2016 9:24 PM, "Aniket Adnaik" <aniket.adnaik@> wrote:
> > > > > >
> > > > > > > Hi All,
> > > > > > >
> > > > > > > Please find a design doc for Update/Delete support in CarbonData.
> > > > > > >
> > > > > > > https://drive.google.com/file/d/0B71_EuXTdDi8S2dxVjN6Z1RhWlU/view?usp=sharing
> > > > > > >
> > > > > > > Best Regards,
> > > > > > > Aniket
> > > >
> > > > --
> > > > View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Feature-Design-Document-for-Update-Delete-support-in-CarbonData-tp3043p3093.html
> > > > Sent from the Apache CarbonData Mailing List archive mailing list archive at Nabble.com.
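Kumar asks when compaction starts. Aniket suggests compacting automatically once more than 25% of rows are read from deltas. These could be connected as a check that runs after each update or delete commits.

The Python sketch below rests on several assumptions:

- `SegmentDeltaStats` and `segments_to_compact` are illustrative names.
- The update/delete path maintains the per-segment counters.
- `deleted_rows` and `updated_rows` are disjoint, so no row is counted twice.

Checking per segment keeps compaction targeted at the segments whose reads actually pay the delta cost, rather than compacting after every operation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SegmentDeltaStats:
    """Hypothetical per-segment counters kept up to date by update/delete."""
    segment_id: str
    total_rows: int    # rows in the segment's base carbon files
    deleted_rows: int  # rows removed outright (masked by the delete delta)
    updated_rows: int  # rows whose current version is read from an update delta

    def delta_ratio(self) -> float:
        """Fraction of the segment's rows that a query must resolve via deltas."""
        if self.total_rows == 0:
            return 0.0
        return (self.deleted_rows + self.updated_rows) / self.total_rows


def segments_to_compact(stats: List[SegmentDeltaStats],
                        threshold: float = 0.25) -> List[str]:
    """Run after each update/delete commit; returns segments over the threshold."""
    return [s.segment_id for s in stats if s.delta_ratio() > threshold]


if __name__ == "__main__":
    stats = [SegmentDeltaStats("0", 1000, 100, 100),
             SegmentDeltaStats("1", 1000, 200, 100)]
    print(segments_to_compact(stats))
```

The threshold is a parameter because the 25% figure was offered in the thread as an example, not a settled value.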
