Hi Chris, Thanks for the feedback. Next time be sure to CC the lists so that everyone can benefit from the discussion.
There are a few reasons why I propose removing the DELETE command in favor of a more general INSERT command. First, I've always found the DELETE command to be a source of confusion. It belies the fact that deletes are handled by *inserting* tombstone records. Second, the current DELETE command is not very expressive and is also confusing. It only allows you to delete data from a single row. Also, the syntax for doing a "CELL" delete vs. a "CELL VERSION" delete is subtly confusing. The syntax looks very similar: TIMESTAMP <timestamp> VERSION <timestamp> Both clauses supply a timestamp, but the VERSION form has the additional effect of changing the flags field to FLAG_DELETE_CELL_VERSION. This doesn't feel like good API design. Lastly, we've had a number of requests for the ability to insert and delete data in a single HQL command. Currently the only way to do that via HQL is through two separate API calls which can lead to transient inconsistencies. Having said all of that, I'm open to a DELETE command, but it really needs a complete overhaul. We could introduce a new DELETE command with syntax that is modelled after the SELECT syntax. However, that's a pretty big project. We'll have to figure out how to cleanly express CELL deletes and CELL VERSION deletes. Also, a query like the following: DELETE * FROM <table> WHERE "bar" < ROW <= "foo"; would turn into two separate commands under-the-hood. First a query to figure out what rows are to be deleted, followed by a (potentially enormous) set of tombstone inserts. To handle this in a somewhat consistent way, it should probably be implemented as some sort of transaction where a single revision number is chosen and supplied consistently to both sub-operations. My inclination is to remove the DELETE command for now, but keep the documentation and change it to something along the lines of: "Deletes are handled by inserting tombstone records that describe what data is to be deleted. Cells deleted by a tombstone are not physically deleted immediately, but become invisible once the tombstone has been inserted. Tombstones and the cells in which they delete are physically removed at a later time through garbage collection. See the INSERT command for how to delete data via tombstone insertion." This will force everyone to shift away from the current broken DELETE command and get the pain of API change out of the way now. Then when we have time to revisit DELETE and make it fully expressive (including support for queries like the one shown above), we can then unveil the new command when it's ready without causing disruption. Thoughts? - Doug On Wed, Aug 8, 2012 at 11:13 PM, Christoph Rupp <[email protected]>wrote: > Hi Doug, > > a few comments: > > 1. the DELETE keyword is self-explanatory, whereas it's not really > intuitive that you have to insert something in order to delete it. That > could turn out to be a problem for evaluators and newbies. I would suggest > at least to leave DELETE in the parser, but translate it to an INSERT > internally. > > 2. We have one issue requesting more flexibility in DELETE (and this was > asked on the mailing list and in stack overflow a couple of times). DELETE > should support all the predicates that are also supported by SELECT. If you > remove the DELETE keyword then this change will become very difficult. > http://code.google.com/p/hypertable/issues/detail?id=141 > > bye > Christoph > > 2012/8/9 Doug Judd <[email protected]> > >> I'm in the process of re-writing the HQL parser using the latest >> version of Boost Spirit. As part of this process, I'd like to clean up >> HQL. I propose that we drop the DELETE command in favor of a more >> flexible INSERT command. The basic format of the INSERT command will >> remain the same: >> >> INSERT INTO <table> VALUES '(' <value-tuple> ')' [ '(' <value-tuple> ')' >> ... ] >> >> But the format of <value-tuple> would change to be more flexible, >> allowing for *deletes* as well as *inserts*. >> >> *Timestamps* - As is the case now, timestamps will be optional in the >> <value-tuple>. However, I propose moving the timestamp to the end and >> not requiring quotes around the timestamp. Here are examples of the >> supported formats for the <timestamp> field: >> >> auto # auto-assign timestamp >> min # absolute minimum timestamp >> 2012 # date (year-only) >> 2012-08 # date (year+month) >> 2012-08-08 # date >> 2012-08-08 12:02:00 # date+time >> 2012-08-08 12:02:00:823432455 # date+time with nanoseconds >> 1234343454s # seconds since the epoch >> 1234343454782639847ns # nanoseconds since the epoch >> >> *Inserts* - The following are <value-tuple> examples for inserts: >> >> ("row", "c:q", "value") >> ("row", "c:q", "value", <timestamp>) >> (INS, "row", "c:q", "value") >> (INS, "row", "c:q", "value", <timestamp>) >> >> *Delete Row* - The following are <value-tuple> examples for deleting >> rows: >> >> (DEL, "row") >> (DEL, "row", <timestamp>) >> >> *Delete Column Family* - The following are <value-tuple> examples for >> deleting column families: >> >> (DEL, "row", "c") >> (DEL, "row", "c", <timestamp>) >> >> *Delete Cell* - The following are <value-tuple> examples for deleting >> cells (e.g. delete all cell versions with this timestamp or less): >> >> (DEL, "row", "c:q") >> (DEL, "row", "c:q", <timestamp>) >> >> *Delete Cell Version* - The following are <value-tuple> examples for >> deleting a specific cell version (e.g. delete the cell version with this >> exact timestamp): >> >> (DEL_VER, "row", "c:q") >> (DEL_VER, "row", "c:q", <timestamp>) >> >> The INSERT command will accept a mix of any of the above <value-tuple> >> formats, which will obviate the need for the DELETE command (which was kind >> of wacky anyway). Here are a couple of DELETE examples along with how they >> would be formulated with the new INSERT syntax: >> >> DELETE c FROM foo WHERE ROW = "row1; >> INSERT INTO foo VALUES (DEL, "row1", "c"); >> >> DELETE * FROM foo WHERE ROW = "row1 TIMESTAMP "2012-02-13"; >> INSERT INTO foo VALUES (DEL, "row1", 2012-02-13); >> >> If you have any feedback, please let me know. I'd like make these >> changes for the upcoming 0.9.6.1 release. >> >> - Doug >> >> -- >> You received this message because you are subscribed to the Google Groups >> "Hypertable Development" group. >> To post to this group, send email to [email protected]. >> To unsubscribe from this group, send email to >> [email protected]. >> For more options, visit this group at >> http://groups.google.com/group/hypertable-dev?hl=en. >> > > -- Doug Judd CEO, Hypertable Inc. -- You received this message because you are subscribed to the Google Groups "Hypertable Development" group. To post to this group, send email to [email protected]. To unsubscribe from this group, send email to [email protected]. For more options, visit this group at http://groups.google.com/group/hypertable-dev?hl=en.
