Hi all, I am pretty new to HBase so forgive me if this seems like a silly question.
Each row in my Hbase table is a geographical location that is related to other locations. For e.g. one relationship is the CONTAIN relationship. So, Europe CONTAINs England, France, Spain etc. There is an inverse relationship as well called PARENT, so England has a PARENT called Europe. However, note that, for various business reasons not pertinant to this discussion, the inverse relationship need not always be set, i.e. we may not store France with a PARENT value of Europe, even though Europe CONTAINs France. So, I store each location as a row with an id and the payload data for that location as a separate data column. This data column includes the sets of ids of the related locations. Now, I want to be able to update/delete locations consistently. So, in my example above, I might want to delete France, in which case I also want to make sure that I delete the CONTAINs relationship that Europe has with France as that is now obsolete. What is the most efficient way to do this? I want to minimise the number of writes I would have to do - on the other hand optimising read performance is more important as writes do not happen that often (this is geographic data after all). My thoughts are: I will have to do 1+n writes to do a delete - i.e. 1 write operation to delete France and n write operations to delete the relationships that n other locations may have to France. In the case of a root location like Europe that may have a large number of locations that relate to it this may be expensive, but I see no other way. So, I was wondering, how do I index this to speed this up as far as possible. So, given the location Europe, what are the fields I should include in its row and how to index them? I could create a column family for each relationship type with a label - the label being the id of the location this location is related to, so, for e.g., the Europe row would have a column called CONTAIN:England (assuming "England" is the id for the England column - in reality it would be a UUID). I would then have as many labels under the CONTAIN family for Europe as locations that Europe contains. How would I index this and ensure that when deleting France the query: "list all locations that CONTAIN France" returns with Europe (and whatever else) as quickly as possible? Thanks, Ishaaq
