On Sat, 2007-07-14 at 12:49 -0700, alakshman wrote:
> I had a question about how stuff is being written to the HLogs. Each column
> family that makes up a table has its own on disk representation. However
> there is only one HLog for all tables.

This isn't quite true. There is one HLog per HRegionServer.

> Which means on every write, the
> individual HMemcache's for each column family in the row mutation are
> updated but the entire row is written to the HLog.

Also not quite true. The entire row is not written; only the changes are
written to the HLog.

> Now when a column family's HMemcache is flushed a token is written to HLog
> indicating that the column family for this table has been flushed ? There
> may be other column families which have not yet been flushed. Since we seem
> to write the entire rows to the HLog how can one tell that the log file has
> only flushed entities w/o a scan of the entire file ?

When the memcache is flushed, it happens on a per-region basis. That is,
all the changes that apply to that region (all changed columns) are
written to disk. After the changes are flushed, a flushcache-complete
entry is written to the log, indicating that all changes older than this
id can be ignored.

HLog maintains a couple of in-memory structures: one records, for each
region, the last flushed sequence number, and the other maps flush ids
to output files. When the log is rolled, HLog determines the oldest
outstanding sequence number (the oldest sequence number that has not
been flushed) and knows that it can discard all the files with sequence
numbers older than the oldest outstanding change.

If a region server crashes, the master determines which regions the
region server was serving, has the hlog split into a separate part for
each region, and leaves each part in a special location. When the master
reassigns a region, part of starting up that region is processing any
log entries that were not flushed (HRegion looks for an old log file in
the special location). Once the outstanding log entries have been
processed, the region can be brought online.

> Is the sequential scan
> unavoidable to determine if the HLog can be deleted when it is rolled away ?
>
> Please explain.
>
> Thanks
> Avinash

-- 
Jim Kellerman, Senior Engineer; Powerset
[EMAIL PROTECTED]
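
To make the bookkeeping described above concrete, here is a minimal
Python sketch. It is not the HBase code: ToyLog, split_by_region,
unflushed and FLUSH_MARKER are hypothetical names, log files are plain
in-memory lists, and it assumes a single sequence counter per log. It
only illustrates why rolling needs no scan of the file: the per-region
map already gives the oldest outstanding sequence number.

```python
from collections import defaultdict

FLUSH_MARKER = "FLUSHCACHE-COMPLETE"  # hypothetical stand-in for the flush entry


class ToyLog:
    """Toy model of one region server's log (illustrative names only)."""

    def __init__(self):
        self.seq = 0               # next sequence number to hand out
        self.current = []          # entries of the file being written
        self.closed = []           # (highest_seq_in_file, entries) per rolled file
        self.first_unflushed = {}  # region -> oldest sequence number not yet flushed

    def append(self, region, change):
        """Log one change (not the whole row) and return its sequence number."""
        seq = self.seq
        self.seq += 1
        self.current.append((seq, region, change))
        self.first_unflushed.setdefault(region, seq)
        return seq

    def flush_complete(self, region):
        """Record that every change logged so far for `region` is on disk."""
        seq = self.seq
        self.seq += 1
        self.current.append((seq, region, FLUSH_MARKER))
        self.first_unflushed.pop(region, None)

    def roll(self):
        """Close the current file and discard files holding only flushed changes."""
        if self.current:
            self.closed.append((self.current[-1][0], self.current))
            self.current = []
        # Oldest outstanding change across all regions; no file scan needed.
        oldest = min(self.first_unflushed.values(), default=self.seq)
        self.closed = [f for f in self.closed if f[0] >= oldest]
        return oldest


def split_by_region(files):
    """Group log entries by region, as done after a region server crash."""
    per_region = defaultdict(list)
    for _, entries in files:
        for seq, region, change in entries:
            per_region[region].append((seq, change))
    return dict(per_region)


def unflushed(entries):
    """Return the changes after the last flush marker: what a region replays."""
    pending = []
    for seq, change in entries:
        if change == FLUSH_MARKER:
            pending = []  # everything before the marker is already on disk
        else:
            pending.append((seq, change))
    return pending
```

In this sketch a rolled file is kept as long as its highest sequence
number is at or above the oldest outstanding one, so a single region
that has not flushed pins every file written since its first unflushed
change.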
