[
https://issues.apache.org/jira/browse/HBASE-6774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13502845#comment-13502845
]
nkeywal commented on HBASE-6774:
--------------------------------
For the master-based solution:
If we go for the regionserver -> master -> zookeeper solution, it's not
perfect imho, because we just add an extra agent in the middle.
The master could store the region information in memory, without going to ZK:
-> Faster than the solution with ZK, because we would not write to the disk.
-> If we lose the master, we lose the data, but it's not an issue (just that
the recovery will be slower: we will have to read all the logs).
-> The master becomes an element of the write path (for the first write in a
memstore). I'm not at ease with that.
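To make it concrete, here is a minimal sketch of the bookkeeping the master
would have to keep in memory. reportFirstEdit/reportFlush are made-up RPCs,
just to illustrate the extra hop; nothing like this exists today:
{code:java}
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory tracker on the master: which regions have at
// least one edit that has not been flushed yet. No ZK, no disk.
public class MasterDirtyRegionTracker {
  private final Set<String> dirtyRegions = Collections.newSetFromMap(
      new ConcurrentHashMap<String, Boolean>());

  // Would be called by the regionserver on the first write into a
  // memstore -- this is the extra hop on the write path.
  public void reportFirstEdit(String encodedRegionName) {
    dirtyRegions.add(encodedRegionName);
  }

  // Cleared again once the region's memstore is flushed.
  public void reportFlush(String encodedRegionName) {
    dirtyRegions.remove(encodedRegionName);
  }

  // On recovery: a region absent from the set needs no log replay and
  // could be assigned immediately. If the master dies too, the set is
  // gone and we fall back to reading all the logs.
  public boolean needsReplay(String encodedRegionName) {
    return dirtyRegions.contains(encodedRegionName);
  }
}
{code}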
At the end of the day, I agree with what Stack said previously: let's not add a
new component in the write path. This is valid for both the master & ZK.
So we're left with the other options:
- specific WAL for .meta.
- adding meta data at the end of the WAL.
I'm currently looking at them.
> Immediate assignment of regions that don't have entries in HLog
> ---------------------------------------------------------------
>
> Key: HBASE-6774
> URL: https://issues.apache.org/jira/browse/HBASE-6774
> Project: HBase
> Issue Type: Improvement
> Components: master, regionserver
> Affects Versions: 0.96.0
> Reporter: nkeywal
>
> Today, after a failure detection, the algo is:
> - split the logs
> - when all the logs are split, assign the regions
> But some regions can have no entries at all in the HLog. There are many
> reasons for this:
> - reference or historical tables: bulk-written sometimes, then read only.
> - sequential rowkeys: in this case, most of the regions will be read only,
> but they can be on a regionserver with a lot of writes.
> - tables flushed often for safety reasons. I'm thinking about meta here.
> For meta, we can imagine flushing very often. Hence the recovery time for
> meta, in many cases, will be just the failure detection time.
> There are different possible algos:
> Option 1)
> A new task is added, in parallel with the split. This task reads all the
> HLogs. If there is no entry for a region, this region is assigned
> immediately.
> Pro: simple.
> Cons: we will need to read all the files; it adds an extra read.
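> As a sketch, option 1 boils down to something like this (WalScanner and
> Assigner are stand-ins for the real log reader and assignment manager,
> which I'm deliberately not naming here):
> {code:java}
> import java.util.HashSet;
> import java.util.List;
> import java.util.Set;
>
> public class NoEditRegionAssigner {
>
>   // Stand-in: yields the encoded region name of each entry in a log.
>   interface WalScanner {
>     Iterable<String> encodedRegionNames(String logFile);
>   }
>
>   // Stand-in: triggers the assignment of a region.
>   interface Assigner {
>     void assign(String encodedRegionName);
>   }
>
>   void assignUntouchedRegions(List<String> logFiles,
>       Set<String> regionsOnDeadServer, WalScanner scanner,
>       Assigner assigner) {
>     Set<String> touched = new HashSet<String>();
>     for (String log : logFiles) {
>       for (String region : scanner.encodedRegionNames(log)) {
>         touched.add(region);
>       }
>     }
>     // A region with no entry in any log has nothing to replay:
>     // it can be assigned right away, without waiting for the split.
>     for (String region : regionsOnDeadServer) {
>       if (!touched.contains(region)) {
>         assigner.assign(region);
>       }
>     }
>   }
> }
> {code}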
> Option 2)
> The master writes in ZK the number of log files, per region.
> When the regionserver starts the split, it reads the full block (64M) and
> decreases the log file counter of each region it sees. When a counter
> reaches 0, the assignment starts. At the end of its split, the regionserver
> decreases the counters as well. This allows starting the assignment even
> before all the HLogs are finished. It would also make some regions
> available even if we have an issue in one of the log files.
> Pro: parallel
> Cons: adds work for the regionserver. Requires reading the whole file
> before starting to write.
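> The counter handling on the regionserver side could look like this; the
> /recovery/<region> znode layout is invented for the example, and the
> master would be the one creating each counter with the number of files:
> {code:java}
> import org.apache.zookeeper.KeeperException;
> import org.apache.zookeeper.ZooKeeper;
> import org.apache.zookeeper.data.Stat;
>
> public class RegionLogCounter {
>   private final ZooKeeper zk;
>
>   public RegionLogCounter(ZooKeeper zk) {
>     this.zk = zk;
>   }
>
>   // Atomic decrement via ZK's versioned setData (compare-and-swap).
>   // Returns true when the counter reaches 0: no remaining log file
>   // can hold edits for this region, so the assignment can start.
>   public boolean decrement(String encodedRegionName)
>       throws KeeperException, InterruptedException {
>     String path = "/recovery/" + encodedRegionName; // invented layout
>     while (true) {
>       Stat stat = new Stat();
>       byte[] data = zk.getData(path, false, stat);
>       int count = Integer.parseInt(new String(data)) - 1;
>       try {
>         zk.setData(path, Integer.toString(count).getBytes(),
>             stat.getVersion());
>         return count == 0;
>       } catch (KeeperException.BadVersionException e) {
>         // Another splitter decremented concurrently; retry.
>       }
>     }
>   }
> }
> {code}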
> Option 3)
> Add some metadata at the end of the log file. The last log file won't have
> metadata, since if we are recovering, it's because the server crashed. But
> the others will. And the last log file should be smaller (half a block on
> average).
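> A possible trailer format, purely illustrative (region list plus a
> back-pointer and a magic marker). An unclosed file simply has no valid
> trailer, and the caller falls back to scanning that file fully:
> {code:java}
> import java.io.DataOutputStream;
> import java.io.IOException;
> import java.io.RandomAccessFile;
> import java.util.HashSet;
> import java.util.Set;
>
> public class WalTrailer {
>   private static final int MAGIC = 0x57545231; // arbitrary marker
>
>   // Assumes 'out' wraps the log file from its first byte, so that
>   // out.size() is the file offset where the trailer starts.
>   static void writeTrailer(DataOutputStream out, Set<String> regions)
>       throws IOException {
>     long trailerStart = out.size();
>     out.writeInt(regions.size());
>     for (String r : regions) {
>       out.writeUTF(r);
>     }
>     out.writeLong(trailerStart); // back-pointer to the trailer
>     out.writeInt(MAGIC);         // only present if closed cleanly
>   }
>
>   // Returns null if there is no valid trailer (i.e. the server died
>   // while this file was open): the caller must then scan it fully.
>   static Set<String> readTrailer(RandomAccessFile in) throws IOException {
>     in.seek(in.length() - 12); // 8-byte back-pointer + 4-byte magic
>     long trailerStart = in.readLong();
>     if (in.readInt() != MAGIC) {
>       return null;
>     }
>     in.seek(trailerStart);
>     Set<String> regions = new HashSet<String>();
>     int n = in.readInt();
>     for (int i = 0; i < n; i++) {
>       regions.add(in.readUTF());
>     }
>     return regions;
>   }
> }
> {code}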
> Option 4) Still some metadata, but in a different file. Cons: writes are
> increased (but not by much; we just need to write each region once). Pros:
> if we lose the HLog files (major failure, no replica available), we can
> still continue with the regions that were not written at this stage.
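> For option 4, the write-side addition is tiny: one line per region in a
> side file, written only on the region's first edit. The ".regions" suffix
> and the plain-text format are just for the example:
> {code:java}
> import java.io.FileWriter;
> import java.io.IOException;
> import java.util.Collections;
> import java.util.HashSet;
> import java.util.Set;
>
> public class WalRegionIndex {
>   private final Set<String> seen =
>       Collections.synchronizedSet(new HashSet<String>());
>   private final FileWriter out;
>
>   // One side file per WAL; the ".regions" suffix is invented.
>   public WalRegionIndex(String walPath) throws IOException {
>     this.out = new FileWriter(walPath + ".regions", true);
>   }
>
>   // On the append path: only the first edit of a region costs a write.
>   public void onEdit(String encodedRegionName) throws IOException {
>     if (seen.add(encodedRegionName)) {
>       synchronized (out) {
>         out.write(encodedRegionName + "\n");
>         out.flush();
>       }
>     }
>   }
> }
> {code}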
> I think it should be done, even if none of the algorithms above is totally
> convincing yet. It's linked as well to locality and short-circuit reads:
> with these two points, reading the file twice becomes much less of an
> issue, for example. My current preference would be to open the file twice
> in the regionserver: once for splitting as today, once for a quick read
> looking for untouched regions. Who knows, maybe it would even be faster
> this way: the quick-read thread would warm up the different caches for the
> splitting thread. A sketch of this is below.
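> The double-open idea in a few lines; scanRegions/splitLog are stand-ins
> for the real reader and splitter, and the point is just the two readers
> on the same file:
> {code:java}
> import java.util.Set;
> import java.util.concurrent.Callable;
> import java.util.concurrent.ExecutorService;
> import java.util.concurrent.Executors;
> import java.util.concurrent.Future;
>
> public class DoubleOpenSplit {
>
>   // Stand-ins for the real quick reader and the real splitter.
>   interface QuickScanner { Set<String> scanRegions(String logFile); }
>   interface Splitter { void splitLog(String logFile); }
>
>   static void process(final String logFile, final QuickScanner scanner,
>       Splitter splitter) throws Exception {
>     ExecutorService pool = Executors.newSingleThreadExecutor();
>     // Second, cheap reader on the same file, racing ahead of the split
>     // and possibly warming the caches for it.
>     Future<Set<String>> touched = pool.submit(new Callable<Set<String>>() {
>       public Set<String> call() {
>         return scanner.scanRegions(logFile);
>       }
>     });
>     splitter.splitLog(logFile); // the normal split, as today
>     Set<String> regionsWithEdits = touched.get();
>     pool.shutdown();
>     // Regions of the dead server absent from regionsWithEdits can be
>     // assigned without waiting for any replay.
>   }
> }
> {code}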