optimization for log splitting (theory/suggestion)
--------------------------------------------------
Key: HBASE-3099
URL: https://issues.apache.org/jira/browse/HBASE-3099
Project: HBase
Issue Type: Bug
Reporter: ryan rawson
Right now log splitting is slower than we'd like. That slowness is one of the
reasons we have to keep a short, bounded limit on the number of outstanding log
files. It would be nice to raise that limit, perhaps to hundreds of logs. That
would improve efficiency because we would not be force-flushing regions at
non-ideal sizes.
But more logs mean more data to process during recovery. However, not all of the
log entries for a regionserver are actually useful, because some regions were
flushed before the oldest log was trimmed, so their older edits are already
persisted. So during log recovery, if we read the most recent flushed sequenceid
for each region, log splitting (in the master) could skip the entries at or
below it and avoid writing them to the per-region recovery logs. That would cut
part of the IO, and if our serialization/deserialization code were clever we
might be able to avoid deserializing much of the skipped data.
It's not clear how effective or worthwhile this might be.
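The skip described above could be sketched roughly as follows. This is only an
illustration of the idea, not HBase code: the class LogSplitFilterSketch, the
Entry type, and the lastFlushedSeqId map are all hypothetical names, and it
assumes the splitter can somehow obtain each region's last flushed sequenceid
and that every log entry carries its region name and sequenceid.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these types are real HBase classes.
final class LogSplitFilterSketch {

    /** Minimal stand-in for a log entry: the owning region plus its sequenceid. */
    static final class Entry {
        final String region;
        final long sequenceId;

        Entry(String region, long sequenceId) {
            this.region = region;
            this.sequenceId = sequenceId;
        }
    }

    /**
     * Groups log entries by region, dropping entries a flush already persisted.
     * lastFlushedSeqId maps region name to the highest flushed sequenceid
     * (assumed to be available to the splitter; how it is read is not shown).
     */
    static Map<String, List<Entry>> split(List<Entry> log,
                                          Map<String, Long> lastFlushedSeqId) {
        Map<String, List<Entry>> perRegion = new HashMap<>();
        for (Entry e : log) {
            // A region with no recorded flush keeps every entry.
            long flushed = lastFlushedSeqId.getOrDefault(e.region, -1L);
            if (e.sequenceId <= flushed) {
                continue; // already persisted by a flush, nothing to replay
            }
            perRegion.computeIfAbsent(e.region, k -> new ArrayList<>()).add(e);
        }
        return perRegion;
    }

    public static void main(String[] args) {
        List<Entry> log = Arrays.asList(
            new Entry("regionA", 1L), new Entry("regionA", 2L),
            new Entry("regionB", 3L), new Entry("regionB", 4L),
            new Entry("regionA", 5L));
        Map<String, Long> flushed = new HashMap<>();
        flushed.put("regionA", 2L); // regionA flushed through sequenceid 2

        Map<String, List<Entry>> out = split(log, flushed);
        System.out.println("regionA entries kept: " + out.get("regionA").size());
        System.out.println("regionB entries kept: " + out.get("regionB").size());
    }
}
```

Note that this sketch filters entries only after they have been decoded. Skipping
the deserialization cost as well would presumably need the region and sequenceid
to be readable without decoding the rest of the entry, which is an assumption
about the log format rather than something shown here.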