[ANNOUNCE] Apache HBase 1.4.0 is now available for download

Andrew Purtell Tue, 19 Dec 2017 11:33:32 -0800

The HBase team is happy to announce the immediate availability of Apache
HBase 1.4.0!


Apache HBase is an open-source, distributed, versioned, non-relational
database. Apache HBase gives you low latency random access to billions of
rows with millions of columns atop non-specialized hardware. To learn more
about HBase, see https://hbase.apache.org/.

Download through an ASF mirror:

    https://www.apache.org/dyn/closer.lua/hbase/1.4.0

HBase 1.4.0 is the first release in the new HBase 1.4 line, continuing on
the theme of earlier 1.x releases of bringing a stable, reliable database
to the Apache Big Data ecosystem and beyond. As a minor release, 1.4.0
contains a number of new features and improvements that won't appear in
maintenance releases of older code lines. However, complete compatibility
with data formats and interoperability with older clients are assured.
There are no special considerations for upgrade or rollback except as
noted in this release announcement.

Maintenance releases of the 1.4.0 code line will occur at roughly a
monthly cadence.

For instructions on verifying ASF release downloads, please see

    https://www.apache.org/dyn/closer.cgi#verify

Project member signature keys can be found at

    https://www.apache.org/dist/hbase/KEYS

Thanks to all the contributors who made this release possible!

The complete list of the 660 issues resolved in this release can be found
at https://s.apache.org/OErT . New developer and user-facing
incompatibilities, important issues, features, and major improvements
include:

Critical

- HBASE-15484 Correct the semantic of batch and partial

    Scan#setBatch no longer implies setAllowPartialResults(true).

    Scan#setBatch is helpful in paging queries. If you just want to prevent
    OOM at the client, setAllowPartialResults(true) is the better choice.

    isPartial is deprecated in favor of mayHaveMoreCellsInRow. If it
    returns false, the current Result must be the last one for its row.

- HBASE-17287 Master becomes a zombie if filesystem object closes

    If the filesystem is not available during log splitting, the master
    server now aborts.

- HBASE-17471 Region Seqid will be out of order in WAL if using
  mvccPreAssign

    MVCCPreAssign was added by HBASE-16698, but the pre-assigned mvcc is
    only used in the put/delete path. Other write paths such as
    increment/append still assign the mvcc in the ringbuffer's consumer
    thread. If puts and increments run in parallel, the seqid in the WAL
    may not increase monotonically, and out-of-order WAL entries can lead
    to data loss. This patch moves all mvcc/seqid handling into wal.append
    and synchronizes WAL append with mvcc acquisition, so WAL entries can
    no longer get out of order. Performance tests show no regression.

- HBASE-17595 Add partial result support for small/limited scan

    Now small scan and limited scan can also return partial results.

- HBASE-17717 Incorrect ZK ACL set for HBase superuser

    In previous versions of HBase, the system intended to set a ZooKeeper
    ACL on all "sensitive" ZNodes for the user specified in the
    hbase.superuser configuration property. Unfortunately, the ACL was
    malformed which resulted in the hbase.superuser being unable to access
    the sensitive ZNodes that HBase creates. HBase will automatically
    correct the ACLs on start so users do not need to manually correct the
    ACLs.

- HBASE-17887 Row-level consistency is broken for read

    We now pass the list of memstoreScanners to the StoreScanner along
    with the new files to ensure that the StoreScanner sees the latest
    memstore after a flush.

- HBASE-17931 Assign system tables to servers with highest version

    We usually keep compatibility between old clients and new servers so
    that we can do a rolling upgrade: HBase cluster first, then HBase
    clients. But we don't guarantee that a new client can access an old
    server. An HBase cluster has system tables, and region servers access
    these tables, so the servers are also HBase clients. If the system
    tables are on region servers with a lower version, we may run into
    trouble because region servers with a higher version may not be able
    to access them. After this patch, we move all system regions to the
    region servers with the highest version. So when we do a rolling
    upgrade across two major or minor versions, we should ALWAYS UPGRADE
    MASTER FIRST and then upgrade region servers. The new master will
    handle system tables correctly.

- HBASE-18035 Meta replica does not give any primaryOperationTimeout to
  primary meta region

    When a client is configured to use meta replicas, it sends scan
    requests to all meta replicas at almost the same time. Since a meta
    replica may contain stale data, the client may get wrong region
    locations if the result from a replica comes back first. To fix this,
    "hbase.client.meta.replica.scan.timeout" is introduced: a client
    always sends the request to the primary meta region first and waits
    the configured timeout for a reply. If no result is received, it sends
    the request to the replica meta regions. The unit for
    "hbase.client.meta.replica.scan.timeout" is microseconds; the default
    value is 1000000 (1 second).
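
    The primary-first behaviour described above can be sketched roughly
    as follows. This is a minimal illustration and not the HBase client
    code: the class PrimaryFirstLookup, its lookup method and the two
    Supplier arguments are hypothetical names introduced here. Only the
    microsecond unit and the 1000000 default come from the note above.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Illustrative sketch only; all names here are hypothetical.
final class PrimaryFirstLookup {
    // Default from the announcement: 1000000 microseconds (1 second).
    static final long DEFAULT_TIMEOUT_MICROS = 1_000_000L;

    // Asks the primary first. Only if no reply arrives within the timeout
    // is the replica asked as well; the first reply to arrive then wins.
    @SuppressWarnings("unchecked")
    static <T> T lookup(Supplier<T> primary, Supplier<T> replica,
                        long timeoutMicros) {
        CompletableFuture<T> fromPrimary =
            CompletableFuture.supplyAsync(primary);
        try {
            return fromPrimary.get(timeoutMicros, TimeUnit.MICROSECONDS);
        } catch (TimeoutException e) {
            CompletableFuture<T> fromReplica =
                CompletableFuture.supplyAsync(replica);
            return (T) CompletableFuture.anyOf(fromPrimary, fromReplica).join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }
}
```

    With a fast primary the replica is never contacted, so a stale
    replica answer cannot win the race.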

- HBASE-18137 Replication gets stuck for empty WALs

    0-length WAL files can potentially cause the replication queue to get
    stuck.  A new config "replication.source.eof.autorecovery" has been
    added: if set to true (default is false), the 0-length WAL file will be
    skipped after 1) the max number of retries has been hit, and 2) there
    are more WAL files in the queue.  The risk of enabling this is that
    there is a chance the 0-length WAL file actually has some data (e.g.
    block went missing and will come back once a datanode is recovered).
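
    The skip conditions above amount to a simple conjunction. A minimal
    sketch, assuming hypothetical names (EmptyWalSkipPolicy, shouldSkip
    and its parameters are not HBase identifiers):

```java
// Illustrative sketch only; not the HBase replication source code.
final class EmptyWalSkipPolicy {
    // Returns true only when every condition from the release note holds:
    // autorecovery is enabled, the WAL is 0-length, the retry budget is
    // used up, and other WAL files are waiting behind it in the queue.
    static boolean shouldSkip(boolean autoRecoveryEnabled,
                              long walLengthBytes,
                              int retriesSoFar,
                              int maxRetries,
                              int otherWalsInQueue) {
        return autoRecoveryEnabled
            && walLengthBytes == 0
            && retriesSoFar >= maxRetries
            && otherWalsInQueue > 0;
    }
}
```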

- HBASE-18164 Much faster locality cost function and candidate generator

    New locality cost function and candidate generator that use caching and
    incremental computation to allow the stochastic load balancer to
    consider ~20x more cluster configurations for big clusters.

- HBASE-18192 Replication drops recovered queues on region server shutdown

    If a region server that is processing a recovered queue for another
    previously dead region server is gracefully shut down, it can drop the
    recovered queue under certain conditions. Running without this fix on a
    1.2+ release means a possibility of continuing data loss in
    replication, irrespective of which WALProvider is used. If a single WAL
    group (or DefaultWALProvider) is used, running without this fix will
    always cause data loss in replication whenever a region server
    processing recovered queues is gracefully shut down.

- HBASE-18233 We shouldn't wait for readlock in doMiniBatchMutation in case
  of deadlock

    This patch plus the sort of mutations done in HBASE-17924 fixes a
    performance regression doing increments and/or checkAndPut-style
    operations.

- HBASE-18255 Time-Delayed HBase Performance Degradation with Java 7

    This change sets the JVM property ReservedCodeCacheSize to 256MB in the
    provided hbase-env.sh example file. The specific value for this property
    attempts to prevent performance issues seen when HBase runs on Java 7.
    The value set is the same as the default when using Java 8.

- HBASE-18469 Correct RegionServer metric of totalRequestCount

    We introduced a new RegionServer metric named
    "totalRowActionRequestCount", which counts all row actions and equals
    the sum of "readRequestCount" and "writeRequestCount". We have also
    changed "totalRequestCount" to count a multi request only once, where
    previously it counted the number of actions in the request. As a
    result, existing monitoring of totalRequestCount will still work but
    will see a smaller value. We strongly recommend switching to the new
    metric to monitor server load.
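
    To make the difference between the two metrics concrete, here is a
    small sketch of the counting rules as described above. The class
    RequestCounters and its method names are hypothetical; only the
    metric names are taken from the note.

```java
// Illustrative sketch only; not the RegionServer metrics implementation.
final class RequestCounters {
    long totalRequestCount;
    long readRequestCount;
    long writeRequestCount;

    // Records one multi RPC carrying the given numbers of row actions.
    void onMultiRequest(int readActions, int writeActions) {
        totalRequestCount += 1;            // one RPC counts once
        readRequestCount += readActions;   // each row action still counts
        writeRequestCount += writeActions;
    }

    // Sum of read and write row actions, as the new metric is described.
    long totalRowActionRequestCount() {
        return readRequestCount + writeRequestCount;
    }
}
```

    A single multi request with 3 gets and 7 puts therefore adds 1 to
    totalRequestCount but 10 to totalRowActionRequestCount.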

- HBASE-18577 shaded client includes several non-relocated third party
  dependencies

    The HBase shaded artifacts (hbase-shaded-client and hbase-shaded-server)
    no longer contain several non-relocated third party dependency classes
    that were mistakenly included. Downstream users who relied on these
    classes being present will need to add a runtime dependency onto an
    appropriate third party artifact. Previously, we erroneously packaged
    several third party libs without relocating them. In some cases these
    libraries have now been relocated; in some cases they are no longer
    included at all.

    Includes:

      * jaxb
      * jetty
      * jersey
      * codahale metrics (HBase 1.4+ only)
      * commons-crypto
      * jets3t
      * junit
      * curator (HBase 1.4+)
      * netty 3 (HBase 1.1)
      * mokito-junit4 (HBase 1.1)

- HBASE-18665 ReversedScannerCallable invokes getRegionLocations
  incorrectly

    Reverse scans on tables used the meta cache incorrectly and fetched
    data from the meta table every time. This fix resolves the issue,
    which improves the performance of reverse scans.

- HBASE-19285 Add per-table latency histograms

    Per-RegionServer table latency histograms have been returned to HBase
    (after being removed due to impacting performance). These metrics are
    exposed via a new JMX bean "TableLatencies" with the typical naming
    conventions: namespace, table, and histogram component.

Major

- HBASE-7621 REST client doesn't support binary row keys

    RemoteHTable now supports binary row keys with any character or byte
    by properly encoding request URLs. This is both a behavioral change
    from earlier versions and an important fix for protocol correctness.

- HBASE-11013 Clone Snapshots on Secure Cluster Should provide option to
  apply Retained User Permissions

    When a snapshot is created, the permissions of the original table are
    saved into the .snapshotinfo file (in a backward-compatible way), which
    is in the snapshot root directory. For the clone_snapshot and
    restore_snapshot commands, we provide an additional option
    (RESTORE_ACL) to decide whether to grant the permissions of the
    original table to the newly created table.

- HBASE-14548 Expand how table coprocessor jar and dependency path can
  be specified

    Allows a directory containing the jars or some wildcards to be
    specified, such as:

      hdfs://namenode:port/user/hadoop-user/

    or

      hdfs://namenode:port/user/hadoop-user/*.jar

    Please note that if a directory is specified, all jar files directly
    in the directory are added, but it does not search files in the
    subtree rooted in the directory. Do not use a wildcard if you would
    like to specify a directory.

- HBASE-14925 HBase shell command/tool to list table's region info through
  command line

    Added a shell command 'list_regions' for displaying a table's region
    info through the command line. It lists all regions for a particular
    table as an array, optionally filtered by server name (as a prefix) and
    by maximum locality. By default, it returns all the regions for the
    table with any locality. The command displays server name, region name,
    start key, end key, size of the region in MB, number of requests, and
    the locality. The displayed columns can be selected via an array passed
    as the third parameter; by default, all of this information is
    displayed. Possible array values are SERVER_NAME, REGION_NAME,
    START_KEY, END_KEY, SIZE, REQ and LOCALITY. Values are not case
    sensitive. If you don't want to filter by server name, pass an empty
    hash or string.

- HBASE-15187 Integrate CSRF prevention filter to REST gateway

    Protection against CSRF attack can be turned on with config
    parameter, hbase.rest.csrf.enabled - default value is false.

    The custom header to be sent can be changed via config parameter,
    hbase.rest.csrf.custom.header whose default value is
    "X-XSRF-HEADER".

    The configuration parameter hbase.rest.csrf.methods.to.ignore
    controls which HTTP methods are not associated with custom header
    check.

    The config parameter hbase.rest-csrf.browser-useragents-regex is a
    comma-separated list of regular expressions used to match against an
    HTTP request's User-Agent header when protection against cross-site
    request forgery (CSRF) is enabled for REST server by setting
    hbase.rest.csrf.enabled to true.

- HBASE-15236 Inconsistent cell reads over multiple bulk-loaded HFiles

    During bulkloading, if there are multiple hfiles corresponding to
    the same region, and if they have same timestamps (which may have
    been set using importtsv.timestamp) and duplicate keys across them,
    then get and scan may return values coming from different hfiles.

- HBASE-15243 Utilize the lowest seek value when all Filters in
  MUST_PASS_ONE FilterList return SEEK_NEXT_USING_HINT

    When all filters in a MUST_PASS_ONE FilterList return a
    SEEK_NEXT_USING_HINT code, we return SEEK_NEXT_USING_HINT from
    FilterList#filterKeyValue() to utilize the lowest seek value.

- HBASE-15386 PREFETCH_BLOCKS_ON_OPEN in HColumnDescriptor is ignored

    Changes the prefetch TRACE-level log messages to include the word
    'Prefetch' so you know what they are about.

    Changes the cryptic logging of CacheConfig#toString to have some
    preamble saying why it is logged and which column family is
    responsible.

- HBASE-15576 Scanning cursor to prevent blocking long time on
  ResultScanner.next()

    If you don't want a scan to block for too long because of heartbeats
    and partial results, you can use Scan#setNeedCursorResult(true) to get
    a special result within the scanner timeout that tells you which row
    the server is currently scanning. See its javadoc for more details.

- HBASE-15633 Backport HBASE-15507 to branch-1

    Adds update_peer_config to the HBase shell and ReplicationAdmin, and
    provides a callback for custom replication endpoints to be notified
    of changes to configuration and peer data

- HBASE-15686 Add override mechanism for the exempt classes when
  dynamically loading table coprocessor

    The new coprocessor table descriptor attribute,
    hbase.coprocessor.classloader.included.classes, is added. A user can
    specify class name prefixes (semicolon separated) which should be
    loaded by CoprocessorClassLoader.

- HBASE-15711 Add client side property to allow logging details for
  batch errors

    A new client side property hbase.client.log.batcherrors.details is
    introduced to allow logging the full stacktrace of exceptions for
    batch errors. It is disabled by default.

- HBASE-15816 Provide client with ability to set priority on Operations

    Added setPriority(int priority) API to Put, Delete, Increment, Append,
    Get and Scan. For all these ops, the user can provide a custom RPC
    priority level.

- HBASE-15924 Enhance hbase services autorestart capability to
  hbase-daemon.sh

    HBase services can now be started with the "autostart/autorestart"
    feature in a controlled fashion: "--autostart-window-size" defines the
    window period and "--autostart-window-retry-limit" defines the number
    of times the HBase services may be restarted after being
    killed/terminated abnormally within the provided window period.

    The following cases are supported with "autostart/autorestart":

    a) --autostart-window-size=0 and --autostart-window-retry-limit=0
       indicate an infinite window size and no retry limit.
    b) Not providing the args defaults to a).
    c) --autostart-window-size=0 and --autostart-window-retry-limit=
       <positive value> make the autostart process bail out if the retry
       limit is exceeded, irrespective of the window period.
    d) --autostart-window-size=<x> and --autostart-window-retry-limit=<y>
       make the autostart process bail out if the retry limit "y" is
       exceeded within the last window period "x".
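
    One way to read cases a) through d) is as a single decision function.
    The sketch below is only an interpretation, not the hbase-daemon.sh
    logic: AutostartPolicy and mayRestart are hypothetical names, the
    time unit is left abstract, a retry limit of N is assumed to allow
    exactly N restarts, and a retry limit of 0 combined with a non-zero
    window (a case the note does not cover) is assumed to mean no limit.

```java
import java.util.List;

// Illustrative sketch only; an interpretation of cases a) to d).
final class AutostartPolicy {
    // previousRestarts holds the times of earlier automatic restarts,
    // in the same (unspecified) unit as windowSize and now.
    static boolean mayRestart(long windowSize, int retryLimit,
                              List<Long> previousRestarts, long now) {
        if (retryLimit == 0) {
            return true; // cases a) and b): no retry limit (assumed for any window)
        }
        long counted = 0;
        for (long restartTime : previousRestarts) {
            // windowSize == 0 means an infinite window: every restart counts.
            if (windowSize == 0 || now - restartTime < windowSize) {
                counted++;
            }
        }
        return counted < retryLimit; // cases c) and d)
    }
}
```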

- HBASE-15941 HBCK repair should not unsplit healthy splitted region

    A new option -removeParents is now available that will remove an old
    parent when two valid daughters for that parent exist and
    -fixHdfsOverlaps is used. If there is an issue removing the parent from
    META or sidelining the parent from HDFS, we fall back to a regular
    merge. For now this option only works when the overlap group consists
    of exactly 3 regions (a parent, daughter A and daughter B).

- HBASE-15950 Fix memstore size estimates to be tighter

    The estimates of heap usage by the memstore objects (KeyValue, object
    and array header sizes, etc) have been made more accurate for heap
    sizes up to 32G (using CompressedOops), resulting in them dropping by
    10-50% in practice. This also results in fewer flushes and compactions
    due to "fatter" flushes. As a result, the actual heap usage
    of the memstore before being flushed may increase by up to 100%. If
    configured memory limits for the region server had been tuned based on
    observed usage, this change could result in worse GC behavior or even
    OutOfMemory errors. Set the environment property (not hbase-site.xml)
    "hbase.memorylayout.use.unsafe" to false to disable.

- HBASE-15994 Allow selection of RpcSchedulers

    Adds a FifoRpcSchedulerFactory so you can try the FifoRpcScheduler by
    setting "hbase.region.server.rpc.scheduler.factory.class"

- HBASE-16052 Improve HBaseFsck Scalability

    Improves the performance and scalability of HBaseFsck, especially for
    large clusters with a small number of large tables.

    Searching for lingering reference files is now a multi-threaded
    operation.  Loading HDFS region directory information is now multi-
    threaded at the region-level instead of the table-level to maximize
    concurrency.  A performance bug in HBaseFsck that resulted in
    redundant I/O and RPCs was fixed by introducing a FileStatusFilter
    that filters FileStatus objects directly.

- HBASE-16213 A new HFileBlock structure for fast random get

    Introduces a new DataBlockEncoding named ROW_INDEX_V1, which can
    improve random read (get) performance, especially when the average
    record size (key-value size per row) is small. To use this feature,
    set DATA_BLOCK_ENCODING to ROW_INDEX_V1 for the column family of a
    newly created table, or change an existing CF with the shell command
    below:

      alter 'table_name', {NAME => 'cf', DATA_BLOCK_ENCODING =>
         'ROW_INDEX_V1'}

    Please note that with this DBE turned on, HFile blocks will be bigger
    than with NONE encoding because it adds some metadata for binary
    search.

    Seeking within a row during random reads is one of the main consumers
    of CPU. This helps.

- HBASE-16244 LocalHBaseCluster start timeout should be configurable

    When LocalHBaseCluster is started from the command line the Master
    would give up after 30 seconds due to a hardcoded timeout meant for
    unit tests. This change allows the timeout to be configured via
    hbase-site as well as sets it to 5 minutes when LocalHBaseCluster
    is started from the command line.

- HBASE-16336 Removing peers seems to be leaving spare queues

    Add a ReplicationZKNodeCleaner periodic check and delete any useless
    replication queue belonging to a peer which does not exist.

- HBASE-16388 Prevent client threads being blocked by only one slow
  region server

    Adds a new configuration, hbase.client.perserver.requests.threshold,
    to limit the max number of concurrent requests to one region server.
    If the user still creates new requests after reaching the limit, the
    client throws ServerTooBusyException and does not send the request to
    the server. This is a client-side feature that prevents the client's
    threads from being blocked by one slow region server, which would
    otherwise make the availability of the client much lower than the
    availability of the region servers.
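
    The idea of a per-server concurrency cap can be sketched as below.
    This is not the HBase client implementation: PerServerRequestLimiter,
    acquire, release and the nested TooBusyException (a stand-in for
    ServerTooBusyException so that the sketch compiles on its own) are
    all hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch only; all names here are hypothetical.
final class PerServerRequestLimiter {
    // Stand-in for HBase's ServerTooBusyException.
    static final class TooBusyException extends RuntimeException {
        TooBusyException(String message) {
            super(message);
        }
    }

    private final int threshold;
    private final ConcurrentHashMap<String, AtomicInteger> inFlight =
        new ConcurrentHashMap<>();

    PerServerRequestLimiter(int threshold) {
        this.threshold = threshold;
    }

    // Call before sending. Fails fast instead of queueing behind a slow
    // server once the number of in-flight requests reaches the threshold.
    void acquire(String server) {
        AtomicInteger count =
            inFlight.computeIfAbsent(server, s -> new AtomicInteger());
        if (count.incrementAndGet() > threshold) {
            count.decrementAndGet();
            throw new TooBusyException(
                "too many concurrent requests to " + server);
        }
    }

    // Call when a request completes, whether it succeeded or failed.
    void release(String server) {
        AtomicInteger count = inFlight.get(server);
        if (count != null) {
            count.decrementAndGet();
        }
    }
}
```

    Because the counter is kept per server, a saturated region server
    does not stop requests to the others.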

- HBASE-16540 Scan should do additional validation on start and stop row

    Scan#setStartRow() and Scan#setStopRow() now validate the argument
    passed for each row key.  If the length of the parameter passed
    exceeds Short.MAX_VALUE, an IllegalArgumentException will be thrown.

- HBASE-16584 Backport the new ipc implementation in HBASE-16432 to
  branch-1

    The netty dependency is upgraded to 4.1.1.Final. Some configurations
    of the old AsyncRpcClient are also gone, such as
    "hbase.rpc.client.threads.max" and "hbase.rpc.client.nativetransport".

- HBASE-16653 Backport HBASE-11393 to all branches which support namespace

    HBASE-11393 did two things:

      1. Unified tableCFs with peerConfig.
      2. Fixed the missing namespace support in replication.

    This issue backports it to branch-1.

    How do you do a rolling update if the replication peer has an old
    table-cfs config? Because the proto object of ReplicationPeerConfig
    was modified (a tableCFs field was added), the original
    ReplicationPeerConfig data on ZK has to be updated first during a
    rolling update.

      1. Make sure the master has permission to modify the replication
         peer znode.
      2. Disable the replication peer.
      3. Rolling update the master first. The master will copy the
         table-cfs config from the old table-cfs znode and add it to the
         new proto object of ReplicationPeerConfig.
      4. Rolling update the regionservers.
      5. Enable the replication peer.

    If you can't change the replication peer znode permission, you can
    use the TableCFsUpdater tool to copy the table-cfs config.

      1. Disable the replication peer.
      2. bin/hbase \
           org.apache.hadoop.hbase.replication.master.TableCFsUpdater \
           update
      3. Rolling update master and regionservers.
      4. Enable the replication peer.

- HBASE-16672 Add option for bulk load to always copy hfile(s) instead of
  renaming

    This issue adds a config, always.copy.files, to LoadIncrementalHFiles.
    When set to true, source hfiles are copied, meaning they are kept
    after the bulk load is done. The default value is false.

- HBASE-16698 Performance issue: handlers stuck waiting for CountDownLatch
  inside WALKey#getWriteEntry under high writing workload

    Assign sequenceid to an edit before we go on the ringbuffer; undoes
    contention on WALKey latch. Adds a new config
    "hbase.hregion.mvcc.preassign" which defaults to true: i.e. this speedup
    is enabled.

- HBASE-16755 Honor flush policy under global memstore pressure

    Prior to this change, when the memstore low water mark is exceeded on a
    regionserver, the regionserver will force flush all stores on the
    regions selected for flushing until we drop below the low water mark.
    With this change, the regionserver will continue to force flush regions
    when above the memstore low water mark, but will only flush the stores
    returned by the configured FlushPolicy.

- HBASE-16993 BucketCache throw java.io.IOException: Invalid HFile block
  magic when configuring hbase.bucketcache.bucket.sizes

    Any value for hbase.bucketcache.bucket.sizes configuration must be a
    multiple of 256.  If that is not the case, instantiation of L2 Bucket
    cache itself will fail throwing IllegalArgumentException.
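
    The multiple-of-256 rule can be checked up front. A minimal sketch,
    assuming the value is a comma-separated list of sizes in bytes; the
    BucketSizes class and its parse method are hypothetical and do not
    mirror the BucketCache code.

```java
// Illustrative sketch only; not the BucketCache configuration parser.
final class BucketSizes {
    // Parses a comma-separated list of sizes and rejects any value that
    // is not a multiple of 256, as the release note requires.
    static int[] parse(String configured) {
        String[] parts = configured.split(",");
        int[] sizes = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            int size = Integer.parseInt(parts[i].trim());
            if (size % 256 != 0) {
                throw new IllegalArgumentException(
                    "bucket size " + size + " is not a multiple of 256");
            }
            sizes[i] = size;
        }
        return sizes;
    }
}
```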

- HBASE-17112 Prevent setting timestamp of delta operations the same as
  previous value

    Before this issue, two concurrent Increments/Appends done in the same
    millisecond, or the RS's clock going back, would produce two results
    with the same TS. This is not friendly to versioning and gives wrong
    results in the sink cluster if replication is out of order. After
    this issue, the result of an Increment/Append will always have an
    increasing TS. There is no longer any inconsistency in replication
    for these operations.
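
    A common way to obtain such strictly increasing timestamps is to
    take the larger of the current time and the previous timestamp plus
    one. The sketch below shows that idea only; DeltaTimestamps and next
    are hypothetical names, and the actual patch may differ in detail.

```java
// Illustrative sketch only; the technique, not the HBase source.
final class DeltaTimestamps {
    // Returns a timestamp strictly greater than previousTs, even when
    // the clock has not advanced or has gone backwards.
    static long next(long previousTs, long nowMillis) {
        return Math.max(nowMillis, previousTs + 1);
    }
}
```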

- HBASE-17178 Add region balance throttling

    Adds region balance throttling. The master executes one region balance
    plan per balance interval, where the interval equals the max balancing
    time divided by the number of region balance plans. Also introduces a
    new config, hbase.master.balancer.maxRitPercent, to protect
    availability. If it is set to 0.01, at most 1% of regions are in
    transition while balancing, so the cluster's availability is at least
    99% while balancing.
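
    The arithmetic behind both knobs is simple enough to write down. A
    small sketch with hypothetical names (BalanceThrottle and both
    methods are not HBase identifiers); rounding down is an assumption:

```java
// Illustrative sketch only; arithmetic from the release note.
final class BalanceThrottle {
    // Interval between plans: max balancing time / number of plans.
    static long intervalMillis(long maxBalancingTimeMillis, int planCount) {
        if (planCount <= 0) {
            throw new IllegalArgumentException("planCount must be positive");
        }
        return maxBalancingTimeMillis / planCount;
    }

    // Upper bound on regions in transition for a given maxRitPercent,
    // rounded down (the rounding direction is an assumption).
    static int maxRegionsInTransition(int totalRegions, double maxRitPercent) {
        return (int) Math.floor(totalRegions * maxRitPercent);
    }
}
```

    For example, 100 plans with a max balancing time of 300000 ms give
    one plan every 3000 ms, and 0.01 on a cluster of 10000 regions
    allows at most 100 regions in transition.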

- HBASE-17280 Add mechanism to control cleaner chore behavior

    The HBase cleaner chore process cleans up old WAL files and archived
    HFiles. Cleaner operation can affect query performance when running
    heavy workloads, so you may want to disable the cleaner during peak
    hours. The cleaner has the following HBase shell commands:

    - cleaner_chore_enabled: Queries whether the cleaner chore is enabled/
      disabled.
    - cleaner_chore_run: Manually runs the cleaner to remove files.
    - cleaner_chore_switch: Enables or disables the cleaner and returns
      the previous state of the cleaner. For example,
      cleaner_chore_switch true enables the cleaner.

    The following APIs are added to Admin:

    - setCleanerChoreRunning(boolean on): Enable/Disable the cleaner chore
    - runCleanerChore(): Ask for cleaner chore to run
    - isCleanerChoreEnabled(): Query whether cleaner chore is enabled/
      disabled.

- HBASE-17296 Provide per peer throttling for replication

    Provide per peer throttling for replication. Add the bandwidth upper
    limit to ReplicationPeerConfig and a new shell cmd set_peer_bandwidth
    to update the bandwidth as needed.

- HBASE-17426 Inconsistent environment variable names for enabling JMX

    In bin/hbase-config.sh, if value for HBASE_JMX_BASE is empty, keep
    current behavior. If HBASE_JMX_OPTS is not empty, keep current
    behavior. Otherwise use the value of HBASE_JMX_BASE

- HBASE-17437 Support specifying a WAL directory outside of the root
  directory

    This patch adds support for specifying a WAL directory outside of the
    HBase root directory.

    Multiple configuration variables were added to accomplish this:

    hbase.wal.dir: used to configure where the root WAL directory is
    located. Could be on a different FileSystem than the root directory. WAL
    directory can not be set to a subdirectory of the root directory. The
    default value of this is the root directory if unset.

    hbase.rootdir.perms: Configures FileSystem permissions to set on the
    root directory. This is '700' by default.

    hbase.wal.dir.perms: Configures FileSystem permissions to set on the WAL
    directory FileSystem. This is '700' by default.

- HBASE-17472 Correct the semantic of permission grant

    Before this patch, later granted permissions overrode previously
    granted permissions, and the previously granted permissions were lost.
    This issue redefines the grant semantics: on the master branch, later
    granted permissions are merged with previously granted permissions.
    On branch-1.4, grant keeps the override behavior for compatibility,
    and a grant with a mergeExistingPermission flag is provided.

- HBASE-17508 Unify the implementation of small scan and regular scan for
  sync client

    Now the scan.setSmall method is deprecated. Consider using scan.setLimit
    and scan.setReadType in the future. And we will open scanner lazily when
    you call scanner.next. This is an incompatible change which delays the
    table existence check and permission check.

- HBASE-17578 Thrift per-method metrics should still update in the case of
  exceptions

    In prior versions, the HBase Thrift handlers failed to increment per-
    method metrics when an exception was encountered.  These metrics will
    now always be incremented, whether an exception is encountered or not.
    This change also adds exception-type metrics, similar to those exposed
    in regionservers, for individual exceptions which are received by the
    Thrift handlers.

- HBASE-17583 Add inclusive/exclusive support for startRow and endRow of
  scan for sync client

    Now you can include or exclude the startRow and stopRow for a scan. The
    new methods to specify startRow and stopRow are withStartRow and
    withStopRow. The old methods to specify startRow and stopRow
    (including constructors) are marked as deprecated because, in the old
    behavior, if startRow and stopRow are equal we consider it a get scan
    and include the stopRow implicitly. This is strange now that
    inclusiveness can be set explicitly, so we add new methods and
    deprecate the old ones. The deprecated methods will be removed in the
    future.

- HBASE-17584 Expose ScanMetrics with ResultScanner rather than Scan

    Now you can use ResultScanner.getScanMetrics to get the scan metrics at
    any time during the scan operation. The old Scan.getScanMetrics is
    deprecated but still works. However, if you use
    ResultScanner.getScanMetrics to get the scan metrics and reset them,
    the metrics published to the Scan instance will be messed up.

- HBASE-17599 Use mayHaveMoreCellsInRow instead of isPartial

    The word 'isPartial' is ambiguous, so we introduce a new method,
    'mayHaveMoreCellsInRow', to replace it. The old meaning of 'isPartial'
    was not the same as 'mayHaveMoreCellsInRow': for a batched scan, if
    the number of returned cells equaled the batch, isPartial was false.
    After this change, 'isPartial' means the same as
    'mayHaveMoreCellsInRow'. This is an incompatible change, but it is not
    likely to break much, because for a batched scan the old 'isPartial'
    was redundant information: when the number of returned cells reaches
    the batch limit, you already know both the number of returned cells
    and the value of batch.

- HBASE-17737 Thrift2 proxy should support scan timeRange per column
  family

    Thrift2 proxy now supports scan timeRange per column family.

- HBASE-17757 Unify blocksize after encoding to decrease memory fragment

    Blocksize is set in the column family's attributes. It is used to
    control block sizes when generating blocks, but it doesn't take
    encoding into account. If you set an encoding on blocks, the block
    size varies after encoding. Since blocks are cached in memory after
    encoding (by default), this causes memory fragmentation when using the
    blockcache, or decreases pool efficiency when using the bucketCache.
    This issue introduces a new config named
    'hbase.writer.unified.encoded.blocksize.ratio'. The default value of
    this config is 1, meaning it does nothing. If it is set to a smaller
    value like 0.5 and the blocksize is set to 64KB (the default
    blocksize), the blocksize after encoding is unified to 64KB * 0.5 =
    32KB. A unified blocksize relieves the memory problems mentioned above.
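
    The worked example above (64KB * 0.5 = 32KB) as a tiny sketch. The
    class and method names are hypothetical, and the sketch covers only
    the arithmetic, not how the HFile writer applies the result.

```java
// Illustrative sketch only; arithmetic from the release note.
final class UnifiedBlockSize {
    // Target size of a block after encoding: blocksize * ratio.
    static int unifiedEncodedBlockSize(int blockSizeBytes, double ratio) {
        return (int) (blockSizeBytes * ratio);
    }
}
```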

- HBASE-17817 Make Regionservers log which tables it removed coprocessors
  from when aborting

    Adds table name to exception logging when a coprocessor is removed from
    a table by the region server.

- HBASE-17861 Regionserver down when checking the permission of staging dir
  if hbase.rootdir is on S3

    Some object stores do not support Unix-style permissions. This fixes
    the permission check issue when the staging dir is specified on a
    different file system. Currently it covers s3, wasb, and swift.

- HBASE-17877 Improve HBase's byte[] comparator

    Updated the lexicographic byte array comparator to use a slightly more
    optimized version similar to the one available in the guava library that
    compares only the first index where left[index] != right[index]. The
    comparator also returns the diff directly instead of mapping it to -1,
    0, +1 range as was being done in the earlier version. We have seen
    significant performance gains, calculated in terms of throughput (ops/
    ms) with these changes ranging from approx 20% for smaller byte arrays
    up to 200 bytes and almost 100% for large byte arrays that are a few
    KB in size. We benchmarked with up to 16KB arrays and the general trend
    indicates that the performance improvement increases as the size of the
    byte array increases.
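
    The comparison logic described above can be sketched as follows. This
    is a Python illustration of the idea only (find the first differing
    index, return the raw difference); it is not the HBase Java code and
    says nothing about the measured performance. The function name is
    hypothetical, and falling back to the length difference when one array
    is a prefix of the other is an assumption of this sketch.

```python
def compare_bytes(left: bytes, right: bytes) -> int:
    """Lexicographic comparison that returns the raw difference.

    The result is negative, zero, or positive, but is not normalized to
    -1, 0, +1. Python bytes are unsigned (0..255), so no masking is needed.
    """
    for i in range(min(len(left), len(right))):
        if left[i] != right[i]:
            # Only the first differing index decides the result.
            return left[i] - right[i]
    # One array is a prefix of the other: the shorter one sorts first.
    return len(left) - len(right)


print(compare_bytes(b"\x00", b"\xff"))  # -255, not -1
```

    Callers that only test the sign of the result are unaffected by the
    missing normalization; callers that compare against exactly -1 or +1
    would need to be adjusted.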

- HBASE-17956 Raw scan should ignore TTL

    Now raw scan can also read expired cells.

- HBASE-18023 Log multi-requests for more than threshold number of rows

    Introduces a warning message in the RegionServer log when an RPC is
    received from a client that has more than 5000 "actions" (where an
    "action" is a collection of mutations for a specific row) in a single
    RPC. Misbehaving clients who send large RPCs to RegionServers can be
    malicious, causing temporary pauses via garbage collection or denial of
    service via crashes. The threshold of 5000 actions per RPC is defined by
    the property "hbase.rpc.rows.warning.threshold" in hbase-site.xml.

- HBASE-18090 Improve TableSnapshotInputFormat to allow multiple mappers
  per region

    In this task, we make it possible to run multiple mappers per region in
    the table snapshot.

- HBASE-18122 Scanner id should include ServerName of region server

    The first 32 bits of the scanner id are the MurmurHash32 of the
    ServerName string "host,port,ts". The ServerName contains the host,
    port, and start timestamp, so it can prevent collisions. The lowest
    32 bits are generated by an atomic int.
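
    A minimal sketch of this id layout, in Python for illustration. Note
    the assumptions: zlib.crc32 stands in for MurmurHash32 (which is not in
    the Python standard library), itertools.count stands in for the atomic
    int, and the class name and example server names are hypothetical.

```python
import itertools
import zlib


class ScannerIdGenerator:
    """Builds 64-bit ids: a 32-bit server hash on top, a 32-bit counter below."""

    def __init__(self, server_name: str):
        # Stand-in hash: HBase uses MurmurHash32 of "host,port,ts".
        self._high = zlib.crc32(server_name.encode("utf-8")) & 0xFFFFFFFF
        # Stand-in for the atomic int that supplies the low 32 bits.
        self._counter = itertools.count(1)

    def next_id(self) -> int:
        low = next(self._counter) & 0xFFFFFFFF
        return (self._high << 32) | low


gen = ScannerIdGenerator("host1.example.org,16020,1513712012000")
first, second = gen.next_id(), gen.next_id()
print(first & 0xFFFFFFFF, second & 0xFFFFFFFF)  # 1 2
```

    Because the upper half depends on the server's host, port, and start
    timestamp, two region servers (or one server before and after a
    restart) hand out ids from different ranges even when their counters
    hold the same value.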

- HBASE-18149 The setting rules for table-scope attributes and family-scope
  attributes should keep consistent

    If a table-scope attribute's value is false, you do not need to
    enclose 'false' in single quotes. Both COMPACTION_ENABLED => false and
    COMPACTION_ENABLED => 'false' will take effect.

- HBASE-18226 Disable reverse DNS lookup at HMaster and use the hostname
  provided by RegionServer

    The following config is added:

    
    hbase.regionserver.hostname.disable.master.reversedns

    This config is for experts: don't set its value unless you really know
    what you are doing. When set to true, regionserver will use the current
    node hostname for the servername and HMaster will skip reverse DNS
    lookup and use the hostname sent by regionserver instead. Note that this
    config and hbase.regionserver.hostname are mutually exclusive. See
    https://issues.apache.org/jira/browse/HBASE-18226 for more details.

    Caution: please make sure rolling upgrade succeeds before turning on
    this feature.

- HBASE-18247 Hbck to fix the case that replica region shows as key in the
  meta table

    The hbck tool can now correct the meta table should it get an entry for
    a read replica region.

- HBASE-18374 RegionServer Metrics improvements

    This change adds latency metrics for checkAndPut, checkAndDelete,
    putBatch, and deleteBatch. Also, the previous regionserver "mutate"
    latency metrics are renamed to "put" metrics. Batch metrics capture the
    latency of the entire batch containing put/delete, whereas put/delete
    metrics capture latency per operation. Note this change will break
    existing monitoring based on the regionserver "mutate" latency metric.

- HBASE-18520 Add jmx value to determine true Master Start time

    Adds a JMX value to track when the Master has finished initializing.
    The JMX value is 'masterFinishedInitializationTime' and gives the time
    in milliseconds at which the Master became fully usable and ready to
    serve requests.

- HBASE-18533 Expose BucketCache values to be configured

    This patch exposes configuration for BucketCache. These configs are
    very similar to those for the LRU cache, and are described below:

    "hbase.bucketcache.single.factor": Single access bucket size.
    "hbase.bucketcache.multi.factor": Multiple access bucket size.
    "hbase.bucketcache.memory.factor": In-memory bucket size.
    "hbase.bucketcache.extrafreefactor": Free this floating point factor
      of extra blocks when evicting. For example, free the number of
      blocks requested * (1 + extraFreeFactor).
    "hbase.bucketcache.acceptfactor": Acceptable size of cache (no
      evictions if size < acceptable).
    "hbase.bucketcache.minfactor": Minimum threshold of cache (when
      evicting, evict until size < min).
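
    As a worked example of the extraFreeFactor formula above, here is a
    small Python sketch. The function name is hypothetical and the sketch
    ignores any rounding HBase may apply; it only restates
    "blocks requested * (1 + extraFreeFactor)".

```python
def blocks_to_free(blocks_requested: int, extra_free_factor: float) -> float:
    """Blocks to free during eviction, per the formula in the description.

    Hypothetical helper for illustration; rounding is deliberately left out.
    """
    if blocks_requested < 0 or extra_free_factor < 0:
        raise ValueError("arguments must be non-negative")
    return blocks_requested * (1 + extra_free_factor)


# Asking to free 100 blocks with a factor of 0.5 frees 150 blocks.
print(blocks_to_free(100, 0.5))  # 150.0
```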

- HBASE-18675 Making {max,min}SessionTimeout configurable for
  MiniZooKeeperCluster

    Standalone clusters and minicluster instances can now configure the
    session timeout for our embedded ZooKeeper quorum using
    "hbase.zookeeper.property.minSessionTimeout" and
    "hbase.zookeeper.property.maxSessionTimeout".

- HBASE-18786 FileNotFoundException should not be silently handled for
  primary region replicas

    FileNotFoundException opening a StoreFile in a primary replica now
    causes a RegionServer to crash out where before it would be ignored (or
    optionally handled via close/reopen).

- HBASE-18993 Backport patches in HBASE-18410 to branch-1.x branches

    This change fixes bugs in FilterList, and also does a code refactor
    which ensures interface compatibility.

    The primary bug fixes are:

    1. For a sub-filter in a FilterList with MUST_PASS_ONE, if the previous
       filterKeyValue() of the sub-filter returns NEXT_COL, we cannot be
       sure that the next cell will be the first cell in the next column,
       because FilterList chooses the minimal forward step among
       sub-filters, and it may return a SKIP. So we add an extra check to
       ensure that the next cell matches the previous return code for
       sub-filters.

    2. The previous logic for transforming cells in FilterList was
       incorrect. We should set the previous transform result (rather than
       the given cell in question) as the initial value of the transform
       cell before calling filterKeyValue() of FilterList.

    3. Handle the ReturnCodes which the previous code did not handle.

    As for the code refactor, we divided FilterList into two separate
    sub-classes: FilterListWithOR and FilterListWithAND. FilterListWithOR
    has been optimized to choose the next minimal step to seek a cell
    rather than SKIP cells one by one, and FilterListWithAND has been
    optimized to choose the next maximal key to seek among sub-filters in
    the filter list. All in all, the code in FilterList is cleaner and
    easier to follow now.

    Note that ReturnCode NEXT_ROW has been redefined as skipping to the
    next row in the current family, not to the next row in all families.
    This is more reasonable, because ReturnCode is a store-level concept,
    not a region-level one.
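
    The "minimal step" and "maximal step" ideas can be illustrated with a
    deliberately simplified Python sketch. It models only three of the
    Filter return codes and assumes a strict ordering by how far each one
    advances the scan. The names Step, merge_or, and merge_and are
    hypothetical; the real FilterListWithOR and FilterListWithAND handle
    more return codes and more state than this.

```python
from enum import IntEnum
from typing import Iterable


class Step(IntEnum):
    """Simplified return codes, ordered by how far they move the scan."""
    SKIP = 1      # advance one cell
    NEXT_COL = 2  # advance to the next column
    NEXT_ROW = 3  # advance to the next row (in the current family)


def merge_or(steps: Iterable[Step]) -> Step:
    # MUST_PASS_ONE: any sub-filter may still accept a nearby cell,
    # so take the smallest forward step.
    return min(steps)


def merge_and(steps: Iterable[Step]) -> Step:
    # MUST_PASS_ALL: every sub-filter must accept the cell,
    # so the largest forward step is safe.
    return max(steps)


votes = [Step.NEXT_COL, Step.SKIP]
print(merge_or(votes).name, merge_and(votes).name)  # SKIP NEXT_COL
```

    The first bug fix above follows from the OR case: a sub-filter that
    voted NEXT_COL can be overruled by a SKIP, so the cell it sees next is
    not necessarily the first cell of the next column.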

- HBASE-19035 Miss metrics when coprocessor use region scanner to read
  data

    Moves the read requests count to the region level, because
    RegionScanner is exposed to coprocessors. Updates the write requests
    count in processRowsWithLocks. Removes requestRowActionCount in
    RSRpcServices; this metric can be computed from the region's
    readRequestsCount and writeRequestsCount.

- HBASE-19051 Add new split algorithm for num string

    Adds a new split algorithm, DecimalStringSplit. Rows are
    decimal-encoded long values in the range "00000000" => "99999999".
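
    To show what pre-splitting such a key space looks like, here is a
    Python sketch that divides the decimal range evenly and zero-pads the
    boundaries to eight digits. It is an assumption-laden illustration:
    the function name is hypothetical, and the exact boundaries chosen by
    DecimalStringSplit may differ from this even division.

```python
from typing import List


def decimal_split_points(num_regions: int,
                         first: int = 0,
                         last: int = 99_999_999,
                         width: int = 8) -> List[str]:
    """Return num_regions - 1 zero-padded decimal split keys.

    Hypothetical helper: spreads the boundaries evenly over [first, last].
    """
    if num_regions < 1:
        raise ValueError("num_regions must be at least 1")
    span = last - first
    return [
        str(first + span * i // num_regions).zfill(width)
        for i in range(1, num_regions)
    ]


print(decimal_split_points(4))  # ['24999999', '49999999', '74999999']
```

    Zero-padding matters because row keys are compared as bytes: without
    it, "9" would sort after "10000000".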

- HBASE-19131 Add the ClusterStatus hook and cleanup other hooks which can
  be replaced by ClusterStatus hook

    1) Add preGetClusterStatus() and postGetClusterStatus() hooks
    2) Add preGetClusterStatus() to the access control check as an admin
       action

- HBASE-19144 Retry assignments in FAILED_OPEN state when servers (re)join
  the cluster

    When regionserver placement groups (RSGroups) are active, as servers
    join the cluster the Master will attempt to reassign regions in
    FAILED_OPEN state.

- HBASE-19419 Remove hbase-native-client from branch-1

    Removed the hbase-native-client module from branch-1 (it is still in
    master). The module is not complete. Look for a finished C++ client in
    the near future, which may be backported to branch-1 at that point.

---
Cheers,
The HBase Dev Team
