Mingliang Liu created HADOOP-16355:
--------------------------------------
Summary: ZookeeperMetadataStore: Use Zookeeper as S3Guard backend store
Key: HADOOP-16355
URL: https://issues.apache.org/jira/browse/HADOOP-16355
Project: Hadoop Common
Issue Type: Sub-task
Components: fs
Reporter: Mingliang Liu
When S3Guard was proposed, there were several valid reasons to choose
DynamoDB as its default backend store: 0) seamless integration with the AWS
ecosystem, e.g. the client library; 1) it's a managed web service, which means
effectively zero operational cost, high availability and elastic scalability;
2) it's performant, with single-digit millisecond latency; 3) it's proven by
Netflix's S3mper (not actively maintained) and EMRFS (closed in source and
usage). Because the store is pluggable, it's possible to implement
{{MetadataStore}} on top of other backend stores, besides the existing null and
in-memory local ones, without changing semantics.
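To illustrate what "pluggable without changing semantics" could look like, here is a minimal sketch. It does not use the real S3Guard {{MetadataStore}} interface: {{SimpleMetadataStore}}, {{InMemoryStore}} and {{MetadataStoreSketch}} are hypothetical, heavily simplified names invented for this example. The assumption is that callers depend only on the interface, so a Zookeeper-backed class could replace the in-memory one without touching calling code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.TreeMap;

/** Hypothetical, simplified stand-in for a metadata store contract (not the Hadoop API). */
interface SimpleMetadataStore {
    void put(String path, long length);
    Optional<Long> get(String path);
    void delete(String path);
    /** Returns the full paths of the direct children of {@code parent}. */
    List<String> listChildren(String parent);
}

/** In-memory backend; a Zookeeper-backed class would implement the same interface. */
final class InMemoryStore implements SimpleMetadataStore {
    // Sorted map, so all entries under one directory prefix are contiguous.
    private final TreeMap<String, Long> entries = new TreeMap<>();

    @Override
    public void put(String path, long length) {
        entries.put(path, length);
    }

    @Override
    public Optional<Long> get(String path) {
        return Optional.ofNullable(entries.get(path));
    }

    @Override
    public void delete(String path) {
        entries.remove(path);
    }

    @Override
    public List<String> listChildren(String parent) {
        String prefix = parent.endsWith("/") ? parent : parent + "/";
        List<String> children = new ArrayList<>();
        for (String key : entries.tailMap(prefix, true).keySet()) {
            if (!key.startsWith(prefix)) {
                break; // left the contiguous range of keys under this prefix
            }
            boolean nested = key.indexOf('/', prefix.length()) >= 0;
            if (!nested && key.length() > prefix.length()) {
                children.add(key);
            }
        }
        return children;
    }
}

final class MetadataStoreSketch {
    /** Caller code sees only the interface, never the concrete backend. */
    static long totalLength(SimpleMetadataStore store, String dir) {
        long total = 0;
        for (String child : store.listChildren(dir)) {
            total += store.get(child).orElse(0L);
        }
        return total;
    }

    public static void main(String[] args) {
        SimpleMetadataStore store = new InMemoryStore(); // swap the backend here
        store.put("/bucket/table/hfile1", 100L);
        store.put("/bucket/table/hfile2", 250L);
        store.put("/bucket/table/region/hfile3", 75L);
        System.out.println(store.listChildren("/bucket/table"));
        System.out.println(totalLength(store, "/bucket/table"));
    }
}
```

In a real deployment the backend is selected through configuration rather than code (S3Guard reads the class name from {{fs.s3a.metadatastore.impl}}); the sketch only shows the shape of the abstraction.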
Here we propose {{ZookeeperMetadataStore}}, which uses Zookeeper as the S3Guard
backend store. Its main motivation is to provide a new MetadataStore option
which:
# can be easily integrated, as Zookeeper is heavily used in the Hadoop community
# offers acceptable performance, as both the client and the Zookeeper ensemble
are usually "local" in a Hadoop cluster (ZK/HBase/Hive etc)
# removes the DynamoDB dependency
Obviously, not all use cases will prefer this to the default DynamoDB store. For
example, ZK might not scale well if there are dozens of S3 buckets, each with
millions of objects.
Our use case targets HBase storing HFiles on S3 instead of HDFS. A complete
solution for HBase on S3 needs HBOSS (see HBASE-22149) to recover the
atomicity of metadata operations such as rename, and S3Guard for consistent
enumeration of, and access to, object store bucket metadata. We would like to
use Zookeeper as the backend store for both.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)