[jira] [Updated] (HADOOP-16355) ZookeeperMetadataStore: Use Zookeeper as S3Guard backend store

Mingliang Liu (JIRA) Fri, 07 Jun 2019 11:31:37 -0700


     [ 
https://issues.apache.org/jira/browse/HADOOP-16355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Mingliang Liu updated HADOOP-16355:
-----------------------------------
    Description: 
When S3Guard was proposed, there were several valid reasons to choose
DynamoDB as its default backend store: 0) it integrates seamlessly with the
AWS ecosystem, e.g. the client library; 1) it's a managed web service with
zero operational cost that is highly available and infinitely scalable; 2)
it's performant, with single-digit millisecond latency; 3) the approach was
proven by Netflix's S3mper (not actively maintained) and EMRFS (closed source
and usage). Because S3Guard is pluggable, it's possible to implement
{{MetadataStore}} on other backend stores, besides the null and in-memory
local ones, without changing semantics.

Here we propose {{ZookeeperMetadataStore}}, which uses ZooKeeper as the
S3Guard backend store. Its main motivation is to provide a new MetadataStore
option which:
 # can be easily integrated, as ZooKeeper is heavily used in the Hadoop
community
 # offers acceptable performance, as both the client and the ZooKeeper
ensemble are usually "local" in a Hadoop cluster (ZK/HBase/Hive etc.)
 # removes the DynamoDB dependency

Obviously, not all use cases will prefer this to the default DynamoDB store.
For example, ZK might not scale well if there are dozens of S3 buckets and
each has millions of objects. Our use case targets HBase storing HFiles on S3
instead of HDFS. A complete solution for HBase on S3 needs both HBOSS (see
HBASE-22149), to recover the atomicity of metadata operations such as rename,
and S3Guard, for consistent enumeration of and access to object store bucket
metadata. We would like to use ZooKeeper as the backend store for both.
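
To make the idea of a pluggable, ZooKeeper-backed store more concrete, here
is a minimal sketch of how S3 object metadata might be laid out as a tree of
znodes. Everything in it is an assumption for illustration only: the
/s3guard root, the one-znode-per-path layout, the class and method names, and
the choice to store only the object length are not taken from the proposal
or from the real {{MetadataStore}} interface, and a small in-memory class
stands in for an actual ZooKeeper ensemble.

```python
import posixpath


class InMemoryZnodeTree:
    """Hypothetical stand-in for a ZooKeeper ensemble: znode path -> bytes."""

    def __init__(self):
        self._nodes = {"/": b""}

    def ensure_path(self, path):
        # Create the znode and any missing ancestors; existing data is kept.
        current = ""
        for part in path.strip("/").split("/"):
            current += "/" + part
            self._nodes.setdefault(current, b"")

    def set(self, path, data):
        if path not in self._nodes:
            raise KeyError("no such znode: " + path)
        self._nodes[path] = data

    def get(self, path):
        # Returns None when the znode does not exist.
        return self._nodes.get(path)

    def get_children(self, path):
        prefix = path.rstrip("/") + "/"
        names = [p[len(prefix):] for p in self._nodes
                 if p.startswith(prefix) and len(p) > len(prefix)]
        # Keep direct children only (no further "/" in the remainder).
        return sorted(n for n in names if "/" not in n)


class ZnodeMetadataStoreSketch:
    """Illustrative only: one znode per S3 path under an assumed root."""

    ROOT = "/s3guard"  # assumed root znode, not defined by the proposal

    def __init__(self, zk):
        self._zk = zk

    def _znode(self, bucket, key):
        return posixpath.join(self.ROOT, bucket, key.strip("/")).rstrip("/")

    def put(self, bucket, key, length):
        # Record the object's length as the znode payload.
        path = self._znode(bucket, key)
        self._zk.ensure_path(path)
        self._zk.set(path, str(length).encode("ascii"))

    def get(self, bucket, key):
        # None for unknown paths and for directory-like znodes with no payload.
        data = self._zk.get(self._znode(bucket, key))
        return int(data.decode("ascii")) if data else None

    def list_children(self, bucket, key):
        # Listing comes from the znode tree, not from an S3 LIST call.
        return self._zk.get_children(self._znode(bucket, key))
```

In this sketch, a directory listing is answered from the znode tree rather
than from S3 itself, which is the property S3Guard relies on for consistent
enumeration. A real implementation would also have to handle deletes, renames
and the scaling concern raised above.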

  was:
When S3Guard was proposed, there were several valid reasons to choose
DynamoDB as its default backend store: 0) it integrates seamlessly with the
AWS ecosystem, e.g. the client library; 1) it's a managed web service with
zero operational cost that is highly available and infinitely scalable; 2)
it's performant, with single-digit millisecond latency; 3) the approach was
proven by Netflix's S3mper (not actively maintained) and EMRFS (closed source
and usage). Because S3Guard is pluggable, it's possible to implement
{{MetadataStore}} on other backend stores, besides the null and in-memory
local ones, without changing semantics.

Here we propose {{ZookeeperMetadataStore}}, which uses ZooKeeper as the
S3Guard backend store. Its main motivation is to provide a new MetadataStore
option which:
 # can be easily integrated, as ZooKeeper is heavily used in the Hadoop
community
 # offers acceptable performance, as both the client and the ZooKeeper
ensemble are usually "local" in a Hadoop cluster (ZK/HBase/Hive etc.)
 # removes the DynamoDB dependency

Obviously, not all use cases will prefer this to the default DynamoDB store.
For example, ZK might not scale well if there are dozens of S3 buckets and
each has millions of objects.

Our use case targets HBase storing HFiles on S3 instead of HDFS. A complete
solution for HBase on S3 needs both HBOSS (see HBASE-22149), to recover the
atomicity of metadata operations such as rename, and S3Guard, for consistent
enumeration of and access to object store bucket metadata. We would like to
use ZooKeeper as the backend store for both.


> ZookeeperMetadataStore: Use Zookeeper as S3Guard backend store
> --------------------------------------------------------------
>
>                 Key: HADOOP-16355
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16355
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs
>            Reporter: Mingliang Liu
>            Priority: Major


