[ 
https://issues.apache.org/jira/browse/HADOOP-9565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14303336#comment-14303336
 ] 

Steve Loughran commented on HADOOP-9565:
----------------------------------------

h3. Semantics directly off {{FileContext}} and {{FileSystem}}?

# Having a clear separation between object store and FS tells the world that if 
something doesn't say {{extends ObjectStore}} then it's an FS with all the 
normal expectations of consistency, atomicity, durability, etc. The extra 
subclass exists to warn that some of those expectations may not hold. 
# Making {{getSemantics()}} abstract forces everyone to look at what their 
semantics really are and declare them, rather than take a possibly incorrect 
default. (In {{FileSystem}} itself we couldn't make it abstract and would have 
to default to POSIX.)

That said:

# Those object stores that can replace HDFS are effectively filesystems. The 
{{ObjectStore}} extension would then only be needed if/when we added more 
features (e.g. PUT?)
# Having it everywhere makes it easier to chain filesystems together; some 
wrapper FS client (like a performance counter) could relay the probe without 
caring about FS type; callers would know it is there too.
# We could add a way to query capabilities alongside it. Today we have 
filesystems that don't support append (checksum FS), seek on streams (FTP), 
truncate, extended attributes, encryption flags, etc. There's no cue that they 
are missing other than exceptions when you try to use them.
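
As a rough sketch of the chaining point, a wrapper only has to delegate the probe to whatever it wraps. Every type name below ({{SemanticsProbe}}, {{FixedStore}}, {{RelayingWrapper}}) is made up for illustration; none of them exist in Hadoop or in the patch.

```java
// Hypothetical types for illustration only; not part of any Hadoop API.
interface SemanticsProbe {
  long getSemantics();
}

/** Stand-in for a concrete store that declares its own semantics. */
final class FixedStore implements SemanticsProbe {
  private final long semantics;

  FixedStore(long semantics) {
    this.semantics = semantics;
  }

  @Override
  public long getSemantics() {
    return semantics;
  }
}

/** Wrapper that relays the probe without caring what it wraps. */
final class RelayingWrapper implements SemanticsProbe {
  private final SemanticsProbe inner;
  private long probeCount;

  RelayingWrapper(SemanticsProbe inner) {
    this.inner = inner;
  }

  @Override
  public long getSemantics() {
    probeCount++;                 // bookkeeping local to the wrapper
    return inner.getSemantics();  // the answer comes from the inner store
  }

  long getProbeCount() {
    return probeCount;
  }
}
```

Wrappers can be stacked to any depth and the caller still sees the innermost store's declaration.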

I do fear that trying to add semantics and feature flags to the FS API itself 
is going to prove more controversial. We could start with ObjectStore and then 
decide whether to pull up at a later date.

h3. enumset vs bitmask?

Bitmask.

It's easier to manipulate during chaining. Something like Netflix S3mper 
injects consistency atop S3, so it could do

{code}
long getSemantics() {
  return inner.getSemantics() | STORE_CONSISTENCY_COMPLETE;
}
{code}

or, and this is the hard one with an enumset, something that removed a feature:
{code}
long getSemantics() {
  return inner.getSemantics() & ~CONSISTENT_CREATE;
}
{code}
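
For comparison, here is a minimal side-by-side sketch of adding and removing a flag with a bitmask and with an {{EnumSet}}. The class name, the enum and the bit values are assumptions made up for the example; the real constants are not fixed yet.

```java
import java.util.EnumSet;

final class SemanticsChaining {
  // Hypothetical bit assignments, for illustration only.
  static final long CONSISTENT_CREATE = 0x01L;
  static final long CONSISTENT_DELETE = 0x02L;

  // Hypothetical enum mirroring the same two flags.
  enum Semantic { CONSISTENT_CREATE, CONSISTENT_DELETE }

  /** Bitmask: add a flag on top of whatever the inner store declares. */
  static long withFlag(long inner, long flag) {
    return inner | flag;
  }

  /** Bitmask: clear a flag; ~ is the bitwise complement. */
  static long withoutFlag(long inner, long flag) {
    return inner & ~flag;
  }

  /** EnumSet: copy first so the inner store's set is not mutated. */
  static EnumSet<Semantic> withoutFlag(EnumSet<Semantic> inner, Semantic flag) {
    EnumSet<Semantic> copy = EnumSet.copyOf(inner);
    copy.remove(flag);
    return copy;
  }
}
```

Both forms work; the bitmask versions are single expressions, while the {{EnumSet}} one needs a defensive copy before it can drop anything.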

we could also use the operation in reporting error messages, such as 
highlighting which requirements weren't met in the exception text:
{code}
long s = store.getSemantics();
// bits that are required but not offered by the store
long missing = STORE_POSIX_WRITE_SEMANTICS & ~s;
if (missing != 0) {
  throw new IOException("Missing semantics: 0x" + Long.toHexString(missing)
      + " see https://wiki.apache.org/hadoop/ObjectStore");
}
{code}
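
If names are wanted in the exception text rather than a raw number, the unmet bits ({{required & ~actual}}) can be walked against a table of flags. This is only a sketch: the flag names, their values and the {{MissingSemantics}} class are all assumptions for the example.

```java
import java.util.ArrayList;
import java.util.List;

final class MissingSemantics {
  // Hypothetical flags and values, for illustration only.
  static final long CONSISTENT_CREATE = 0x01L;
  static final long CONSISTENT_DELETE = 0x02L;
  static final long ATOMIC_RENAME = 0x04L;

  private static final long[] FLAGS = {
    CONSISTENT_CREATE, CONSISTENT_DELETE, ATOMIC_RENAME
  };
  private static final String[] NAMES = {
    "CONSISTENT_CREATE", "CONSISTENT_DELETE", "ATOMIC_RENAME"
  };

  /** Bits set in required but not in actual. */
  static long missing(long actual, long required) {
    return required & ~actual;
  }

  /** Join the names of every known flag set in the mask with '|'. */
  static String describe(long mask) {
    List<String> names = new ArrayList<>();
    for (int i = 0; i < FLAGS.length; i++) {
      if ((mask & FLAGS[i]) != 0) {
        names.add(NAMES[i]);
      }
    }
    return String.join("|", names);
  }
}
```

A caller would then build the message from {{describe(missing(store.getSemantics(), required))}}.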

Where it really excels, though, is that a numeric value can be defined in a 
Hadoop configuration XML file, as a hex value. 

Thus someone could say
{code}
<property>
  <name>fs.s3a.semantics</name>
  <value>0x0f</value>
</property>
{code}
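
To show that a hex string turns straight into a mask, here is a minimal sketch using only the JDK's {{Long.decode}}. It is a stand-in, not the real configuration lookup; the {{SemanticsConfig}} class and {{parseSemantics}} method are hypothetical.

```java
final class SemanticsConfig {
  /**
   * Parse a semantics mask from a configuration string.
   * Long.decode accepts decimal ("15") and hex ("0x0f") forms; note that
   * it also treats a leading zero as octal ("010" is 8).
   */
  static long parseSemantics(String value, long defaultValue) {
    if (value == null || value.trim().isEmpty()) {
      return defaultValue;   // property not set
    }
    return Long.decode(value.trim());
  }
}
```

There is no equivalent one-liner for an enumset: the value would have to be a list of names, parsed and validated by hand.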

I think we will need precisely that for S3 clients, because some S3-API 
endpoints (e.g. what Amplidata are doing) do offer stricter semantics, and even 
Amazon itself varies from "nothing", 0x0, on US-East to consistent create, 
0x01, everywhere else.

The only way we could let people configure it in the XML file is to use 
integers, ideally with the values (including common aggregated values) listed 
somewhere. The javadocs will do this: automatically for the decimal values, 
manually for the hex ones if we add that. (I've postponed it until the patch is 
ready and the values are fixed.)

Therefore while I agree with anyone who thinks it is a low-level C/C++ view of 
the world, in the hands of the competent, it is more powerful than the Java 
work that tries to wrap it all in set theory.


> Add a Blobstore interface to add to blobstore FileSystems
> ---------------------------------------------------------
>
>                 Key: HADOOP-9565
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9565
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs, fs/s3, fs/swift
>    Affects Versions: 2.6.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: HADOOP-9565-001.patch, HADOOP-9565-002.patch, 
> HADOOP-9565-003.patch
>
>
> We can make explicit the fact that some {{FileSystem}} implementations are 
> really blobstores, with different atomicity and consistency guarantees, by 
> adding a {{Blobstore}} interface to them. 
> This could also be a place to add a {{Copy(Path,Path)}} method, assuming that 
> all blobstores implement a server-side copy operation as a substitute for 
> rename.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
