[
https://issues.apache.org/jira/browse/HADOOP-9565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14303336#comment-14303336
]
Steve Loughran commented on HADOOP-9565:
----------------------------------------
h3. Semantics directly off {{FileContext}} and {{FileSystem}}?
# Having a clear separation between object store and FS tells the world that if
something doesn't say {{extends ObjectStore}} then it's an FS with all the
normal expectations of consistency, atomicity, durability, etc. The extra
subclass is there to warn that those expectations may not hold.
# Making {{getSemantics()}} abstract forces everyone to look at what their
semantics really are and declare them, rather than take a possibly incorrect
default. (On {{FileSystem}} itself we couldn't make it abstract and would have
to default to POSIX.)
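To make the second point concrete, here is a minimal sketch of what an abstract probe could look like. The class names and the flag value are hypothetical stand-ins, not what the patch defines; the only thing it is meant to show is that a subclass cannot compile without stating its semantics.

```java
// Sketch only: class names and flag values are hypothetical, not the patch's API.
abstract class ObjectStoreSketch {
    /** Hypothetical flag: a newly created object is immediately visible. */
    static final long CONSISTENT_CREATE = 0x01;

    /** No default body, so every subclass has to declare its own semantics. */
    abstract long getSemantics();
}

/** A store that promises nothing at all. */
class NoGuaranteeStore extends ObjectStoreSketch {
    @Override
    long getSemantics() {
        return 0x0;
    }
}

/** A store that only promises consistency on create. */
class CreateConsistentStore extends ObjectStoreSketch {
    @Override
    long getSemantics() {
        return CONSISTENT_CREATE;
    }
}
```

A subclass that omitted {{getSemantics()}} would be rejected by the compiler, which is the forcing function described above.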
That said:
# Those object stores that can replace HDFS are effectively filesystems. The
{{ObjectStore}} extension would then only be needed if/when we added more
features (e.g. PUT?)
# Having it everywhere makes it easier to chain filesystems together; some
wrapper FS client (like a performance counter) could relay the probe without
caring about FS type; callers would know it is there too.
# We could add a way to query capabilities alongside it. Today we have
filesystems that don't support append (checksum FS), seek on streams (FTP),
truncate, extended attributes, encryption flags, etc. There's no cue that they
are missing other than exceptions when you try to use them.
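As a rough illustration of the last two points, a wrapper can relay the probe untouched, and a caller can test for a feature before trying it. Everything named here ({{SemanticsSource}}, the {{FEATURE_*}} constants, the wrapper class) is invented for the sketch and is not part of any Hadoop API.

```java
// Sketch only: all names and bit values below are hypothetical.
interface SemanticsSource {
    long getSemantics();
}

final class FeatureFlagsSketch {
    static final long FEATURE_APPEND = 0x100;
    static final long FEATURE_SEEK = 0x200;
    static final long FEATURE_TRUNCATE = 0x400;

    private FeatureFlagsSketch() {
    }

    /** True only if every bit in 'feature' is declared by the store. */
    static boolean supports(SemanticsSource fs, long feature) {
        return (fs.getSemantics() & feature) == feature;
    }
}

/** A wrapper (think: performance counter) that relays the probe unchanged. */
final class CountingWrapperSketch implements SemanticsSource {
    private final SemanticsSource inner;
    private long probes;

    CountingWrapperSketch(SemanticsSource inner) {
        this.inner = inner;
    }

    @Override
    public long getSemantics() {
        probes++;
        // The wrapper does not need to know what kind of store it wraps.
        return inner.getSemantics();
    }

    long getProbeCount() {
        return probes;
    }
}
```

A caller could then check {{FeatureFlagsSketch.supports(fs, FEATURE_APPEND)}} up front instead of discovering the gap through an exception.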
I do fear that trying to add semantics and feature flags to the FS API itself
is going to prove more controversial. We could start with ObjectStore and then
decide whether to pull up at a later date.
h3. enumset vs bitmask?
Bitmask.
It's easier to manipulate during chaining. Something like Netflix S3mper
injects consistency atop S3, so it could do
{code}
long getSemantics() {
  return inner.getSemantics() | STORE_CONSISTENCY_COMPLETE;
}
{code}
or, and this is the hard one with an enumset, something that removes a feature:
{code}
long getSemantics() {
  return inner.getSemantics() & ~CONSISTENT_CREATE;
}
{code}
We could also use the same bit operations when reporting errors, such as
highlighting which requirements weren't met in the exception text:
{code}
long s = store.getSemantics();
// bits that are required but not declared by the store
long missing = STORE_POSIX_WRITE_SEMANTICS & ~s;
if (missing != 0) {
  throw new IOException("Missing semantics: 0x" + Long.toHexString(missing)
      + " see https://wiki.apache.org/hadoop/ObjectStore");
}
{code}
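The error text could go one step further and name the missing flags rather than print a number. A minimal sketch, assuming four made-up flags; only the 0x01 value for create comes from this comment, while the other names, their values, and the composition of {{STORE_POSIX_WRITE_SEMANTICS}} are placeholders until the patch fixes them.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: flag names and values other than CONSISTENT_CREATE = 0x01 are hypothetical.
final class MissingSemanticsSketch {
    static final long CONSISTENT_CREATE = 0x01;
    static final long CONSISTENT_UPDATE = 0x02;
    static final long CONSISTENT_DELETE = 0x04;
    static final long ATOMIC_RENAME = 0x08;

    /** Assumed aggregate; the real definition may differ. */
    static final long STORE_POSIX_WRITE_SEMANTICS =
        CONSISTENT_CREATE | CONSISTENT_UPDATE | CONSISTENT_DELETE | ATOMIC_RENAME;

    private static final long[] BITS = {
        CONSISTENT_CREATE, CONSISTENT_UPDATE, CONSISTENT_DELETE, ATOMIC_RENAME
    };
    private static final String[] NAMES = {
        "CONSISTENT_CREATE", "CONSISTENT_UPDATE", "CONSISTENT_DELETE", "ATOMIC_RENAME"
    };

    private MissingSemanticsSketch() {
    }

    /** Bits that are required but absent from 'actual'. */
    static long missing(long actual, long required) {
        return required & ~actual;
    }

    /** Turns a bitmask into a readable list such as "A|B", or "none" if empty. */
    static String describe(long bits) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < BITS.length; i++) {
            if ((bits & BITS[i]) != 0) {
                names.add(NAMES[i]);
            }
        }
        return names.isEmpty() ? "none" : String.join("|", names);
    }
}
```

The exception message would then read something like {{"Missing semantics: " + describe(missing(s, STORE_POSIX_WRITE_SEMANTICS))}}.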
Where it really excels, though, is that a numeric value can be defined in a
Hadoop configuration XML file, as a hex value.
Thus someone could say
{code}
<property>
<name>fs.s3a.semantics</name>
<value>0x0f</value>
</property>
{code}
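On the client side, reading that value back is a one-liner. The sketch below deliberately uses a plain {{Map}} and {{Long.decode}} as a stand-in for the Hadoop configuration object, so it runs without Hadoop on the classpath; the class and method names are invented for illustration.

```java
import java.util.Map;

// Sketch only: a Map stands in for the Hadoop configuration; names are hypothetical.
final class SemanticsConfigSketch {
    private SemanticsConfigSketch() {
    }

    /**
     * Reads a semantics bitmask from a string property.
     * Long.decode accepts decimal ("15") and hex ("0x0f") forms.
     */
    static long parseSemantics(Map<String, String> conf, String key, long defaultValue) {
        String raw = conf.get(key);
        if (raw == null || raw.trim().isEmpty()) {
            return defaultValue;
        }
        // Throws NumberFormatException on malformed input, which is what we want:
        // a bad semantics declaration should fail loudly.
        return Long.decode(raw.trim());
    }
}
```

With the property above, {{parseSemantics(conf, "fs.s3a.semantics", 0L)}} yields 15, and an unset property falls back to the declared default.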
I think we will need precisely that for S3 clients, because some S3-API
endpoints (e.g. what Amplidata are doing) do offer stricter semantics, and even
Amazon itself varies between "nothing", 0x0, on US-East and "create", 0x01,
everywhere else.
The only way we could let people configure it in the XML file is to use
integers, ideally with the values (including common aggregated values) listed
somewhere. The javadocs will do this: automatically for the decimal values,
manually for the hex ones if we add that (I've postponed it until the patch is
ready and the values are fixed).
Therefore while I agree with anyone who thinks it is a low-level C/C++ view of
the world, in the hands of the competent, it is more powerful than the Java
work that tries to wrap it all in set theory.
> Add a Blobstore interface to add to blobstore FileSystems
> ---------------------------------------------------------
>
> Key: HADOOP-9565
> URL: https://issues.apache.org/jira/browse/HADOOP-9565
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs, fs/s3, fs/swift
> Affects Versions: 2.6.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Attachments: HADOOP-9565-001.patch, HADOOP-9565-002.patch,
> HADOOP-9565-003.patch
>
>
> We can make explicit the fact that some {{FileSystem}} implementations are
> really blobstores, with different atomicity and consistency guarantees, by
> adding a {{Blobstore}} interface to them.
> This could also be a place to add a {{Copy(Path,Path)}} method, assuming that
> all blobstores implement a server-side copy operation as a substitute for
> rename.