On 15 May 2018, at 17:30, Thomas Marquardt 
<tm...@microsoft.com> wrote:

A feature branch seems reasonable to me too.  Note that the WASB connector will 
continue to exist, and live side-by-side with the new Azure Blob Filesystem 
(ABFS) connector.  We will encourage users to move to the new ABFS connector, 
and all of our new feature and performance improvements will target the ABFS 
connector.  ABFS will perform better at no additional cost, so I expect current 
users to migrate in time.  The two connectors are compatible for mainline 
scenarios, but there are some uncommon features in WASB that we chose not to 
carry over in the initial implementation.

So we hope ABFS will replace the usage of WASB, but the WASB connector itself 
will continue to exist.  Maybe we can remove WASB in the future some day, if 
nobody is using it.


Migration strategies for connectors are interesting.

When the new S3 connector was first proposed (HADOOP-10400) we opted for a new 
name, "s3a", to allow things to go side-by-side until we were happy. For Hadoop 
2.6-2.7 this worked well, as stuff stabilised. Now that things are good, we've 
cut the old s3n connector from branch-3 entirely, leaving a stub entry telling 
people to migrate (HADOOP-14738). The stub is needed so that if anyone 
explicitly declares a mapping of scheme -> FS (as people do in Spark, more out 
of superstition than need), they'll get a better message than "Class not 
found".
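
To make the stub idea concrete, here is a minimal, self-contained sketch. It 
does not use the real Hadoop FileSystem API: the Connector interface, the 
registry and all class names are hypothetical stand-ins, and the error text is 
made up for illustration. It only shows the shape of the approach: the removed 
scheme stays registered, but resolves to something that fails fast with 
migration advice.

```java
import java.io.IOException;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

public class SchemeStubSketch {

    /** Hypothetical stand-in for a filesystem client; not Hadoop's FileSystem. */
    interface Connector {
        void initialize(URI uri) throws IOException;
    }

    /** The connector that is still supported; a real one would connect here. */
    static class NewConnector implements Connector {
        @Override
        public void initialize(URI uri) {
            // nothing to do in this sketch
        }
    }

    /** Stub left behind for the removed scheme: always fails with advice. */
    static class RemovedConnectorStub implements Connector {
        @Override
        public void initialize(URI uri) throws IOException {
            throw new IOException("The " + uri.getScheme()
                + ":// connector has been removed; use s3a:// instead");
        }
    }

    /** Explicit scheme -> connector mapping (illustrative registry only). */
    static final Map<String, Connector> REGISTRY = new HashMap<>();
    static {
        REGISTRY.put("s3a", new NewConnector());
        REGISTRY.put("s3n", new RemovedConnectorStub());
    }

    /** Returns "ok" on success, otherwise the message the user would see. */
    static String tryOpen(String url) {
        URI uri = URI.create(url);
        Connector connector = REGISTRY.get(uri.getScheme());
        if (connector == null) {
            return "No connector for scheme " + uri.getScheme();
        }
        try {
            connector.initialize(uri);
            return "ok";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryOpen("s3a://bucket/data"));
        System.out.println(tryOpen("s3n://bucket/data"));
    }
}
```

Because the old scheme still resolves to something, the user sees the 
migration message rather than a generic lookup failure, and nothing is 
silently rewritten to the new scheme.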

We could have tried to silently forward to the S3A FS, but that adds two issues:
* configuration options are all different
* it gets confusing when you return URLs from listings.

The strategy taken stops the switch being magic, but does seem to work. It also 
has a nice little side effect: if ever someone files a bugrep with an s3n:// 
URL, we know to close it as invalid, or at least say "move to s3a then retry". 
If the scheme had stayed the same, you'd need to know the actual version number 
of Hadoop underneath to know whether this was with current or removed code.

Given that the filesystems will work with two different service endpoints, they 
should be isolated. We will need to keep the work on WASB alive though, if not 
for new features, then for security, regression tests & bug fixes.


I can confirm that nobody ever gets seek() right. :)

That and rename(), obviously, though the fact that nobody knows what rename() 
is meant to do makes it easy for all to argue that their interpretation is 
correct. Certainly I do.

-Steve
