[
https://issues.apache.org/jira/browse/HADOOP-12666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chris Douglas updated HADOOP-12666:
-----------------------------------
Attachment: HADOOP-12666-012.patch
CachedRefreshTokenBasedAccessTokenProvider
- Since the {{AccessTokenProvider}} is only created by reflection, the {{Timer}}
constructor exists only for testing and does not need an override in this subclass.
- The static instance should be final and created during class initialization,
but...
- {{ConfRefreshTokenBasedAccessTokenProvider}} is not thread-safe. {{setConf}}
updates the static instance, which is shared by every instance of
{{CachedRTBATP}}, without synchronization. This could cause undefined behavior.
Is the intent to pool clients with the same parameters? Would it make sense
to add a small cache (v12)?
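To make the "small cache" idea concrete, here is a minimal sketch of a bounded, synchronized LRU cache that hands out one shared value per distinct key. It is not the code in the v12 patch. The names {{ProviderCache}} and {{Factory}} are hypothetical, and the key type is assumed to be whatever uniquely identifies a set of client parameters.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical bounded LRU cache; not taken from the patch. */
final class ProviderCache<K, V> {

  /** Hypothetical callback that builds a value for a key on a cache miss. */
  interface Factory<K, V> {
    V create(K key);
  }

  private final Map<K, V> entries;

  ProviderCache(final int maxEntries) {
    // accessOrder=true keeps the least recently used entry first, so
    // removeEldestEntry evicts it once the cache grows past maxEntries.
    this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
      }
    };
  }

  /** Returns the cached value for key, creating it under the lock if absent. */
  synchronized V get(K key, Factory<K, V> factory) {
    V value = entries.get(key);
    if (value == null) {
      value = factory.create(key);
      entries.put(key, value);
    }
    return value;
  }

  synchronized int size() {
    return entries.size();
  }
}
```

All access goes through synchronized methods, so callers with equal keys share one instance without racing on a static field. Holding the lock during creation is a simplification that assumes the factory is cheap.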
PrivateCachedRefreshTokenBasedAccessTokenProvider
- The override doesn't seem to serve a purpose. Since it's a workaround, adding
audience/visibility annotations (HADOOP-5073) would emphasize that this is
temporary.
PrivateAzureDataLakeFileSystem
- Catching {{ArrayIndexOutOfBoundsException}} instead of performing proper
bounds checking in {{BufferManager::get}} is inefficient:
{code:title=PrivateAzureDataLakeFileSystem.java}
synchronized (BufferManager.getLock()) {
  if (bm.hasData(fsPath.toString(), fileOffset, len)) {
    try {
      bm.get(data, fileOffset);
      validDataHoldingSize = data.length;
      currentFileOffset = fileOffset;
    } catch (ArrayIndexOutOfBoundsException e) {
      fetchDataOverNetwork = true;
    }
  } else {
    fetchDataOverNetwork = true;
  }
}
{code}
{code:title=BufferManager.java}
void get(byte[] data, long offset) {
  System.arraycopy(buffer.data, (int) (offset - buffer.offset), data, 0,
      data.length);
}
{code}
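For comparison, a rough sketch of what an explicit bounds check could look like. The class {{BoundsCheckedBuffer}} and the method {{tryGet}} are hypothetical stand-ins, not names from the patch. The sketch assumes the buffer holds one contiguous byte range that starts at a known file offset.

```java
/** Hypothetical stand-in for the patch's buffer; names are illustrative. */
final class BoundsCheckedBuffer {
  private final byte[] data;  // cached bytes
  private final long offset;  // file offset of data[0]

  BoundsCheckedBuffer(byte[] data, long offset) {
    this.data = data.clone();
    this.offset = offset;
  }

  /**
   * Copies dest.length bytes starting at fileOffset into dest.
   * Returns false, without copying, if the range is not fully cached.
   */
  boolean tryGet(byte[] dest, long fileOffset) {
    long start = fileOffset - offset;
    // Compare in long arithmetic so the check itself cannot overflow.
    if (start < 0 || start > data.length - (long) dest.length) {
      return false;
    }
    System.arraycopy(data, (int) start, dest, 0, dest.length);
    return true;
  }
}
```

With a check like this, the caller could set {{fetchDataOverNetwork}} from the boolean result rather than from a catch block.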
The BufferManager/PrivateAzureDataLakeFileSystem synchronization is unorthodox,
and verifying its correctness is not straightforward. Layering that complexity
on top of the readahead logic without simplifying abstractions makes it very
difficult to review. I hope subsequent revisions will replace this code with a
clearer model, because the current code will be very difficult to maintain.
> Support Microsoft Azure Data Lake - as a file system in Hadoop
> --------------------------------------------------------------
>
> Key: HADOOP-12666
> URL: https://issues.apache.org/jira/browse/HADOOP-12666
> Project: Hadoop Common
> Issue Type: New Feature
> Components: fs, fs/azure, tools
> Reporter: Vishwajeet Dusane
> Assignee: Vishwajeet Dusane
> Attachments: Create_Read_Hadoop_Adl_Store_Semantics.pdf,
> HADOOP-12666-002.patch, HADOOP-12666-003.patch, HADOOP-12666-004.patch,
> HADOOP-12666-005.patch, HADOOP-12666-006.patch, HADOOP-12666-007.patch,
> HADOOP-12666-008.patch, HADOOP-12666-009.patch, HADOOP-12666-010.patch,
> HADOOP-12666-011.patch, HADOOP-12666-012.patch, HADOOP-12666-1.patch
>
> Original Estimate: 336h
> Time Spent: 336h
> Remaining Estimate: 0h
>
> h2. Description
> This JIRA describes a new file system implementation for accessing Microsoft
> Azure Data Lake Store (ADL) from within Hadoop. This would enable existing
> Hadoop applications such as MR, Hive, HBase, etc., to use ADL store as
> input or output.
>
> ADL is ultra-high capacity and optimized for massive throughput, with rich
> management and security features. More details are available at
> https://azure.microsoft.com/en-us/services/data-lake-store/