jackjlli commented on a change in pull request #6531:
URL: https://github.com/apache/incubator-pinot/pull/6531#discussion_r572487116
##########
File path: pom.xml
##########
@@ -142,7 +142,7 @@
<dropwizard-metrics.version>4.1.2</dropwizard-metrics.version>
<snappy-java.version>1.1.1.7</snappy-java.version>
<log4j.version>2.11.2</log4j.version>
- <netty.version>4.1.42.Final</netty.version>
+ <netty.version>4.1.54.Final</netty.version>
Review comment:
Not sure whether we can stick to the current netty and jackson versions here. Any of these version bumps could have a big impact on the whole Pinot cluster: it could introduce backward incompatibility between different Pinot components during deployment.
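
If the concern is limiting the blast radius, one alternative (just a sketch; keeping the root pom pinned and scoping the override to the module that needs it are assumptions about what the build allows) is to leave `<netty.version>` at 4.1.42.Final in the root pom and import the newer netty BOM only in the ADLS plugin module's pom:

```xml
<dependencyManagement>
  <dependencies>
    <!-- Sketch: override netty only for this plugin module; the rest of the
         cluster keeps the root pom's 4.1.42.Final -->
    <dependency>
      <groupId>io.netty</groupId>
      <artifactId>netty-bom</artifactId>
      <version>4.1.54.Final</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>
```

Whether that actually works depends on how the plugin is packaged and shaded, so it may still leak into the deployed classpath.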
##########
File path:
pinot-plugins/pinot-file-system/pinot-adls/src/main/java/org/apache/pinot/plugin/filesystem/ADLSGen2PinotFS.java
##########
@@ -106,24 +118,75 @@ public void init(PinotConfiguration config) {
// TODO: consider to add the encryption of the following config
String accessKey = config.getProperty(ACCESS_KEY);
String fileSystemName = config.getProperty(FILE_SYSTEM_NAME);
+ String clientId = config.getProperty(CLIENT_ID);
+ String clientSecret = config.getProperty(CLIENT_SECRET);
+ String tenantId = config.getProperty(TENANT_ID);
String dfsServiceEndpointUrl = HTTPS_URL_PREFIX + accountName + AZURE_STORAGE_DNS_SUFFIX;
String blobServiceEndpointUrl = HTTPS_URL_PREFIX + accountName + AZURE_BLOB_DNS_SUFFIX;
- StorageSharedKeyCredential sharedKeyCredential = new StorageSharedKeyCredential(accountName, accessKey);
+ DataLakeServiceClientBuilder dataLakeServiceClientBuilder = new DataLakeServiceClientBuilder().endpoint(dfsServiceEndpointUrl);
+ BlobServiceClientBuilder blobServiceClientBuilder = new BlobServiceClientBuilder().endpoint(blobServiceEndpointUrl);
+
+ if (accountName != null && accessKey != null) {
+ LOGGER.info("Authenticating using the access key to the account.");
+ StorageSharedKeyCredential sharedKeyCredential = new StorageSharedKeyCredential(accountName, accessKey);
+ dataLakeServiceClientBuilder.credential(sharedKeyCredential);
+ blobServiceClientBuilder.credential(sharedKeyCredential);
+ } else if (clientId != null && clientSecret != null && tenantId != null) {
+ LOGGER.info("Authenticating using Azure Active Directory");
+ ClientSecretCredential clientSecretCredential = new ClientSecretCredentialBuilder()
+ .clientId(clientId)
+ .clientSecret(clientSecret)
+ .tenantId(tenantId)
+ .build();
+ dataLakeServiceClientBuilder.credential(clientSecretCredential);
+ blobServiceClientBuilder.credential(clientSecretCredential);
+ } else {
+ // Error out as at least one mode of auth info needed
+ throw new IllegalArgumentException("Expecting either (accountName, accessKey) or (clientId, clientSecret, tenantId)");
+ }
- DataLakeServiceClient serviceClient = new DataLakeServiceClientBuilder().credential(sharedKeyCredential)
- .endpoint(dfsServiceEndpointUrl)
- .buildClient();
+ _blobServiceClient = blobServiceClientBuilder.buildClient();
+ DataLakeServiceClient serviceClient = dataLakeServiceClientBuilder.buildClient();
+ _fileSystemClient = getOrCreateClientWithFileSystem(serviceClient, fileSystemName);
- _blobServiceClient =
-     new BlobServiceClientBuilder().credential(sharedKeyCredential).endpoint(blobServiceEndpointUrl).buildClient();
- _fileSystemClient = serviceClient.getFileSystemClient(fileSystemName);
LOGGER.info("ADLSGen2PinotFS is initialized (accountName={}, fileSystemName={}, dfsServiceEndpointUrl={}, "
    + "blobServiceEndpointUrl={}, enableChecksum={})", accountName, fileSystemName, dfsServiceEndpointUrl,
    blobServiceEndpointUrl, _enableChecksum);
}
+ /**
+  * Returns the DataLakeFileSystemClient for the specified file system, creating it if it doesn't exist.
+  *
+  * @param serviceClient authenticated data lake service client to an account
+  * @param fileSystemName name of the file system (blob container)
+  * @return DataLakeFileSystemClient with the specified fileSystemName
+  */
+ @VisibleForTesting
+ public DataLakeFileSystemClient getOrCreateClientWithFileSystem(DataLakeServiceClient serviceClient,
+     String fileSystemName) {
+ try {
+ DataLakeFileSystemClient fileSystemClient = serviceClient.getFileSystemClient(fileSystemName);
+ // The return value is irrelevant. This call is only to test whether the file system exists.
+ fileSystemClient.getProperties();
+ return fileSystemClient;
+ } catch (DataLakeStorageException e) {
+ if (e.getStatusCode() == NOT_FOUND_STATUS_CODE && e.getErrorCode().equals(FILE_SYSTEM_NOT_FOUND_ERROR_CODE)) {
+ LOGGER.info("FileSystem with name {} does not exist. Creating one with the same name.", fileSystemName);
+ return serviceClient.createFileSystem(fileSystemName);
+ } else {
+ throw e;
Review comment:
It'd be good to log one more message here saying that we tried to get or create the file system but failed, before rethrowing.
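
Something along these lines (a sketch against the diff above; the exact log wording is illustrative):

```java
} catch (DataLakeStorageException e) {
  if (e.getStatusCode() == NOT_FOUND_STATUS_CODE && e.getErrorCode().equals(FILE_SYSTEM_NOT_FOUND_ERROR_CODE)) {
    LOGGER.info("FileSystem with name {} does not exist. Creating one with the same name.", fileSystemName);
    return serviceClient.createFileSystem(fileSystemName);
  } else {
    // Log before rethrowing so the failed get-or-create attempt is visible in the logs
    LOGGER.error("Failed to get or create FileSystem with name {}", fileSystemName, e);
    throw e;
  }
}
```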
##########
File path: pinot-plugins/pinot-file-system/pinot-adls/pom.xml
##########
@@ -39,12 +39,46 @@
<dependency>
<groupId>com.microsoft.azure</groupId>
<artifactId>azure-data-lake-store-sdk</artifactId>
- <version>2.1.5</version>
+ <version>2.3.9</version>
</dependency>
<dependency>
<groupId>com.azure</groupId>
<artifactId>azure-storage-file-datalake</artifactId>
- <version>12.0.0-beta.12</version>
+ <version>12.4.0</version>
+ </dependency>
+ <dependency>
+ <groupId>com.azure</groupId>
+ <artifactId>azure-identity</artifactId>
+ <version>1.2.2</version>
</dependency>
</dependencies>
+ <dependencyManagement>
+ <dependencies>
+ <dependency>
+ <groupId>org.codehaus.woodstox</groupId>
Review comment:
What is the minimal set of dependencies you think is needed for your change? Are the following dependencies used in the production code, or only in the tests? If the latter, maybe we can restrict the scope of those dependencies.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]