turcsanyip commented on code in PR #5916:
URL: https://github.com/apache/nifi/pull/5916#discussion_r841468038
##########
nifi-nar-bundles/nifi-azure-bundle/nifi-azure-processors/src/main/java/org/apache/nifi/processors/azure/storage/ListAzureBlobStorage_v12.java:
##########
@@ -199,34 +201,44 @@ protected boolean isListingResetNecessary(final PropertyDescriptor property) {
}
@Override
- protected List<BlobInfo> performListing(ProcessContext context, Long minTimestamp, ListingMode listingMode) throws IOException {
- String containerName = context.getProperty(CONTAINER).evaluateAttributeExpressions().getValue();
- String prefix = context.getProperty(BLOB_NAME_PREFIX).evaluateAttributeExpressions().getValue();
+ protected List<BlobInfo> performListing(final ProcessContext context, final Long minTimestamp, final ListingMode listingMode) throws IOException {
+ final String containerName = context.getProperty(CONTAINER).evaluateAttributeExpressions().getValue();
+ final String prefix = context.getProperty(BLOB_NAME_PREFIX).evaluateAttributeExpressions().getValue();
+ final long minimumTimestamp = minTimestamp == null ? 0 : minTimestamp;
try {
- List<BlobInfo> listing = new ArrayList<>();
+ final List<BlobInfo> listing = new ArrayList<>();
- BlobContainerClient containerClient = storageClient.getBlobContainerClient(containerName);
+ final BlobContainerClient containerClient = storageClient.getBlobContainerClient(containerName);
- ListBlobsOptions options = new ListBlobsOptions()
+ final ListBlobsOptions options = new ListBlobsOptions()
.setPrefix(prefix);
- for (BlobItem blob : containerClient.listBlobs(options, null)) {
- BlobItemProperties properties = blob.getProperties();
-
- Builder builder = new Builder()
- .containerName(containerName)
- .blobName(blob.getName())
- .primaryUri(String.format("%s/%s", containerClient.getBlobContainerUrl(), blob.getName()))
- .etag(properties.getETag())
- .blobType(properties.getBlobType().toString())
- .contentType(properties.getContentType())
- .contentLanguage(properties.getContentLanguage())
- .lastModifiedTime(properties.getLastModified().toInstant().toEpochMilli())
- .length(properties.getContentLength());
-
- listing.add(builder.build());
- }
+ final Iterator<PagedResponse<BlobItem>> result = containerClient.listBlobs(options, null).iterableByPage().iterator();
+ String continuationToken;
+
+ do {
+ final PagedResponse<BlobItem> pagedResult = result.next();
+ continuationToken = pagedResult.getContinuationToken();
Review Comment:
It seems to me that the items can be processed with a simple iterator (instead of the paged iterator), with no need to track the continuation token.
Based on this documentation:
https://docs.microsoft.com/en-us/azure/developer/java/sdk/pagination
> Make it possible to easily iterate over each element in the collection individually, ignoring any need for manual pagination or tracking of continuation tokens.
> Regardless of whether you iterate by page or by each item, there's no difference in performance or the number of calls made to the service.
This applies to both the v12 Blob and the ADLS processors, as they use the new SDK.
So I think only the timestamp filtering needs to be added here.
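
A minimal sketch of the item-by-item loop with timestamp filtering, written in plain Java so it runs without the Azure SDK. Everything here is illustrative: `Item` is a hypothetical stand-in for the SDK's `BlobItem`, the `Iterable<Item>` parameter stands in for the result of `containerClient.listBlobs(options, null)`, and the inclusive `>=` comparison is an assumption rather than something taken from the PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SimpleListingSketch {

    // Hypothetical stand-in for BlobItem; only the fields this sketch needs.
    static final class Item {
        final String name;
        final long lastModifiedMillis;

        Item(final String name, final long lastModifiedMillis) {
            this.name = name;
            this.lastModifiedMillis = lastModifiedMillis;
        }
    }

    // Iterates item by item; no page objects or continuation tokens are handled here.
    static List<String> listNewerThan(final Iterable<Item> items, final Long minTimestamp) {
        // Same null handling as the minimumTimestamp line in the diff above.
        final long minimumTimestamp = minTimestamp == null ? 0 : minTimestamp;
        final List<String> listing = new ArrayList<>();
        for (final Item item : items) {
            // Assumption: entries modified exactly at the minimum timestamp are kept.
            if (item.lastModifiedMillis >= minimumTimestamp) {
                listing.add(item.name);
            }
        }
        return listing;
    }

    public static void main(final String[] args) {
        final List<Item> items = Arrays.asList(new Item("a.txt", 100L), new Item("b.txt", 200L));
        System.out.println(listNewerThan(items, 150L)); // prints [b.txt]
        System.out.println(listNewerThan(items, null)); // prints [a.txt, b.txt]
    }
}
```

In the processor, the loop body would presumably keep the existing `Builder` calls from the removed lines and only wrap them in the timestamp check.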