turcsanyip commented on code in PR #5916:
URL: https://github.com/apache/nifi/pull/5916#discussion_r841468038


##########
nifi-nar-bundles/nifi-azure-bundle/nifi-azure-processors/src/main/java/org/apache/nifi/processors/azure/storage/ListAzureBlobStorage_v12.java:
##########
@@ -199,34 +201,44 @@ protected boolean isListingResetNecessary(final PropertyDescriptor property) {
     }
 
     @Override
-    protected List<BlobInfo> performListing(ProcessContext context, Long minTimestamp, ListingMode listingMode) throws IOException {
-        String containerName = context.getProperty(CONTAINER).evaluateAttributeExpressions().getValue();
-        String prefix = context.getProperty(BLOB_NAME_PREFIX).evaluateAttributeExpressions().getValue();
+    protected List<BlobInfo> performListing(final ProcessContext context, final Long minTimestamp, final ListingMode listingMode) throws IOException {
+        final String containerName = context.getProperty(CONTAINER).evaluateAttributeExpressions().getValue();
+        final String prefix = context.getProperty(BLOB_NAME_PREFIX).evaluateAttributeExpressions().getValue();
+        final long minimumTimestamp = minTimestamp == null ? 0 : minTimestamp;
 
         try {
-            List<BlobInfo> listing = new ArrayList<>();
+            final List<BlobInfo> listing = new ArrayList<>();
 
-            BlobContainerClient containerClient = storageClient.getBlobContainerClient(containerName);
+            final BlobContainerClient containerClient = storageClient.getBlobContainerClient(containerName);
 
-            ListBlobsOptions options = new ListBlobsOptions()
+            final ListBlobsOptions options = new ListBlobsOptions()
                     .setPrefix(prefix);
 
-            for (BlobItem blob : containerClient.listBlobs(options, null)) {
-                BlobItemProperties properties = blob.getProperties();
-
-                Builder builder = new Builder()
-                        .containerName(containerName)
-                        .blobName(blob.getName())
-                        .primaryUri(String.format("%s/%s", containerClient.getBlobContainerUrl(), blob.getName()))
-                        .etag(properties.getETag())
-                        .blobType(properties.getBlobType().toString())
-                        .contentType(properties.getContentType())
-                        .contentLanguage(properties.getContentLanguage())
-                        .lastModifiedTime(properties.getLastModified().toInstant().toEpochMilli())
-                        .length(properties.getContentLength());
-
-                listing.add(builder.build());
-            }
+            final Iterator<PagedResponse<BlobItem>> result = containerClient.listBlobs(options, null).iterableByPage().iterator();
+            String continuationToken;
+
+            do {
+                final PagedResponse<BlobItem> pagedResult = result.next();
+                continuationToken = pagedResult.getContinuationToken();

Review Comment:
   It seems to me that the items can be processed with a simple iterator
   (instead of the paged iterator), without tracking the continuation token.
   Based on this documentation:
   https://docs.microsoft.com/en-us/azure/developer/java/sdk/pagination
   
   > Make it possible to easily iterate over each element in the collection individually, ignoring any need for manual pagination or tracking of continuation tokens.
   
   > Regardless of whether you iterate by page or by each item, there's no difference in performance or the number of calls made to the service.
   
   This applies to both the v12 Blob and the ADLS processors, as they use the new SDK.
   So I think only the timestamp filtering needs to be added here.
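
To illustrate the suggestion, here is a minimal, self-contained sketch of item-wise iteration with timestamp filtering. It does not use the Azure SDK: `StubBlob`, `TimestampFilteredListing`, and `listNames` are hypothetical stand-ins invented for this example. In the processor, the `Iterable` would be the result of `containerClient.listBlobs(options, null)` (the removed lines of the diff already iterate it with a for-each loop), and the timestamp would come from `blob.getProperties().getLastModified()`. Whether the comparison should be `>=` or `>` is an assumption here, not something the PR states.

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TimestampFilteredListing {

    /** Hypothetical stand-in for the SDK's BlobItem; carries only what the sketch needs. */
    static final class StubBlob {
        final String name;
        final OffsetDateTime lastModified;

        StubBlob(final String name, final OffsetDateTime lastModified) {
            this.name = name;
            this.lastModified = lastModified;
        }
    }

    /**
     * Iterates item by item (no paged iterator, no continuation token)
     * and keeps only entries modified at or after minTimestamp.
     */
    static List<String> listNames(final Iterable<StubBlob> blobs, final Long minTimestamp) {
        // Same null handling as the minimumTimestamp line in the diff.
        final long minimumTimestamp = minTimestamp == null ? 0 : minTimestamp;
        final List<String> listing = new ArrayList<>();

        for (final StubBlob blob : blobs) {
            final long lastModified = blob.lastModified.toInstant().toEpochMilli();
            // Assumption: inclusive comparison; the PR may use a different rule.
            if (lastModified >= minimumTimestamp) {
                listing.add(blob.name);
            }
        }
        return listing;
    }

    public static void main(final String[] args) {
        final List<StubBlob> blobs = Arrays.asList(
                new StubBlob("old", OffsetDateTime.ofInstant(Instant.ofEpochMilli(1000L), ZoneOffset.UTC)),
                new StubBlob("new", OffsetDateTime.ofInstant(Instant.ofEpochMilli(2000L), ZoneOffset.UTC)));

        System.out.println(listNames(blobs, 1500L)); // prints [new]
    }
}
```

With this shape, the `do`/`while` loop and the `continuationToken` variable are not needed; the loop body from the original code stays as it was, wrapped in the timestamp check.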


