lakshmi-manasa-g opened a new pull request #1358:
URL: https://github.com/apache/samza/pull/1358


   **Feature:** This PR extends the existing AzureBlobSystemProducer with the 
ability to attach metadata to an Azure storage blob: key-value pairs that are 
stored with the blob but outside its actual content. This makes it much easier 
to ingest these blobs into Kusto (Azure Data Explorer). 
   
   **Changes:** Adds the new interfaces BlobMetadataGeneratorFactory and 
BlobMetadataGenerator and the new class BlobMetadataContext. The user 
implements BlobMetadataGeneratorFactory and BlobMetadataGenerator; the 
generator takes a BlobMetadataContext containing the stream name and the size 
of the blob currently being committed. An instance of the generator is created 
when the blob is about to be committed. The generator is then invoked, and the 
metadata it returns is attached to the blob.
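
A minimal sketch of what an implementation could look like. The three types below are self-contained stand-ins written for illustration only: the method names, the constructor, and the use of a plain `Map` for the generator configs are assumptions, not the signatures in this PR. `StreamTaggingGeneratorFactory` and the sample values are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for the context described above: stream name plus blob size.
// Constructor and getter names are assumptions.
final class BlobMetadataContext {
  private final String streamName;
  private final long blobSize;

  BlobMetadataContext(String streamName, long blobSize) {
    this.streamName = streamName;
    this.blobSize = blobSize;
  }

  String getStreamName() { return streamName; }
  long getBlobSize() { return blobSize; }
}

// Stand-in generator interface: returns the key-value pairs to attach to the blob.
interface BlobMetadataGenerator {
  Map<String, String> getBlobMetadata(BlobMetadataContext context);
}

// Stand-in factory interface: builds a generator from the generator-specific configs.
interface BlobMetadataGeneratorFactory {
  BlobMetadataGenerator getBlobMetadataGeneratorInstance(Map<String, String> generatorConfig);
}

// Hypothetical user implementation: copies the configured pairs and adds
// the stream name and blob size taken from the context.
final class StreamTaggingGeneratorFactory implements BlobMetadataGeneratorFactory {
  @Override
  public BlobMetadataGenerator getBlobMetadataGeneratorInstance(Map<String, String> generatorConfig) {
    return context -> {
      Map<String, String> metadata = new HashMap<>(generatorConfig);
      metadata.put("streamName", context.getStreamName());
      metadata.put("blobSize", Long.toString(context.getBlobSize()));
      return metadata;
    };
  }
}

final class BlobMetadataSketch {
  public static void main(String[] args) {
    Map<String, String> generatorConfig = new HashMap<>();
    generatorConfig.put("table", "events"); // hypothetical generator config

    BlobMetadataGenerator generator =
        new StreamTaggingGeneratorFactory().getBlobMetadataGeneratorInstance(generatorConfig);

    // The producer would do this when the blob is about to be committed.
    Map<String, String> metadata =
        generator.getBlobMetadata(new BlobMetadataContext("pageviews", 1024L));
    System.out.println(metadata);
  }
}
```

The generator only sees the context and its own configs, so it has no dependency on the blob client.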
   Any additional configs (key:value pairs) needed by this generator can be 
passed with the prefix 
systems.<system-name>.azureblob.metadataGeneratorConfig.\<key\> and the value 
\<value\>. 
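
To illustrate the prefix convention, here is a rough sketch that collects the generator configs for one system from a flat job config. It uses a plain `Map` rather than Samza's config classes. Stripping the prefix before handing the pairs to the generator is an assumption about the behaviour, and `GeneratorConfigSketch`, `my-blob-system`, and the sample keys are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class GeneratorConfigSketch {
  // Prefix from the PR description, with the system name filled in via %s.
  static final String PREFIX_TEMPLATE = "systems.%s.azureblob.metadataGeneratorConfig.";

  // Returns the <key> -> <value> pairs under the prefix for the given system.
  // Assumption: the prefix is removed from each key.
  static Map<String, String> generatorConfigFor(String systemName, Map<String, String> jobConfig) {
    String prefix = String.format(PREFIX_TEMPLATE, systemName);
    Map<String, String> result = new HashMap<>();
    for (Map.Entry<String, String> entry : jobConfig.entrySet()) {
      String key = entry.getKey();
      // Skip keys outside the prefix and the degenerate case of an empty <key>.
      if (key.startsWith(prefix) && key.length() > prefix.length()) {
        result.put(key.substring(prefix.length()), entry.getValue());
      }
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, String> jobConfig = new HashMap<>();
    jobConfig.put("job.name", "blob-writer");
    jobConfig.put("systems.my-blob-system.azureblob.metadataGeneratorConfig.table", "events");
    jobConfig.put("systems.other-system.azureblob.metadataGeneratorConfig.table", "ignored");

    // Only the pair configured for my-blob-system is returned.
    System.out.println(generatorConfigFor("my-blob-system", jobConfig));
  }
}
```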
   
   A default impl is provided which does not add anything to the blob.
   
   **API changes:** New interfaces BlobMetadataGeneratorFactory, 
BlobMetadataGenerator and class BlobMetadataContext added. New configs with 
prefix systems.<system-name>.azureblob.metadataGeneratorConfig. The factory is 
wired in through 
systems.<system-name>.azureblob.metadataPropertiesGeneratorFactory.
   
   **Upgrade instructions:** Backwards compatible. If an impl is not provided 
the default no-op impl is used.
   
   **Usage instructions:** Wire in the generator factory through 
systems.<system-name>.azureblob.metadataPropertiesGeneratorFactory and pass in 
additional configs with prefix 
systems.<system-name>.azureblob.metadataGeneratorConfig 
   
   **Tests:** Existing tests updated to assert that the metadata is attached.



