lakshmi-manasa-g opened a new pull request #1358: URL: https://github.com/apache/samza/pull/1358
**Feature:** This PR extends the existing AzureBlobSystemProducer with the ability to attach metadata to an Azure Storage blob. The metadata is a set of key-value pairs stored in the blob's metadata, outside the blob's actual content. This makes it much easier to ingest these blobs into Kusto (Azure Data Explorer).

**Changes:** Adds the new interfaces BlobMetadataGeneratorFactory and BlobMetadataGenerator and the class BlobMetadataContext. The user implements BlobMetadataGeneratorFactory and BlobMetadataGenerator; the generator takes a BlobMetadataContext containing the stream name and the size of the blob currently being committed. An instance of the generator is created when the blob is about to be committed. The generator is then invoked to get the metadata, and the returned metadata is attached to the blob. Any additional configs (key:value pairs) needed by the generator can be passed with the prefix systems.<system-name>.azureblob.metadataGeneratorConfig, that is, as systems.<system-name>.azureblob.metadataGeneratorConfig.\<key\> with value \<value\>. A default implementation is provided that does not add anything to the blob.

**API changes:** New interfaces BlobMetadataGeneratorFactory and BlobMetadataGenerator and new class BlobMetadataContext. New configs with the prefix systems.<system-name>.azureblob.metadataGeneratorConfig. The factory is wired in through systems.<system-name>.azureblob.metadataPropertiesGeneratorFactory.

**Upgrade instructions:** Backwards compatible. If no implementation is provided, the default no-op implementation is used.

**Usage instructions:** Wire in the generator factory through systems.<system-name>.azureblob.metadataPropertiesGeneratorFactory and pass any additional configs with the prefix systems.<system-name>.azureblob.metadataGeneratorConfig.

**Tests:** Existing tests updated to assert that the metadata is attached.
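As a rough illustration of the flow described under **Changes**, here is a minimal, self-contained sketch of a user-supplied factory and generator. The nested types are simplified stand-ins declared locally so the example compiles on its own; they are not the actual Samza classes. The method names (`getBlobMetadataGeneratorInstance`, `getBlobMetadata`, `getStreamName`, `getBlobSize`), the use of a plain `Map` for the generator config, the `StreamTaggingGeneratorFactory` class, and the `kustoTable` key are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class BlobMetadataSketch {

  // Simplified stand-in for BlobMetadataContext: per the PR description it
  // carries the stream name and the size of the blob being committed.
  static final class BlobMetadataContext {
    private final String streamName;
    private final long blobSize;

    BlobMetadataContext(String streamName, long blobSize) {
      this.streamName = streamName;
      this.blobSize = blobSize;
    }

    String getStreamName() { return streamName; }

    long getBlobSize() { return blobSize; }
  }

  // Stand-in for BlobMetadataGenerator; the method name is an assumption.
  interface BlobMetadataGenerator {
    Map<String, String> getBlobMetadata(BlobMetadataContext context);
  }

  // Stand-in for BlobMetadataGeneratorFactory; the method name and the
  // Map-based config parameter are assumptions.
  interface BlobMetadataGeneratorFactory {
    BlobMetadataGenerator getBlobMetadataGeneratorInstance(Map<String, String> generatorConfig);
  }

  // Hypothetical user implementation: copies every configured key-value pair
  // into the metadata and adds the stream name and blob size from the context.
  static final class StreamTaggingGeneratorFactory implements BlobMetadataGeneratorFactory {
    @Override
    public BlobMetadataGenerator getBlobMetadataGeneratorInstance(Map<String, String> generatorConfig) {
      return context -> {
        Map<String, String> metadata = new HashMap<>(generatorConfig);
        metadata.put("streamName", context.getStreamName());
        metadata.put("blobSize", Long.toString(context.getBlobSize()));
        return metadata;
      };
    }
  }

  public static void main(String[] args) {
    // Assumed to correspond to the configs passed under
    // systems.<system-name>.azureblob.metadataGeneratorConfig.<key>
    Map<String, String> generatorConfig = new HashMap<>();
    generatorConfig.put("kustoTable", "page_views");

    // The producer would create the generator when a blob is about to be committed.
    BlobMetadataGenerator generator =
        new StreamTaggingGeneratorFactory().getBlobMetadataGeneratorInstance(generatorConfig);
    Map<String, String> metadata =
        generator.getBlobMetadata(new BlobMetadataContext("PageViewEvent", 1024L));

    // The returned map is what would be attached to the blob as metadata.
    System.out.println(metadata);
  }
}
```

In a real job, the factory would presumably be referenced by its class name in systems.<system-name>.azureblob.metadataPropertiesGeneratorFactory. How the metadataGeneratorConfig prefix is stripped before the configs reach the generator is not stated in this description, so the bare `kustoTable` key above is only a guess at what the generator would see.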
