[ https://issues.apache.org/jira/browse/NIFI-2809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15563459#comment-15563459 ]
James Wing commented on NIFI-2809: ---------------------------------- Thanks for the updated patch. I did a build, ran the integration tests, a few ad-hoc tests, and some code review. The functionality seems pretty good so far, most of my comments are about code design and styling: *1. Build* The build has a few RAT and contrib-check issues for a file missing a license and some minor code formatting. You can see these if you run the following from the root of the repository: {code} mvn clean install -Pcontrib-check {code} *2. Properties* A few comments on the processors' PropertyDescriptors: * PropertyDescriptors should be defined with a {{name}} that is a key or ID, and a {{displayName}} nicely formatted for the UI. Display names are typically capitalized like "Bucket" and "Filename". * Default values should be provided wherever possible. Filename jumped out as the most relevant, where "$\{filename\}" can be provided as a sensible default. "$\{mime.type\}" might be a good choice for the Content Type (also from NiFi's CoreAttributes enum). * Values read from properties like filename and bucket should not be listed in @ReadsAttributes. For properties, the user explicitly matches attributes to the property with expression language. The declared @ReadsAttribute implies that the processor will read that exact attribute name without asking or giving an opportunity to edit the expression. * Did you consider using "Object Name" over "Filename"? I don't have a strong opinion here. Filename is fairly standard for NiFi, but typically implies the name of the leaf file rather than a directory path and file name (NiFi also has a standard "path" attribute for the directories). To GCS, it's all one big key. The S3 processors use AWS-like "Key" in the UI, but write the whole key to the Filename attribute after a fetch, for whatever that's worth. *3. StorageFactory* The StorageFactory class looks OK, but I'm a bit concerned about each onTrigger calling a synchronized method, however briefly. Would it be possible to give the processors (or AbstractGCSProcessor) an @OnScheduled method that assigns the storage client once on processor start? AbstractAWSProcessor does this, for example. *4. Unit Testing* I'd like to upsell you on some unit tests. The integration tests work, but most people will be too lazy to run them (not me, of course, but other people :)). But the precedent is a powerful thing, and even basic unit tests will get things going in the right direction. I know unit tests for cloud services are tricky. It can be easy when there is a clear SDK method that converts requests into responses, and we can move that call into a separate processor class method that is easily overridden or mocked. That seems promising for PutGCSObject, for example, where there is a nice {{StorageObject obj = insertRequest.execute()}}. It didn't look quite so easy for FetchGCSObject. Did you try this already? I confess I'm not very familiar with the GCS SDK and typical mocking strategies there. I would be happy to give it a shot and see if I'm talking nonsense here. > Add processors for Google Cloud Storage Fetch/Put/Delete > -------------------------------------------------------- > > Key: NIFI-2809 > URL: https://issues.apache.org/jira/browse/NIFI-2809 > Project: Apache NiFi > Issue Type: New Feature > Affects Versions: 1.1.0 > Reporter: Frank Maritato > Attachments: gcp_bundle.patch, gcs_bundle.patch > > > Creating a ticket to add nifi processors for Google Cloud Storage similar to > how the AWS S3 processors are done: > * FetchGCSObject > * PutGCSObject > * DeleteGCSObject > I have the code mostly written and will attach the patch file when complete. -- This message was sent by Atlassian JIRA (v6.3.4#6332)