[
https://issues.apache.org/jira/browse/NIFI-2809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15563459#comment-15563459
]
James Wing commented on NIFI-2809:
----------------------------------
Thanks for the updated patch. I did a build, ran the integration tests, a few
ad-hoc tests, and some code review. The functionality seems pretty good so
far, most of my comments are about code design and styling:
*1. Build*
The build has a few RAT and contrib-check issues for a file missing a license
and some minor code formatting. You can see these if you run the following
from the root of the repository:
{code}
mvn clean install -Pcontrib-check
{code}
*2. Properties*
A few comments on the processors' PropertyDescriptors:
* PropertyDescriptors should be defined with a {{name}} that is a key or ID,
and a {{displayName}} nicely formatted for the UI. Display names are typically
capitalized like "Bucket" and "Filename".
* Default values should be provided wherever possible. Filename jumped out as
the most relevant, where "$\{filename\}" can be provided as a sensible default.
"$\{mime.type\}" might be a good choice for the Content Type (also from NiFi's
CoreAttributes enum).
* Values read from properties like filename and bucket should not be listed in
@ReadsAttributes. For properties, the user explicitly matches attributes to
the property with expression language. The declared @ReadsAttribute implies
that the processor will read that exact attribute name without asking or giving
an opportunity to edit the expression.
* Did you consider using "Object Name" over "Filename"? I don't have a strong
opinion here. Filename is fairly standard for NiFi, but typically implies the
name of the leaf file rather than a directory path and file name (NiFi also has
a standard "path" attribute for the directories). To GCS, it's all one big
key. The S3 processors use AWS-like "Key" in the UI, but write the whole key
to the Filename attribute after a fetch, for whatever that's worth.
*3. StorageFactory*
The StorageFactory class looks OK, but I'm a bit concerned about each onTrigger
calling a synchronized method, however briefly. Would it be possible to give
the processors (or AbstractGCSProcessor) an @OnScheduled method that assigns
the storage client once on processor start? AbstractAWSProcessor does this,
for example.
*4. Unit Testing*
I'd like to upsell you on some unit tests. The integration tests work, but
most people will be too lazy to run them (not me, of course, but other people
:)). But the precedent is a powerful thing, and even basic unit tests will
get things going in the right direction.
I know unit tests for cloud services are tricky. It can be easy when there is
a clear SDK method that converts requests into responses, and we can move that
call into a separate processor class method that is easily overridden or
mocked. That seems promising for PutGCSObject, for example, where there is a
nice {{StorageObject obj = insertRequest.execute()}}. It didn't look quite so
easy for FetchGCSObject.
Did you try this already? I confess I'm not very familiar with the GCS SDK and
typical mocking strategies there. I would be happy to give it a shot and see
if I'm talking nonsense here.
> Add processors for Google Cloud Storage Fetch/Put/Delete
> --------------------------------------------------------
>
> Key: NIFI-2809
> URL: https://issues.apache.org/jira/browse/NIFI-2809
> Project: Apache NiFi
> Issue Type: New Feature
> Affects Versions: 1.1.0
> Reporter: Frank Maritato
> Attachments: gcp_bundle.patch, gcs_bundle.patch
>
>
> Creating a ticket to add nifi processors for Google Cloud Storage similar to
> how the AWS S3 processors are done:
> * FetchGCSObject
> * PutGCSObject
> * DeleteGCSObject
> I have the code mostly written and will attach the patch file when complete.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)