[ https://issues.apache.org/jira/browse/NIFI-2809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15563459#comment-15563459 ]

James Wing commented on NIFI-2809:
----------------------------------

Thanks for the updated patch.  I did a build, ran the integration tests, a few 
ad-hoc tests, and some code review.  The functionality seems pretty good so 
far; most of my comments are about code design and styling:

*1.  Build*
The build has a few RAT and contrib-check issues for a file missing a license 
and some minor code formatting.  You can see these if you run the following 
from the root of the repository:

{code}
mvn clean install -Pcontrib-check
{code}

*2. Properties*
A few comments on the processors' PropertyDescriptors:
 * PropertyDescriptors should be defined with a {{name}} that is a key or ID, 
and a {{displayName}} nicely formatted for the UI (see the sketch after this 
list).  Display names are typically capitalized like "Bucket" and "Filename".
 * Default values should be provided wherever possible.  Filename jumped out as 
the most relevant, where "$\{filename\}" can be provided as a sensible default. 
 "$\{mime.type\}" might be a good choice for the Content Type (also from NiFi's 
CoreAttributes enum).
 * Values read from properties like filename and bucket should not be listed in 
@ReadsAttributes.  For properties, the user explicitly matches attributes to 
the property with expression language.  The declared @ReadsAttribute implies 
that the processor will read that exact attribute name without asking or giving 
an opportunity to edit the expression.
 * Did you consider using "Object Name" over "Filename"?  I don't have a strong 
opinion here.  Filename is fairly standard for NiFi, but typically implies the 
name of the leaf file rather than a directory path and file name (NiFi also has 
a standard "path" attribute for the directories).  To GCS, it's all one big 
key.  The S3 processors use AWS-like "Key" in the UI, but write the whole key 
to the Filename attribute after a fetch, for whatever that's worth.
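
To illustrate the first two bullets, here's roughly the shape I have in mind 
for a Key/Filename property.  This is only a sketch; the {{name}} string, 
description, and validator below are illustrative, not taken from the patch:

{code}
// Assumes org.apache.nifi.components.PropertyDescriptor and
// org.apache.nifi.processor.util.StandardValidators
public static final PropertyDescriptor KEY = new PropertyDescriptor.Builder()
        .name("gcs-key")                        // stable key/ID
        .displayName("Key")                     // nicely formatted for the UI
        .description("The name of the object within the bucket")
        .required(true)
        .defaultValue("${filename}")            // sensible expression language default
        .expressionLanguageSupported(true)
        .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
        .build();
{code}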

*3. StorageFactory*
The StorageFactory class looks OK, but I'm a bit concerned about each onTrigger 
calling a synchronized method, however briefly.  Would it be possible to give 
the processors (or AbstractGCSProcessor) an @OnScheduled method that assigns 
the storage client once on processor start?  AbstractAWSProcessor does this, 
for example.
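
Something like the following is what I have in mind.  This is only a rough 
sketch; the field, factory, and {{create()}} names are illustrative, and I'm 
guessing at which Storage client type the patch uses:

{code}
// Assumes com.google.api.services.storage.Storage,
// org.apache.nifi.annotation.lifecycle.OnScheduled, and a hypothetical
// getStorageFactory(...) helper on AbstractGCSProcessor.
private volatile Storage storage;

@OnScheduled
public void onScheduled(final ProcessContext context) {
    // Build the client once when the processor is scheduled, instead of
    // going through a synchronized factory method on every onTrigger call.
    storage = getStorageFactory(context).create();
}

protected Storage getStorage() {
    return storage;
}
{code}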

*4. Unit Testing*
I'd like to upsell you on some unit tests.  The integration tests work, but 
most people will be too lazy to run them (not me, of course, but other people 
:)).  But the precedent is a powerful thing,   and even basic unit tests will 
get things going in the right direction.

I know unit tests for cloud services are tricky.  It can be easy when there is 
a clear SDK method that converts requests into responses, and we can move that 
call into a separate processor class method that is easily overridden or 
mocked.  That seems promising for PutGCSObject, for example, where there is a 
nice {{StorageObject obj = insertRequest.execute()}}.  It didn't look quite so 
easy for FetchGCSObject.
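
For PutGCSObject, a minimal sketch of what I mean (the extracted method name 
is mine, and I'm assuming the {{Storage.Objects.Insert}} request type from the 
JSON API client):

{code}
// In PutGCSObject: wrap the SDK call so a test can override it
protected StorageObject execute(final Storage.Objects.Insert insertRequest) throws IOException {
    return insertRequest.execute();
}

// In a unit test: subclass the processor and return a canned response
final PutGCSObject processor = new PutGCSObject() {
    @Override
    protected StorageObject execute(final Storage.Objects.Insert insertRequest) {
        return new StorageObject().setName("test-object");
    }
};
final TestRunner runner = TestRunners.newTestRunner(processor);
{code}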

Did you try this already?  I confess I'm not very familiar with the GCS SDK and 
typical mocking strategies there.  I would be happy to give it a shot and see 
if I'm talking nonsense here.

> Add processors for Google Cloud Storage Fetch/Put/Delete
> --------------------------------------------------------
>
>                 Key: NIFI-2809
>                 URL: https://issues.apache.org/jira/browse/NIFI-2809
>             Project: Apache NiFi
>          Issue Type: New Feature
>    Affects Versions: 1.1.0
>            Reporter: Frank Maritato
>         Attachments: gcp_bundle.patch, gcs_bundle.patch
>
>
> Creating a ticket to add nifi processors for Google Cloud Storage similar to 
> how the AWS S3 processors are done:
> * FetchGCSObject
> * PutGCSObject
> * DeleteGCSObject
> I have the code mostly written and will attach the patch file when complete.



