tpalfy commented on code in PR #6794:
URL: https://github.com/apache/nifi/pull/6794#discussion_r1108665694
##########
nifi-nar-bundles/nifi-salesforce-bundle/nifi-salesforce-processors/src/main/java/org/apache/nifi/processors/salesforce/QuerySalesforceObject.java:
##########
@@ -224,6 +282,8 @@ protected List<PropertyDescriptor> getSupportedPropertyDescriptors() {
return Collections.unmodifiableList(Arrays.asList(
API_URL,
Review Comment:
When switching between PROPERTY_BASED and CUSTOM_QUERY, the list of properties on the processor changes in a confusing way because dependent and non-dependent properties are intermingled.
Can we group them a bit better and move the non-dependent properties either before or after the dependent ones? See the example below.
Also, CREATE_ZERO_RECORD_FILES doesn't work with CUSTOM_QUERY.
```java
API_URL,
API_VERSION,
QUERY_TYPE,
CUSTOM_SOQL_QUERY,
SOBJECT_NAME,
FIELD_NAMES,
RECORD_WRITER,
AGE_FIELD,
INITIAL_AGE_FILTER,
AGE_DELAY,
CUSTOM_WHERE_CONDITION,
READ_TIMEOUT,
CREATE_ZERO_RECORD_FILES,
TOKEN_PROVIDER
```
##########
nifi-nar-bundles/nifi-salesforce-bundle/nifi-salesforce-processors/src/main/java/org/apache/nifi/processors/salesforce/QuerySalesforceObject.java:
##########
@@ -106,13 +111,55 @@
@DefaultSchedule(strategy = SchedulingStrategy.TIMER_DRIVEN, period = "1 min")
public class QuerySalesforceObject extends AbstractProcessor {
+ static final AllowableValue PROPERTY_BASED_QUERY = new
AllowableValue("property-based-query", "Property Based Query", "Provide query
by properties.");
+ static final AllowableValue CUSTOM_QUERY = new
AllowableValue("custom-query", "Custom Query", "Provide custom SOQL query.");
+
+ static final PropertyDescriptor API_URL = new PropertyDescriptor.Builder()
Review Comment:
Why can't we use `API_URL` and `API_VERSION` from `CommonSalesforceProperties`?
##########
nifi-nar-bundles/nifi-salesforce-bundle/nifi-salesforce-processors/src/main/java/org/apache/nifi/processors/salesforce/QuerySalesforceObject.java:
##########
@@ -261,6 +321,23 @@ protected Collection<ValidationResult> customValidate(ValidationContext validati
     @Override
     public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
+        boolean isCustomQuery = CUSTOM_QUERY.getValue().equals(context.getProperty(QUERY_TYPE).getValue());
+        AtomicReference<String> nextRecordsUrl = new AtomicReference<>();
+        AtomicReference<String> totalSize = new AtomicReference<>();
Review Comment:
These `AtomicReference` objects are not used here. I don't think their creation should be the responsibility of `onTrigger`; it belongs in the downstream methods.
The `nextRecordsUrl` constructor would then be called explicitly in both of those methods, but I wouldn't consider that duplication. Having these declared closer to the `do...while` blocks they are used in would make their purpose much easier to understand.
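To illustrate the idea with a framework-free sketch (all names here are made up for illustration, not NiFi API), the reference that tracks the pagination cursor is created inside the method that owns the `do...while` loop, right next to its use:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of the suggestion: the AtomicReference carrying the
// pagination cursor is declared inside the method that runs the do...while
// loop, instead of being created by the caller and passed down.
class PaginationSketch {

    // fakeResponses stands in for successive HTTP result pages
    static List<String> fetchAllPages(Deque<String> fakeResponses) {
        AtomicReference<String> nextRecordsUrl = new AtomicReference<>(); // created next to its loop
        List<String> pages = new ArrayList<>();
        do {
            pages.add(fakeResponses.poll());                 // stand-in for the HTTP call
            // stand-in for parseHttpResponse(): the last page leaves the cursor null
            nextRecordsUrl.set(fakeResponses.isEmpty() ? null : "fake-next-url");
        } while (nextRecordsUrl.get() != null);
        return pages;
    }
}
```

The caller never sees the cursor at all; it is purely an implementation detail of the paging loop.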
##########
nifi-nar-bundles/nifi-salesforce-bundle/nifi-salesforce-processors/src/main/java/org/apache/nifi/processors/salesforce/QuerySalesforceObject.java:
##########
@@ -387,11 +461,63 @@ public void onTrigger(final ProcessContext context, final ProcessSession session
         } while (nextRecordsUrl.get() != null);
     }
-    private InputStream getResultInputStream(AtomicReference<String> nextRecordsUrl, String querySObject) {
-        if (nextRecordsUrl.get() == null) {
+    private void processCustomQuery(ProcessContext context, ProcessSession session, FlowFile flowFile, AtomicReference<String> nextRecordsUrl, AtomicReference<String> totalSize) {
+        String customQuery = context.getProperty(CUSTOM_SOQL_QUERY).evaluateAttributeExpressions(flowFile).getValue();
+        do {
+            FlowFile outgoingFlowFile;
+            if (flowFile != null) {
+                outgoingFlowFile = session.create(flowFile);
+            } else {
+                outgoingFlowFile = session.create();
+            }
+            Map<String, String> attributes = new HashMap<>();
+            try (InputStream response = getResultInputStream(nextRecordsUrl.get(), customQuery)) {
+                outgoingFlowFile = session.write(outgoingFlowFile, parseHttpResponse(response, nextRecordsUrl, totalSize));
+                int recordCount = nextRecordsUrl.get() != null ? 2000 : Integer.parseInt(totalSize.get()) % 2000;
+                attributes.put(CoreAttributes.MIME_TYPE.key(), "application/json");
+                attributes.put(TOTAL_RECORD_COUNT, String.valueOf(recordCount));
+                session.adjustCounter("Salesforce records processed", recordCount, false);
+                session.putAllAttributes(outgoingFlowFile, attributes);
+                session.transfer(outgoingFlowFile, REL_SUCCESS);
+            } catch (IOException e) {
+                session.remove(outgoingFlowFile);
+                throw new ProcessException("Couldn't get Salesforce records", e);
+            }
+        } while (nextRecordsUrl.get() != null);
+        if (flowFile != null) {
+            session.remove(flowFile);
Review Comment:
Since we _can_ have incoming flowfiles, we should consider adding an
ORIGINAL and a FAILURE relationship.
##########
nifi-nar-bundles/nifi-salesforce-bundle/nifi-salesforce-processors/src/main/java/org/apache/nifi/processors/salesforce/QuerySalesforceObject.java:
##########
@@ -387,11 +461,63 @@ public void onTrigger(final ProcessContext context, final ProcessSession session
         } while (nextRecordsUrl.get() != null);
     }
-    private InputStream getResultInputStream(AtomicReference<String> nextRecordsUrl, String querySObject) {
-        if (nextRecordsUrl.get() == null) {
+    private void processCustomQuery(ProcessContext context, ProcessSession session, FlowFile flowFile, AtomicReference<String> nextRecordsUrl, AtomicReference<String> totalSize) {
+        String customQuery = context.getProperty(CUSTOM_SOQL_QUERY).evaluateAttributeExpressions(flowFile).getValue();
+        do {
+            FlowFile outgoingFlowFile;
+            if (flowFile != null) {
+                outgoingFlowFile = session.create(flowFile);
+            } else {
+                outgoingFlowFile = session.create();
+            }
+            Map<String, String> attributes = new HashMap<>();
+            try (InputStream response = getResultInputStream(nextRecordsUrl.get(), customQuery)) {
+                outgoingFlowFile = session.write(outgoingFlowFile, parseHttpResponse(response, nextRecordsUrl, totalSize));
+                int recordCount = nextRecordsUrl.get() != null ? 2000 : Integer.parseInt(totalSize.get()) % 2000;
+                attributes.put(CoreAttributes.MIME_TYPE.key(), "application/json");
+                attributes.put(TOTAL_RECORD_COUNT, String.valueOf(recordCount));
+                session.adjustCounter("Salesforce records processed", recordCount, false);
+                session.putAllAttributes(outgoingFlowFile, attributes);
+                session.transfer(outgoingFlowFile, REL_SUCCESS);
+            } catch (IOException e) {
+                session.remove(outgoingFlowFile);
+                throw new ProcessException("Couldn't get Salesforce records", e);
+            }
+        } while (nextRecordsUrl.get() != null);
+        if (flowFile != null) {
+            session.remove(flowFile);
+        }
+    }
+
+    private OutputStreamCallback parseHttpResponse(final InputStream in, AtomicReference<String> nextRecordsUrl, AtomicReference<String> totalSize) {
+        nextRecordsUrl.set(null);
+        return out -> {
+            try (JsonParser jsonParser = JSON_FACTORY.createParser(in);
+                 JsonGenerator jsonGenerator = JSON_FACTORY.createGenerator(out, JsonEncoding.UTF8)) {
+                while (jsonParser.nextToken() != null) {
+                    if (jsonParser.getCurrentToken() == JsonToken.FIELD_NAME && jsonParser.getCurrentName().equals(TOTAL_SIZE)) {
+                        jsonParser.nextToken();
+                        totalSize.set(jsonParser.getValueAsString());
+                    } else if (jsonParser.getCurrentToken() == JsonToken.FIELD_NAME && jsonParser.getCurrentName().equals(NEXT_RECORDS_URL)) {
+                        jsonParser.nextToken();
+                        nextRecordsUrl.set(jsonParser.getValueAsString());
+                    } else if (jsonParser.getCurrentToken() == JsonToken.FIELD_NAME && jsonParser.getCurrentName().equals(RECORDS)) {
+                        jsonParser.nextToken();
+                        jsonGenerator.copyCurrentStructure(jsonParser);
+                    }
Review Comment:
Bit of a code duplication here.
I would consider something like
```suggestion
while (jsonParser.nextToken() != null) {
if (nextTokenIs(jsonParser, JsonToken.FIELD_NAME,
TOTAL_SIZE)) {
totalSize.set(jsonParser.getValueAsString());
} else if (nextTokenIs(jsonParser, JsonToken.FIELD_NAME,
NEXT_RECORDS_URL)) {
nextRecordsUrl.set(jsonParser.getValueAsString());
} else if (nextTokenIs(jsonParser, JsonToken.FIELD_NAME,
RECORDS)) {
jsonGenerator.copyCurrentStructure(jsonParser);
}
```
with
```java
private boolean nextTokenIs(JsonParser jsonParser, JsonToken fieldName,
String value) throws IOException {
return jsonParser.getCurrentToken() == fieldName &&
jsonParser.getCurrentName()
.equals(value) && jsonParser.nextToken() != null;
}
```
I do something uncommon here because this boolean method not only checks
something but also moves to the next token if the corresponding check evaluates
to true. But this type of behaviour in parsers are usually acceptable.
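To show why the check-and-advance idiom stays readable, here is a Jackson-free sketch of the same pattern over a plain token list (`TokenCursor` and its method names are made up for illustration and are not part of any real parser API):

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of the "check and advance" idiom: the predicate tests
// the current token and, only on a match, moves the cursor forward so the
// caller can read the following value token directly.
class TokenCursor {
    private final Iterator<String> tokens;
    private String current;

    TokenCursor(List<String> input) {
        tokens = input.iterator();
        advance();
    }

    String current() {
        return current;
    }

    void advance() {
        current = tokens.hasNext() ? tokens.next() : null;
    }

    // Returns true only when the current token matches the expected field
    // name; as a side effect it advances past it, mirroring the suggestion.
    boolean nextTokenIs(String expectedFieldName) {
        if (expectedFieldName.equals(current)) {
            advance();
            return true;
        }
        return false;
    }
}
```

With a token stream like `["totalSize", "42"]`, `nextTokenIs("totalSize")` returns true and leaves the cursor on `"42"`, so the caller reads the value immediately after the matching branch, just as `getValueAsString()` is called right after each `nextTokenIs` check above.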
##########
nifi-nar-bundles/nifi-salesforce-bundle/nifi-salesforce-processors/src/main/java/org/apache/nifi/processors/salesforce/QuerySalesforceObject.java:
##########
@@ -162,6 +212,7 @@ public class QuerySalesforceObject extends AbstractProcessor {
             .expressionLanguageSupported(ExpressionLanguageScope.VARIABLE_REGISTRY)
             .addValidator(StandardValidators.TIME_PERIOD_VALIDATOR)
             .dependsOn(AGE_FIELD)
+            .dependsOn(QUERY_TYPE, PROPERTY_BASED_QUERY)
Review Comment:
Currently CUSTOM_QUERY does not support incremental loading. That is probably intentional, but in that case we should emphasize it on the `additionalDetails.html` page and maybe even in the `@CapabilityDescription`.
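For instance (the wording is only a suggestion, and the property display names are taken from the ones this PR adds), the `additionalDetails.html` page could state the limitation explicitly:

```html
<p>
    NOTE: Incremental loading (the Age Field, Initial Age Filter and Age Delay properties)
    is only supported when the Query Type is Property Based Query. A Custom Query always
    retrieves the full result set on every run.
</p>
```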
##########
nifi-nar-bundles/nifi-salesforce-bundle/nifi-salesforce-processors/src/main/resources/docs/org.apache.nifi.processors.salesforce.QuerySalesforceObject/additionalDetails.html:
##########
@@ -34,8 +34,8 @@ <h3>Description</h3>
         <p>
             Objects in Salesforce are database tables, their rows are known as records, and their columns are called fields. The QuerySalesforceObject processor queries Salesforce objects and retrieves their records.
-            The processor constructs the query using SOQL (Salesforce Object Query Language) and retrieves the result record dataset using the Salesforce REST API.
-            The processor utilizes streams and NiFi record-based processing to be able to handle a large number of records and to allow arbitrary output format.
+            The processor constructs the query from parameters or executes a custom SOQL (Salesforce Object Query Language) query and retrieves the result record dataset using the Salesforce REST API.
Review Comment:
```suggestion
            The processor constructs the query from processor properties or executes a custom SOQL (Salesforce Object Query Language) query and retrieves the result record dataset using the Salesforce REST API.
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]