exceptionfactory commented on code in PR #8011:
URL: https://github.com/apache/nifi/pull/8011#discussion_r1393158889


##########
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/IdentifyMimeType.java:
##########
@@ -89,14 +89,17 @@
 @Tags({"compression", "gzip", "bzip2", "zip", "MIME", "mime.type", "file", "identify"})
 @CapabilityDescription("Attempts to identify the MIME Type used for a FlowFile. If the MIME Type can be identified, "
         + "an attribute with the name 'mime.type' is added with the value being the MIME Type. If the MIME Type cannot be determined, "
-        + "the value will be set to 'application/octet-stream'. In addition, the attribute mime.extension will be set if a common file "
-        + "extension for the MIME Type is known.")
+        + "the value will be set to 'application/octet-stream'. In addition, the attribute 'mime.extension' will be set if a common file "
+        + "extension for the MIME Type is known. If the MIME Type detected is of type text/*, attempts to identify the charset used " +
+        "and an attribute with the name 'mime.charset' is added with the value being the charset.")
 @WritesAttributes({
-@WritesAttribute(attribute = "mime.type", description = "This Processor sets the FlowFile's mime.type attribute to the detected MIME Type. "
-        + "If unable to detect the MIME Type, the attribute's value will be set to application/octet-stream"),
-@WritesAttribute(attribute = "mime.extension", description = "This Processor sets the FlowFile's mime.extension attribute to the file "
-        + "extension associated with the detected MIME Type. "
-        + "If there is no correlated extension, the attribute's value will be empty")
+        @WritesAttribute(attribute = "mime.type", description = "This Processor sets the FlowFile's mime.type attribute to the detected MIME Type. "
+                + "If unable to detect the MIME Type, the attribute's value will be set to application/octet-stream"),
+        @WritesAttribute(attribute = "mime.extension", description = "This Processor sets the FlowFile's mime.extension attribute to the file "
+                + "extension associated with the detected MIME Type. "
+                + "If there is no correlated extension, the attribute's value will be empty"),
+        @WritesAttribute(attribute = "mime.charset", description = "This Processor sets the FlowFile's mime.charset attribute to the detected charset. "
+                + "If unable to detect the charset or the detected MIME type is not of type text/*, the attribute's value will be empty")

Review Comment:
   As noted below, it seems better to leave the `mime.charset` attribute unset
   when the detected charset is null, instead of setting it to an empty value.
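
A minimal sketch of that approach, assuming the attributes are collected in a map before being written to the FlowFile. The class `CharsetAttributeSketch` and the helper `buildAttributes` are hypothetical names used for illustration and are not part of the PR:

```java
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Map;

class CharsetAttributeSketch {
    // Hypothetical helper, not taken from the PR: collects the attributes to write.
    static Map<String, String> buildAttributes(final String mimeType, final String extension, final Charset charset) {
        final Map<String, String> attributes = new HashMap<>();
        attributes.put("mime.type", mimeType);
        attributes.put("mime.extension", extension);
        // Add mime.charset only when a charset was detected; otherwise leave it absent
        if (charset != null) {
            attributes.put("mime.charset", charset.name());
        }
        return attributes;
    }
}
```

In the processor, a map built this way could then be passed to `session.putAllAttributes(flowFile, attributes)`, so downstream flows can test for the presence of `mime.charset` rather than for an empty string.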



##########
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/IdentifyMimeType.java:
##########
@@ -233,53 +239,62 @@ public void onTrigger(final ProcessContext context, final ProcessSession session
         }
 
         final ComponentLog logger = getLogger();
-        final AtomicReference<String> mimeTypeRef = new AtomicReference<>(null);
-        final String filename = flowFile.getAttribute(CoreAttributes.FILENAME.key());
-
-        session.read(flowFile, new InputStreamCallback() {
-            @Override
-            public void process(final InputStream stream) throws IOException {
-                try (final InputStream in = new BufferedInputStream(stream);
-                     final TikaInputStream tikaStream = TikaInputStream.get(in)) {
-                    Metadata metadata = new Metadata();
-
-                    if (filename != null && context.getProperty(USE_FILENAME_IN_DETECTION).asBoolean()) {
-                        metadata.add(TikaCoreProperties.RESOURCE_NAME_KEY, filename);
-                    }
-                    // Get mime type
-                    MediaType mediatype = detector.detect(tikaStream, metadata);
-                    mimeTypeRef.set(mediatype.toString());
-                }
+
+        final String mediaTypeString;
+        final String extension;
+        final Charset charset;
+
+        try (final InputStream flowFileStream = session.read(flowFile);
+             final TikaInputStream tikaStream = TikaInputStream.get(flowFileStream)) {
+            final String filename = flowFile.getAttribute(CoreAttributes.FILENAME.key());
+
+            Metadata metadata = new Metadata();
+            if (filename != null && context.getProperty(USE_FILENAME_IN_DETECTION).asBoolean()) {
+                metadata.add(TikaCoreProperties.RESOURCE_NAME_KEY, filename);
             }
-        });
 
-        String mimeType = mimeTypeRef.get();
+            final MediaType mediaType = detector.detect(tikaStream, metadata);
+            mediaTypeString = mediaType.getBaseType().toString();
+            extension = lookupExtension(mediaTypeString, logger);
+            charset = identifyCharset(tikaStream, metadata, mediaType);
+        } catch (IOException e) {
+            throw new ProcessException("IOException thrown identifying mime-type of FlowFile content", e);

Review Comment:
   It is not necessary to repeat `IOException` in the message, since the
   exception class already appears in the stack trace. Recommend the following:
   ```suggestion
               throw new ProcessException("Failed to identify MIME type from content stream", e);
   ```
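
The reasoning can be illustrated with a small standalone sketch. It assumes a stand-in exception class, `DetectionException`, in place of NiFi's `ProcessException` so that it compiles without NiFi on the classpath; `readFirstByte` is likewise a hypothetical helper. The wrapper message describes the failed operation, and the cause passed to the constructor supplies the exception class in the stack trace:

```java
import java.io.IOException;
import java.io.InputStream;

class ExceptionMessageSketch {
    // Stand-in for NiFi's ProcessException (assumption made to keep the sketch self-contained)
    static class DetectionException extends RuntimeException {
        DetectionException(final String message, final Throwable cause) {
            super(message, cause);
        }
    }

    // Hypothetical helper: reads one byte and wraps any IOException
    static int readFirstByte(final InputStream in) {
        try {
            return in.read();
        } catch (final IOException e) {
            // The message says what failed; the cause carries the exception type
            throw new DetectionException("Failed to identify MIME type from content stream", e);
        }
    }
}
```

When such an exception is logged, the stack trace contains a `Caused by: java.io.IOException` line, so naming the class again in the message adds no information.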



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
