exceptionfactory commented on code in PR #8011:
URL: https://github.com/apache/nifi/pull/8011#discussion_r1393158889
##########
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/IdentifyMimeType.java:
##########
@@ -89,14 +89,17 @@
@Tags({"compression", "gzip", "bzip2", "zip", "MIME", "mime.type", "file", "identify"})
@CapabilityDescription("Attempts to identify the MIME Type used for a FlowFile. If the MIME Type can be identified, "
+ "an attribute with the name 'mime.type' is added with the value being the MIME Type. If the MIME Type cannot be determined, "
- + "the value will be set to 'application/octet-stream'. In addition, the attribute mime.extension will be set if a common file "
- + "extension for the MIME Type is known.")
+ + "the value will be set to 'application/octet-stream'. In addition, the attribute 'mime.extension' will be set if a common file "
+ + "extension for the MIME Type is known. If the MIME Type detected is of type text/*, attempts to identify the charset used " +
+ "and an attribute with the name 'mime.charset' is added with the value being the charset.")
@WritesAttributes({
-@WritesAttribute(attribute = "mime.type", description = "This Processor sets the FlowFile's mime.type attribute to the detected MIME Type. "
- + "If unable to detect the MIME Type, the attribute's value will be set to application/octet-stream"),
-@WritesAttribute(attribute = "mime.extension", description = "This Processor sets the FlowFile's mime.extension attribute to the file "
- + "extension associated with the detected MIME Type. "
- + "If there is no correlated extension, the attribute's value will be empty")
+ @WritesAttribute(attribute = "mime.type", description = "This Processor sets the FlowFile's mime.type attribute to the detected MIME Type. "
+ + "If unable to detect the MIME Type, the attribute's value will be set to application/octet-stream"),
+ @WritesAttribute(attribute = "mime.extension", description = "This Processor sets the FlowFile's mime.extension attribute to the file "
+ + "extension associated with the detected MIME Type. "
+ + "If there is no correlated extension, the attribute's value will be empty"),
+ @WritesAttribute(attribute = "mime.charset", description = "This Processor sets the FlowFile's mime.charset attribute to the detected charset. "
+ + "If unable to detect the charset or the detected MIME type is not of type text/*, the attribute's value will be empty")
Review Comment:
As noted below, it seems better to leave the `mime.charset` attribute unset
when the charset is null, rather than setting it to an empty value.
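
The idea of omitting the attribute can be sketched as follows. This is only an illustration, not the code from the PR: `CharsetAttributeSketch` and `buildAttributes` are hypothetical names, and the sketch assumes the resulting map would then be applied to the FlowFile in one call, for example with `session.putAllAttributes(flowFile, attributes)`.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the PR: collects the attributes to write.
final class CharsetAttributeSketch {

    static Map<String, String> buildAttributes(final String mimeType, final String extension, final Charset charset) {
        final Map<String, String> attributes = new LinkedHashMap<>();
        attributes.put("mime.type", mimeType);
        attributes.put("mime.extension", extension);
        // Add mime.charset only when a charset was detected; otherwise leave it unset
        if (charset != null) {
            attributes.put("mime.charset", charset.name());
        }
        return attributes;
    }

    public static void main(final String[] args) {
        // Charset detected: the map contains mime.charset
        System.out.println(buildAttributes("text/plain", ".txt", StandardCharsets.UTF_8));
        // No charset detected: the map has no mime.charset key at all
        System.out.println(buildAttributes("application/octet-stream", "", null));
    }
}
```

With this shape, an absent `mime.charset` attribute means that no charset was detected, and no empty string is ever written for it.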
##########
nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/IdentifyMimeType.java:
##########
@@ -233,53 +239,62 @@ public void onTrigger(final ProcessContext context, final ProcessSession session
}
final ComponentLog logger = getLogger();
- final AtomicReference<String> mimeTypeRef = new AtomicReference<>(null);
- final String filename = flowFile.getAttribute(CoreAttributes.FILENAME.key());
-
- session.read(flowFile, new InputStreamCallback() {
- @Override
- public void process(final InputStream stream) throws IOException {
- try (final InputStream in = new BufferedInputStream(stream);
- final TikaInputStream tikaStream = TikaInputStream.get(in)) {
- Metadata metadata = new Metadata();
-
- if (filename != null && context.getProperty(USE_FILENAME_IN_DETECTION).asBoolean()) {
- metadata.add(TikaCoreProperties.RESOURCE_NAME_KEY, filename);
- }
- // Get mime type
- MediaType mediatype = detector.detect(tikaStream, metadata);
- mimeTypeRef.set(mediatype.toString());
- }
+
+ final String mediaTypeString;
+ final String extension;
+ final Charset charset;
+
+ try (final InputStream flowFileStream = session.read(flowFile);
+ final TikaInputStream tikaStream = TikaInputStream.get(flowFileStream)) {
+ final String filename = flowFile.getAttribute(CoreAttributes.FILENAME.key());
+
+ Metadata metadata = new Metadata();
+ if (filename != null && context.getProperty(USE_FILENAME_IN_DETECTION).asBoolean()) {
+ metadata.add(TikaCoreProperties.RESOURCE_NAME_KEY, filename);
}
- });
- String mimeType = mimeTypeRef.get();
+ final MediaType mediaType = detector.detect(tikaStream, metadata);
+ mediaTypeString = mediaType.getBaseType().toString();
+ extension = lookupExtension(mediaTypeString, logger);
+ charset = identifyCharset(tikaStream, metadata, mediaType);
+ } catch (IOException e) {
+ throw new ProcessException("IOException thrown identifying mime-type of FlowFile content", e);
Review Comment:
It is not necessary to repeat `IOException` in the message since that will
be part of the stack trace. Recommend the following:
```suggestion
throw new ProcessException("Failed to identify MIME type from content stream", e);
```
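
As a small illustration of why the exception type does not need to appear in the message, the sketch below wraps an `IOException` and prints the resulting stack trace. It is not taken from the PR: `ExceptionMessageSketch` and `stackTraceOf` are hypothetical names, and the nested `ProcessException` is a stand-in for NiFi's `org.apache.nifi.processor.exception.ProcessException` so that the example compiles on its own.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;

final class ExceptionMessageSketch {

    // Stand-in for NiFi's ProcessException (assumption: a RuntimeException with a message and a cause)
    static final class ProcessException extends RuntimeException {
        ProcessException(final String message, final Throwable cause) {
            super(message, cause);
        }
    }

    // Renders the full stack trace, including any "Caused by:" sections, as a String
    static String stackTraceOf(final Throwable throwable) {
        final StringWriter writer = new StringWriter();
        final PrintWriter printWriter = new PrintWriter(writer);
        throwable.printStackTrace(printWriter);
        printWriter.flush();
        return writer.toString();
    }

    public static void main(final String[] args) {
        final IOException cause = new IOException("stream closed");
        final ProcessException wrapped =
                new ProcessException("Failed to identify MIME type from content stream", cause);
        // The trace names java.io.IOException in its "Caused by:" line
        System.out.println(stackTraceOf(wrapped));
    }
}
```

Because the cause is passed to the constructor, the trace carries the `IOException` class name and its own message, so the wrapper message can describe the failed operation instead.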