[
https://issues.apache.org/jira/browse/NIFI-4004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16141875#comment-16141875
]
ASF GitHub Bot commented on NIFI-4004:
--------------------------------------
Github user markap14 commented on a diff in the pull request:
https://github.com/apache/nifi/pull/1877#discussion_r135305805
--- Diff:
nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-service-api/src/main/java/org/apache/nifi/serialization/RecordSetWriterFactory.java
---
@@ -45,20 +51,23 @@
*/
public interface RecordSetWriterFactory extends ControllerService {
+ InputStream EMPTY_INPUT_STREAM = new ByteArrayInputStream(new byte[0]);
+
/**
* <p>
- * Returns the Schema that will be used for writing Records. Note that
the FlowFile and InputStream that are given
- * may well be different than the FlowFile that the writer will write
to. The given FlowFile and InputStream are
+ * Returns the Schema that will be used for writing Records. Note that
the InputStream that are given
+ * may well be different than the content that the writer will write.
The given variables and InputStream are
* intended to be used for determining the schema that should be used
when writing records.
* </p>
*
- * @param flowFile the FlowFile from which the schema should be
determined.
+ * @param variables the variables which is used to resolve Record
Schema via Expression Language, can be null or empty
+ * @param content the contents of the input data from which to
determine the schema
* @param readSchema the schema that was read from the incoming
FlowFile, or <code>null</code> if there is no input schema
*
* @return the Schema that should be used for writing Records
* @throws SchemaNotFoundException if unable to find the schema
*/
- RecordSchema getSchema(FlowFile flowFile, RecordSchema readSchema)
throws SchemaNotFoundException, IOException;
+ RecordSchema getSchema(Map<String, String> variables, InputStream
content, RecordSchema readSchema) throws SchemaNotFoundException, IOException;
--- End diff --
@ijokarumawak can you explain the reasoning here for providing a
Map<String, String> and an InputStream? Previously, with just the FlowFile, the
Writer had access only to the attributes, not the content (because it had no
Process Session). I believe that is the correct abstraction. The RecordSchema
from the reader already is passed in, and the FlowFile being written to will
have no content, generally, so reading from the destination FlowFile wouldn't
make sense. It's possible that I'm just misunderstanding the idea here, though.
> Refactor RecordReaderFactory and SchemaAccessStrategy to be used without
> incoming FlowFile
> ------------------------------------------------------------------------------------------
>
> Key: NIFI-4004
> URL: https://issues.apache.org/jira/browse/NIFI-4004
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Extensions
> Affects Versions: 1.2.0
> Reporter: Koji Kawamura
> Assignee: Koji Kawamura
>
> Current RecordReaderFactory and SchemaAccessStrategy implementation assumes
> there's always an incoming FlowFile available, and use it to resolve Record
> Schema.
> That is fine for components those convert or update incoming FlowFiles,
> however there are other components those does not have any incoming
> FlowFiles, for example, ConsumeKafkaRecord_0_10. Typically, ones fetches data
> from external system do not have incoming FlowFile. And current API doesn't
> fit well with these as it requires a FlowFile.
> In fact, [ConsumeKafkaRecord creates a temporal
> FlowFile|https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-kafka-bundle/nifi-kafka-0-10-processors/src/main/java/org/apache/nifi/processors/kafka/pubsub/ConsumerLease.java#L426]
> only to get RecordSchema. This should be avoided as we expect more
> components start using Record reader mechanism.
> This JIRA proposes refactoring current API to allow accessing RecordReaders
> without needing an incoming FlowFile.
> Additionally, since there's Schema Access Strategy that requires incoming
> FlowFile containing attribute values to access schema registry, it'd be
> useful if we could tell user when such RecordReader is specified that it
> can't be used.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)