[https://issues.apache.org/jira/browse/NIFI-912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14736813#comment-14736813]
Sean Busbey commented on NIFI-912:
----------------------------------
{code}
+ @WritesAttribute(attribute = "schema.fingerprint", description = "The fingerprint of the schema as determined by the Fingerprint Algorithm.")
{code}
Should the description note that the fingerprint will be a hex string?
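For reference, here is a minimal sketch of what a hex-string fingerprint could look like. It assumes the CRC-64-AVRO algorithm from the Avro specification (the patch's Fingerprint Algorithm property may offer others). The class and method names are hypothetical. The input is assumed to already be the schema's Parsing Canonical Form. Rendering the 64-bit value as 16 zero-padded lowercase hex digits is an illustrative choice, not necessarily what the processor emits.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper: CRC-64-AVRO fingerprint (as defined in the Avro
 * specification) rendered as a 16-digit lowercase hex string.
 * Assumes the caller passes the schema's Parsing Canonical Form.
 */
final class SchemaFingerprintSketch {
    // Initial value / polynomial constant from the Avro specification.
    static final long EMPTY = 0xc15d213aa4d7a795L;
    private static final long[] FP_TABLE = new long[256];

    static {
        // Precompute one table entry per possible byte value.
        for (int i = 0; i < 256; i++) {
            long fp = i;
            for (int j = 0; j < 8; j++) {
                fp = (fp >>> 1) ^ (EMPTY & -(fp & 1L));
            }
            FP_TABLE[i] = fp;
        }
    }

    /** CRC-64-AVRO over the given bytes. */
    static long fingerprint64(byte[] buf) {
        long fp = EMPTY;
        for (byte b : buf) {
            fp = (fp >>> 8) ^ FP_TABLE[(int) (fp ^ b) & 0xff];
        }
        return fp;
    }

    /** Fingerprint of a canonical-form schema string as 16 lowercase hex digits. */
    static String hexFingerprint(String canonicalSchema) {
        long fp = fingerprint64(canonicalSchema.getBytes(StandardCharsets.UTF_8));
        // %016x treats the long as unsigned and zero-pads to 16 characters.
        return String.format("%016x", fp);
    }
}
```

In a real processor it would be simpler to call Avro's own `SchemaNormalization` (e.g. `parsingFingerprint64(Schema)`), which also computes the canonical form. The point is only that the attribute value's format is worth stating in the description.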
{code}
+ static final PropertyDescriptor COUNT_RECORDS = new PropertyDescriptor.Builder()
+ .name("Count Records")
+ .description("If true the number of records in the datafile will be counted and stored in a FlowFile attribute 'record.count'.")
+ .addValidator(StandardValidators.BOOLEAN_VALIDATOR)
+ .allowableValues("true", "false")
+ .defaultValue("false")
+ .required(true)
+ .build();
+
{code}
nit: is it worth noting in the description that we do this by reading metadata within the data file format rather than by, e.g., deserializing the records?
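To make the distinction concrete, here is a minimal stdlib-only sketch (hypothetical class name) of counting records from metadata alone. It follows the object container file layout in the Avro specification: magic bytes, a metadata map, a 16-byte sync marker, then data blocks that each start with a zig-zag varint object count and a byte size. The sketch sums the block counts and skips the serialized bytes, so no datum is decoded and the codec does not matter. It assumes a complete, well-formed file held in memory and does not verify sync markers.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

/**
 * Hypothetical helper: counts the datums in an Avro object container file by
 * reading block headers only, never deserializing the records themselves.
 */
final class AvroRecordCountSketch {
    private static final int SYNC_SIZE = 16;

    /** Reads one zig-zag encoded variable-length long (Avro binary encoding). */
    private static long readLong(DataInputStream in) throws IOException {
        long raw = 0;
        int shift = 0;
        int b;
        do {
            b = in.readUnsignedByte(); // throws EOFException on truncated input
            raw |= (long) (b & 0x7f) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (raw >>> 1) ^ -(raw & 1); // undo zig-zag encoding
    }

    /** Skips exactly n bytes, failing if the input ends first. */
    private static void skipFully(DataInputStream in, long n) throws IOException {
        while (n > 0) {
            int skipped = in.skipBytes((int) Math.min(n, Integer.MAX_VALUE));
            if (skipped <= 0) {
                throw new EOFException("truncated Avro file");
            }
            n -= skipped;
        }
    }

    /** Skips the magic bytes, the metadata map, and the header sync marker. */
    private static void skipHeader(DataInputStream in) throws IOException {
        byte[] magic = new byte[4];
        in.readFully(magic);
        if (magic[0] != 'O' || magic[1] != 'b' || magic[2] != 'j' || magic[3] != 1) {
            throw new IOException("not an Avro object container file");
        }
        // The metadata map is a series of blocks ending with a zero count.
        long count;
        while ((count = readLong(in)) != 0) {
            if (count < 0) {
                count = -count;
                readLong(in); // a negative count is followed by a block byte size; unused here
            }
            for (long i = 0; i < count; i++) {
                skipFully(in, readLong(in)); // key: length-prefixed string
                skipFully(in, readLong(in)); // value: length-prefixed bytes
            }
        }
        skipFully(in, SYNC_SIZE);
    }

    /** Sums the per-block object counts of an in-memory Avro data file. */
    static long countRecords(byte[] file) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(file))) {
            skipHeader(in);
            long total = 0;
            // available() is exact here only because the source is a byte array.
            while (in.available() > 0) {
                long objects = readLong(in); // number of datums in this block
                long size = readLong(in);    // bytes of serialized (possibly compressed) data
                skipFully(in, size);
                skipFully(in, SYNC_SIZE);
                total += objects;
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because this counts top-level datums whatever the schema type, the same approach would also apply to the array-schema test in this patch.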
{code}
+ @Test
+ public void testExtractionWithNonRecordSchema() throws IOException {
+ final TestRunner runner = TestRunners.newTestRunner(new ExtractAvroMetadata());
+ final Schema schema = new Schema.Parser().parse(new File("src/test/resources/array.avsc"));
+
+ final GenericData.Array<String> data = new GenericData.Array<>(schema, Arrays.asList("one", "two", "three"));
+ final DatumWriter<GenericData.Array<String>> datumWriter = new GenericDatumWriter<>(schema);
+
+ final ByteArrayOutputStream out = new ByteArrayOutputStream();
+ final DataFileWriter<GenericData.Array<String>> dataFileWriter = new DataFileWriter<>(datumWriter);
+ dataFileWriter.create(schema, out);
+ dataFileWriter.append(data);
+ dataFileWriter.close();
+
+ runner.enqueue(out.toByteArray());
+ runner.run();
+
+ runner.assertAllFlowFilesTransferred(ExtractAvroMetadata.REL_SUCCESS, 1);
+
+ final MockFlowFile flowFile = runner.getFlowFilesForRelationship(ExtractAvroMetadata.REL_SUCCESS).get(0);
+ flowFile.assertAttributeExists(ExtractAvroMetadata.SCHEMA_FINGERPRINT_ATTR);
+ flowFile.assertAttributeEquals(ExtractAvroMetadata.SCHEMA_TYPE_ATTR, Schema.Type.ARRAY.getName());
+ flowFile.assertAttributeEquals(ExtractAvroMetadata.SCHEMA_NAME_ATTR, "array");
+ }
{code}
Maybe "record count" was a bad choice of name on my part? Even though this schema is an array rather than a record, we should still be able to get the count of datums in this flow, right?
> Support extracting metadata from Avro file headers
> --------------------------------------------------
>
> Key: NIFI-912
> URL: https://issues.apache.org/jira/browse/NIFI-912
> Project: Apache NiFi
> Issue Type: Improvement
> Reporter: Bryan Bende
> Assignee: Bryan Bende
> Priority: Minor
> Fix For: 0.4.0
>
> Attachments: NIFI-912-2.patch, NIFI-912.patch
>
>
> Extract metadata from Avro file headers to FlowFile attributes so that
> downstream processors can make decisions, such as merging together records of
> compatible schemas (i.e. the correlation attribute).
> Information to extract:
> - Schema definition (full, not fingerprint)
> - Schema fingerprint
> - Schema root record name (if schema is a record)
> - Key/value metadata, like compression codec
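The schema definition and key/value metadata listed above come straight from the container file header. The Avro specification reserves the metadata keys `avro.schema` (the full schema JSON) and `avro.codec` (for example `null` or `deflate`). What follows is a minimal stdlib-only sketch of reading that map. The class name is hypothetical, and decoding every value as UTF-8 is an assumption that holds for those two reserved keys but not necessarily for arbitrary user metadata. Extracting the root record name would still need a JSON or Avro schema parser.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: reads the key/value metadata from an Avro file header. */
final class AvroHeaderMetadataSketch {

    /** Zig-zag varint long decoding, as used by Avro's binary encoding. */
    private static long readLong(DataInputStream in) throws IOException {
        long raw = 0;
        int shift = 0;
        int b;
        do {
            b = in.readUnsignedByte();
            raw |= (long) (b & 0x7f) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (raw >>> 1) ^ -(raw & 1);
    }

    /** Reads a length-prefixed byte sequence (an Avro string or bytes value). */
    private static byte[] readBytes(DataInputStream in) throws IOException {
        byte[] buf = new byte[(int) readLong(in)];
        in.readFully(buf);
        return buf;
    }

    /** Returns header metadata such as "avro.schema" and "avro.codec", decoded as UTF-8. */
    static Map<String, String> readMetadata(byte[] file) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(file))) {
            byte[] magic = new byte[4];
            in.readFully(magic);
            if (magic[0] != 'O' || magic[1] != 'b' || magic[2] != 'j' || magic[3] != 1) {
                throw new IOException("not an Avro object container file");
            }
            Map<String, String> meta = new LinkedHashMap<>();
            long count;
            // The map is encoded as blocks of entries, terminated by a zero count.
            while ((count = readLong(in)) != 0) {
                if (count < 0) {
                    count = -count;
                    readLong(in); // block size in bytes; not needed here
                }
                for (long i = 0; i < count; i++) {
                    String key = new String(readBytes(in), StandardCharsets.UTF_8);
                    meta.put(key, new String(readBytes(in), StandardCharsets.UTF_8));
                }
            }
            return meta;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With the Avro Java library, `DataFileStream` exposes the same information through `getSchema()` and `getMetaString(String)`, which is likely what a processor would use instead of parsing the header by hand.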