[ https://issues.apache.org/jira/browse/NIFI-5213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16508944#comment-16508944 ]
ASF GitHub Bot commented on NIFI-5213:
--------------------------------------

Github user MikeThomsen commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2718#discussion_r194575493

    --- Diff: nifi-nar-bundles/nifi-standard-services/nifi-record-serialization-services-bundle/nifi-record-serialization-services/src/test/java/org/apache/nifi/avro/TestAvroReaderWithExplicitSchema.java ---
    @@ -0,0 +1,62 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements. See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License. You may obtain a copy of the License at
    + *
    + * http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.nifi.avro;
    +
    +import org.apache.avro.Schema;
    +import org.apache.nifi.serialization.SimpleRecordSchema;
    +import org.apache.nifi.serialization.record.RecordSchema;
    +import org.junit.Test;
    +
    +import java.io.File;
    +import java.io.FileInputStream;
    +import java.io.IOException;
    +
    +public class TestAvroReaderWithExplicitSchema {
    +
    +    @Test
    +    public void testAvroExplicitReaderWithSchemalessFile() throws Exception {
    +        File avroFileWithEmbeddedSchema = new File("src/test/resources/avro/avro_schemaless.avro");
    +        FileInputStream fileInputStream = new FileInputStream(avroFileWithEmbeddedSchema);
    +        Schema dataSchema = new Schema.Parser().parse(new File("src/test/resources/avro/avro_schemaless.avsc"));
    +        RecordSchema recordSchema = new SimpleRecordSchema(dataSchema.toString(), AvroTypeUtil.AVRO_SCHEMA_FORMAT, null);
    +
    +        AvroReaderWithExplicitSchema avroReader = new AvroReaderWithExplicitSchema(fileInputStream, recordSchema, dataSchema);
    +        avroReader.nextAvroRecord();
    --- End diff --

    As far as I can tell, this doesn't test anything as-is, because all of the errors are caught inside that method. It should have some assertions around it, IMO.


> Allow AvroReader with explicit schema to read files with embedded schema
> ------------------------------------------------------------------------
>
>                 Key: NIFI-5213
>                 URL: https://issues.apache.org/jira/browse/NIFI-5213
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Extensions
>            Reporter: Matt Burgess
>            Assignee: Matt Burgess
>            Priority: Minor
>
> AvroReader allows the choice of schema access strategy from such options as
> Use Embedded Schema, Use Schema Name, Use Schema Text, etc. If the incoming
> Avro files will have embedded schemas, then Use Embedded Schema is best
> practice for the Avro Reader. However, it is not intuitive that errors can
> occur if the same schema that is embedded in the file is specified by name
> (using a schema registry) or explicitly via Schema Text. This has been
> noticed in QueryRecord, for example, and the error is also not intuitive or
> descriptive (it is often an ArrayIndexOutOfBoundsException).
> To provide a better user experience, it would be an improvement for
> AvroReader to be able to successfully process Avro files with embedded
> schemas, even when the Schema Access Strategy is not "Use Embedded Schema".
> Of course, the explicit schema would have to match the embedded schema, or an
> error would be reported (and rightfully so).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
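
A note on the issue description above: for a reader to handle both kinds of input, it has to tell an Avro object container file (which carries an embedded schema) apart from bare binary-encoded records. The sketch below shows one possible way to do that, by peeking at the four-byte container magic ('O', 'b', 'j', 1) defined by the Avro specification. It is not necessarily what the pull request does. The class name AvroContainerCheck and the method name hasEmbeddedSchema are hypothetical and are not part of NiFi or Avro.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of NiFi or Avro.
final class AvroContainerCheck {

    // Avro object container files start with the bytes 'O', 'b', 'j', 1.
    private static final byte[] MAGIC = {'O', 'b', 'j', 1};

    private AvroContainerCheck() {
    }

    /**
     * Peeks at the first four bytes of the stream and reports whether they
     * match the container-file magic. The stream is reset afterwards, so the
     * caller can hand it to whichever reader is appropriate.
     */
    static boolean hasEmbeddedSchema(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException(
                    "stream must support mark/reset (wrap it in a BufferedInputStream)");
        }
        in.mark(MAGIC.length);
        try {
            byte[] head = new byte[MAGIC.length];
            int total = 0;
            // read() may return fewer bytes than requested, so loop until
            // the header is full or the stream ends.
            while (total < head.length) {
                int read = in.read(head, total, head.length - total);
                if (read < 0) {
                    break;
                }
                total += read;
            }
            return total == MAGIC.length && Arrays.equals(head, MAGIC);
        } finally {
            in.reset();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] container = {'O', 'b', 'j', 1, 0, 0};
        byte[] bareRecord = {2, 'a'};
        System.out.println(hasEmbeddedSchema(new ByteArrayInputStream(container)));
        System.out.println(hasEmbeddedSchema(new ByteArrayInputStream(bareRecord)));
    }
}
```

Because the check returns a boolean instead of swallowing failures, a unit test can assert on it directly, which is the kind of assertion the review comment asks for. A magic-byte match only says the input looks like a container file; whether the explicit schema matches the embedded one would still need a separate comparison.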