KalleOlaviNiemitalo commented on PR #1748: URL: https://github.com/apache/avro/pull/1748#issuecomment-1172042027
Could you add tests for trailing content in a UTF-8 file parsed using Schema.parse(File file)? [JsonFactory.createParser(File f)](https://github.com/FasterXML/jackson-core/blob/10a9026f4ef91e821798296e7c4e3fe445921f89/src/main/java/com/fasterxml/jackson/core/JsonFactory.java#L1025-L1031) apparently creates an InputStream for that, and [JsonFactory._createParser(InputStream in, IOContext ctxt)](https://github.com/FasterXML/jackson-core/blob/10a9026f4ef91e821798296e7c4e3fe445921f89/src/main/java/com/fasterxml/jackson/core/JsonFactory.java#L1653-L1668) calls [ByteSourceJsonBootstrapper.constructParser](https://github.com/FasterXML/jackson-core/blob/10a9026f4ef91e821798296e7c4e3fe445921f89/src/main/java/com/fasterxml/jackson/core/json/ByteSourceJsonBootstrapper.java#L251-L271), which creates a UTF8StreamJsonParser in that case. UTF8StreamJsonParser is a "byte-based" parser and inherits [JsonParser.releaseBuffered(Writer w)](https://github.com/FasterXML/jackson-core/blob/10a9026f4ef91e821798296e7c4e3fe445921f89/src/main/java/com/fasterxml/jackson/core/JsonParser.java#L836), which just returns -1, but [UTF8StreamJsonParser.releaseBuffered(OutputStream out)](https://github.com/FasterXML/jackson-core/blob/10a9026f4ef91e821798296e7c4e3fe445921f89/src/main/java/com/fasterxml/jackson/core/json/UTF8StreamJsonParser.java#L224-L236) can write something to the OutputStream.

I think this means that, to correctly detect trailing content in a UTF-8 file, Schema.parse would have to call not only JsonParser.releaseBuffered(Writer) but also JsonParser.releaseBuffered(OutputStream). Alternatively, it could first call JsonParser.getInputSource() and then use the type of the result to guess whether the parser is byte-based or char-based.

If parse(InputStream) parses a stream that has trailing content, and JsonParser buffers that content, it would be best to return that content to the InputStream so that the caller can then read it. However, I don't see how to do that.
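For illustration only, here is one partial workaround for the last point, using nothing but the JDK. It does not push bytes back into the caller's original InputStream, which is what would really be wanted. Instead it builds a new stream that yields the released bytes first and then the rest of the original. `TrailingBytes`, `rejoin` and `drainUtf8` are hypothetical names, not part of Avro or Jackson, and the `released` array is assumed to hold whatever JsonParser.releaseBuffered(OutputStream) wrote out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of Avro or Jackson.
final class TrailingBytes {
    private TrailingBytes() {}

    // Returns a stream that yields the bytes the parser had buffered but not
    // consumed, followed by whatever is still unread in the original stream.
    // The original stream itself is not modified.
    static InputStream rejoin(byte[] released, InputStream remaining) {
        return new SequenceInputStream(new ByteArrayInputStream(released), remaining);
    }

    // Reads a stream to the end and decodes it as UTF-8 (used for the demo).
    static String drainUtf8(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[256];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The obvious limitation is that the caller would have to switch to the returned stream, so this would only help if the parse API handed that stream back somehow.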
Is it possible that JsonParser buffers some content from a Reader, and the buffered content is all space characters and thus ignored by Avro.Schema, but the Reader has more content that JsonParser did not even read because its buffer filled up? In that case, Avro.Schema would have to check the Reader as well.
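If that scenario can happen, the check would need to look at both places. A rough sketch of the idea using only the JDK, under the assumption that the buffered characters have already been obtained (for example through JsonParser.releaseBuffered(Writer)); `TrailingContentCheck` and `hasTrailingContent` are hypothetical names, not existing Avro code:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration; not part of Avro or Jackson.
final class TrailingContentCheck {
    private TrailingContentCheck() {}

    // The four characters JSON treats as insignificant whitespace.
    static boolean isJsonWhitespace(int c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\r';
    }

    // Returns true if anything other than whitespace follows the parsed value,
    // either in the characters the parser had buffered or in the part of the
    // Reader that the parser never read.
    static boolean hasTrailingContent(CharSequence buffered, Reader unread) {
        for (int i = 0; i < buffered.length(); i++) {
            if (!isJsonWhitespace(buffered.charAt(i))) {
                return true;
            }
        }
        try {
            int c;
            while ((c = unread.read()) != -1) {
                if (!isJsonWhitespace(c)) {
                    return true;
                }
            }
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note that this consumes the Reader up to the first non-whitespace character (or to the end), so it conflicts with leaving trailing content available to the caller.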
