KalleOlaviNiemitalo commented on PR #1748:
URL: https://github.com/apache/avro/pull/1748#issuecomment-1172042027

   Could you add tests for trailing content in a UTF-8 file parsed using 
Schema.parse(File file)? [JsonFactory.createParser(File 
f)](https://github.com/FasterXML/jackson-core/blob/10a9026f4ef91e821798296e7c4e3fe445921f89/src/main/java/com/fasterxml/jackson/core/JsonFactory.java#L1025-L1031)
 apparently creates an InputStream for that, and 
[JsonFactory._createParser(InputStream in, IOContext 
ctxt)](https://github.com/FasterXML/jackson-core/blob/10a9026f4ef91e821798296e7c4e3fe445921f89/src/main/java/com/fasterxml/jackson/core/JsonFactory.java#L1653-L1668)
 calls 
[ByteSourceJsonBootstrapper.constructParser](https://github.com/FasterXML/jackson-core/blob/10a9026f4ef91e821798296e7c4e3fe445921f89/src/main/java/com/fasterxml/jackson/core/json/ByteSourceJsonBootstrapper.java#L251-L271),
 which creates a UTF8StreamJsonParser in that case. UTF8StreamJsonParser is a 
"byte-based" parser and inherits [JsonParser.releaseBuffered(Writer 
w)](https://github.com/FasterXML/jackson-core/blob/10a9026f4ef91e821798296e7c4e3fe445921f89/src/main/java/com/fasterxml/jackson/core/JsonParser.java#L836),
 which just returns -1, but [UTF8StreamJsonParser.releaseBuffered(OutputStream 
out)](https://github.com/FasterXML/jackson-core/blob/10a9026f4ef91e821798296e7c4e3fe445921f89/src/main/java/com/fasterxml/jackson/core/json/UTF8StreamJsonParser.java#L224-L236)
 can write something to the OutputStream. I think this means that, to correctly 
detect trailing content in a UTF-8 file, Schema.parse would have to call not 
only JsonParser.releaseBuffered(Writer) but also 
JsonParser.releaseBuffered(OutputStream), or first call 
JsonParser.getInputSource() and then use the type of the result to guess 
whether the parser is byte-based or char-based.
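
A rough sketch of the first option, calling both overloads, might look like the following. The class and helper names are hypothetical and not part of Avro or Jackson, and the sketch assumes that a byte-based parser was reading UTF-8. It relies only on the documented behaviour that each `releaseBuffered` overload returns -1 when the parser is not of the matching kind.

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;

final class TrailingContentSketch {

    // Hypothetical helper: returns whatever the parser has buffered but not
    // yet consumed, regardless of whether the parser is byte-based or char-based.
    static String releaseBufferedAsString(JsonParser parser) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // A byte-based parser returns a count >= 0 here; a char-based one returns -1.
        if (parser.releaseBuffered(bytes) >= 0) {
            // Assumption: the byte-based parser was decoding UTF-8.
            return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
        }
        StringWriter chars = new StringWriter();
        // A char-based parser writes its leftover characters here.
        parser.releaseBuffered(chars);
        return chars.toString();
    }

    public static void main(String[] args) throws IOException {
        JsonFactory factory = new JsonFactory();
        String text = "{\"type\":\"string\"} trailing";

        // Byte-based case: the parser is created from an InputStream.
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        try (JsonParser p = factory.createParser(new ByteArrayInputStream(utf8))) {
            p.nextToken();    // START_OBJECT
            p.skipChildren(); // advance to the matching END_OBJECT
            System.out.println("[" + releaseBufferedAsString(p) + "]");
        }

        // Char-based case: the parser is created from a Reader.
        try (JsonParser p = factory.createParser(new StringReader(text))) {
            p.nextToken();
            p.skipChildren();
            System.out.println("[" + releaseBufferedAsString(p) + "]");
        }
    }
}
```

Trying the OutputStream overload first and falling back to the Writer overload avoids having to inspect the type returned by JsonParser.getInputSource().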
   
   If parse(InputStream) parses a stream that has trailing content, and 
JsonParser buffers that content, it would be best to return that content to the 
InputStream so that the caller can then read it. However, I don't see how to do 
that.
   
   Is it possible that JsonParser buffers some content from a Reader, and the 
buffered content is all space characters and thus ignored by Avro.Schema, but 
the Reader has more content that JsonParser did not even read because its 
buffer filled up? In that case, Avro.Schema would have to check the Reader as 
well.
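
If that can happen, the extra check on the Reader could be as simple as the sketch below. The class and method names are hypothetical, and it assumes that "space characters" means the four JSON whitespace characters. It also consumes the rest of the Reader, which may or may not be acceptable to the caller.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

final class ReaderRemainderSketch {

    // Assumption: only JSON whitespace (space, tab, LF, CR) may be ignored.
    static boolean isJsonWhitespace(int c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\r';
    }

    // Hypothetical helper: reads until a non-whitespace character or end of
    // input, and reports whether a non-whitespace character was found.
    static boolean hasNonWhitespaceRemainder(Reader reader) throws IOException {
        char[] buf = new char[1024];
        int n;
        while ((n = reader.read(buf, 0, buf.length)) != -1) {
            for (int i = 0; i < n; i++) {
                if (!isJsonWhitespace(buf[i])) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        // Stand-ins for content the parser never pulled from the Reader.
        System.out.println(hasNonWhitespaceRemainder(new StringReader("  \n\t ")));
        System.out.println(hasNonWhitespaceRemainder(new StringReader("   {\"x\":1}")));
    }
}
```

Such a check would run after the buffered content released by the parser has been inspected, so that both the buffer and the unread remainder are covered.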

