[
https://issues.apache.org/jira/browse/AVRO-3560?focusedWorklogId=786974&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-786974
]
ASF GitHub Bot logged work on AVRO-3560:
----------------------------------------
Author: ASF GitHub Bot
Created on: 01/Jul/22 07:42
Start Date: 01/Jul/22 07:42
Worklog Time Spent: 10m
Work Description: KalleOlaviNiemitalo commented on PR #1748:
URL: https://github.com/apache/avro/pull/1748#issuecomment-1172042027
Could you add tests for trailing content in a UTF-8 file parsed using
Schema.parse(File file)? [JsonFactory.createParser(File
f)](https://github.com/FasterXML/jackson-core/blob/10a9026f4ef91e821798296e7c4e3fe445921f89/src/main/java/com/fasterxml/jackson/core/JsonFactory.java#L1025-L1031)
apparently creates an InputStream for that, and
[JsonFactory._createParser(InputStream in, IOContext
ctxt)](https://github.com/FasterXML/jackson-core/blob/10a9026f4ef91e821798296e7c4e3fe445921f89/src/main/java/com/fasterxml/jackson/core/JsonFactory.java#L1653-L1668)
calls
[ByteSourceJsonBootstrapper.constructParser](https://github.com/FasterXML/jackson-core/blob/10a9026f4ef91e821798296e7c4e3fe445921f89/src/main/java/com/fasterxml/jackson/core/json/ByteSourceJsonBootstrapper.java#L251-L271),
which creates a UTF8StreamJsonParser in that case. UTF8StreamJsonParser is a
"byte-based" parser and inherits [JsonParser.releaseBuffered(Writer
w)](https://github.com/FasterXML/jackson-core/blob/10a9026f4ef91e821798296e7c4e3fe445921f89/src/main/java/com/fasterxml/jackson/core/JsonParser.java#L836),
which just returns -1, but [UTF8StreamJsonParser.releaseBuffered(OutputStream
out)](https://github.com/FasterXML/jackson-core/blob/10a9026f4ef91e821798296e7c4e3fe445921f89/src/main/java/com/fasterxml/jackson/core/json/UTF8StreamJsonParser.java#L224-L236)
can write something to the OutputStream. I think this means that, to correctly
detect trailing content in a UTF-8 file, Schema.parse would have to call not
only JsonParser.releaseBuffered(Writer) but also
JsonParser.releaseBuffered(OutputStream), or first call
JsonParser.getInputSource() and then use the type of the result to guess
whether the parser is byte-based or char-based.
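As a rough illustration of the second option (guessing from the type of the input source), here is a minimal JDK-only sketch. The names InputSourceKind, InputKind and classify are hypothetical, and the Object argument stands in for whatever JsonParser.getInputSource() returns; the sketch assumes that is an InputStream for a byte-based parser and a Reader for a char-based one, and treats anything else (including null) as unknown.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;

class InputSourceKind {
  enum InputKind { BYTE_BASED, CHAR_BASED, UNKNOWN }

  // Hypothetical helper: guesses the parser flavour from its input source.
  // Assumption: byte-based parsers expose an InputStream, char-based ones a Reader.
  static InputKind classify(Object inputSource) {
    if (inputSource instanceof InputStream) {
      return InputKind.BYTE_BASED;
    }
    if (inputSource instanceof Reader) {
      return InputKind.CHAR_BASED;
    }
    // null or any other type: the caller cannot tell which overload to use.
    return InputKind.UNKNOWN;
  }

  public static void main(String[] args) {
    System.out.println(classify(new ByteArrayInputStream(new byte[0])));
    System.out.println(classify(new StringReader("")));
    System.out.println(classify(null));
  }
}
```

A caller could then pick releaseBuffered(OutputStream) or releaseBuffered(Writer) based on the result, and would probably need to try both in the UNKNOWN case.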
If parse(InputStream) parses a stream that has trailing content, and
JsonParser buffers that content, it would be best to return that content to the
InputStream so that the caller can then read it. However, I don't see how to do
that.
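For reference, the closest JDK-only approximation may be to hand the caller a new stream rather than pushing bytes back into the original one. The sketch below assumes the buffered bytes have already been obtained somehow (for example via releaseBuffered(OutputStream)); RejoinedStream, rejoin and readAll are hypothetical names. It does not solve the problem for a caller that keeps using its own InputStream reference.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;

class RejoinedStream {
  // Hypothetical helper: returns a stream that yields the parser's leftover
  // bytes first, followed by whatever the original stream still holds.
  static InputStream rejoin(byte[] buffered, InputStream rest) {
    return new SequenceInputStream(new ByteArrayInputStream(buffered), rest);
  }

  // Reads a stream to the end and decodes it as UTF-8 (demo helper only).
  static String readAll(InputStream in) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int b;
    while ((b = in.read()) != -1) {
      out.write(b);
    }
    return new String(out.toByteArray(), StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws IOException {
    byte[] buffered = "; DROP".getBytes(StandardCharsets.UTF_8);
    InputStream rest =
        new ByteArrayInputStream(" TABLE STUDENTS".getBytes(StandardCharsets.UTF_8));
    System.out.println(readAll(rejoin(buffered, rest)));
  }
}
```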
Is it possible that JsonParser buffers some content from a Reader, and the
buffered content is all space characters and thus ignored by Avro.Schema, but
the Reader has more content that JsonParser did not even read because its
buffer filled up? In that case, Avro.Schema would have to check the Reader as
well.
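A check that covers both the released buffer and the unread remainder of the Reader might look like the JDK-only sketch below. TrailingContentCheck and hasTrailingContent are hypothetical names, and the whitespace test assumes only the four JSON whitespace characters should be tolerated. Note that it consumes the Reader while scanning.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class TrailingContentCheck {
  // JSON insignificant whitespace: space, tab, line feed, carriage return.
  static boolean isJsonWhitespace(int c) {
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
  }

  // Hypothetical helper: true if either the parser's released buffer or the
  // not-yet-read part of the Reader holds anything other than whitespace.
  static boolean hasTrailingContent(String buffered, Reader rest) throws IOException {
    for (int i = 0; i < buffered.length(); i++) {
      if (!isJsonWhitespace(buffered.charAt(i))) {
        return true;
      }
    }
    int c;
    while ((c = rest.read()) != -1) {
      if (!isJsonWhitespace(c)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) throws IOException {
    // Buffer holds only spaces, but the Reader still has real content.
    System.out.println(hasTrailingContent("   ", new StringReader("  ; DROP TABLE STUDENTS")));
    // Both the buffer and the Reader hold only whitespace.
    System.out.println(hasTrailingContent("  ", new StringReader(" \n")));
  }
}
```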
Issue Time Tracking
-------------------
Worklog Id: (was: 786974)
Time Spent: 50m (was: 40m)
> avro ignores input after end of avsc json
> -----------------------------------------
>
> Key: AVRO-3560
> URL: https://issues.apache.org/jira/browse/AVRO-3560
> Project: Apache Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.11.0
> Reporter: Radai Rosenblatt
> Assignee: Radai Rosenblatt
> Priority: Major
> Labels: pull-request-available
> Time Spent: 50m
> Remaining Estimate: 0h
>
> try the following unit test:
> {code}
> @Test
> public void littleBobbySchemas() throws Exception {
>   Schema.Parser parser = new Schema.Parser();
>   parser.setValidate(true);
>   parser.setValidateDefaults(true);
>   Schema schema = parser.parse("{\"type\": \"string\"}; DROP TABLE STUDENTS");
>   Assert.assertNotNull(schema);
> }
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)