Romster commented on a change in pull request #13513:
URL: https://github.com/apache/beam/pull/13513#discussion_r544087349
##########
File path:
sdks/java/io/xml/src/test/java/org/apache/beam/sdk/io/xml/XmlSourceTest.java
##########
@@ -873,6 +881,46 @@ public void testSplitAtFractionExhaustiveSingleByte()
throws Exception {
assertSplitAtFractionExhaustive(source, options);
}
+ @Test
+ public void testNoBufferOverflowThrown() throws IOException {
+ // The magicNumber was found imperatively and will be different for different xml content.
+ // Test with the current setup causes BufferOverflow in
+ // XMLReader#getFirstOccurenceOfRecordElement method,
+ // if the specific corner case is not handled
+ final int magicNumber = 183;
+ StringBuilder sb = new StringBuilder();
Review comment:
This is an artificial example, but the main condition is that the document
contains a number of `<recordBlahBlah>`-style tags, i.e. tags whose names start
with `record`.
The real case looks like:
```
<root>
<record>
<recordSomething>
</recordSomething>
</record>
<record>
<recordSomething>
</recordSomething>
</record>
...
<record>
<recordSomething>
</recordSomething>
</record>
</root>
```
The behaviour seems to be environment-dependent, so I'm not sure my example
will even reproduce in another environment; it also depends on how many bytes
we read from the channel.
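For reference, a minimal sketch of how input with the shape shown above could be generated for a test. The class name `NestedRecordXml`, the method `build`, and the `count` parameter are hypothetical and not part of the PR. Since the failure depends on content length and read size, this only produces the structure and is not guaranteed to trigger the overflow.

```java
public class NestedRecordXml {

    // Hypothetical helper (not from the PR): builds a document with `count`
    // <record> elements, each wrapping one <recordSomething> child, matching
    // the structure of the real case described above.
    static String build(int count) {
        StringBuilder sb = new StringBuilder();
        sb.append("<root>\n");
        for (int i = 0; i < count; i++) {
            sb.append("<record>\n")
              .append("<recordSomething>\n")
              .append("</recordSomething>\n")
              .append("</record>\n");
        }
        sb.append("</root>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Print a small document; a real test would likely need to tune the
        // number of records to its own environment.
        System.out.print(build(3));
    }
}
```

The record count is left as a parameter because the number needed to hit the corner case is expected to differ between environments.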