[
https://issues.apache.org/jira/browse/AVRO-793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Thiruvalluvan M. G. updated AVRO-793:
-------------------------------------
Attachment: AVRO-793-test.patch
AVRO-793.patch
Very subtle bug. If an array that needs to be skipped happens to be the last
field of a record, and the record is contained in an outer array, it does not
get skipped properly.
The test patch has the test that catches the bug and the main patch has the
solution.
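To illustrate what "skipped properly" means here, the following is a minimal,
self-contained sketch of the Avro binary layout involved: an outer array of
records whose last field is an inner array of strings. It is not the code from
the patch and it does not use Avro's decoder classes. The class and method
names (AvroSkipSketch, encodeSample, readDevices, skipStringArray) are
hypothetical, and the encoding is written by hand from the Avro specification
(zigzag varint longs, length-prefixed strings, arrays as counted blocks ending
with a zero count).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not Avro's implementation.
public class AvroSkipSketch {

    // Avro long: zigzag-encode, then write 7 bits per byte, low bits first.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    static long readLong(InputStream in) throws IOException {
        long z = 0;
        int shift = 0;
        int b;
        do {
            b = in.read();
            if (b < 0) throw new IOException("unexpected end of data");
            z |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (z >>> 1) ^ -(z & 1); // undo zigzag
    }

    // Avro string: byte length as a long, then the UTF-8 bytes.
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    static String readString(InputStream in) throws IOException {
        int len = (int) readLong(in);
        byte[] buf = new byte[len];
        int off = 0;
        while (off < len) {
            int n = in.read(buf, off, len - off);
            if (n < 0) throw new IOException("unexpected end of data");
            off += n;
        }
        return new String(buf, StandardCharsets.UTF_8);
    }

    static void skipBytes(InputStream in, long n) throws IOException {
        while (n > 0) {
            long skipped = in.skip(n);
            if (skipped <= 0) throw new IOException("unexpected end of data");
            n -= skipped;
        }
    }

    // Skip a whole array of strings, including its terminating zero count.
    static void skipStringArray(InputStream in) throws IOException {
        for (long count = readLong(in); count != 0; count = readLong(in)) {
            if (count < 0) {
                skipBytes(in, readLong(in)); // negative count: block byte size follows
            } else {
                for (long i = 0; i < count; i++) skipBytes(in, readLong(in));
            }
        }
    }

    // Outer array of two records {device: string, children: array<string>}.
    static byte[] encodeSample() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        String[] devices = {"WMA1", "WMA2"};
        writeLong(out, devices.length);  // outer array: one block of 2 records
        for (String d : devices) {
            writeString(out, d);         // field "device"
            writeLong(out, 2);           // field "children": one block of 2 strings
            writeString(out, "WMB1");
            writeString(out, "WMB2");
            writeLong(out, 0);           // end of inner array
        }
        writeLong(out, 0);               // end of outer array
        return out.toByteArray();
    }

    // Read only "device" from each record and skip the trailing "children" array.
    static List<String> readDevices(byte[] data) throws IOException {
        InputStream in = new ByteArrayInputStream(data);
        List<String> devices = new ArrayList<>();
        for (long count = readLong(in); count != 0; count = readLong(in)) {
            if (count < 0) {
                count = -count;
                readLong(in);            // block byte size, unused when decoding items
            }
            for (long i = 0; i < count; i++) {
                devices.add(readString(in));
                skipStringArray(in);     // must end exactly at the next record
            }
        }
        return devices;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readDevices(encodeSample())); // prints [WMA1, WMA2]
    }
}
```

In this sketch the inner skip has to consume the inner array's terminating
zero count before the outer loop resumes. If it stops early or runs past that
point, the next value read as a string length or block count comes from the
wrong position in the stream.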
> A strange problem when I am trying to read avro record with a subset of the
> schema.
> -----------------------------------------------------------------------------------
>
> Key: AVRO-793
> URL: https://issues.apache.org/jira/browse/AVRO-793
> Project: Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.5.0
> Environment: Avro1.5,Windows xp/Ubuntu 10.0.4
> Reporter: Yingzhong Xu
> Assignee: Thiruvalluvan M. G.
> Priority: Critical
> Labels: Avro, Reading, Schema, Write
> Fix For: 1.5.1
>
> Attachments: AVRO-793-test.patch, AVRO-793.patch
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> Hi, all. When I try to read an Avro file with a subset of its schema (because
> I do not need all the details), I run into a strange problem.
> 1. I write data using this schema:
> {
>   "name": "relation",
>   "type": "record",
>   "fields": [
>     {
>       "name": "timestamp",
>       "type": "long"
>     },
>     {
>       "name": "type",
>       "type": {
>         "type": "map",
>         "values": {
>           "type": "array",
>           "items": {
>             "type": "record",
>             "name": "sdf",
>             "fields": [
>               {
>                 "name": "device",
>                 "type": "string"
>               },
>               {
>                 "name": "children",
>                 "type": {
>                   "type": "array",
>                   "items": "string"
>                 }
>               }
>             ]
>           }
>         }
>       }
>     }
>   ]
> }
> 2. Here is a JSONObject for that schema:
> {
>   "timestamp": 1234567890,
>   "type": {
>     "WMA": [
>       {
>         "device": "WMA1",
>         "children": ["WMB1", "WMB2"]
>       },
>       {
>         "device": "WMA2",
>         "children": ["WMB1", "WMB2"]
>       }
>     ]
>   }
> }
> 3. I write that record successfully, and it is okay if I use this schema for
> reading:
> {
>   "name": "relation",
>   "type": "record",
>   "fields": [
>     {
>       "name": "timestamp",
>       "type": "long"
>     },
>     {
>       "name": "type",
>       "type": {
>         "type": "map",
>         "values": {
>           "type": "array",
>           "items": {
>             "type": "record",
>             "name": "sdf",
>             "fields": [
>               {
>                 "name": "children",
>                 "type": {
>                   "type": "array",
>                   "items": "string"
>                 }
>               }
>             ]
>           }
>         }
>       }
>     }
>   ]
> }
> The result is:
> {
>   "timestamp": 1234567890,
>   "type": {
>     "WMA": [
>       {
>         "children": ["WMB1", "WMB2"]
>       },
>       {
>         "children": ["WMB1", "WMB2"]
>       }
>     ]
>   }
> }
> 4. But if I want to ignore the "children" part instead of "device", I use
> this schema for reading:
> {
>   "name": "relation",
>   "type": "record",
>   "fields": [
>     {
>       "name": "timestamp",
>       "type": "long"
>     },
>     {
>       "name": "type",
>       "type": {
>         "type": "map",
>         "values": {
>           "type": "array",
>           "items": {
>             "type": "record",
>             "name": "sdf",
>             "fields": [
>               {
>                 "name": "device",
>                 "type": "string"
>               }
>             ]
>           }
>         }
>       }
>     }
>   ]
> }
> Unfortunately, I get an exception:
> java.lang.ArrayIndexOutOfBoundsException: -8
> cause:java.lang.ArrayIndexOutOfBoundsException
> at org.apache.avro.io.BinaryDecoder.readInt(BinaryDecoder.java:122)
> at org.apache.avro.io.BinaryDecoder.skipString(BinaryDecoder.java:262)
> at org.apache.avro.io.ValidatingDecoder.skipString(ValidatingDecoder.java:113)
> at org.apache.avro.io.ParsingDecoder.skipTopSymbol(ParsingDecoder.java:60)
> at org.apache.avro.io.parsing.SkipParser.skipTo(SkipParser.java:71)
> at org.apache.avro.io.parsing.SkipParser.skipRepeater(SkipParser.java:83)
> at org.apache.avro.io.ValidatingDecoder.skipArray(ValidatingDecoder.java:195)
> at org.apache.avro.io.ParsingDecoder.skipTopSymbol(ParsingDecoder.java:70)
> at org.apache.avro.io.parsing.SkipParser.skipTo(SkipParser.java:71)
> at org.apache.avro.io.parsing.SkipParser.skipSymbol(SkipParser.java:93)
> at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:226)
> at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
> at org.apache.avro.io.ResolvingDecoder.readFieldOrder(ResolvingDecoder.java:127)
> at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:162)
> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
> at org.apache.avro.generic.GenericDatumReader.readArray(GenericDatumReader.java:196)
> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:140)
> at org.apache.avro.generic.GenericDatumReader.readMap(GenericDatumReader.java:233)
> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:141)
> at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:167)
> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
> at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
> at org.apache.avro.file.DataFileStream.next(DataFileStream.java:236)
> at org.apache.avro.file.DataFileStream.next(DataFileStream.java:223)
> at AvroUtilTest.read(AvroUtilTest.java:77)
> at AvroUtilTest.main(AvroUtilTest.java:61)
> I did what Scott Carey suggested below, and it worked. How can this bug be fixed?
> Scott Carey:
> 2: If you change the schema you write with by reversing the order of
> the fields of "sdf" (array, then string), are the results the same?
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira