[ https://issues.apache.org/jira/browse/PARQUET-244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483098#comment-14483098 ]

Alosh Bennett commented on PARQUET-244:
---------------------------------------

Code to reproduce the bug. It writes a Parquet file whose UTF-8 column should be
split across multiple pages. Reading that file back, either from a small sample
program or via spark-shell, reproduces the exception.
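{code:title=Bug.java|borderStyle=solid}
// Package names are those of parquet-mr 1.6.0 (the affected version);
// from 1.7.0 on they live under org.apache.parquet.*
import java.io.IOException;
import java.util.HashMap;
import java.util.UUID;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

import parquet.column.ParquetProperties;
import parquet.hadoop.ParquetWriter;
import parquet.hadoop.api.WriteSupport;
import parquet.hadoop.metadata.CompressionCodecName;
import parquet.io.api.Binary;
import parquet.io.api.RecordConsumer;
import parquet.schema.MessageTypeParser;

public class Bug {
    public static void main(String[] args) throws IOException {
        String parquetFile = "file:///home/abennett/parquet/bug/sample.par";
        final String schema = "message Document { required binary message (UTF8); }";

        // Minimal WriteSupport that writes each Document as a single UTF8 field.
        WriteSupport<Document> writeSupport = new WriteSupport<Document>() {
            RecordConsumer rec;

            @Override
            public WriteContext init(Configuration configuration) {
                return new WriteContext(MessageTypeParser.parseMessageType(schema),
                        new HashMap<String, String>());
            }

            @Override
            public void prepareForWrite(RecordConsumer recordConsumer) {
                rec = recordConsumer;
            }

            @Override
            public void write(Document document) {
                rec.startMessage();
                rec.startField("message", 0);
                rec.addBinary(Binary.fromString(document.message));
                rec.endField("message", 0);
                rec.endMessage();
            }
        };

        // WriterVersion.PARQUET_2_0 enables the v2 encodings, which include DELTA_BYTE_ARRAY.
        ParquetWriter<Document> writer = new ParquetWriter<Document>(
                new Path(parquetFile), writeSupport, CompressionCodecName.SNAPPY,
                ParquetWriter.DEFAULT_BLOCK_SIZE,   // block (row group) size
                ParquetWriter.DEFAULT_PAGE_SIZE,    // page size
                ParquetWriter.DEFAULT_PAGE_SIZE,    // dictionary page size
                true,                               // enable dictionary
                false,                              // validating
                ParquetProperties.WriterVersion.PARQUET_2_0);

        // 100,000 random 36-character UUIDs is about 3.6 MB of string data,
        // more than fits in one page at the 1 MB default page size.
        Document doc = new Document();
        for (int i = 0; i < 100000; i++) {
            doc.message = UUID.randomUUID().toString();
            writer.write(doc);
        }
        writer.close();
    }

    private static class Document {
        String message;
    }
}
{code}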
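For context on the reader named in the issue: DeltaByteArrayReader decodes Parquet's DELTA_BYTE_ARRAY encoding, a form of incremental (front) coding in which each value is stored as the length of the prefix it shares with the previous value, followed by the remaining suffix. The Java sketch below is a simplified model of that idea, not parquet-mr's code: the class FrontCodingSketch and its methods are hypothetical, and the real encoding additionally delta-encodes the prefix lengths and stores the suffixes with their own length encoding. It shows that every decoded value is built from the one before it, so a decoder has to start each run of values from the same "previous" value the encoder used.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of incremental (front) coding; not parquet-mr code.
public class FrontCodingSketch {

    /** One encoded value: length of the prefix shared with the previous value, plus the rest. */
    static final class Entry {
        final int prefixLength;
        final byte[] suffix;

        Entry(int prefixLength, byte[] suffix) {
            this.prefixLength = prefixLength;
            this.suffix = suffix;
        }
    }

    /** Encodes each value relative to the value written just before it. */
    static List<Entry> encode(List<byte[]> values) {
        List<Entry> out = new ArrayList<>();
        byte[] previous = new byte[0];
        for (byte[] value : values) {
            int max = Math.min(previous.length, value.length);
            int p = 0;
            while (p < max && previous[p] == value[p]) {
                p++;
            }
            out.add(new Entry(p, Arrays.copyOfRange(value, p, value.length)));
            previous = value;
        }
        return out;
    }

    /**
     * Rebuilds values from entries. Each value is built from the previous one,
     * so the caller must pass the same starting "previous" value the encoder used.
     */
    static List<byte[]> decode(List<Entry> entries, byte[] previous) {
        List<byte[]> out = new ArrayList<>();
        for (Entry e : entries) {
            byte[] value = new byte[e.prefixLength + e.suffix.length];
            System.arraycopy(previous, 0, value, 0, e.prefixLength);
            System.arraycopy(e.suffix, 0, value, e.prefixLength, e.suffix.length);
            out.add(value);
            previous = value;
        }
        return out;
    }

    public static void main(String[] args) {
        List<byte[]> values = Arrays.asList(
                "apple".getBytes(StandardCharsets.UTF_8),
                "applet".getBytes(StandardCharsets.UTF_8),
                "apply".getBytes(StandardCharsets.UTF_8));
        for (Entry e : encode(values)) {
            System.out.println(e.prefixLength + " + \"" + new String(e.suffix, StandardCharsets.UTF_8) + "\"");
        }
        // Prints:
        // 0 + "apple"
        // 5 + "t"
        // 4 + "y"
    }
}
```

Because "applet" is stored as just 5 + "t", it can only be rebuilt if the decoder already holds "apple".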

> DeltaByteArrayReader fails with ArrayIndexOutOfBoundsException when moving 
> across pages
> ---------------------------------------------------------------------------------------
>
>                 Key: PARQUET-244
>                 URL: https://issues.apache.org/jira/browse/PARQUET-244
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>    Affects Versions: parquet-mr_1.6.0
>            Reporter: Alosh Bennett
>
> DeltaByteArrayReader.readBytes() fails with ArrayIndexOutOfBoundsException
> soon after it has processed a new page via initFromPage(). This is happening
> because



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
