[ https://issues.apache.org/jira/browse/PARQUET-244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483098#comment-14483098 ]
Alosh Bennett commented on PARQUET-244:
---------------------------------------

Code to reproduce the bug. This creates a Parquet file that should contain a UTF-8 column split across multiple pages. The bug can then be reproduced by reading the file back, either from a sample program or via spark-shell.

{code:title=Bug.java|borderStyle=solid}
import java.io.IOException;
import java.util.HashMap;
import java.util.UUID;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

// Package names assume parquet-mr 1.6.0 (the parquet.* prefix); adjust for other versions.
import parquet.column.ParquetProperties;
import parquet.hadoop.ParquetWriter;
import parquet.hadoop.api.WriteSupport;
import parquet.hadoop.metadata.CompressionCodecName;
import parquet.io.api.Binary;
import parquet.io.api.RecordConsumer;
import parquet.schema.MessageTypeParser;

public class Bug {

    public static void main(String[] args) throws IOException {
        String parquetFile = "file:///home/abennett/parquet/bug/sample.par";
        // final so the anonymous WriteSupport below can capture it on Java 7
        final String schema = "message Document { required binary message (UTF8); }";

        WriteSupport<Document> writeSupport = new WriteSupport<Document>() {
            RecordConsumer rec;

            @Override
            public WriteContext init(Configuration configuration) {
                return new WriteContext(MessageTypeParser.parseMessageType(schema),
                        new HashMap<String, String>());
            }

            @Override
            public void prepareForWrite(RecordConsumer recordConsumer) {
                rec = recordConsumer;
            }

            @Override
            public void write(Document document) {
                rec.startMessage();
                rec.startField("message", 0);
                rec.addBinary(Binary.fromString(document.message));
                rec.endField("message", 0);
                rec.endMessage();
            }
        };

        // Arguments after the codec: block size, page size, dictionary page size,
        // enableDictionary, validating, writer version.
        ParquetWriter<Document> writer = new ParquetWriter<Document>(
                new Path(parquetFile),
                writeSupport,
                CompressionCodecName.SNAPPY,
                ParquetWriter.DEFAULT_BLOCK_SIZE,
                ParquetWriter.DEFAULT_PAGE_SIZE,
                ParquetWriter.DEFAULT_PAGE_SIZE,
                true,
                false,
                ParquetProperties.WriterVersion.PARQUET_2_0);

        // 100000 random UUID strings are enough to spill the column over several pages.
        Document doc = new Document();
        for (int i = 0; i < 100000; i++) {
            doc.message = UUID.randomUUID().toString();
            writer.write(doc);
        }
        writer.close();
    }

    private static class Document {
        String message;
    }
}
{code}

> DeltaByteArrayReader fails with ArrayIndexOutOfBoundsException when moving across pages
> ---------------------------------------------------------------------------------------
>
>                 Key: PARQUET-244
>                 URL: https://issues.apache.org/jira/browse/PARQUET-244
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>    Affects Versions: parquet-mr_1.6.0
>            Reporter: Alosh Bennett
>
> DeltaByteArrayReader.readBytes() fails with ArrayIndexOutOfBoundsException soon after it has processed a new page via initFromPage(). This is happening because

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)