Pau Alarcon  created PARQUET-1871:
-------------------------------------

             Summary: ProtoReader does not iterate over the parquet file correctly.
                 Key: PARQUET-1871
                 URL: https://issues.apache.org/jira/browse/PARQUET-1871
             Project: Parquet
          Issue Type: Bug
          Components: parquet-mr
    Affects Versions: 1.11.0
            Reporter: Pau Alarcon 


The `ProtoParquetReader` does not iterate over the Parquet file correctly. 
Instead, it gets stuck on the first element and returns it as many times as 
there are elements in the file.

In my Scala example I am just reading from a local file that I know for sure 
contains the right data.

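```
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.parquet.hadoop.ParquetReader
import org.apache.parquet.proto.ProtoParquetReader

val hadoopConf = new Configuration()

// genTemporaryFile() is a helper from the test fixture; it yields the path of the local file
val outfile: String = genTemporaryFile()

// Event is the protobuf message class generated from the proto schema
val r: ParquetReader[Event.Builder] =
  ProtoParquetReader.builder[Event.Builder](new Path(outfile)).withConf(hadoopConf).build()
```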
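
For reference, a read loop of the kind that shows the repeated element could look like the sketch below. It is only an illustration, not the code from the linked tests: `DrainReader` and `drain` are hypothetical names, and the only behaviour it relies on is that `ParquetReader.read()` returns null at the end of the file. Each record is converted as soon as it is read, so the result does not depend on whether the reader hands back the same builder instance on every call (an assumption, nothing above confirms that this is the cause).

```scala
object DrainReader {
  // Calls `read` until it returns null, which is how ParquetReader.read()
  // signals end of file, and converts each record right after reading it.
  def drain[T <: AnyRef, A](read: () => T)(convert: T => A): List[A] = {
    val out = List.newBuilder[A]
    var next = read()
    while (next != null) {
      out += convert(next) // convert before the next read() call
      next = read()
    }
    out.result()
  }
}
```

With the reader `r` from the snippet above, this would be used as `DrainReader.drain(() => r.read())(_.build())`. Comparing the number of distinct elements in the result with the number of records written to the file shows whether the repetition comes from the reader itself.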

Note that the proto classes I am using are generated with ScalaPB 
[https://scalapb.github.io/]

The generated class extends com.google.protobuf.GeneratedMessageV3.

See an example of how the ProtoParquetReader is created (lines 65 and 69): 
[https://github.com/monix/monix-connect/blob/master/parquet/src/test/scala/monix/connect/parquet/ProtoParquetFixture.scala#L69]

And here is how it is used (note that the test is defined for only one record; 
the same test with multiple records would fail): 
[https://github.com/monix/monix-connect/blob/master/parquet/src/test/scala/monix/connect/parquet/ProtoParquetSpec.scala#L83]

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
