Pau Alarcon created PARQUET-1871:
-------------------------------------
Summary: ProtoReader does not iterate over the parquet file correctly.
Key: PARQUET-1871
URL: https://issues.apache.org/jira/browse/PARQUET-1871
Project: Parquet
Issue Type: Bug
Components: parquet-mr
Affects Versions: 1.11.0
Reporter: Pau Alarcon
The `ProtoParquetReader` does not iterate over the Parquet file correctly:
it gets stuck on the first element and returns that element as many times as
there are elements in the file.
In my Scala example I am just reading from a local file that I know for sure
contains the right data.
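```
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.parquet.hadoop.ParquetReader
import org.apache.parquet.proto.ProtoParquetReader

// Event and genTemporaryFile() come from the reporter's test fixture (linked below).
val hadoopConf = new Configuration()
val outfile: String = genTemporaryFile()
val r: ParquetReader[Event.Builder] =
  ProtoParquetReader.builder[Event.Builder](new Path(outfile)).withConf(hadoopConf).build()
```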
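The symptom described above can be checked with a small drain loop. The following is only a sketch: `readAll` and `ReusingSource` are hypothetical names, not part of parquet-mr, and the stand-in source deliberately hands back one shared mutable object on every call. Whether `ProtoParquetReader` behaves that way is an unconfirmed assumption. Note also that this kind of aliasing repeats the last record, not the first as reported, so the sketch is a way to rule one explanation in or out, not a diagnosis.

```scala
object ReadAllSketch {

  // Hypothetical helper: drains any "read() until null" source.
  // `snapshot` is applied to each record right after it is read,
  // before the next call to `read`.
  def readAll[A <: AnyRef, B](read: () => A)(snapshot: A => B): List[B] =
    Iterator
      .continually(read())
      .takeWhile(_ != null)
      .map(snapshot)
      .toList

  // Hypothetical stand-in for a reader that reuses one mutable object.
  // It overwrites and returns the same StringBuilder on every call,
  // then returns null once the input is exhausted.
  final class ReusingSource(values: List[String]) {
    private val shared = new StringBuilder
    private var remaining = values

    def read(): StringBuilder = remaining match {
      case head :: tail =>
        shared.clear()
        shared.append(head)
        remaining = tail
        shared
      case Nil => null
    }
  }

  def main(args: Array[String]): Unit = {
    val a = new ReusingSource(List("a", "b", "c"))
    // Keeping the returned references: all three point at the same object.
    val aliased = readAll(() => a.read())(sb => sb).map(_.toString)
    println(aliased) // List(c, c, c)

    val b = new ReusingSource(List("a", "b", "c"))
    // Copying each record before the next read preserves the distinct values.
    val copied = readAll(() => b.read())(_.toString)
    println(copied) // List(a, b, c)
  }
}
```

With the reader `r` from the snippet above, the equivalent check would be `readAll(() => r.read())(_.build())`, where `build()` is the standard protobuf `Message.Builder` method. If the snapshotted messages differ while the collected builders are all equal, the records are being aliased and the file is in fact being iterated.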
Note that the proto schema I am using is generated with ScalaPB
[https://scalapb.github.io/]
The generated proto class implements com.google.protobuf.GeneratedMessageV3.
See an example of how the ProtoParquetReader is created (lines 65 and 69):
[https://github.com/monix/monix-connect/blob/master/parquet/src/test/scala/monix/connect/parquet/ProtoParquetFixture.scala#L69]
and here is how it is used (note that the test is defined for only one record;
the same test with multiple records would fail):
[https://github.com/monix/monix-connect/blob/master/parquet/src/test/scala/monix/connect/parquet/ProtoParquetSpec.scala#L83]