M. Justin created PARQUET-1912:
----------------------------------

             Summary: HadoopInputFile.fromPath always causes exception on build
                 Key: PARQUET-1912
                 URL: https://issues.apache.org/jira/browse/PARQUET-1912
             Project: Parquet
          Issue Type: Bug
    Affects Versions: 1.11.1
            Reporter: M. Justin


The [{{ParquetReader.read(InputFile file)}}|https://www.javadoc.io/static/org.apache.parquet/parquet-hadoop/1.11.1/org/apache/parquet/hadoop/ParquetReader.html#read-org.apache.parquet.io.InputFile-]
static factory method in {{parquet-hadoop}} creates a builder from an 
{{InputFile}}.  The builder it returns always throws an 
{{IllegalArgumentException}} when {{.build()}} is subsequently called.
{code:java}
            java.nio.file.Path parquetFile = getParquetFile();
            ParquetReader.read(HadoopInputFile.fromPath(
                    new org.apache.hadoop.fs.Path(parquetFile.toUri()), new Configuration()))
                    .build();
{code}
{noformat}
java.lang.IllegalArgumentException: [BUG] Classes that extend Builder should override getReadSupport()

        at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:53)
        at org.apache.parquet.hadoop.ParquetReader$Builder.getReadSupport(ParquetReader.java:310)
        at org.apache.parquet.hadoop.ParquetReader$Builder.build(ParquetReader.java:337)
{noformat}
The issue appears to be that the 
[{{build()}}|https://www.javadoc.io/static/org.apache.parquet/parquet-hadoop/1.11.1/org/apache/parquet/hadoop/ParquetReader.Builder.html#build--]
method enforces that a 
[{{ReadSupport}}|https://www.javadoc.io/static/org.apache.parquet/parquet-hadoop/1.11.1/org/apache/parquet/hadoop/api/ReadSupport.html]
value was set on the builder, but {{ParquetReader.read(InputFile file)}} 
doesn't accept a {{ReadSupport}}, nor is there a way to set one after the 
builder has been created.
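
To illustrate the suspected mechanism, here is a minimal, self-contained sketch that does not use Parquet at all. The class {{ReaderBuilderModel}}, its nested {{Builder}} and {{GroupBuilder}}, and the string stand-ins for {{InputFile}} and {{ReadSupport}} are all hypothetical names of mine; the shape is only inferred from the exception message and stack trace above, not copied from the Parquet source.

```java
public class ReaderBuilderModel {

    /** Simplified stand-in for ParquetReader.Builder (assumption, not the real source). */
    static class Builder {
        private final String file;
        // Never set by the file-only constructor, mirroring the missing ReadSupport.
        private final String readSupport;

        protected Builder(String file) {
            this.file = file;
            this.readSupport = null;
        }

        // Subclasses are expected to override this; the base version fails when nothing was set.
        protected String getReadSupport() {
            if (readSupport == null) {
                throw new IllegalArgumentException(
                        "[BUG] Classes that extend Builder should override getReadSupport()");
            }
            return readSupport;
        }

        public String build() {
            return "reader(" + file + ", " + getReadSupport() + ")";
        }
    }

    /** Models ParquetReader.read(InputFile): hands back the base builder directly. */
    static Builder read(String file) {
        return new Builder(file);
    }

    /** Hypothetical subclass that supplies its own read support. */
    static class GroupBuilder extends Builder {
        GroupBuilder(String file) {
            super(file);
        }

        @Override
        protected String getReadSupport() {
            return "group-read-support";
        }
    }

    public static void main(String[] args) {
        try {
            read("data.parquet").build();
        } catch (IllegalArgumentException e) {
            System.out.println("build() failed: " + e.getMessage());
        }
        System.out.println(new GroupBuilder("data.parquet").build());
    }
}
```

In this model, the builder returned by the static factory can never be built, while a subclass that overrides {{getReadSupport()}} builds fine. If the real {{Builder}} behaves the same way, that would explain why {{read(InputFile)}} is unusable on its own.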

For context, my use case is reading Parquet files directly from Java.
h2. Expected behavior

I wouldn't expect a method to exist that always results in an exception being 
thrown. I would expect the {{ParquetReader.read(InputFile file)}} method to be 
fixed, replaced, or removed.
h2. Workaround

I am able to achieve my goal by using 
[{{ParquetFileReader.open(InputFile)}}|https://www.javadoc.io/static/org.apache.parquet/parquet-hadoop/1.11.1/org/apache/parquet/hadoop/ParquetFileReader.html#open-org.apache.parquet.io.InputFile-]
instead of {{ParquetReader.read(InputFile)}}.
{code:java}
            java.nio.file.Path parquetFile = getParquetFile();
            ParquetFileReader reader = ParquetFileReader.open(
                    HadoopInputFile.fromPath(
                            new org.apache.hadoop.fs.Path(parquetFile.toUri()), new Configuration()));
{code}



