Re: can not use iceberg as a sql source in flink sql according to iceberg 0.12.0

Joshua Fan Sun, 26 Sep 2021 02:49:09 -0700

Hi
I found out why the exception 'java.lang.NoClassDefFoundError:
org/apache/iceberg/shaded/org/apache/parquet/hadoop/ParquetInputFormat' was
raised. The actual cause was that the class
'org/apache/hadoop/mapreduce/lib/input/FileInputFormat' was missing. After I
added hadoop-mapreduce-client-core-2.7.3.jar to the classpath, the query ran
fine. Thanks to OpenInx.
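
For anyone hitting the same error, here is a minimal sketch of a check that
reports whether the two classes named in this thread can be loaded. The names
ClasspathCheck and isLoadable are hypothetical, made up for this example. The
result is only meaningful if the program runs with the same classpath as the
Flink task manager (an assumption about your setup).

```java
public class ClasspathCheck {
    // Returns true if the named class, including its superclasses and
    // interfaces, can be loaded by the given class loader.
    public static boolean isLoadable(String className, ClassLoader loader) {
        try {
            // initialize=false: load the class without running static initializers.
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // NoClassDefFoundError is a LinkageError; it is raised, for example,
            // when a superclass of the requested class is not on the classpath.
            return false;
        }
    }

    public static void main(String[] args) {
        // Class names taken from the error messages in this thread.
        String[] names = {
            "org.apache.hadoop.mapreduce.lib.input.FileInputFormat",
            "org.apache.iceberg.shaded.org.apache.parquet.hadoop.ParquetInputFormat"
        };
        ClassLoader loader = ClasspathCheck.class.getClassLoader();
        for (String name : names) {
            String status = isLoadable(name, loader) ? "found" : "MISSING";
            System.out.println(name + " -> " + status);
        }
    }
}
```

If the first class is reported as MISSING, adding a jar that provides it (in my
case hadoop-mapreduce-client-core-2.7.3.jar) to the classpath should be the
first thing to try.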


Josh

On Sun, Sep 26, 2021 at 3:40 PM Joshua Fan <joshuafat...@gmail.com> wrote:

> Hi openinx
>
> I don't follow. What do you mean by 'Looks like the line 112 in
> HadoopReadOptions is not the first line accessing the variables in
> ParquetInputFormat.'?
> The parquet file I want to read was written by an iceberg table with no
> explicit settings: neither a file format nor a parquet version was specified.
> I just want to read the parquet file through iceberg, and when reading I also
> did not specify a file format or a parquet version.
>
> On Thu, Sep 23, 2021 at 12:34 PM OpenInx <open...@gmail.com> wrote:
>
>> Hi Joshua
>>
>> Can you check which parquet version you are using? Looks like the
>> line 112 in HadoopReadOptions is not the first line accessing the variables
>> in ParquetInputFormat.
>>
>> On Wed, Sep 22, 2021 at 11:07 PM Joshua Fan <joshuafat...@gmail.com>
>> wrote:
>>
>>> Hi
>>> I am trying to use iceberg as a table source in flink sql. The flink
>>> version is 1.13.2 and the iceberg version is 0.12.0.
>>>
>>> After changing the flink version from 1.12 to 1.13 and changing some
>>> code in FlinkCatalogFactory, the project builds successfully.
>>>
>>> First, I tried to write data into iceberg with flink sql, and that seemed
>>> to go well. Then, to verify the data, I tried to read from the iceberg
>>> table with a simple query like "select * from
>>> iceberg_catalog.catalog_database.catalog_table". The query can be
>>> submitted, but the flink job keeps restarting with
>>> 'java.lang.NoClassDefFoundError:
>>> org/apache/iceberg/shaded/org/apache/parquet/hadoop/ParquetInputFormat'.
>>> However, ParquetInputFormat is actually in
>>> iceberg-flink-runtime-0.12.0.jar.
>>> I have no idea why this happens.
>>> The full stack trace is below:
>>> java.lang.NoClassDefFoundError: org/apache/iceberg/shaded/org/apache/parquet/hadoop/ParquetInputFormat
>>>     at org.apache.iceberg.shaded.org.apache.parquet.HadoopReadOptions$Builder.<init>(HadoopReadOptions.java:112) ~[iceberg-flink-runtime-0.12.0-qihoo.jar:?]
>>>     at org.apache.iceberg.shaded.org.apache.parquet.HadoopReadOptions$Builder.<init>(HadoopReadOptions.java:97) ~[iceberg-flink-runtime-0.12.0-qihoo.jar:?]
>>>     at org.apache.iceberg.shaded.org.apache.parquet.HadoopReadOptions.builder(HadoopReadOptions.java:85) ~[iceberg-flink-runtime-0.12.0-qihoo.jar:?]
>>>     at org.apache.iceberg.parquet.Parquet$ReadBuilder.build(Parquet.java:793) ~[iceberg-flink-runtime-0.12.0-qihoo.jar:?]
>>>     at org.apache.iceberg.flink.source.RowDataIterator.newParquetIterable(RowDataIterator.java:135) ~[iceberg-flink-runtime-0.12.0-qihoo.jar:?]
>>>     at org.apache.iceberg.flink.source.RowDataIterator.newIterable(RowDataIterator.java:86) ~[iceberg-flink-runtime-0.12.0-qihoo.jar:?]
>>>     at org.apache.iceberg.flink.source.RowDataIterator.openTaskIterator(RowDataIterator.java:74) ~[iceberg-flink-runtime-0.12.0-qihoo.jar:?]
>>>     at org.apache.iceberg.flink.source.DataIterator.updateCurrentIterator(DataIterator.java:102) ~[iceberg-flink-runtime-0.12.0-qihoo.jar:?]
>>>     at org.apache.iceberg.flink.source.DataIterator.hasNext(DataIterator.java:84) ~[iceberg-flink-runtime-0.12.0-qihoo.jar:?]
>>>     at org.apache.iceberg.flink.source.FlinkInputFormat.reachedEnd(FlinkInputFormat.java:104) ~[iceberg-flink-runtime-0.12.0-qihoo.jar:?]
>>>     at org.apache.flink.streaming.api.functions.source.InputFormatSourceFunction.run(InputFormatSourceFunction.java:89) ~[flink-dist_2.11-1.13.2.jar:1.13.2]
>>>     at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:110) ~[flink-dist_2.11-1.13.2.jar:1.13.2]
>>>     at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:66) ~[flink-dist_2.11-1.13.2.jar:1.13.2]
>>>     at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:269) ~[flink-dist_2.11-1.13.2.jar:1.13.2]
>>> As you can see, HadoopReadOptions itself can be found.
>>>
>>> Any help will be appreciated. Thank you.
>>>
>>> Yours sincerely
>>>
>>> Josh
>>>
>>
