Guozhen Yang created FLINK-32590:
------------------------------------
Summary: Failure to read a Flink parquet filesystem table stored in
the Hive metastore service.
Key: FLINK-32590
URL: https://issues.apache.org/jira/browse/FLINK-32590
Project: Flink
Issue Type: Bug
Components: Connectors / Hive, Formats (JSON, Avro, Parquet, ORC,
SequenceFile)
Affects Versions: 1.17.1
Reporter: Guozhen Yang
h2. Summary:
Failure to read a Flink parquet filesystem table stored in the Hive metastore service.
h2. The problem:
When I try to read a Flink parquet filesystem table stored in the Hive
metastore service, I get the following exception.
{noformat}
java.lang.RuntimeException: One or more fetchers have encountered exception
	at org.apache.flink.connector.base.source.reader.fetcher.SplitFetcherManager.checkErrors(SplitFetcherManager.java:261) ~[flink-connector-files-1.17.1.jar:1.17.1]
	at org.apache.flink.connector.base.source.reader.SourceReaderBase.getNextFetch(SourceReaderBase.java:169) ~[flink-connector-files-1.17.1.jar:1.17.1]
	at org.apache.flink.connector.base.source.reader.SourceReaderBase.pollNext(SourceReaderBase.java:131) ~[flink-connector-files-1.17.1.jar:1.17.1]
	at org.apache.flink.streaming.api.operators.SourceOperator.emitNext(SourceOperator.java:417) ~[flink-dist-1.17.1.jar:1.17.1]
	at org.apache.flink.streaming.runtime.io.StreamTaskSourceInput.emitNext(StreamTaskSourceInput.java:68) ~[flink-dist-1.17.1.jar:1.17.1]
	at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65) ~[flink-dist-1.17.1.jar:1.17.1]
	at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:550) ~[flink-dist-1.17.1.jar:1.17.1]
	at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:231) ~[flink-dist-1.17.1.jar:1.17.1]
	at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:839) ~[flink-dist-1.17.1.jar:1.17.1]
	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:788) ~[flink-dist-1.17.1.jar:1.17.1]
	at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:952) ~[flink-dist-1.17.1.jar:1.17.1]
	at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:931) ~[flink-dist-1.17.1.jar:1.17.1]
	at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:745) ~[flink-dist-1.17.1.jar:1.17.1]
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:562) ~[flink-dist-1.17.1.jar:1.17.1]
	at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_345]
Caused by: java.lang.NoSuchMethodError: shaded.parquet.org.apache.thrift.TBaseHelper.hashCode(J)I
	at org.apache.parquet.format.ColumnChunk.hashCode(ColumnChunk.java:812) ~[flink-sql-parquet-1.17.1.jar:1.17.1]
	at java.util.AbstractList.hashCode(AbstractList.java:541) ~[?:1.8.0_345]
	at org.apache.parquet.format.RowGroup.hashCode(RowGroup.java:704) ~[flink-sql-parquet-1.17.1.jar:1.17.1]
	at java.util.HashMap.hash(HashMap.java:340) ~[?:1.8.0_345]
	at java.util.HashMap.put(HashMap.java:613) ~[?:1.8.0_345]
	at org.apache.parquet.format.converter.ParquetMetadataConverter.generateRowGroupOffsets(ParquetMetadataConverter.java:1411) ~[flink-sql-parquet-1.17.1.jar:1.17.1]
	at org.apache.parquet.format.converter.ParquetMetadataConverter.access$600(ParquetMetadataConverter.java:144) ~[flink-sql-parquet-1.17.1.jar:1.17.1]
	at org.apache.parquet.format.converter.ParquetMetadataConverter$3.visit(ParquetMetadataConverter.java:1461) ~[flink-sql-parquet-1.17.1.jar:1.17.1]
	at org.apache.parquet.format.converter.ParquetMetadataConverter$3.visit(ParquetMetadataConverter.java:1437) ~[flink-sql-parquet-1.17.1.jar:1.17.1]
	at org.apache.parquet.format.converter.ParquetMetadataConverter$RangeMetadataFilter.accept(ParquetMetadataConverter.java:1207) ~[flink-sql-parquet-1.17.1.jar:1.17.1]
	at org.apache.parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:1437) ~[flink-sql-parquet-1.17.1.jar:1.17.1]
	at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:583) ~[flink-sql-parquet-1.17.1.jar:1.17.1]
	at org.apache.parquet.hadoop.ParquetFileReader.<init>(ParquetFileReader.java:777) ~[flink-sql-parquet-1.17.1.jar:1.17.1]
	at org.apache.parquet.hadoop.ParquetFileReader.open(ParquetFileReader.java:658) ~[flink-sql-parquet-1.17.1.jar:1.17.1]
	at org.apache.flink.formats.parquet.ParquetVectorizedInputFormat.createReader(ParquetVectorizedInputFormat.java:127) ~[flink-sql-parquet-1.17.1.jar:1.17.1]
	at org.apache.flink.formats.parquet.ParquetVectorizedInputFormat.createReader(ParquetVectorizedInputFormat.java:75) ~[flink-sql-parquet-1.17.1.jar:1.17.1]
	at org.apache.flink.connector.file.table.FileInfoExtractorBulkFormat.createReader(FileInfoExtractorBulkFormat.java:109) ~[flink-connector-files-1.17.1.jar:1.17.1]
	at org.apache.flink.connector.file.src.impl.FileSourceSplitReader.checkSplitOrStartNext(FileSourceSplitReader.java:112) ~[flink-connector-files-1.17.1.jar:1.17.1]
	at org.apache.flink.connector.file.src.impl.FileSourceSplitReader.fetch(FileSourceSplitReader.java:65) ~[flink-connector-files-1.17.1.jar:1.17.1]
	at org.apache.flink.connector.base.source.reader.fetcher.FetchTask.run(FetchTask.java:58) ~[flink-connector-files-1.17.1.jar:1.17.1]
	at org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher.runOnce(SplitFetcher.java:162) ~[flink-connector-files-1.17.1.jar:1.17.1]
	at org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher.run(SplitFetcher.java:114) ~[flink-connector-files-1.17.1.jar:1.17.1]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_345]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_345]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_345]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_345]
	... 1 more
{noformat}
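For context, the error shows up when a parquet filesystem table registered through a Hive catalog is queried. A reproduction sketch along the following lines should hit the same code path; the catalog name, table name, schema, and paths below are placeholders of mine, not the exact environment that produced the trace.
{code:bash}
# Hypothetical reproduction sketch: register a parquet filesystem table
# through a Hive catalog and read it from the SQL client.
# Catalog/table names, schema, and paths are placeholders.
cat > /tmp/repro.sql <<'EOF'
CREATE CATALOG my_hive WITH (
  'type' = 'hive',
  'hive-conf-dir' = '/path/to/hive/conf'
);
USE CATALOG my_hive;
CREATE TABLE my_parquet_table (
  id BIGINT,
  name STRING
) WITH (
  'connector' = 'filesystem',
  'path' = 'file:///tmp/parquet-data',
  'format' = 'parquet'
);
SELECT * FROM my_parquet_table;
EOF
bin/sql-client.sh embedded -f /tmp/repro.sql
{code}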
h2. Possible reason:
When I start the cluster with the "-verbose:class" option, I get the
class-loading messages shown below.
{code:bash}
# how I start the cluster
FLINK_ENV_JAVA_OPTS='-verbose:class' bin/start-cluster.sh
{code}
{noformat}
[Loaded shaded.parquet.org.apache.thrift.TBaseHelper from file:/Users/guozhenyang/Tools/flink-1.17.1/lib/flink-sql-connector-hive-3.1.3_2.12-1.17.1.jar]
[Loaded org.apache.parquet.format.ColumnChunk from file:/Users/guozhenyang/Tools/flink-1.17.1/lib/flink-sql-parquet-1.17.1.jar]
{noformat}
I assume there may be a conflict between the libthrift libraries bundled in
_flink-sql-connector-hive-3.1.3_2.12-1.17.1.jar_ and
_flink-sql-parquet-1.17.1.jar_.
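One way to check that hypothesis (a sketch of mine, assuming the default flink-1.17.1/lib layout visible in the class-loading output above) is to inspect the shaded TBaseHelper class each jar ships and whether the copy loaded first, from the Hive connector, actually exposes the hashCode(long) overload that the parquet code calls.
{code:bash}
# Sketch for inspecting the suspected conflict; jar paths assume the
# default flink-1.17.1/lib layout from the class-loading output above.
cd flink-1.17.1

# Check whether each jar bundles a shaded copy of the thrift helper class.
unzip -l lib/flink-sql-connector-hive-3.1.3_2.12-1.17.1.jar | grep 'shaded/parquet/org/apache/thrift/TBaseHelper'
unzip -l lib/flink-sql-parquet-1.17.1.jar | grep 'shaded/parquet/org/apache/thrift/TBaseHelper'

# Extract the copy that was loaded first (from the Hive connector) and
# check whether it exposes a hashCode(long) method at all.
unzip -o lib/flink-sql-connector-hive-3.1.3_2.12-1.17.1.jar 'shaded/parquet/org/apache/thrift/TBaseHelper.class' -d /tmp/hive-thrift
javap -p /tmp/hive-thrift/shaded/parquet/org/apache/thrift/TBaseHelper.class | grep hashCode
{code}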
--
This message was sent by Atlassian Jira
(v8.20.10#820010)