[
https://issues.apache.org/jira/browse/FLINK-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16997938#comment-16997938
]
Rui Li commented on FLINK-13998:
--------------------------------
We managed to make reading work, but the bigger problem is writing to ORC
tables. In Hive 2.0.x, the ORC writer is confined to a single thread: the same
thread has to create, use, and close the writer, which doesn't fit Flink's
threading model. The resulting error looks like this:
{noformat}
Caused by: java.lang.IllegalArgumentException: Owner thread expected Thread[Source: Values(tuples=[[{ 3, _UTF-16LE'c' }]], values=[EXPR$0, EXPR$1]) -> SinkConversionToRow -> Sink: Unnamed (1/1),5,Flink Task Threads], got Thread[Legacy Source Thread - Source: Values(tuples=[[{ 3, _UTF-16LE'c' }]], values=[EXPR$0, EXPR$1]) -> SinkConversionToRow -> Sink: Unnamed (1/1),5,Flink Task Threads]
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:119)
    at org.apache.orc.impl.MemoryManager.checkOwner(MemoryManager.java:104)
    at org.apache.orc.impl.MemoryManager.addWriter(MemoryManager.java:118)
    at org.apache.orc.impl.WriterImpl.<init>(WriterImpl.java:204)
    at org.apache.hadoop.hive.ql.io.orc.WriterImpl.<init>(WriterImpl.java:91)
    at org.apache.hadoop.hive.ql.io.orc.OrcFile.createWriter(OrcFile.java:308)
    at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.write(OrcOutputFormat.java:101)
    at org.apache.flink.connectors.hive.HiveOutputFormatFactory$HiveOutputFormat.writeRecord(HiveOutputFormatFactory.java:205)
    at org.apache.flink.connectors.hive.HiveOutputFormatFactory$HiveOutputFormat.writeRecord(HiveOutputFormatFactory.java:177)
    at org.apache.flink.table.filesystem.SingleDirectoryWriter.write(SingleDirectoryWriter.java:52)
    at org.apache.flink.table.filesystem.FileSystemOutputFormat.writeRecord(FileSystemOutputFormat.java:120)
{noformat}
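One way to satisfy that constraint is to confine the writer to a dedicated
thread and have the Flink task submit create/write/close calls to it. Below is
a minimal sketch of that idea with a stand-in {{Writer}} interface (not Hive's
actual {{RecordWriter}}, and not necessarily how the connector should solve
it):
{code}
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/**
 * Sketch of thread confinement: all writer calls are funneled through one
 * dedicated thread, so an ownership check like the one in ORC's MemoryManager
 * always sees the same thread. Writer is a stand-in interface, not Hive's
 * actual RecordWriter.
 */
public final class ThreadConfinedWriter<T> implements AutoCloseable {

    public interface Writer<T> {
        void write(T record) throws IOException;
        void close() throws IOException;
    }

    private final ExecutorService owner = Executors.newSingleThreadExecutor();
    private final Writer<T> writer;

    public ThreadConfinedWriter(Callable<Writer<T>> factory) throws IOException {
        // Create the writer on the dedicated thread so it becomes the owner.
        this.writer = call(factory);
    }

    public void write(T record) throws IOException {
        call(() -> { writer.write(record); return null; });
    }

    @Override
    public void close() throws IOException {
        try {
            call(() -> { writer.close(); return null; });
        } finally {
            owner.shutdown();
        }
    }

    // Run the task on the owner thread and rethrow failures as IOException.
    private <R> R call(Callable<R> task) throws IOException {
        try {
            return owner.submit(task).get();
        } catch (ExecutionException e) {
            throw e.getCause() instanceof IOException
                    ? (IOException) e.getCause() : new IOException(e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("Interrupted while writing", e);
        }
    }
}
{code}
The cost of this approach is a thread handoff per record, so batching writes
before submitting them would likely be needed in practice.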
> Fix ORC test failure with Hive 2.0.x
> ------------------------------------
>
> Key: FLINK-13998
> URL: https://issues.apache.org/jira/browse/FLINK-13998
> Project: Flink
> Issue Type: Improvement
> Components: Connectors / Hive
> Reporter: Xuefu Zhang
> Assignee: Xuefu Zhang
> Priority: Major
> Fix For: 1.11.0
>
>
> Our test uses the local file system, and ORC in Hive 2.0.x seems to have an
> issue with that.
> {code}
> 06:54:43.156 [ORC_GET_SPLITS #0] ERROR org.apache.hadoop.hive.ql.io.AcidUtils - Failed to get files with ID; using regular API
> java.lang.UnsupportedOperationException: Only supported for DFS; got class org.apache.hadoop.fs.LocalFileSystem
>     at org.apache.hadoop.hive.shims.Hadoop23Shims.ensureDfs(Hadoop23Shims.java:813) ~[hive-exec-2.0.0.jar:2.0.0]
>     at org.apache.hadoop.hive.shims.Hadoop23Shims.listLocatedHdfsStatus(Hadoop23Shims.java:784) ~[hive-exec-2.0.0.jar:2.0.0]
>     at org.apache.hadoop.hive.ql.io.AcidUtils.getAcidState(AcidUtils.java:477) [hive-exec-2.0.0.jar:2.0.0]
>     at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:890) [hive-exec-2.0.0.jar:2.0.0]
>     at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:875) [hive-exec-2.0.0.jar:2.0.0]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_181]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_181]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_181]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
>     at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
> {code}
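> For context, a minimal sketch of the kind of guard behind that message (an
> illustration matching the error text above, not the actual Hive 2.0.0
> source): the shim insists on HDFS before using the file-ID-based listing
> API, so a {{LocalFileSystem}}-backed test trips it and falls back to the
> regular listing.
> {code}
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.hdfs.DistributedFileSystem;
>
> public class DfsGuardSketch {
>     // Illustrative only: mirrors the error message above, not Hive's code.
>     static DistributedFileSystem ensureDfs(FileSystem fs) {
>         if (!(fs instanceof DistributedFileSystem)) {
>             throw new UnsupportedOperationException(
>                     "Only supported for DFS; got " + fs.getClass());
>         }
>         return (DistributedFileSystem) fs;
>     }
> }
> {code}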
--
This message was sent by Atlassian Jira
(v8.3.4#803005)