[ https://issues.apache.org/jira/browse/FLINK-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16997938#comment-16997938 ]

Rui Li commented on FLINK-13998:
--------------------------------

We managed to make reading work, but the bigger problem is writing to ORC 
tables. In Hive 2.0.x, the ORC writer is bound to a single owner thread: the 
same thread has to create, use, and close the writer, which doesn't fit 
Flink's threading model. The related error looks like this:
{noformat}
Caused by: java.lang.IllegalArgumentException: Owner thread expected Thread[Source: Values(tuples=[[{ 3, _UTF-16LE'c' }]], values=[EXPR$0, EXPR$1]) -> SinkConversionToRow -> Sink: Unnamed (1/1),5,Flink Task Threads], got Thread[Legacy Source Thread - Source: Values(tuples=[[{ 3, _UTF-16LE'c' }]], values=[EXPR$0, EXPR$1]) -> SinkConversionToRow -> Sink: Unnamed (1/1),5,Flink Task Threads]
        at com.google.common.base.Preconditions.checkArgument(Preconditions.java:119)
        at org.apache.orc.impl.MemoryManager.checkOwner(MemoryManager.java:104)
        at org.apache.orc.impl.MemoryManager.addWriter(MemoryManager.java:118)
        at org.apache.orc.impl.WriterImpl.<init>(WriterImpl.java:204)
        at org.apache.hadoop.hive.ql.io.orc.WriterImpl.<init>(WriterImpl.java:91)
        at org.apache.hadoop.hive.ql.io.orc.OrcFile.createWriter(OrcFile.java:308)
        at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.write(OrcOutputFormat.java:101)
        at org.apache.flink.connectors.hive.HiveOutputFormatFactory$HiveOutputFormat.writeRecord(HiveOutputFormatFactory.java:205)
        at org.apache.flink.connectors.hive.HiveOutputFormatFactory$HiveOutputFormat.writeRecord(HiveOutputFormatFactory.java:177)
        at org.apache.flink.table.filesystem.SingleDirectoryWriter.write(SingleDirectoryWriter.java:52)
        at org.apache.flink.table.filesystem.FileSystemOutputFormat.writeRecord(FileSystemOutputFormat.java:120)
{noformat}
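
The stack trace points at the owner-thread check in ORC's memory manager: the writer remembers the thread that created it, and later calls from any other thread fail. As a rough illustration of the constraint (not necessarily the fix we'd use here), below is a minimal sketch of confining create/write/close to one dedicated thread; {{SingleThreadWriterProxy}} and its {{Writer}} interface are hypothetical stand-ins, not Flink or Hive types.
{code}
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/**
 * Minimal sketch (not Flink/Hive code): confine create/write/close of a
 * thread-affine writer to one dedicated thread, so the ORC memory manager's
 * owner-thread check always sees the thread that created the writer.
 */
public class SingleThreadWriterProxy<T> implements AutoCloseable {

    /** Hypothetical minimal writer contract, standing in for the real record writer. */
    public interface Writer<T> {
        void write(T record) throws Exception;
        void close() throws Exception;
    }

    // Every call to the underlying writer is executed by this one thread.
    private final ExecutorService ownerThread = Executors.newSingleThreadExecutor();
    private final Writer<T> writer;

    public SingleThreadWriterProxy(Callable<Writer<T>> writerFactory) throws Exception {
        // Create the writer on the dedicated thread, not on the caller's thread.
        this.writer = call(writerFactory);
    }

    public void write(T record) throws Exception {
        call(() -> { writer.write(record); return null; });
    }

    @Override
    public void close() throws Exception {
        try {
            call(() -> { writer.close(); return null; });
        } finally {
            ownerThread.shutdown();
        }
    }

    private <R> R call(Callable<R> action) throws Exception {
        try {
            return ownerThread.submit(action).get();
        } catch (ExecutionException e) {
            // Unwrap the writer's own exception where possible.
            throw e.getCause() instanceof Exception ? (Exception) e.getCause() : e;
        }
    }
}
{code}
Routing every record through an executor like this adds a hand-over cost per record, so it only shows the shape of the constraint rather than a drop-in solution.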

> Fix ORC test failure with Hive 2.0.x
> ------------------------------------
>
>                 Key: FLINK-13998
>                 URL: https://issues.apache.org/jira/browse/FLINK-13998
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / Hive
>            Reporter: Xuefu Zhang
>            Assignee: Xuefu Zhang
>            Priority: Major
>             Fix For: 1.11.0
>
>
> Our test uses the local file system, and ORC in Hive 2.0.x seems to have an 
> issue with that.
> {code}
> 06:54:43.156 [ORC_GET_SPLITS #0] ERROR org.apache.hadoop.hive.ql.io.AcidUtils - Failed to get files with ID; using regular API
> java.lang.UnsupportedOperationException: Only supported for DFS; got class org.apache.hadoop.fs.LocalFileSystem
>       at org.apache.hadoop.hive.shims.Hadoop23Shims.ensureDfs(Hadoop23Shims.java:813) ~[hive-exec-2.0.0.jar:2.0.0]
>       at org.apache.hadoop.hive.shims.Hadoop23Shims.listLocatedHdfsStatus(Hadoop23Shims.java:784) ~[hive-exec-2.0.0.jar:2.0.0]
>       at org.apache.hadoop.hive.ql.io.AcidUtils.getAcidState(AcidUtils.java:477) [hive-exec-2.0.0.jar:2.0.0]
>       at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:890) [hive-exec-2.0.0.jar:2.0.0]
>       at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:875) [hive-exec-2.0.0.jar:2.0.0]
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_181]
>       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_181]
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_181]
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
>       at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
> {code}
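
For context on the quoted failure: the HDFS-only listing path in the shim guards on the concrete FileSystem type, so a test warehouse backed by LocalFileSystem trips the exception and the caller falls back to the regular listing API (hence the ERROR log). A rough sketch of that kind of guard, illustrative only and not the actual Hive 2.0.x source:
{code}
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;

// Illustrative only -- not the real Hadoop23Shims code. The HDFS-only listing
// path rejects any non-DFS FileSystem, so a warehouse on LocalFileSystem hits
// this branch and the caller logs the error and falls back to the regular API.
public class DfsOnlyGuard {
    public static DistributedFileSystem ensureDfs(FileSystem fs) {
        if (!(fs instanceof DistributedFileSystem)) {
            throw new UnsupportedOperationException(
                    "Only supported for DFS; got " + fs.getClass());
        }
        return (DistributedFileSystem) fs;
    }
}
{code}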


