[ https://issues.apache.org/jira/browse/HIVE-10114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482232#comment-14482232 ]
Gopal V commented on HIVE-10114:
--------------------------------
Patch LGTM - +1.
Tested this on 10 TB, and it handles an exit in the middle cleanly, so the
system does not get stuck when a query is cancelled.
{code}
2015-04-06 16:51:09,536 WARN [ORC_GET_SPLITS #8] hdfs.DFSClient: DFS chooseDataNode: got # 1 IOException, will wait for 743.0372945716757 msec.
2015-04-06 16:51:09,538 WARN [ORC_GET_SPLITS #1] ipc.Client: interrupted waiting to send rpc request to server
java.lang.InterruptedException
    at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404)
    at java.util.concurrent.FutureTask.get(FutureTask.java:191)
    at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1046)
    at org.apache.hadoop.ipc.Client.call(Client.java:1441)
    at org.apache.hadoop.ipc.Client.call(Client.java:1399)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
    at com.sun.proxy.$Proxy11.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:752)
    at sun.reflect.GeneratedMethodAccessor44.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1988)
    at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1118)
    at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1114)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1114)
    at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.extractMetaInfoFromFooter(ReaderImpl.java:360)
    at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:316)
    at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:237)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.populateAndCacheStripeDetails(OrcInputFormat.java:924)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:836)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:702)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
{code}
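For readers unfamiliar with the pattern behind the trace above, here is a minimal sketch of how interrupting pooled tasks lets a cancelled query unwind instead of hanging. It is not the code from the patch: the class `SplitCancelSketch`, the task `SlowSplitTask`, and the method `cancelAndDrain` are hypothetical names, and `Thread.sleep` stands in for a blocking filesystem call. Only standard `java.util.concurrent` calls are used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class SplitCancelSketch {

    // Hypothetical stand-in for a per-file split generator (not Hive's SplitGenerator).
    static class SlowSplitTask implements Callable<String> {
        private final String file;
        private final CountDownLatch started;

        SlowSplitTask(String file, CountDownLatch started) {
            this.file = file;
            this.started = started;
        }

        @Override
        public String call() throws InterruptedException {
            started.countDown();
            // Simulates a long blocking call; sleep() throws
            // InterruptedException as soon as the worker thread is interrupted.
            Thread.sleep(60_000);
            return "split:" + file;
        }
    }

    // Starts numTasks blocking tasks, cancels them the way a query cancel
    // might, and reports whether the pool terminated within the timeout.
    static boolean cancelAndDrain(int numTasks) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(numTasks);
        CountDownLatch started = new CountDownLatch(numTasks);
        List<Future<String>> futures = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            futures.add(pool.submit(new SlowSplitTask("file-" + i, started)));
        }
        started.await(); // wait until every task is actually running

        for (Future<String> f : futures) {
            f.cancel(true); // true = interrupt the thread running the task
        }
        pool.shutdown(); // accept no new work; idle workers exit
        return pool.awaitTermination(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("drained cleanly: " + cancelAndDrain(4));
    }
}
```

In this sketch each worker leaves its blocking call with an `InterruptedException`, much as the `ORC_GET_SPLITS` thread does in the log, so `cancelAndDrain` is expected to return `true` well before the 60-second sleep would have finished.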
> Split strategies for ORC
> ------------------------
>
> Key: HIVE-10114
> URL: https://issues.apache.org/jira/browse/HIVE-10114
> Project: Hive
> Issue Type: Improvement
> Affects Versions: 1.2.0
> Reporter: Prasanth Jayachandran
> Assignee: Prasanth Jayachandran
> Attachments: HIVE-10114.1.patch, HIVE-10114.2.patch,
> HIVE-10114.3.patch, HIVE-10114.4.patch, HIVE-10114.5.patch
>
>
> ORC split generation does not have clearly defined strategies for different
> scenarios (many small ORC files, few small ORC files, many large files, etc.).
> A few strategies, such as storing the file footer in the ORC split or making an
> entire file a single ORC split, already exist. This JIRA aims to make split
> generation simpler, support different strategies for various use cases (BI,
> ETL, ACID, etc.), and lay the foundation for HIVE-7428.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)