[jira] [Commented] (HIVE-10114) Split strategies for ORC

Gopal V (JIRA) Mon, 06 Apr 2015 16:57:34 -0700

    [ https://issues.apache.org/jira/browse/HIVE-10114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482232#comment-14482232 ]


Gopal V commented on HIVE-10114:
--------------------------------

Patch LGTM, +1.

Tested this on 10 TB of data. It handles an exit in the middle of split 
generation cleanly, so the system does not get stuck when a query is cancelled.

{code}
2015-04-06 16:51:09,536 WARN [ORC_GET_SPLITS #8] hdfs.DFSClient: DFS chooseDataNode: got # 1 IOException, will wait for 743.0372945716757 msec.
2015-04-06 16:51:09,538 WARN [ORC_GET_SPLITS #1] ipc.Client: interrupted waiting to send rpc request to server
java.lang.InterruptedException
        at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404)
        at java.util.concurrent.FutureTask.get(FutureTask.java:191)
        at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1046)
        at org.apache.hadoop.ipc.Client.call(Client.java:1441)
        at org.apache.hadoop.ipc.Client.call(Client.java:1399)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
        at com.sun.proxy.$Proxy11.getFileInfo(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:752)
        at sun.reflect.GeneratedMethodAccessor44.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:483)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
        at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1988)
        at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1118)
        at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1114)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1114)
        at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.extractMetaInfoFromFooter(ReaderImpl.java:360)
        at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:316)
        at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:237)
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.populateAndCacheStripeDetails(OrcInputFormat.java:924)
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:836)
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:702)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
{code}
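
For readers unfamiliar with the pattern, the trace above shows a split-generator task running on a thread pool and receiving an InterruptedException while blocked in an RPC. The following is only a minimal sketch of that general mechanism, not the code from the patch: it assumes the workers are submitted to a java.util.concurrent executor and that cancellation is delivered through shutdownNow(). The class name CancellableSplitDemo, the method cancelInFlight, and the use of Thread.sleep as a stand-in for the blocking footer read are all hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class CancellableSplitDemo {

    /**
     * Starts `workers` blocking tasks, cancels them while they are in flight,
     * and returns how many of them observed the interrupt and exited.
     * (Hypothetical helper for illustration; not part of Hive.)
     */
    public static int cancelInFlight(int workers) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        CountDownLatch started = new CountDownLatch(workers);
        AtomicInteger interrupted = new AtomicInteger();

        for (int i = 0; i < workers; i++) {
            pool.submit(() -> {
                started.countDown();
                try {
                    // Stand-in for a blocking call such as a file-footer read over RPC.
                    Thread.sleep(60_000);
                } catch (InterruptedException e) {
                    // Record the cancellation, restore the flag, and let the task end.
                    interrupted.incrementAndGet();
                    Thread.currentThread().interrupt();
                }
            });
        }

        started.await();     // wait until every worker is running
        pool.shutdownNow();  // simulate query cancellation: interrupts all workers

        // If the workers ignored the interrupt, this would time out (the "stuck" case).
        if (!pool.awaitTermination(5, TimeUnit.SECONDS)) {
            throw new IllegalStateException("workers did not exit after cancellation");
        }
        return interrupted.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("interrupted workers: " + cancelInFlight(4));
    }
}
```

The point of the sketch is that each worker treats the interrupt as a signal to stop rather than swallowing it, so the pool drains promptly instead of hanging.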

> Split strategies for ORC
> ------------------------
>
>                 Key: HIVE-10114
>                 URL: https://issues.apache.org/jira/browse/HIVE-10114
>             Project: Hive
>          Issue Type: Improvement
>    Affects Versions: 1.2.0
>            Reporter: Prasanth Jayachandran
>            Assignee: Prasanth Jayachandran
>         Attachments: HIVE-10114.1.patch, HIVE-10114.2.patch, 
> HIVE-10114.3.patch, HIVE-10114.4.patch, HIVE-10114.5.patch
>
>
> ORC split generation does not have clearly defined strategies for different 
> scenarios (many small ORC files, few small ORC files, many large files, etc.). 
> A few strategies, such as storing the file footer in the ORC split or making 
> an entire file a single ORC split, already exist. This JIRA aims to make split 
> generation simpler, support different strategies for various use cases (BI, 
> ETL, ACID, etc.), and lay the foundation for HIVE-7428.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
