[jira] Commented: (HIVE-439) merge small files whenever possible

Zheng Shao (JIRA) Thu, 18 Jun 2009 19:19:32 -0700

    [ https://issues.apache.org/jira/browse/HIVE-439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12721622#action_12721622 ]


Zheng Shao commented on HIVE-439:
---------------------------------

@hive.439.3.patch:
I don't understand how the ConditionalTask gets the resolver and resolverCtx at 
execution time.

Are they serialized together with the ConditionalTask?
If so, we need to mark those classes as serializable and move them to 
ConditionalWork, right?
Otherwise it kind of breaks the implicit contract that everything that needs to 
be serialized is in the Work instead of the Task.
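
To make the question concrete, here is a minimal sketch of the contract described above: the resolver and its context live in the Work and are serializable, while the Task only holds the Work and runtime behaviour. All class and method names below (ConditionalWorkSketch, Work, Task, MergeResolver, MergeCtx, decideAfterRoundTrip) are hypothetical and are not taken from hive.439.3.patch. Plain java.io serialization is assumed purely for illustration; the mechanism Hive actually uses to ship the plan may differ.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ConditionalWorkSketch {

    /** Hypothetical resolver: decides at execution time which branch to run. */
    interface Resolver extends Serializable {
        String resolve(Serializable ctx);
    }

    /** Hypothetical context: values fixed at compile time, read at execution time. */
    static class MergeCtx implements Serializable {
        private static final long serialVersionUID = 1L;
        final long avgFileSize;
        final long threshold;

        MergeCtx(long avgFileSize, long threshold) {
            this.avgFileSize = avgFileSize;
            this.threshold = threshold;
        }
    }

    /** Hypothetical resolver: merge when files are smaller than the threshold. */
    static class MergeResolver implements Resolver {
        private static final long serialVersionUID = 1L;

        @Override
        public String resolve(Serializable ctx) {
            MergeCtx c = (MergeCtx) ctx;
            return c.avgFileSize < c.threshold ? "merge" : "move";
        }
    }

    /** The Work: everything that has to be serialized lives here. */
    static class Work implements Serializable {
        private static final long serialVersionUID = 1L;
        final Resolver resolver;
        final Serializable resolverCtx;

        Work(Resolver resolver, Serializable resolverCtx) {
            this.resolver = resolver;
            this.resolverCtx = resolverCtx;
        }
    }

    /** The Task: holds only the Work; it keeps no serialized state of its own. */
    static class Task {
        private final Work work;

        Task(Work work) {
            this.work = work;
        }

        String execute() {
            return work.resolver.resolve(work.resolverCtx);
        }
    }

    /** Serializes the Work to bytes and reads it back, as if shipped to another JVM. */
    static Work roundTrip(Work w) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(w);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Work) in.readObject();
        }
    }

    /** Builds a Work, round-trips it, and lets a fresh Task resolve the branch. */
    static String decideAfterRoundTrip(long avgFileSize, long threshold) {
        try {
            Work original = new Work(new MergeResolver(), new MergeCtx(avgFileSize, threshold));
            return new Task(roundTrip(original)).execute();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(decideAfterRoundTrip(10L, 100L));
    }
}
```

In this sketch the Task can be rebuilt from the deserialized Work alone, which is the property the implicit contract is meant to guarantee. If the resolver and context were fields of the Task instead, they would not survive the round trip.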


> merge small files whenever possible
> -----------------------------------
>
>                 Key: HIVE-439
>                 URL: https://issues.apache.org/jira/browse/HIVE-439
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>         Attachments: hive.439.1.patch, hive.439.2.patch, hive.439.3.patch
>
>
> There are cases when the input to a Hive job is thousands of small files. In 
> this case, there is a mapper for each file. Most of the overhead for spawning 
> all these mappers can be avoided if these small files are combined into fewer 
> larger files.
> The problem can also be addressed by having a mapper span multiple blocks as 
> in:
> https://issues.apache.org/jira/browse/HIVE-74
> But it also makes sense in Hive to merge files whenever possible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
