[ https://issues.apache.org/jira/browse/HCATALOG-448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan updated HCATALOG-448:
--------------------------------------

    Attachment: hcatalog-448-for-0.4-codebase.patch

Attaching a patch that was tested against the 0.4 codebase.

Sample script used for testing:
================================
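-- Read the HCatalog table 'hit_data'
a = load 'hit_data' using org.apache.hcatalog.pig.HCatLoader();
-- Keep only the rows for load_date 20120515
b = filter a by (load_date == '20120515');
-- Write the filtered rows to the HCatalog table 'duplicate_hit_data'
store b into 'duplicate_hit_data' using org.apache.hcatalog.pig.HCatStorer();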

Results:
========

1. Without patch:
-----------------
Job took 27 minutes to process 2,551,157 records and write 3,993,443,494
bytes to HDFS.


2. With patch:
-----------------
Job took 2 minutes to process 2,551,157 records and write 3,993,443,494 bytes
to HDFS.
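For comparison, the rates implied by the two runs can be worked out directly from the figures above. This is a minimal Python sketch, assuming the reported minutes are exact wall-clock durations (they are rounded, so the rates are approximate) and taking 1 MB as 10^6 bytes:

```python
# Figures copied from the two runs reported above. The durations are the
# rounded wall-clock minutes from the comment, so derived rates are approximate.
RECORDS = 2_551_157
BYTES_WRITTEN = 3_993_443_494

runs = {"without patch": 27, "with patch": 2}  # minutes


def throughput(minutes):
    """Return (records per second, MB per second) for a run of the given length.

    MB here means 10**6 bytes (an assumption for this sketch).
    """
    seconds = minutes * 60
    return RECORDS / seconds, BYTES_WRITTEN / seconds / 1e6


for label, minutes in runs.items():
    rps, mbps = throughput(minutes)
    print(f"{label}: {rps:,.0f} records/s, {mbps:.1f} MB/s")

# Ratio of job durations, i.e. how many times faster the patched run was.
speedup = runs["without patch"] / runs["with patch"]
print(f"speedup: {speedup:.1f}x")
```

This prints roughly 1,575 records/s (2.5 MB/s) without the patch versus 21,260 records/s (33.3 MB/s) with it, a 13.5x reduction in job time on this test.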


                
> HCatStorer performance is 4x slower in HCat 0.4 than HCat 0.2
> -------------------------------------------------------------
>
>                 Key: HCATALOG-448
>                 URL: https://issues.apache.org/jira/browse/HCATALOG-448
>             Project: HCatalog
>          Issue Type: Bug
>    Affects Versions: 0.4.1
>            Reporter: Rohini Palaniswamy
>            Assignee: Mithun Radhakrishnan
>            Priority: Critical
>         Attachments: hcatalog-448-for-0.4-codebase.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
