[ 
https://issues.apache.org/jira/browse/PIG-410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396245#comment-13396245
 ] 

Jie Li commented on PIG-410:
----------------------------

I ran a micro-benchmark on 1 GB of TPC-H data, processed by a single map task:

{code}
LineItems = LOAD '$input/lineitem' USING PigStorage('|') AS (orderkey:int, 
partkey:int, suppkey:int, linenumber:int, quantity:double, 
extendedprice:double, discount:double, tax:double, returnflag:chararray, 
linestatus:chararray, shipdate:chararray, commitdate:chararray, 
receiptdate:chararray, shipinstruct:chararray, shipmode:chararray, 
comment:chararray);

SubLineItems = FILTER LineItems BY shipdate == '2012-09-02';

STORE SubLineItems INTO '$output/Q1out';
{code}

The map task takes about one minute. If we remove the types from the schema, 
the time drops to about 25 seconds. The equivalent Hive map task takes only 
about 17 seconds.
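
For reference, the untyped variant presumably looks like the sketch below. 
This is an assumption, since the exact script was not posted: only the type 
declarations are dropped from the AS clause, so every field is loaded as a 
bytearray, and only shipdate is cast (implicitly, to chararray) when the 
FILTER compares it with the string constant.

{code}
-- Assumed untyped variant: same fields, no declared types, so no up-front
-- conversion of all 16 columns is required after the LOAD.
LineItems = LOAD '$input/lineitem' USING PigStorage('|') AS (orderkey, 
partkey, suppkey, linenumber, quantity, extendedprice, discount, tax, 
returnflag, linestatus, shipdate, commitdate, receiptdate, shipinstruct, 
shipmode, comment);

-- shipdate is a bytearray here; Pig casts it to chararray for the comparison.
SubLineItems = FILTER LineItems BY shipdate == '2012-09-02';

STORE SubLineItems INTO '$output/Q1out';
{code}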

As a side note, the FILTER is not pushed above the FOREACH that Pig inserts 
after the LOAD for the type casts (this may need a separate JIRA to 
investigate), but the main issue here is whether we can delay type 
conversion. There seem to be related efforts under way in PIG-2359 and 
PIG-2633?
                
> PERFORMANCE: delay type conversion
> ----------------------------------
>
>                 Key: PIG-410
>                 URL: https://issues.apache.org/jira/browse/PIG-410
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.2.0
>            Reporter: Olga Natkovich
>
> Currently, any time a user declares types for loaded data, we insert a 
> generate after the load to produce data of the right type. It would be more 
> efficient to delay conversion to the point where each individual field is used.
