Steve Ogden created PIG-3720:
--------------------------------

             Summary: Nested concats of binary conditionals take 1/2 hour to 
parse
                 Key: PIG-3720
                 URL: https://issues.apache.org/jira/browse/PIG-3720
             Project: Pig
          Issue Type: Bug
          Components: parser
    Affects Versions: 0.10.0
            Reporter: Steve Ogden
            Priority: Minor


This statement takes over 1/2 hour to parse. Seems to be related to the 
conditionals. Removing them and just running the nested concats, it parses fast:

fact_tsgsrtd_dim_hash = foreach tsgsrtd generate checksum,
        UPPER(
                CONCAT((no_of_rics == '\\N' ? '0' : no_of_rics),
                CONCAT(request_start_dttm,
                CONCAT(request_end_dttm,
                CONCAT((adjs_list == '\\N' ? 'UNKNOWN' : adjs_list),
                CONCAT((event_datatype == '\\N' ? 'UNKNOWN' : event_datatype),
                CONCAT((facts_list == '\\N' ? 'UNKNOWN' : facts_list),
                CONCAT((frequency == '\\N' ? 'UNKNOWN' : frequency),
                CONCAT((points == '\\N' ? '0' : points),
                CONCAT((multiplier == '\\N' ? '0' : multiplier),
                CONCAT((qos == '\\N' ? 'UNKNOWN' : qos),
                CONCAT((pe == '\\N' ? '0' : pe),
                (event_type == 'GSREQ' ? 'GS' : (event_type == 'RICREQ' ? 'RTD' 
: (event_type == 'TSREQ' ? 'TS' : 'UNKNOWN')))
                ))))))))))));


I noticed it I split it, do half the conditionals in one relation, then take 
the results of that and create another relation and do the other half of the 
conditionals, it parses in less than a minute:

fact_tsgsrtd_cat1 = foreach tsgsrtd generate checksum, points, multiplier, qos, 
pe, event_type,
                CONCAT(CONCAT((no_of_rics == '\\N' ? '0' : 
no_of_rics),'.000000000'),
                CONCAT(request_start_dttm,
                CONCAT(request_end_dttm,
                CONCAT((adjs_list == '\\N' ? 'UNKNOWN' : adjs_list),
                CONCAT((event_datatype == '\\N' ? 'UNKNOWN' : event_datatype),
                CONCAT((facts_list == '\\N' ? 'UNKNOWN' : facts_list),
                (frequency == '\\N' ? 'UNKNOWN' : frequency)
                )))))) as cat1;

fact_tsgsrtd_dim_hash = foreach fact_tsgsrtd_cat1 generate checksum,
        UPPER(
                CONCAT(cat1,
                CONCAT((points == '\\N' ? '0' : points),
                CONCAT((multiplier == '\\N' ? '0' : multiplier),
                CONCAT((qos == '\\N' ? 'UNKNOWN' : qos),
                CONCAT(CONCAT((pe == '\\N' ? '0' : pe), '.0000'),
                (event_type == 'GSREQ' ? 'GS' : (event_type == 'RICREQ' ? 'RTD' 
: (event_type == 'TSREQ' ? 'TS' : 'UNKNOWN')))
                )))))) as ts_dim_hash;




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

Reply via email to