[ https://issues.apache.org/jira/browse/PIG-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
siddhartha Pattanaik updated PIG-3119: -------------------------------------- Hadoop Flags: (was: Incompatible change) > REGEX_EXTRACT_ALL custom with aggregation function > -------------------------------------------------- > > Key: PIG-3119 > URL: https://issues.apache.org/jira/browse/PIG-3119 > Project: Pig > Issue Type: Bug > Components: build, grunt > Affects Versions: 0.9.1 > Environment: OS -version > ================================ > Linux version 2.6.18-194.3.1.el5 (mockbu...@builder10.centos.org) (gcc > version 4.1.2 20080704 (Red Hat 4.1.2-48)) > software installed > ======================= > hadoop-1.0.4 > pig-0.9.1 > Hardware details > ==================================== > processor : 0 > vendor_id : GenuineIntel > cpu family : 6 > model : 26 > model name : Intel(R) Xeon(R) CPU X5560 @ 2.80GHz > stepping : 4 > cpu MHz : 2800.098 > cache size : 8192 KB > fpu : yes > fpu_exception : yes > cpuid level : 11 > Reporter: siddhartha Pattanaik > Priority: Critical > Labels: newbie > Fix For: 0.9.1 > > Attachments: starwar_log1.txt > > Original Estimate: 276h > Remaining Estimate: 276h > > Hi , > I have a use case in my project requirement, > The i/p file consist of the following pattern:- > 192.168.90.36 - - [16/May/2012:16:00:11 -0700] "GET > /img/explore/encyclopedia/characters/yoda_card.jpg HTTP/1.1" 200 22620 > "http://www.starwars.com/explore/encyclopedia/characters/2/featured/" > "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.0; Trident/5.0)" > "Wookie-Cookie=474ca6b302a46696a1ec55f4b656f8c3; > __utma=181359608.119611689.1337206567.1337206567.1337206567.1; > __utmb=181359608.79.9.1337209104786; __utmc=181359608; > __utmz=181359608.1337206567.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); > JSESSIONID=aHX_NQheRq08" "-" 0 > I want to run a aggregate function along with regex_extract_all to extract > the desired data. > Even though the i/p file is parsing.I have issue with aggregate function > working on it. > Please find the below pig script:- > ***************Ip_adress-count************************ > Ip_adress_count.pig > > A = LOAD 'starwar_log1' USING TextLoader AS (line:chararray); > B = FOREACH A GENERATE FLATTEN (REGEX_EXTRACT_ALL(line,'^(\\S+) (\\S+) (\\S+) > \\[([\\w:/]+\\s[+\\-]\\d{4})\\] "(.+?)" (\\S+) (\\S+) "([^"]*)" "([^"]*)" > "([^"]*)" (\\S+) ') ) AS > ( > remoteAddr: chararray, > remoteLogname: chararray, > user: chararray, > time: chararray, > request: chararray, > status: int, > bytes_string: chararray, > referrer: chararray, > Mozilla: chararray, > wookie_cookie: chararray, > browser3: chararray, > acess_status:int > ); > C = group B by remoteAddr; > D = foreach C generate COUNT(B) as ip_adress_count; > E = order D by ip_adress_count; > F = STORE E INTO ‘ip_adress_count/' using PigStorage(','); > Expected O/p > =========================== > ip_adress_count > remoteAddr,ip_adress_count > 192.168.90.36,19 > 192.168.90.37,1 > There is no parsing issue but the aggregate function count() is not working > over the regex_extract_all function for regular expression. > Please do the need.The requirement is I need the count of the ip adresses > from the ip data. > thanks, > siddharth > contact -8763666372 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira