[ 
https://issues.apache.org/jira/browse/SPARK-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985055#comment-13985055
 ] 

Michael Armbrust commented on SPARK-1661:
-----------------------------------------

Thanks for your report.  This JIRA is for reporting bugs with Spark and its 
components.  Shark is a separate project and issues with older versions of 
Shark should probably be filed on the Shark issue tracker.

However, I did add a test to make sure that RegexSerDe works with Spark SQL 
(a nearly from-scratch rewrite of Shark, which will be included in the 1.0 
release of Spark as an alpha component).  If you are still having problems 
with Spark SQL, please reopen this issue.

New Spark tests: https://github.com/apache/spark/pull/595

> the result of querying table created with RegexSerDe is all null
> ----------------------------------------------------------------
>
>                 Key: SPARK-1661
>                 URL: https://issues.apache.org/jira/browse/SPARK-1661
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 0.9.0
>         Environment: linux 2.6.32-358.el6.x86_64,Hive 12.0,shark 0.9.0,Hadoop 
> 2.2.0
>            Reporter: likunjian
>              Labels: HQL, hadoop, hive, regex, shark
>         Attachments: log.txt
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Querying a table created with RegexSerDe returns NULL in every column.
> When I query a table created with 
> org.apache.hadoop.hive.contrib.serde2.RegexSerDe through Shark, every column 
> in the result is NULL:
> select * from access_log where logdate='2014-04-28' limit 10;
> OK
> ip      host    time    method  request protocol        status  size    
> referer cookieuid       requesttime     session httpxrequestedwith      agent 
>   upstreamresponsetime    logdate
> NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL  
>   NULL    NULL    NULL    NULL    NULL    2014-04-28
> NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL  
>   NULL    NULL    NULL    NULL    NULL    2014-04-28
> NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL  
>   NULL    NULL    NULL    NULL    NULL    2014-04-28
> NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL  
>   NULL    NULL    NULL    NULL    NULL    2014-04-28
> NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL  
>   NULL    NULL    NULL    NULL    NULL    2014-04-28
> NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL  
>   NULL    NULL    NULL    NULL    NULL    2014-04-28
> NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL  
>   NULL    NULL    NULL    NULL    NULL    2014-04-28
> NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL  
>   NULL    NULL    NULL    NULL    NULL    2014-04-28
> NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL  
>   NULL    NULL    NULL    NULL    NULL    2014-04-28
> NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL  
>   NULL    NULL    NULL    NULL    NULL    2014-04-28
> Time taken: 4.362 seconds
> My regex is:
>  ^([^ ]*) [^ ]* ([^ ]*) \\[([^\]]*)\\] \"([^ ]*) ([^ ]*) ([^ ]*)\" (-|[0-9]*) 
> (-|[0-9]*) \"(\.\+\?|-)\" ([^ ]*) ([^ ]*) ([^ ]*) \"(\.\+\?|-)\" 
> \"(\.\+\?|-)\" \"(\.\+\?|-)\"$
> nginx log example:
> 42.49.44.61 - www.xxxx.comm [20/Apr/2014:23:58:03 +0800] "GET /xxxxx/296837 
> HTTP/1.1" 200 3871 "http://www.xxxxx.com/xxxxx/296837" - 0.015 
> 63hbb4om2cvtjs0f7d969n1uf4 "com.xxxxx.browser" "Mozilla/5.0 (Linux; U; xxxxx 
> 4.1.2; zh-cn; ZTE N919 Build/JZO54K) AppleWebKit/534.30 (KHTML, like Gecko) 
> Version/4.0 Mobile Safari/534.30" "0.015"
> 111.121.176.149 - www.xxxx.comm [20/Apr/2014:23:58:03 +0800] "GET 
> /xxxxx/264904 HTTP/1.1" 200 3827 
> "http://m.baidu.com/s?from=2001a&bd_page_type=1&word=%E8%8E%B2%E8%97%95%E6%80%8E%E6%A0%B7%E5%8D%A4%E6%89%8D%E5%A5%BD%E5%90%83"
>  - 0.015 ft7tr4b06b23ub9lnugdf4gcq3 "-" "Mozilla/5.0 (Linux; U; xxxxx 4.1.2; 
> zh-CN; 8190Q Build/JZO54K) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 
> UCBrowser/9.5.2.394 U3/0.8.0 Mobile Safari/533.1" "0.015"
> 222.209.97.169 - www.xxxx.comm [20/Apr/2014:23:58:04 +0800] "GET / HTTP/1.1" 
> 200 3188 "http://m.idea123.cn/food.html" - 0.014 - "-" "Lenovo S890/S100 
> Linux/3.0.13 xxxxx/4.0.3 Release/12.12.2011 Browser/AppleWebKit534.30 
> Profile/MIDP-2.0 Configuration/CLDC-1.1 Mobile Safari/534.30" "0.014"
> 59.36.84.241 - www.xxxx.comm [20/Apr/2014:23:58:05 +0800] "GET 
> /app/xxxxx/topic/view.php?id=138555 HTTP/1.1" 200 3151 "-" - 0.009 - "-" 
> "Mozilla/5.0 (Linux; U; xxxxx 2.3.7; zh-cn; TD500 Build/GWK74) 
> AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30" 
> "0.009"
> 113.242.39.81 - www.xxxx.comm [20/Apr/2014:23:58:07 +0800] "GET /xxxxx/419691 
> HTTP/1.1" 200 4174 "http://www.xxxx.comm/xxxxx/all/308?p=3" - 0.013 
> 1n579ukg1gho7i7mr3q8ic8j97 "-" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 
> 10_5_7; en-us) AppleWebKit/530.17 (KHTML, like Gecko) Version/4.0 
> Safari/530.17; 360browser(securitypay,securityinstalled); 
> 360(xxxxx,uppayplugin); 360 Aphone Browser (5.3.1)" "0.013"
> Very strange: the same query returns correct results when I run it in Hive. 
> I really do not understand.  :-(
> OK
> ip      host    time    method  request protocol        status  size    
> referer cookieuid       requesttime     session httpxrequestedwith      agent 
>   upstreamresponsetime    logdate
> 14.151.40.117   www.xxxx.com  10/Apr/2014:23:58:01 +0800      POST    
> /xxxx.jsp?appid=4&appkey=573bbd2fbd1a6bac082ff4727d952ba3&format=json&sessionid=1397145480&vc=24&vn=v3.5.1&loguid=&deviceid=0f607264fc6318a92b9e13c65db7cd3c%7C02C105A6-6DC8-43D5-879E-46AD603AC34E%7C2096145A-114C-4B6E-BE91-1AC740D9BD21&channel=appstore&method=Update.forceUpdate
>       HTTP/1.1        200     88      -  0.052    -       -       xxxxx xxxxx 
> iPhone Client API 0.005   2014-04-11
> 112.91.89.149   www.xxxx.com  10/Apr/2014:23:58:02 +0800      POST    
> /xxxx.jsp?appid=2&appkey=9ef269eec4f7a9d07c73952d06b5413f&format=json&sessionid=1397141523139&vc=53&vn=3.5.2&loguid=2888069&deviceid=xxxxx354244055617944&uuid=c7d71164-bae7-4d51-8032-ec9cfafb5e7e&channel=default&method=Info.getinfoV3&virtual=O1wq33EpQ%2Fa%2B8NxjEsy57ZezKBefR85F4L%2BXPZlcSETtw5Fl%2FdQFuLuNUg4Co9zHJyw2jPJipOR3%0Acrc59PUeFvM5hU82AdDQGjIkXa%2FLMWtxuJYz6fJBHLxkQRPWUVCdpeENwrnYlvgAY6DqM9G%2Fh5g1%0AbZamnAgUMERY1iZzPLk%3D%0A
>  HTTP/1.1        200     2008    xxxxx xxxxx xxxxx POST      -       0.033    
>    -       -       Mozilla/5.0 (xxxxx) xxxxx/20100101 xxxxx/1.0.0       0.033 
>   2014-04-11
> 113.133.68.221  www.xxxx.com  10/Apr/2014:23:58:02 +0800      POST    
> /xxxx.jsp?appid=2&appkey=9ef269eec4f7a9d07c73952d06b5413f&format=json&sessionid=1397145345830&vc=53&vn=3.5.2&loguid=4377880&deviceid=xxxxx355369055653422&uuid=546e07f7-6439-48d4-8d6d-e5dc0001e569&channel=91_v352&method=Ad.getAd_iMocha&virtual=
>        HTTP/1.1        200     88      xxxxx xxxxx xxxxx POST      -  0.127   
>  -       -       Mozilla/5.0 (xxxxx) xxxxx/20100101 xxxxx/1.0.0       0.127   
> 2014-04-11
> This is my HQL:
> CREATE external TABLE access_log (
>   ip STRING,
>   host STRING,
>   time STRING,
>   method STRING,
>   request STRING,
>   protocol STRING,
>   status STRING,
>   size STRING,
>   referer STRING,
>   cookieuid STRING,
>   requestTime STRING,
>   session STRING,
>   httpXRequestedWith STRING,
>   agent STRING,
>   upstreamResponseTime STRING
> )
> partitioned by (logdate string)
> ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
> WITH SERDEPROPERTIES  (
> "input.regex" = "^([^ ]*) - ([^ ]*) \\[([^\]]*)\\] \"([^ ]*) ([^ ]*) ([^ 
> ]*)\" (-|[0-9]*) (-|[0-9]*) \"(\.\+\?|-)\" ([^ ]*) ([^ ]*) ([^ ]*) 
> \"(\.\+\?|-)\" \"(\.\+\?|-)\" \"(\.\+\?|-)\"$",
> "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s %9$s %10$s 
> %11$s %12$s %13$s %14$s %15$s"
> )
> STORED AS TEXTFILE;
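
The contrib RegexSerDe returns NULL for every column of a row that its 
input.regex does not match in full, so one way to narrow down a report like 
the one quoted above is to test the pattern against a raw log line outside 
the query engine.  The sketch below does that in Python, whose re module 
stands in here for java.util.regex.  Several things in it are assumptions and 
not taken from Shark or Hive: INPUT_REGEX is one plausible reading of the 
posted pattern after SQL string unescaping (with \.\+\? read as .+?), 
deserialize is a hypothetical helper that only mimics the all-NULL behaviour, 
and SAMPLE is the first nginx line from the report re-joined onto a single 
line (treating the wrapped line breaks and any ";" trailing a quoted URL as 
mail artifacts).

```python
import re

# Assumed effective pattern once the SQL layer has unescaped the string
# literal. This is an illustrative reading, not what Shark or Hive is known
# to pass to the SerDe.
INPUT_REGEX = re.compile(
    r'^([^ ]*) - ([^ ]*) \[([^\]]*)\] "([^ ]*) ([^ ]*) ([^ ]*)" '
    r'(-|[0-9]*) (-|[0-9]*) "(.+?|-)" ([^ ]*) ([^ ]*) ([^ ]*) '
    r'"(.+?|-)" "(.+?|-)" "(.+?|-)"$'
)

# Column names from the CREATE TABLE statement in the report.
COLUMNS = [
    "ip", "host", "time", "method", "request", "protocol", "status", "size",
    "referer", "cookieuid", "requestTime", "session", "httpXRequestedWith",
    "agent", "upstreamResponseTime",
]


def deserialize(line):
    """Hypothetical helper: return one value per column, or all None
    (standing in for SQL NULL) when the whole line does not match."""
    match = INPUT_REGEX.fullmatch(line)
    if match is None:
        return [None] * len(COLUMNS)
    return list(match.groups())


# First sample line from the report, joined back onto one line.
SAMPLE = (
    '42.49.44.61 - www.xxxx.comm [20/Apr/2014:23:58:03 +0800] '
    '"GET /xxxxx/296837 HTTP/1.1" 200 3871 '
    '"http://www.xxxxx.com/xxxxx/296837" - 0.015 '
    '63hbb4om2cvtjs0f7d969n1uf4 "com.xxxxx.browser" '
    '"Mozilla/5.0 (Linux; U; xxxxx 4.1.2; zh-cn; ZTE N919 Build/JZO54K) '
    'AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30" '
    '"0.015"'
)

# Print each column next to the value captured for it.
for name, value in zip(COLUMNS, deserialize(SAMPLE)):
    print(f"{name:22} {value}")
```

If a line taken from the actual data files comes back as all None under the 
pattern the engine really receives, the all-NULL result points at the 
pattern or its escaping and not at the query itself.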



