[ https://issues.apache.org/jira/browse/HADOOP-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12554655 ]
stack commented on HADOOP-2480:
-------------------------------
Yeah, why not use the TableOutputFormat and run lots of reducers, so that many
clients write to hbase in parallel?
Creating a table instance per map task is expensive. Do it once in the init.
Your code has hard-coded paths to udanax/temp. You might want to
remove those.
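Below is a minimal sketch of the last two suggestions. It is plain Java, so it runs without Hadoop or HBase on the classpath. The costly client is built once per task in `configure()` and reused by every `map()` call. The scratch directory is read from configuration rather than hard-coded as `udanax/temp`. `ExpensiveTableClient`, `LogMapperSketch`, the default table name `access_log`, and the property keys `log.analysis.table` and `log.analysis.tmp.dir` are all hypothetical stand-ins. In the old `org.apache.hadoop.mapred` API, the one-time setup would go in `configure(JobConf)`, and the client would be the HBase table instance.

```java
import java.util.Properties;

/** Hypothetical stand-in for an HBase table client: costly to construct, cheap to reuse. */
final class ExpensiveTableClient {
    static int instancesCreated = 0; // makes the number of constructions observable

    private final String tableName;

    ExpensiveTableClient(String tableName) {
        this.tableName = tableName;
        instancesCreated++; // a real client would open connections here
    }

    String tableName() {
        return tableName;
    }

    void write(String row, String column, String value) {
        // A real client would send the cell to HBase; the sketch does nothing.
    }
}

/** Mapper-shaped sketch: one-time setup in configure(), per-record work in map(). */
final class LogMapperSketch {
    static final String TABLE_KEY = "log.analysis.table";     // hypothetical property name
    static final String TMP_DIR_KEY = "log.analysis.tmp.dir"; // hypothetical property name

    private ExpensiveTableClient table;
    private String tmpDir;

    /** Called once per task, like configure(JobConf) in the old mapred API. */
    void configure(Properties conf) {
        table = new ExpensiveTableClient(conf.getProperty(TABLE_KEY, "access_log"));
        // Scratch space comes from configuration instead of a hard-coded path.
        tmpDir = conf.getProperty(TMP_DIR_KEY, System.getProperty("java.io.tmpdir"));
    }

    /** Called once per input record; reuses the client created in configure(). */
    void map(String ip, String url, String referrer) {
        table.write(ip, "url:" + url, referrer);
    }

    String tmpDir() {
        return tmpDir;
    }
}
```

After one `configure()` followed by 1000 `map()` calls, `ExpensiveTableClient.instancesCreated` is still 1.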
> [Hbase Shell] Log Analysis Examples
> -----------------------------------
>
> Key: HADOOP-2480
> URL: https://issues.apache.org/jira/browse/HADOOP-2480
> Project: Hadoop
> Issue Type: New Feature
> Components: contrib/hbase
> Affects Versions: 0.16.0
> Environment: All
> Reporter: Edward Yoon
> Assignee: Edward Yoon
> Priority: Trivial
> Fix For: 0.16.0
>
> Attachments: v01.patch, v02.patch
>
>
> I made an Apache log fetcher, a log analyzer, and a social network analyzer
> using map/reduce on an hbase table, for large-scale data.
> - 5 terabytes of logs will be used. You can see it here:
> http://shell.hadoop.co.kr/PHPClient.php
> *Access_log Entry*
> ||Example Data Element||Description||
> |208.177.157.164|IP address of the client requesting the web page|
> |-|Identity of the client as reported by identd; almost always blank ("-") because identity lookups are rarely enabled|
> |-|User name with which the client was authenticated; typically blank unless authentication is required to access the page|
> |[15/Aug/2004:10:59:38 -0800]|Time the request was made|
> |"GET http://www.hadoop.co.kr/ HTTP/1.1"|The HTTP request made by the client, typically in the form of method (GET in this example), resource (the URL requested), and protocol (HTTP/1.1 in this example)|
> |200|Status code for the request; 200 means it was handled successfully|
> |-|Number of bytes transferred to the client in response to this request; "-" means no bytes were sent|
> |"-"|The URL of the referrer; that is, the URL of the page (or element within the page) from which the request URL was obtained|
> |"Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"|User agent identifier of the client making the request|
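The entry above is in Apache's combined log format, so one regular expression can split it into the fields listed in the table. This is a sketch that assumes every line has exactly that layout: a quoted request with three parts, a quoted referrer, and a quoted user agent. `AccessLogParser` and `AccessLogEntry` are hypothetical names, not classes from the v01/v02 patches.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** One parsed access_log entry; fields follow the table above. */
final class AccessLogEntry {
    final String ip, identity, user, time, method, url, protocol, referrer, agent;
    final int status;
    final long bytes; // -1 when the log shows "-" (no bytes sent)

    AccessLogEntry(String ip, String identity, String user, String time, String method,
                   String url, String protocol, int status, long bytes,
                   String referrer, String agent) {
        this.ip = ip;
        this.identity = identity;
        this.user = user;
        this.time = time;
        this.method = method;
        this.url = url;
        this.protocol = protocol;
        this.status = status;
        this.bytes = bytes;
        this.referrer = referrer;
        this.agent = agent;
    }
}

final class AccessLogParser {
    // ip identity user [time] "request" status bytes "referrer" "agent"
    private static final Pattern LINE = Pattern.compile(
        "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\d+|-) \"([^\"]*)\" \"([^\"]*)\"$");

    /** Returns null when the line does not have the expected layout. */
    static AccessLogEntry parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            return null;
        }
        String[] request = m.group(5).split(" ");
        if (request.length != 3) {
            return null; // expected: method, resource, protocol
        }
        long bytes = m.group(7).equals("-") ? -1L : Long.parseLong(m.group(7));
        return new AccessLogEntry(m.group(1), m.group(2), m.group(3), m.group(4),
                request[0], request[1], request[2], Integer.parseInt(m.group(6)), bytes,
                m.group(8), m.group(9));
    }
}
```

Returning null for malformed lines lets a map task skip bad records instead of failing the whole job.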
> *Table schema*
> * The url family is a historical page-move vector for each client.
> * Rows by url columns form a user-by-document matrix.
> ** A cell can hold a numeric value, such as how often the client visited the
> document, or an incoming value (the referrer) from a specified page.
> * ... etc.
> {code}
> ip <row>   http                          url
> -------------------------------------------------------------------
> ip         http:agent <agent>            url:URL <referrer>
>            http:protocol <protocol>      ...
>            http:method <method>
>            http:code <response code>
>            http:bytesize <bytesize>
> {code}
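Here is one way to read the schema, sketched with in-memory sorted maps standing in for an HBase table:

- The client IP is the row key.
- The `http` family holds the request attributes.
- The `url` family gets one column per requested URL, and that column's cell holds the referrer.

A second map counts visits per (IP, URL) pair, which is the numeric user-by-document matrix mentioned above. `AccessLogTable` and its methods are hypothetical. HBase keeps repeated writes to the same cell as timestamped versions, whereas this sketch keeps only the latest value.

```java
import java.util.SortedMap;
import java.util.TreeMap;

/** Sketch of the "ip <row>" layout: column name -> cell value, per client IP. */
final class AccessLogTable {
    // row key (client IP) -> (family:qualifier -> value), sorted like HBase columns
    private final SortedMap<String, SortedMap<String, String>> rows = new TreeMap<>();
    // row key -> (requested URL -> visit count): the user-by-document matrix
    private final SortedMap<String, SortedMap<String, Integer>> visits = new TreeMap<>();

    void record(String ip, String method, String protocol, int code, String bytesize,
                String agent, String url, String referrer) {
        SortedMap<String, String> row = rows.computeIfAbsent(ip, k -> new TreeMap<>());
        // http family: attributes of the request (latest request wins in this sketch)
        row.put("http:agent", agent);
        row.put("http:protocol", protocol);
        row.put("http:method", method);
        row.put("http:code", Integer.toString(code));
        row.put("http:bytesize", bytesize);
        // url family: one column per requested URL, holding the referrer
        row.put("url:" + url, referrer);
        // matrix cell: how often this client requested this URL
        visits.computeIfAbsent(ip, k -> new TreeMap<>()).merge(url, 1, Integer::sum);
    }

    SortedMap<String, String> row(String ip) {
        return rows.getOrDefault(ip, new TreeMap<>());
    }

    int visitCount(String ip, String url) {
        return visits.getOrDefault(ip, new TreeMap<>()).getOrDefault(url, 0);
    }
}
```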
> *Log models and Applications*
> * Next Page Recommendation
> * Page Network Analysis
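Both applications can start from the same data: every request that carries a referrer is a directed edge from the referring page to the requested URL. This is a minimal sketch under two assumptions. First, next-page recommendation means ranking the pages most often reached from the current one, a first-order transition count that is not necessarily the model the patches use. Second, page network analysis begins from this weighted link graph. `PageGraph`, `recommendNext`, and `inDegree` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Weighted page graph: edge referrer -> url, weight = number of observed requests. */
final class PageGraph {
    private final Map<String, Map<String, Integer>> edges = new HashMap<>();

    /** Records one request for url that arrived from referrer; "-" means no referrer. */
    void addRequest(String referrer, String url) {
        if (referrer == null || referrer.equals("-")) {
            return; // no known previous page, so no edge
        }
        edges.computeIfAbsent(referrer, k -> new HashMap<>()).merge(url, 1, Integer::sum);
    }

    /** Top-k pages most often visited next from page, most frequent first. */
    List<String> recommendNext(String page, int k) {
        Map<String, Integer> next = edges.getOrDefault(page, Collections.emptyMap());
        List<Map.Entry<String, Integer>> ranked = new ArrayList<>(next.entrySet());
        ranked.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey()); // deterministic ties
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(k, ranked.size()); i++) {
            result.add(ranked.get(i).getKey());
        }
        return result;
    }

    /** Number of distinct pages linking into url: a simple page-network statistic. */
    int inDegree(String url) {
        int n = 0;
        for (Map<String, Integer> out : edges.values()) {
            if (out.containsKey(url)) {
                n++;
            }
        }
        return n;
    }
}
```

Ties in the ranking are broken alphabetically by URL, so repeated runs over the same logs give the same recommendations.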
--
This message is automatically generated by JIRA.