Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for 
change notification.

The following page has been changed by udanax:
http://wiki.apache.org/lucene-hadoop/DataProcessingBenchmarks

------------------------------------------------------------------------------
  SQL > select ipaddress, count(*) from access_log group by ipaddress order by count(*) desc limit 0,100;
  [[BR]]''σ ,,count. ipaddress,, (τ ,,count,, (γ ,,count(ipaddress). ipaddress,, (access_log)))''
  
- ||<bgcolor="#E5E5E5">||<bgcolor="#E5E5E5">!MySql 5.0.27 ||<bgcolor="#E5E5E5">Hadoop-0.15.0 (commodity)||<bgcolor="#E5E5E5">Hadoop-0.15.0 (commodity)||<bgcolor="#E5E5E5">Hadoop-0.15.0 (High-Performance Server)||
- ||<bgcolor="#E5E5E5">Data ||B-tree disk table (MyISAM)||Text files (access_log)||Text files (access_log)||Text files (access_log)||
- ||<bgcolor="#E5E5E5">Machine ||1 ||40||1000||2 (* 4 processor)||
- ||<bgcolor="#E5E5E5">Rows ||3,700,000 ||54,805,260||54,805,260||54,805,260||
- ||<bgcolor="#E5E5E5">Results ||100 ||100||100||100||
- ||<bgcolor="#E5E5E5">Time  ||3.715 sec ||1317.19 sec||112.03 sec||1244.21 sec||
- 
  ==== MapReduce Flow ====
  
   * Map was used to extract the IP address of the client requesting the web page.
   * Reduce was used for summation.
   * One more Map/Reduce pass was used to sort by count.
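
The two-pass flow listed above can be sketched in plain Python, without Hadoop, to make the data movement concrete. This is a minimal single-process illustration, not the job that produced the timings on this page. The function names (`map_extract_ip`, `reduce_sum`, `shuffle`, `top_connectors`) are hypothetical, and it assumes the client IP address is the first whitespace-separated field of each access_log line.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_extract_ip(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit (ip, 1) for each log line.

    Assumption: the client IP is the first whitespace-separated field.
    """
    fields = line.split()
    if fields:
        yield fields[0], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group mapped values by key, standing in for the framework's shuffle."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_sum(ip: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce step: sum the per-line counts for one IP address."""
    return ip, sum(counts)


def top_connectors(lines: Iterable[str], limit: int = 100) -> List[Tuple[str, int]]:
    # Pass 1: map each line to (ip, 1), then reduce to (ip, total).
    mapped = (pair for line in lines for pair in map_extract_ip(line))
    counted = [reduce_sum(ip, counts) for ip, counts in shuffle(mapped).items()]
    # Pass 2: order by count, descending (ties broken by IP), and keep `limit` rows.
    counted.sort(key=lambda item: (-item[1], item[0]))
    return counted[:limit]


if __name__ == "__main__":
    # Made-up sample lines using documentation-range addresses.
    sample = [
        '192.0.2.1 - - "GET / HTTP/1.0" 200 512',
        '192.0.2.2 - - "GET /a HTTP/1.0" 200 128',
        '192.0.2.1 - - "GET /b HTTP/1.0" 404 0',
    ]
    for ip, count in top_connectors(sample, limit=2):
        print(count, ip)
```

In a real cluster the second pass is a separate job because the totals for each IP address are only known after the first reduce has finished.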
  
+ ==== Benchmarks ====
+ 
+ ===== 1.5 GB access_log on 10 node cluster =====
+ [http://wiki.apache.org/lucene-hadoop-data/attachments/DataProcessingBenchmarks/attachments/C__Users_udanax_Desktop_test-10.png]
+ 
+ ||<bgcolor="#E5E5E5">||<bgcolor="#E5E5E5">!MySql 5.0.27 ||<bgcolor="#E5E5E5">Hadoop-0.15.2 ||<bgcolor="#E5E5E5">Hadoop-0.15.2 ||<bgcolor="#E5E5E5">Hadoop-0.15.2 ||<bgcolor="#E5E5E5">Hadoop-0.15.2 ||<bgcolor="#E5E5E5">Hadoop-0.15.2 ||
+ ||<bgcolor="#E5E5E5">Data ||B-tree disk table (MyISAM)||Text files (access_log)||Text files (access_log)||Text files (access_log)||Text files (access_log)||Text files (access_log)||
+ ||<bgcolor="#E5E5E5">Machine ||1 || 2|| 4|| 6|| 8|| 10||
+ ||<bgcolor="#E5E5E5">Rows ||3,700,000 ||5,914,669||5,914,669||5,914,669||5,914,669||5,914,669||
+ ||<bgcolor="#E5E5E5">Results ||100 ||100||100||100||100||100||
+ ||<bgcolor="#E5E5E5">Time  ||3.715 sec ||172.30 sec||108.01 sec||77.41 sec||66.30 sec||60.78 sec||
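
As a reading aid for the Hadoop columns of the table above, the following sketch derives speedup and parallel efficiency relative to the 2-machine run. The times are copied from the table. The helper name `speedup_table` and the choice of the 2-machine run as the baseline are assumptions made for illustration. The MySql column is left out because it was measured on a different row count.

```python
from typing import Dict, List, Tuple

# Wall-clock times in seconds, copied from the Hadoop-0.15.2 columns
# of the 1.5 GB access_log table (key = number of machines).
TIMES: Dict[int, float] = {2: 172.30, 4: 108.01, 6: 77.41, 8: 66.30, 10: 60.78}


def speedup_table(times: Dict[int, float]) -> List[Tuple[int, float, float]]:
    """Return (machines, speedup, efficiency) rows.

    Assumption: the smallest cluster in `times` is the baseline, so
    speedup = baseline_time / time and
    efficiency = speedup / (machines / baseline_machines).
    """
    base_machines = min(times)
    base_time = times[base_machines]
    rows = []
    for machines in sorted(times):
        speedup = base_time / times[machines]
        efficiency = speedup / (machines / base_machines)
        rows.append((machines, speedup, efficiency))
    return rows


if __name__ == "__main__":
    for machines, speedup, efficiency in speedup_table(TIMES):
        print(f"{machines:>2} machines: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

By this measure, going from 2 to 10 machines shortens the run by a factor of about 2.8, well short of the 5x that linear scaling would give.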
+ 
- ==== MapReduce Results ====
- {{{
- ------------------------------------
- * Top 100 connector list :
- +--------------+-------------------+
- | Count        | Ip Address        |
- +--------------+-------------------+
- | 374932       | 121.165.51.179    |
- | 357615       | 121.150.85.42     |
- | 304878       | 211.204.83.50     |
- | ...          | ...               |
- | 72154        | 211.210.164.215   |
- | 72083        | 122.44.149.231    |
- | 71646        | 124.49.150.145    |
- | 70915        | 211.48.70.247     |
- +--------------+-------------------+
- Processing time : 112.03 sec
- }}}
  
  === EM Algorithm ===
   * Finds maximum likelihood estimates of parameters in probabilistic models.
