Dear Wiki user, You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.
The following page has been changed by udanax:
http://wiki.apache.org/lucene-hadoop/DataProcessingBenchmarks

------------------------------------------------------------------------------
  SQL > select ipaddress, count(*) from access_log group by ipaddress order by count(*) desc limit 0,100;
  [[BR]]''π ,,count. ipaddress,, (τ ,,count,, (γ ,,count(ipaddress). ipaddress,, (access_log)))''

- ||<bgcolor="#E5E5E5">||<bgcolor="#E5E5E5">!MySql 5.0.27 ||<bgcolor="#E5E5E5">Hadoop-0.15.0 (commodity)||<bgcolor="#E5E5E5">Hadoop-0.15.0 (commodity)||<bgcolor="#E5E5E5">Hadoop-0.15.0 (High-Performance Server)||
- ||<bgcolor="#E5E5E5">Data ||B-tree disk table (MyISAM)||Text files (access_log)||Text files (access_log)||Text files (access_log)||
- ||<bgcolor="#E5E5E5">Machine ||1 ||40||1000||2 (* 4 processor)||
- ||<bgcolor="#E5E5E5">Rows ||3,700,000 ||54,805,260||54,805,260||54,805,260||
- ||<bgcolor="#E5E5E5">Results ||100 ||100||100||100||
- ||<bgcolor="#E5E5E5">Time ||3.715 sec ||1317.19 sec||112.03 sec||1244.21 sec||
- ==== MapReduce Flow ====
   * Map was used to extract the IP address of the client requesting the web page.
   * Reduce was used for summation.
   * One more Map/Reduce job was used to sort by count.
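
The MapReduce flow listed above (map extracts the client IP, reduce sums the counts, a second job sorts by count) can be sketched in plain Python. This is only an in-memory illustration, not the page author's Hadoop job: the function names, the regular expression, and the assumption that each access_log line begins with the client IP are all hypothetical.

```python
import re
from collections import defaultdict

# Assumption: every access_log line starts with the client IPv4 address,
# as in the Common Log Format. The real log layout is not shown on the page.
IP_RE = re.compile(r"^(\d{1,3}(?:\.\d{1,3}){3})\s")


def map_extract_ip(line):
    """Job 1, map: emit (ip, 1) for each log line that starts with an IP."""
    match = IP_RE.match(line)
    if match:
        yield match.group(1), 1


def reduce_sum(ip, counts):
    """Job 1, reduce: sum the emitted counts for a single IP."""
    return ip, sum(counts)


def run_count_job(lines):
    """Run job 1; the dict of lists stands in for Hadoop's shuffle phase."""
    groups = defaultdict(list)
    for line in lines:
        for ip, one in map_extract_ip(line):
            groups[ip].append(one)
    return [reduce_sum(ip, counts) for ip, counts in groups.items()]


def run_sort_job(counted, limit=100):
    """Job 2: order by count descending and keep the first `limit` rows.

    Ties are broken by IP string so the output is deterministic.
    """
    return sorted(counted, key=lambda pair: (-pair[1], pair[0]))[:limit]


def top_ips(lines, limit=100):
    """Chain both jobs, mirroring the GROUP BY / ORDER BY / LIMIT query."""
    return run_sort_job(run_count_job(lines), limit)
```

Calling `top_ips(open("access_log"))` on a log in the assumed format would return up to 100 `(ip, count)` pairs, most frequent first. Lines that do not start with an IPv4 address are skipped silently, which is a design choice of this sketch rather than something the page specifies.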
+ ==== Benchmarks ====
+
+ ===== 1.5 GB access_log on 10 node cluster =====
+ [http://wiki.apache.org/lucene-hadoop-data/attachments/DataProcessingBenchmarks/attachments/C__Users_udanax_Desktop_test-10.png]
+
+ ||<bgcolor="#E5E5E5">||<bgcolor="#E5E5E5">!MySql 5.0.27 ||<bgcolor="#E5E5E5">Hadoop-0.15.2 ||<bgcolor="#E5E5E5">Hadoop-0.15.2 ||<bgcolor="#E5E5E5">Hadoop-0.15.2 ||<bgcolor="#E5E5E5">Hadoop-0.15.2 ||<bgcolor="#E5E5E5">Hadoop-0.15.2 ||
+ ||<bgcolor="#E5E5E5">Data ||B-tree disk table (MyISAM)||Text files (access_log)||Text files (access_log)||Text files (access_log)||Text files (access_log)||Text files (access_log)||
+ ||<bgcolor="#E5E5E5">Machine ||1 || 2|| 4|| 6|| 8|| 10||
+ ||<bgcolor="#E5E5E5">Rows ||3,700,000 ||5,914,669||5,914,669||5,914,669||5,914,669||5,914,669||
+ ||<bgcolor="#E5E5E5">Results ||100 ||100||100||100||100||100||
+ ||<bgcolor="#E5E5E5">Time ||3.715 sec ||172.30 sec||108.01 sec||77.41 sec||66.30 sec||60.78 sec||
+
- ==== MapReduce Results ====
- {{{
- ------------------------------------
- * Top 100 connector list :
- +--------------+-------------------+
- |    Count     |    Ip Address     |
- +--------------+-------------------+
- |    374932    |  121.165.51.179   |
- |    357615    |  121.150.85.42    |
- |    304878    |  211.204.83.50    |
- |     ...      |       ...         |
- |    72154     |  211.210.164.215  |
- |    72083     |  122.44.149.231   |
- |    71646     |  124.49.150.145   |
- |    70915     |  211.48.70.247    |
- +--------------+-------------------+
- Processing time : 112.03 sec
- }}}
  === EM Algorithm ===
   * Finds maximum likelihood estimates of parameters in probabilistic models.