Dear Wiki user, You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.
The following page has been changed by udanax:
http://wiki.apache.org/lucene-hadoop/DataProcessingBenchmarks

------------------------------------------------------------------------------
   * EM algorithm performance analysis
   * Lanczos algorithm performance analysis
  
- === Group/Sort Processing Benchmarks ===
+ === Group/Sort Processing ===
+ 
+  * Finds the most connected networks.
+  * After [https://issues.apache.org/jira/browse/HADOOP-2480 HADOOP-2480] is done, Hbase will join the benchmarks.
  
  SQL > select ipaddress, count(*) from access_log group by ipaddress order by count(*) desc limit 0,100; [[BR]]''Ï ,,count. ipaddress,, (Ï ,,count,, (γ ,,count(ipaddress). ipaddress,, (access_log)))''
- 
-  * After [https://issues.apache.org/jira/browse/HADOOP-2480 HADOOP-2480] is done, Hbase will join the benchmarks.
  
  ||<bgcolor="#E5E5E5">||<bgcolor="#E5E5E5">!MySql 5.0.27 ||<bgcolor="#E5E5E5">Hadoop-0.15.0 ||
  ||<bgcolor="#E5E5E5">Data ||B-tree disk table (MyISAM)||Text files (access_log)||
@@ -23, +24 @@

  ||<bgcolor="#E5E5E5">Results ||100 ||100||
  ||<bgcolor="#E5E5E5">Time ||3.715 sec ||112.03 sec||
  
- ==== Processing Flow ====
+ ==== MapReduce Flow ====
   * Map was used to extract the IP address of the client requesting the web page.
   * Reduce was used for summation.
   * One more Map/Reduce pass was used to sort by count.
  
- ==== Processing Results ====
+ ==== MapReduce Results ====
  {{{
  ------------------------------------
  * Top 100 connector list :
@@ -48, +49 @@

  Processing time : 112.03 sec
  }}}
  
+ === EM Algorithm ===
+  * Finds maximum likelihood estimates of parameters in probabilistic models.
+  * Alternates between expectation (E) step and maximization (M) step.
+ 
+ ==== MapReduce Flow ====
+ 
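
Note on the Group/Sort Processing flow in the diff above (not the EM section): its two passes (map extracts the client IP, reduce sums, a second pass sorts by count) can be sketched locally in plain Python. This is only an illustration, not the benchmark's actual Hadoop job. The function names, the regular expression, and the assumption that each access_log line begins with the client IP are hypothetical; the page does not show the log layout or the job code. Ties are broken by IP address here, which the SQL statement leaves unspecified.

```python
import re
from collections import defaultdict

# Assumption: each access_log line starts with the client IPv4 address.
# The wiki page does not show the log layout, so this pattern is hypothetical.
IP_AT_START = re.compile(r"^(\d{1,3}(?:\.\d{1,3}){3})\s")


def map_extract_ip(line):
    """Pass 1 map: emit (ipaddress, 1) for each line that starts with an IP."""
    match = IP_AT_START.match(line)
    if match:
        yield match.group(1), 1


def reduce_sum(ip, counts):
    """Pass 1 reduce: sum the emitted counts for one IP address."""
    return ip, sum(counts)


def run_group_pass(lines):
    """Group by IP (a dict stands in for the shuffle), then reduce each group."""
    groups = defaultdict(list)
    for line in lines:
        for ip, one in map_extract_ip(line):
            groups[ip].append(one)
    return [reduce_sum(ip, counts) for ip, counts in groups.items()]


def run_sort_pass(ip_counts, limit=100):
    """Pass 2: order by count descending and keep the first `limit` rows.

    Ties are broken by IP string; that ordering is an assumption.
    """
    return sorted(ip_counts, key=lambda pair: (-pair[1], pair[0]))[:limit]


def top_connectors(lines, limit=100):
    """Run both passes, mirroring the group by / order by / limit query."""
    return run_sort_pass(run_group_pass(lines), limit)


if __name__ == "__main__":
    sample = [
        '10.0.0.1 - - "GET /index.html" 200 512',
        '10.0.0.2 - - "GET /about.html" 200 128',
        '10.0.0.1 - - "GET /logo.png" 200 2048',
    ]
    for ip, count in top_connectors(sample):
        print(count, ip)
```

On a real cluster each pass would be a separate job, and producing a single globally sorted list in the second pass typically requires one reducer or a total-order partitioner; the sketch sidesteps that by sorting in memory.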