Hi all,

  I'm trying to implement a clustering algorithm on Hadoop. Among
  other things, there are a lot of matrix multiplications. LINA
  (http://wiki.apache.org/lucene-hadoop/Lina) would probably be a
  perfect fit here, but I can't afford to wait. By the way, I can't
  find HADOOP-1655 any more; what's going on?

  Using the ordinary matrix product (the dot product of one row and
  one column gives one element of the resulting matrix), the easiest
  way to formulate this computation is to send one row and one column
  to a mapper, which outputs the corresponding element of the
  resulting matrix. The reducer can then take each element and put it
  into the correct position in the output file.
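
  To make that concrete, here is a minimal sketch of the map step I
  have in mind (the class name, the record layout and the use of the
  old mapred API are just my assumptions; it presumes each map call
  already receives one row of A paired with one column of B, which is
  exactly what I don't know how to arrange):

    import java.io.IOException;
    import org.apache.hadoop.io.DoubleWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    // Sketch only: assumes the input value is already a paired record
    //   i <TAB> j <TAB> a_1 ... a_n <TAB> b_1 ... b_n
    // i.e. row i of A together with column j of B.
    public class DotProductMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, DoubleWritable> {

      public void map(LongWritable offset, Text value,
                      OutputCollector<Text, DoubleWritable> out,
                      Reporter reporter) throws IOException {
        String[] fields = value.toString().split("\t");
        String[] a = fields[2].split(" ");   // row i of A
        String[] b = fields[3].split(" ");   // column j of B
        double sum = 0.0;
        for (int k = 0; k < a.length; k++) {
          sum += Double.parseDouble(a[k]) * Double.parseDouble(b[k]);
        }
        // key "i,j" marks where this element belongs in the result
        out.collect(new Text(fields[0] + "," + fields[1]),
                    new DoubleWritable(sum));
      }
    }

  The reducer would then be little more than an identity, writing each
  "i,j" value into its place in the output.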

  I need your advice on how to design the input file(s) and how to
  make the input splits. I'd like to keep the matrices in separate
  files (they'll be used in more than one multiplication, and it's
  cleaner to have them separate).
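
  For example (just one layout I'm considering, with made-up numbers),
  A could be stored row by row and B column by column, one line per
  row or column:

    A.txt:  0<TAB>1.0 2.0 3.0
            1<TAB>4.0 5.0 6.0
    B.txt:  0<TAB>7.0 9.0 11.0
            1<TAB>8.0 10.0 12.0

  With plain TextInputFormat a mapper would then get one row of A or
  one column of B per call, but never a row and a column together.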

  I guess I'd then have to use MultiFileSplit and MultiFileInputFormat
  somehow. Is it possible at all to send two records (one row and one
  column, or two rows if the other matrix is stored column-oriented)
  from two input splits to a single mapper? Or should I look for an
  alternative way to multiply matrices?
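
  In case it helps to show what I mean by an alternative: the only
  other formulation I can think of is the usual join on the inner
  index k, roughly as below (all class names and the line format are
  mine; it also needs a second, trivial job to sum the partial
  products per element):

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;
    import org.apache.hadoop.io.DoubleWritable;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    // Job 1 of 2: re-key every element by the inner index k, join A
    // and B entries in the reducer, and emit partial products.
    public class InnerJoinMultiply {

      // Assumes each line is tagged with its matrix:
      //   A<TAB>i<TAB>a_1 ... a_n   (row i of A)
      //   B<TAB>j<TAB>b_1 ... b_m   (column j of B)
      public static class JoinMapper extends MapReduceBase
          implements Mapper<LongWritable, Text, IntWritable, Text> {
        public void map(LongWritable offset, Text line,
                        OutputCollector<IntWritable, Text> out,
                        Reporter reporter) throws IOException {
          String[] fields = line.toString().split("\t");
          String matrix = fields[0];      // "A" or "B"
          String index = fields[1];       // row i of A, or column j of B
          String[] values = fields[2].split(" ");
          for (int k = 0; k < values.length; k++) {
            // key = inner index k, value = "matrix,outerIndex,element"
            out.collect(new IntWritable(k),
                        new Text(matrix + "," + index + "," + values[k]));
          }
        }
      }

      public static class JoinReducer extends MapReduceBase
          implements Reducer<IntWritable, Text, Text, DoubleWritable> {
        public void reduce(IntWritable k, Iterator<Text> values,
                           OutputCollector<Text, DoubleWritable> out,
                           Reporter reporter) throws IOException {
          List<String[]> aEntries = new ArrayList<String[]>();
          List<String[]> bEntries = new ArrayList<String[]>();
          while (values.hasNext()) {
            String[] parts = values.next().toString().split(",");
            if (parts[0].equals("A")) {
              aEntries.add(parts);
            } else {
              bEntries.add(parts);
            }
          }
          // Every a_ik meets every b_kj here; emit their partial
          // product keyed by the target position "i,j".
          for (String[] a : aEntries) {
            for (String[] b : bEntries) {
              double product =
                  Double.parseDouble(a[2]) * Double.parseDouble(b[2]);
              out.collect(new Text(a[1] + "," + b[1]),
                          new DoubleWritable(product));
            }
          }
        }
      }
    }

  That sidesteps the two-splits-into-one-mapper problem, but it
  materialises one partial product per (i, k, j) triple before the
  second job sums them, so I'm not sure it's really better.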


-- 
regards,
Milan
